Feb 22, 2026 / Inference

Type I and Type II Errors in Hypothesis Testing

Every test has two failure modes, and tightening one always loosens the other.


A clinical trial tests whether a new drug lowers blood pressure. After the trial, one of four things is true: the drug works and the trial correctly concludes it works; the drug works but the trial incorrectly concludes it doesn't; the drug doesn't work and the trial correctly concludes it doesn't; or the drug doesn't work but the trial incorrectly concludes it does. The last two cases are errors, and they have formal names with specific mathematical relationships to the test design.

Type I Error: Rejecting a True Null

A Type I error occurs when the null hypothesis is actually true but the test rejects it. In the drug example, this means concluding the drug works when it doesn't. The probability of a Type I error is called alpha, and it is exactly the significance threshold you set before running the test.

If you use alpha = 0.05, then in a world where the drug truly has no effect and you ran this trial 100 times, you would incorrectly conclude the drug works in approximately 5 of those trials just by chance. This is not a flaw; it is a known cost of doing statistical testing. The p-value is the probability of observing data at least as extreme as yours given that the null is true, and rejecting whenever p < 0.05 caps the long-run Type I error rate at 5%.
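The "5 in 100 trials" claim can be checked directly by simulation. The sketch below runs many trials in a world where the null is true (both groups drawn from the same distribution) and counts how often a two-sample test rejects at alpha = 0.05. The sample size, trial count, and the normal approximation to the t-test are illustrative choices, not anything prescribed by the text.

```python
# Monte Carlo check of the Type I error rate: when the null is true,
# a test at alpha = 0.05 should reject about 5% of the time.
import math
import random
import statistics

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (Welch statistic,
    normal approximation -- reasonable at n = 50 per group)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(0)
alpha, n, trials = 0.05, 50, 10_000
rejections = 0
for _ in range(trials):
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]  # same distribution: null is true
    if two_sample_p(control, treated) < alpha:
        rejections += 1

print(f"false positive rate: {rejections / trials:.3f}")
```

With the null true by construction, the printed rate lands near 0.05: every rejection in this simulation is, by definition, a Type I error.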

Type II Error: Failing to Reject a False Null

A Type II error occurs when the null hypothesis is actually false but the test fails to reject it. In the drug example, this means the drug genuinely lowers blood pressure but the trial concludes there is no effect. The probability of a Type II error is called beta.

The complement of beta is called statistical power: the probability of correctly detecting a real effect. Power = 1 - beta. A test with 80% power will detect a real effect 80% of the time and miss it 20% of the time.

Power depends on three things: alpha (a higher alpha means higher power, at the cost of more false positives), the true effect size relative to the outcome's variability (larger, cleaner effects are easier to detect), and sample size (more data means more power). The most common reason studies have low power is insufficient sample size.
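The dependence of power on sample size can be made concrete with the standard normal approximation for comparing two means. In the sketch below, `d` is Cohen's d (the effect in standard-deviation units) and `n` is the per-group sample size; the specific values tried are illustrative.

```python
# Approximate power of a two-sided, two-sample test of means.
# Under the alternative, the test statistic is roughly normal with
# mean d * sqrt(n / 2), so power is the mass beyond the critical value.
import math
from statistics import NormalDist

def power(d, n, alpha=0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    return NormalDist().cdf(d * math.sqrt(n / 2) - z_crit)

# A medium effect (d = 0.5) at alpha = 0.05, across sample sizes:
for n in (20, 50, 64, 100):
    print(f"n = {n:3d} per group  ->  power = {power(0.5, n):.2f}")
```

The n = 64 row is a classic benchmark: a medium effect at alpha = 0.05 needs roughly 64 subjects per group to reach 80% power.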

The Tradeoff

You cannot simultaneously reduce both error types without collecting more data. Decreasing alpha (being more conservative about false positives) raises the bar for rejection, which means you will miss more real effects, increasing beta. Increasing alpha to catch more real effects increases the false positive rate.

This tradeoff has practical consequences. In drug approval, regulators set alpha low (conventionally 0.05, sometimes stricter) because approving an ineffective drug is costly, even at the price of missing some effective ones. In preliminary screening of potential drug compounds, you might accept a higher alpha to avoid discarding candidates that actually work, knowing that later trials will filter false positives.

Calculating Required Sample Size

Given a desired power level, an alpha, and an expected effect size, you can calculate the sample size needed. For comparing two means, the formula involves the ratio of the expected effect to the standard deviation (called Cohen's d) and the z-scores corresponding to alpha and beta.
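For two means, the standard approximation is n = 2(z_{1-alpha/2} + z_{1-beta})^2 / d^2 per group, where d is Cohen's d and the z terms are standard normal quantiles. A minimal stdlib sketch of that formula:

```python
# Required per-group sample size for comparing two means:
#   n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2
import math
from statistics import NormalDist

def sample_size_per_group(d, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # z for two-sided alpha
    z_b = NormalDist().inv_cdf(power)          # z for power = 1 - beta
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)

# Medium effect (d = 0.5), alpha = 0.05, 80% power:
print(sample_size_per_group(0.5))  # about 63 per group
```

Note how the effect size enters squared in the denominator: halving d quadruples the required n, which is why small effects are so expensive to detect.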

The practical implication is that researchers should run power calculations before collecting data, not after. A study that starts with n = 30 and finds p = 0.15 has not "shown the drug doesn't work." It has shown only that it lacked the data to detect the effect, if one exists. Negative results from underpowered studies are nearly uninterpretable.
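A back-of-envelope check makes the n = 30 example concrete. Reading n = 30 as 30 subjects per group (an assumption; the text does not specify) and supposing a medium true effect of d = 0.5, the approximate power under the usual normal approximation is:

```python
# Power of a two-sided test at alpha = 0.05 with n = 30 per group
# and a medium true effect (d = 0.5) -- illustrative numbers.
import math
from statistics import NormalDist

d, n, alpha = 0.5, 30, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
power = NormalDist().cdf(d * math.sqrt(n / 2) - z_crit)
print(f"power at n = 30: {power:.2f}")
```

The result is close to a coin flip: even when the drug genuinely works at this effect size, such a trial misses it about half the time, which is why its p = 0.15 says almost nothing.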

Mark Leschinsky

PRESIDENT & FOUNDER

