Feb 22, 2026 / Inference

Type I and Type II Errors in Hypothesis Testing

Every test has two failure modes, and tightening one always loosens the other.


A clinical trial tests whether a new drug lowers blood pressure. After the trial, one of four things is true: the drug works and the trial correctly concludes it works; the drug works but the trial incorrectly concludes it doesn't; the drug doesn't work and the trial correctly concludes it doesn't; or the drug doesn't work but the trial incorrectly concludes it does. The last two cases are errors, and they have formal names with specific mathematical relationships to the test design.

Type I Error: Rejecting a True Null

A Type I error occurs when the null hypothesis is actually true but the test rejects it. In the drug example, this means concluding the drug works when it doesn't. The probability of a Type I error is called alpha, and it is exactly the significance threshold you set before running the test.

If you use alpha = 0.05, then in a world where the drug truly has no effect and you ran this trial 100 times, you would incorrectly conclude the drug works in approximately 5 of those trials just by chance. This is not a flaw; it is a known cost of doing statistical testing. The p-value is the probability of observing data at least as extreme as yours given that the null is true, and rejecting whenever p < 0.05 caps the long-run Type I error rate at 5%.
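The "5 in 100 trials" claim can be checked directly by simulation. The sketch below runs many trials in a world where the null is true (both groups drawn from the same distribution) and counts how often a two-sample test rejects at alpha = 0.05. The sample size, trial count, and the normal approximation to the t-test are illustrative choices, not anything prescribed by the text.

```python
# Monte Carlo check of the Type I error rate: when the null is true,
# a test at alpha = 0.05 should reject about 5% of the time.
import math
import random
import statistics

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (Welch statistic,
    normal approximation -- reasonable at n = 50 per group)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(0)
alpha, n, trials = 0.05, 50, 10_000
rejections = 0
for _ in range(trials):
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]  # same distribution: null is true
    if two_sample_p(control, treated) < alpha:
        rejections += 1

print(f"false positive rate: {rejections / trials:.3f}")
```

With the null true by construction, the printed rate lands near 0.05: every rejection in this simulation is, by definition, a Type I error.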

Type II Error: Failing to Reject a False Null

A Type II error occurs when the null hypothesis is actually false but the test fails to reject it. In the drug example, this means the drug genuinely lowers blood pressure but the trial concludes there is no effect. The probability of a Type II error is called beta.

The complement of beta is called statistical power: the probability of correctly detecting a real effect. Power = 1 - beta. A test with 80% power will detect a real effect 80% of the time and miss it 20% of the time.

Power depends on three things: alpha (a higher alpha means higher power, at the cost of more false positives), the true effect size relative to the outcome's variability (larger, cleaner effects are easier to detect), and sample size (more data means more power). The most common reason studies have low power is insufficient sample size.
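The dependence of power on sample size can be made concrete with the standard normal approximation for comparing two means. In the sketch below, `d` is Cohen's d (the effect in standard-deviation units) and `n` is the per-group sample size; the specific values tried are illustrative.

```python
# Approximate power of a two-sided, two-sample test of means.
# Under the alternative, the test statistic is roughly normal with
# mean d * sqrt(n / 2), so power is the mass beyond the critical value.
import math
from statistics import NormalDist

def power(d, n, alpha=0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    return NormalDist().cdf(d * math.sqrt(n / 2) - z_crit)

# A medium effect (d = 0.5) at alpha = 0.05, across sample sizes:
for n in (20, 50, 64, 100):
    print(f"n = {n:3d} per group  ->  power = {power(0.5, n):.2f}")
```

The n = 64 row is a classic benchmark: a medium effect at alpha = 0.05 needs roughly 64 subjects per group to reach 80% power.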

The Tradeoff

You cannot simultaneously reduce both error types without collecting more data. Decreasing alpha (being more conservative about false positives) raises the bar for rejection, which means you will miss more real effects, increasing beta. Increasing alpha to catch more real effects increases the false positive rate.

This tradeoff has practical consequences. In drug approval, regulators set alpha low (conventionally 0.05, sometimes stricter) because approving an ineffective drug is costly, even at the price of missing some effective ones. In preliminary screening of potential drug compounds, you might accept a higher alpha to avoid discarding candidates that actually work, knowing that later trials will filter false positives.

Calculating Required Sample Size

Given a desired power level, an alpha, and an expected effect size, you can calculate the sample size needed. For comparing two means, the formula involves the ratio of the expected effect to the standard deviation (called Cohen's d) and the z-scores corresponding to alpha and beta.
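For two means, the standard approximation is n = 2(z_{1-alpha/2} + z_{1-beta})^2 / d^2 per group, where d is Cohen's d and the z terms are standard normal quantiles. A minimal stdlib sketch of that formula:

```python
# Required per-group sample size for comparing two means:
#   n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2
import math
from statistics import NormalDist

def sample_size_per_group(d, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # z for two-sided alpha
    z_b = NormalDist().inv_cdf(power)          # z for power = 1 - beta
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)

# Medium effect (d = 0.5), alpha = 0.05, 80% power:
print(sample_size_per_group(0.5))  # about 63 per group
```

Note how the effect size enters squared in the denominator: halving d quadruples the required n, which is why small effects are so expensive to detect.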

The practical implication is that researchers should run power calculations before collecting data, not after. A study that starts with n = 30 and finds p = 0.15 has not "shown the drug doesn't work." It has shown only that it lacked the data to detect the effect, if one exists. Negative results from underpowered studies are nearly uninterpretable.
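A back-of-envelope check makes the n = 30 example concrete. Reading n = 30 as 30 subjects per group (an assumption; the text does not specify) and supposing a medium true effect of d = 0.5, the approximate power under the usual normal approximation is:

```python
# Power of a two-sided test at alpha = 0.05 with n = 30 per group
# and a medium true effect (d = 0.5) -- illustrative numbers.
import math
from statistics import NormalDist

d, n, alpha = 0.5, 30, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
power = NormalDist().cdf(d * math.sqrt(n / 2) - z_crit)
print(f"power at n = 30: {power:.2f}")
```

The result is close to a coin flip: even when the drug genuinely works at this effect size, such a trial misses it about half the time, which is why its p = 0.15 says almost nothing.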

Mark Leschinsky

PRESIDENT & FOUNDER

