Mar 5, 2026
/
Inference

What is a p-value, really?

"p < 0.05" shows up everywhere in science. Here's what it actually means.

Here is a sentence you might read in a psychology paper: "The intervention significantly improved test scores (p = 0.03)." Most readers, including many researchers, take this to mean the result is solid, trustworthy, and probably real. But p = 0.03 is doing a lot of work in that sentence, and most people have no idea what it actually means.

A simple example

Suppose you want to know whether a coin is fair. You flip it 10 times and get 7 heads. Is the coin rigged? Maybe. Or maybe you just got lucky. The core problem in statistics is that random data is noisy; any result you observe could, in principle, have happened by chance even if nothing interesting is going on.

This is the tension statistics tries to resolve: how do we tell a real signal from random noise?

The null hypothesis: your starting point

Every hypothesis test begins by assuming the boring explanation is true. This is called the null hypothesis. For the coin, the null hypothesis is that the coin is fair (50/50).

You don't believe the null hypothesis is true. You're actually hoping to disprove it. But you start there because it gives you something concrete to calculate against.

What the p-value actually measures

The p-value answers this question: if the null hypothesis were true, how often would I see data this extreme just by random chance?

Back to the coin. You flipped 10 times and got 7 heads. If the coin were actually fair, how likely is it to get 7 or more heads just by luck? You can work this out with a basic binomial calculation; the answer is about 17%. That's your p-value: 0.17.

That's not very surprising. Getting 7 heads from a fair coin isn't all that rare. So there's not much reason to doubt the null hypothesis.

Now imagine you flipped the coin 100 times and got 70 heads. Same proportion, but now the p-value would be tiny: about 0.00004, or roughly 1 in 25,000. Getting 70 or more heads from a fair coin almost never happens by chance. That's suspicious. We'd seriously doubt the coin is fair.
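Both tail probabilities can be computed exactly in a few lines of Python. Here's a sketch using only the standard library (the function name `binom_tail` is just for illustration):

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for
    seeing k or more heads in n flips if the coin lands heads with
    probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(binom_tail(7, 10))    # 7+ heads in 10 flips  -> 0.171875 (about 17%)
print(binom_tail(70, 100))  # 70+ heads in 100 flips -> about 0.00004
```

Note that the sum runs over "7 or more," not exactly 7; a p-value is always about data at least as extreme as what you saw.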

The 0.05 threshold

By tradition, researchers use p < 0.05 as a cutoff. If there's less than a 5% chance of seeing data this extreme under the null, they "reject the null hypothesis" and call the result "statistically significant."

But 0.05 is not a law of nature. Ronald Fisher, the statistician who popularized it in the 1920s, basically picked it as a reasonable rule of thumb. A result with p = 0.049 and one with p = 0.051 are essentially the same; treating them differently because one crosses an arbitrary line is a bit silly.

Modern statistics has been moving away from binary significance cutoffs and toward reporting the exact p-value alongside other measures. But you'll still see the 0.05 threshold everywhere, so it's worth understanding.

Three things a p-value is not

A p-value is not the probability that the null hypothesis is true. p = 0.03 does not mean there's a 3% chance the drug doesn't work. The null hypothesis is either true or it isn't; it doesn't have a probability in the classical framework. The p-value is about your data, not about the hypothesis.

A p-value is not the probability that your result is a fluke. It's calculated assuming the null is true. That's a very different question from "given what I observed, how likely is this real?"

A p-value is not a measure of how important or large the effect is. With a big enough sample, you can get a tiny p-value for an effect so small it's practically meaningless. Statistical significance and practical significance are different things.
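To see that last point concretely, here's a hypothetical sketch using a one-sample z-test with a known standard deviation (a simplification of how real studies test means, and the 0.01-standard-deviation effect size is made up for illustration). The same negligible effect goes from "not significant" to "highly significant" purely because the sample grows:

```python
from math import sqrt, erfc

def z_test_pvalue(mean_diff, sd, n):
    """Two-sided p-value for a one-sample z-test with known sd."""
    z = mean_diff / (sd / sqrt(n))
    return erfc(abs(z) / sqrt(2))  # P(|Z| >= z) for a standard normal Z

# An effect of 0.01 standard deviations -- practically meaningless
print(z_test_pvalue(0.01, 1.0, 100))        # p ~ 0.92: not significant
print(z_test_pvalue(0.01, 1.0, 1_000_000))  # p < 1e-20: "highly significant"
```

The effect didn't get any more important; the test just got more sensitive.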

An analogy that might help

Imagine your friend claims she can predict coin flips. You test her: she gets 9 out of 10 right. The p-value for this (assuming she's just guessing randomly) is about 1%. That's surprising. You might start to believe she has some ability.
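That 1% figure is the same kind of binomial tail calculation as the coin example. As a quick check:

```python
from math import comb

# P(9 or more correct out of 10) if she's guessing at random (p = 0.5)
p_value = sum(comb(10, k) for k in range(9, 11)) / 2**10
print(p_value)  # 11/1024, about 0.011
```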

But here's the thing: the p-value doesn't tell you how good her ability is, or whether it's worth anything in practice. It just tells you that random guessing probably isn't the explanation. The rest is up to you to figure out.

The bottom line

A p-value is a measure of surprise. Small p-value: your data would be surprising if nothing were going on. Large p-value: your data is pretty ordinary, nothing to see here. You should pair it with effect sizes, replications, and common sense before drawing any conclusions.

Mark Leschinsky

PRESIDENT & FOUNDER
