Effect sizes vs. p-values: the more important number that gets less attention

James Okonkwo

Contributing Editor, Tessera. PhD, Behavioral Economics, LSE

February 12, 2026 · 4 min read

A study reports that a new intervention is "statistically significant (p < .05) in reducing anxiety." The headline reads: "New intervention reduces anxiety." A reasonable assumption from the headline: the intervention substantially helps.

The headline can be true while the intervention is essentially useless. The reason is the difference between statistical significance and effect size.

1. What p-values measure

A p-value is the probability of observing data at least as extreme as what was observed, if there were no real effect. A p-value of .05 means: "if the intervention did nothing, there's a 5% chance we'd see data at least this striking by random fluctuation alone." It is not the probability that the intervention does nothing.

Note what this doesn't tell you:

It doesn't tell you how big the effect is
It doesn't tell you how practically meaningful the effect is
It doesn't tell you whether the effect replicates

For any real, nonzero effect, p-values tend to shrink as samples grow. A trivially small effect can produce p < .001 with a sample of 50,000 people. The same effect would typically produce p > .5 with a sample of 50. The p-value depends on both effect size AND sample size, so significance alone tells you neither.
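The sample-size dependence can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes two equal-sized groups, a simple z-test (normal approximation), and that the observed standardized difference comes out at exactly d = 0.05. The function name `two_sided_p` is made up for this example.

```python
import math

def two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a two-group comparison.

    Assumes equal group sizes and a z-test, so the test statistic
    is z = d * sqrt(n_per_group / 2).
    """
    z = abs(d) * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# Same tiny effect (d = 0.05), two very different total sample sizes.
for total_n in (50, 50_000):
    p = two_sided_p(0.05, total_n // 2)
    print(f"total n = {total_n:>6}: p = {p:.3g}")
```

Under these assumptions the small sample gives a p-value well above .5 and the large one a p-value far below .001, even though the effect is identical.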

2. What effect sizes measure

Effect sizes measure how big the effect is, scaled to be comparable across studies. Common metrics:

Cohen's d: standardized mean difference (1.0 = one standard deviation difference between groups)
Pearson's r: strength of a linear correlation (0 = none, +1 or -1 = perfect)
Odds ratio: the odds of an outcome in one group divided by the odds in another (1 = no difference)

Conventional benchmarks:

d = 0.2: small effect
d = 0.5: medium effect
d = 0.8: large effect
r = 0.1: small
r = 0.3: moderate
r = 0.5: strong

These benchmarks are guides, not laws. What counts as "meaningful" depends on context: a small effect on mortality matters more than a large effect on a passing mood.
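To make two of these metrics concrete, here is a minimal sketch of how Cohen's d and an odds ratio are computed from raw numbers. It assumes the pooled-standard-deviation form of d, and the function names are chosen for this example rather than taken from any particular library.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the outcome in group A divided by the odds in group B."""
    return (events_a / non_events_a) / (events_b / non_events_b)
```

For instance, if 20 of 100 people in one group and 10 of 100 in another experience an outcome, `odds_ratio(20, 80, 10, 90)` gives 2.25, while the ratio of the raw risks (20% vs. 10%) is 2.0. The two are easy to confuse, and they drift further apart as the outcome becomes more common.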

3. The clinical example

Imagine two anxiety interventions, both with p < .001 in trials.

Intervention A: lowers anxiety by Cohen's d = 0.05 (a tiny effect that reaches significance only because the sample was huge)
Intervention B: lowers anxiety by Cohen's d = 0.8 (a large, clinically meaningful effect)

Both can honestly be reported as "statistically significant." Only one is worth recommending to patients. The p-value alone can't distinguish them.
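One way to see how differently the two results were earned is to ask how many participants each trial would need before its effect reaches p < .001. The sketch below is a back-of-the-envelope calculation, not a proper power analysis: it assumes equal group sizes, a z-test, and an observed difference exactly equal to d. The helper name `min_n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def min_n_per_group(d: float, alpha: float = 0.001) -> int:
    """Smallest equal group size at which an observed standardized
    difference of d reaches a two-sided p-value of alpha or below."""
    # Critical z for a two-sided test at level alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve d * sqrt(n / 2) >= z_crit for n, then round up.
    return math.ceil(2 * (z_crit / d) ** 2)

for label, d in (("A", 0.05), ("B", 0.8)):
    print(f"Intervention {label}: d = {d}, n per group >= {min_n_per_group(d)}")
```

Under these assumptions, Intervention A needs several thousand people per group to cross the threshold, while Intervention B needs only a few dozen. The identical p-values describe very different findings.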

This is why modern clinical guidelines increasingly emphasize effect size alongside significance. Part of the push behind evidence-based medicine has been the recognition that many past medical recommendations rested on statistically significant but clinically tiny effects.

4. The news-reading skill

A practical heuristic for reading research news in 2026:

Check whether the article reports an effect size. Good science writing now reports d or r or absolute risk reduction. Articles that report only a p-value or the phrase "statistically significant," with no size measure, are often describing small effects.

Check the sample size. Very large samples can produce significance from trivial effects. Very small samples can miss real effects, because they have little statistical power.
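The "small samples miss real effects" half of this can be quantified as statistical power: the chance that a study detects an effect that truly exists. The following is a rough sketch under stated assumptions (two equal groups, a two-sided z-test at alpha = .05), with a function name invented for illustration.

```python
from statistics import NormalDist

def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test
    when the true standardized effect is d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true effect is d.
    shift = d * (n_per_group / 2) ** 0.5
    # Probability the statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (20, 100):
    print(f"d = 0.5, n per group = {n}: power = {power_two_sample(0.5, n):.2f}")
```

With a genuinely medium effect (d = 0.5), 20 people per group would detect it only about a third of the time under these assumptions, while 100 per group would detect it more than nine times in ten.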

Check whether the finding replicates. Single studies, even with large effects, are weaker evidence than meta-analyses with consistent effects across studies.
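Part of the reason a meta-analysis carries more weight is mechanical: combining studies shrinks the uncertainty around the estimate. Here is a minimal sketch of the simplest pooling scheme, a fixed-effect inverse-variance average. It assumes every study estimates the same underlying effect, which real meta-analyses often relax, and the function name is illustrative.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of study-level effect sizes.

    Returns the pooled estimate and its standard error.
    """
    # More precise studies (smaller standard error) get more weight.
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se
```

As a made-up illustration, two studies reporting d = 0.2 and d = 0.4, each with a standard error of 0.1, pool to d = 0.3 with a standard error of about 0.07, tighter than either study alone.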

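Check the confidence interval. A confidence interval of [0.01, 0.50] pins the effect down far less precisely than [0.30, 0.40], even though both exclude zero and so both produce p < .05. The first is compatible with anything from a negligible effect to a medium one.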
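The width of a confidence interval is driven mostly by sample size. The sketch below uses a common large-sample approximation for the standard error of Cohen's d. The function name and the sample sizes in the example are hypothetical, chosen only to land near the two intervals mentioned above.

```python
import math
from statistics import NormalDist

def d_confidence_interval(d, n_a, n_b, level=0.95):
    """Approximate confidence interval for Cohen's d.

    Uses the large-sample standard error
    sqrt((n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b))).
    """
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se

# Hypothetical studies: a modest one and a much larger one.
for d, n in ((0.255, 130), (0.35, 3120)):
    low, high = d_confidence_interval(d, n, n)
    print(f"d = {d}, n per group = {n}: 95% CI = [{low:.2f}, {high:.2f}]")
```

Under this approximation, roughly 130 people per group yields an interval close to [0.01, 0.50], while getting down to something like [0.30, 0.40] takes on the order of 3,000 per group.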

5. The honest summary

The shift from significance-only reporting to effect-size-also reporting has been one of the quieter improvements in scientific communication over the past decade. It hasn't fully reached popular science writing, where "statistically significant" is still treated as the headline.

For an everyday reader: distrust headlines that report findings without effect sizes. The more useful question isn't "is the effect real?" but "is the effect big enough to matter?" A p-value speaks only to the first question, and an answer to the first does not give you an answer to the second.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7-29.
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156-168.