The open data revolution: why your data is now public, mostly

Over the past decade, most major journals and funding bodies have begun requiring that research data be made publicly available. The change has been quietly transformative.

James Okonkwo
Contributing Editor, Tessera. PhD, Behavioral Economics, LSE

In 2015, fewer than 15% of papers in major psychology journals included publicly available data. By 2024, that figure was approximately 60% at the same journals. The change has been quiet, with no single dramatic moment, but cumulatively transformative.

This is the open data revolution. It is one of the more consequential institutional changes in modern science, and most non-specialists have no idea it has happened.

1. The forcing functions

Several pressures combined to produce the shift:

Major journals began requiring it. Nature, Science, PLOS, and many major psychology journals adopted or tightened data-sharing policies, most of them between the mid-2010s and 2020. Authors are typically expected to deposit their data in a public repository or to explain in writing why they cannot.

Funding bodies followed. The NIH, NSF, Wellcome Trust, and ERC have all incorporated data-sharing requirements into grant terms. Grant recipients must produce data management plans before funds are released.

The replication crisis added pressure. When the Open Science Collaboration reported in 2015 that roughly 60% of the 100 psychology studies it tried to replicate did not reproduce the original result, the field's defensive crouch became less tenable. Withholding data started to look like a cover-up.

Tools improved. Repositories like the Open Science Framework, Figshare, and Zenodo made depositing data far easier. The friction that historically discouraged sharing dropped substantially.

2. What changed

Several downstream effects became visible by 2024:

Faster error detection. Public data has allowed third parties to identify errors, fabrications, and methodological issues in papers that would previously have gone unexamined. Multiple high-profile retractions have come from outside researchers spotting problems in newly shared data.

Reanalysis literature. A small industry of researchers reanalyzes published data, often reaching conclusions that differ from those of the original authors. This is healthy but also uncomfortable: the same data can produce different stories depending on analytical choices.
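To make the point about analytical choices concrete, here is a minimal Python sketch. The numbers are invented for illustration, and the outlier cutoff of 8.0 is a hypothetical analytic decision, not something drawn from any real study. It shows how one defensible-sounding choice, dropping extreme values, can change the headline result.

```python
from statistics import mean

# Invented scores for two groups; one treated participant is an extreme value.
treated = [4.1, 4.3, 4.0, 4.4, 9.5]
control = [4.0, 4.2, 4.1, 4.3, 4.2]


def mean_difference(treated, control, cutoff=None):
    """Return mean(treated) - mean(control).

    If `cutoff` is given, scores above it are dropped from both groups first.
    The cutoff stands in for any discretionary analytic choice.
    """
    if cutoff is not None:
        treated = [x for x in treated if x <= cutoff]
        control = [x for x in control if x <= cutoff]
    return mean(treated) - mean(control)


# Analysis A: keep every observation.
all_data = mean_difference(treated, control)

# Analysis B: exclude scores above a (hypothetical) cutoff of 8.0.
trimmed = mean_difference(treated, control, cutoff=8.0)

print(f"all data: {all_data:.2f}, outlier removed: {trimmed:.2f}")
```

With every observation included, the treated group scores about 1.1 points higher. With the single extreme value removed, the gap shrinks to about 0.04. Neither analysis is obviously wrong, which is why access to the raw data matters.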

More cumulative meta-analysis. Researchers can now pool data from multiple studies in ways that were impractical when each study's data was siloed. The quality of meta-analytic evidence has improved.
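As a rough illustration of what pooling looks like in practice, the sketch below computes an effect from each study's raw data and then combines the studies with inverse-variance weights (a standard fixed-effect approach). The data, function names, and two-study setup are all assumptions made for the example. Real meta-analyses usually need random-effects models and checks for differences between studies.

```python
import math
from statistics import mean, variance


def study_effect(treated, control):
    """Mean difference and its standard error, computed from raw scores."""
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return diff, se


def pool_fixed_effect(estimates):
    """Inverse-variance weighted average of (effect, standard error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Invented raw data from two hypothetical studies of the same question.
study_a = study_effect(treated=[5, 6, 7, 8], control=[4, 5, 6, 7])
study_b = study_effect(treated=[6, 7, 8, 9], control=[6, 7, 8, 9])

pooled_effect, pooled_se = pool_fixed_effect([study_a, study_b])
print(f"pooled effect: {pooled_effect:.2f} (SE {pooled_se:.2f})")
```

The pooled standard error comes out smaller than either study's own, which is the practical payoff of combining data: more precision than any single study can offer.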

Detection of fraud. Outright fabrication is rarer than methodological flaws, but closer outside scrutiny has contributed to several high-profile misconduct cases, notably those of Brian Wansink (Cornell food behavior research) and Francesca Gino (Harvard Business School honesty research). In the Gino case, outside analysts examined posted data files. In the Wansink case, critics began with statistical inconsistencies in the published papers and then pressed for the underlying data.

3. What hasn't changed

Open data hasn't eliminated bad science. It has reduced one specific kind: opaque single-team analysis that can't be checked.

Several persistent problems remain:

  • Privacy-protected data (medical records, identifiable survey data) typically can't be fully shared
  • Industry-funded research often retains proprietary clauses
  • "Available on request" remains a fig leaf for non-sharing in some fields
  • Researchers can selectively share data that supports their interpretation

The reforms have been substantial but partial.

4. The implication for research consumers

For an everyday reader of science news, the open data revolution provides a useful new heuristic: check whether a paper's data is public and whether anyone outside the original team has reanalyzed it. Papers from major journals published after 2020 are increasingly accompanied by publicly available data. Findings that have survived independent reanalysis by outside researchers deserve more trust than findings no one has checked.

The reanalysis literature is rarely covered in popular science writing, yet it is where the most rigorous critiques of headline findings often live. Following critics on platforms like X (formerly Twitter), or reading the Retraction Watch blog, provides a window into which findings are being tested and which are falling.

5. The honest summary

The open data revolution is real, substantial, and quieter than its impact deserves. It has not solved the problem of scientific reliability, but it has shifted the cost-benefit calculation for researchers tempted to cut corners.

For research consumers: trust modern open-data papers more than older closed-data papers, controlling for journal quality. The norm is recent enough that the pre-2015 literature still contains substantial unchecked work, much of it cited as authoritative.

This is one of the more positive recent developments in academic science. It hasn't fixed everything. It has fixed something.
