To Address the Irreproducibility Crisis, Invest in Digital Archiving

An obscure scholarly practice may decide how free (and accurate) information remains in the future.

[Editor’s note: This article belongs to a series on the intersection of science, technology, and higher-ed reform. See also J. Scott Turner on federal research oversight and Nathan Schachtman on the need for STEM education for lawyers.]

“Millions of Scholarly Articles Are Not Being Archived,” says the article, and the initial impulse is to roll one’s eyes and think cynical thoughts. Given the quality of scholarship nowadays, isn’t that all for the best? And even if we are being generous about the quality, with such a deluge of scholarship, isn’t salutary pruning a necessity? If all 92 of Euripides’ plays had survived, wouldn’t we think the worse of him? Archival failure surely will make posterity think better of 21st-century scholarship.

But archival issues do matter, not least because they concern the irreproducibility crisis of modern science. No, let that wait a little. First let’s talk about exactly what’s at issue.

Scholarship (publication in general) is most of the way through a vast switch-over from paper format to digital. Most everything is published digitally nowadays; paper copies are ancillary. Millions and millions of works are now being published online. In the study that started this discussion, the authors state casually that, “of the 7,438,037 works examined, there were 5.9 million copies spread over the [digital] archives used in this work.” A staggering amount of intellectual content is now not only available digitally but produced digitally and intended to be read digitally.

But of course that cannot be done unless scholarship is made easily accessible to potential readers. Librarians and archivists for generations have been elaborating best practices to make sure that the reader can find what they want in a flood of information. What they do includes creating and standardizing metadata (Title, Author, Date of Publication, Subject Keywords) and then making sure that search engines and algorithms can search efficiently among what also has become a metadata deluge. Preserving, cataloging, and providing access to digital items represents a further level of headache-inducing complexity. How do you describe an electronic file? How do you refer to different editions? How do you guarantee that the file’s contents haven’t been changed?

Digital archiving is an essential part of this larger problem of preserving access to digital records. What happens when hardware or software becomes obsolete, so that you can no longer read a digital file? What happens when a file is preserved on just one server, and a squirrel chews a crucial wire or a lightning bolt strikes an essential circuit? How do you make sure there are up-to-date copies of millions of digital files? How do you ensure that there are multiple archives with responsibility for everything that has been published? How do you ensure that every single archive is able to use a standardized format, in common with every other archive? How do you pay for the necessary computer equipment? How do you get archival staff trained to do the necessary work?

All this must be mind-numbingly boring to the layman. It is a bleak irony that the amount of scholarship devoted to digital archiving is sufficient to augment appreciably the amount of scholarship that needs digital archiving. Yet it is an issue that matters, and we should be grateful that there is a large body of professionals devoted to this issue.

It matters because of the irreproducibility crisis of modern science, which I mentioned before. That’s the combination of groupthink, publication bias, discarding of negative results, and culpably negligent use of statistical analysis that has led to modern science research comprising what [...]. There’s a great deal that needs to be done to fix the irreproducibility crisis, but you can’t even begin if you haven’t properly archived every bit of research in a field, or, indeed, if you lack the capacity to provide proper archiving for research.

Digital archiving particularly matters for meta-analyses, which analyze entire bodies of individual research findings and are an essential tool for gauging the evidentiary value of an individual piece of research. Meta-analyses work only if you can locate and include the entire body of relevant research. Without proper digital archiving, meta-analyses become Garbage In, Garbage Out (GIGO) analyses.

And indeed, one of the solutions to the irreproducibility crisis is to require [...], as well as generally to require that all research data be publicly accessible prior to and after publication. Not only academic articles but also all scientific research data should be stored in multiple repositories, in a standardized format accessible to every interested reader. The proper archiving of research data is not a fundamentally different challenge, but it does add significantly to the work digital archivists must undertake.

Then there are the national-security implications. The Department of Defense has a formal ontology to organize and integrate its data descriptions. Effective digital archiving must be an essential component of that work. If the basic digital objects lack fixity and distributed preservation, how can the ontology function effectively? In this way, proper digital archiving contributes to national security.

Digital archiving also must face the challenge posed by high-tech censorship. Amazon can now delete a “problematic” word or argument from your Kindle book without alerting you. A book company can alter the latest edition of a book; a scientific journal can change or withdraw an article; any digital object can be manipulated or removed by high-tech gatekeepers without alerting the public. Perhaps the greatest challenge in the field of digital archiving is to distribute fixed digital objects so broadly that they cannot be censored by private or governmental actors and to provide every individual a quick and secure means of assessing whether a given digital object remains unaltered, along with a means of locating the unaltered original. This is a task necessary for the preservation of liberty, and it is, fundamentally, a task of digital archiving.

It may not be a task for the institutional profession of digital archivists. They are institutional, necessarily servants of Belial and of Mammon, and are not professionally oriented toward individual liberty and individual archiving. Too many, indeed, are [...]. This is a task for a Peter Thiel or an Elon Musk: to establish a means of digital archiving that will serve the individual and his liberty and that will provide the means to keep scientists honest.

Digital archiving matters too much to be left to the digital archivists. But having said that, let us also praise them for the good, hard, and (oh, poor archivists!) terribly dull work they are doing.

David Randall is the research director of the National Association of Scholars.