How The Permaweb Can End the Problem of Vanishing Research Data
In 2021, Kamshad Mohsin, an assistant professor at the Maharishi University of Information and Technology, found that one of his research works, which had been cited by an international journal, had disappeared from the internet. The time and other resources he had spent on the research were not the only losses that troubled him. He also lost citations, which in the world of academic research affect career progress and signal the importance of a work.
“I found out that the journal had disappeared because I have a habit of checking the citations of my research works,” Mohsin said. “Finding out that you have lost your work unexpectedly could make you feel upset, angry or melancholy. You may be doubting your identity, grieving for everything that you have lost or feeling apprehensive.”
Beyond the devastation researchers feel when the online journals that published their works go dark, or when offline journals become permanently unavailable without a backup, there is the retrogressive impact on academia and humanity.
The development of human society and our understanding of how the world works rest on knowledge gathered through years of meticulous research by scholars across different fields.
Scholarly articles may vanish because a journal's publisher pulls the plug for reasons such as financial constraints, or because its editorial board members agree to end the publication. Links to published works may also break over time. When journals die without being digitally preserved, they leave a void in knowledge that could take years to fill.
Luiz Brandão, associate professor at PUC-Rio Business School, has never experienced this challenge, but he agrees that it could become a major problem in the future. Studies published since 2014 show that scholarly works are already affected, and that many unpreserved research works have become inaccessible because their links broke as they aged on the internet.
In August 2020, a first-of-its-kind study revealed that over a 20-year period, 176 open-access journals, along with the papers published in them, had disappeared from the internet. It also found that 900 journals that are still online but have stopped publishing papers might be vulnerable to going dark in the near future.
Digital Archives Struggle to Solve the Problem
Academia worldwide is aware of the challenge, and several kinds of digital preservation solutions exist for scholarly works.
Lots of Copies Keep Stuff Safe (LOCKSS) was one of the first efforts to preserve publications and keep them available even when their publishers no longer exist. The service, launched by Stanford Libraries in 1999, replicates the content held by participating institutions across different nodes in the network and validates the integrity of the copies by comparing them against each other. In addition to meeting other conditions, institutions must pay fees to join the LOCKSS alliance. The preserved content is available only to members of the LOCKSS network.
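The idea of validating copies against each other can be illustrated with a short sketch. It is not the actual LOCKSS polling protocol, only a simplified majority vote over content hashes; the node names and the `audit_copies` helper are hypothetical and chosen for illustration.

```python
import hashlib
from collections import Counter


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority SHA-256 digest and the nodes whose copy disagrees with it."""
    # Hash every node's copy so copies can be compared without exchanging full content.
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in copies.items()}
    # The digest held by the largest number of nodes is treated as the good copy.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(node for node, digest in digests.items() if digest != majority_digest)
    return majority_digest, damaged


# Hypothetical nodes holding the same article; one copy has been corrupted.
copies = {
    "node-a": b"Volume 3, Issue 2: full text",
    "node-b": b"Volume 3, Issue 2: full text",
    "node-c": b"Volume 3, Issue 2: full t3xt",  # simulated corruption
}
majority, damaged = audit_copies(copies)
print(damaged)  # ['node-c']
```

In a scheme like this, a node flagged as damaged would repair its copy from a node that holds the majority version, which is why keeping many independent copies matters.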
Although LOCKSS presents itself as a decentralised digital archive, a large part of its operation is centralised, and its closed system shuts out institutions with small budgets.
Also operated by Stanford University is Controlled Lots of Copies Keep Stuff Safe (CLOCKSS). It is accessible to libraries and publications all over the world and operates 12 nodes at academic institutions to ensure the long-term survival of web-based scholarly content. The nodes serve as backup archives, and content is released to the reading public only when a trigger event occurs.
One of the major downsides of CLOCKSS is its centralised system of operation. A board made up of twelve leading academic libraries and twelve leading academic publishers is in charge of decisions such as whether to release content when a trigger event occurs.
Cost could also limit the use of CLOCKSS. Although its coverage is broader than that of LOCKSS, CLOCKSS requires publishers and libraries to make annual payments of at least $244 and $490 respectively, depending on their revenue and budgets. For the many journals and libraries that operate on low budgets, especially in poor countries, the yearly payment may be unsustainable.
Portico, a digital preservation project, is a centralised hosting archive administered by an advisory committee of librarians and publishers and supported by publishers’ contributions and annual library payments. Like CLOCKSS, Portico is a dark archive, which means that content becomes accessible only when specific trigger events occur. Unlike CLOCKSS, however, the content is accessible only to Portico’s subscribers, and in a standard format rather than in the publishers’ original format. Because Portico hosts data on its own servers, it could be susceptible to technical failures that result in permanent data loss or alteration. The cost of subscribing to the service could also deter small-scale publishers from archiving their content.
The Public Knowledge Project Preservation Network (PKP PN), often talked up as the archive for journals that cannot afford to subscribe to other digital preservation services, has downsides of its own, including being restricted to journals that run on Open Journal Systems (OJS).
The fact that scholarly works keep vanishing from the internet when journals go offline calls into question the effectiveness of these efforts.
Enter The Permaweb
Hosting research journals or building scholarly publication archives on Arweave’s permanent web, which is built on the blockweave, could solve global academia’s persistent problem. The permaweb is already solving related problems, including link rot and disappearing online content.
Unlike the traditional internet, where articles tend to become unavailable after some years, a permaweb link gives each piece of data a unique identifier that is independent of the Arweave domain name. That identifier could still be used to access the content if the Arweave domain goes offline.
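A small sketch may help show why an identifier that does not depend on a domain name survives the loss of any single host. It is a simplified illustration, not Arweave's actual scheme: the identifier here is just a hash of the content, Arweave's real identifiers are computed differently, and the gateway hostnames and helper names are hypothetical.

```python
import base64
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (illustrative, not Arweave's real method)."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 without padding gives a compact string that can sit in a link.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def candidate_urls(identifier: str, gateways: list[str]) -> list[str]:
    """Build one link per gateway; the identifier part is the same in all of them."""
    return [f"https://{gateway}/{identifier}" for gateway in gateways]


# Hypothetical gateways: if one goes offline, the same identifier works on another.
GATEWAYS = ["gateway-one.example", "gateway-two.example"]

article = b"Full text of an archived research paper"
cid = content_id(article)
urls = candidate_urls(cid, GATEWAYS)
for url in urls:
    print(url)
```

Because the identifier is tied to the data rather than to a host, a citation that records it remains resolvable for as long as any node or gateway in the network still serves the data.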
Every piece of content stored on the permaweb is backed up by over a thousand nodes spread across the world, which makes loss of data virtually impossible. The Arweave system also makes stored content immutable, because a snapshot of the original is taken at the time of archiving and stored.
Content owners are spared monthly and annual storage payments through a pay-once arrangement that guarantees storage for two centuries.
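The difference between recurring and one-off payments is easy to work out. The sketch below uses the $244 minimum annual publisher payment cited above for CLOCKSS and assumes it stays flat. The one-time fee of $1,000 is purely hypothetical and is not Arweave pricing, which depends on how much data is stored.

```python
import math


def cumulative_annual_cost(annual_fee: float, years: int) -> float:
    """Total paid after a number of years at a flat annual fee (no inflation assumed)."""
    return annual_fee * years


def break_even_years(one_time_fee: float, annual_fee: float) -> int:
    """First year in which total annual payments reach or exceed the one-time fee."""
    return math.ceil(one_time_fee / annual_fee)


ANNUAL_FEE = 244              # minimum annual publisher payment cited in the article
HYPOTHETICAL_ONE_TIME = 1000  # assumption for illustration only, not a real price

print(cumulative_annual_cost(ANNUAL_FEE, 10))               # 2440
print(cumulative_annual_cost(ANNUAL_FEE, 200))              # 48800
print(break_even_years(HYPOTHETICAL_ONE_TIME, ANNUAL_FEE))  # 5
```

Under these assumptions, a small journal paying the minimum fee would spend $48,800 over two centuries, so even a fairly large one-time payment would be recovered within a few years.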
Research Data Vanishing, Major Setback
As scientists and researchers produce data to understand global problems and ultimately solve them, academia needs a digital preservation solution that ends the existing risk of scholarly journals vanishing. The struggle of existing archive services to solve the problem presents an opportunity to explore the permaweb.
“Effectively managing research data is a persistent issue for many researchers at various stages of their careers,” Mohsin said.
“As one might expect, the consequences of data loss to academia is huge. Many research are almost hard to replicate due to a lack of appropriate data.”