Where Will Ethereum’s Historical Data Go After “The Purge” of EIP-4444?
“Ethereum was never designed to be a permanent data storage system.” – EIP contributor Micah Zoltu
On the beacon chain's birthday, Ethereum co-founder Vitalik Buterin tweeted a detailed roadmap for the protocol. After the transition to proof of stake, the implementation of sharding, and a move towards statelessness comes The Purge: eliminating historical data.
According to this roadmap item's proposal document, EIP-4444, Ethereum clients will stop serving historical headers, bodies, and receipts more than one year old over the p2p network, and will be free to prune that data locally. The Purge improves Ethereum in a number of ways:
- Reduces hardware requirements for nodes
- Allows clients to remove code that deals exclusively with legacy transactions
- Reduces bandwidth on the network – clients need to sync less data
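To make the retention window concrete, here is a rough sketch of how a client might decide which blocks fall outside it. This is illustrative, not actual client code – the one-year cutoff comes from EIP-4444, but the block-time arithmetic and function names are my own assumptions.

```python
# Illustrative sketch: estimate which block numbers fall outside
# EIP-4444's one-year retention window, assuming the ~12-second
# post-merge block time. Not actual client code.

SECONDS_PER_BLOCK = 12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def prune_boundary(head_block: int) -> int:
    """Return the oldest block number a client would still retain."""
    blocks_per_year = SECONDS_PER_YEAR // SECONDS_PER_BLOCK  # ~2.63M blocks
    return max(0, head_block - blocks_per_year)

def is_prunable(block_number: int, head_block: int) -> bool:
    """True if the block is older than the one-year window."""
    return block_number < prune_boundary(head_block)
```

At 12-second blocks, roughly 2.6 million blocks accumulate per year, so everything below `head - 2.6M` becomes a candidate for pruning.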
The hardware requirement reduction improves decentralization of the network by making it easier for consumer-grade equipment to run a full node. As we’ve seen with chains like BSC and Solana, high transaction throughput over a long period of time either leads to centralization or necessitates creative solutions.
Similarly, faster clients and a lightweight sync process reduce strain on the network and its nodes, making the protocol better at its core job of processing transactions at the very tip of the chain.
Separating the jobs of writing new data and reading historical records into two systems is a smart choice from the perspective of technical debt and future scalability, but it comes with one big problem to solve:
If the data isn’t on chain, where is it? And how can we trust it?
Whether Ethereum data is one year old or one day old, there’s a huge ecosystem of apps that rely on it. Block explorers, on-chain analysis tools, DAO voting protocols – anything that isn’t only concerned with processing transactions in the here and now. Where will these apps source blockchain history? And if it’s not validated by a chain or network of nodes with something to lose, how can we be sure it hasn’t been tampered with?
In the EIP-4444 document, the authors suggest using either torrent magnet links or IPFS. While both options are decentralized, neither one provides guarantees the data will be around in the long term. Millions of torrents are unseeded and will stay that way, and IPFS merely guarantees that the content may be available at the given hash provided that the unincentivized person responsible for storing it held up their end of the bargain.
Other suggestions include the Portal Network, which appears to leverage Swarm and states that “we don’t provide any guarantee regarding file availability on the network”, and Filecoin, which offers temporary decentralized storage on a subscription model.
With all of these options discarded, the only solution for storing something as important as the full Ethereum blockchain is Arweave. Arweave offers guaranteed permanent storage and currently holds enough $AR in its storage endowment to fund miners for almost 1,000 years – even assuming the cost of hard drive space never goes down, and no $AR is ever paid into the endowment again. This incentive model makes Arweave the closest thing to permanent storage the world has ever seen.
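The endowment claim above is ultimately simple arithmetic under a fixed-cost assumption. The toy model below illustrates the shape of that calculation with placeholder numbers – the real endowment balance and per-GB storage costs are not these values.

```python
# Toy model of the endowment claim, with PLACEHOLDER numbers.
# Under Arweave's conservative assumption, the per-GB cost of
# storage never falls, so the endowment must cover a flat
# annual payout to miners indefinitely.

def years_funded(endowment_ar: float, gb_stored: float,
                 ar_per_gb_year: float) -> float:
    """Years the endowment can pay miners at a constant storage cost."""
    annual_cost = gb_stored * ar_per_gb_year
    return endowment_ar / annual_cost
```

In practice storage costs have historically fallen year over year, so a flat-cost model is the pessimistic bound – any real decline stretches the endowment further.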
In fact, Arweave already stores the full blockchains of Solana, Avalanche, Cosmos, Moonriver, Celo, and NEAR. KYVE is a PoS network built on Arweave that incentivizes nodes to fetch and validate data streams of any kind, including the JSON-RPC format used by Ethereum.
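For a sense of what "the JSON-RPC format used by Ethereum" means here, the sketch below builds the request an archiver might replay to fetch a historical block. The method and parameter encoding follow Ethereum's standard JSON-RPC API; the archiving workflow around it is my assumption, not a description of how KYVE actually works.

```python
import json

# Hypothetical sketch of the JSON-RPC request an archiving node
# might issue to fetch a historical block. The method name and
# parameter shape follow Ethereum's standard JSON-RPC API.

def block_request(block_number: int, request_id: int = 1) -> str:
    """Serialize an eth_getBlockByNumber call (full transactions included)."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBlockByNumber",
        # Ethereum's JSON-RPC encodes quantities as 0x-prefixed hex
        "params": [hex(block_number), True],
    }
    return json.dumps(payload)
```

Each archived block would be one such response; validating a stream of them is exactly the kind of job the network's nodes are incentivized to do.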
Once stored on Arweave, the data can be queried like any other permaweb data – using GraphQL. Alternatively, apps looking to use historical Ethereum data could take advantage of The Graph's powerful API to handle queries.
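As a hedged illustration of the GraphQL side, here is what a query for an archived block might look like against an Arweave gateway (e.g. `arweave.net/graphql`). The `transactions`/`tags`/`edges`/`node` shape matches the gateway's GraphQL schema, but the tag names (`"Chain"`, `"Block"`) are hypothetical – the real tags depend entirely on how the uploader labels its data.

```python
# Hedged sketch of an Arweave gateway GraphQL query for an archived
# block. Tag names "Chain" and "Block" are HYPOTHETICAL placeholders;
# the transactions/tags/edges/node structure matches the gateway schema.

def archived_block_query(chain: str, block_number: int) -> str:
    """Build a GraphQL query string for an Arweave gateway."""
    return f'''
    query {{
      transactions(
        tags: [
          {{ name: "Chain", values: ["{chain}"] }}
          {{ name: "Block", values: ["{block_number}"] }}
        ]
      ) {{
        edges {{ node {{ id }} }}
      }}
    }}'''
```

The query returns transaction IDs; fetching `https://arweave.net/<id>` would then return the archived payload itself.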
When it comes to storing a body of knowledge as vast and culturally significant as the entire history of Ethereum, the foundation should explore options that make that data’s availability just as decentralized as the network itself.