Understanding Arweave's Consensus Mechanism Iteration Journey in One Article
Abstract: Everything for permanent storage!
Author: Arweave Oasis @ Contributor for PermaDAO
Translator: Tina XU @ Contributor for PermaDAO
Reviewer: Marshal Orange @ Contributor for PermaDAO
Since its launch in 2018, Arweave has consistently been regarded as one of the most valuable networks in the decentralized storage space. Yet after five years of rapid technical iteration, many people find themselves at once familiar and unfamiliar with it. This article provides an overview of Arweave's technological development since its inception, to deepen readers' understanding of the network.
Over the past 5 years, Arweave has undergone over a dozen major technical upgrades, with its core objective being the transition from a compute-driven mining mechanism to a storage-driven one.
Figure 1: Growth in the size of the Arweave network, and its version iterations, over the past six years.
Most of the content in this article is sourced from the Arweave whitepaper and technical analysis videos from the Arweave ecosystem by @DMacOnArweave.
Arweave 1.5: The Launch of Mainnet
The Arweave mainnet was launched on November 18, 2018. At that time, the size of the weave network was only 177 MiB. Early Arweave shared some similarities with the present, such as a block time of 2 minutes and a maximum of 1000 transactions per block. However, there were notable differences, such as the transaction size limit being only 5.8 MiB. Additionally, it utilized a mining mechanism called Proof of Access.
So, what exactly is Proof of Access (PoA)?
In simple terms, PoA requires miners to demonstrate their ability to access historical blocks within the blockchain to generate new ones. It functions by randomly selecting a historical block from the chain and mandating miners to include this selected block as a recall block within the current block they are attempting to generate. This recall block serves as a complete backup of the chosen historical block.
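The recall-block selection can be sketched in Python. This is an illustrative assumption, not Arweave's exact formula (the function name and the hash-modulo derivation are invented here); the point is only that the choice is deterministic for everyone yet unpredictable in advance, so storing more history helps.

```python
import hashlib

def recall_block_index(prev_block_hash: bytes, chain_height: int) -> int:
    """Pick a pseudo-random historical block as the recall block.
    Every node computes the same index, but no one can predict it
    before the previous block exists."""
    digest = hashlib.sha256(prev_block_hash).digest()
    return int.from_bytes(digest, "big") % chain_height

# A miner can only build the next block if it can produce this
# historical block's full contents (stored locally or fetched from peers).
idx = recall_block_index(b"hash of the latest block", chain_height=1042)
```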
The concept behind PoA was to eliminate the need for miners to store all blocks, instead requiring them only to prove access to participate in the mining competition. (Dmac uses the analogy of a car race in his video to make it easier to understand, so I'll quote it here.)
In this racing analogy, there's a finish line that adjusts based on the number of participants or mining speed, ensuring that the race always concludes in approximately two minutes. This is the reason behind the two-minute block time.
Next, this competition consists of two parts:
The first part, often referred to as the qualification round, requires miners to demonstrate their ability to access historical blocks. Once a miner has obtained the designated block, they can proceed to the finals. If miners haven't stored the blocks themselves, they can still access them from their peers and participate in the competition.
The second part, akin to the finals following the qualification round, involves pure proof-of-work mining, where miners utilize hashing computational power to compete. Essentially, this entails expending energy to compute hashes and ultimately winning the competition.
Once a miner crosses the finish line, the competition concludes and the next one begins. With mining rewards going to a single winner, the competition becomes exceptionally fierce. Consequently, Arweave experienced rapid growth during this period.
Figure 2: Mining Process under the Proof of Access (PoA) Mechanism. Car 3 falls behind Car 1 because it needs to retrieve a recall block from other nodes to complete the "qualification round," resulting in a slight delay compared to Car 1, which has already stored the recall block.
Arweave 1.7: RandomX
Early Arweave operated on a simple, straightforward mechanism. However, it didn't take long for researchers to recognize a potentially undesirable outcome, which they termed a "degenerate strategy."
The issue arose because miners who had not stored the designated blocks for fast access were forced to fetch them from others, putting them at a disadvantage, like starting the race behind the line, compared to miners who had stored the blocks. The workaround, however, was relatively straightforward: by stacking large numbers of GPUs and harnessing significant computational power, miners could compensate for this drawback, and could even surpass miners who stored the blocks and maintained fast access. If this strategy became predominant, miners would stop storing and sharing blocks altogether, instead pouring resources into optimizing computational equipment and consuming substantial energy to gain an edge. The ultimate consequence would be a significant decline in the network's utility and the gradual centralization of data, a clear departure from the intended purpose of a storage network.
To address this issue, Arweave released version 1.7.
The most significant feature of this version is the introduction of a mechanism called RandomX. It is a hash formula that is very difficult to run on GPU or ASIC devices, prompting miners to abandon stacking GPU computational power and instead rely solely on general-purpose CPUs to participate in the hash power competition.
Arweave 1.8/1.9: 10 MiB Transaction Size and SQLite
For miners, besides proving their ability to access historical blocks, there is another important issue to address: processing transactions submitted by users to Arweave.
All new user transaction data must be packaged into new blocks, which is the minimum requirement for any public chain. In the Arweave network, when a user submits transaction data to a miner, the miner not only includes the data in the block they are about to submit but also shares it with other miners so that all miners can include this transaction data in their respective upcoming blocks. Why do they do this? There are at least two reasons:
They are economically incentivized to do so because each transaction data included in a block increases the reward for that block. Miners sharing transaction data ensures that no matter who wins the block creation rights, they all receive the maximum reward.
To prevent the network from entering a death spiral of development. If user transaction data is frequently not included in blocks, there will be fewer users, and the network will lose its value, resulting in reduced earnings for miners, which is undesirable for everyone.
So miners choose to maximize their own interests in this mutually beneficial way. However, this poses a challenge for data transmission and became a bottleneck for network scalability: the more transactions, the larger the blocks, and the 5.8 MiB transaction size limit soon proved too restrictive. Arweave gained some relief through a hard fork that raised the transaction size limit to 10 MiB.
Figure 3: Transaction Data Synchronization Mechanism Among Miner Nodes
However, despite this, the transmission bottleneck remained unresolved. Arweave operates as a globally distributed network of miner nodes, all of which need to synchronize their state, and each node has a different connection speed, which yields some average connection speed across the network. To ensure the network generates a new block every two minutes, connections must be fast enough to propagate all the data intended for storage within that window. If users upload data faster than the network's average connection speed can carry, congestion occurs, reducing the network's utility; this could have become a stumbling block for Arweave's development. Subsequent updates, such as version 1.9, therefore adopted foundational infrastructure like SQLite to enhance network performance.
Arweave 2.0: SPoA
In March 2020, the update to Arweave 2.0 introduced two significant enhancements to the network, thereby unlocking scalability constraints and pushing the boundaries of data storage capabilities on Arweave.
The first enhancement is the Succinct Proof, built on the cryptographic structure of Merkle trees. It allows miners to prove that they store all the bytes of a block by providing a compact Merkle branch path. With this change, miners only need to include a succinct proof of under 1 KiB in a block, rather than a recall block that could be up to 10 GiB in size.
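A minimal sketch of the idea behind such succinct proofs, using a toy Merkle tree (SHA-256, with helper names invented here; Arweave's actual tree format differs): the proof is just the sibling hashes along one root-to-leaf path, so its size grows logarithmically with the data rather than linearly.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Build a Merkle root over the leaf hashes (toy version)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node if the level is odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def prove(leaves, index):
    """Collect the sibling hashes along the path from one leaf to the root."""
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        # Record the sibling and whether our node is the right child.
        path.append((level[index ^ 1], index % 2))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify(root, leaf, path):
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling, is_right in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

chunks = [b"chunk-%d" % i for i in range(8)]
root = merkle_root(chunks)
proof = prove(chunks, 5)  # proof for the 6th chunk: just 3 hashes
```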
The second update is called "Format 2 Transactions." This version optimizes the format of transactions to slim down the blocks transmitted between nodes. In contrast to "Format 1 Transactions," where transaction headers and data are added to blocks simultaneously, "Format 2 Transactions" allows the separation of transaction headers and data. In the data-sharing transmission between miner nodes, all transactions, except for the concise proof of recall blocks, only require the transaction headers to be added to blocks, with transaction data added to blocks after the competition ends. This significantly reduces the transmission requirements when synchronizing transactions within blocks among miner nodes.
The outcome of these updates is the creation of lighter and more easily transferable blocks than in the past, freeing up excess bandwidth within the network. Miners now utilize this surplus bandwidth to transmit data for "Format 2 Transactions," as this data will become recall blocks in the future. As a result, the scalability issue has been addressed.
Arweave 2.4: SPoRA
Have all the issues in the Arweave network been resolved so far? The answer is clearly no. Another problem has arisen due to the new SPoA mechanism.
Similar to the earlier GPU-stacking strategy, another strategy emerged. This time the problem is not the centralization of stacked GPU power, but a potentially even more computation-centric mainstream strategy: fast-access storage pools. All historical blocks are stored in these pools, so that when a recall block is selected by the proof of access, the pool can produce the proof quickly and synchronize it among miners at extremely high speed.
At first glance this may not seem like a significant issue, since data still receives an adequate number of backups under such a strategy. The problem is that it subtly shifts miners' focus. Miners no longer have an incentive to obtain high-speed access to data themselves, because transferring proofs has become easy and fast. As a result, they invest most of their effort in proof-of-work hash calculations rather than in data storage. Isn't this just another degenerate strategy?
Figure 4: Emergence of Storage Pools
After undergoing several feature upgrades, such as data indexing iteration, wallet list compression, and V1 transaction data migration, Arweave finally ushered in another major version iteration — SPoRA, Succinct Proofs of Random Access.
SPoRA truly ushered Arweave into a new era, shifting miners' focus from hash calculations to data storage through mechanism iteration.
So, what makes succinct proofs of random access different?
It has two prerequisites:
Indexed Dataset: Thanks to the indexing feature introduced in version 2.1, each data chunk in the weave is marked with a global offset, allowing quick access to any chunk. This underpins the core mechanism of SPoRA: continuous retrieval of data chunks. Note that "data chunk" here refers to the smallest unit of data after large files are split, 256 KiB in size, not to a block.
Slow Hash: This hash is used to randomly select candidate chunks. With the introduction of the RandomX algorithm in version 1.7, miners cannot use hash power stacking to gain an advantage and can only use CPUs for computation.
Based on these two prerequisites, the SPoRA mechanism consists of five steps:
Step 1: Generate a random number and use it with previous block information to generate a slow hash via RandomX.
Step 2: Use this slow hash to calculate a unique recall byte (the global offset of the data chunk).
Step 3: Miners use this recall byte to search for the corresponding data chunk in their storage space. If the miner does not have the data chunk stored, they return to Step 1 and restart the process.
Step 4: Use the slow hash generated in Step 1 to perform a fast hash with the newly found data chunk.
Step 5: If the calculated hash result is greater than the current mining difficulty value, complete the block mining and distribution. Otherwise, return to Step 1 and restart the process.
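The five steps above can be sketched as a single mining round. This is a simplified illustration only: SHA-256 stands in for both the RandomX slow hash and the fast hash, and `spora_round`, `local_chunks`, and the offset arithmetic are invented for the sketch.

```python
import hashlib
import os

CHUNK_SIZE = 256 * 1024  # data chunks are 256 KiB, as noted above

def spora_round(prev_block_info: bytes, local_chunks: dict,
                weave_size: int, difficulty: int):
    """One SPoRA mining attempt. local_chunks maps a chunk's global
    offset to its bytes; in a real node this lookup hits the disk."""
    # Step 1: random nonce + previous block info -> slow hash
    # (SHA-256 stands in for RandomX here).
    nonce = os.urandom(32)
    slow_hash = hashlib.sha256(prev_block_info + nonce).digest()

    # Step 2: slow hash -> recall byte, a global offset into the weave.
    recall_byte = int.from_bytes(slow_hash, "big") % weave_size
    chunk_offset = recall_byte - recall_byte % CHUNK_SIZE

    # Step 3: no locally stored chunk at that offset -> attempt fails.
    chunk = local_chunks.get(chunk_offset)
    if chunk is None:
        return None

    # Step 4: fast hash over the slow hash plus the chunk's data.
    fast_hash = hashlib.sha256(slow_hash + chunk).digest()

    # Step 5: compare against the current difficulty.
    if int.from_bytes(fast_hash, "big") > difficulty:
        return fast_hash  # block solved
    return None  # go back to Step 1

# Toy run: a miner storing the whole (tiny) weave, trivial difficulty.
weave_size = CHUNK_SIZE * 4
stored = {i * CHUNK_SIZE: b"chunk-%d" % i for i in range(4)}
solution = spora_round(b"previous block", stored, weave_size, difficulty=0)
```

A miner missing the selected chunk wastes the whole round, which is exactly the storage incentive the mechanism aims for.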
So, from here, it can be seen that this greatly incentivizes miners to store data as much as possible on hard drives that are connected to their CPUs via very fast buses, rather than in distant storage pools. This shift transforms the mining strategy from computation-oriented to storage-oriented.
For further details, you can refer to "ANS-103: Succinct Proofs of Random Access": https://github.com/ArweaveTeam/arweave-standards/blob/master/ans/ANS-103.md
Arweave 2.5: Packing and Data Explosion
SPoRA has led miners to start storing data frantically because it's the lowest-hanging fruit to improve mining efficiency. So what happens next?
Some savvy miners realized that the bottleneck under this mechanism is actually how quickly they can retrieve data from hard disk drives. The more data blocks they can retrieve from the hard drive, the more succinct proofs they can compute, the more hash operations they can perform, and the higher their chances of mining a block.
So if a miner spends ten times the cost on hard disk drives, such as using faster SSDs for storing data, their hashing power increases tenfold. Of course, this can lead to a kind of arms race similar to GPU power. Storage forms that are faster than SSDs, such as RAM drives with even faster transmission speeds, may also emerge. However, this entirely depends on the cost-benefit ratio.
Now, the fastest rate at which miners can generate hashes is capped by the read speed of their SSDs. That cap keeps energy consumption far below what pure PoW mining would demand, making the network more environmentally friendly.
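Back-of-the-envelope arithmetic shows why read speed becomes the cap: each SPoRA attempt must pull one 256 KiB chunk off the drive. The bandwidth figures below are illustrative assumptions, not measurements.

```python
CHUNK = 256 * 1024  # bytes per data chunk

def chunk_reads_per_second(read_bandwidth_mib_s: float) -> float:
    """Upper bound on SPoRA attempts per second when disk reads are
    the bottleneck: one 256 KiB chunk read per hashing attempt."""
    return read_bandwidth_mib_s * 1024 * 1024 / CHUNK

hdd_rate = chunk_reads_per_second(200)    # assumed ~200 MiB/s for an HDD
ssd_rate = chunk_reads_per_second(3000)   # assumed ~3000 MiB/s for an NVMe SSD
# Under these assumptions: ~800 attempts/s on HDD vs ~12,000 on SSD.
```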
Is this perfect then? Certainly not yet. Technicians believe there's still room for improvement.
To enable the upload of larger volumes of data, Arweave 2.5 introduced the Data Bundle mechanism. While not a true protocol upgrade, it has always been a crucial part of the scalability plan and led to explosive growth in network size. It effectively bypasses the limit of 1,000 transactions per block discussed at the beginning, since a bundle containing many data items occupies only one of those 1,000 transaction slots. This laid the foundation for Arweave 2.6.
Figure 5: Packing Mechanism Emergence Leads to Significant Growth in Weave Network Data Scale
Arweave 2.6
Arweave 2.6 marks a significant version upgrade following SPoRA. It takes a further step towards its vision by making Arweave mining more cost-effective, thereby promoting a more decentralized distribution of miners.
So, what sets it apart? Due to space constraints, only a brief overview is provided here. Detailed insights into the mechanism design of Arweave 2.6 will be presented in the future.
In essence, Arweave 2.6 is a throttled version of SPoRA. It introduces a verifiable cryptographic clock that ticks once per second, known as the Hash Chain, to SPoRA.
Step 1: With each tick, the hash chain produces a Mining Hash.
Step 2: Miners select the index of a data partition they store to participate in mining.
Step 3: Combining the Mining Hash with the partition index generates a recall range within the miner's selected data partition, containing 400 recall chunks available for mining. In addition, a second recall range is generated at a random position in the weave. Miners who store enough data partitions can take advantage of this second range as well, gaining another 400 recall chunks and thereby a greater chance of winning. This effectively incentivizes miners to store copies of enough data partitions.
Step 4: Miners test each data chunk within the recall range in turn. If a result exceeds the current network difficulty, the miner wins the right to mine the block; if not, they move on to the next chunk.
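The tick-and-test loop described above can be sketched in Python. This is a simplified illustration, not the real protocol: SHA-256 stands in for the hash chain and the range derivation, and `hash_chain_tick`, `mine_one_tick`, and the seed construction are invented for the sketch.

```python
import hashlib

RECALL_CHUNKS = 400  # chunks per recall range, as described above

def hash_chain_tick(prev: bytes) -> bytes:
    """One tick of the verifiable clock: hash the previous output.
    (The real hash chain is calibrated to roughly one tick per second;
    a single SHA-256 stands in for that here.)"""
    return hashlib.sha256(prev).digest()

def mine_one_tick(mining_hash: bytes, partition_index: int,
                  partition_chunks: list, difficulty: int):
    """Derive a recall range from this tick and test each chunk in it."""
    # Mining Hash + partition index -> start of the recall range.
    seed = hashlib.sha256(
        mining_hash + partition_index.to_bytes(8, "big")).digest()
    start = int.from_bytes(seed, "big") % max(
        1, len(partition_chunks) - RECALL_CHUNKS)

    for i in range(start, min(start + RECALL_CHUNKS, len(partition_chunks))):
        attempt = hashlib.sha256(seed + partition_chunks[i]).digest()
        if int.from_bytes(attempt, "big") > difficulty:
            return i  # this chunk solves the block
    return None  # no luck; wait for the next tick

# Toy run: a partition of 2000 stored chunks, trivial difficulty.
chunks = [b"chunk-%d" % i for i in range(2000)]
mining_hash = hash_chain_tick(b"genesis seed")
winner = mine_one_tick(mining_hash, 0, chunks, difficulty=0)
```

Because only one Mining Hash arrives per tick, no amount of extra compute buys more attempts; only storing more partitions (and thus more recall ranges) does.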
Figure 6: Mechanism of Arweave 2.6
This means the maximum number of hashes generated per second is fixed, and version 2.6 limits that number to a range an ordinary mechanical hard drive can handle. The thousands or even hundreds of thousands of hashes per second previously achievable with SSDs are now of little use: everyone competes at a few hundred hashes per second alongside mechanical drives. It's like a race run under a 60 km/h speed limit, where a Lamborghini gains little over a Toyota Prius. As a result, the amount of data a miner stores now contributes the most to mining performance.
These are some important iterative milestones in Arweave's development journey. From PoA to SPoA to SPoRA, and now to the speed-limited version of SPoRA in Arweave 2.6, it has always followed its original vision. On December 26, 2023, the Arweave team released the whitepaper for version 2.7, which made significant adjustments to these mechanisms, evolving the consensus mechanism to SPoRes. As this is the latest update, it will be thoroughly discussed in a dedicated topic.
For those interested in learning more about Arweave in the future, you can follow X's account at @ArweaveOasis. We will explore the detailed content of Arweave and the AO computing platform there.
"Debug" Program: If you find errors in this article, including typos, grammatical mistakes, incorrect descriptions, ambiguous meanings, redundant descriptions, or other problems, you can give us feedback and you will be rewarded with incentives. Click "here" to give feedback.
🔗 More about PermaDAO :Website | Twitter | Telegram | Discord | Medium | Youtube
💡 Initiated by everVision and sponsored by Forward Research (Arweave Official), PermaDAO is a "Co-building Community" focused on the theme of Arweave consensus storage. All contributions from PermaDAO contributors form the bedrock of data consensus. Let's embark on a journey that starts with data consensus and delve into a novel paradigm for decentralized collaboration: Decentralized Autonomous Organizations (DAOs)!