Evolution of the EVM Pt. 1
Understanding how and why the Ethereum Virtual Machine (EVM) came into existence is crucial to understanding the ethos and principles of Ethereum. Equally, when developing a new project or designing a protocol it is essential to understand the limitations and intentions of the EVM, and to decide whether such a system is optimal for the use case. When Ethereum was initially proposed as a conduit for decentralised finance (DeFi), supporting smart contracts as an alternative to Bitcoin, little was clear about how it would evolve. Since that time the EVM has taken on a form that greatly mirrors traditional virtual machines, with computational capabilities that have already brought about, or may yet bring about, functionality such as zero-knowledge proofs (ZKPs), data availability, and greater transaction throughput.
Contents:
FAQ
Purpose of VM
Creation of Opcodes & Precompiles
EIPs & Forks
Vulnerabilities of the EVM
Conclusion
FAQ
What are opcodes and precompiles?
Both opcodes and precompiles are analogous to the instruction sets that exist for any computational resource. Such instructions are used to perform actions that require processing power to execute and that affect memory or storage as a result. In virtual machines such as the EVM these instructions are represented as mini-programs that can be run by any operator or user across any system. They allow for the execution of smart contracts and other cryptographic actions (the latter usually represented as precompiles) that give Ethereum its form. There are over a hundred opcodes and nine precompiles currently, and any increase to these numbers will require hard forks (though decreases will only require soft forks). The primary distinction between opcodes and precompiles is more a mental model used by developers, based on a theorised level of usage.
How do virtual machines such as the EVM work?
Virtual machines (VMs) are digital representations (usually in the form of a program) of a physical machine. In other words, a program is used to emulate the memory, storage, and/or processing power of a physical computer. There are a few subcategories of virtual machine; like the Java Virtual Machine (JVM), the EVM is used as a benchmark and a standardised template for how nodes will use their own physical resources to participate in the blockchain. There is no one model for the EVM and anyone can replicate it as long as they abide by the specifications cited within the Ethereum yellow paper. Other categories of VM usually consist of a program allowing remote access to computational resources across a network, but the centralisation involved in these categories means they are not well suited to the aims of the EVM.
Why do ‘forks’ occur in Ethereum?
Whenever any version change is made to the EVM, typically in the form of an Ethereum Improvement Proposal (EIP), some form of fork must occur to allow for the change to take effect. Additionally, when any ‘alternative’ history of the blockchain begins to develop through the form of a divergent chain of blocks off from the main chain there is the possibility of a fork occurring. All forks can be divided into three categories: temporary (ommer blocks), soft, and hard. All forks begin as temporary forks, but can transition to soft or hard forks which are the occasions when the EVM undergoes a significant change. With soft forks it is possible for nodes that have not updated their EVM to participate in the chain still (as changes are backwards compatible), but a hard fork requires an upgrade to be able to participate (as changes are not backwards compatible).
What are ommer (uncle) blocks?
Ommer blocks, which can be conceived of as temporary forks, are blocks which have been propagated across the network at a similar time to another block that is at some point validated as canonical. When one of several blocks sharing the same parent becomes canonical in this way, all the other blocks with that parent become ommer (or uncle) blocks. Once six generations have passed since a block was appended to the chain it is considered finalised in its state as either canonical or ommer, with the block on the ‘longest’ (more correctly, most difficult) chain accepted as the former and the others considered the latter. To avoid a race of resources between nodes attempting to grow their own chains, rewards exist for the proposers of ommer blocks if their block is referenced by a canonical block. Each canonical block can reference up to two ommer blocks and each ommer block can only be referenced once, with both the referring and the referred parties earning a portion of the block reward.
What are common vulnerabilities of Ethereum?
There are many potential vulnerabilities to Ethereum known by different names. One of the most distinctive is the majority attack, which involves a malicious actor attempting to assume a majority of the resources of a blockchain network to bring about consensus in their favour, or even to reorganise older blocks in the chain. Most other attacks are typically refined versions of this general attack, with opportunities to ‘steal’ old MEV opportunities or perform double-spend attacks (receiving multiple services for a single payment) being among the most profitable. Eclipse attacks are a rarer and more theoretical form of attack that involves exploiting peer-to-peer protocols to isolate one or more nodes from the rest of the network and, by extension, use their resources to advance a malicious actor’s goal. There are also attacks conducted for personal rather than financial benefit, including the ice age attack, which can be performed on smaller blockchain networks such as Ethereum Classic to perform mass censorship at great cost to the malicious actor.
Purpose of VM
Typically computational resources exist in the form of a single, more or less unique copy. These copies, being singular implementations, are extremely restrictive as to which individuals and programs can access them. Generally access is restricted to those with physical access to the device (where a device can be anything from a storage disc to a program), or those that have received permission from such a party to connect through a network. In some ways these devices can be envisioned as non-fungible tokens (NFTs); only the true owner(s) of the genuine copy of the computational device can use it without restriction. Allowing many entities to share a copy of a single computational resource is an important tenet of distributed networks, and it is this form of digital access that is often implemented through a virtual machine (VM). Virtual machines are programs that have been calibrated to allow individuals and programs that lack physical access to some computational resource to access said resource. In short, they simplify the owner’s need to share permission to the device. Despite the mysticism associated with virtual machines there is little more beyond this that defines them.
One historical instance of the production of a virtual machine was the Java Virtual Machine (JVM). Distributed to any users of Java applications, the JVM ensured that these programs could be operated without the user needing to be involved with the compilation process, garbage collection, or thread control. While these three properties are accepted as standard across modern programming languages, there was a time when running an application required careful management. Providing these services, in addition to a set of memory, storage, and processing power drawn from the user’s own computer but isolated from the requirements of other programs, allows significantly more complex services to be built and provided to wider audiences. This capacity for abstracting away computational requirements is one particularly powerful tool that virtual machines such as the JVM have brought into being. Easing the resources necessary to connect to a blockchain network is perhaps even more important, in that the security of such networks is entirely reliant on the magnitude and diversity of the user base.
When a virtual machine is hosted on a distributed network it is possible to allow all members of said network to share the resources of a single device. This open manner of access is roughly akin to transforming a “non-fungible token” into a “fungible token”; there is no distinction between entities regardless of their current resources (to a degree) or their order of access. From some perspectives this transforms computational devices into a resource or currency held by a blockchain network. As tokens and native currencies are vital to guaranteeing the continued cryptonomic existence of blockchains, so too are virtual machines. However, it is important to emphasise that, as with tokens and native currencies, there always remains some degree of centralised control over the production of virtual machines and their distribution among entities. This control rests with the owner(s) of the original device itself. In a similar regard to the JVM, the Ethereum Virtual Machine (EVM) allows entities to bootstrap access to the Ethereum blockchain network and to participate; however, the EVM attempts to decentralise control of the virtual machine further. There is no ‘one true copy’ of the EVM; instead there is a yellow paper that sets out the specifications for any developer desiring to produce their own virtual machine to access the network. Many have responded to this implied call-to-arms, building a diverse portfolio of EVM clients (implementations of the specifications of the EVM) that provide a decentralised abstraction for other entities to employ. Hence there are many EVMs, but this article will refer solely to the specification that all these clients must follow.
There are several functions the EVM has intended to serve through isolation of an entity’s computational resources, including: hosting a directed acyclic graph (DAG) in memory, keeping track of a recent history of the blockchain and state of the blockchain in storage, and performing various validation checks before using network ports for propagating transactions.
Depiction of the ten checks each EVM node (implementation of a client) must perform of a transaction before accepting it into their mempool and propagating them to other nodes - each requires processing power to perform.
This is a brief glimpse of what an EVM client does, and what it could do given adjusted specifications (as occurs through the course of soft and hard forks in Ethereum). To interact with the computational resources isolated by a virtual machine, much like accessing the resources of a physical device, requires use of instruction set architectures. These consist of utilising a number of ‘methods’ or ‘commands’ to achieve certain functions. Ethereum calls these instructions opcodes, and on occasion precompiles to group together commonly used opcodes.
Creation of Opcodes & Precompiles
Opcodes
All machines, virtual or physical, require instructions as stipulated previously. When Ethereum was first being developed many of the currently existing opcodes were implemented. Such commands include typical Turing-complete actions such as addition, multiplication, and modulo, and other necessities of computing such as temporary or permanent storage, retrieval of timestamps, and hash functions. There are more than 100 opcodes in total registered by the yellow paper, with differences in number between clients depending on what ‘version’ of the blockchain is being accessed by said client (with ‘versions’ being dependent on hard forks of the blockchain). The maximum number of opcodes directly influences the memory and storage requirements for processing a command (currently a byte is provided for each opcode). In total there is a capacity for up to 256 opcodes, though that too could be changed through a hard fork (in contrast, decreasing the number of opcodes would only require a soft fork). The specifications of soft and hard forks will be explained later in this article. Originally there were fewer opcodes in use, but as calls for more complex computations and smart contracts have increased, the need for more opcodes (and precompiles, mentioned later) has grown.
However, any instruction employed by a computational device requires processing power to execute. Therefore, apart from simply specifying what opcodes exist, the yellow paper must also specify the cost of executing each command. These costs are referred to as gas prices, and exist to ensure that entities cannot create smart contracts or transactions that never cease execution (whether a program terminates is, in general, undecidable: the halting problem). If there existed no specification as to gas prices it would be possible to either maliciously or accidentally perform a denial-of-service (DoS) attack on the chain. The creator of the transaction or smart contract must pay the gas costs, and is encouraged to use the least expensive set of opcodes they can. During initial development, Ethereum developers agreed on rough benchmark tests to determine the gas cost of each command. Running each command on a ‘mid-range’ (expense-wise) computational device, the average number of microseconds each took to complete execution was recorded. If on average the addition opcode took 3 microseconds to execute, this instruction would be assigned a gas cost of 3.
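As a rough sketch of this benchmarking idea (not the actual tooling the developers used; the function and the candidate operation below are purely illustrative), one could time an operation over many runs and read off its mean execution time in microseconds as a candidate gas cost:

```python
import timeit

def benchmark_gas(op, runs=1_000_000):
    """Return a rough gas cost: mean execution time in whole microseconds."""
    total_seconds = timeit.timeit(op, number=runs)
    mean_microseconds = (total_seconds / runs) * 1_000_000
    return max(1, round(mean_microseconds))  # never assign a zero cost

# Example candidate instruction: 256-bit modular addition, mirroring what
# the ADD opcode does (all EVM words are taken modulo 2**256).
MOD = 2**256
add_op = lambda: (0x1234 + 0xABCD) % MOD

print(f"ADD benchmarked at roughly {benchmark_gas(add_op)} gas")
```

The printed figure will of course vary from machine to machine, which is precisely the difficulty of keeping costs aligned with execution times.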
Measuring an ‘average’ time for an instruction even at one point in history is a difficult task considering the drastically different architectures of computational devices; it has only grown harder since to claim that there remains an ‘average’ alignment between cost and execution time. There have also been modifications to opcode gas costs, often for cryptonomic rather than purely scientific rationales, when Ethereum developers have realised the possibility of particular ‘exploits’ or unintended behaviours. Despite these factors there has not (at least not yet) been a misalignment between costs and the computational resources necessary to execute an opcode significant enough to open new DoS attack vectors or enable censorship of particular transactions. In tracking these factors, and to more succinctly map all the opcodes specified by the Ethereum yellow paper, a periodic table was created. This table compares each command’s gas cost and execution time on the most common client/operating system (OS) setup.
Demonstration of most of the opcodes in Ethereum in the form of a ‘periodic table’.
The above data was collated using Geth on a 64-bit Linux system (similar to the setup used by more than 90% of nodes on Ethereum). Left-to-right symbolises increasing execution times for each opcode, and top-to-bottom symbolises increasing gas costs. For entries that group multiple commands together (such as SHA3) the most common ‘average’ gas cost and ‘average’ execution time across those commands were used. Any opcodes along the diagonal axis from top-left to bottom-right (excluding those in the bottom two rows) follow a roughly linear relationship between execution time and gas cost. Instructions above this diagonal take longer to execute than their gas costs would imply, and any below are more expensive than their execution times imply (once again excluding the bottom two rows, which are actually above the diagonal). Despite any divergences from the diagonal, however, it must be emphasised that no difference is significant enough to impact the cryptonomic security of the EVM. Additionally, some opcodes have not been cited in the above table due to their incredibly inconsistent gas costs or infrequent usage. Nevertheless, it can be seen that there is a general pattern of greater variation in execution times than in the corresponding classes of gas costs. This is likely the result of an intention to simplify developers’ work with gas costs, though it does lead to situations where two opcodes can cost the same while one requires up to seven times the processing power to execute.
Looking past the comparison of execution times and gas costs, the above periodic table can serve as a succinct version of the table of all existing opcodes. Numbers in the top-right corner of each opcode signify the average gas cost, and the colour of each signifies the group it is categorised within (with the table taking on a slightly different grouping scheme to that used by Ethereum canonically). Additionally the hexadecimal representation of each instruction has been included, signifying the exact byte used to refer to any particular opcode. Each command has also been given a one- or two-letter code to fit the periodic table theme; though this has little association with any current EVM standard, it may assist researchers in the future in simplifying references to each instruction. Importantly, it must also be emphasised that the periodic table designed above, much like the periodic table of elements, is a rough design that abstracts away some complexity and differentiation between opcodes to simplify the visual illustration of their basic properties. Also, none of the above data or visualisations of opcodes are endorsed by the Ethereum Foundation; they are an independent interpretation. Below is presented a variation of the periodic table significantly more aligned with that intended by the Ethereum yellow paper, separating opcodes into columns based on their “schedule” (a concept used to differentiate opcodes into categories of cost).
Demonstration of most of the opcodes in Ethereum in the form of a ‘schedule table’.
Precompiles
Similar in design and purpose to opcodes, precompiles are optimised contracts located at fixed addresses that can be called upon by any other smart contract to perform computation. Tending to represent larger operations, specifically of the cryptographic variety, they perform particular commands at fixed or variable gas prices lower than would be incurred by rewriting the same functionality through opcodes in a new smart contract. Essentially, operations that are critical to the network but perhaps too niche to warrant an entire opcode can be conducted more affordably through precompiles. They can be called in a similar manner to opcodes by citing the address at which the contract is located.
PUSH1 0x08 PUSH1 0x00 MSTORE                // store an example input word at memory offset 0
PUSH1 0x20 PUSH1 0x00 PUSH1 0x20 PUSH1 0x00 // retSize, retOffset, argsSize, argsOffset
PUSH1 0x00 PUSH1 0x02 GAS CALL              // value, address 0x02 (SHA2-256 precompile), gas
Currently there are 9 precompiles on Ethereum, most of which are used to support hash operations or elliptic curve (EC) arithmetic. As with opcodes, adding any additional precompiles would require a hard fork, while reducing their number could be done with a soft fork. As demand increases there will likely be a growing need for precompiles, which are generally more flexible to develop than opcodes. The impending Merge event, wherein Ethereum will move to a proof-of-stake model, will also likely require more such commands to enact cryptographic primitives including Boneh–Lynn–Shacham (BLS) signatures. As it stands, the below table details the existing precompiles and the calculations for their gas costs.
Precompile | Address | Gas cost | Description |
---|---|---|---|
ecRecover | 0x01 | $3000$ | Recovers the signer’s address from an ECDSA signature. |
SHA2-256 | 0x02 | $60 + 12 \times \mathrm{data\_word\_size}$ | Performs a SHA2-256 hash function. |
RIPEMD-160 | 0x03 | $600 + 120 \times \mathrm{data\_word\_size}$ | Performs a RIPEMD-160 hash function. |
identity | 0x04 | $15 + 3 \times \mathrm{data\_word\_size}$ | Returns the input, typically for copying memory. |
modexp | 0x05 | $\mathrm{Max}(200, \frac{\mathrm{multiplication\_complexity} \times \mathrm{iteration\_count}}{3})$ | Performs modular exponentiation: raises a value to an exponent under a given modulus. |
ecAdd | 0x06 | $150$ | Adds any two points on an EC. |
ecMul | 0x07 | $6000$ | Multiplies a point by a scalar on an EC. |
ecPairing | 0x08 | $45000 + 34000 \times k$ (for $k$ point pairs) | Performs a pairing check on EC points, enabling bilinear map verification. |
blake2f | 0x09 | $\mathrm{rounds}$ | Performs a BLAKE2F hash function. |
Primarily, gas costs can be seen to be either static or dependent on data_word_size, multiplication_complexity, or iteration_count, which are explained in greater detail through other resources (explaining each individual component that underlies the cost of these contracts would distract from the purpose of the article). To exemplify the significance of the savings acquired through using precompiles over attempting to implement even simplistic cryptographic operations in opcodes, an example of EC addition is provided below. While far from the most efficient form of smart contract to achieve this task, the distinction is apparent enough despite attempts to avoid using memory or other typically expensive commands. In total the below operation would cost on average 79685 gas to run, far exceeding the approximately 150 gas that the same precompile operation would cost (not counting saving to memory).
pragma solidity ^0.8.0;

contract ecAdd {
    uint[] res;

    // Illustrative chord-rule addition of the example points (4, 8) and
    // (6, 16); the arithmetic mirrors EC addition but the values are not
    // points on a real curve.
    function add() public {
        uint lambda = (8 - 16) / (4 - 6); // slope between the two points
        res.push(lambda ** 2 - 6 - 4);    // lambda^2 - x2 - x1
        res.push(lambda * (6 - 4) - 8);   // lambda * (x2 - x1) - y1
    }
}
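For comparison, the dynamic gas formulas from the precompile table above can be sketched directly. This is a rough Python rendering of the table’s formulas (the function names are my own, and the word size is the input length rounded up to 32-byte words):

```python
import math

def data_word_size(input_bytes: int) -> int:
    """Input length rounded up to 32-byte EVM words."""
    return math.ceil(input_bytes / 32)

def sha256_gas(input_bytes: int) -> int:
    return 60 + 12 * data_word_size(input_bytes)

def ripemd160_gas(input_bytes: int) -> int:
    return 600 + 120 * data_word_size(input_bytes)

def identity_gas(input_bytes: int) -> int:
    return 15 + 3 * data_word_size(input_bytes)

# Hashing or copying a single 32-byte word:
print(sha256_gas(32))     # 60 + 12*1 = 72
print(ripemd160_gas(32))  # 600 + 120*1 = 720
print(identity_gas(32))   # 15 + 3*1 = 18
```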
EIPs & Forks
EIPs
Over time any technical project must be allowed to evolve to accommodate growing demand, changing technology, or different markets. Ethereum is no different, and since its inception has relied on the implementation of Ethereum improvement proposals (EIPs) to bring about such evolution. Much of this change has seen the introduction of new features intended to direct the chain towards a particular roadmap, recently better visualised in the short and medium term as the consensus layer. Therefore it can be said that for any change to occur to the EVM an EIP must be instituted. In this line of thinking it can be observed that when adding new (or removing old) opcodes or precompiles, EIPs will typically be relied upon.
There are currently over 700 EIPs that have been implemented or at the very least proposed; there is also a category of EIPs known as Ethereum requests for comment (ERCs) that have a slightly different purpose and are not referred to here. Completing the implementation of one of these proposals involves, in some regard, bringing about a fork. More elaboration is provided below, but for the most part a fork involves bringing about a change to the EVM to accommodate whatever changes may occur. Clients, and by extension nodes, might on these occasions need to update their implementations of the EVM to accommodate new blocks or new mempool rules, among other possibilities, in order to continue to observe and participate on the chain.
Forks
Ommer Blocks (Temporary Forks)
When nodes discover more than one block at a particular moment and propagate these blocks across the network at varying speeds and efficiencies, the usual result is that more than one block is proposed as the next block in the chain. As there can only ever be one ‘next block’ in the chain at a time, the EVM must have some mechanism through which to contend with this possibility and ensure that the chain does not ‘break’. These instances can be thought of as ‘temporary forks’, wherein there is a possibility for the chain to accept one block or another, completely changing the eventual history of the chain. Though it must be emphasised that not all temporary forks are intended to bring about EIP change, it is possible for a temporary fork to be used to create a soft or hard fork which then brings about EIP change. In the case of Ethereum the mechanism to confront this has come about through the conception of ommer (or uncle, as they are often known) blocks, which are acknowledged to exist by the chain but are not canonical.
An ommer block can be defined as a block which shares a common ancestor (an older block) with a block on the chain, but which is not itself an ancestor of that block. This may appear a non-intuitive and vague definition, but it is constructed as such to account for any case where a block exists that could have been on the chain but where another block instead ‘filled its place’. Deciding which block is canonical and which blocks are ommers requires a process of finalisation. This involves allowing the network to come to consensus by choosing the block that will be ‘validated’ (mined or attested to) through the use of one’s resources. On average, more than half of the resources of a network would ensure that a given block is ‘validated’. The remaining blocks would then be ommer blocks.
A six-block period is used to judge which status blocks are finalised to, due to the computational infeasibility of reversing this finalisation later. Reversal would require an actor to hold enough of a majority of the network’s resources that they had a chance of finalising an alternative block from six or more generations ago instead. Assuming everything else remains stable, an actor would need almost 90% of the total resources of a network to have a 50% chance of reversing such finalisation. Ethereum uses rules roughly similar to a ‘longest chain rule’ (though it actually depends on the ‘most difficult chain’, which aligns with the longest most of the time) to determine the canonical chain. Hence, more than half of the network’s resources would usually guarantee that any new blocks descend from the current canonical block, in turn increasing the chance that the canonical chain remains canonical.
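The infeasibility of reversing a six-block finalisation can be illustrated with the standard race analysis from the Bitcoin whitepaper. This is a simplified model of the same chain race (Ethereum’s actual fork-choice rule differs in its details), showing how quickly the odds collapse for a minority attacker:

```python
from math import exp, factorial

def catch_up_probability(q: float, z: int) -> float:
    """Probability an attacker with resource share q out-builds a chain z blocks ahead."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually wins the race
    lam = z * (q / p)  # expected attacker progress while z blocks settle
    prob = 1.0
    for k in range(z + 1):
        poisson = exp(-lam) * lam**k / factorial(k)
        prob -= poisson * (1 - (q / p) ** (z - k))
    return prob

# With only 10% of resources, overturning six finalised generations is
# already vanishingly unlikely:
print(f"{catch_up_probability(0.10, 6):.6f}")
```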
To further discourage any attempts to reverse finalisation, additional reward mechanisms exist for ommer blocks, compensating for the resources devoted to their production and encouraging nodes to remain with the canonical chain. Any block can reference at most two ommer blocks published within the last six generations to receive such a reward, scaled depending on how long ago the ommer block was published. Where $0 < i < 7$ is how many generations ago the ommer block was published, $b$ is the standard block reward for proposing a canonical block, and $r_o$ is the final reward, the following equation holds.
$$ \begin{equation*} r_o = \frac{b(8 - i)}{8} \end{equation*} $$
Hence, the maximum reward that the proposer of a non-canonical block referenced by a canonical block can acquire is $\frac{7}{8}b$, and the minimum is $\frac{2}{8}b$. Additionally, to ensure nodes do not avoid referencing ommer blocks, the canonical block proposer also earns an additional reward for each ommer they reference (though, as mentioned earlier, only two ommer blocks can be referenced per block, and a unique ommer block cannot be referenced more than once).
$$ \begin{equation*} r_c = \frac{b}{32} \end{equation*} $$
Therefore, referencing the maximum number of ommer blocks can earn an additional $\frac{1}{16}$ of the block reward for a block proposer, bringing their total reward for proposing a block to $\frac{17}{16}b$, where $b$ is the typical block reward (without considering additional fees for the transactions included within a block).
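The two reward formulas above can be sketched as follows (an illustrative rendering in exact fractions of the block reward $b$; the function names are my own):

```python
from fractions import Fraction

def ommer_reward(i: int) -> Fraction:
    """Reward to an ommer block's proposer, referenced i generations later (0 < i < 7)."""
    if not 0 < i < 7:
        raise ValueError("ommers are only rewarded within six generations")
    return Fraction(8 - i, 8)  # in units of the block reward b

def proposer_bonus(referenced_ommers: int) -> Fraction:
    """Extra reward to a canonical proposer referencing up to two ommers."""
    if not 0 <= referenced_ommers <= 2:
        raise ValueError("a block may reference at most two ommer blocks")
    return Fraction(referenced_ommers, 32)

print(ommer_reward(1))        # 7/8 of b: referenced immediately
print(ommer_reward(6))        # 2/8 (= 1/4) of b: last rewardable generation
print(1 + proposer_bonus(2))  # 17/16 of b: proposer referencing two ommers
```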
Soft Forks
Soft forks occur upon any EVM update that remains backwards compatible. One example of a backwards compatible update would be a decrease in the number of opcodes (i.e. an opcode being rendered obsolete). Any blocks following these new ‘rules’ will be accepted both by nodes that have upgraded to the new EVM version and by those that have not, as a block containing transactions that rely only on a subset of existing opcodes meets the standards of both the new and old ‘rules’. By extension, any block that conforms to the old ‘rules’ but not the new (i.e. relies upon an obsolete opcode) will only be accepted by the nodes that have not upgraded. Nodes are not necessarily required to upgrade in the case of a soft fork, as they are equally able to participate in the operation of the chain regardless, though they will likely upgrade over time to keep up with the most recent version of the EVM.
Simplistic illustration of the implications of a soft fork.
Hard Forks
In contrast, hard forks are complete divergences from the previous environment the EVM facilitated, and they occur when an EIP introduces changes that are not backwards compatible. Unlike a soft fork, all nodes must upgrade their version of the EVM to continue to participate in and observe the chain from the point of the hard fork onwards. If a node that has not upgraded observes a block following the ‘rules’ of the new EVM version, it will reject said block or perhaps not even be able to see it. Additionally, the two ‘blockchains’ (the old and the new) can coexist separately with no interaction between them. Following the DAO attack, Ethereum required a hard fork to reverse the damages caused, leaving behind a chain that became Ethereum Classic and has shared no interaction since.
Simplistic illustration of the implications of a hard fork.
Vulnerabilities of the EVM
While a variety of the following attacks and vulnerabilities are not specific to the EVM, nor typified by any particular design choice within it, they are presented here from an Ethereum-centric perspective. By understanding the nuanced connections between specific elements of the EVM and these attacks, it is possible to come to a new understanding and appreciation of how they can be avoided, or of what has already been done to avoid them.
Double Spend
Such an attack is easy to exploit and easy to prevent. By spending the same currency twice, a possibility unique to digital currencies such as cryptocurrencies, it is possible to purchase multiple services/goods without spending any additional currency. In a double-spending attack, the attacker will usually submit two conflicting transactions to the network within a short period of time. One transaction sends money to a merchant as payment for services/goods, and the other sends the same money to another merchant. As they propagate across the network, different nodes will receive the two transactions at different times and begin to add them into proposed blocks to be validated and appended to the chain. It is possible that the transactions will end up in separate blocks, meaning that one is included in an ommer block and the other in a canonical block.
When this happens, there is a chance that the merchant will provide the services/goods to the attacker on account of the new canonical block before waiting for finalisation of the blocks (which is why it is recommended that no goods/services are offered prior to finalisation). After receiving the goods/services, the attacker might then use their resources (or encourage others to use theirs) to ensure that the chain containing the ommer block becomes the canonical chain instead. When this alternative chain becomes canonical, the transaction that existed in the ommer block becomes ‘valid’, and as such the attacker would be able to receive their new goods/services. Only the currency spent on a canonical chain is deducted from one’s wallet, and so the attacker would have received multiple goods/services for one payment. It must be re-emphasised, though, that this vulnerability only really exists if the merchant provides services/goods without waiting for confirmation (i.e. without waiting six blocks for finalisation).
Demonstrating how the manipulation of ‘weak’ finalisation and ommer blocks can be used to perform a type of majority attack: a double spending attack.
Eclipse
An eclipse attack is the practise of isolating a specific node from the network in order to deceive it. This is possible because, for efficiency, Ethereum does not have each node connect to every other node. Instead, each node connects to a small group of other nodes, which in turn have their own small groups of connections. In Ethereum each node connects to thirteen other nodes that it has discovered through the peer-to-peer protocols used by Ethereum clients. A theoretical attacker would attempt to find, primarily via trial and error though perhaps also through a deep understanding of the peer-to-peer protocols, the nodes that are connected to the intended victim. The attacker would need to control a host of nodes to complete the attack, and wait for the victim node to temporarily disconnect from the network. Once the victim node reconnects and begins the process of looking for other nodes to connect to, there is a possibility (by no means guaranteed) that the attacker will be in control of all of the victim’s connections. An attacker can use this isolation to deceive the targeted node, and could theoretically make a majority attack more likely by redirecting the victim node’s resources to ‘validate’ the attacker’s blocks. Forms of deception may also include convincing the victim that the chain is in some state different to its true state, or supplying false public keys that may convince the victim to unwittingly direct transactions towards the attacker.
Demonstration of how a malicious actor may attempt to manipulate peer-to-peer protocols to isolate a victim node.
Majority attacks may also be made more feasible by instead blocking the propagation of blocks and/or transactions from a victim node, preventing it from contributing its resources to the continuation of the chain. All peer-to-peer protocols that encourage the propagation of these structures rely on at least one of a node’s connections being honest; where there are none, there are no additional safeguard mechanisms (though an attacker achieving this goal is far from guaranteed). While the probability of a successful attack decreases with any increase in the number of peers a node seeks upon connecting to Ethereum, any increase also results in greater network congestion, as there will be more ‘duplicate’ propagations of blocks and transactions. As it stands, even a malicious actor with a clear majority (>90%) of all resources, and by extension nodes, on the network would have only around a 25% chance of successfully executing an eclipse attack on another node (assuming the peer-to-peer protocol is truly random).
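The ~25% figure above can be checked directly. Assuming the idealised model in which each of the victim's thirteen peers is drawn independently and uniformly at random from all nodes, the chance that every single peer is attacker-controlled is simply the attacker's share of nodes raised to the thirteenth power:

```python
# Probability that an attacker controlling fraction p of all nodes
# supplies every one of a victim's k peer connections, assuming each
# peer is chosen independently and uniformly at random (an idealised
# model of the peer-discovery protocol, not its real behaviour).

def eclipse_probability(p: float, k: int = 13) -> float:
    return p ** k

# Even with 90% of the network, a full eclipse is far from guaranteed:
print(f"{eclipse_probability(0.90):.3f}")  # 0.254, i.e. roughly 25%
```

This also shows why the number of connections matters: each additional peer multiplies the attacker's required luck, at the cost of the extra propagation traffic noted above.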
In fact, no matter how many resources the attacker controls, a majority attack will always be more feasible than an eclipse attack; eclipse attacks instead serve as a coincidence of convenience rather than the direct goal of an attack. From an incentives point of view, an attacker would likely only be motivated to undertake such a scheme if it could bring about a majority attack or ‘steal’ an MEV opportunity discovered by the victim node (more details about the latter are provided in the analysis of the journey of Ethereum transactions). It is also a low-cost form of censorship for a malicious actor motivated to block another node from submitting a transaction or a block, though this latter form of attack would be less likely to occur due to the questionable opportunity for reward involved, and as such would often involve personal rather than financial motivations, as explored further in the following section.
Ice Age
The ice age attack is not a typical one: it is relatively unknown as a style of attack, is potentially referred to by different names, and was often not considered when designing the EVM. It applies primarily to proof-of-work networks such as Ethereum prior to the Merge. It involves joining a network with a very large amount of computational power, thus greatly increasing the difficulty of mining a block. The attacker then leaves as suddenly as they arrived, having mined enough blocks to raise the difficulty to a sufficient level, leaving the remaining miners with insufficient computational resources to mine the next block. Block production would grind to a halt, and because difficulty only adjusts when blocks are actually mined, bringing it back down could become infeasible. Such an attack would be rather counter-intuitive, as the attacker would not be acting for financial gain but to destroy a network; they would require an overwhelming majority of computational power, meaning they would need to resist performing a majority attack that could instead be profitable. This attack would most likely only be possible on less popular cryptocurrencies, but it is a great reminder of how vulnerabilities can be created by assuming that attackers only ever act logically, or for financial rather than personal gain.
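The mechanism can be sketched numerically. The model below uses a deliberately simplified proportional retargeting rule (not Ethereum's actual difficulty algorithm) applied once per mined block, with made-up hashrate figures, to show how a departing attacker leaves the honest miners facing a difficulty calibrated for ten times their hash power:

```python
# An illustrative, heavily simplified model of an "ice age" attack on a
# proof-of-work chain. The retargeting rule is a simple proportional
# adjustment applied once per mined block; it is NOT Ethereum's real
# difficulty algorithm, and all hashrate numbers are hypothetical.

TARGET = 13.0  # target block time in seconds

def expected_block_time(difficulty: float, hashrate: float) -> float:
    return difficulty / hashrate

def retarget(difficulty: float, last_block_time: float) -> float:
    # Difficulty rises when blocks come too fast and falls when they
    # come too slowly, but only ever adjusts when a block is mined.
    return difficulty * (TARGET / last_block_time)

honest, attacker = 100.0, 900.0
difficulty = 1300.0  # honest miners alone produce a block every 13 s

# Attacker joins: blocks arrive quickly, and difficulty ratchets upward.
for _ in range(20):
    t = expected_block_time(difficulty, honest + attacker)
    difficulty = retarget(difficulty, t)

# Attacker leaves: the remaining miners face a ten-fold slowdown, and
# difficulty can only fall as slowly as they can still mine blocks.
print(expected_block_time(difficulty, honest))  # 130 seconds per block
```

In a real network the slowdown compounds: each retarget requires a block, and each block now takes minutes rather than seconds, which is why recovery can become practically infeasible.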
Majority
Attempting to gain a majority of the resources of a network (be they computational, monetary, or otherwise), attackers may try to exert authoritative control over the blockchain to reform consensus decisions in their favour. If an attacker possessed such authority on the network, they would theoretically be able to ensure that their proposed blocks were always canonical and ‘validated’, and might eventually even be able to reverse previously canonical blocks in their favour. Examples of favourable outcomes might include taking advantage of a previously cited MEV opportunity that another node discovered, or censoring a transaction that might have caused them a loss. Majority control of resources is not strictly required to execute a successful attack, but the attack is not statistically feasible without it, which is why this attack vector continues to be denoted a ‘majority’ attack. For example, if a malicious actor controlled 10% of the resources of the network, and a merchant waited the recommended six blocks for finalisation before providing a service (as in the case of the double spend attack previously), the actor would have only an approximate 0.02% chance (about $2.4\times10^{-4}$) of overcoming the six-block deficit and having their chain accepted as the true one. Considering the immense cost and the low chance of success, it is unlikely a node would attempt this feat; most studies suggest the potential reward in this case would need to be more than one million times larger than the investment for any attempt to be made.
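The order of magnitude quoted for the six-block case can be reproduced with the gambler's-ruin estimate from the Bitcoin whitepaper, which models the honest chain's lead as a Poisson-distributed head start that the attacker must overcome:

```python
import math

def attacker_success_probability(q: float, z: int) -> float:
    """Nakamoto's estimate of the chance that an attacker with fraction q
    of total hash power ever overcomes a z-block deficit."""
    p = 1.0 - q            # honest fraction
    lam = z * (q / p)      # expected attacker progress while z blocks are mined
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        # If the attacker is still z - k blocks behind, their chance of
        # ever catching up from there is (q/p)^(z-k).
        total -= poisson * (1 - (q / p) ** (z - k))
    return total

print(f"{attacker_success_probability(0.1, 6):.6f}")  # 0.000243, i.e. ~0.02%
```

With a full majority (q ≥ 0.5) the probability becomes 1: the attacker's chain grows faster in expectation and catch-up is certain, which is the formal content of the ‘majority’ in majority attack.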
It is the decentralised nature and intention of blockchain networks, including Ethereum, that allows these attack vectors to feasibly exist. Ethereum intends for consensus to be reached in a trustless manner, as an alternative to typical models of trusted (or centralised) consensus which place faith in an authority to make decisions. In most modern financial systems, a number of regulatory, corporate, and government organisations are imbued with the authority to bring about consensus; doing so, however, introduces the vulnerability of corruption or ‘malicious’ action (whether accidental or deliberate). When consensus is instead spread across a wide social network, under the assumption that its members do not trust one another, the ‘Byzantine Generals Problem’ (an adaptation of the earlier ‘Two Generals Problem’ that yielded significant findings for computer science) offers useful guidance: to ensure Byzantine Fault Tolerance (BFT) in a network, the following conditions must hold true.
Censorship Resistance
As the first condition, malicious actors must not control more than one-third of the entire resources of a network. Otherwise it is possible (though not guaranteed) for them to direct decision-making in the network in such a manner that the remaining nodes cannot be sure of any consensus. If every one in three nodes is malicious, each of the two honest nodes in such a ‘trio’ can never be certain which of the other nodes is honest or malicious, and therefore would be unable to commit to any consensus. While controlling more than one-third of the resources of a network would not on its own allow consensus to be manipulated in one’s favour, it would allow censorship of transactions or of the operations of the blockchain.
As observed with regard to the eclipse attack, there is no guarantee that more than one-third of resources (where resources can be considered proportional to the number of nodes) would be sufficient for malicious censorship. The odd number of connections per node means that a node can never be forced into inaction; it will always commit to either the consensus of the malicious actors or that of the honest actors, depending on who controls at least seven of its thirteen connections. The further above one-third of resources a malicious actor controls, however, the greater the probability of this sub-type of majority attack occurring.
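Under the same idealised random-peer model used for the eclipse attack, the chance that an attacker controls a majority (at least seven) of a victim's thirteen connections is a binomial tail, and it grows steeply with the attacker's share of nodes:

```python
# Probability that an attacker controlling fraction p of all nodes holds
# a majority (>= 7) of a victim's k = 13 peer connections, assuming each
# peer is sampled independently at random (an idealised model).
from math import comb

def malicious_majority_probability(p: float, k: int = 13) -> float:
    need = k // 2 + 1  # 7 of 13
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(need, k + 1))

# The chance is small just above the one-third threshold, but rises
# quickly as the attacker's share of nodes grows.
for share in (1 / 3, 0.5, 0.75):
    print(f"{share:.2f} -> {malicious_majority_probability(share):.4f}")
```

At exactly a one-half share the probability is exactly one-half (by symmetry of the odd number of connections), which is the sense in which the odd peer count guarantees commitment to one side or the other without ever producing a deadlock.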
Interestingly, if an honest actor controls more than one-third of resources they are unable to ‘censor’ the consensus of the malicious actors, as the nodes under their control will not share the same ‘confusion’ as to which consensus is ‘correct’. Therefore, in summary, it is necessary for any BFT system to ensure that honest actors, where possible, control at least two-thirds of all resources on said system.
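This two-thirds condition is usually stated as the classical bound n ≥ 3f + 1: a network of n nodes can tolerate at most f Byzantine nodes. A short sketch (with illustrative numbers) shows why the matching quorum size of 2f + 1 votes makes conflicting decisions impossible:

```python
# The classical BFT bound n >= 3f + 1 and its matching quorum size,
# sketched with illustrative numbers rather than any real protocol.

def max_tolerable_faults(n: int) -> int:
    """Largest number of Byzantine nodes f that n nodes can tolerate."""
    return (n - 1) // 3

def quorum_size(n: int) -> int:
    """A quorum of 2f + 1 votes ensures any two quorums overlap in at
    least one honest node, so they cannot commit conflicting decisions."""
    return 2 * max_tolerable_faults(n) + 1

n = 10
f = max_tolerable_faults(n)  # 3 Byzantine nodes tolerable
q = quorum_size(n)           # quorums of 7 votes

# Two quorums of size q share at least 2q - n nodes; since that overlap
# exceeds f, at least one member of the overlap is honest, and an honest
# node never votes for two conflicting outcomes.
assert 2 * q - n > f
print(n, f, q)
```

This is why the text above requires honest actors to hold at least two-thirds of resources: any smaller honest share allows two quorums whose overlap could consist entirely of Byzantine nodes.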
Majority Control
As the second condition, and perhaps an obvious one given the content so far, malicious actors must not control more than one-half of the resources of a network; otherwise it is possible to direct consensus across the network. BFT systems rely on a majority of nodes voting, in one form or another, as the basis for ascertaining which consensus is ‘correct’, and so if that majority is controlled by malicious actors they will be able to dictate the outcome. Violating this condition is a significantly more dangerous risk than violating the censorship resistance condition denoted above. In this sense, departing from this condition could be seen as departing from a form of ‘weak censorship resistance’, in contrast to the ‘strong censorship resistance’ lost when departing from the previous condition.
While Ethereum is generally too large (i.e. it requires too many resources to gain majority control) to be the victim of a majority attack, an older version of Ethereum known as Ethereum Classic (the original chain prior to the fork caused by the DAO Attack) has been on a number of occasions. This is the result of no specific flaw in its EVM implementation beyond the increased ease of performing a majority attack on a network with fewer participants. Ethereum is therefore heavily reliant on a large number of active participants dedicating resources to the network to avoid the possibility of a majority attack.
Conclusion
There has been significant development throughout the history of the EVM to bring it to the point at which it now exists. Adaptation to vulnerabilities, and further consideration of problems from the fields of computer science, cryptography, and cryptoeconomics, have brought about the greatest of these changes. Given all of this, the future of the EVM has become much more clearly focused on two specific goals of innovation intended to bring about a new phase of Ethereum: decentralisation and scalability. In future weeks more thought will be dedicated to the future of the EVM, analysing how its history will align with any further transformations. For the present, and prior to the occurrence of the Merge, the above investigation and definition of the EVM can be used to understand how Ethereum has evolved and why it has followed this path.