This is a reading note on Mastering Ethereum Ch06: Transactions. Transactions are signed messages originated by an externally owned account (EOA), transmitted by the Ethereum network, and recorded on the Ethereum blockchain. Transactions can trigger a change of state, or cause a contract to execute in the EVM.

1 The Structure of a Transaction

A transaction is a serialized binary message that contains the following data:

  • Nonce: A sequence number, issued by the originating EOA, used to prevent message replay.
  • Gas price: The amount of ether (in wei) that the originator is willing to pay for each unit of gas.
  • Gas limit: The maximum amount of gas the originator is willing to buy for this transaction.
  • Recipient: The destination Ethereum address.
  • Value: The amount of ether (in wei) to send to the destination.
  • Data: The variable-length binary data payload.
  • (v,r,s): The three components of an ECDSA digital signature of the originating EOA.

The transaction message’s structure is serialized using the Recursive Length Prefix (RLP) encoding scheme, which was created specifically for simple, byte-perfect data serialization in Ethereum. All numbers in Ethereum are encoded as big-endian integers, of lengths that are multiples of 8 bits.
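The RLP rules are small enough to sketch in a few lines. The following is a minimal encoder for illustration only, not the serializer any particular client uses; the function names (`rlp_encode`, `to_minimal_bytes`) and the field values in the example are assumptions made up for this note.

```python
def to_minimal_bytes(n: int) -> bytes:
    """Big-endian bytes with no leading zeros; zero becomes the empty string."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def _length_prefix(length: int, offset: int) -> bytes:
    """Short payloads (< 56 bytes) get a one-byte prefix, longer ones a length-of-length."""
    if length < 56:
        return bytes([offset + length])
    len_bytes = to_minimal_bytes(length)
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item) -> bytes:
    """Encode an int, a byte string, or a (nested) list of those."""
    if isinstance(item, int):
        item = to_minimal_bytes(item)
    if isinstance(item, (bytes, bytearray)):
        # A single byte below 0x80 is its own encoding.
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)
        return _length_prefix(len(item), 0x80) + bytes(item)
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


# Hypothetical transaction fields, in the order listed above.
example_tx = [
    0,                                      # nonce
    20 * 10**9,                             # gas price (wei)
    21_000,                                 # gas limit
    bytes.fromhex("00" * 18 + "dEaD"),      # recipient (20 bytes)
    10**18,                                 # value (wei)
    b"",                                    # data
    27, 1, 1,                               # placeholder (v, r, s), not a real signature
]
serialized = rlp_encode(example_tx)
```

Note how integers are stripped of leading zero bytes before encoding, which is why the number 0 and the empty byte string share the same encoding.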

There is no “from” field in the transaction identifying the originator EOA. That is because the EOA’s public key can be derived from the (v,r,s) components of the ECDSA signature. The address can, in turn, be derived from the public key. Other metadata frequently added to the transaction by client software includes the block number (once it is mined and included in the blockchain) and a transaction ID (calculated hash). Again, this data is derived from the transaction, and does not form part of the transaction message itself.

2 The Transaction Nonce

The Yellow Paper defines nonce as:

nonce: A scalar value equal to the number of transactions sent from this address or, in the case of accounts with associated code, the number of contract-creations made by this account.

The nonce serves two purposes: making each transaction unique and setting the order in which an account's transactions are processed.

In practical terms, the nonce is an up-to-date count of the number of confirmed (i.e., on-chain) transactions that have originated from an account. When you create a new transaction, you assign the next nonce in the sequence. But until it is confirmed, it will not count toward the getTransactionCount total.

The Ethereum network processes transactions sequentially, based on the nonce. That means that if you transmit a transaction with nonce 0 and then transmit a transaction with nonce 2, the second transaction will not be included in any blocks. It will be stored in the mempool, while the Ethereum network waits for the missing nonce to appear. All nodes will assume that the missing nonce has simply been delayed and that the transaction with nonce 2 was received out of sequence. If you then transmit a transaction with the missing nonce 1, both transactions (nonces 1 and 2) will be processed and included (if valid, of course). Once you fill the gap, the network can mine the out-of-sequence transaction that it held in the mempool.

A transaction can create an inadvertent “gap” in the nonce sequence because it is invalid or has insufficient gas. To get things moving again, you have to transmit a valid transaction with the missing nonce. You should be equally mindful that once a transaction with the “missing” nonce is validated by the network, all the broadcast transactions with subsequent nonces will incrementally become valid; it is not possible to “recall” a transaction!
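The gap behaviour described above can be modelled with a toy queue for a single sender. This is only a sketch of the idea; the class `AccountQueue` and its methods are hypothetical names invented here, and real clients apply many more rules (for example, replacing a pending transaction that reuses a nonce at a higher gas price).

```python
class AccountQueue:
    """Toy model of how a node orders one sender's transactions by nonce."""

    def __init__(self, confirmed_count: int = 0):
        self.next_nonce = confirmed_count  # next nonce the network will accept
        self.pending = {}                  # nonce -> transaction label, waiting
        self.mined = []                    # labels in the order they were included

    def submit(self, nonce: int, label: str) -> list:
        """Add a transaction; return the labels that became includable."""
        if nonce < self.next_nonce:
            return []  # nonce already used: the transaction is rejected
        # Simplification: the first transaction seen for a nonce is kept.
        self.pending.setdefault(nonce, label)
        return self._drain()

    def _drain(self) -> list:
        """Include every pending transaction that continues the sequence."""
        included = []
        while self.next_nonce in self.pending:
            included.append(self.pending.pop(self.next_nonce))
            self.next_nonce += 1
        self.mined.extend(included)
        return included


queue = AccountQueue()
first = queue.submit(0, "tx-a")    # included immediately
stuck = queue.submit(2, "tx-c")    # held back: nonce 1 is missing
filled = queue.submit(1, "tx-b")   # fills the gap, releasing tx-c as well
```

After the third call both `tx-b` and `tx-c` are included, in nonce order, which is the "cannot recall" effect the paragraph warns about.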

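The nonce can also cause concurrency problems: when several independent processes create transactions from the same account, they have to coordinate so that no two of them assign the same nonce and no nonce is skipped.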

3 Transaction Gas

Gas is the fuel of Ethereum. Gas is not ether. It’s a separate virtual currency with its own exchange rate against ether. Ethereum uses gas to control the amount of resources that a transaction can use. Gas is separate from ether in order to protect the system from the volatility that might arise along with rapid changes in the value of ether, and also as a way to manage the important and sensitive ratios between the costs of the various resources that gas pays for (namely, computation, memory, and storage).

The gasPrice field in a transaction allows the transaction originator to set the price they are willing to pay in exchange for gas. The web3 interface offers a getGasPrice suggestion, by calculating a median price across several blocks.

gasLimit gives the maximum number of units of gas the transaction originator is willing to buy in order to complete the transaction. If your transaction’s destination address is a contract, then the amount of gas needed can be estimated but cannot be determined with accuracy. That’s because a contract can evaluate different conditions that lead to different execution paths, with different total gas costs.

When you transmit your transaction, one of the first validation steps is to check that the account it originated from has enough ether to pay the gasPrice * gasLimit. But the amount is not actually deducted from your account until the transaction finishes executing. You are only billed for gas actually consumed by your transaction, but you have to have enough balance for the maximum amount you are willing to pay before you send your transaction.
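A short worked example of the arithmetic may help. The numbers below (20 gwei gas price, a 50,000 gas limit, 21,000 gas actually used) and the helper names are assumptions chosen for illustration. The balance check here also includes the value being sent, since the account must cover both.

```python
WEI_PER_ETHER = 10**18
GWEI = 10**9


def max_fee(gas_price: int, gas_limit: int) -> int:
    """Worst-case fee in wei: the amount the balance check is made against."""
    return gas_price * gas_limit


def can_afford(balance: int, gas_price: int, gas_limit: int, value: int = 0) -> bool:
    """Validation-time check: balance must cover the maximum fee plus any value sent."""
    return balance >= max_fee(gas_price, gas_limit) + value


def actual_fee(gas_price: int, gas_used: int) -> int:
    """Fee actually billed, in wei, once execution has finished."""
    return gas_price * gas_used


gas_price = 20 * GWEI
gas_limit = 50_000
gas_used = 21_000

reserved = max_fee(gas_price, gas_limit)   # 10**15 wei, i.e. 0.001 ether
billed = actual_fee(gas_price, gas_used)   # 4.2 * 10**14 wei
unused = reserved - billed                 # never charged to the sender
```

With these numbers an account holding less than 0.001 ether fails validation, even though the transaction would only have cost 0.00042 ether.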

4 Transaction Recipient, Value and Data

The transaction recipient contains a 20-byte Ethereum address. The address can be an EOA or a contract address. The Ethereum protocol does not validate recipient addresses in transactions. You can send to an address that has no corresponding private key or contract, thereby “burning” the ether, rendering it forever unspendable. Validation should be done at the user interface level. There is a special burn address 0x000000000000000000000000000000000000dEaD.

The main “payload” of a transaction is contained in two fields: value and data. Transactions can have both value and data, only value, only data, or neither value nor data. All four combinations are valid.

  • A transaction with only value is a payment.
  • A transaction with only data is an invocation.
  • A transaction with both value and data is both a payment and an invocation.
  • A transaction with neither value nor data is, well, just a waste of gas!

Payment transactions behave differently depending on whether the destination address is a contract or not.

  • For EOA addresses, or rather for any address that isn’t flagged as a contract on the blockchain, Ethereum will record a state change, adding the value you sent to the balance of the address. If the address has not been seen before, it will be added to the client’s internal representation of the state and its balance initialized to the value of your payment.
  • If the destination address (to) is a contract, then the EVM will execute the contract and will attempt to call the function named in the data payload of your transaction. If there is no data in your transaction, the EVM will call a fallback function and, if that function is payable, will execute it to determine what to do next. If there is no code in the fallback function, then the effect of the transaction will be to increase the balance of the contract, exactly like a payment to a wallet. If there is no fallback function, or the fallback function is not payable, then the transaction will be reverted. A contract can reject incoming payments by throwing an exception immediately when a function is called, or as determined by conditions coded in a function. If the function terminates successfully (without an exception), then the contract’s state is updated to reflect an increase in the contract’s ether balance.

When your transaction contains data, it is most likely addressed to a contract address. The Ethereum protocol ignores data sent to an EOA; it is up to the wallet to interpret the data. The data sent to a contract address will be interpreted by the EVM as a contract invocation. Most contracts use this data more specifically as a function invocation, calling the named function and passing any encoded arguments to the function. The data payload sent to an ABI-compatible contract (which you can assume all contracts are) is a hex-serialized encoding of:

  • A function selector: the first 4 bytes of the Keccak-256 hash of the function’s prototype. This allows the contract to unambiguously identify which function you wish to invoke.
  • The function arguments: the function’s arguments, encoded according to the rules for the various elementary types defined in the ABI specification.
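To make the two parts of the payload concrete, here is a sketch that computes a selector and appends two static arguments. Python's standard library does not ship Keccak-256 (its `hashlib.sha3_256` is the later NIST variant with different padding), so the block includes a compact pure-Python Keccak-256; a real project would more likely use a library. The helper names (`keccak256`, `function_selector`, `encode_call`) are made up for this note, and `encode_call` only handles arguments that fit a single 32-byte word such as `address` and `uint256`.

```python
MASK64 = (1 << 64) - 1


def _rol(a: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64


def _keccak_f(lanes):
    """The Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], _rol(cur, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by a small LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), as used by Ethereum."""
    rate = 136  # bytes absorbed per permutation call
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def function_selector(prototype: str) -> bytes:
    """First 4 bytes of the Keccak-256 hash of the function prototype."""
    return keccak256(prototype.encode("ascii"))[:4]


def encode_call(prototype: str, *args: int) -> bytes:
    """Selector followed by each argument left-padded to 32 bytes (static types only)."""
    return function_selector(prototype) + b"".join(a.to_bytes(32, "big") for a in args)


# Hypothetical call: transfer 1 unit to the address ending in dEaD.
calldata = encode_call("transfer(address,uint256)", 0xDEAD, 1)
```

For the ERC-20 style prototype `transfer(address,uint256)` the selector is the well-known `a9059cbb`, so `calldata` is 68 bytes: 4 for the selector and 32 for each argument.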

5 Special Transaction: Contract Creation

Contract creation transactions are sent to a special destination address called the zero address; the to field in a contract registration transaction contains the address 0x0. This address represents neither an EOA (there is no corresponding private–public key pair) nor a contract. It can never spend ether or initiate a transaction. It is only used as a destination, with the special meaning “create this contract”.

A contract creation transaction need only contain a data payload that contains the compiled bytecode which will create the contract. The only effect of this transaction is to create the contract. You can include an ether amount in the value field if you want to set the new contract up with a starting balance, but that is entirely optional. If you send a value (ether) to the contract creation address without a data payload (no contract), then the effect is the same as sending to a burn address—there is no contract to credit, so the ether is lost.

It is good practice to always specify a to parameter, even in the case of zero-address contract creation, because the cost of accidentally sending your ether to 0x0 and losing it forever is too great. You should also specify a gasPrice and gasLimit.

6 The Elliptic Curve Digital Signature Algorithm (ECDSA)

A digital signature serves three purposes in Ethereum: authorization, non-repudiation, and integrity. ECDSA signs the hash of a message m with a private key k. The result is a pair (r, s). To verify the signature, one needs the signature (r, s), the message m, and the public key.

The signature algorithm first generates an ephemeral (temporary) private key in a cryptographically secure way. This temporary key is used in the calculation of the r and s values to ensure that the sender’s actual private key can’t be calculated by attackers watching signed transactions on the Ethereum network. From the ephemeral private key q, the algorithm derives the corresponding public key Q. The r value of the digital signature is the x coordinate of Q. The algorithm then calculates the s value of the signature such that s ≡ q ** (-1) * (Keccak256(m) + r * k) (mod p), where p is the prime order of the elliptic curve.

Verification is the inverse of the signature generation function, using the r and s values and the sender’s public key to calculate a value Q, which is a point on the elliptic curve (the ephemeral public key used in signature creation). The steps are as follows:

  • Check all inputs are correctly formed
  • Calculate w = s ** (-1) mod p
  • Calculate u1 = Keccak256(m) * w mod p
  • Calculate u2 = r * w mod p
  • Finally, calculate the point on the elliptic curve Q ≡ u1 * G + u2 * K (mod p). The K is the signer’s (EOA owner’s) public key and the G is the elliptic curve generator point.
  • If the x coordinate of the calculated point Q is equal to r, then the verifier can conclude that the signature is valid.

Note that in verifying the signature, the private key is neither known nor revealed.
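The signing formula and the verification steps above can be traced end to end with a small pure-Python sketch on secp256k1. Several things here are assumptions for illustration: the message hash is passed in as an integer `z` rather than computed with Keccak-256, the private key `k` and ephemeral key `q` are fixed toy values (a real signer must pick `q` unpredictably for every signature), and the constant `N` plays the role of the p used in the formulas of this note, while `P` is the prime of the underlying field. The code needs Python 3.8 or later for `pow(x, -1, m)`.

```python
# secp256k1 domain parameters.
P = 2**256 - 2**32 - 977  # field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # curve order
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def sign(z, k, q):
    """s = q^-1 * (z + r*k) mod N, with r the x coordinate of Q = q*G."""
    r = scalar_mult(q, G)[0] % N
    s = pow(q, -1, N) * (z + r * k) % N
    return r, s


def verify(z, r, s, K):
    """Recompute Q = u1*G + u2*K and compare its x coordinate with r."""
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    u1, u2 = z * w % N, r * w % N
    Q = point_add(scalar_mult(u1, G), scalar_mult(u2, K))
    return Q is not None and Q[0] % N == r


k = 0x1E99423A4ED27608A15A2616A2B0E9E52CED330AC530EDCC32C8FFC6A526AEDD  # toy private key
q = 0x123456789ABCDEF                                                 # toy ephemeral key
z = 0xC0FFEE                                                          # stand-in message hash
K = scalar_mult(k, G)                                                 # public key
r, s = sign(z, k, q)
valid = verify(z, r, s, K)
```

Only `K`, `z`, `r`, and `s` are passed to `verify`; neither `k` nor `q` is needed, which matches the note above that verification never touches the private key.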

7 Signing and Offline Signing

In Ethereum, “sign the transaction” actually means “sign the Keccak-256 hash of the RLP-serialized transaction data.” The signature is applied to the hash of the transaction data, not the transaction itself. With EIP-155, the transaction data structure that is hashed contains nine fields: nonce, gasPrice, gasLimit, to, value, data, chainID, 0, 0. In other words, EIP-155 adds three fields to the main six fields of the transaction data structure, namely the chain identifier, 0, and 0. These three fields are added to the transaction data before it is encoded and hashed.

Signing appends the ECDSA signature’s computed v, r, and s values to the transaction. The special signature variable v indicates two things: the chain ID and the recovery identifier to help the ECDSArecover function check the signature. It is calculated as either one of 27 or 28, or as the chain ID doubled plus 35 or 36. The recovery identifier is used to indicate the parity of the y component of the public key.
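The arithmetic for v is easy to get wrong by one, so here is a small sketch of it in both directions. The function names `compute_v` and `decode_v` are hypothetical; `parity` is the 0-or-1 recovery identifier described above.

```python
def compute_v(parity: int, chain_id=None) -> int:
    """Return v for a signature with the given recovery parity (0 or 1)."""
    if parity not in (0, 1):
        raise ValueError("parity must be 0 or 1")
    if chain_id is None:
        return 27 + parity             # old-style signature: 27 or 28
    return chain_id * 2 + 35 + parity  # EIP-155: chain ID doubled plus 35 or 36


def decode_v(v: int):
    """Split v back into (chain_id, parity); chain_id is None for old-style v."""
    if v in (27, 28):
        return None, v - 27
    if v >= 35:
        return (v - 35) // 2, (v - 35) % 2
    raise ValueError(f"unexpected v value: {v}")


mainnet_v = compute_v(parity=0, chain_id=1)   # chain ID 1
legacy_v = compute_v(parity=1)                # no chain ID
```

For chain ID 1 the two possible values are 37 and 38, and `decode_v` recovers both the chain ID and the parity from either of them.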

Separating the functions of signing and transmitting and performing them on different machines (on an offline and an online device, respectively) is called offline signing and is a common security practice.

Depending on the level of security you need, your “offline signing” computer can have varying degrees of separation from the online computer, ranging from an isolated and firewalled subnet (online but segregated) to a completely offline system known as an air-gapped system.

8 Multiple-Signature (Multisig) Transactions

Ethereum’s basic EOA value transactions have no provisions for multiple signatures; however, arbitrary signing restrictions can be enforced by smart contracts with any conditions you can think of, to handle the transfer of ether and tokens alike.

To take advantage of this capability, ether has to be transferred to a “wallet contract” that is programmed with the spending rules desired, such as multisignature requirements or spending limits (or combinations of the two). The wallet contract then sends the funds when prompted by an authorized EOA once the spending conditions have been satisfied.
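On chain, such a wallet would be written as a smart contract. The following Python class is only a toy model of the spending rule, meant to show the flow of proposing and approving a transfer under an M-of-N requirement; every name in it (`MultisigWallet`, `propose`, `approve`) is invented for this note and does not correspond to any particular wallet contract.

```python
class MultisigWallet:
    """Toy model of a wallet that releases funds after `required` owner approvals."""

    def __init__(self, owners, required: int, balance: int = 0):
        self.owners = set(owners)
        if not 1 <= required <= len(self.owners):
            raise ValueError("required approvals must be between 1 and the owner count")
        self.required = required
        self.balance = balance
        self.proposals = []

    def _only_owner(self, sender):
        if sender not in self.owners:
            raise PermissionError(f"{sender} is not an owner")

    def propose(self, sender, to, amount: int) -> int:
        """An owner proposes a transfer; the proposer's approval is counted."""
        self._only_owner(sender)
        self.proposals.append(
            {"to": to, "amount": amount, "approvals": {sender}, "executed": False}
        )
        return len(self.proposals) - 1

    def approve(self, sender, proposal_id: int) -> bool:
        """Record an approval; return True if this call released the funds."""
        self._only_owner(sender)
        proposal = self.proposals[proposal_id]
        proposal["approvals"].add(sender)
        if (
            proposal["executed"]
            or len(proposal["approvals"]) < self.required
            or proposal["amount"] > self.balance
        ):
            return False
        self.balance -= proposal["amount"]
        proposal["executed"] = True
        return True


wallet = MultisigWallet(owners={"alice", "bob", "carol"}, required=2, balance=10)
pid = wallet.propose("alice", to="dave", amount=4)
released = wallet.approve("bob", pid)   # second distinct approval meets the threshold
```

A spending limit could be added in the same place as the threshold check, which is the "combinations of the two" case mentioned above.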