This is a reading note on Mastering Bitcoin, Chapter 08: The Bitcoin Network. Bitcoin is structured as a peer-to-peer network architecture on top of the internet. Bitcoin is a P2P digital cash system by design, and the network architecture is both a reflection and a foundation of that core characteristic. We use the term “extended Bitcoin network” to refer to the overall network that includes the bitcoin P2P protocol, pool-mining protocols, the Stratum protocol, and any other related protocols connecting the components of the Bitcoin system.

1 Introduction

A Bitcoin node is a collection of four functions: routing (N), the blockchain database (B), mining (M), and wallet (W) services.

All nodes include the routing function to participate in the network and might include other functionality. All nodes validate and propagate transactions and blocks, and discover and maintain connections to peers.

Full nodes (B + N) maintain a complete and up-to-date copy of the blockchain and can autonomously and authoritatively verify any transaction without external reference.

SPV nodes or lightweight nodes (W + N) maintain only a subset of the blockchain and verify transactions using a method called simplified payment verification, or SPV.

Mining nodes (M + B + N) compete to create new blocks by running specialized hardware to solve the Proof-of-Work algorithm. Some mining nodes are also full nodes, maintaining a full copy of the blockchain, while others are lightweight nodes participating in pool mining and depending on a pool server to maintain a full node.

User wallets might be part of a full node, as is usually the case with desktop Bitcoin clients. Increasingly, many user wallets, especially those running on resource-constrained devices such as smartphones, are SPV nodes.
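
The function combinations above can be modelled as bit flags. This is only an illustrative sketch: the names NodeFunction, FULL_NODE, SPV_NODE, SOLO_MINER and has_full_chain are made up for this note and do not come from Bitcoin Core.

```python
from enum import Flag, auto


class NodeFunction(Flag):
    """The four functions a node may combine (letters as used in this note)."""
    ROUTING = auto()     # N
    BLOCKCHAIN = auto()  # B
    MINING = auto()      # M
    WALLET = auto()      # W


# Hypothetical constants for the node types described above.
FULL_NODE = NodeFunction.BLOCKCHAIN | NodeFunction.ROUTING
SPV_NODE = NodeFunction.WALLET | NodeFunction.ROUTING
SOLO_MINER = NodeFunction.MINING | NodeFunction.BLOCKCHAIN | NodeFunction.ROUTING


def has_full_chain(node: NodeFunction) -> bool:
    """True if the node keeps its own complete copy of the blockchain."""
    return bool(node & NodeFunction.BLOCKCHAIN)
```

Every combination includes ROUTING, which matches the statement that all nodes take part in routing.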

2 The Extended Bitcoin Network

The main Bitcoin network, running the bitcoin P2P protocol, consists of between 5,000 and 8,000 listening nodes running various versions of the bitcoin reference client (Bitcoin Core) and a few hundred nodes running various other implementations of the bitcoin P2P protocol.

Attached to the main bitcoin P2P network are a number of pool servers and protocol gateways (Stratum servers) that connect nodes running other protocols.

The Bitcoin Relay Network is a network that attempts to minimize the latency in the transmission of blocks between miners. It was created in 2015 by core developer Matt Corallo. The network consisted of several specialized nodes hosted on the Amazon Web Services infrastructure around the world and served to connect the majority of miners and mining pools. It was replaced in 2016 with the introduction of the Fast Internet Bitcoin Relay Engine (FIBRE), also created by Matt Corallo.

3 Network Discovery

How does a new node find peers? The first method is to query DNS using a number of “DNS seeds,” which are DNS servers that provide a list of IP addresses of Bitcoin nodes. Some of those DNS seeds provide a static list of IP addresses of stable bitcoin listening nodes. Some of the DNS seeds are custom implementations of BIND (Berkeley Internet Name Daemon) that return a random subset from a list of Bitcoin node addresses collected by a crawler or a long-running Bitcoin node. The Bitcoin Core client contains the names of nine different DNS seeds. Alternatively, a bootstrapping node that knows nothing of the network must be given the IP address of at least one Bitcoin node, after which it can establish connections through further introductions.
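
A rough sketch of the DNS-seed step, using only the Python standard library. The two hostnames are seeds that have appeared in Bitcoin Core's seed list, but that list changes over time, so treat them as examples. The helper names (system_resolver, discover_peers) are made up for this note, and the resolver is passed in as a parameter so the logic can be tried without network access.

```python
import socket
from typing import Callable, Iterable, List

# Example seed hostnames (assumption: may be outdated; check the current client).
EXAMPLE_SEEDS = ["seed.bitcoin.sipa.be", "dnsseed.bluematt.me"]


def system_resolver(hostname: str) -> List[str]:
    """Resolve a seed hostname to IP address strings via the OS resolver."""
    infos = socket.getaddrinfo(hostname, 8333, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def discover_peers(
    seeds: Iterable[str],
    resolve: Callable[[str], List[str]] = system_resolver,
) -> List[str]:
    """Collect candidate peer addresses from several seeds, skipping failures."""
    peers: List[str] = []
    for seed in seeds:
        try:
            addresses = resolve(seed)
        except OSError:
            # A seed that is down or unreachable should not stop discovery.
            continue
        for address in addresses:
            if address not in peers:
                peers.append(address)
    return peers
```

Calling discover_peers(EXAMPLE_SEEDS) on a machine with internet access should return a list of candidate addresses; the node would then try to connect to a few of them.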

To connect to a known peer, nodes establish a TCP connection, usually to port 8333 (the port generally known as the one used by bitcoin), or an alternative port if one is provided. Upon establishing a connection, the node will start a “handshake”.
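
The handshake opens with a version message. Below is a minimal sketch of how such a message could be serialized, based on my understanding of the P2P wire format (24-byte header with magic, command, length and checksum, followed by the payload). The user agent string, the protocol version 70015, and the function names are choices made for this note, not something a real client must use.

```python
import hashlib
import struct
import time

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # network magic for mainnet


def net_addr(ip: str, port: int, services: int = 0) -> bytes:
    """26-byte address: services, IPv4 mapped into 16 bytes, big-endian port."""
    ipv4 = bytes(int(part) for part in ip.split("."))
    mapped = b"\x00" * 10 + b"\xff\xff" + ipv4
    return struct.pack("<Q", services) + mapped + struct.pack(">H", port)


def version_payload(best_height, peer_ip, peer_port=8333, nonce=0, timestamp=None):
    """Payload of a version message advertising our current chain height."""
    if timestamp is None:
        timestamp = int(time.time())
    user_agent = b"/read-note-sketch:0.1/"  # made-up user agent
    return (
        struct.pack("<iQq", 70015, 0, timestamp)  # version, services, time
        + net_addr(peer_ip, peer_port)            # receiving node
        + net_addr("0.0.0.0", 0)                  # sending node (unspecified)
        + struct.pack("<Q", nonce)
        + bytes([len(user_agent)]) + user_agent   # short var-length string
        + struct.pack("<i", best_height)          # BestHeight / start height
        + b"\x00"                                 # relay flag off
    )


def wrap_message(command: str, payload: bytes) -> bytes:
    """Prefix a payload with the 24-byte message header."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    name = command.encode("ascii").ljust(12, b"\x00")
    return MAINNET_MAGIC + name + struct.pack("<I", len(payload)) + checksum + payload
```

A node would send wrap_message("version", version_payload(height, peer_ip)) over the TCP connection and expect the peer's own version message and a verack in return.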

A node must connect to a few different peers in order to establish diverse paths into the Bitcoin network. Paths are not persistent—nodes come and go—and so the node must continue to discover new nodes as it loses old connections as well as assist other nodes when they bootstrap. Only one connection is needed to bootstrap, because the first node can offer introductions to its peer nodes and those peers can offer further introductions.

4 Exchanging “Inventory”

The first thing a full node will do once it connects to peers is try to construct a complete blockchain. If it is a brand-new node and has no blockchain at all, it only knows one block, the genesis block, which is statically embedded in the client software.

The process of syncing the blockchain starts with the version message, because that contains BestHeight, a node’s current blockchain height (number of blocks).

The peer that has the longer blockchain has more blocks than the other node and can identify which blocks the other node needs in order to “catch up.” It will identify the first 500 blocks to share and transmit their hashes using an inv (inventory) message. The node missing these blocks will then retrieve them, by issuing a series of getdata messages requesting the full block data and identifying the requested blocks using the hashes from the inv message.
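
The catch-up loop can be simulated in a few lines. This is a simplified model, not the real protocol: chains are plain lists of hash strings, the shorter chain is assumed to be a prefix of the longer one, and the getdata round trip is collapsed into a list append. The names blocks_to_announce and sync are made up for this note.

```python
from typing import List

MAX_INV_BLOCKS = 500  # batch size mentioned in the text above


def blocks_to_announce(local_chain: List[str], peer_block_count: int) -> List[str]:
    """Hashes the peer is missing, limited to one inv batch."""
    return local_chain[peer_block_count:peer_block_count + MAX_INV_BLOCKS]


def sync(ahead: List[str], behind: List[str]) -> int:
    """Bring `behind` up to date with `ahead`; return the number of inv rounds."""
    rounds = 0
    while len(behind) < len(ahead):
        inv = blocks_to_announce(ahead, len(behind))
        # Stand-in for: send getdata for each hash in inv, receive the blocks.
        behind.extend(inv)
        rounds += 1
    return rounds
```

For example, a node holding only the genesis block that syncs against a peer with 1,200 blocks needs three rounds (500 + 500 + 199 blocks).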

SPV nodes download only the block headers and do not download the transactions included in each block. The resulting chain of headers, without transactions, is about 1,000 times smaller than the full blockchain. SPV nodes verify transactions using a slightly different method that relies on peers to provide partial views of relevant parts of the blockchain on demand. A bloom filter is a probabilistic search filter that offers an efficient way to express a search pattern while protecting privacy. SPV nodes use bloom filters to ask their peers for transactions matching a specific pattern, without revealing exactly which addresses, keys, or transactions they are searching for.
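
To make the bloom filter idea concrete, here is a toy implementation. Note the swap: Bitcoin's filters (BIP-37) use MurmurHash3, while this sketch uses seeded SHA-256 from the standard library as a stand-in. The class name, the default sizes, and the method names are made up for this note.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 256, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single Python integer

    def _positions(self, item: bytes) -> Iterator[int]:
        # One bit position per hash function; the seed byte varies the hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

A peer testing transactions against the filter learns only that something matched the pattern. Because unrelated items can also match by chance, the peer cannot tell exactly which addresses the SPV node cares about. A smaller filter gives more false positives, which means more privacy but more data to download.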

The original implementation of bitcoin communicates entirely in the clear. There are two solutions that provide encryption of the communications: Tor Transport and P2P Authentication and Encryption with BIP-150/151.

5 Transaction Pool

Almost every node on the Bitcoin network maintains a temporary list of unconfirmed transactions called the memory pool (mempool), or transaction pool. Nodes use this pool to keep track of transactions that are known to the network but are not yet included in the blockchain. As transactions are received and verified, they are added to the transaction pool and relayed to the neighboring nodes to propagate on the network.

Some node implementations also maintain a separate pool of orphaned transactions. If a transaction’s inputs refer to a transaction that is not yet known, such as a missing parent, the orphan transaction will be stored temporarily in the orphan pool until the parent transaction arrives.

Both the transaction pool and orphan pool (where implemented) are stored in local memory and are not saved on persistent storage; rather, they are dynamically populated from incoming network messages. When a node starts, both pools are empty and are gradually populated with new transactions received on the network.
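
The interplay between the two pools can be sketched as follows. This is a simplified model under several assumptions: a transaction is just an id plus a list of parent ids, validation is reduced to "all parents are known", and the class and method names (Tx, Pools, receive) are made up for this note.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Tx:
    txid: str
    parents: List[str]  # ids of the transactions whose outputs this one spends


class Pools:
    """In-memory transaction pool and orphan pool; both start empty."""

    def __init__(self, confirmed: Set[str]):
        self.confirmed = set(confirmed)   # txids already in the blockchain
        self.mempool: Dict[str, Tx] = {}
        self.orphans: Dict[str, Tx] = {}

    def _known(self, txid: str) -> bool:
        return txid in self.confirmed or txid in self.mempool

    def receive(self, tx: Tx) -> List[str]:
        """Handle an incoming transaction; return txids added to the mempool."""
        if not all(self._known(p) for p in tx.parents):
            self.orphans[tx.txid] = tx    # wait for the missing parent
            return []
        self.mempool[tx.txid] = tx
        accepted = [tx.txid]
        # The new arrival may be the parent some orphans were waiting for.
        for orphan in list(self.orphans.values()):
            still_orphan = orphan.txid in self.orphans
            if still_orphan and all(self._known(p) for p in orphan.parents):
                del self.orphans[orphan.txid]
                accepted += self.receive(orphan)
        return accepted
```

If a chain of transactions arrives in reverse order, the children wait in the orphan pool and are all released, in dependency order, as soon as the first parent shows up.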