These are reading notes on the IPFS docs. IPFS is a distributed system for storing and accessing files, websites, applications, and data. IPFS finds data by its contents and can fetch it from any of the sources that hold it. When you use IPFS, you don’t just download files from someone else; your computer also helps distribute them.

1 Introduction

IPFS is a peer-to-peer (p2p) storage network. Content is accessible through peers located anywhere in the world, which might relay information, store it, or do both. IPFS finds what you ask for using its content address rather than its location. IPFS supports a resilient internet, makes it harder to censor content, and can speed up the web when you’re far away or disconnected.

You can also make content available more permanently by pinning it, which saves it to your computer and makes it available on the IPFS network until you decide to unpin it. If you want to make sure one of your own files is permanently shared on the internet today, you might use a for-pay file-sharing service like Dropbox. Some people have begun offering similar services based on IPFS called pinning services.

There are three fundamental principles to understanding IPFS:

  • Unique identification via content addressing
  • Content linking via directed acyclic graphs (DAGs)
  • Content discovery via distributed hash tables (DHTs)

1.1 Content addressing

Every piece of content that uses the IPFS protocol has a content identifier (CID) derived from a cryptographic hash of that content. Identifying a data object (like a Merkle DAG node) by the value of its hash is called content addressing. It means:

  • The address changes whenever the content changes. For a permanent link to content that changes, use IPNS, the Mutable File System (MFS), or DNSLink.
  • The address stays fixed as long as the content doesn’t change.
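
To make the two points above concrete, here is a minimal sketch that derives a CIDv1-style string from raw bytes. It assumes the content fits in a single block, uses the "raw" codec with sha2-256, and encodes the result in lowercase base32. The function name cid_v1_raw is made up for this note; real IPFS implementations also chunk large files and support other codecs, so treat this as an illustration rather than a replacement for an IPFS client.

```python
import base64
import hashlib


def cid_v1_raw(data: bytes) -> str:
    """Return a CIDv1-style string for `data` treated as one raw block."""
    digest = hashlib.sha256(data).digest()
    # Multihash: 0x12 = sha2-256, 0x20 = digest length (32 bytes).
    multihash = bytes([0x12, 0x20]) + digest
    # CID prefix: 0x01 = CID version 1, 0x55 = "raw" codec.
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # Multibase: the leading "b" marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


original = cid_v1_raw(b"hello ipfs")
same = cid_v1_raw(b"hello ipfs")
changed = cid_v1_raw(b"hello ipfs!")

print(original == same)     # True: unchanged content keeps its address
print(original == changed)  # False: any change yields a new address
```

Because the address is a pure function of the bytes, anyone holding the content can recompute the address and check that what they received is what they asked for.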

Many distributed systems make use of content addressing through hashes as a means for not just identifying content but also linking it together — everything from the commits that back your code to the blockchains that run cryptocurrencies leverage this strategy. However, the underlying data structures in these systems are not necessarily interoperable. This is where the Interplanetary Linked Data (IPLD) project comes in. IPLD translates between hash-linked data structures allowing for the unification of the data across distributed systems.

IPFS follows particular data-structure preferences and conventions. The IPFS protocol uses those conventions and IPLD to get from raw content to an IPFS address that uniquely identifies content on the IPFS network.

1.2 Directed acyclic graphs (DAGs)

IPFS uses Merkle DAGs, which are DAGs where each node has a unique identifier that is a hash of the node’s contents. IPFS uses a Merkle DAG that is optimized for representing directories and files, but you can structure a Merkle DAG in many different ways.

IPFS often splits content into blocks. Splitting it into blocks means that different parts of the file can come from different sources and be authenticated quickly. Another useful feature of Merkle DAGs and breaking content into blocks is that if you have two similar files, they can share parts of the Merkle DAG, i.e., parts of different Merkle DAGs can reference the same subset of data.
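
The block sharing described above can be sketched with a toy two-level Merkle DAG. Everything here is an assumption made for illustration: the 4-byte chunk size, the plain dict used as a block store, the newline-separated list of links in the root node, and the helper names add_file and read_file. IPFS uses much larger chunks and its own node format.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_id(data: bytes) -> str:
    """Identify a block by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


def add_file(store: dict, data: bytes) -> str:
    """Split data into blocks, store them, and return the root node's id."""
    links = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        cid = block_id(chunk)
        store[cid] = chunk  # identical chunks land on the same key
        links.append(cid)
    # The root node holds only links (hashes) to its child blocks.
    root = "\n".join(links).encode("ascii")
    root_id = block_id(root)
    store[root_id] = root
    return root_id


def read_file(store: dict, root_id: str) -> bytes:
    """Follow the root's links and reassemble the file."""
    links = [link for link in store[root_id].decode("ascii").split("\n") if link]
    return b"".join(store[link] for link in links)


store = {}
root_a = add_file(store, b"aaaabbbbcccc")
root_b = add_file(store, b"aaaabbbbdddd")

print(root_a != root_b)  # True: the files differ, so their roots differ
print(len(store))        # 6: four distinct leaf blocks plus two roots
```

Without sharing, the two files would need eight entries (three leaves and a root each); the two common chunks are stored once and referenced from both roots.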

1.3 Distributed hash tables (DHTs)

A distributed hash table is one where the table is split across all the peers in a distributed network. To find content, you ask these peers.

The libp2p project is the part of the IPFS ecosystem that provides the DHT and handles peers connecting and talking to each other. What makes libp2p especially useful for peer-to-peer connections is connection multiplexing: a single connection can carry several different tasks at once.

You use the DHT twice: first to find the peers storing the content, and then to find the current location of those peers (routing).
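
A rough sketch of the idea follows, assuming a Kademlia-style DHT in which each provider record is stored on the peers whose IDs are closest to the key by XOR distance. The ToyDHT class, the peer names, and the replication factor k=2 are all invented for this note. A real lookup walks the network iteratively, whereas this toy version can see every peer at once.

```python
import hashlib


def key_of(name: str) -> int:
    """Map a peer ID or CID string into one shared integer key space."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")


def closest_peers(peer_ids, key: int, k: int = 2):
    """Return the k peers whose keys are nearest to `key` by XOR distance."""
    return sorted(peer_ids, key=lambda peer: key_of(peer) ^ key)[:k]


class ToyDHT:
    def __init__(self, peer_ids):
        self.peer_ids = list(peer_ids)
        # Each peer holds only its own slice of the overall table.
        self.tables = {peer: {} for peer in self.peer_ids}

    def provide(self, cid: str, provider: str) -> None:
        """Record on the closest peers that `provider` has `cid`."""
        for peer in closest_peers(self.peer_ids, key_of(cid)):
            self.tables[peer].setdefault(cid, set()).add(provider)

    def find_providers(self, cid: str) -> set:
        """Ask the peers closest to the key who provides `cid`."""
        found = set()
        for peer in closest_peers(self.peer_ids, key_of(cid)):
            found |= self.tables[peer].get(cid, set())
        return found


dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"])
dht.provide("cid-1", "peer-c")
print(dht.find_providers("cid-1"))  # {'peer-c'}
```

The record lives on only two of the five peers, yet any node can locate it because everyone computes the same "closest peers" for a given key.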

Next, you connect to those peers and retrieve the content (exchange). To request blocks from and send blocks to other peers, IPFS currently uses a module called Bitswap.
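
The exchange step can be pictured as sending a list of wanted block IDs and getting blocks back. The sketch below is a toy model of that idea only; it is not the Bitswap wire protocol, and the Peer class, handle_wantlist, and fetch are names made up for this note. It does show one property that content addressing gives any such exchange: the receiver can verify each block against the hash it asked for.

```python
import hashlib


def block_id(data: bytes) -> str:
    """Identify a block by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    def __init__(self, blocks=()):
        self.blocks = {block_id(block): block for block in blocks}

    def handle_wantlist(self, wantlist):
        """Return whichever wanted blocks this peer holds."""
        return {cid: self.blocks[cid] for cid in wantlist if cid in self.blocks}


def fetch(wantlist, peers):
    """Collect wanted blocks from several peers, verifying each one."""
    received = {}
    for peer in peers:
        missing = [cid for cid in wantlist if cid not in received]
        if not missing:
            break
        for cid, data in peer.handle_wantlist(missing).items():
            if block_id(data) == cid:  # reject blocks that fail the hash check
                received[cid] = data
    return received


peer_one = Peer([b"block-1"])
peer_two = Peer([b"block-2"])
wanted = [block_id(b"block-1"), block_id(b"block-2")]

blocks = fetch(wanted, [peer_one, peer_two])
print(len(blocks))  # 2: one block came from each peer
```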

1.4 Privacy and Encryption

While IPFS traffic between nodes is encrypted, the metadata those nodes publish to the DHT is public. Nodes announce a variety of information essential to the DHT’s function — including their unique node identifiers (PeerIDs) and the CIDs of data that they’re providing.

Although the connections between nodes are encrypted, anything published on IPFS is effectively public, including the contents of files themselves, unless the content itself is encrypted. When you use IPFS to retrieve a particular CID, your node queries the DHT to find the closest nodes to you with that item. By default it also agrees to re-provide that CID to other nodes for a limited time, until periodic “garbage collection” clears your cache of content you haven’t used in a while. You can also “pin” CIDs that you want to make sure are never garbage-collected, either explicitly using IPFS’s low-level pin API or implicitly using the Mutable File System (MFS), which also means you’re acting as a permanent reprovider of that data.
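
The relationship between caching, pinning, and garbage collection can be sketched with a toy block store. The ToyBlockstore class and its method names are invented for this note and are not the IPFS pin API; a real node also tracks recursive pins over whole DAGs, which this sketch leaves out.

```python
class ToyBlockstore:
    """A local cache of blocks in which pinned blocks survive garbage collection."""

    def __init__(self):
        self.blocks = {}   # cid -> bytes
        self.pins = set()  # cids that must never be collected

    def put(self, cid: str, data: bytes) -> None:
        self.blocks[cid] = data

    def pin(self, cid: str) -> None:
        if cid not in self.blocks:
            raise KeyError(f"cannot pin unknown block {cid!r}")
        self.pins.add(cid)

    def unpin(self, cid: str) -> None:
        self.pins.discard(cid)

    def gc(self):
        """Delete every unpinned block and return the removed ids."""
        removed = [cid for cid in self.blocks if cid not in self.pins]
        for cid in removed:
            del self.blocks[cid]
        return removed


node = ToyBlockstore()
node.put("cid-keep", b"important")
node.put("cid-temp", b"fetched once")
node.pin("cid-keep")

print(node.gc())             # ['cid-temp']
print(sorted(node.blocks))   # ['cid-keep']
```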

Using a public IPFS gateway is one way to request IPFS-hosted content without revealing any information about your local node — because you aren’t using a local node.
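
As a small illustration, a path-style gateway request is an ordinary HTTPS URL of the form gateway/ipfs/CID. The sketch below assumes the public gateway at https://ipfs.io is reachable; the helper names and the CID "bafy-example" are placeholders made up for this note, not real identifiers.

```python
from urllib.parse import quote
from urllib.request import urlopen


def gateway_url(cid: str, path: str = "", gateway: str = "https://ipfs.io") -> str:
    """Build a path-style gateway URL for a CID and an optional sub-path."""
    url = f"{gateway.rstrip('/')}/ipfs/{quote(cid)}"
    if path:
        url += "/" + quote(path.lstrip("/"))
    return url


def fetch_via_gateway(cid: str, path: str = "", timeout: float = 30.0) -> bytes:
    """Download content over plain HTTPS; no local IPFS node is involved."""
    with urlopen(gateway_url(cid, path), timeout=timeout) as response:
        return response.read()


# "bafy-example" is a placeholder, so only the URL is built here.
print(gateway_url("bafy-example", "docs/readme.txt"))
```

The trade-off is that the gateway operator, rather than the DHT, sees which CIDs you request.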

1.5 Nodes

Participants in the IPFS network are called nodes. Protocol Labs manages two primary implementations of the IPFS spec: Go-IPFS and JS-IPFS. There are several types of IPFS node:

  • Preload
  • Relay
  • Bootstrap: A Bootstrap Node is a trusted peer on the IPFS network through which an IPFS node learns about other peers on the network.
  • Delegate routing