js-ipfs pinning performance #2197
Description
ipfs.add()
performance degrades severely once the number of pins exceeds 8192
Background
Users can pin a file or a block to prevent it from being garbage collected.
The pinning module maintains two sets of pins:
- direct: the CID of the block that is pinned
- recursive: the CID of the root node of a DAG of blocks
These pin sets are stored in the block store with the following structure:
- if the number of pins is less than
8192
, create a node with256
links pointing to an empty block- a link pointing to each pinned block
- if the number of pins is greater than
8192
- distribute pins deterministically amongst
256
buckets - each bucket is a node with one or more pins, with the same structure described above (ie buckets with
> 8192
pins distribute them into sub-buckets etc)
- distribute pins deterministically amongst
Performance
A pin set with less than 8192
pins is stored in a single DAG node. Once there are more than 8192
pins, they are distributed between 256
buckets, each with its own DAG node. Each time a new pin is added to the set, the distribution across the group of buckets is calculated and written to the block store. The distribution is deterministic, so in reality only one bucket changes each time a new pin is added.
For example, if we simplify and say there are 8 buckets, with 5 pins (A
- E
):
[] [D] [] [EA] [] [C] [] [B]
When we add pin F
only one bucket changes:
[] [D] [] [EA] [] [C] [] [BF]
We can improve performance by adding a cache that remembers the structure of the pin sets, and only write nodes that change to the block store (instead of writing all nodes each time a pin is added or removed). This improves ipfs.add()
performance dramatically once we exceed 8192
pins:
Memory usage
The pinner uses fnv1a to distribute pins. fnv1a outputs a number (8 bytes) so if we also use it for cache keys each key will be 8 bytes. Each pin is represented by a DAG link pointing to the pinned CID. The DAG link has
- a name (the empty string)
- a size (number)
- a cid
- version (number)
- codec (eg 'sha2-256')
- multihash (the hash itself)
- multibaseName (eg 'base58btc')
So rounding up, a DAG link requires about 128 bytes of memory. eg 10k pins requires about 1MB memory for the cache.
Note: Storing the DAGLink object (rather than just the CID as a Buffer) saves us having to re-create a lot of JavaScript Objects but uses about twice the memory. However this memory would need to be reserved anyway each time a pin is added.
Note: The cache is not used if there are less than 8192
pins
Command line
When invoking ipfs add
from the command line, with the daemon running, we need to load the http api each time. This can be several times slower than the add operation itself, so we should look at optimizing it.