Skip to content
This repository was archived by the owner on Feb 12, 2024. It is now read-only.
This repository was archived by the owner on Feb 12, 2024. It is now read-only.

js-ipfs pinning performance #2197

Closed
Closed
@dirkmc

Description

@dirkmc

ipfs.add() performance degrades severely once the number of pins exceeds 8192

Background

Users can pin a file or a block to prevent it from being garbage collected.

The pinning module maintains two sets of pins:

  • direct: the CID of the block that is pinned
  • recursive: the CID of the root node of a DAG of blocks

These pin sets are stored in the block store with the following structure:

  • if the number of pins is less than 8192, create a node with
    • 256 links pointing to an empty block
    • a link pointing to each pinned block
  • if the number of pins is greater than 8192
    • distribute pins deterministically amongst 256 buckets
    • each bucket is a node with one or more pins, with the same structure described above (ie buckets with > 8192 pins distribute them into sub-buckets etc)

Performance

A pin set with less than 8192 pins is stored in a single DAG node. Once there are more than 8192 pins, they are distributed between 256 buckets, each with its own DAG node. Each time a new pin is added to the set, the distribution across the group of buckets is calculated and written to the block store. The distribution is deterministic, so in reality only one bucket changes each time a new pin is added.

For example, if we simplify and say there are 8 buckets, with 5 pins (A - E):
[] [D] [] [EA] [] [C] [] [B]
When we add pin F only one bucket changes:
[] [D] [] [EA] [] [C] [] [BF]

We can improve performance by adding a cache that remembers the structure of the pin sets, and only write nodes that change to the block store (instead of writing all nodes each time a pin is added or removed). This improves ipfs.add() performance dramatically once we exceed 8192 pins:

pinning-perf

Memory usage

The pinner uses fnv1a to distribute pins. fnv1a outputs a number (8 bytes) so if we also use it for cache keys each key will be 8 bytes. Each pin is represented by a DAG link pointing to the pinned CID. The DAG link has

  • a name (the empty string)
  • a size (number)
  • a cid
    • version (number)
    • codec (eg 'sha2-256')
    • multihash (the hash itself)
    • multibaseName (eg 'base58btc')

So rounding up, a DAG link requires about 128 bytes of memory. eg 10k pins requires about 1MB memory for the cache.
Note: Storing the DAGLink object (rather than just the CID as a Buffer) saves us having to re-create a lot of JavaScript Objects but uses about twice the memory. However this memory would need to be reserved anyway each time a pin is added.
Note: The cache is not used if there are less than 8192 pins

Command line

When invoking ipfs add from the command line, with the daemon running, we need to load the http api each time. This can be several times slower than the add operation itself, so we should look at optimizing it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    exp/wizardExtensive knowledge (implications, ramifications) requiredexplorationkind/supportA question or request for supportstatus/readyReady to be worked

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions