Benchmarks / Scaling requirements analysis for boostd-data state service and PieceDirectory #615

@nonsense

Description

We've done the initial PoC work necessary to scale Boost beyond a single instance that holds all markets/deals state in embedded datastores. The PR is at #573


The next step in order to release / merge this work is to do an analysis and run benchmarks of the implementation. We need to define storage and retrieval requirements for the various storage providers that would be running Boost, and make sure the new service accommodates them.

At the moment the upper bound is about 40 TiB of deals that a storage provider can onboard per day.

Therefore I think it would be a good idea to define the following parameters for different types of SPs:

  • small-size SP - total power of 1 PiB (2500 of 4000 SPs); 40 TiB/day * 25 days of onboarding new deals, assuming 1 PiB of CC sectors (4 Gbps connection / bandwidth)
  • mid-size SP - total power of 5 PiB (1000 of 4000 SPs); 200 TiB/day * 25 days of onboarding new deals, assuming 5 PiB of CC sectors (20 Gbps connection / bandwidth)
  • large-size SP - total power of 50 - 150 PiB (50 of 4000 SPs); 500 TiB/day * 200 days of onboarding new deals, assuming 100 PiB of CC sectors (50 Gbps connection / bandwidth)
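As a rough sanity check on the bandwidth figures above, here is a small Go sketch (the helper name is mine, not part of Boost) that converts a daily onboarding rate into the sustained line rate it implies:

```go
package main

import "fmt"

// gbpsForTiBPerDay converts an onboarding rate in TiB/day into the
// sustained line rate in Gbps needed to ingest that much deal data.
func gbpsForTiBPerDay(tibPerDay float64) float64 {
	const bitsPerTiB = 8 << 40 // 1 TiB = 2^40 bytes = 8 * 2^40 bits
	const secondsPerDay = 24 * 60 * 60
	return tibPerDay * bitsPerTiB / secondsPerDay / 1e9
}

func main() {
	for _, tib := range []float64{40, 200, 500} {
		fmt.Printf("%3.0f TiB/day onboarded -> ~%.1f Gbps sustained\n", tib, gbpsForTiBPerDay(tib))
	}
}
```

This gives roughly 4.1 / 20.4 / 50.9 Gbps for 40 / 200 / 500 TiB per day, which matches the connection sizes listed above.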

sofiaminer currently has 30 TiB of raw byte power and is hosting around 100 GiB worth of deals.

piecestore
108M  datastore

dagstore
104M  datastore
45M   index

total markets/boost state: ~260 MB for 100 GiB worth of deals, i.e. roughly 2.6 MB of state per GiB of deals

Extrapolating for:

  • small SP: 2.7 TB — keep the backend as an embedded datastore, such as LevelDB
  • mid SP: 13.6 TB — use Couchbase as backend, or another key/value store that provides an implementation of the boostd-data service
  • large SP: 270 TB to 400 TB — use Couchbase as backend, or another key/value store that provides an implementation of the boostd-data service
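The extrapolation is a straight linear scaling of the single sofiaminer data point (~2.6 MB of state per GiB of deals), so these should be treated as order-of-magnitude estimates. A minimal sketch of the arithmetic:

```go
package main

import "fmt"

// Extrapolate markets/boost state size from the observed ratio on
// sofiaminer: ~260 MB of piecestore + dagstore state for ~100 GiB of deals.
// Linear scaling is an assumption, not a measured property.
func main() {
	const stateMBPerGiB = 260.0 / 100.0 // observed ratio, ~2.6 MB per GiB

	cases := []struct {
		name    string
		dealGiB float64
	}{
		{"small SP (1 PiB of deals)", 1 << 20},    // 1 PiB = 2^20 GiB
		{"mid SP (5 PiB of deals)", 5 << 20},
		{"large SP (100 PiB of deals)", 100 << 20},
	}
	for _, c := range cases {
		stateTB := c.dealGiB * stateMBPerGiB / 1e6 // MB -> TB
		fmt.Printf("%-28s -> ~%.1f TB of state\n", c.name, stateTB)
	}
}
```

This yields ~2.7 TB, ~13.6 TB and ~272 TB respectively (~409 TB at 150 PiB), which is where the figures in the list come from.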

Open questions:

  1. How large would the state (i.e. PieceMeta / Store) grow in order to accommodate 1 / 5 / 100 PiB worth of deals? Are the numbers mentioned above ballpark accurate?
  2. Define requirements for retrieval in terms of latency / upper bound on requests per second:
  • how many GiB worth of deals is a small, mid or large storage provider going to serve per day?
  • what is the access pattern for the PieceMeta / Store in this case?
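To help frame the second question, here is an illustrative sketch of how a daily retrieval volume could be turned into an index-lookup rate for the boostd-data service. It assumes one multihash -> (piece, offset) lookup per retrieved block and an average block size of 1 MiB; both figures are assumptions, not measurements.

```go
package main

import "fmt"

// lookupsPerSecond estimates the index lookup rate implied by a given
// retrieval volume, assuming one lookup per retrieved block.
func lookupsPerSecond(gibServedPerDay, avgBlockMiB float64) float64 {
	blocksPerDay := gibServedPerDay * 1024 / avgBlockMiB
	return blocksPerDay / (24 * 60 * 60)
}

func main() {
	for _, gib := range []float64{100, 1024, 10240} {
		fmt.Printf("%6.0f GiB/day served -> ~%.1f lookups/s\n", gib, lookupsPerSecond(gib, 1))
	}
}
```

Once we have real GiB/day figures per SP class, the same arithmetic gives the requests-per-second target the service needs to meet.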
