Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexer-alt: try_for_each_spawned #20379

Merged
merged 3 commits into from
Nov 22, 2024
Merged

indexer-alt: try_for_each_spawned #20379

merged 3 commits into from
Nov 22, 2024

Conversation

amnn
Copy link
Member

@amnn amnn commented Nov 21, 2024

Description

Create a try_for_each_spawned which works similarly to try_for_each_concurrent but spawns a new tokio task for each unit of work that it schedules based on values being pulled from the stream (but no more than the limit set on the max concurrency.

This does introduce some slight overhead if we run it with a limit of 1 (i.e. no concurrency) but otherwise will give the tokio runtime an opportunity to schedule each unit of work on a different thread (assuming the runtime is multi-threaded).

The interface is almost a drop-in replacement for
try_for_each_concurrent but with the following slight changes:

  • The input stream does not need to be a Result itself.
  • The future and returned error type must have 'static lifetimes because they are being passed to the tokio runtime and we can't easily determine that they will not outlive the caller.

This addresses one of the main remaining differences between the existing indexer and sui-indexer-alt (sui-data-ingestion-core's worker pool works roughly like this new abstraction, but without the support for streams etc).

Test plan

New unit tests:

sui$ cargo nextest run -p sui-indexer-alt -- task::tests

Running the indexer locally also makes sure the system is still operable with the change.

Finally, I created some benchmarks in a repository to test that try_for_each_concurrent does not take advantage of parallelism and try_for_each_spawned does:

https://gist.github.com/amnn/6c6f198693d46d1f6d30bd7ef7be001d

Benchmarking results show that on a job counting the primes up to i for 2 <= i < 50,000, the concurrent and sequential implementations complete in 2.7s.

try_for_each_spawned without parallelism does the same work in 3.2s (slightly slower because of the overhead creating new tasks), and try_for_each_spawned with parallelism of 16 completes in about 400ms (an 8x improvement).

The most expensive individual task costs about 3ms to run on its own (counting the primes up to 50,000), which is roughly in line with the cost of processing.

The benchmarks also include an implementation that does the same thing as try_for_each_spawned but built out of existing stream extension libraries (StreamExt::map, and StreamExt::buffered) which have similar performance characteristics but don't have the same behaviour w.r.t. propagating panics and cancellations.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

## Description

Create a `try_for_each_spawned` which works similarly to
`try_for_each_concurrent` but spawns a new tokio task for each unit of
work that it schedules based on values being pulled from the stream (but
no more than the `limit` set on the max concurrency.

This does introduce some slight overhead if we run it with a limit of
`1` (i.e. no concurrency) but otherwise will give the tokio runtime an
opportunity to schedule each unit of work on a different thread
(assuming the runtime is multi-threaded).

The interface is almost a drop-in replacement for
`try_for_each_concurrent` but with the following slight changes:

- The input stream does not need to be a `Result` itself.
- The future and returned error type must have `'static` lifetimes
  because they are being passed to the tokio runtime and we can't easily
  determine that they will not outlive the caller.

This addresses one of the main remaining differences between the
existing indexer and `sui-indexer-alt` (`sui-data-ingestion-core`'s
worker pool works roughly like this new abstraction, but without the
support for streams etc).

## Test plan

New unit tests:

```
sui$ cargo nextest run -p sui-indexer-alt -- task::tests
```

Running the indexer locally also makes sure the system is still operable
with the change.

Finally, I created some benchmarks in a repository to test that
`try_for_each_concurrent` does not take advantage of parallelism and
`try_for_each_spawned` does:

https://gist.github.com/amnn/6c6f198693d46d1f6d30bd7ef7be001d

Benchmarking results show that on a job counting the primes up to i for
2 <= i < 50,000, the concurrent and sequential implementations complete
in 2.7s.

`try_for_each_spawned` without parallelism does the same work in 3.2s
(slightly slower because of the overhead creating new tasks), and
`try_for_each_spawned` with parallelism of `16` completes in about 400ms
(an 8x improvement).

The most expensive individual task costs about 3ms to run on its own
(counting the primes up to 50,000), which is roughly in line with the
cost of processing.

The benchmarks also include an implementation that does the same thing
as `try_for_each_spawned` but built out of existing stream extension
libraries (`StreamExt::map`, and `StreamExt::buffered`) which have
similar performance characteristics but don't have the same behaviour
w.r.t. propagating panics and cancellations.
@amnn amnn self-assigned this Nov 21, 2024
Copy link

vercel bot commented Nov 21, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 22, 2024 0:16am
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 0:16am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 0:16am
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 0:16am

@amnn amnn merged commit 9fa49dc into main Nov 22, 2024
52 checks passed
@amnn amnn deleted the amnn/idx-for-each branch November 22, 2024 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants