JSON-RPC: performance problem with chainHead_v1_storage queries using descendantValues #5589

Closed
josepot opened this issue Sep 4, 2024 · 3 comments · Fixed by #5741

Comments


josepot commented Sep 4, 2024

We’ve encountered a performance issue when executing chainHead_v1_storage queries with the descendantValues option in the new JSON-RPC API.

When performing such a query, the RPC node sends an operationStorageItems notification containing only 5 items. This is immediately followed by a waitingForContinue notification. Upon receiving this, we immediately respond with a chainHead_v1_continue request, and this cycle repeats.

This results in certain queries taking an excessively long time to resolve. For example, requesting the descendant values of NominationPools.PoolMembers on the Polkadot relay-chain can take 6 to 10 minutes to complete using the new JSON-RPC API, while the same request takes only a few seconds with the legacy RPC API.
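To put the reported times in perspective, here is a rough back-of-the-envelope sketch in Rust; the entry count and round-trip time are illustrative assumptions, not measured values:

```rust
// Back-of-the-envelope cost of the continue cycle. The entry count and RTT
// below are illustrative assumptions, not measured values.
fn main() {
    let entries = 25_000u64;    // assumed size of NominationPools.PoolMembers
    let items_per_batch = 5u64; // what the node currently sends per notification
    let rtt_ms = 80u64;         // assumed client <-> node round-trip time

    let round_trips = entries.div_ceil(items_per_batch);
    let total_secs = round_trips * rtt_ms / 1000;

    println!("{round_trips} continue round-trips, roughly {total_secs}s spent on latency alone");
    // With these assumptions: ~5,000 round-trips and ~400s (~6-7 minutes),
    // which is the same order of magnitude as the timings reported above.
}
```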

Expected Behavior

The node should efficiently return a larger number of items per operationStorageItems notification, especially when there is no sign of back-pressure (e.g., if the chainHead_v1_continue response is received promptly). Ideally, the node could send hundreds of items at once, and dynamically adjust the number of items sent based on the system's responsiveness.

Current Behavior

The node currently sends only 5 items per operationStorageItems notification, and always follows each one with a waitingForContinue notification, which significantly slows down the resolution of large storage queries.

Proposed Solution

  • Increase the number of items sent in each operationStorageItems notification, potentially sending several hundred at a time.
  • Adapt the number of items sent based on the system’s responsiveness (e.g., if the chainHead_v1_continue response is received quickly, send more items in the next notification); see the sketch below.
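A minimal sketch of the adaptive batching idea above; the thresholds, caps, and the `next_batch_size` helper are hypothetical and not part of the node:

```rust
use std::time::Duration;

// Hypothetical helper that grows or shrinks the batch size depending on how
// quickly the client answered the previous `waitingForContinue`.
fn next_batch_size(current: usize, continue_latency: Duration) -> usize {
    const MIN: usize = 5;
    const MAX: usize = 1_000;

    if continue_latency < Duration::from_millis(50) {
        // Client keeps up: double the batch, up to a cap.
        (current * 2).min(MAX)
    } else if continue_latency > Duration::from_millis(500) {
        // Client is slow: back off.
        (current / 2).max(MIN)
    } else {
        current
    }
}

fn main() {
    let mut size = 5;
    for latency_ms in [10, 10, 10, 10, 700, 20] {
        size = next_batch_size(size, Duration::from_millis(latency_ms));
        println!("observed {latency_ms}ms -> next batch: {size} items");
    }
}
```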

Logs

slowOperationStorageItems.log

@github-actions github-actions bot added the I10-unconfirmed Issue might be valid, but it's not yet known. label Sep 4, 2024

jsdw commented Sep 4, 2024

Just to copy in my thoughts on the approach too:

Given that we have backpressure, we could also just never emit waitingForContinue; the node could internally put storage messages into a queue, draining it as fast as the client can accept messages from it, and if the queue fills up, the node won't internally try to fetch more storage values until it has space again. This would hopefully allow it to be pretty quick!
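A rough sketch of that idea using a bounded tokio channel; the item type, capacity, and task layout are placeholder assumptions, not the node's actual chainHead code:

```rust
use tokio::sync::mpsc;

// Sketch: the storage-iteration side only advances while the bounded queue
// has room; the draining side forwards items as fast as the connection
// accepts them, so no explicit `waitingForContinue` is needed.
#[tokio::main]
async fn main() {
    // Small capacity stands in for the per-subscription buffer.
    let (tx, mut rx) = mpsc::channel::<(u32, String)>(16);

    let producer = tokio::spawn(async move {
        for i in 0..100u32 {
            // `send` waits here whenever the queue is full, which pauses the
            // (simulated) storage iteration until the consumer drains it.
            if tx.send((i, format!("value{i}"))).await.is_err() {
                break; // consumer went away
            }
        }
    });

    let mut received = 0;
    while let Some((_key, _value)) = rx.recv().await {
        received += 1;
        // In the real node this is where items would be pushed to the JSON-RPC
        // connection; a slow connection slows this loop and, through the
        // bounded channel, the storage iteration as well.
    }

    producer.await.unwrap();
    println!("drained {received} items");
}
```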


josepot commented Sep 4, 2024

> Just to copy in my thoughts on the approach too:
>
> Given that we have backpressure, we could also just never emit waitingForContinue; the node could internally put storage messages into a queue, draining it as fast as the client can accept messages from it, and if the queue fills up, the node won't internally try to fetch more storage values until it has space again. This would hopefully allow it to be pretty quick!

works for me!!

@niklasad1 niklasad1 added I3-annoyance The node behaves within expectations, however this “expected behaviour” itself is at issue. T3-RPC_API This PR/Issue is related to RPC APIs. and removed I10-unconfirmed Issue might be valid, but it's not yet known. labels Sep 4, 2024
@niklasad1 niklasad1 self-assigned this Sep 4, 2024
@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-api-updates-thread/7685/15

github-merge-queue bot pushed a commit that referenced this issue Oct 3, 2024
Close #5589

This PR makes it possible for `rpc_v2::Storage::query_iter_paginated` to be "backpressured", which is achieved by having a channel where the results are sent back; when this channel is full, we pause the iteration.

chainHead_follow has an internal channel which doesn't represent the actual connection and is set to a very small number (16). Recall that the JSON-RPC server has a dedicated buffer of 64 for each connection by default.

#### Notes

- Because `archive_storage` also depends on `rpc_v2::Storage::query_iter_paginated`, I had to tweak the method to support limits as well. The reason is that `archive_storage` won't get backpressured properly because it's not a subscription. (It would be much easier if it were a subscription in the RPC v2 spec, because then there would be nothing against querying a huge number of storage keys.)
- `query_iter_paginated` doesn't necessarily return the storage "in order": for example, `query_iter_paginated(vec![("key1", hash), ("key2", value)], ...)` could return the results in arbitrary order because it's wrapped in FuturesUnordered, but I could change that if we want to process it in order (it's slower); see the sketch after these notes.
- There is technically no limit on the number of storage queries in each `chainHead_v1_storage` call other than the RPC max message limit, which is 10MB, and at most 16 concurrent `chainHead_v1_x` calls are allowed (this should be fine).
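A small sketch of the ordering note above, using the `futures` stream combinators; the simulated lookups and delays are made up for illustration:

```rust
use futures::stream::{self, StreamExt};

// Sketch of why results can come back out of order: driving the per-key
// futures through an unordered combinator yields them in completion order,
// while the ordered variant waits so results match the input order.
#[tokio::main]
async fn main() {
    let keys = vec!["key1", "key2", "key3"];

    // Simulated storage lookups where "key1" happens to be the slowest.
    let lookup = |key: &'static str| async move {
        let delay = match key {
            "key1" => 30,
            "key2" => 10,
            _ => 20,
        };
        tokio::time::sleep(std::time::Duration::from_millis(delay)).await;
        key
    };

    let unordered: Vec<_> = stream::iter(keys.clone())
        .map(lookup)
        .buffer_unordered(3) // completion order, like FuturesUnordered
        .collect()
        .await;

    let ordered: Vec<_> = stream::iter(keys)
        .map(lookup)
        .buffered(3) // input order, at the cost of head-of-line waiting
        .collect()
        .await;

    println!("unordered: {unordered:?}"); // likely ["key2", "key3", "key1"]
    println!("ordered:   {ordered:?}");   // ["key1", "key2", "key3"]
}
```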

#### Benchmarks using subxt on localhost

- Iterate over 10 accounts on westend-dev -> ~2-3x faster 
- Fetch 1024 storage values (i.e., not descendant values) -> ~50x faster
- Fetch 1024 descendant values -> ~500x faster

The reason for this, as Josep explained in the issue, is that previously only five storage items were returned per call, so clients had to make lots of calls to drive the operation forward.

---------

Co-authored-by: command-bot <>
Co-authored-by: James Wilson <james@jsdw.me>
niklasad1 added a commit that referenced this issue Oct 17, 2024
niklasad1 added a commit that referenced this issue Oct 18, 2024