runtime_api subsystem queues requests internally and likely does redundant work #829

Open
eskimor opened this issue May 10, 2022 · 6 comments

eskimor (Member) commented May 10, 2022

Use the queue, Luke

Instead of imposing back pressure on the incoming channel, the subsystem queues requests in an unbounded internal buffer. Conceptually it is rather pointless to queue requests unbounded internally while receiving messages over a bounded channel. The subsystem does at least log a warning when that internal buffer reaches a certain size, but applying back pressure on the receiving channel would at the very least make problems visible immediately.

We now have metrics for the ToF (time of flight) of messages through a channel, we also monitor channel sizes, and we will soon have a metric for whenever a channel gets full (and thus imposes back pressure). We would like this tooling to expose issues as well as possible, so for that reason alone we should fix this.

A similar issue exists for PVF execution, which also uses internal unbounded channels.
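
To make the back-pressure argument concrete, here is a minimal, generic sketch (plain std channels, not the subsystem's actual channel types): with a bounded channel the producer blocks as soon as the consumer falls behind, so the pressure shows up at the sending side instead of accumulating in a hidden internal buffer.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    // A bounded ("sync") channel: `send` blocks once 4 messages are
    // buffered, so a slow consumer pushes back on the producer instead
    // of letting an internal queue grow without bound.
    let (tx, rx) = mpsc::sync_channel::<u32>(4);

    let producer = thread::spawn(move || {
        for request in 0..16 {
            // Blocks here whenever 4 requests are already waiting,
            // making the bottleneck visible at the sending side.
            tx.send(request).expect("receiver alive");
            println!("queued request {request}");
        }
    });

    for request in rx {
        // Simulate slow request execution.
        thread::sleep(Duration::from_millis(20));
        println!("executed request {request}");
    }

    producer.join().unwrap();
}
```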

Be lazy

We also noticed that we execute requests in parallel and have a cache. The issue here is that we do unnecessary work if identical requests arrive at roughly the same time: we spawn each one of them, instead of spawning only one and then serving the others from the cache once it succeeds.

What we should do instead is keep track of the requests that are currently being executed and not spawn identical requests while they are running, but rather wait for the result to hit the cache and then serve all of them from there.
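
A minimal sketch of that idea, assuming hypothetical `Request`/`Response` types rather than the actual runtime API messages: keep a map of in-flight requests and let callers of an identical request register as waiters instead of spawning their own execution.

```rust
use std::collections::{hash_map::Entry, HashMap};
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical request/response types standing in for the real runtime
// API messages; the real subsystem keys requests differently.
type Request = String;
type Response = String;

/// Coalesces identical in-flight requests: the first caller executes the
/// request, later callers with the same key just wait for that result.
struct Coalescer {
    in_flight: Mutex<HashMap<Request, Vec<Sender<Response>>>>,
}

impl Coalescer {
    fn new() -> Arc<Self> {
        Arc::new(Self { in_flight: Mutex::new(HashMap::new()) })
    }

    fn get(&self, req: Request, execute: impl FnOnce(&Request) -> Response) -> Response {
        // Check whether an identical request is already in flight; if so,
        // register as a waiter instead of spawning redundant work.
        let waiter_rx = {
            let mut in_flight = self.in_flight.lock().unwrap();
            match in_flight.entry(req.clone()) {
                Entry::Occupied(mut entry) => {
                    let (tx, rx) = mpsc::channel();
                    entry.get_mut().push(tx);
                    Some(rx)
                }
                Entry::Vacant(entry) => {
                    entry.insert(Vec::new());
                    None
                }
            }
        };

        if let Some(rx) = waiter_rx {
            return rx.recv().expect("the executing caller serves all waiters");
        }

        // We are the first caller for this key: execute once, outside the lock.
        let result = execute(&req);

        // Serve everyone who queued up behind us and clear the entry.
        let waiters = self.in_flight.lock().unwrap().remove(&req).unwrap_or_default();
        for waiter in waiters {
            let _ = waiter.send(result.clone());
        }
        result
    }
}

fn main() {
    let coalescer = Coalescer::new();
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let c = Arc::clone(&coalescer);
            thread::spawn(move || {
                c.get("session_info".into(), |req| {
                    // Pretend the runtime call is slow; timing permitting,
                    // only one of the four threads ends up in here.
                    thread::sleep(Duration::from_millis(100));
                    format!("result for {req} (computed by thread {i})")
                })
            })
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```

In the real subsystem the result would additionally be written to the existing cache before the waiters are served, so the cache stays the single source of truth.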

drahnr commented May 12, 2022

The short-term fix to get insight here is to use the metered::* types internally as well. That would not solve the issue in principle, but it would at least shed some light.
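
For illustration only, a generic sketch of what such instrumentation gives you (this is not the actual metered::* API): wrap the channel so the number of queued-but-unprocessed messages is observable and can be exported as a metric.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};

// Sender half that counts how many messages are sitting in the channel.
struct CountingSender<T> {
    inner: mpsc::Sender<T>,
    queued: Arc<AtomicUsize>,
}

// Receiver half sharing the same counter.
struct CountingReceiver<T> {
    inner: mpsc::Receiver<T>,
    queued: Arc<AtomicUsize>,
}

fn counting_channel<T>() -> (CountingSender<T>, CountingReceiver<T>) {
    let (tx, rx) = mpsc::channel();
    let queued = Arc::new(AtomicUsize::new(0));
    (
        CountingSender { inner: tx, queued: Arc::clone(&queued) },
        CountingReceiver { inner: rx, queued },
    )
}

impl<T> CountingSender<T> {
    fn send(&self, msg: T) -> Result<(), mpsc::SendError<T>> {
        // Count the message before handing it over, undoing the count on
        // failure, so the receiver never observes a wrapped-around value.
        self.queued.fetch_add(1, Ordering::Relaxed);
        self.inner.send(msg).map_err(|e| {
            self.queued.fetch_sub(1, Ordering::Relaxed);
            e
        })
    }

    /// Number of messages sent but not yet received; this is the value
    /// one would export as a Prometheus gauge.
    fn queue_len(&self) -> usize {
        self.queued.load(Ordering::Relaxed)
    }
}

impl<T> CountingReceiver<T> {
    fn recv(&self) -> Result<T, mpsc::RecvError> {
        let msg = self.inner.recv()?;
        self.queued.fetch_sub(1, Ordering::Relaxed);
        Ok(msg)
    }
}

fn main() {
    let (tx, rx) = counting_channel();
    tx.send("request A").unwrap();
    tx.send("request B").unwrap();
    println!("queued: {}", tx.queue_len()); // 2
    rx.recv().unwrap();
    println!("queued: {}", tx.queue_len()); // 1
}
```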

eskimor (Member, Author) commented May 12, 2022

I think that is hardly less work than the proper fix. From looking at the code, this should be a pretty trivial change.

eskimor changed the title from "runtime_api subsystem queues requests internally" to "runtime_api subsystem queues requests internally and likely does redundant work" on May 13, 2022

eskimor (Member, Author) commented May 13, 2022

@sandreim once we have results on the cache misses we get because of this, please add them here so we get an idea of how important this is. Also, do we have data on how much faster the cache is versus actually executing the call? I am guessing the difference is quite significant, but do we know?

sandreim self-assigned this on May 13, 2022

sandreim (Contributor) commented May 17, 2022

Burned in paritytech/polkadot#5541 on Versi; the data shows that there are indeed cache misses due to parallelizing identical requests. ToF went up by about 2.5x, so we need to move back to 4 parallel requests to see how long requests take to execute on average. With this data we can decide whether it is worth prioritizing the cache optimization sooner rather than later.

[Screenshots attached: Versi metrics, 2022-05-17 18:49:30 and 18:47:53]

sandreim (Contributor) commented May 18, 2022

The runtime calls are on average very cheap, as can be seen in the picture below. I think we should not implement this cache improvement now, but keep it for later, when we have run out of bigger optimisation opportunities.

[Screenshot attached: 2022-05-18 12:05:19]

sandreim (Contributor) commented

Created #822 for the cache improvement.

Sophia-Gold transferred this issue from paritytech/polkadot on Aug 24, 2023