AppRpcProvider calculation is misleading at best in the naive case #7371

Open
sambacha opened this issue Sep 24, 2023 · 0 comments
sambacha commented Sep 24, 2023

RPC Performance Methodology is naive at best

AppRpcProvider appears to provide a way to gauge RPC provider performance. However, there are issues with the methodology that prevent it from yielding useful information about actual RPC provider performance.

Approximating an Ethereum RPC provider's latency using ping time, or any similar network-level probe of the RPC service, is not recommended as a measure of the actual performance of RPC services, for the following reasons (a sketch of timing a real JSON-RPC request instead follows the list):

  • RPC services are not designed as ICMP network testing services.
  • Many networks rate-limit ICMP requests.
  • ICMP ping or traceroute traffic can be discarded or delayed en route to the RPC provider.
  • The termination point of the TCP/UDP session may not represent the full network path between a user and the service.
  • User requests may be served from locations other than the destination of the initial IP termination point.
  • Even a complete lack of response to ICMP traffic may not reflect any issue with an RPC provider's performance.
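If the goal is to estimate what users actually experience, a more representative probe is to time a real JSON-RPC request end to end. The sketch below (TypeScript; the endpoint URL, method choice, and sample count are illustrative placeholders, not AppRpcProvider code) measures the round-trip latency of eth_blockNumber calls:

```typescript
// Hypothetical sketch: time a real JSON-RPC call (eth_blockNumber) end to end
// instead of relying on ICMP ping. The URL and sample count are placeholders.
async function measureRpcLatency(url: string, samples = 20): Promise<number[]> {
  const latencies: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: i, method: "eth_blockNumber", params: [] }),
    });
    await res.json(); // wait for the full response body, not just the headers
    latencies.push(performance.now() - start);
  }
  return latencies;
}

// Usage with a placeholder endpoint:
// const samples = await measureRpcLatency("https://rpc.example.com");
```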

    <------------->   Wider confidence interval
                      High variance and/or low sample size

         <--->   Narrower confidence interval
                 Low variance and/or high sample size

 |---------|---------|---------|---------|
-1%      -0.5%       0%      +0.5%      +1%

The way to shrink confidence intervals is to increase the sample size.
The central limit theorem means that even when we have high-variance data, and even when that data is not normally distributed, taking more and more samples lets us calculate a more and more precise estimate of the true mean of the data.


      <------------------------------->     n=50  X -10% X +10%
                <------------------>        n=100 ✔️ -10% X +10%
                    <----->                 n=200 ✔️ -10% ✔️ +10%

  |---------|---------|---------|---------| difference in runtime
-20%      -10%        0       +10%      +20%

n    = sample size
<--> = confidence interval for percent difference of mean runtimes
✔️    = resolved condition
X    = unresolved condition

In this example, by n=50 we are uncertain whether A is faster or slower than B by more than 10%.
By n=100 we have ruled out that B is faster than A by more than 10%, but we're still not sure if it's slower by more than 10%.
By n=200 we have also ruled out that B is slower than A by more than 10%, so we stop sampling.

Note that we still don't know which is absolutely faster; we just know that, whatever the difference is, B is neither more than 10% faster nor more than 10% slower than A (and if we did want to know, we could add 0% to our conditions).
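To make the stopping rule above concrete, here is a minimal TypeScript sketch (illustrative only, not AppRpcProvider code), assuming two arrays of latency samples in milliseconds. It uses a normal approximation (z = 1.96) for a 95% confidence interval on the percent difference of mean latencies and reports when both ±10% conditions are resolved:

```typescript
// Sketch of the sequential stopping rule described above (illustrative names,
// normal approximation, 95% confidence). Inputs are latency samples in ms.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// 95% confidence interval for (meanB - meanA) / meanA, expressed in percent.
function percentDiffCI(a: number[], b: number[]): [number, number] {
  const mA = mean(a);
  const mB = mean(b);
  // Standard error of the difference of two independent sample means.
  const se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
  const z = 1.96;
  return [((mB - mA - z * se) / mA) * 100, ((mB - mA + z * se) / mA) * 100];
}

// A threshold is "resolved" once the interval no longer straddles it;
// sampling can stop when both -10% and +10% are resolved.
function isResolved(a: number[], b: number[], thresholdPct = 10): boolean {
  const [low, high] = percentDiffCI(a, b);
  const lowerResolved = low > -thresholdPct || high < -thresholdPct;
  const upperResolved = low > thresholdPct || high < thresholdPct;
  return lowerResolved && upperResolved;
}
```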

Fallback Logic

Consider adding a parameter like stallTimeout, which provides a timeout cutoff (in ms). This is a parameter I use in web3-rpc-failover, alongside a priority and a weight per endpoint.
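As a hedged illustration of how such a stallTimeout could work (a sketch under my own assumptions, not the web3-rpc-failover or AppRpcProvider implementation), a request that does not complete within stallTimeout milliseconds is aborted and retried against the next endpoint in priority order:

```typescript
// Hypothetical failover sketch with stallTimeout, priority, and weight.
// Not the web3-rpc-failover or AppRpcProvider API; names are illustrative.
interface Endpoint {
  url: string;
  priority: number;     // lower values are tried first
  weight: number;       // relative weight among equal priorities (unused here)
  stallTimeout: number; // ms to wait before falling through to the next endpoint
}

async function sendWithFailover(endpoints: Endpoint[], payload: unknown): Promise<Response> {
  const ordered = [...endpoints].sort((a, b) => a.priority - b.priority);
  for (const ep of ordered) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), ep.stallTimeout);
    try {
      // If the endpoint stalls past stallTimeout, the request is aborted
      // and we fall through to the next endpoint in priority order.
      return await fetch(ep.url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
        signal: controller.signal,
      });
    } catch {
      // Stalled or failed: try the next endpoint.
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("All endpoints stalled or failed");
}
```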
