Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query deduplication #285

Merged
merged 13 commits into from
Feb 16, 2022
Merged

Query deduplication #285

merged 13 commits into from
Feb 16, 2022

Conversation

Geal
Copy link
Contributor

@Geal Geal commented Dec 17, 2021

related:

This is an experiment around query deduplication. It is done crudely by having a mutex and a wait map in the subgraph fetcher (as is done in the cache), and hashing over a JSON serialization of the request (since Hash is not implemented on serde_json::Value).

In my local tests, using the products and reviews subgraphs from the perf test project, assigning 2 cores to the router, with 100 concurrent clients, I see:

  • main: 3500rps, both router cores at 100%, subgraphs using each 50% of a core
  • this PR: 6300rps, both router cores at 80%, subgraphs using each maximum 5% of a core

So even a naïve implementation gets a 1.8x performance boost, with a 10x reduction of CPU time in subgraphs.

Current issues:

  • this will deduplicate for all queries, even mutations, but we do not want to deduplicate mutations
  • this is a bit similar to caching, so we'll have to look closely at what we hash for deduplication, once we start looking at things like authorization headers
  • both cores are stuck at 80%, so I suspect some contention over the mutex artefact of the benchmarks I was doing, I can saturate the cores easily now
  • serializing the json value instead of a Hash implementation is a hack fixed with Store JSON strings in a Bytes instance #284

@Geal
Copy link
Contributor Author

Geal commented Dec 17, 2021

in the perf tests ( https://github.com/apollographql/router/pull/285/checks?check_run_id=4562777771 ):

  • main: 7032rps, subgraphs CPU usage going from 15-20% up to 50%
  • PR: 12977rps, subgraphs CPU usage under 20%

@Geal
Copy link
Contributor Author

Geal commented Jan 7, 2022

this should get faster once #284 is merged, because serde_json_bytes::Value implements Hash, so we'll get rid of this step:

https://github.com/apollographql/router/pull/285/files#diff-866ee3b4571238b6af1680a2392a6b50068c10efe0cc8d94770a57fd572e2564R99-R101

@Geal
Copy link
Contributor Author

Geal commented Jan 14, 2022

a few local benchmarks. The router has to cores, 300kreqs done by 1000 concurrent clients by "hey":

main 0146576 (using serde_json_bytes)

Summary:
Total: 83.7811 secs
Slowest: 3.8284 secs
Fastest: 0.0055 secs
Average: 0.2674 secs
Requests/sec: 3580.7585

Total data: 3576900000 bytes
Size/request: 11923 bytes

Response time histogram:
0.006 [1] |
0.388 [218905] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.770 [47952] |■■■■■■■■■
1.152 [23364] |■■■■
1.535 [7241] |■
1.917 [1878] |
2.299 [511] |
2.682 [123] |
3.064 [22] |
3.446 [1] |
3.828 [2] |

Latency distribution:
10% in 0.0495 secs
25% in 0.0593 secs
50% in 0.0729 secs
75% in 0.5265 secs
90% in 0.7868 secs
95% in 0.9182 secs
99% in 1.5053 secs

bc14a79 query deduplication without serde_json_bytes

Summary:
Total: 51.6889 secs
Slowest: 0.3400 secs
Fastest: 0.0068 secs
Average: 0.1715 secs
Requests/sec: 5803.9553

Total data: 3576900000 bytes
Size/request: 11923 bytes

Response time histogram:
0.007 [1] |
0.040 [105] |
0.073 [237] |
0.107 [628] |
0.140 [15595] |■■■
0.173 [88979] |■■■■■■■■■■■■■■■■■■■
0.207 [191595] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.240 [2526] |■
0.273 [27] |
0.307 [145] |
0.340 [162] |

Latency distribution:
10% in 0.1438 secs
25% in 0.1632 secs
50% in 0.1771 secs
75% in 0.1819 secs
90% in 0.1870 secs
95% in 0.1909 secs
99% in 0.2063 secs

1102ccf query deduplication with serde_json_bytes

Summary:
Total: 38.2763 secs
Slowest: 0.2781 secs
Fastest: 0.0059 secs
Average: 0.1270 secs
Requests/sec: 7837.7488

Total data: 3576900000 bytes
Size/request: 11923 bytes

Response time histogram:
0.006 [1] |
0.033 [207] |
0.060 [291] |
0.088 [753] |
0.115 [66459] |■■■■■■■■■■■■
0.142 [217544] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.169 [13379] |■■
0.196 [974] |
0.224 [202] |
0.251 [71] |
0.278 [119] |

Latency distribution:
10% in 0.1063 secs
25% in 0.1233 secs
50% in 0.1307 secs
75% in 0.1345 secs
90% in 0.1384 secs
95% in 0.1419 secs
99% in 0.1568 secs

90cd1de query deduplication with serde_json_bytes and using the Request as key in the wait map (couldn't do that before because serde_json::Value does not implement Hash)

Summary:
Total: 37.5145 secs
Slowest: 0.2403 secs
Fastest: 0.0061 secs
Average: 0.1244 secs
Requests/sec: 7996.9130

Total data: 3576900000 bytes
Size/request: 11923 bytes

Response time histogram:
0.006 [1] |
0.030 [159] |
0.053 [153] |
0.076 [429] |
0.100 [6998] |■
0.123 [71569] |■■■■■■■■■■■■■
0.147 [218928] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.170 [1631] |
0.193 [95] |
0.217 [29] |
0.240 [8] |

Latency distribution:
10% in 0.1044 secs
25% in 0.1208 secs
50% in 0.1284 secs
75% in 0.1320 secs
90% in 0.1355 secs
95% in 0.1380 secs
99% in 0.1441 secs

@Geal Geal marked this pull request as ready for review January 14, 2022 16:45
Copy link
Contributor

@cecton cecton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks familiar with the caching mechanism Gary made... 🤔 the difference is that we don't store in cache here but other than that we do queue identical. Am I wrong? Maybe with a bit of refactoring it would be possible to get more parts of the code in common.

@Geal
Copy link
Contributor Author

Geal commented Jan 19, 2022

you're right, I copied that code and removed the storage part 😁
Caching will require a bit more thought, especially about how to do invalidation, and where the data is stored. In the meantime, query deduplication is a cheap way to get a great performance boost, that can then evolve to a proper cache once we have a good design

@cecton
Copy link
Contributor

cecton commented Jan 19, 2022

Caching will require a bit more though

Ah!! But I didn't mean you should do caching! I meant we could generalize our implementation more so it supports this scenario (0 length cache?) too

Copy link
Contributor

@cecton cecton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so as I understand it's a temporary boost until we get also caching on this. Looks good!

Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those performance improvement figures are impressive.

Comment on lines 96 to 137
async fn dedup(
&self,
request: graphql::Request,
) -> Result<graphql::Response, graphql::FetchError> {
loop {
let mut locked_wait_map = self.wait_map.lock().await;
match locked_wait_map.get_mut(&request) {
Some(waiter) => {
// Register interest in key
let mut receiver = waiter.subscribe();
drop(locked_wait_map);

match receiver.recv().await {
Ok(value) => return value,
// there was an issue with the broadcast channel, retry fetching
Err(_) => continue,
}
}
None => {
let (tx, _rx) = broadcast::channel(1);
locked_wait_map.insert(request.clone(), tx.clone());
drop(locked_wait_map);

let res = self.fetch(request.clone()).await;

{
let mut locked_wait_map = self.wait_map.lock().await;
locked_wait_map.remove(&request);
}

// Let our waiters know
let broadcast_value = res.clone();
// Our use case is very specific, so we are sure that
// we won't get any errors here.
tokio::task::spawn_blocking(move || {
tx.send(broadcast_value)
.expect("there is always at least one receiver alive, the _rx guard; qed")
}).await
.expect("can only fail if the task is aborted or if the internal code panics, neither is possible here; qed");
return res;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely beginning to feel there is some kind of generic functionality which we can extract for common use. I might try and prototype something up...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that could be generalized

@Geal
Copy link
Contributor Author

Geal commented Jan 24, 2022

I'm waiting on apollographql/federation#1423 before I continue on this

not compiling for now, will be reimplemented as a layer
@o0Ignition0o
Copy link
Contributor

@Geal can you please ping me when i can review this ? 🙏

@Geal
Copy link
Contributor Author

Geal commented Feb 11, 2022

it's far from ready 😅

@o0Ignition0o
Copy link
Contributor

Ah I m going to mute notifications, i dunno why but github keeps asking me to review it 😂

@Geal Geal changed the base branch from main to geal-all-along-the-tower-service February 11, 2022 14:29
let broadcast_value = res
.as_ref()
.map(|response| response.clone())
.map_err(|e| e.to_string());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use BoxError all over the code (mainly because Buffer enforces BoxError), and BoxError is not Clone. But here we want to copy the response to all of the waiting queries, even if we got an error. Since we cannot access the underlying type, the best we can do is convert to a String and pass it around

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok as a String? Would it be better as a json Value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could serialize a JSON value to a string and pass it as error. Unfortunately, the only way to interact with std::error::Error is through strings: https://doc.rust-lang.org/nightly/std/error/trait.Error.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we convert to String and preserve the error as source()? If we did, would it deliver any value?
(I'm just trying to avoid lowest common denominator of error type, but I guess we don't have many options.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we discussed (with @BrynCooke I think?) using FetchError as the error type for all subgraph services and layers, and convert to BoxError just before passing it back to the Buffer layer, but deferred that to later as that would be a big change introduced by that PR

@@ -92,9 +92,19 @@ pub struct SubgraphRequest {
pub http_request: http_compat::Request<Request>,

pub context: Context,

pub operation_kind: OperationKind,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding the operation kind here for now, as I am not sure yet if it should be in the Context object: this is data that's only needed for this request and does not need to be shared with other request, and there's no place in the HTTP request to put it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's interesting, we have it deep in the query planner, I use a rough variant of this in the http get related PR but this doesn't fit your usecase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be in the query planner, and from there it's used in subgraph queries. So both use cases are linked

name.to_string(),
subgraph.routing_url.clone(),
let dedup_layer = QueryDeduplicationLayer;
let mut subgraph_service = BoxService::new(dedup_layer.layer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding the dedup layer here has less impact on the types than making the subgraph a BoxCloneService then adding the layer then boxing it again

@Geal Geal requested a review from garypen February 14, 2022 15:09
Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good generally. Some questions and when you have time to look at them, let me know how you want to proceed and I can approve the PR from there.

apollo-router-core/Cargo.toml Outdated Show resolved Hide resolved
let broadcast_value = res
.as_ref()
.map(|response| response.clone())
.map_err(|e| e.to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok as a String? Would it be better as a json Value?

apollo-router-core/src/query_planner/mod.rs Outdated Show resolved Hide resolved
Comment on lines +73 to +77
// this assumes headers are in the same order
for (name, value) in self.inner.headers() {
name.hash(state);
value.hash(state);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dislike HeaderMap. We can't even sort it...

I suppose the downside of not spotting duplicates is that our query can't be de-duplicated. Not a bug, but defeats the purpose of de-duplicating.

We can do something a bit crafty here. HeaderValue does implement Ord and HeaderName always convert to &str, so:

        // Map Header names into &str so we can sort
        let mut tmp: Vec<(&str, &HeaderValue)> = self
            .inner
            .headers()
            .iter()
            .map(|(k, v)| (k.as_str(), v))
            .collect();
        tmp.sort();
        for (name, value) in tmp {
            name.hash(state);
            value.hash(state);
        }

would give us a consistent ordering for hashing purposes. I think we could do the same for Eq as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should refrain from sorting the headers, since their order can have an impact. Example: the Accept header can have multiple values, which can come either as comma separated in one header value, or as multiple separated Accept headers. In that second case, if we sort the headers, that might reorder the values and change the behaviour.

The assumption for the cache key here is that similar queries coming from the same client will have the same shape (same user agent, same list of headers in the same order...).

What we could do though is decide on which headers we consider for the cache. Once we do that, we would get stronger guarantees, and could expose in the docs the issues around ordering. Could we explore that in another PR though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are only sorting our copy, the original headers are un-impacted. I agree that we can't touch the original headers, which we couldn't sort anyway since HeaderName doesn't implement Ord.

Does the original order of headers matter for hashing purposes? i.e.: don't we want the ordering to be consistent to help improve our de-duplication chances?

I'm fine with moving this discussion to a follow-up. I think it's important to make the comparison less fragile, but it doesn't need to be decided before merging to the tower branch. If you don't want to put my suggestion in, perhaps preserve it in the follow-up issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's definitely a good idea to revisit it and find a robust solution

apollo-router-core/src/services/mod.rs Outdated Show resolved Hide resolved
@Geal
Copy link
Contributor Author

Geal commented Feb 15, 2022

we will land this after #429

@o0Ignition0o
Copy link
Contributor

LGTM, I'd favor serde camelCase directives over "rename" but it's just a nit

@Geal
Copy link
Contributor Author

Geal commented Feb 15, 2022

alright, now that #429 is merged, I could remove the manual deserialization for OperationKind, this should be good to go

@Geal Geal changed the base branch from geal-all-along-the-tower-service to main February 15, 2022 14:23
@Geal Geal merged commit c562d84 into main Feb 16, 2022
@Geal Geal deleted the query-deduplication branch February 16, 2022 15:07
@abernix abernix mentioned this pull request Feb 18, 2022
@abernix abernix added this to the v0.1.0-alpha.6 milestone Feb 18, 2022
tinnou pushed a commit to Netflix-Skunkworks/router that referenced this pull request Oct 16, 2023
lennyburdette pushed a commit that referenced this pull request Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants