Remove concatMap in lookupRoute to improve throughput #2977
What this PR does
In lookupRoute, wrapping each route in a Mono introduces context propagation and additional operators.
As the number of routes grows, this becomes the throughput bottleneck.
This PR therefore removes the concatMap in lookupRoute.
1. Background:
After the changes in [PR]#2884, I found that throughput still did not meet expectations.
Nothing in Spring Cloud Gateway itself looked suspicious, so I turned my attention to Reactor.
2. Flame Graph and Cause
Testing with 2k routes whose predicate is

asyncPredicate(s -> Mono.just(Boolean))

we found that the context propagation in MonoFilterWhen.onNext and the operators inside the concatMap function are the bottleneck.
3. Change Details
So in lookupRoute, I propose removing the concatMap and using filterWhen in its place.

Regarding the delayed error handling mentioned in [PR]#427: filterWhen does not return a MonoError, so I think no exceptions or logs will be swallowed.
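To make the difference concrete, here is a minimal sketch of the two pipeline shapes. It is not the actual lookupRoute code: the Route record, the ROUTES list, and the String path standing in for the exchange are hypothetical, error handling is left out, and it assumes reactor-core on the classpath and Java 16+ for records.

```java
import java.util.List;
import java.util.function.Function;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class RouteLookupSketch {

    // Hypothetical stand-in for a gateway route: an id plus an async predicate.
    record Route(String id, Function<String, Mono<Boolean>> predicate) {}

    // Hypothetical route table; the String argument stands in for the exchange.
    static final List<Route> ROUTES = List.of(
            new Route("route-a", path -> Mono.just(path.startsWith("/a"))),
            new Route("route-b", path -> Mono.just(path.startsWith("/b"))));

    // Shape before the change: every route is wrapped in its own Mono,
    // so each candidate route builds an extra inner operator chain.
    static Mono<Route> lookupWithConcatMap(String path) {
        return Flux.fromIterable(ROUTES)
                .concatMap(route -> Mono.just(route)
                        .filterWhen(r -> r.predicate().apply(path)))
                .next();
    }

    // Shape after the change: the async predicate is applied directly
    // on the Flux of routes, with no per-route Mono wrapper.
    static Mono<Route> lookupWithFilterWhen(String path) {
        return Flux.fromIterable(ROUTES)
                .filterWhen(route -> route.predicate().apply(path))
                .next();
    }

    public static void main(String[] args) {
        // Both variants select the first route whose predicate matches.
        System.out.println(lookupWithConcatMap("/b/test").block().id());
        System.out.println(lookupWithFilterWhen("/b/test").block().id());
    }
}
```

Both methods return the same route for the same input; only the number of operators assembled per candidate route differs.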
4. Throughput Improvement
After the modification, the bottleneck is gone: there is far less context propagation and there are fewer operators.

I tested the throughput before and after the modification with different numbers of routes.

I used
wrk -t 1 -c 10 -d 10s http://localhost/test
on an 8-core MacBook Pro M1. Below is the result.

The test results show that throughput improves significantly after the modification.