-
-
Notifications
You must be signed in to change notification settings - Fork 91
Remove Mustermann leaf matcher to improve performance #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit addresses a second issue identified and fixed by @kyleplump in #279: using a regular expression to split route and path strings is much slower than using a string ("/"). With this commit, operating speed is faster than 2.1, but start up time and memory usage are unchanged from the previous commit. Commit label: 2.2b |
This commit is the proof of concept for removing Mustermann. Basic functionality is implemented (see below). Performance is improved on all metrics. Some tests are failing, as expected at this point. But this proof of concept also comes with a serious reality check. DiscussionThe graphs below show the performance benefits of eliminating Mustermann:
The price of these increases, however, is features. This commit implements "segment-sized" dynamic variables, such as However, I now see from most of the currently failing tests that the specs do document more exotic usage patterns, not documented in the guides or README, that are available through the use of Mustermann. This includes, for example, dynamic variables embedded within a segment, such as: I'm not sure if I can duplicate all of these usage patterns using my current approach, but I want to do more thinking about this. This raises the question of what features we want to support? Are we willing to sacrifice some of the flexibility of Mustermann for better performance? These are our current options, as I see it:
I am happy to elaborate on my current approach, if anyone would like to help me think through alternatives. Unless the discussion here pushes me in a different directions, I plan to implement constraints next (via regular expressions), and URI decoding of all variable segments (interestingly, a core feature of Mustermann, but not exercised in the current test suite). Commit label: 2.2c |
@kyleplump @cllns @timriley I am tagging you all for a temperature check as to the direction I am going. No rush to respond. I will be traveling for nine days starting Tuesday, so I thought this would be a good time for reflection and conversation. It may also be a good time to broadcast this exploration to the wider community for feedback. I doubt I will do any coding while traveling, but I may be able to comment. It might be possible to add basic constraints before I go, but no promises. Constraints would have no effect on this benchmark, since the benchmark doesn't exercise them. Way Point No. 1Let's see where we stand so far. Here are three graphs comparing:
I will say again that r10k is not a great tool for reflecting real-world usage (at all), but it may be useful for gross comparisons. It is clear from these graphs that Hanami router in any configuration occupies a sweet spot in terms of performance and developer experience: developer experience on par with Rails, and performance approaching that of Roda. This is Luca's achievement and should be recognized and appreciated. 🙇 Rails routing performance isn't close on any measure. This is food for thought. Questions for the community
|
This commit adds basic constraints using regular expressions. There is a variation on regular expression constraints available with Mustermann that allows POSIX-like notations. I added this as a feature in the Not Implemented list. Let me know if this should be a stretch goal. At this point, we have essentially matched the feature set described in the Hanami router guide. However, I don't think we have full parity until we have URI decoding in place (a feature of Mustermann). Even though this is not mentioned in the guide, I see it as basic functionality. I will try to implement that next. As of now, only 16 tests are failing (down from 35 in the prior commit). Four of these are unit tests that I need to re-work, remove, or skip. The other 12 are "exotic" applications of Mustermann, like optional segments, multiple dynamic variables in a segment, and escaped characters. Benchmark performance is indistinguishable from 2.2c. Commit label: 2.2d
Next is fixing unit tests and considering skipping remaining failing tests as undocumented behavior that is not representative of current requirements. |
Thank you for your work on this, @dcr8898! It is exemplary as always. I'll get to looking at the code soon, but I did want to answer this question of yours:
If I look at the README overview for the Rails flavour of Mustermann, I'm interested in us supporting the features exercised in all three of the example routes: pattern = Mustermann.new('/:example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => false
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => nil
pattern = Mustermann.new('/:example(/:optional)', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar", "optional" => nil }
pattern.params("/foo/bar") # => { "example" => "foo", "optional" => "bar" }
pattern = Mustermann.new('/*example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => { "example" => "foo/bar" }
As for get "/guides/:org/:version/:slug", to: "guides.show"
get "/guides/:org/:version/:slug/*path", to: "guides.show" A route of If we can support these three things, then I think we'll be able to express most routes well enough. It also means we'll largely maintain compatibility with the routing support we've offered historically. I'm not sure if this throws a spanner in your works, but I'm keen to hear your response to the above, @dcr8898 😄 |
@timriley Thanks for the guidance! It's invaluable! Quick overview of the router's function:
To address your three feature requests out of order:
pattern = Mustermann.new('/:example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => false
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => nil
pattern = Mustermann.new('/*example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => { "example" => "foo/bar" }
pattern = Mustermann.new('/:example(/:optional)', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar", "optional" => nil }
pattern.params("/foo/bar") # => { "example" => "foo", "optional" => "bar" } TakeawaysMy takeaway, since two of your requests should be working now.
Further thoughtsThe current test suite includes what I've been calling "exotic" usages of Mustermann. For example, multiple dynamic variables in a segment or dynamic variables that are a portion of a segment. I believe that these usage patterns could be compatible with my current approach by . . . using Mustermann for these specific cases. We should be able to detect these "exotic" usages and only use Mustermann for the segments that are affected. I like this approach of matching power to demand. The majority of use cases would be unaffected. This was the primary performance problem of the 2.2 router: invoking the full power of Mustermann for all routes, when it is not needed for the most common use cases. |
I'm not sure what you mean by this bit:
The splat doesn't capture the leading / in ( |
This commit refactors tests to gain clarity before moving forward. At the start of the spike there were 37 skipped tests. Prior to this commit, 16 tests were failing and 41 were skipped (I skipped a few more unit tests along the way). This commit does the following:
Thankfully, CI failures should now be gone too. 🥳 Benchmark performance is indistinguishable from 2.2d. Commit label: 2.2e Next up is support for optional clauses. |
This commit implements optional route clauses. Nested optional clauses and consecutive optional clauses are supported. This is implemented by defining routes for every possible permutation of optional clauses. This concludes the basic features that @timriley requested, although some more polish is probably needed, like automatic URI decoding. This is a good place for another way point and a meaningful conversation about where to go from here, especially as pertains to Mustermann. However, before we do that, I am curious to pursue a few more performance tweaks. Benchmark performance is indistinguishable from 2.2e. Commit label: 2.2f Next up is some performance experiments. |
I have exhausted my ideas for performance improvements. I feel that this commit concludes this spike. I will declare a way point below and invite a discussion as to whether this spike should be pursued as an actual PR, and what further changes would be needed. ImprovementsThe changes I made here centered mostly around replacing calls to Next, I converted the "globbed route" and "optional route" detection in Router from a Regexp to a String method. However, this did not seem to have much effect. I find this a little odd, even though these types of routes are not exercised by this benchmark, because these checks are made for every route when compiling the Tries. In fact, it seems to me that the rps benchmarks may have been slightly lower after this change for the 10, 100, and 1000 route cases. Weird. Another change I attempted was to use When using A keen eye will show that RPS and Runtime are slightly improved from 2.2f. Commit label: 2.2g |
@timriley @cllns @kyleplump , once again I am tagging you to request your input. I believe the exploratory phase of this spike is over, and we should discuss whether to pursue this refactor of the router in earnest. Way Point No. 2Functionality:
Goals AchievedThis spike demonstrates that Mustermann is not needed for the majority of the Router's operation, and that its removal will actually improve performance. However, Mustermann usage is not gone completely: it is still used for all "globbed" routes and, more importantly, Mustermann objects are still used as "expanders" to generate relative and absolute paths for all named routes. This is an important observation: if the r10k benchmark used all named routes, as is common practice in the real world, then current start-up time would still be in the 10-second range for 10K routes. This would be true of the 2.1 router as well, and the 2.2b router would probably be in the 20-second range. Performance:
ReflectionsIf all routes were fixed, with no dynamic variables, then every router would use a Hash for lookup and enjoy O(1) performance. However, the presence of dynamic variables makes matching requests to routes much more complicated. Sinatra checks every defined route, one after another, until it finds a match. Rails must optimize this strategy to some extent because it performs better than Sinatra, but is still far slower than Hanami or Roda on every measure. This is the paradox of Mustermann: it is wonderfully powerful, but expensive to instantiate, and inherently limits the strategies that can be used to speed up routing. In creating the 2.0/2.1 router, Luca chose to chose to pursue performance by striking a compromise: using Tries to improve route matching performance while still using Mustermann to match dynamic segments and to provide path "expanders." This worked, but imposed significant start-up time requirements for the creation of Mustermann "patterns," as they are referred to within that library. The 2.2 implementation corrected a perceived bug in the operation of 2.1 by creating a Mustermann pattern for each "variable" route. This led to increased start-up time and now seems wasteful in light of the creation of another, separate pattern for every named route. The use of Tries in the 2.x routers works because it reduces the problem space. By focusing on segments in paths, much of the complexity baked into Musterman patterns is avoided and much greater performance is possible. But this has the side effect of restricting the feature set of the router. Some of these features could be re-implemented later, but they are likely to lead to reduced performance compared to the present state. This raises the question: how much performance is enough? I love that the Hanami 2.x routers offer a feature set and developer experience that rival that of Rails, while achieving performance comparable to Roda. However, Rails' poor router performance doesn't seem to be a vocal pain-point for Rails users. DiscussionWhere do we go from here? Below are the most plausible paths:
Next Steps on the Current PathIf we choose to convert this spike to earnest work, I would do the following:
I am eager to hear your opinions, and those of the community at large. It may be helpful to broadcast this conversation more widely for community input. Thank You!Thank you for letting me pursue this, and for your input along the way! |
I suspect this is because calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thank you so much for continuing work on this @dcr8898! This is a huge lift.
I made a couple tiny suggestions, but overall this is good with me, based on your description of the changes. I'm fine with removing support for features that were only documented via specs. If those features breaking is a problem for users, they can revert to an earlier version for now, and we can figure out how to add support for them, or provide a workaround.
I renamed this PR to be more accurate, since we're not actually removing all uses of Mustermann, just one important use of it.
Curious what @kyleplump and @timriley have to say as well!
lib/hanami/router.rb
Outdated
[ | ||
+EMPTY_STRING << match_data.pre_match << match_data[1] << match_data.post_match, | ||
+EMPTY_STRING << match_data.pre_match << match_data.post_match | ||
].each do |new_path| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating this intermediary array to iterate over it, could we repeat the conditional below for each new_path we create? Less DRY but possibly more performant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try this. It makes sense.
This point also highlights the difference between writing speed-sensitive library code and writing apps: apps are optimized for maintainability, while the library code is optimized for performance. That often requires abandoning a lot of code hygiene practices that we usually promote (like DRY in this case). Many of those good practices are based on indirection, like using constants to label magic values, but that indirection impacts performance. This was the basis of my question above: How much performance is enough? I'm not sure it's worth in-lining all of the constants in the Router, for example, but others might disagree.
It's also hard to know if these micro-optimizations have an effect when we are using gross measurements like r10k. With that said, I'm curious to try this one and see what happens. 🤓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a fresh look at this, since it only applies to optional routes, which are uncommon and not exercised by this benchmark at all, I wonder if this change is worthwhile. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Fast Ruby, interpolation should be slightly faster than the shovel method. Do you think this change is worth making?
[
-"#{match_data.pre_match}#{match_data[1]}#{match_data.post_match}",
-"#{match_data.pre_match}#{match_data.post_match}"
].each do |new_path|
This is what I was getting at with the "Question for All." I like searching for performance tweaks, and I think a router should be as performant as possible, but I agonize over the loss of readability and maintainability. In this case, I think switching to interpolation is probably okay, because the code remains understandable (I think). But I hesitate to un-DRY the code for what is a rare usage (optional clauses).
What do you all think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is honestly an insane amount of work @dcr8898, and all of the commendations are well deserved 🎉 💪
(made a few code level comments)
in terms of philosophy, I agree with a lot of what you brought up. I think about this problem from two points of view and I think you've landed at ideal solutions for both (in my opinion):
-
'Hanami in a bubble': if Hanami was the only framework in the world, I think the important metric to capture with this change is to make the common use case as fast as possible, while providing clear instructions on the 'exotic' cases. The 'common case' is something Tim defined, and you've achieved 'as fast as possible' so big win there ❤️ . for the 'edge / exotic' use cases I agree with what Sean said: we can provide workarounds / documentation on how to accomplish these cases, and even support them using a slower Mustermann based approach. all good on this point, 10/10
-
'The real world': unfortunately Hanami does not exist in a bubble and will be constantly compared to rails, whether that's something we collectively like or not. Since that comparison will be at the forefront of developers (especially newcomers) minds, we should honestly lean into the comparison. your observation here: "... offer a feature set and developer experience that rival that of Rails, while achieving performance comparable to Roda", I think is a huge selling point of this approach / the Hanami router generally. it's easy to get stuck optimizing against yourself, but it's important to remember the broader context in which the framework exists. your work widens the gap with Rails and I think makes this spike super appealing
in summary: 'LGTM 👍'
excellent as always @dcr8898 , I'm very much in favor of the change as long as @timriley and @cllns are
lib/hanami/router.rb
Outdated
[ | ||
+EMPTY_STRING << match_data.pre_match << match_data[1] << match_data.post_match, | ||
+EMPTY_STRING << match_data.pre_match << match_data.post_match | ||
].each do |new_path| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@cllns The fastest strategy I found was the one you suggested. I called it "SneakyArray." To keep it faithful to our use case, I still used It may be related to the fact that re-implementing return unless path[1..].split("/") do |segment|
node = node.get(segment, param_values)
break false if node.nil?
end This made the benchmark about 10% slower in the 10K route case. 🤔 |
Correction. If I inline the splitting exactly as above in However, when I do the same in Question for allIs this performance increase (about 2% with 10K routes, less with fewer routes) worth the increased complexity of this change? |
To answer your question in general: we should optimize performance for 10-100 routes, never 10k routes. The reason I had an issue with 10k routes in a previous implementation (what started this saga 😅 ) is that it was causing the startup time to be so high (~10 seconds) that it was essentially un-usable (since it would add so much time to starting the server + for integration tests). It makes sense that performance degrades at the upper limit, in terms of execution time. We need to be careful to not let it explode, but a small increase is fine. Sorry, I didn't realize you were already operating on a string. Since we use frozen string literals, we'll need to get copies instead of mutating the object anyway. Also, based on a quick benchmark, This code below is really hard for me to parse. Is there a way to simplify it? return unless split iterate break false. Could we just iterate over the segments and return if the node is nil?
|
With the above changes there is a noticeable performance increase--about 5% in the 10K route case. Memory use and start-up time are not noticeably affected. I don't know how far we want to go in pursuing further performance gains. This implementation is already markedly better than 2.1, 2.2, or 2.2b (which was the starting baseline for this exploration). Performance isn't the current goal, per se, but more of a baseline. But it is fun to see how far we can go. 🚀 The Big QuestionsIt seems like this is a viable path. Some "features" that were exercised in tests are not currently implemented, but I am confident most or all of them could be implemented in the future (although some them could impact performance). With that said:
Commit label: 2.2h |
Interesting! I will try it! 🤓 The docs say |
Hi @dcr8898, I just want to thank you so much for this fantastic work. Whatever direction we take, this has been a hugely informative investigation that I'm sure we'll refer back to in the future. This coming week I'm in the midst of talk writing (Baltic Ruby keynote coming up). This will take most of my focus. As soon as that's done I'll get onto responding to this PR :) |
Thank you! I agree, this has been an important learning journey, especially since Luca has stepped back and this was totally his creation. I look forward to whatever path we choose! It's all upwards from here for Hanami! 💮
No rush! I need to get back to looking for a job anyway! 🤓 |
Yeah, that's awful. 🙄 I'm going to chalk this up to tunnel-vision refactoring. I changed it to this: def find(path)
node = @root
param_values = []
path[1..].split(SEGMENT_SEPARATOR) do |segment|
node = node.get(segment, param_values)
break if node.nil?
end
node&.match(param_values)&.then { |found| [found.to, found.params] }
end Is that better? Any more indirection will reverse the performance gain I think. |
This seems like a good time to pause and wait for @timriley's input. I believe that the results so far show that this is a viable path. It's faster than using Mustermann because this approach shrinks the problem space. Mustermann is a powerful library for string pattern recognition, extraction, and expansion. However, our purpose is limited to path string parsing, not all string parsing. This is a smaller problem space. In this space, we can take advantage of the fact that path strings have a defined structure: segments, separated by I want to reiterate that we remain reliant on Mustermann for both globbed routes and all named route expanders (all or most routes in most apps), neither of which are exercised with this benchmark tool. This means our start-up time concerns remain unless we use an alternative for path string expansion. I think this is doable. The graphs below include 2.1 and 2.2b as reference points for our progress. Commit label: 2.2i |
Router Improvement
This will be a long-running, exploratory spike intended to remove the router's dependency on Mustermann and improve performance overall. Router performance is defined by two metrics: start up time and requests-per-second, both measured using the r10k benchmarking tool. I feel that this tool, r10k, is problematic for a number of reasons, but it is readily available and offers a general sense of performance changes as the spike progresses.
I invite any and all comments as this goes along. I am tagging the following contributors to request their input:
@kyleplump @cllns @timriley
Goals
Mustermann is a powerful string matching library that offers many features, most of which our router does not use. In fact, some of the work done by Mustermann is duplicated by the work the current router does in splitting route and path strings into segments in order to utilize trie data structures. Since Mustermann pattern objects are complex and resource intensive to create, eliminating this dependency should improve router startup time, performance in production, and reduced memory usage. I hope. 🙄
I believe that further improvements may be possible on these same metrics through a more general refactor of the router's operation. I would like to explore these ideas as well.
Checklists
Following are a series of checklists to guide progress on this spike. They should be considered "living lists" that will change during the process. Please feel free to edit the lists (or suggest edits) if they are not complete or detailed enough.
Current features
The Hanami Routing guide commits to supporting the following features (checked items have been re-implemented so far without Mustermann):
get "/books(/:id)"
.Features NOT implemented
These are undocumented--so far as the Hanami guides and the router README are concerned--features of the 2.1 and 2.2 routers that are powered by Mustermann and technically possible, but possibly can't be duplicated with 1-1 parity using my current approach.
"/test\\:variable"
"/test.:format"
.
character, like"/hey.:greed.html"
(captures:greed
).Edge cases
I am aware of the following edge cases that aren't specifically covered in the guides, but merit discussion. It would be good to establish explicit rules for them.
/books/:id
andbooks/:id/
as distinct, matchable routes (because it uses Mustermann to match the complete, original route definition).String#split
on path strings, which ignores the final character if it is the#split
parameter.Stretch goals
I believe these goals are possible with my current approach. I will implement them if there is consensus on their value.
#match?
, for example, and receive a string argument).Starting Point
The current 2.2 router introduced a performance regression compared to 2.1 (see #278 ). This commit (2.2a) corrects the core issue causing that regression, as identified and fixed by @kyleplump (see #279) and restores near parity to 2.1's speed, while increasing start up time (primarily due to Mustermann object creation during start up). This is our starting point, as shown in the following graphs.
Benchmark machine: AMD Ryzen 7 laptop with 24GB of RAM and no attempt to restrict background processes (in other words, YMMV).
Commit label: 2.2a
Test status: 544 examples, 0 failures, 37 pending
expand to see graphs