Skip to content

Remove Mustermann leaf matcher to improve performance #284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

dcr8898
Copy link
Contributor

@dcr8898 dcr8898 commented Mar 27, 2025

Router Improvement

This will be a long-running, exploratory spike intended to remove the router's dependency on Mustermann and improve performance overall. Router performance is defined by two metrics: start up time and requests-per-second, both measured using the r10k benchmarking tool. I feel that this tool, r10k, is problematic for a number of reasons, but it is readily available and offers a general sense of performance changes as the spike progresses.

I invite any and all comments as this goes along. I am tagging the following contributors to request their input:

@kyleplump @cllns @timriley

Goals

Mustermann is a powerful string matching library that offers many features, most of which our router does not use. In fact, some of the work done by Mustermann is duplicated by the work the current router does in splitting route and path strings into segments in order to utilize trie data structures. Since Mustermann pattern objects are complex and resource intensive to create, eliminating this dependency should improve router startup time, performance in production, and reduced memory usage. I hope. 🙄

I believe that further improvements may be possible on these same metrics through a more general refactor of the router's operation. I would like to explore these ideas as well.


Checklists

Following are a series of checklists to guide progress on this spike. They should be considered "living lists" that will change during the process. Please feel free to edit the lists (or suggest edits) if they are not complete or detailed enough.

Current features

The Hanami Routing guide commits to supporting the following features (checked items have been re-implemented so far without Mustermann):

  • Match segment-length dynamic variables without constraints.
  • Constraints on dynamic variables (expressed as regular expressions in the route definition).
  • Optional route clauses, like get "/books(/:id)".
  • Globbing & Catch all routes - this is unchanged from earlier versions (2.0 - 2.1). It is implemented through special handling in the router (uses Mustermann, but separate from routes stored in tries).

Features NOT implemented

These are undocumented--so far as the Hanami guides and the router README are concerned--features of the 2.1 and 2.2 routers that are powered by Mustermann and technically possible, but possibly can't be duplicated with 1-1 parity using my current approach.

  • Constraints using POSIX-like notation.
  • Escaped characters in route definition, like "/test\\:variable"
  • Sub-segment-length dynamic variables, like "/test.:format"
  • Sub-segment-length dynamic variables that are also delimited by . character, like "/hey.:greed.html" (captures :greed).

Edge cases

I am aware of the following edge cases that aren't specifically covered in the guides, but merit discussion. It would be good to establish explicit rules for them.

  • Trailing slashes
    • the 2.1 router did not handle trailing slashes correctly, leading to ambiguous results.
    • the 2.2 router does handle trailing slashes correctly and treats /books/:id and books/:id/ as distinct, matchable routes (because it uses Mustermann to match the complete, original route definition).
    • the 2.2c+ routers developed during this spike no longer handle trialing slashes correctly. This is because we use String#split on path strings, which ignores the final character if it is the #split parameter.

Stretch goals

I believe these goals are possible with my current approach. I will implement them if there is consensus on their value.

  • Conversion of URI-encoded values in path strings (not mentioned in the routes, but a basic necessity).
  • User defined constraint objects (must respond to #match?, for example, and receive a string argument).

Starting Point

The current 2.2 router introduced a performance regression compared to 2.1 (see #278 ). This commit (2.2a) corrects the core issue causing that regression, as identified and fixed by @kyleplump (see #279) and restores near parity to 2.1's speed, while increasing start up time (primarily due to Mustermann object creation during start up). This is our starting point, as shown in the following graphs.

Benchmark machine: AMD Ryzen 7 laptop with 24GB of RAM and no attempt to restrict background processes (in other words, YMMV).

Commit label: 2.2a
Test status: 544 examples, 0 failures, 37 pending

expand to see graphs

graph comparing 2.1 with 2.2a in requests per second. 2.2a is slightly slower as the number of defined routes increases

graph comparing 2.1 with 2.2a in start up time. 2.2a is much slower to start up as defined routes pass 1000

graph comparing 2.1 with 2.2a in memory usage. 2.2a uses increasingly higher memory in comparison as the number of defined routes increases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Mar 27, 2025

This commit addresses a second issue identified and fixed by @kyleplump in #279: using a regular expression to split route and path strings is much slower than using a string ("/").

With this commit, operating speed is faster than 2.1, but start up time and memory usage are unchanged from the previous commit.

Commit label: 2.2b
Test status: 544 examples, 0 failures, 37 pending

expand to see graphs

graph comparing 2.1 with 2.2b. 2.2b is faster in all instances

graph comparing 2.1 with 2.2b. 2.2b is much slower to start up as defined routes pass 1000

graph comparing 2.1 with 2.2b. 2.2b uses increasingly higher memory in comparison as the number of defined routes increases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Mar 29, 2025

This commit is the proof of concept for removing Mustermann. Basic functionality is implemented (see below). Performance is improved on all metrics. Some tests are failing, as expected at this point.

But this proof of concept also comes with a serious reality check.

Discussion

The graphs below show the performance benefits of eliminating Mustermann:

  • Requests per second is increased by about 10% across the board, now plainly outpacing 2.1.
  • Start up time is now half that of 2.1, and a seventh of 2.2a.
  • Memory usage is virtually identical to 2.1

The price of these increases, however, is features. This commit implements "segment-sized" dynamic variables, such as :id in "/books/:id/new". It does not implement constraints. While this commit probably covers 80% (or more) of the common usage of the router, it is not yet on 1-1 parity with the 2.2a router (as documented in the Hanami guides).

However, I now see from most of the currently failing tests that the specs do document more exotic usage patterns, not documented in the guides or README, that are available through the use of Mustermann. This includes, for example, dynamic variables embedded within a segment, such as: "/hey.:greed.html" (captures :greed). While these usage patterns are not discussed in the guides, their presence in the specs is evidence that they may be in use in the wild. Removing them could be considered a breaking change.

I'm not sure if I can duplicate all of these usage patterns using my current approach, but I want to do more thinking about this.

This raises the question of what features we want to support? Are we willing to sacrifice some of the flexibility of Mustermann for better performance?

These are our current options, as I see it:

  • 2.1 router - fast but not "correct" (unless we define correct differently--always an option).
  • 2.2b router - correct and fast, except for startup time.
  • 2.2c+ router - fast and correct, based on a smaller feature set (my current strategy).
  • Something else?

I am happy to elaborate on my current approach, if anyone would like to help me think through alternatives.

Unless the discussion here pushes me in a different directions, I plan to implement constraints next (via regular expressions), and URI decoding of all variable segments (interestingly, a core feature of Mustermann, but not exercised in the current test suite).

Commit label: 2.2c
Test status: 544 examples, 35 failures, 41 pending

expand to see graphs

graph comparing 2.1 with 2.2c in requests per second for up to 10000 defined routes. 2.2c is faster in all cases

graph comparing 2.1 with 2.2c in start up time. 2.2c is faster in all cases.

graph comparing 2.1 with 2.2c in memory usage. 2.2c uses almost identical memory in all cases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Mar 29, 2025

@kyleplump @cllns @timriley I am tagging you all for a temperature check as to the direction I am going. No rush to respond. I will be traveling for nine days starting Tuesday, so I thought this would be a good time for reflection and conversation. It may also be a good time to broadcast this exploration to the wider community for feedback.

I doubt I will do any coding while traveling, but I may be able to comment. It might be possible to add basic constraints before I go, but no promises. Constraints would have no effect on this benchmark, since the benchmark doesn't exercise them.

Way Point No. 1

Let's see where we stand so far. Here are three graphs comparing:

  • Hanami 2.1
  • Hanami 2.2c
  • Roda 3.90.0 (current)
  • Rails 8.0.2 (current)

I will say again that r10k is not a great tool for reflecting real-world usage (at all), but it may be useful for gross comparisons.

It is clear from these graphs that Hanami router in any configuration occupies a sweet spot in terms of performance and developer experience: developer experience on par with Rails, and performance approaching that of Roda. This is Luca's achievement and should be recognized and appreciated. 🙇

Rails routing performance isn't close on any measure. This is food for thought.

Questions for the community

  • What are the features we need for the Hanami router to be successful?
  • What are our performance goals for Hanami router?
expand to see graphs

comparing hanami 2.1, hanami 2.2c, roda, and rails based on requests per second

comparing hanami 2.1, hanami 2.2c, roda, and rails based on start up time

comparing hanami 2.1, hanami 2.2c, roda, and rails based on memory usage

@dcr8898
Copy link
Contributor Author

dcr8898 commented Mar 30, 2025

This commit adds basic constraints using regular expressions. There is a variation on regular expression constraints available with Mustermann that allows POSIX-like notations. I added this as a feature in the Not Implemented list. Let me know if this should be a stretch goal.

At this point, we have essentially matched the feature set described in the Hanami router guide. However, I don't think we have full parity until we have URI decoding in place (a feature of Mustermann). Even though this is not mentioned in the guide, I see it as basic functionality. I will try to implement that next.

As of now, only 16 tests are failing (down from 35 in the prior commit). Four of these are unit tests that I need to re-work, remove, or skip. The other 12 are "exotic" applications of Mustermann, like optional segments, multiple dynamic variables in a segment, and escaped characters.

Benchmark performance is indistinguishable from 2.2c.

Commit label: 2.2d
Test status: 544 examples, 16 failures, 41 pending

expand to see graphs

graph comparing 2.1 with 2.2d in requests per second for up to 10000 defined routes. 2.2d is faster in all cases

graph comparing 2.1 with 2.2d in start up time. 2.2d is faster in all cases.

graph comparing 2.1 with 2.2d in memory usage. 2.2d uses almost identical memory in all cases.

Next up is URI decoding.

Next is fixing unit tests and considering skipping remaining failing tests as undocumented behavior that is not representative of current requirements.

@timriley
Copy link
Member

Thank you for your work on this, @dcr8898! It is exemplary as always.

I'll get to looking at the code soon, but I did want to answer this question of yours:

This raises the question of what features we want to support? Are we willing to sacrifice some of the flexibility of Mustermann for better performance?

If I look at the README overview for the Rails flavour of Mustermann, I'm interested in us supporting the features exercised in all three of the example routes:

pattern = Mustermann.new('/:example', type: :rails)
pattern === "/foo.bar"     # => true
pattern === "/foo/bar"     # => false
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => nil

pattern = Mustermann.new('/:example(/:optional)', type: :rails)
pattern === "/foo.bar"     # => true
pattern === "/foo/bar"     # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar", "optional" => nil   }
pattern.params("/foo/bar") # => { "example" => "foo",     "optional" => "bar" }

pattern = Mustermann.new('/*example', type: :rails)
pattern === "/foo.bar"     # => true
pattern === "/foo/bar"     # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => { "example" => "foo/bar" }

:name (capture everything except /) is a mainstay in routing and it's obvious we want that.

*name (capture everything including /) is an important escape hatch and a helpful counterpart to :name.

As for (/:optional), I personally wished for that when building the new Hanami site earlier this year, but I think(?) we lost it in the move to the Hanami-router 2.0 release, so I had to define duplicate routes to handle this case:

get "/guides/:org/:version/:slug", to: "guides.show"
get "/guides/:org/:version/:slug/*path", to: "guides.show"

A route of /guides/:org/:version/:slug(/*path) would have felt more elegant and certainly have been pleasing.

If we can support these three things, then I think we'll be able to express most routes well enough.

It also means we'll largely maintain compatibility with the routing support we've offered historically.

I'm not sure if this throws a spanner in your works, but I'm keen to hear your response to the above, @dcr8898 😄

@dcr8898
Copy link
Contributor Author

dcr8898 commented Mar 31, 2025

@timriley Thanks for the guidance! It's invaluable!

Quick overview of the router's function:

  • The Router object holds several Tries.
  • Each Trie is made up of some number of Nodes.
  • Any Node that is an endpoint holds a leaf for each route that ends there (usually one leaf, but more are possible).

To address your three feature requests out of order:

  • Typical usage. This is good to go now. 👍 This is implemented in the Tries and their child objects.
pattern = Mustermann.new('/:example', type: :rails)
pattern === "/foo.bar"     # => true
pattern === "/foo/bar"     # => false
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => nil
  • Globbing and catch-all routes. This is described in the guides. This is implemented in the Router object. I have not looked at or changed this in any way from 2.1, so I believe this should also work now. 👍
pattern = Mustermann.new('/*example', type: :rails)
pattern === "/foo.bar"     # => true
pattern === "/foo/bar"     # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => { "example" => "foo/bar" }
  • Optional route parts. This is not mentioned in the guides, but may have been partially doable in 2.1 as long as the optional part was within a route string segment and included a captured parameter. This may have worked because each segment with a dynamic variable defined within it got it's own Mustermann matcher. However, optional parts that included a forward slash or spanned two segments may have broken the 2.1 router (I think).

    In any event, I think optional parts are totally doable by automating your "duplicate route" strategy right in the Router object (that's been my plan). However, defining routes for all permutations of optional parts could get crazy if there are multiple optional parts or nested optional parts. I suppose we have to allow for that and advise users to use sparingly. 🤔

pattern = Mustermann.new('/:example(/:optional)', type: :rails)
pattern === "/foo.bar"     # => true
pattern === "/foo/bar"     # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar", "optional" => nil   }
pattern.params("/foo/bar") # => { "example" => "foo",     "optional" => "bar" }

Takeaways

My takeaway, since two of your requests should be working now.

  • Move optional parts from stretch goals to main goals.

Further thoughts

The current test suite includes what I've been calling "exotic" usages of Mustermann. For example, multiple dynamic variables in a segment or dynamic variables that are a portion of a segment. I believe that these usage patterns could be compatible with my current approach by . . . using Mustermann for these specific cases. We should be able to detect these "exotic" usages and only use Mustermann for the segments that are affected.

I like this approach of matching power to demand. The majority of use cases would be unaffected. This was the primary performance problem of the 2.2 router: invoking the full power of Mustermann for all routes, when it is not needed for the most common use cases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Mar 31, 2025

@timriley

I'm not sure what you mean by this bit:

*name (capture everything including /) is an important escape hatch and a helpful counterpart to :name.

The splat doesn't capture the leading / in ("/*name"), but it will capture any character after that, including any further /`s.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 16, 2025

This commit refactors tests to gain clarity before moving forward.

At the start of the spike there were 37 skipped tests.

Prior to this commit, 16 tests were failing and 41 were skipped (I skipped a few more unit tests along the way).

This commit does the following:

  • Refactor unit tests for Trie, Node, and Leaf classess. Tests now match interface changes made during the spike. I started by un-skipping any tests I had skipped during the spike, then some tests were added and some updated or removed. (Wish I had TDDed this. 🫤 ) This refactor also led to two small changes in the Node and Leaf classes.
  • Marked the remaining failing tests (12) as skipped, with a FIXME: not supported in 2.2.1 comment on each. These tests all exercise presently unsupported behavior, such as escaped characters in route definitions, or sub-segment level dynamic variables (see my comments on "exotic" Mustermann usages above). These features are possible, but are not supported at this point in the spike.

Thankfully, CI failures should now be gone too. 🥳

Benchmark performance is indistinguishable from 2.2d.

Commit label: 2.2e
Test status: 549 examples, 0 failures, 49 pending

Next up is support for optional clauses.

expand to see graphs

graph comparing 2.1 with 2.2e in requests per second for up to 10000 defined routes. 2.2e is faster in all cases

graph comparing 2.1 with 2.2e in start up time. 2.2e is faster in all cases.

graph comparing 2.1 with 2.2e in memory usage. 2.2e uses almost identical memory in all cases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 18, 2025

This commit implements optional route clauses. Nested optional clauses and consecutive optional clauses are supported. This is implemented by defining routes for every possible permutation of optional clauses.

This concludes the basic features that @timriley requested, although some more polish is probably needed, like automatic URI decoding. This is a good place for another way point and a meaningful conversation about where to go from here, especially as pertains to Mustermann. However, before we do that, I am curious to pursue a few more performance tweaks.

Benchmark performance is indistinguishable from 2.2e.

Commit label: 2.2f
Test status: 555 examples, 0 failures, 46 pending

Next up is some performance experiments.

expand to see graphs

graph comparing 2.1 with 2.2f in requests per second for up to 10000 defined routes. 2.2f is faster in all cases

graph comparing 2.1 with 2.2f in start up time. 2.2f is faster in all cases.

graph comparing 2.1 with 2.2f in memory usage. 2.2f uses almost identical memory in all cases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 24, 2025

I have exhausted my ideas for performance improvements. I feel that this commit concludes this spike. I will declare a way point below and invite a discussion as to whether this spike should be pursued as an actual PR, and what further changes would be needed.

Improvements

The changes I made here centered mostly around replacing calls to Regexp#match? with String#inlcude?, since the String method is demonstrably faster. Making this change to detect "variable" routes in Router, and segments in Node, gave a noticeable (but small) boost to start-up time and requests per second. I also removed some minor indirection in Node (an unnecessary local variable).

Next, I converted the "globbed route" and "optional route" detection in Router from a Regexp to a String method. However, this did not seem to have much effect. I find this a little odd, even though these types of routes are not exercised by this benchmark, because these checks are made for every route when compiling the Tries. In fact, it seems to me that the rps benchmarks may have been slightly lower after this change for the 10, 100, and 1000 route cases. Weird.

Another change I attempted was to use String#split with a block directly, instead of String#split#each. My tests show that passing a block directly to #split is far faster than tacking on a call to #each. However, the router was unequivocally slower with this implementation. I don't know why.

When using #each, we throw away the first segment because it is always empty (since path strings always start with /). When using #split directly, I operated on the substring path[1..] to achieve the same effect. Again, tests show that this should not make a difference speed-wise, but the benchmark showed differently and I abandoned this change completely.

A keen eye will show that RPS and Runtime are slightly improved from 2.2f.

Commit label: 2.2g
Test status: 555 examples, 0 failures, 46 pending

expand to see graphs

graph comparing 2.1 with 2.2g in requests per second for up to 10000 defined routes. 2.2g is faster in all cases

graph comparing 2.1 with 2.2g in start up time. 2.2g is faster in all cases.

graph comparing 2.1 with 2.2g in memory usage. 2.2g uses almost identical memory in all cases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 24, 2025

@timriley @cllns @kyleplump , once again I am tagging you to request your input.

I believe the exploratory phase of this spike is over, and we should discuss whether to pursue this refactor of the router in earnest.

Way Point No. 2

Functionality:

  • All of the features requested by @timriley are implemented.
  • All features documented in the Hanami routing guide are implemented (let me know if this is incorrect).
  • All tests are passing.
  • Nine tests that were previously passing have been skipped as not representative of our documented feature set (these tests would otherwise fail at present).

Goals Achieved

This spike demonstrates that Mustermann is not needed for the majority of the Router's operation, and that its removal will actually improve performance.

However, Mustermann usage is not gone completely: it is still used for all "globbed" routes and, more importantly, Mustermann objects are still used as "expanders" to generate relative and absolute paths for all named routes. This is an important observation: if the r10k benchmark used all named routes, as is common practice in the real world, then current start-up time would still be in the 10-second range for 10K routes. This would be true of the 2.1 router as well, and the 2.2b router would probably be in the 20-second range.

Performance:

  • Requests per second is improved in every scenario tested by this benchmark tool, and is close to that of Roda.
  • Start-up time is dramatically improved from 2.2, and is also better than 2.1.
  • Memory usage is unchanged.

Reflections

If all routes were fixed, with no dynamic variables, then every router would use a Hash for lookup and enjoy O(1) performance. However, the presence of dynamic variables makes matching requests to routes much more complicated. Sinatra checks every defined route, one after another, until it finds a match. Rails must optimize this strategy to some extent because it performs better than Sinatra, but is still far slower than Hanami or Roda on every measure.

This is the paradox of Mustermann: it is wonderfully powerful, but expensive to instantiate, and inherently limits the strategies that can be used to speed up routing.

In creating the 2.0/2.1 router, Luca chose to chose to pursue performance by striking a compromise: using Tries to improve route matching performance while still using Mustermann to match dynamic segments and to provide path "expanders." This worked, but imposed significant start-up time requirements for the creation of Mustermann "patterns," as they are referred to within that library.

The 2.2 implementation corrected a perceived bug in the operation of 2.1 by creating a Mustermann pattern for each "variable" route. This led to increased start-up time and now seems wasteful in light of the creation of another, separate pattern for every named route.

The use of Tries in the 2.x routers works because it reduces the problem space. By focusing on segments in paths, much of the complexity baked into Musterman patterns is avoided and much greater performance is possible. But this has the side effect of restricting the feature set of the router. Some of these features could be re-implemented later, but they are likely to lead to reduced performance compared to the present state.

This raises the question: how much performance is enough?

I love that the Hanami 2.x routers offer a feature set and developer experience that rival that of Rails, while achieving performance comparable to Roda. However, Rails' poor router performance doesn't seem to be a vocal pain-point for Rails users.

Discussion

Where do we go from here? Below are the most plausible paths:

  • Revert to 2.1. This is possible if we define the "bug" corrected in 2.2 as accepted behavior.
  • Stay with 2.2b. This is the 2.2 router with the performance regression corrected. This router could be improved by correcting the unnecessary creation of duplicate Mustermann patterns.
  • Removing Mustermann, as explored in this spike. I will discuss my ideas for pursuing this path further below.
  • Something else? I'm all ears. 🤓

Next Steps on the Current Path

If we choose to convert this spike to earnest work, I would do the following:

  • Keep Mustermann patterns for globbed routes--for now. These routes will always be special cases handled outside of the Tries. For now I would keep Mustermann for this. This would have the strange effect of allowing globbed routes to do things that non-globbed routes cannot do, such as, GET "/test/*glob/:v1-:v2-:v3" (this would return parameters for glob, v1, v2, and v3). It is possible to write our own parser for globbed routes, but they would still need to be handled outside of the Tries.
  • Likewise, I would keep Mustermann for creating named route expanders for now. However, it is not necessary to use a Mustermann pattern for this. Mustermann offers a simpler Expander object for this single purpose that is likely less expensive to instantiate. (EDIT: it is not simpler or faster 😞) If this still proved to be untenable due to start-up times, we could generate our own simple expanders pretty easily. I only lean towards Mustermann expanders because of the edge case presented by expanding routes with optional clauses.
  • Finally, I would make some changes to the router's data structure. I believe it can be simplified in a way that should reduce start-up time and memory usage, and might improve performance slightly (I'm not sure).

I am eager to hear your opinions, and those of the community at large. It may be helpful to broadcast this conversation more widely for community input.

Thank You!

Thank you for letting me pursue this, and for your input along the way!

expand to see graphs comparing 2.2g with 2.1, Roda, and Rails

comparing hanami 2.1, hanami 2.2g, roda, and rails based on requests per second

comparing hanami 2.1, hanami 2.2g, roda, and rails based on start up time

comparing hanami 2.1, hanami 2.2g, roda, and rails based on memory usage

@cllns
Copy link
Member

cllns commented Apr 24, 2025

Another change I attempted was to use String#split with a block directly, instead of String#split#each. My tests show that passing a block directly to #split is far faster than tacking on a call to #each. However, the router was unequivocally slower with this implementation. I don't know why.

When using #each, we throw away the first segment because it is always empty (since path strings always start with /). When using #split directly, I operated on the substring path[1..] to achieve the same effect. Again, tests show that this should not make a difference speed-wise, but the benchmark showed differently and I abandoned this change completely.

I suspect this is because calling path[1..] creates a new array, so it negates the optimization of calling split without each that avoids an allocation. I'm not sure where that's used but maybe you could do path.shift immediately after creation, or even perhaps shift the path string itself before it's split, so the leading slash isn't included?

@cllns cllns changed the title remove Mustermann and improve performance Remove Mustermann leaf matcher to improve performance Apr 24, 2025
Copy link
Member

@cllns cllns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thank you so much for continuing work on this @dcr8898! This is a huge lift.

I made a couple tiny suggestions, but overall this is good with me, based on your description of the changes. I'm fine with removing support for features that were only documented via specs. If those features breaking is a problem for users, they can revert to an earlier version for now, and we can figure out how to add support for them, or provide a workaround.

I renamed this PR to be more accurate, since we're not actually removing all uses of Mustermann, just one important use of it.

Curious what @kyleplump and @timriley have to say as well!

[
+EMPTY_STRING << match_data.pre_match << match_data[1] << match_data.post_match,
+EMPTY_STRING << match_data.pre_match << match_data.post_match
].each do |new_path|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating this intermediary array to iterate over it, could we repeat the conditional below for each new_path we create? Less DRY but possibly more performant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this. It makes sense.

This point also highlights the difference between writing speed-sensitive library code and writing apps: apps are optimized for maintainability, while the library code is optimized for performance. That often requires abandoning a lot of code hygiene practices that we usually promote (like DRY in this case). Many of those good practices are based on indirection, like using constants to label magic values, but that indirection impacts performance. This was the basis of my question above: How much performance is enough? I'm not sure it's worth in-lining all of the constants in the Router, for example, but others might disagree.

It's also hard to know if these micro-optimizations have an effect when we are using gross measurements like r10k. With that said, I'm curious to try this one and see what happens. 🤓

Copy link
Contributor Author

@dcr8898 dcr8898 Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a fresh look at this, since it only applies to optional routes, which are uncommon and not exercised by this benchmark at all, I wonder if this change is worthwhile. Thoughts?

Copy link
Contributor Author

@dcr8898 dcr8898 Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Fast Ruby, interpolation should be slightly faster than the shovel method. Do you think this change is worth making?

        [
          -"#{match_data.pre_match}#{match_data[1]}#{match_data.post_match}",
          -"#{match_data.pre_match}#{match_data.post_match}"
        ].each do |new_path|

This is what I was getting at with the "Question for All." I like searching for performance tweaks, and I think a router should be as performant as possible, but I agonize over the loss of readability and maintainability. In this case, I think switching to interpolation is probably okay, because the code remains understandable (I think). But I hesitate to un-DRY the code for what is a rare usage (optional clauses).

What do you all think?

Copy link
Member

@kyleplump kyleplump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is honestly an insane amount of work @dcr8898, and all of the commendations are well deserved 🎉 💪


(made a few code level comments)


in terms of philosophy, I agree with a lot of what you brought up. I think about this problem from two points of view and I think you've landed at ideal solutions for both (in my opinion):

  1. 'Hanami in a bubble': if Hanami was the only framework in the world, I think the important metric to capture with this change is to make the common use case as fast as possible, while providing clear instructions on the 'exotic' cases. The 'common case' is something Tim defined, and you've achieved 'as fast as possible' so big win there ❤️ . for the 'edge / exotic' use cases I agree with what Sean said: we can provide workarounds / documentation on how to accomplish these cases, and even support them using a slower Mustermann based approach. all good on this point, 10/10

  2. 'The real world': unfortunately Hanami does not exist in a bubble and will be constantly compared to rails, whether that's something we collectively like or not. Since that comparison will be at the forefront of developers (especially newcomers) minds, we should honestly lean into the comparison. your observation here: "... offer a feature set and developer experience that rival that of Rails, while achieving performance comparable to Roda", I think is a huge selling point of this approach / the Hanami router generally. it's easy to get stuck optimizing against yourself, but it's important to remember the broader context in which the framework exists. your work widens the gap with Rails and I think makes this spike super appealing


in summary: 'LGTM 👍'

excellent as always @dcr8898 , I'm very much in favor of the change as long as @timriley and @cllns are

[
+EMPTY_STRING << match_data.pre_match << match_data[1] << match_data.post_match,
+EMPTY_STRING << match_data.pre_match << match_data.post_match
].each do |new_path|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 25, 2025

I suspect this is because calling path[1..] creates a new array, so it negates the optimization of calling split without each that avoids an allocation. I'm not sure where that's used but maybe you could do path.shift immediately after creation, or even perhaps shift the path string itself before it's split, so the leading slash isn't included?

@cllns path[1..] here is operating on and returning a string. You can find the benchmark code I used for the different strategies here. The "ArraySegmenter" (what we are using) uses #split and throws away the first array member. (Do you think this effectively creates two arrays?)

The fastest strategy I found was the one you suggested. I called it "SneakyArray." To keep it faithful to our use case, I still used string[1..] before calling #split. It is decisively faster in the benchmark, but was noticeably slower when I tried it here. It's a bit confounding. 🧐

It may be related to the fact that re-implementing Trie#find in this way requires us to manually implement the functionality of #all? within the block passed to #split. I did this by calling break false if node.get returned nil on any segment. Something like this:

return unless path[1..].split("/") do |segment|
  node = node.get(segment, param_values)

  break false if node.nil?
end

This made the benchmark about 10% slower in the 10K route case. 🤔

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 25, 2025

return unless path[1..].split("/") do |segment|
  node = node.get(segment, param_values)

  break false if node.nil?
end

This made the benchmark about 10% slower in the 10K route case. 🤔

Correction. If I inline the splitting exactly as above in Trie#find, there is a small performance boost. I will commit this.

However, when I do the same in Trie#add, the situation is not clear cut. It is significantly slower with 10K routes, and faster with 10, 100 and 1K routes. Still weird. I won't change this for now.

Question for all

Is this performance increase (about 2% with 10K routes, less with fewer routes) worth the increased complexity of this change?

@cllns
Copy link
Member

cllns commented Apr 25, 2025

Question for all

Is this performance increase (about 2% with 10K routes, less with fewer routes) worth the increased complexity of this change?

To answer your question in general: we should optimize performance for 10-100 routes, never 10k routes. The reason I had an issue with 10k routes in a previous implementation (what started this saga 😅 ) is that it was causing the startup time to be so high (~10 seconds) that it was essentially un-usable (since it would add so much time to starting the server + for integration tests). It makes sense that performance degrades at the upper limit, in terms of execution time. We need to be careful to not let it explode, but a small increase is fine.

Sorry, I didn't realize you were already operating on a string. Since we use frozen string literals, we'll need to get copies instead of mutating the object anyway. Also, based on a quick benchmark, String#slice is ~7% faster than String#[]. Mind experimenting with that?

This code below is really hard for me to parse. Is there a way to simplify it? return unless split iterate break false. Could we just iterate over the segments and return if the node is nil?

return unless path[1..].split("/") do |segment|
  node = node.get(segment, param_values)

  break false if node.nil?
end

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 25, 2025

With the above changes there is a noticeable performance increase--about 5% in the 10K route case. Memory use and start-up time are not noticeably affected.

I don't know how far we want to go in pursuing further performance gains. This implementation is already markedly better than 2.1, 2.2, or 2.2b (which was the starting baseline for this exploration). Performance isn't the current goal, per se, but more of a baseline. But it is fun to see how far we can go. 🚀

The Big Questions

It seems like this is a viable path. Some "features" that were exercised in tests are not currently implemented, but I am confident most or all of them could be implemented in the future (although some them could impact performance).

With that said:

  • Do we want to implement this approach?
  • If so, do we merge it now or are there further changes required?

Commit label: 2.2h
Test status: 555 examples, 0 failures, 46 pending

expand to see graphs

graph comparing 2.1 with 2.2h in requests per second for up to 10000 defined routes. 2.2h is faster in all cases

graph comparing 2.1 with 2.2h in start up time. 2.2h is faster in all cases.

graph comparing 2.1 with 2.2h in memory usage. 2.2h uses almost identical memory in all cases.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 25, 2025

Sorry, I didn't realize you were already operating on a string. Since we use frozen string literals, we'll need to get copies instead of mutating the object anyway. Also, based on a quick benchmark, String#slice is ~7% faster than String#[]. Mind experimenting with that?

Interesting! I will try it! 🤓 The docs say #slice is an alias for []. I'm surprised there's a difference.

@timriley
Copy link
Member

Hi @dcr8898, I just want to thank you so much for this fantastic work. Whatever direction we take, this has been a hugely informative investigation that I'm sure we'll refer back to in the future.

This coming week I'm in the midst of talk writing (Baltic Ruby keynote coming up). This will take most of my focus. As soon as that's done I'll get onto responding to this PR :)

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 26, 2025

Hi @dcr8898, I just want to thank you so much for this fantastic work. Whatever direction we take, this has been a hugely informative investigation that I'm sure we'll refer back to in the future.

Thank you! I agree, this has been an important learning journey, especially since Luca has stepped back and this was totally his creation. I look forward to whatever path we choose! It's all upwards from here for Hanami! 💮

This coming week I'm in the midst of talk writing (Baltic Ruby keynote coming up). This will take most of my focus. As soon as that's done I'll get onto responding to this PR :)

No rush! I need to get back to looking for a job anyway! 🤓

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 27, 2025

This code below is really hard for me to parse. Is there a way to simplify it? return unless split iterate break false. Could we just iterate over the segments and return if the node is nil?

return unless path[1..].split("/") do |segment|
  node = node.get(segment, param_values)

  break false if node.nil?
end

Yeah, that's awful. 🙄 I'm going to chalk this up to tunnel-vision refactoring. I changed it to this:

def find(path)
  node = @root
  param_values = []

  path[1..].split(SEGMENT_SEPARATOR) do |segment|
    node = node.get(segment, param_values)

    break if node.nil?
  end

  node&.match(param_values)&.then { |found| [found.to, found.params] }
end

Is that better? Any more indirection will reverse the performance gain I think.

@dcr8898
Copy link
Contributor Author

dcr8898 commented Apr 29, 2025

This seems like a good time to pause and wait for @timriley's input.

I believe that the results so far show that this is a viable path. It's faster than using Mustermann because this approach shrinks the problem space. Mustermann is a powerful library for string pattern recognition, extraction, and expansion. However, our purpose is limited to path string parsing, not all string parsing.

This is a smaller problem space. In this space, we can take advantage of the fact that path strings have a defined structure: segments, separated by / characters. This simple realization powers the speed of the trie data structures and simplifies our parsing work for parameter extraction.

I want to reiterate that we remain reliant on Mustermann for both globbed routes and all named route expanders (all or most routes in most apps), neither of which are exercised with this benchmark tool. This means our start-up time concerns remain unless we use an alternative for path string expansion. I think this is doable.

The graphs below include 2.1 and 2.2b as reference points for our progress.

Commit label: 2.2i
Test status: 555 examples, 0 failures, 46 pending

expand to see graphs

graph comparing 2.1, 2.2b, and 2.2h in requests per second for up to 10000 defined routes. 2.2i is faster in all cases

graph comparing 2.1, 2.2b, and 2.2h in start up time. 2.2i is faster in all cases.

graph comparing 2.1, 2.2b, and 2.2h in memory usage. 2.2i uses almost identical memory to 2.1, and less than 2.2b, in all cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants