java: Use a generated keyword matcher to improve performance #445

mpkorstanje · 2025-07-30T08:49:44Z

🤔 What's changed?

Improve performance by matching step and title keywords using a generated matcher class.
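To make the idea concrete, here is a minimal sketch of what an unrolled matcher for the English step keywords could look like next to the loop it replaces. The class and method names (`LoopStepKeywordMatcher`, `UnrolledEnglishStepKeywordMatcher`, `matchStepKeyword`) are hypothetical and not taken from this pull request; the actual generated code may be structured differently.

```java
import java.util.List;

// Hypothetical baseline: walks a keyword list at runtime.
final class LoopStepKeywordMatcher {
    private static final List<String> KEYWORDS =
            List.of("Given ", "When ", "Then ", "And ", "But ", "* ");

    /** Returns the matching keyword, or null when the line is not a step. */
    static String matchStepKeyword(String line) {
        for (String keyword : KEYWORDS) {
            if (line.startsWith(keyword)) {
                return keyword;
            }
        }
        return null;
    }
}

// Hypothetical generated form: one explicit check per keyword, no loop.
final class UnrolledEnglishStepKeywordMatcher {
    /** Same contract as the loop version, with the iteration written out. */
    static String matchStepKeyword(String line) {
        if (line.startsWith("Given ")) return "Given ";
        if (line.startsWith("When ")) return "When ";
        if (line.startsWith("Then ")) return "Then ";
        if (line.startsWith("And ")) return "And ";
        if (line.startsWith("But ")) return "But ";
        if (line.startsWith("* ")) return "* ";
        return null;
    }
}
```

Both versions check the keywords in the same order, so they return the same result for any input line; only the shape of the code differs.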

Before main @ b3413a9

Benchmark                          Mode  Cnt     Score    Error  Units
GherkinParserBenchmarkTest.parse  thrpt    5  1940.849 ± 17.126  ops/s

After

Benchmark                          Mode  Cnt     Score    Error  Units
GherkinParserBenchmarkTest.parse  thrpt    5  2213.691 ± 11.385  ops/s
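
For reference, the relative gain between the two runs can be worked out from the scores above. The helper below is a hypothetical sketch, not part of the benchmark itself; it computes `(after - before) / before * 100`, which comes to about 14% here. The difference between the scores (roughly 273 ops/s) is well outside both reported error margins.

```java
// Hypothetical helper; the scores are copied from the two benchmark runs above.
final class ThroughputGain {
    /** Relative throughput gain in percent: (after - before) / before * 100. */
    static double percentGain(double beforeOpsPerSec, double afterOpsPerSec) {
        return (afterOpsPerSec - beforeOpsPerSec) / beforeOpsPerSec * 100.0;
    }

    public static void main(String[] args) {
        double gain = percentGain(1940.849, 2213.691);
        System.out.printf("%.1f%% more operations per second%n", gain);
    }
}
```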

⚡️ What's your motivation?

In #443, @jkronegg shows that unrolling the loops over the keywords speeds up Gherkin parsing significantly. I'm not sure about the theoretical underpinnings, but the effect is there. This PR refines that solution by making the unrolling work for an arbitrary number of keywords in each language.
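
One way to unroll for an arbitrary number of keywords is to emit the checks from the keyword list at build time. The sketch below does this with a plain `StringBuilder`; it is an assumption for illustration only. This pull request uses a template (`keyword-matchers.java.ftl`) whose contents are not reproduced here, and the names `UnrolledMatcherGenerator`, `generate` and `quote` are hypothetical.

```java
import java.util.List;

// Hypothetical source generator; not the template used in this pull request.
final class UnrolledMatcherGenerator {

    /** Emits Java source for a method with one startsWith check per keyword. */
    static String generate(String methodName, List<String> keywords) {
        StringBuilder source = new StringBuilder();
        source.append("static String ").append(methodName).append("(String line) {\n");
        for (String keyword : keywords) {
            String literal = quote(keyword);
            source.append("    if (line.startsWith(").append(literal)
                  .append(")) return ").append(literal).append(";\n");
        }
        source.append("    return null;\n");
        source.append("}\n");
        return source.toString();
    }

    // Escapes backslashes and double quotes so the keyword forms a valid
    // Java string literal. Assumes keywords contain no control characters.
    private static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Calling `UnrolledMatcherGenerator.generate("matchStepKeyword", List.of("Given ", "When "))` returns the source of a method with two explicit checks followed by `return null;`. The loop runs once in the generator, so the emitted method contains none, however many keywords a language defines.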

🏷️ What kind of change is this?

🏦 Refactoring/debt/DX (improvement to code design, tooling, etc. without changing behaviour)

♻️ Anything particular you want feedback on?

📋 Checklist:

I agree to respect and uphold the Cucumber Community Code of Conduct
I've changed the behaviour of the code
- I have added/updated tests to cover my changes.
My change requires a change to the documentation.
- I have updated the documentation accordingly.
Users should know about my change
- I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

jkronegg

Nice job! The generated code looks good and gives confidence in the loop unrolling.
On my real-world project with 740 BDD scenarios, GherkinMessagesFeatureParser.parse is 25% faster than with the main branch (100 ms -> 75 ms).

java/src/main/java/io/cucumber/gherkin/GherkinTokenMatcher.java

java/src/codegen/resources/templates/keyword-matchers.java.ftl

mpkorstanje · 2025-07-30T13:20:24Z

> On my real-world project with 740 BDD scenarios, GherkinMessagesFeatureParser.parse is 25% faster than with the main branch (100 ms -> 75 ms).

That is incredible. If you have more ideas I'd be happy to see them. 👍

mpkorstanje added a commit that referenced this pull request Jul 30, 2025

java: Make package private classes final

ff52902

Aside from being good practice, extensibility if not designed for should be prohibited, this will help isolate some of the effects of #445.

mpkorstanje mentioned this pull request Jul 30, 2025

java: Make package private classes final #446

Merged

mpkorstanje added a commit that referenced this pull request Jul 30, 2025

java: Make package private classes final (#446)

b3413a9

Aside from being good practice, extensibility if not designed for should be prohibited, this will help isolate some of the effects of #445.

mpkorstanje added 6 commits July 30, 2025 10:51

Use a (to be) generated matcher for faster keyword matching

ef3db6b

Also fetch the keyword from the matcher

d0fc8aa

Also fetch the length from the matcher

daceacb

Use generated code

8e82444

Generate all matchers

4c4fcf0

Sort all keywords

3551231

mpkorstanje force-pushed the generated-matcher-with-keyword-type-and-length branch from 71452c8 to 3551231 Compare July 30, 2025 08:51

mpkorstanje added 2 commits July 30, 2025 11:18

Improve codegen

feb6284

Fix up tests

211543c

mpkorstanje requested a review from jkronegg July 30, 2025 09:51

mpkorstanje added 2 commits July 30, 2025 11:55

Fix up tests

273a4c9

Reuse keyword matchers

d484a9d

mpkorstanje changed the title ~~java: Generate keyword matchers~~ java: Improve performance with a generated keyword matcher Jul 30, 2025

Update CHANGELOG

d6c113c

mpkorstanje marked this pull request as ready for review July 30, 2025 10:13

Touchups

43958ff

jkronegg approved these changes Jul 30, 2025

java/src/main/java/io/cucumber/gherkin/GherkinTokenMatcher.java (outdated, resolved)

java/src/main/java/io/cucumber/gherkin/GherkinTokenMatcher.java (resolved)

java/src/main/java/io/cucumber/gherkin/GherkinTokenMatcher.java (resolved)

Only update currentKeywordMatcher if language changed

c7a121a
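
The commit above only swaps the matcher when the language actually changes. A minimal sketch of that pattern follows, assuming matchers are stored per language code. The class `LanguageAwareMatcher`, its fields other than `currentKeywordMatcher`, and the `lookups` counter are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; only the name currentKeywordMatcher comes from the commit title.
final class LanguageAwareMatcher {
    private final Map<String, Predicate<String>> matchersByLanguage;
    private String currentLanguage;
    private Predicate<String> currentKeywordMatcher;
    int lookups; // counts map lookups, for demonstration only

    LanguageAwareMatcher(Map<String, Predicate<String>> matchersByLanguage, String defaultLanguage) {
        this.matchersByLanguage = matchersByLanguage;
        setLanguage(defaultLanguage);
    }

    void setLanguage(String language) {
        if (language.equals(currentLanguage)) {
            return; // same language: keep the matcher we already hold
        }
        lookups++;
        Predicate<String> matcher = matchersByLanguage.get(language);
        if (matcher == null) {
            throw new IllegalArgumentException("Unknown language: " + language);
        }
        currentLanguage = language;
        currentKeywordMatcher = matcher;
    }

    boolean isStepLine(String line) {
        return currentKeywordMatcher.test(line);
    }
}
```

Setting the same language repeatedly leaves `lookups` unchanged after the first call, which is the effect the commit title describes.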

jkronegg reviewed Jul 30, 2025

java/src/codegen/resources/templates/keyword-matchers.java.ftl (outdated, resolved)

mpkorstanje and others added 2 commits July 30, 2025 14:59

Use startsWithTitleKeyword again pull constant up into generated code

d1662a2

Copy tests from #443

891b08c

mpkorstanje changed the title ~~java: Improve performance with a generated keyword matcher~~ java: Use a generated keyword matcher to improve performance Jul 30, 2025

mpkorstanje merged commit c279678 into main Jul 30, 2025
4 checks passed

mpkorstanje deleted the generated-matcher-with-keyword-type-and-length branch July 30, 2025 13:13

mpkorstanje mentioned this pull request Jul 30, 2025

StepLine identification loop unrolling #443

Closed

7 tasks

mpkorstanje mentioned this pull request Jul 30, 2025

Improving parser performance #436

Merged

7 tasks

jkronegg mentioned this pull request Jul 30, 2025

Minor performance improvement after keyword match unrolling #449

Closed

7 tasks

This was referenced Aug 18, 2025

Bump Gherkin from 33.1.0 to 34.0.0 Romfos/BddDotNet#4

Closed

Bump Gherkin from 33.1.0 to 34.0.0 Romfos/BddDotNet#5

Closed

dependabot bot mentioned this pull request Sep 11, 2025

Bump Gherkin from 33.1.0 to 35.0.0 Romfos/BddDotNet#11

Merged
