-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-119879: Utilize last character gap for two-way periodic needles #119880
Conversation
cc @sweeneyde |
Removed asserts: assert(gap >= i - cut + 1);
assert(i - cut + 1 > gap); as they are implied by Py_ssize_t gap_jump_end = Py_MIN(len_needle, cut + gap); |
Note that asserts don't generate code: #define assert(op) /* empty */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea, thanks for investigating this.
I'd change the title to something like "Utilize last character gap for two-way periodic needles". The Horspool/"bad character" skip is everything involving table
, and this change doesn't look to involve that. Using gap
is the first "good suffix"-type rule.
Please also add a NEWS entry (you can click the "bedevere/news --> details" check).
Suspended to double check one last thing to make sure it all works ok. All seems to be correct. |
@@ -0,0 +1 @@ | |||
Utilization of last character gap (good suffix rule) for two-way periodic needles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear we're talking about fast search here.
(It looks like there's still whitespace on line 409.) Other than that, I'm happy to merge this. Although the cases it speeds up are probably rare in practice, there is no real increase in code complexity here: you're aligning the periodic algorithm with an existing trick used in the non-periodic algorithm, so LGTM. Going forward, we should discuss all of your ongoing work on fastsearch.h under one issue, say #119702 for example. There's no need to open a new issue for each individual proposed change. |
All is laid out in the issue. It would be great if someone familiar with an algorithm looked at this.