Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-119879: Utilize last character gap for two-way periodic needles #119880

Merged
merged 10 commits into from
Jun 4, 2024

Conversation

dg-pb
Copy link
Contributor

@dg-pb dg-pb commented May 31, 2024

All is laid out in the issue. It would be great if someone familiar with an algorithm looked at this.

@nineteendo
Copy link
Contributor

cc @sweeneyde

@dg-pb dg-pb changed the title gh-119879: bad character rule for two-way periodic needles gh-119879: Horspool skip for two-way periodic needles Jun 1, 2024
@dg-pb dg-pb marked this pull request as draft June 1, 2024 20:06
@dg-pb
Copy link
Contributor Author

dg-pb commented Jun 1, 2024

Removed asserts:

assert(gap >= i - cut + 1);
assert(i - cut + 1 > gap);

as they are implied by

Py_ssize_t gap_jump_end = Py_MIN(len_needle, cut + gap);

@dg-pb dg-pb marked this pull request as ready for review June 1, 2024 21:00
@nineteendo
Copy link
Contributor

Note that asserts don't generate code:

#define assert(op) /* empty */

@dg-pb dg-pb marked this pull request as draft June 1, 2024 21:57
@dg-pb dg-pb marked this pull request as ready for review June 2, 2024 00:32
Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea, thanks for investigating this.

I'd change the title to something like "Utilize last character gap for two-way periodic needles". The Horspool/"bad character" skip is everything involving table, and this change doesn't look to involve that. Using gap is the first "good suffix"-type rule.

Please also add a NEWS entry (you can click the "bedevere/news --> details" check).

@sweeneyde sweeneyde self-assigned this Jun 2, 2024
@dg-pb dg-pb changed the title gh-119879: Horspool skip for two-way periodic needles gh-119879: Utilize last character gap for two-way periodic needles Jun 2, 2024
@erlend-aasland erlend-aasland added the performance Performance or resource usage label Jun 2, 2024
@dg-pb dg-pb marked this pull request as draft June 3, 2024 14:51
@dg-pb dg-pb marked this pull request as ready for review June 3, 2024 16:18
@dg-pb
Copy link
Contributor Author

dg-pb commented Jun 3, 2024

Suspended to double check one last thing to make sure it all works ok. All seems to be correct.

@@ -0,0 +1 @@
Utilization of last character gap (good suffix rule) for two-way periodic needles.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear we're talking about fast search here.

@sweeneyde
Copy link
Member

(It looks like there's still whitespace on line 409.)

Other than that, I'm happy to merge this. Although the cases it speeds up are probably rare in practice, there is no real increase in code complexity here: you're aligning the periodic algorithm with an existing trick used in the non-periodic algorithm, so LGTM.

Going forward, we should discuss all of your ongoing work on fastsearch.h under one issue, say #119702 for example. There's no need to open a new issue for each individual proposed change.

@sweeneyde sweeneyde merged commit a8f1152 into python:main Jun 4, 2024
36 checks passed
@dg-pb dg-pb deleted the implement-119879 branch June 4, 2024 07:59
barneygale pushed a commit to barneygale/cpython that referenced this pull request Jun 5, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants