(core::str) Boyer-Moore string searching #1932
killerswan wants to merge 14 commits into rust-lang:master
Did you benchmark on small strings? I would expect that for anything shorter than a hundred characters, a naive search (which doesn't allocate anything) is much faster than Boyer-Moore. It might be worthwhile to have the finding functions pick an algorithm based on the haystack size (after informed benchmarking, of course).
Yeah, that definitely needs doing. (I've benchmarked enough to know it will be worth it.)
@brson -- hopefully you can do any remaining review/merging since you've merged other pull requests for
I'll go ahead and merge this and find a reasonable cutoff point to switch from naive search to Boyer-Moore.
This needs some tuning. In my cursory testing the performance is significantly worse than the naive implementation. I think we need some measurements that show when it makes sense to switch to Boyer-Moore. This test case doesn't terminate before I get bored and kill it: https://gist.github.com/2003993
Hmm, I expected large needles in small haystacks to be unimpressive, but that's worse than I expected! I bet a zero-copy slice would fix everything, but in the meantime I have several other ideas... I've just been busy at work this week, though.
In progress. Would you all rather I close this and re-request later, or let this sit in the queue while we fiddle with this, or what?
Feel free to leave the pull req open -- unless you're planning to let this sit for a month or so.
…re_search for testing
this is currently Boyer-Moore-Horspool
@killerswan What's the status? It looks like you've made some improvements.
@brson Yeah... but I've now done enough testing to know that the average "needle" and "haystack" in a I'm going to put this and my test code in another repository, and close the pull request. (Edit, for reference: https://github.com/killerswan/boyer-moore-search).
@killerswan That's a bummer, but it happens. I have many branches that will never go anywhere because they didn't pay off like I expected. Thanks for the effort though.
Here, I've added a Boyer-Moore string search. It computes a pair of tables based on the "needle" being searched for, and then uses them to go faster through the "haystack"...
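For readers unfamiliar with the idea, here is a minimal sketch in present-day Rust of the simpler Boyer-Moore-Horspool variant, which a later commit note says the branch currently uses. It is not the code from this pull request: it builds only one table (the bad-character shifts) rather than the pair of tables described above, and the names `build_shift_table` and `horspool_find` are hypothetical, chosen for illustration.

```rust
/// Build the Horspool bad-character shift table for `needle`.
/// (Hypothetical helper, not taken from the pull request.)
/// For each byte value, the table stores how far the search window may
/// slide when that byte is the last byte of the current window.
fn build_shift_table(needle: &[u8]) -> [usize; 256] {
    let m = needle.len();
    // Bytes that do not occur in the needle allow a full-length shift.
    let mut table = [m; 256];
    // The last needle byte is excluded so that every shift is at least 1.
    for (i, &b) in needle.iter().enumerate().take(m.saturating_sub(1)) {
        table[b as usize] = m - 1 - i;
    }
    table
}

/// Return the byte offset of the first occurrence of `needle` in
/// `haystack`, or `None` if it does not occur.
/// (Hypothetical function, not taken from the pull request.)
fn horspool_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let (n, m) = (haystack.len(), needle.len());
    if m == 0 {
        // Assumption: an empty needle matches at offset 0.
        return Some(0);
    }
    if m > n {
        return None;
    }
    let table = build_shift_table(needle);
    let mut pos = 0;
    while pos + m <= n {
        // Compare the whole window against the needle.
        if &haystack[pos..pos + m] == needle {
            return Some(pos);
        }
        // Slide by the shift recorded for the window's last byte.
        let last = haystack[pos + m - 1];
        pos += table[last as usize];
    }
    None
}
```

With these definitions, `horspool_find(b"the quick brown fox", b"brown")` returns `Some(10)`. The table is a fixed 256-entry array indexed by byte, so this sketch needs no heap allocation, but it still pays the table-building cost on every call, which is the overhead the small-string concern in the conversation is about.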
I've used it within str::find_str_between and str::iter_matches, and also added these functions: