bpo-46848: Use stringlib/fastsearch in mmap #31625

Merged: sweeneyde merged 6 commits into python:main from mmap_fastsearch on Mar 2, 2022

@sweeneyde (Member) commented Mar 1, 2022

@sweeneyde marked this pull request as draft March 1, 2022 02:04
@sweeneyde marked this pull request as ready for review March 1, 2022 03:13

@rumpelsepp (Contributor) commented:

I like this proposal better than mine: #31554.

@sweeneyde (Member, Author) commented:

Just a couple of benchmarks on Windows:

from pyperf import Runner
runner = Runner()

for haystack, needle in [
    ("""b'x' * 100_000""", """b'y'"""),
    ("""b'x' * 100_000""", """b'yz'"""),
    ("""b'x' * 100_000""", """b'xy'"""),
    ("""b'x' * 100_000""", """b'yx'"""),
    ("""b'ab' * 100_000""", """b'abracadabra'"""),
    ("""b'a' * 10_000 + b'b' * 10_000 + b'a' * 10_000""",
     """b'a' * 10_001"""),
]:
    runner.timeit(
        f"{needle} in {haystack}",
        setup=f"""\
haystack = {haystack}
needle = {needle}
import mmap
m = mmap.mmap(-1, len(haystack))
m.write(haystack)
m.seek(0)
""",
        stmt="m.find(needle)"
    )

Slower (2):
- b'yx' in b'x' * 100_000: 101 us +- 2 us -> 141 us +- 1 us: 1.39x slower
- b'xy' in b'x' * 100_000: 126 us +- 1 us -> 134 us +- 1 us: 1.06x slower

Faster (4):
- b'a' * 10_001 in b'a' * 10_000 + b'b' * 10_000 + b'a' * 10_000: 15.9 ms +- 0.5 ms -> 71.2 us +- 0.7 us: 223.53x faster
- b'y' in b'x' * 100_000: 101 us +- 1 us -> 3.37 us +- 0.07 us: 30.02x faster
- b'yz' in b'x' * 100_000: 101 us +- 1 us -> 25.5 us +- 0.3 us: 3.95x faster
- b'abracadabra' in b'ab' * 100_000: 335 us +- 8 us -> 303 us +- 3 us: 1.11x faster

Geometric mean: 5.20x faster
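
As a rough cross-check of the summary line, the geometric mean can be recomputed from the rounded means printed above. This is only a sketch: it assumes the summary is the geometric mean of the per-benchmark old/new ratios, and the `timings` list is transcribed by hand from the table (in microseconds), so small rounding differences are expected.

```python
from statistics import geometric_mean

# (benchmark, old mean, new mean) in microseconds, copied from the
# rounded values in the comparison output above.
timings = [
    ("b'yx' in b'x' * 100_000", 101.0, 141.0),
    ("b'xy' in b'x' * 100_000", 126.0, 134.0),
    ("b'a' * 10_001 in a/b/a haystack", 15_900.0, 71.2),
    ("b'y' in b'x' * 100_000", 101.0, 3.37),
    ("b'yz' in b'x' * 100_000", 101.0, 25.5),
    ("b'abracadabra' in b'ab' * 100_000", 335.0, 303.0),
]

# A speedup above 1 means the new code is faster; below 1 means slower.
speedups = [old / new for _, old, new in timings]

# The geometric mean treats a 2x slowdown and a 2x speedup as cancelling out.
overall = geometric_mean(speedups)
print(f"Geometric mean speedup: {overall:.2f}x")
```

With these rounded inputs the result lands close to the reported 5.20x, which suggests that the two small regressions are far outweighed by the large wins on the single-byte and repetitive-needle cases.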

@sweeneyde (Member, Author) commented:

No refleaks

0:00:00 [1/1] test_mmap
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.
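
Besides the refleak run, a quick behavioural check is to compare `mmap.find` and `mmap.rfind` against `bytes.find` and `bytes.rfind` on the benchmark inputs. The following is a minimal sketch, not part of the PR's test suite; the `check` helper and the extra positive case are illustrative assumptions.

```python
import mmap

# Haystack/needle pairs from the benchmark above, plus one case
# (hypothetical, added here) where the needle is actually present.
cases = [
    (b"x" * 100_000, b"y"),
    (b"x" * 100_000, b"yz"),
    (b"x" * 100_000, b"xy"),
    (b"x" * 100_000, b"yx"),
    (b"ab" * 100_000, b"abracadabra"),
    (b"a" * 10_000 + b"b" * 10_000 + b"a" * 10_000, b"a" * 10_001),
    (b"x" * 1_000 + b"needle" + b"x" * 1_000, b"needle"),
]


def check(haystack, needle):
    """Return (mmap result, bytes result) pairs for find and rfind."""
    # Anonymous mapping, the same setup the benchmark uses.
    with mmap.mmap(-1, len(haystack)) as m:
        m.write(haystack)
        m.seek(0)  # find/rfind default to searching from the current position
        return (
            (m.find(needle), haystack.find(needle)),
            (m.rfind(needle), haystack.rfind(needle)),
        )


results = [check(haystack, needle) for haystack, needle in cases]
for (found, expected), (rfound, rexpected) in results:
    assert found == expected
    assert rfound == rexpected
```

Since the search routine is shared with `bytes`, the two should agree on every input; a mismatch would point at how the mapping's start and end offsets are handled rather than at the search itself.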

@sweeneyde added the performance (Performance or resource usage) label Mar 2, 2022
@sweeneyde merged commit 6ddb09f into python:main Mar 2, 2022
@sweeneyde deleted the mmap_fastsearch branch March 2, 2022 04:49