Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-30285: Optimize case-insensitive matching and searching #1482

Merged
merged 6 commits into from
May 9, 2017

Conversation

serhiy-storchaka
Copy link
Member

of regular expressions.

@serhiy-storchaka serhiy-storchaka added the performance Performance or resource usage label May 5, 2017
@mention-bot
Copy link

@serhiy-storchaka, thanks for your PR! By analyzing the history of the files in this pull request, we identified @rhettinger, @Yhg1s and @niemeyer to be potential reviewers.

@@ -268,7 +280,7 @@ def _optimize_charset(charset, fixup, fixes):
else:
charmap[av] = 1
elif op is RANGE:
r = range(av[0], av[1]+1)
r = r0 = range(av[0], av[1]+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is r0 used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At line 296.

if fixup and not hascased:
    hascased = any(map(iscased, r0))

This intentionally moved after the loops that set bits in charmap because for large ranges like range(0x20000, 0x110000) it is better to get an IndexError earlier (the size of charmap is 0x100 or 0x10000) than burn CPU in long useless loop.

# internal: optimize character set
out = []
tail = []
charmap = bytearray(256)
hascased = False
if fixup:
if flags & SRE_FLAG_UNICODE and not flags & SRE_FLAG_ASCII:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the if/else statement from 264-267 be replaced with:

iscased = _get_iscased(flags)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be.

Copy link
Contributor

@lisroach lisroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@serhiy-storchaka serhiy-storchaka merged commit 6d336a0 into python:master May 9, 2017
@serhiy-storchaka serhiy-storchaka deleted the re-cased branch May 9, 2017 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants