gh-113304: Add pos/endpos parameters to re module functions #113306
Open: adamsilkey wants to merge 11 commits into python:main from adamsilkey:gh113304-re-pos-endpos
The current docs for Pattern.search/match/fullmatch/finditer/findall imply that `pos`/`endpos` are positional-only arguments. In fact, they also support keyword assignment, as seen:

    >>> import re
    >>> pattern = re.compile('abc')
    >>> pattern.search('012abc678', pos=3)
    <re.Match object; span=(3, 6), match='abc'>
    >>> pattern.search('012abc678', endpos=6)
    <re.Match object; span=(3, 6), match='abc'>
    >>> pattern.search('012abc678', pos=3, endpos=6)
    <re.Match object; span=(3, 6), match='abc'>

The interactive help also shows this:

    >>> help(pattern.search)
    Help on built-in function search:

    search(string, pos=0, endpos=9223372036854775807) method of re.Pattern instance
        Scan through string looking for a match, and return a corresponding
        match object instance.

        Return None if no position in the string matches.

This commit updates the signatures of the affected methods in the docs to reflect this.
Add special characters section to docs to enable finding via the table of contents and make discoverability easier.
This commit adds the `pos` and `endpos` parameters to the following top-level `re` module functions:

- `re.match()`
- `re.fullmatch()`
- `re.search()`
- `re.findall()`
- `re.finditer()`

Prior to this commit, the `pos` and `endpos` parameters were only available to users by first compiling a pattern using `re.compile`. Adding these optional arguments standardizes the behavior between the two and prevents users from being forced to compile if they wish to use the `pos`/`endpos` arguments.

Rationale:

There are a number of methods in the Python regex Pattern class that support optional positional arguments (`pos`/`endpos`):

- `Pattern.match(string[, pos[, endpos]])`
- `Pattern.fullmatch(string[, pos[, endpos]])`
- `Pattern.search(string[, pos[, endpos]])`
- `Pattern.findall(string[, pos[, endpos]])`
- `Pattern.finditer(string[, pos[, endpos]])`

Additionally, Python provides access to these pattern methods as top-level convenience functions in the module itself:

- `re.search()`
- `re.match()`
- `re.fullmatch()`
- `re.findall()`
- `re.finditer()`

However, these top-level convenience functions do not support the optional arguments. Anyone who wants to use the optional arguments must first compile a pattern with `re.compile()` and then call the method with the optional arguments.

But all the top-level convenience functions do is compile the pattern and then execute the pattern, as seen in the commit diff.

Looking at the underlying C code for these methods, the method defines `pos` and `endpos` as `0` and `PY_SSIZE_T_MAX` respectively. It only changes the values if the arg parser detects the presence of either `pos` or `endpos`. Here is an example from the match function:

```c
static PyObject *
_sre_SRE_Pattern_match(PatternObject *self, PyTypeObject *cls,
                       PyObject *const *args, Py_ssize_t nargs,
                       PyObject *kwnames)
{
    (...)
    Py_ssize_t pos = 0;
    Py_ssize_t endpos = PY_SSIZE_T_MAX;
    (...)
    pos = ival;
    (...)
    endpos = ival;
    (...)
    return_value = _sre_SRE_Pattern_match_impl(self, cls, string, pos, endpos);
```
- Add new header section describing the string indexing arguments
- Update function signatures to reflect changes
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Summary

This commit adds the `pos` and `endpos` parameters to the following top-level `re` module functions:

- `re.match()`
- `re.fullmatch()`
- `re.search()`
- `re.findall()`
- `re.finditer()`

Prior to this commit, the `pos` and `endpos` parameters were only available to users by first compiling a pattern using `re.compile`. Adding these optional arguments standardizes the behavior between the two and prevents users from being forced to compile if they wish to use the `pos`/`endpos` arguments.

Additionally, this commit adds a documentation section describing the `pos`/`endpos` string indexing arguments and updates the function signatures to match.
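For context, here is a minimal sketch of the compile-first workaround the summary refers to, next to the slicing alternative people sometimes reach for. It uses only the existing `Pattern.search` method; the sample string and variable names are illustrative and do not come from the PR.

```python
import re

text = "012abc678"

# Today, restricting the search window means compiling first and then
# passing pos/endpos to the Pattern method.
pattern = re.compile("abc")
windowed = pattern.search(text, 3, 6)  # pos=3, endpos=6

# Slicing the string also narrows the window, but the reported offsets
# are then relative to the slice rather than to the original string.
sliced = re.search("abc", text[3:6])

print(windowed.span())  # offsets in the original string
print(sliced.span())    # offsets in the slice
```

The difference in reported offsets is one reason `pos`/`endpos` are not interchangeable with slicing when the caller needs positions in the original string.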
Rationale

There are a number of methods in the Python regex Pattern class that support optional string indexing parameters (`pos`/`endpos`):

- `Pattern.match(string[, pos[, endpos]])`
- `Pattern.fullmatch(string[, pos[, endpos]])`
- `Pattern.search(string[, pos[, endpos]])`
- `Pattern.findall(string[, pos[, endpos]])`
- `Pattern.finditer(string[, pos[, endpos]])`

Additionally, Python provides access to these Pattern methods as top-level convenience functions in the module itself:

- `re.match()`
- `re.fullmatch()`
- `re.search()`
- `re.findall()`
- `re.finditer()`

However, these top-level convenience functions do not support the optional arguments. Anyone who wants to use the optional parameters must first compile a pattern with `re.compile()` and then call the method with the optional arguments.

But all the top-level convenience functions do is compile the pattern and then execute the pattern, as seen in the commit diff.
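The compile-then-call shape described above can be sketched in user code as follows. The helper name `search_with_bounds` is hypothetical and is not part of the `re` module or of this PR's diff; it only illustrates how a convenience function could forward `pos` and `endpos` to the compiled pattern, assuming the defaults `0` and `sys.maxsize`.

```python
import re
import sys


def search_with_bounds(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Hypothetical helper: compile the pattern, then forward pos/endpos."""
    compiled = re.compile(pattern, flags)
    return compiled.search(string, pos, endpos)


text = "ab12cd345"

whole = search_with_bounds(r"\d+", text)               # no bounds given
from_four = search_with_bounds(r"\d+", text, pos=4)    # skip the first run of digits
up_to_three = search_with_bounds(r"\d+", text, endpos=3)

print(whole.span(), from_four.span(), up_to_three.span())
```

Because `re.compile()` already reuses previously compiled patterns through the module's cache, a wrapper of this shape adds little beyond the argument forwarding itself.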
Looking at the underlying C code for these methods, the method defines `pos` and `endpos` as `0` and `PY_SSIZE_T_MAX` respectively. It only changes the values if the arg parser detects the presence of either `pos` or `endpos`. The match function, `_sre_SRE_Pattern_match`, is an example: it initializes both variables to these defaults before parsing arguments and then passes them to `_sre_SRE_Pattern_match_impl`.

This commit adds `pos=0` and `endpos=sys.maxsize` to match the internal behavior of the underlying C code.
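As a quick check of why `0` and `sys.maxsize` are safe defaults, the sketch below compares a call that omits both arguments with one that passes them explicitly, using the existing `Pattern.search` method. The pattern and text are illustrative only.

```python
import re
import sys

pattern = re.compile(r"b+")
text = "aabbbcc"

# Omitting pos/endpos and passing the defaults explicitly should agree.
default_call = pattern.search(text)
explicit_call = pattern.search(text, 0, sys.maxsize)

# An endpos larger than len(text) is treated as the end of the string.
oversized_endpos = pattern.search(text, 0, 10_000)

print(default_call.span(), explicit_call.span(), oversized_endpos.span())
```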
Additional Documentation Updates

Add Special Characters section to re docs

Add special characters section to docs to enable finding via the table of contents and make discoverability easier.

Update Pattern method signatures to reflect actual behavior

The current docs for Pattern.search/match/fullmatch/finditer/findall imply that `pos`/`endpos` are positional-only arguments. In fact, they also support keyword assignment.

The interactive help also shows this: `help(pattern.search)` reports the signature `search(string, pos=0, endpos=9223372036854775807)`.
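The keyword behavior can be confirmed with a short script that mirrors the interactive session from the commit message. It calls the existing `Pattern.search` method positionally and by keyword; the variable names are illustrative.

```python
import re

pattern = re.compile("abc")
text = "012abc678"

# Positional and keyword forms are both accepted by the compiled pattern.
by_position = pattern.search(text, 3, 6)
by_keyword = pattern.search(text, pos=3, endpos=6)
only_pos = pattern.search(text, pos=3)
only_endpos = pattern.search(text, endpos=6)

for m in (by_position, by_keyword, only_pos, only_endpos):
    print(m.span(), m.group())
```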
📚 Documentation preview 📚: https://cpython-previews--113306.org.readthedocs.build/