Closed
Description
In #104512 we made `pathlib.Path.glob()` use a "walk-and-filter" strategy for expanding `**` wildcards in patterns: when we encounter a `**` segment, we immediately consume subsequent segments and use them to build a regex that is used to filter results. This saves a bunch of `scandir()` calls.
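For readers unfamiliar with the idea, here is a minimal sketch of walk-and-filter. It is not the pathlib implementation: `segment_to_regex` and `walk_and_filter` are hypothetical names, `os.walk()` stands in for the real traversal, and only `*` and `?` are translated (no character classes, no symlink handling). The point is that the segments after `**` become one regex applied to every path found by a single tree walk, instead of being expanded with further directory scans.

```python
import os
import re
import tempfile


def segment_to_regex(segment):
    """Translate one glob segment to regex text; '*' and '?' never cross '/'."""
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def walk_and_filter(root, tail_segments):
    """Expand '**/<tail_segments>' below root: one tree walk, one regex filter."""
    tail = "/".join(segment_to_regex(s) for s in tail_segments)
    # '**' matches zero or more whole directory names before the tail.
    matcher = re.compile(rf"(?:[^/]+/)*{tail}").fullmatch
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if matcher(rel):
                hits.append(rel)
    return sorted(hits)


if __name__ == "__main__":
    # Build a small throwaway tree and expand '**/file*' beneath it.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        for rel in ("file0.txt", "a/file1.txt", "a/b/file2.txt", "a/b/other.txt"):
            open(os.path.join(root, *rel.split("/")), "w").close()
        print(walk_and_filter(root, ["file*"]))
```

The walk visits each directory once regardless of how many segments follow `**`; the tail segments only affect the regex.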
However! We actually build a regex for the entire pattern given to `glob()`, rather than just the segments following `**` wildcards. And so when evaluating a pattern like `dir*/**/file*`, the `dir*` part is needlessly matched twice against each path. @zooba noted this in a review comment at the time.
We should be able to improve performance by building an `re.Pattern` only for segments following `**` wildcards, and not the entire `glob()` pattern.
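A rough illustration of the proposal, using pure regex matching with no filesystem access. The helpers `segment_to_regex`, `compile_segments` and `split_at_recursive` are hypothetical and do not mirror pathlib internals; the sketch also assumes `**` is followed by at least one more segment. Once the walk has already selected a directory matching `dir*`, a regex built from the tail alone accepts exactly the same children as the full-pattern regex, so re-matching the `dir*` prefix adds nothing.

```python
import re


def segment_to_regex(segment):
    """Translate one glob segment to regex text; '*' and '?' never cross '/'."""
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_segments(segments):
    """Compile glob segments into one regex; '**' is zero or more directories."""
    parts = []
    for i, seg in enumerate(segments):
        if seg == "**":
            # The group carries its own trailing slashes.
            parts.append("(?:[^/]+/)*")
        else:
            parts.append(segment_to_regex(seg))
            if i < len(segments) - 1:
                parts.append("/")
    return re.compile("".join(parts))


def split_at_recursive(pattern):
    """Split a pattern into the segments before the first '**' and the rest."""
    segments = pattern.split("/")
    i = segments.index("**")
    return segments[:i], segments[i:]


head, tail = split_at_recursive("dir*/**/file*")
full_re = compile_segments(head + tail)  # what the issue says is built today
tail_re = compile_segments(tail)         # what the issue proposes building

# Paths relative to 'dir1', a directory the walk already matched against 'dir*'.
children = ["file1", "sub/file2", "sub/other", "sub/deep/file3"]
for rel in children:
    # Both regexes agree, but full_re re-checks the 'dir*' prefix every time.
    assert bool(full_re.fullmatch("dir1/" + rel)) == bool(tail_re.fullmatch(rel))
```

The tail regex is also matched against shorter strings (paths relative to the walked directory), which is where the expected saving comes from.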
Linked PRs
- GH-115060: Speed up `pathlib.Path.glob()` by removing redundant regex matching #115061
- GH-115060: Speed up `pathlib.Path.glob()` by skipping directory scanning #116152
- GH-115060: Speed up `pathlib.Path.glob()` by not scanning literal parts #117732
- GH-115060: Speed up `pathlib.Path.glob()` by omitting initial `stat()` #117831