
Speed up pathlib.Path.glob() by removing redundant regex matching #115060

Closed
@barneygale


In #104512 we made pathlib.Path.glob() use a "walk-and-filter" strategy for expanding ** wildcards in patterns: when we encounter a ** segment, we immediately consume subsequent segments and use them to build a regex that is used to filter results. This saves a bunch of scandir() calls.

However, we actually build a regex for the entire pattern given to glob(), rather than just the segments following ** wildcards. So when evaluating a pattern like dir*/**/file*, the dir* part is needlessly matched twice against each path: once while walking the tree, and again by the full-pattern regex. @zooba noted this in a review comment at the time.

We should be able to improve performance by building an re.Pattern only for segments following ** wildcards, and not the entire glob() pattern.
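
To make the proposal concrete, here is a rough sketch of the idea, not pathlib's actual implementation. The helper names (segment_to_regex, compile_tail) are hypothetical, and the translation is simplified: it handles only * and ? within a segment and splits at the first ** only. The leading segments are returned as-is, on the assumption that the walk matches them one directory level at a time, so the compiled regex covers just the part after **.

```python
import re
from typing import List, Optional, Tuple


def segment_to_regex(segment: str) -> str:
    """Translate one glob segment to regex source (handles '*' and '?' only)."""
    parts = []
    for ch in segment:
        if ch == "*":
            parts.append("[^/]*")   # any run of characters within one segment
        elif ch == "?":
            parts.append("[^/]")    # exactly one character within one segment
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def compile_tail(pattern: str) -> Tuple[List[str], Optional[re.Pattern]]:
    """Split a glob pattern at its first '**' segment (hypothetical helper).

    Returns the leading segments, which the walk is assumed to match level
    by level, and a regex built only from the segments after '**'.
    """
    segments = pattern.split("/")
    if "**" not in segments:
        return segments, None
    idx = segments.index("**")
    head, tail = segments[:idx], segments[idx + 1:]
    # '(?:.*/)?' stands in for '**': zero or more intermediate directories.
    source = "(?:.*/)?" + "/".join(segment_to_regex(s) for s in tail)
    return head, re.compile(source)


head, tail_re = compile_tail("dir*/**/file*")
# The regex is applied to paths relative to the directory reached via 'head',
# so 'dir*' is never matched a second time.
matches = [p for p in ["file.txt", "a/b/file.txt", "a/other.txt"]
           if tail_re.fullmatch(p)]
```

With this split, the filter regex only ever sees the portion of each path below the directory where ** was encountered, which is the redundancy this issue proposes to remove.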
