Make re.Match a well-rounded Sequence type #133546

Open
@vberlier

Feature or enhancement

Proposal:

It would be nice if the following worked as expected:

import re
from collections.abc import Sequence

m = re.match(r"(a)(b)(c)", "abc")

assert isinstance(m, Sequence)
assert len(m) == 4
assert list(m) == ["abc", "a", "b", "c"]

abc, a, b, c = m
assert abc == "abc" and a == "a" and b == "b" and c == "c"

match re.match(r"(\d+)-(\d+)-(\d+)", "2025-05-07"):
    case [_, year, month, day]:
        assert year == "2025" and month == "05" and day == "07"

If you also work with JavaScript, this will feel very familiar:

let m = "abc".match(/(a)(b)(c)/)

console.log(m instanceof Array) // true
console.log(m.length) // 4
console.log(Array.from(m)) // [ 'abc', 'a', 'b', 'c' ]

let [abc, a, b, c] = m
console.log(abc) // abc
console.log(a) // a
console.log(b) // b
console.log(c) // c

Back in 2016, the re.Match object API was expanded to include __getitem__ as a shortcut for .group(...).

The goal was to improve usability and approachability by making re.Match objects fit a bit more seamlessly into Python's core data model. Accessing groups via subscripting is now intuitive, but because re.Match objects only have a __getitem__ and no __len__, they can't be used as a proper Sequence type.
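To illustrate the gap, here is a small sketch of the behavior on CPython releases that do not include this proposal. The variable name has_len is only for illustration.

```python
import re

m = re.match(r"(a)(b)(c)", "abc")

# Subscripting is a shortcut for m.group(...).
assert m[0] == m.group(0) == "abc"
assert m[1] == m.group(1) == "a"

# Without this proposal, re.Match defines no __len__, so len(m) is
# expected to raise TypeError. Record the outcome instead of assuming it.
try:
    len(m)
except TypeError:
    has_len = False
else:
    has_len = True

print("re.Match supports len():", has_len)
```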

To me, this always felt a bit awkward. After digging up the original discussion, it seems that __len__ was left out because it was still undecided whether the returned value should count group 0 or not.

Almost a decade later, as a user, the way I see it is that the __getitem__ implementation we're now used to suggests a regular Sequence type that also happens to transparently translate group names provided as subscripts to their corresponding group indices. This is in fact how it works in the underlying C code.
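The name-to-index translation can be observed from Python through the documented Pattern.groupindex mapping. This is only a sketch of the visible behavior, not of the C implementation; index_of, by_name and by_index are illustrative names.

```python
import re

m = re.match(r"(?P<year>\d+)-(?P<month>\d+)", "2025-05")

# Pattern.groupindex maps each group name to its numeric index.
index_of = dict(m.re.groupindex)

# Subscripting by name and by the corresponding index yields the same group.
by_name = m["month"]
by_index = m[index_of["month"]]
assert by_name == by_index
```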

With this in mind, we can simply define __len__ taking into account group 0, and we'll finally be able to enjoy coherent re.Match objects that behave as proper Sequence types.

Has this already been discussed elsewhere?

https://discuss.python.org/t/make-re-match-a-well-rounded-sequence-type/91039

Links to previous discussion of this feature:

Improve the usability of the match object named group API (605bdae)
