Description
Feature or enhancement
Proposal:
It would be nice if the following worked as expected:
import re
from collections.abc import Sequence

m = re.match(r"(a)(b)(c)", "abc")
assert isinstance(m, Sequence)
assert len(m) == 4
assert list(m) == ["abc", "a", "b", "c"]

abc, a, b, c = m
assert abc == "abc" and a == "a" and b == "b" and c == "c"

match re.match(r"(\d+)-(\d+)-(\d+)", "2025-05-07"):
    case [_, year, month, day]:
        assert year == "2025" and month == "05" and day == "07"
If you also work with JavaScript, this will feel very familiar:
let m = "abc".match(/(a)(b)(c)/)
console.log(m instanceof Array) // true
console.log(m.length) // 4
console.log(Array.from(m)) // [ 'abc', 'a', 'b', 'c' ]
let [abc, a, b, c] = m
console.log(abc) // abc
console.log(a) // a
console.log(b) // b
console.log(c) // c
Back in 2016, the re.Match object API was expanded to include __getitem__ as a shortcut for .group(...).
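As a quick illustration of that shortcut, here is a minimal sketch using only the existing API (Python 3.6+). The pattern and the group names `year` and `month` are made up for the example.

```python
import re

# Hypothetical pattern with named groups, chosen only for illustration.
m = re.match(r"(?P<year>\d+)-(?P<month>\d+)", "2025-05")
assert m is not None

# Subscripting a match is equivalent to calling .group() with the same argument.
whole = m[0]        # same as m.group(0)
first = m[1]        # same as m.group(1)
named = m["month"]  # same as m.group("month")
```

Both integer indexes and group names go through the same subscript operation.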
The goal was to improve usability and approachability by making re.Match objects fit a bit more seamlessly into Python's core data model. Accessing groups via subscripting is now intuitive, but because re.Match objects only have a __getitem__ and no __len__, they can't be used as a proper Sequence type.
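For comparison, this is roughly how the same unpacking tends to be written with the existing API. It is only a sketch of one common workaround, not the only way to do it.

```python
import re

m = re.match(r"(\d+)-(\d+)-(\d+)", "2025-05-07")
assert m is not None

# Build the full list by hand: group 0 followed by the capturing groups.
groups = [m[0], *m.groups()]
whole, year, month, day = groups

# The group count lives on the compiled pattern rather than on the match,
# so group 0 has to be added explicitly.
group_count = m.re.groups + 1
```

With a proper Sequence type, `groups` and `group_count` would simply be `list(m)` and `len(m)`.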
To me, this always felt a bit awkward. After digging up the original discussion, it seems that __len__ didn't make it in because it was still undecided whether the returned value should count group 0 or not.
Almost a decade later, as a user, the way I see it is that the __getitem__ implementation we're now used to suggests a regular Sequence type that also happens to transparently translate group names provided as subscripts to their corresponding group index. In fact, this is how it works in the underlying C code.
With this in mind, we can simply define __len__ taking into account group 0, and we'll finally be able to enjoy coherent re.Match objects that behave as proper Sequence types.
Has this already been discussed elsewhere?
https://discuss.python.org/t/make-re-match-a-well-rounded-sequence-type/91039
Links to previous discussion of this feature:
Improve the usability of the match object named group API (605bdae)