gh-136063: fix various quadratic worst-time complexities in `_header_value_parser.py` [WIP] #134947

picnixz · 2025-05-30T18:12:34Z

This is a work in progress (I need to go now) but I'll continue tomorrow. I want to add tests, and some other places are still not fixed because I didn't find a straightforward fix.

Issue: A Series of Simple DOS Vulnerabilities #134873

Issue: Potential Quadratic Complexity Vulnerabilities in the email Module #136063

picnixz · 2025-05-31T08:57:12Z

Urgh, so there is still a quadratic complexity that I need to think about, it's when we alternate if-branches. For instance in get_phrase:

    try:
        token, value = get_word(value)
        phrase.append(token)
    except errors.HeaderParseError:
        phrase.defects.append(errors.InvalidHeaderDefect(
            "phrase does not start with word"))
    while value and value[0] not in PHRASE_ENDS:
        if value[0]=='.':
            phrase.append(DOT)
            phrase.defects.append(errors.ObsoleteHeaderDefect(
                "period in 'phrase'"))
            value = value[1:]
        else:
            try:
                token, value = get_word(value)
            except errors.HeaderParseError:
                if value[0] in CFWS_LEADER:
                    token, value = get_cfws(value)
                    phrase.defects.append(errors.ObsoleteHeaderDefect(
                        "comment found without atom"))
                else:
                    raise
            phrase.append(token)
    return phrase, value

The

if value[0]=='.':
    phrase.append(DOT)
    phrase.defects.append(errors.ObsoleteHeaderDefect(
        "period in 'phrase'"))
    value = value[1:]

is quadratic if we're processing multiple times .. However, if I have something like 'a + '.a' * 100, then the if branch still requires a copy every two iterations, whatever I put inside. Even if the length reduces at each iteration, it doesn't sufficiently reduce. I'll need to think a bit more.

~~One idea would have been to use a deque to prevent a copy when stripping parts, but then this requires to reconstruct a deque everytime.~~

Maybe switch to a stateful parser? That way, we shouldn't have high complexity and we should be fine. But this requires a complete rewrite of this module.

picnixz · 2025-05-31T12:09:44Z

Ok, this patch still fixes some cases but not all. Cases where two branches alternate would still be subject to O(n²) complexities (unless we avoid the copy in .lstrip() or in [1:], it's not possible to avoid this with .lstrip() or slices since they still need O(n) to copy the rest of the string).

The advantage of .lstrip() over slices is essentially when the if branch is executed more than once before going to the else branch (namely, we can batch-process some characters). For instance, "a" + "." * 50000 is parsed using lstrip() in O(n) instead of O(n²). However, "a" + ".a" * 50000 is still subject to O(n²) parsing.

picnixz · 2025-05-31T12:49:22Z

@serhiy-storchaka I'm a bit stuck here. I don't really have a better idea than to rewrite the module where we would use a deque to hold the current value. That way, I can call .popleft() to drop prefixed chars. Unfortunately, this also means that cannot really return a string anymore as the module is used and signatures are actually in _typeshed: https://github.com/python/typeshed/blob/main/stdlib/email/_header_value_parser.pyi.

So, to ensure backward compatibility, I think I'll need a new module... I can't think of another solution instead of entirely rewriting the logic so that we don't have un-necessary slice so your help would be appreciated, TiA!

I thought about holding the current "index" where the parser stopped but again, I don't think it'll be sufficient as I'll still need to make slices at some point to extract some values to hold (OTOH, using a deque allows me to move some characters to elsewhere without having to copy the string twice, though I'll still need a ''.join() on the part that is being stored).

def get_something(value):
	storage = Storage()
	head = []
	while cond(value):
	    head.append(value.popleft())
	storage.value = ''.join(head)
	return storage, value

serhiy-storchaka · 2025-06-28T10:35:27Z

This approach doesn't work, because the code has inherently quadratic complexity. Every function splits the input string on two parts -- the parsed part and the rest. This operation (creating a new still unparsed string) has linear complexity. Repeated, the total complexity is quadratic.

The only way to avoid quadratic complexity is to change interface of all functions. To rewrite the code in a style similar to the re parser or HTMLParser. I'll take care of this when I'm done with HTMLParser.

picnixz · 2025-06-28T11:28:52Z

The only way to avoid quadratic complexity is to change interface of all functions. To rewrite the code in a style similar to the re parser or HTMLParser. I'll take care of this when I'm done with HTMLParser.

Yeah that's what I feared and thus I didn't start this work until you confirmed it. Thanks for taking care of this as it would take me too much time.

fix quadratic worst-time complexity in _header_value_parser.py

2121094

bedevere-app bot mentioned this pull request May 30, 2025

A Series of Simple DOS Vulnerabilities #134873

Closed

picnixz changed the title ~~gh-134873: fix various quadratic worst-time complexity in _header_value_parser.py [WIP]~~ gh-134873: fix various quadratic worst-time complexities in _header_value_parser.py [WIP] May 30, 2025

fix quadratic worst-time complexity in _header_value_parser.py

af32b1b

serhiy-storchaka self-requested a review June 1, 2025 06:06

This comment was marked as resolved.

Sign in to view

fixup

7bf0e7a

kexinoh mentioned this pull request Jun 28, 2025

Potential Quadratic Complexity Vulnerabilities in the email Module #136063

Open

serhiy-storchaka changed the title ~~gh-134873: fix various quadratic worst-time complexities in _header_value_parser.py [WIP]~~ gh-136063: fix various quadratic worst-time complexities in _header_value_parser.py [WIP] Jun 28, 2025

picnixz closed this Jun 28, 2025

github-project-automation bot moved this to Done in lavitaconnect@MOSTAFAAMMER Jun 28, 2025

picnixz deleted the fix/email/dos-134873 branch June 28, 2025 12:02

picnixz removed type-security A security issue needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Jun 28, 2025

picnixz removed the needs backport to 3.14 bugs and security fixes label Jun 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-136063: fix various quadratic worst-time complexities in `_header_value_parser.py` [WIP] #134947

gh-136063: fix various quadratic worst-time complexities in `_header_value_parser.py` [WIP] #134947

picnixz commented May 30, 2025 •

edited by bedevere-app bot

Loading

Uh oh!

picnixz commented May 31, 2025 •

edited

Loading

Uh oh!

picnixz commented May 31, 2025

Uh oh!

picnixz commented May 31, 2025 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

serhiy-storchaka commented Jun 28, 2025

Uh oh!

picnixz commented Jun 28, 2025

Uh oh!

Uh oh!

Uh oh!

gh-136063: fix various quadratic worst-time complexities in _header_value_parser.py [WIP] #134947

gh-136063: fix various quadratic worst-time complexities in _header_value_parser.py [WIP] #134947

Conversation

picnixz commented May 30, 2025 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

picnixz commented May 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

picnixz commented May 31, 2025

Uh oh!

picnixz commented May 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

serhiy-storchaka commented Jun 28, 2025

Uh oh!

picnixz commented Jun 28, 2025

Uh oh!

Uh oh!

gh-136063: fix various quadratic worst-time complexities in `_header_value_parser.py` [WIP] #134947

gh-136063: fix various quadratic worst-time complexities in `_header_value_parser.py` [WIP] #134947

picnixz commented May 30, 2025 •

edited by bedevere-app bot

Loading

picnixz commented May 31, 2025 •

edited

Loading

picnixz commented May 31, 2025 •

edited

Loading