[CVE-2024-11168] urlparse incorrectly retrieves IPv4 and regular name hosts from inside of brackets #103848

Closed
@JohnJamesUtley

Background

RFC 3986 defines a host as follows:

    host = IP-literal / IPv4address / reg-name

where

    IP-literal = "[" ( IPv6address / IPvFuture ) "]"
    reg-name = *( unreserved / pct-encoded / sub-delims )
    IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

The WHATWG URL Standard says that "A valid host string must be a valid domain string, a valid IPv4-address string, or: U+005B ([), followed by a valid IPv6-address string, followed by U+005D (])."
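
The IP-literal rule can be checked mechanically. The following is a minimal sketch written from the RFC 3986 grammar, not CPython's implementation. The helper name `is_ip_literal` and the IPvFuture regular expression are assumptions made for illustration. The IPv6 case is delegated to the standard `ipaddress` module, which may accept forms (such as scope IDs) that the RFC 3986 grammar does not.

```python
import ipaddress
import re

# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
# (RFC 3986, section 3.2.2). The pattern is an illustrative transcription.
_IPVFUTURE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def is_ip_literal(bracket_content: str) -> bool:
    """Return True if the text between '[' and ']' is IPv6 or IPvFuture.

    Hypothetical helper, for illustration only.
    """
    if _IPVFUTURE.fullmatch(bracket_content):
        return True
    try:
        # Raises a ValueError subclass for anything that is not IPv6,
        # including IPv4 addresses and registered names.
        ipaddress.IPv6Address(bracket_content)
    except ValueError:
        return False
    return True


for candidate in ("::1", "v1.fe80::a", "127.0.0.1", "regname"):
    print(candidate, is_ip_literal(candidate))
```

Under this sketch, `::1` and `v1.fe80::a` are accepted, while `127.0.0.1` and `regname` are rejected because neither is an IPv6 or IPvFuture address.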

The Bug

This is the code from Lib/urllib/parse.py:196-208 that retrieves the hostname from the netloc:

    @property
    def _hostinfo(self):
        netloc = self.netloc
        _, _, hostinfo = netloc.rpartition('@')
        _, have_open_br, bracketed = hostinfo.partition('[')
        if have_open_br:
            hostname, _, port = bracketed.partition(']')
            _, _, port = port.partition(':')
        else:
            hostname, _, port = hostinfo.partition(':')
        if not port:
            port = None
        return hostname, port

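When the netloc contains a '[', this code returns whatever appears between '[' and ']' as the hostname, without checking that it is an IPv6 or IPvFuture address. As a result, IPv4 addresses and reg-name hosts inside brackets are accepted and returned. This violates both specifications.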
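
To trace the behavior without depending on the installed Python version, the logic quoted above can be copied into a standalone function. This is only a sketch for illustration: the name `hostinfo_sketch` is hypothetical, and the function takes the netloc as an argument instead of reading `self.netloc`.

```python
def hostinfo_sketch(netloc: str):
    """Standalone copy of the _hostinfo logic quoted above, for tracing only."""
    # Drop any userinfo ("user:password@") prefix.
    _, _, hostinfo = netloc.rpartition('@')
    # Look for an opening bracket.
    _, have_open_br, bracketed = hostinfo.partition('[')
    if have_open_br:
        # Everything up to ']' is taken as the hostname, unvalidated.
        hostname, _, port = bracketed.partition(']')
        _, _, port = port.partition(':')
    else:
        hostname, _, port = hostinfo.partition(':')
    if not port:
        port = None
    return hostname, port


for netloc in ("user@[regname]", "[127.0.0.1]:8080", "[::1]:443"):
    print(netloc, "->", hostinfo_sketch(netloc))
```

The first two inputs yield `('regname', None)` and `('127.0.0.1', '8080')`, even though neither bracketed value is an IP-literal. Only the third, `('::1', '443')`, comes from a host that the grammar permits inside brackets.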

Minimal reproducible example:

    from urllib.parse import urlsplit

    parsedURL = urlsplit('scheme://user@[regname]/Path')
    print(parsedURL.hostname) # Prints 'regname'

Your environment

  • CPython versions tested on:
  • Operating system and architecture:
    • Arch Linux x86_64

Labels

stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)
