Host and Authority Headers RFC Compliance: Decode Percent-encoded UTF8 Characters #21306

Open
ameily opened this issue May 16, 2022 · 2 comments

ameily commented May 16, 2022

Title: Host and Authority Headers RFC Compliance: Decode Percent-encoded UTF8 Characters

Description:
While working on the unified header validation (UHV) component (#20261), we found that Envoy does not decode percent-encoded UTF-8 characters in the Host and :authority headers, which RFC 3986 allows in the host component.

Although the fix could be targeted for UHV, I wanted to register this issue with the community to get consensus on how percent-encoded characters should be handled within the H1 Host and H2 :authority headers. For now, we are only looking at the Host and :authority headers and not talking about URI or path normalization.

Some initial options after reading the RFCs, which could be implemented as new configuration settings:

  • Keep the current behavior and verify that Envoy users can register services that match on percent-encoded host/authority.
  • Decode all percent-encoded characters from Host and :authority, verify that they form valid UTF-8 sequences, and re-encode them in the upstream request (where appropriate).
    • The URI RFC says that clients producing URIs should only encode non-ASCII characters in this way. Envoy could enforce this by also verifying that each decoded UTF-8 codepoint is outside the ASCII range.
    • This could also be done on a per-service configuration basis (e.g., decode_authority = [true|false]).
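
To make the second option concrete, here is a minimal Python sketch of the decode, validate, and re-encode steps. It is not Envoy code and does not reflect any existing Envoy API: the function name `normalize_authority` and the `reject_encoded_ascii` flag are hypothetical, and it assumes the host value has already been split from any port.

```python
import re

# One or more consecutive percent-encoded octets, e.g. "%C3%BC".
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")
# A '%' that is not followed by two hex digits.
_STRAY_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def normalize_authority(host, reject_encoded_ascii=True):
    """Validate and canonicalize percent-encoded runs in a host value.

    Hypothetical helper: raises ValueError if a run is malformed, is not
    valid UTF-8, or (optionally) encodes an ASCII character.
    """
    if _STRAY_PCT.search(host):
        raise ValueError("malformed percent-encoding in host")

    def fix_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            # Strict decoding rejects truncated and overlong sequences.
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError("host contains invalid UTF-8") from exc
        if reject_encoded_ascii and any(ord(ch) < 0x80 for ch in text):
            raise ValueError("host percent-encodes an ASCII character")
        # Re-encode with uppercase hex for the upstream request.
        return "".join("%{:02X}".format(b) for b in raw)

    return _PCT_RUN.sub(fix_run, host)
```

With these assumptions, `normalize_authority("b%c3%bccher.example")` yields `b%C3%BCcher.example`, while `%41.example` and `%C3.example` are rejected. Whether rejection or pass-through is the right default is exactly the question this issue is asking.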

Relevant Links:

  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax

    The reg-name syntax allows percent-encoded octets in order to represent non-ASCII registered names in a uniform way that is independent of the underlying name resolution technology. Non-ASCII characters must first be encoded according to UTF-8 (STD 63), and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.

    URI producing applications must not use percent-encoding in host unless it is used to represent a UTF-8 character sequence.

  • The authority component within the URI is used by both H1 Host header and H2 :authority header:

    A client MUST send a Host header field in all HTTP/1.1 request messages. If the target URI includes an authority component, then a client MUST send a field-value for Host that is identical to that authority component, excluding any userinfo subcomponent and its @ delimiter.

    The :authority pseudo-header field includes the authority portion of the target URI (RFC 3986, Section 3.2). The authority MUST NOT include the deprecated userinfo subcomponent for http or https schemed URIs.
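
As an illustration of the producer side described in these quotes, the following Python sketch percent-encodes a non-ASCII reg-name via UTF-8 and derives a Host value from a target URI by dropping userinfo. The helper names `encode_reg_name` and `host_header_for` are hypothetical, and the sketch relies only on the standard library's `urllib.parse`; it does not cover IDNA/punycode, which is a separate encoding.

```python
from urllib.parse import quote, urlsplit

# Sub-delimiters that RFC 3986 allows unencoded in reg-name.
_SUB_DELIMS = "!$&'()*+,;="


def encode_reg_name(name):
    """Encode non-ASCII characters as UTF-8, then percent-encode each octet."""
    return quote(name, safe=_SUB_DELIMS, encoding="utf-8")


def host_header_for(target_uri):
    """Return the authority of target_uri without userinfo and its '@'."""
    authority = urlsplit(target_uri).netloc
    # Everything after the last '@' is host[:port]; unchanged if no '@'.
    return authority.rpartition("@")[2]


print(encode_reg_name("bücher.example"))
print(host_header_for("http://user:pw@example.com:8080/path"))
```

The first call prints `b%C3%BCcher.example` and the second prints `example.com:8080`, which is the kind of value a proxy would then see in Host or :authority.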

@ameily ameily added the triage Issue requires triage label May 16, 2022
ameily commented May 16, 2022

CC @yanavlasov

@MdSahil-oss

@snowp I would like to work on this issue
