Skip to content

gh-105973: Ensure that http header parser returns RFC 7230 compliant header values #105994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

bentasker
Copy link

@bentasker bentasker commented Jun 22, 2023

…e from header values

This adds a test to check for the behaviour identified in python#105973
…ations

Although python#105973 is promarily concerned with trailing whitespace not
being stripped, it's easy/inexpensive to test for other whitespace
behaviour (to detect any other future incompatabilities).

This test change ensures that

- Leading whitespace is trimmed from header values
- Whitespace in the middle of a header value is not removed
@ghost
Copy link

ghost commented Jun 22, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

Copy link
Contributor

@barneygale barneygale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to adjust email.policy.HTTP, rather than applying a post-process on the headers?

@bentasker
Copy link
Author

In principle, I think so, yes:

  • EmailPolicy would need updating to add a new attribute
  • EmailPolicy.header_source_parse() would need updating to call rstrip() depending on the value of the new attribute.

As long as the new attribute defaults to false (preserving current behaviour) there should be a low likelihood of impacting anything else.

email does feel a little like the wrong place to be adding HTTP specific logic (especially if any other differences are found in future) because it's not an obvious place to look when troubleshooting, but I can update the PR if that's the desired approach

@bentasker
Copy link
Author

Ooops, rebased after syncing the branch - has pulled other people's commits into this PR. Fixing that now.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

…mstances

This adds a new attribute `allow_trailing_whitespace` to `EmailPolicy` and
`Policy` and amends `header_source_parse` to consider it's state.

This currently works when `allow_trailing_whitespace` is manually
coerced to `False`, but does not appear to be working with
`http.client`'s invocations
@bentasker bentasker requested a review from a team as a code owner June 23, 2023 08:06
@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@bentasker
Copy link
Author

@barneygale it turns out that http.client wasn't actually using the HTTP policy so I've had to add an import and adjust the invocation in _parse_header_lines to account for that.

The branch has been rebased to remove my original changes (apart from the tests), and the requested changes to email.policy.HTTP have been made.

@bentasker
Copy link
Author

Looks like it's blown up a few tests - the way the Date header is handled looks to have changed (upsetting the urllib test). Also looks like quoting of strings in values has been lost.

Will try find time later to fix these

The original name was too long and disrupted the layout of
docstrings and test definitions.
@bentasker
Copy link
Author

I ran into an issue, so wanted to lay it out here for clarity - moving the change to email.policy.HTTP initially opened a can of worms.

client.http was using the default policy rather than HTTP, so it wasn't possible to make HTTP specific changes without first changing that.

Changing that, though, led to tests in other modules failing (quite reasonably) because it led to changes in response behaviour.

For example, passing Content-Type: text/css; charset=UTF-8 exhibits different behaviour:

Before: Content-Type: text/css; charset=UTF-8
After: Content-Type: text/css; charset="UTF-8"

This happens because the change in policy means the value is now passed through MimeParameters.__str__() which quotes the attribute string.

Whilst the result is legal in HTTP, it's a change in behaviour which might cause issues downstream. As an easy example of this, it breaks this test.

MimeParameters is quite far removed from email.policy so it's not possible to simply add a check in there to prevent quoting. To do that, we'd likely need to update the signature of both HeaderRegistry() and MimeParameters().

There are other changes to response formats too, email.policy.HTTP isn't currently generating HTTP compliant dates (it's providing a GMT offset rather than GMT). Easier to fix, but still an unintended (and breaking) effect.

So, in order to preserve existing behaviour whilst applying the trailing space fix, I've adjusted email.policy.HTTP to be a clone of Compat32 (the class it was using before the http.client change) rather than of EmailPolicy.

That change fixes the issue (but currently breaks the email tests).

@barneygale I wanted to check which you'd prefer:

  • Adjust Compat32 to fix the test (essentially adding some attributes that won't ever be used, and fixing docstrings)
  • Spin out a new HTTP specific class (allowing easier expansion/extension if needed later).

This change will break a number of `email` tests because `Compat32` does
not currently explicitly contain (or document) certain attributes
@bentasker
Copy link
Author

Looking at it with fresh eyes this morning has helped.

The issue (after switching to using Compat32) is that there's a stanza in the tests which specifies where the defaults should be read from - that just needed to be updated to take from compat32_defaults

@bentasker
Copy link
Author

All tests passed. Ready for review at your convenience @barneygale

@michaelfm1211
Copy link

michaelfm1211 commented Jun 25, 2023

Related are these PR and issue (feature request) I opened: #105918 and #105622. tl;dr they would change http to parse headers using email.policy.default instead of email.policy.compat32

I actually like using email.policy.HTTP over email.policy.default, so if this PR is merged the others should be closed, as they directly conflict with each other.

Or, both PRs could be combined in a way. Currently this PR clones email.policy.HTTP from the email.policy.compat32. It might be worth upgrading it to clone from email.policy.default (see my issue linked above for why this could be a good idea).

@bentasker
Copy link
Author

Just a gentle nudge - any chance of getting a review on this please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants