Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing the Forwarded header #2173

Merged
merged 1 commit into from
Aug 9, 2017
Merged

Fix parsing the Forwarded header #2173

merged 1 commit into from
Aug 9, 2017

Conversation

vfaronov
Copy link
Contributor

@vfaronov vfaronov commented Aug 6, 2017

What do these changes do?

Parse the Forwarded header more carefully, while staying about as fast, simple, and robust in the face of potential injection attacks. (Speed was measured with IPython’s %timeit on my laptop on a few typical and pathological header values.)

Are there changes in behavior for the user?

Most valid Forwarded headers that used to be parsed incorrectly by aiohttp are now parsed correctly. In particular:

  • commas and semicolons are allowed inside quoted-strings;
  • empty forwarded-pairs (as in for=_1;;by=_2) are allowed;
  • non-standard parameters are allowed (although this alone could be easily done in the previous parser).

Valid headers containing obs-text are still not parsed correctly, as this was an intentional decision in the previous parser (see comments) that I did not change.

Also, the previous parser used to bail out of (invalid) forwarded-elements containing duplicate parameter names. No rationale was given in the code, and I don’t think this is important, so the new parser doesn't enforce this.

Related issue number

#2170

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the changes folder
    • name it <issue_id>.<type> for example (588.bug)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

This fixes #2170 by parsing Forwarded more carefully, while staying about
as fast, simple, and robust in the face of potential injection attacks.
(Speed was measured with IPython's %timeit on my laptop on a few typical
and pathological header values.)

In particular:

- commas and semicolons are allowed inside quoted-strings;
- empty forwarded-pairs (as in "for=_1;;by=_2") are allowed;
- non-standard parameters are allowed (although this alone could be easily
  done in the previous parser).

This still doesn't parse valid headers containing obs-text, which was
an intentional decision in the previous parser (see comments) that I did
not change.

Also, the previous parser used to bail out of forwarded-elements containing
duplicate parameter names. No rationale was given in the code, and I don't
think this is important, so the new parser doesn't enforce this.
@vfaronov vfaronov changed the title Fix parsing the Forwarded header (#2170) Fix parsing the Forwarded header Aug 6, 2017
r'\1', value[1:-1])
fvparams[param.lower()] = value
elems = []
for field_value in self._message.headers.getall(hdrs.FORWARDED, ()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fafhrd91 @asvetlov
It is easily possible to build FSM which will parse this structure using regex module. Unfortunatelly built-in regexps are not able to capture all repeating matches.

Pros:

  • Less code
  • Less error-prone
  • Faster
  • Ability to parse other structures using higly-efficient FSMs

Cons:

  • Dependency on regex module
  • More difficult to understand (unsure, maybe more simplier)
  • Non-pythonic way.

(Anyway, maybe merge as shown here, but re-write later, I can do that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@socketpair please merge the PR first than feel free to create a new one for regexps based approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asvetlov how can I sure that other members agree with my decision to merge ? (sorry, I'm asking BEFORE making something awful)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@socketpair I’m not sure an FSM can handle my test_single_forwarded_header_injection1 case, which necessarily involves a form of backtracking. And I think this case is important to handle, at least until deployment experience shows that all reverse proxies take precautions against that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@socketpair the trick borrowed from CPython works: declare intention to merge (LGTM or github pull request approval), wait a day and do merge tomorrow if nobody objects.

r'\1', value[1:-1])
fvparams[param.lower()] = value
elems = []
for field_value in self._message.headers.getall(hdrs.FORWARDED, ()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@socketpair please merge the PR first than feel free to create a new one for regexps based approach.

@codecov-io
Copy link

Codecov Report

Merging #2173 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2173      +/-   ##
==========================================
+ Coverage   97.11%   97.14%   +0.02%     
==========================================
  Files          39       39              
  Lines        7878     7889      +11     
  Branches     1366     1367       +1     
==========================================
+ Hits         7651     7664      +13     
+ Misses        101      100       -1     
+ Partials      126      125       -1
Impacted Files Coverage Δ
aiohttp/web_request.py 99.67% <100%> (+0.68%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee69c40...a9ddd91. Read the comment docs.

@asvetlov asvetlov merged commit a0666e8 into aio-libs:master Aug 9, 2017
@asvetlov
Copy link
Member

asvetlov commented Aug 9, 2017

Thanks

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there's important points made in this discussion,
please include those exceprts into that new issue.

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bot:chronographer:provided There is a change note present in this PR outdated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants