Fix router matching pre-encoded URLs #8898

Merged: 13 commits, Aug 31, 2024

Dreamsorcerer (Member) commented Aug 26, 2024

Fixes #5621.
Fixes #6619.

Dreamsorcerer added the backport-3.10 and backport-3.11 labels (trigger automatic backporting to the 3.10 and 3.11 release branches by the Patchback robot) on Aug 26, 2024.
psf-chronographer bot added the bot:chronographer:provided label (there is a change note present in this PR) on Aug 26, 2024.

codecov bot commented Aug 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.28%. Comparing base (9738426) to head (92e9ab5).
Report is 1 commit behind head on master.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8898   +/-   ##
=======================================
  Coverage   98.27%   98.28%           
=======================================
  Files         107      107           
  Lines       34226    34221    -5     
  Branches     4058     4058           
=======================================
- Hits        33637    33633    -4     
+ Misses        416      415    -1     
  Partials      173      173           
Flag              Coverage Δ
CI-GHA            98.17% <100.00%> (+<0.01%) ⬆️
OS-Linux          97.83% <100.00%> (+<0.01%) ⬆️
OS-Windows        96.24% <100.00%> (+<0.01%) ⬆️
OS-macOS          97.51% <100.00%> (+<0.01%) ⬆️
Py-3.10.11        97.61% <100.00%> (+<0.01%) ⬆️
Py-3.10.14        97.54% <100.00%> (+<0.01%) ⬆️
Py-3.11.9         97.77% <100.00%> (+<0.01%) ⬆️
Py-3.12.5         97.88% <100.00%> (+<0.01%) ⬆️
Py-3.9.13         97.49% <100.00%> (-0.01%) ⬇️
Py-3.9.19         97.43% <100.00%> (+<0.01%) ⬆️
Py-pypy7.3.16     97.04% <100.00%> (+<0.01%) ⬆️
VM-macos          97.51% <100.00%> (+<0.01%) ⬆️
VM-ubuntu         97.83% <100.00%> (+<0.01%) ⬆️
VM-windows        96.24% <100.00%> (+<0.01%) ⬆️


Dreamsorcerer marked this pull request as draft on August 26, 2024, 18:45.

Dreamsorcerer (Member, Author) commented:

I'm not too sure about this, but I think everything is working and passing if I make this change to yarl:

diff --git a/yarl/_quoting_py.py b/yarl/_quoting_py.py
index 585a1da..fc20b37 100644
--- a/yarl/_quoting_py.py
+++ b/yarl/_quoting_py.py
@@ -158,7 +158,7 @@ class _Unquoter:
                         if to_add is None:  # pragma: no cover
                             raise RuntimeError("Cannot quote None")
                         ret.append(to_add)
-                    elif unquoted in self._unsafe:
+                    elif unquoted in self._unsafe or unquoted == "/":
                         to_add = self._quoter(unquoted)
                         if to_add is None:  # pragma: no cover
                             raise RuntimeError("Cannot quote None")

Essentially, we shouldn't unquote %2F in the path, as this could lead to incorrectly treating it as a path separator.
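A minimal sketch of why this matters for routing, written with the standard library's urllib.parse rather than yarl's _Unquoter. The route regex ROUTE and the helpers unquote_keep_slash, match_decode_first and match_keep_slash are hypothetical and do not reflect aiohttp's actual router. Re-encoding any decoded "%" as %25 (so that %252F stays distinct from %2F) is an extra assumption of this sketch; the yarl diff above does not do that.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Hypothetical single-segment route standing in for something like
# "/files/{name}"; this is a plain regex, not aiohttp's router.
ROUTE = re.compile(r"^/files/(?P<name>[^/]+)$")
# Matches an encoded slash in either case (%2F or %2f).
ENCODED_SLASH = re.compile(r"%2F", re.IGNORECASE)


def unquote_keep_slash(raw_path: str) -> str:
    """Decode every escape except %2F, which is kept (normalised to %2F).

    Any "%" left in a decoded part is re-encoded as %25, so input such as
    "%252F" (a literal "%2F") cannot be confused with an encoded slash.
    """
    parts = ENCODED_SLASH.split(raw_path)
    return "%2F".join(unquote(part).replace("%", "%25") for part in parts)


def match_decode_first(raw_path: str) -> Optional[str]:
    # Naive approach: decode the whole path, then match it.
    m = ROUTE.match(unquote(raw_path))
    return m.group("name") if m else None


def match_keep_slash(raw_path: str) -> Optional[str]:
    # Match while %2F (and %25) are still encoded, then decode the
    # captured segment exactly once.
    m = ROUTE.match(unquote_keep_slash(raw_path))
    return unquote(m.group("name")) if m else None


print(match_decode_first("/files/a%2Fb"))  # None: "a/b" looks like two segments
print(match_keep_slash("/files/a%2Fb"))    # a/b
```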

bdraco (Member) commented Aug 26, 2024

https://datatracker.ietf.org/doc/html/rfc3986#section-2.4

> When a URI is dereferenced, the components and subcomponents
> significant to the scheme-specific dereferencing process (if any)
> must be parsed and separated before the percent-encoded octets within
> those components can be safely decoded, as otherwise the data may be
> mistaken for component delimiters.

I read that as meaning the URL needs to be split on its delimiters before it's decoded.

Dreamsorcerer (Member, Author) commented:

Hmm, I'm not sure how best to accommodate that...

bdraco (Member) commented Aug 26, 2024

https://www.w3.org/Addressing/URL/4_URI_Recommentations.html

Example 2
The URIs
http://www.w3.org/albert/bertram/marie-claude

and
http://www.w3.org/albert/bertram%2Fmarie-claude

are NOT identical, as in the second case the encoded slash does not have hierarchical significance.
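To make the RFC 3986 section 2.4 point concrete, here is a small sketch using only the standard library (urllib.parse, not yarl or aiohttp) that splits the two W3C example paths in both orders. The helper names segments_decode_first and segments_split_first are hypothetical.

```python
from typing import List
from urllib.parse import unquote, urlsplit

A = "http://www.w3.org/albert/bertram/marie-claude"
B = "http://www.w3.org/albert/bertram%2Fmarie-claude"


def segments_decode_first(url: str) -> List[str]:
    # Decode the whole path, then split: %2F turns into a separator.
    return unquote(urlsplit(url).path).split("/")


def segments_split_first(url: str) -> List[str]:
    # Split the still-encoded path on "/", then decode each segment,
    # as RFC 3986 section 2.4 describes.
    return [unquote(segment) for segment in urlsplit(url).path.split("/")]


print(segments_decode_first(A) == segments_decode_first(B))  # True: the distinction is lost
print(segments_split_first(B))  # ['', 'albert', 'bertram/marie-claude']
```

Splitting first keeps "bertram/marie-claude" as a single segment, so the two URIs stay distinct, matching the W3C recommendation.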

bdraco (Member) commented Aug 26, 2024

This gets messy quickly: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-0450

bdraco (Member) commented Aug 26, 2024

The safest course seems to be never decoding %2F to / when decoding a path.

bdraco (Member) commented Aug 26, 2024

I think making / unsafe in yarl makes sense... but it's probably a breaking change.

Dreamsorcerer (Member, Author) commented:

unsafe has a slightly different meaning; I tried that first. With / marked unsafe, /my/path%2Ffoo becomes %2Fmy%2Fpath%2Ffoo, i.e. the literal path separators get encoded as well.

So, we'd probably want to add an ignore option or something separately, which would go where I hardcoded the value in the diff.
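A rough model of the two behaviours, written against urllib.parse rather than yarl's _Unquoter. Both function names and the ignore parameter are hypothetical; this is a sketch of the idea, not yarl's API. unquote_unsafe re-encodes every character in the set wherever it appears in the decoded output, which reproduces the /my/path%2Ffoo example above; unquote_ignore only leaves escapes for the listed (ASCII) characters undecoded. Like the hardcoded check in the diff, it does not address double-encoded input such as %252F.

```python
import re
from urllib.parse import quote, unquote

# Any single percent-escape such as %2F or %20.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def unquote_unsafe(raw: str, unsafe: str) -> str:
    # "unsafe" model: decode everything, then percent-encode every unsafe
    # character in the result, whether it was literal or encoded.
    decoded = unquote(raw)
    return "".join(quote(ch, safe="") if ch in unsafe else ch for ch in decoded)


def unquote_ignore(raw: str, ignore: str) -> str:
    # Hypothetical "ignore" model: an escape that would decode to an ignored
    # (ASCII) character is kept, upper-cased; everything else is decoded.
    out = []
    pos = 0
    for m in _ESCAPE.finditer(raw):
        if chr(int(m.group()[1:], 16)) in ignore:
            out.append(unquote(raw[pos:m.start()]))
            out.append(m.group().upper())
            pos = m.end()
    out.append(unquote(raw[pos:]))
    return "".join(out)


print(unquote_unsafe("/my/path%2Ffoo", "/"))  # %2Fmy%2Fpath%2Ffoo
print(unquote_ignore("/my/path%2Ffoo", "/"))  # /my/path%2Ffoo
```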

Dreamsorcerer (Member, Author) commented:

Given the security implications, I suspect the change should be made regardless of backwards compatibility.

Dreamsorcerer (Member, Author) commented:

aio-libs/yarl#1057

Dreamsorcerer (Member, Author) commented:

> Still haven't found a definitive answer on this one.....maybe we should be testing this against: https://url.spec.whatwg.org/ https://github.com/web-platform-tests/wpt/tree/master/url

Sounds like it could be a lot of work. Feel free to take that on, but I think this current solution is probably reasonable until that is done.

bdraco (Member) commented Aug 28, 2024

I got a bit of a chuckle out of that. It's definitely a lot of work, and I wasn't intending for it to be done here. While I think it would be great to test against it long term, I was thinking we could look at how this specific case is handled there as a reference point.

I plan on digging through it a bit more and providing some better analysis. It's Home Assistant beta week, so I haven't had time to do that yet.

bdraco mentioned this pull request on Aug 31, 2024.
bdraco (Member) commented Aug 31, 2024

Will test this shortly

Dreamsorcerer marked this pull request as ready for review on August 31, 2024, 14:24.
bdraco (Member) commented Aug 31, 2024

I'm hoping to find time to test this today

bdraco (Member) commented Aug 31, 2024

Everything looks like it's working as expected. Accessing path is a bit heavier than raw_path, but there's not much we can do about that.
[Screenshot, 2024-08-31 at 8:40:30 AM; not reproduced]

The cached_property implementation in yarl is quite a bit slower than the reify implementation in aiohttp. I noted that in aio-libs/yarl#1065; maybe I need to do that sooner rather than later.

Dreamsorcerer merged commit 6be9452 into master on Aug 31, 2024 (34 of 35 checks passed).
Dreamsorcerer deleted the fix-decoded-url-matching branch on August 31, 2024, 18:56.

patchback bot (Contributor) commented Aug 31, 2024

Backport to 3.10: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 6be9452 on top of patchback/backports/3.10/6be94520ea46fe1829e6c9d986e7fc9f7db50cad/pr-8898

Backporting merged PR #8898 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.10/6be94520ea46fe1829e6c9d986e7fc9f7db50cad/pr-8898 upstream/3.10
  4. Now, cherry-pick PR #8898 contents into that branch:
    $ git cherry-pick -x 6be94520ea46fe1829e6c9d986e7fc9f7db50cad
    If it'll yell at you with something like fatal: Commit 6be94520ea46fe1829e6c9d986e7fc9f7db50cad is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 6be94520ea46fe1829e6c9d986e7fc9f7db50cad
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in order to preserve the patch from PR #8898 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.10/6be94520ea46fe1829e6c9d986e7fc9f7db50cad/pr-8898
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

patchback bot (Contributor) commented Aug 31, 2024

Backport to 3.11: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 6be9452 on top of patchback/backports/3.11/6be94520ea46fe1829e6c9d986e7fc9f7db50cad/pr-8898

Backporting merged PR #8898 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.11/6be94520ea46fe1829e6c9d986e7fc9f7db50cad/pr-8898 upstream/3.11
  4. Now, cherry-pick PR #8898 contents into that branch:
    $ git cherry-pick -x 6be94520ea46fe1829e6c9d986e7fc9f7db50cad
    If it'll yell at you with something like fatal: Commit 6be94520ea46fe1829e6c9d986e7fc9f7db50cad is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 6be94520ea46fe1829e6c9d986e7fc9f7db50cad
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in order to preserve the patch from PR #8898 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.11/6be94520ea46fe1829e6c9d986e7fc9f7db50cad/pr-8898
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

Dreamsorcerer added a commit that referenced this pull request Aug 31, 2024
Co-authored-by: J. Nick Koston <nick@koston.org>
(cherry picked from commit 6be9452)
bdraco added a commit that referenced this pull request Sep 23, 2024
#8898 now passes the unquoted path and we would
unquote it again
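The commit message above is terse, so here is a short illustration of why decoding a path a second time is a bug. It uses urllib.parse.unquote as a stand-in for yarl's decoding (an assumption; the exact code path in aiohttp is not shown here) and a hypothetical segment name.

```python
from urllib.parse import unquote

# A client that wants the literal segment "100%25" must send "100%2525".
raw = "/files/100%2525"

once = unquote(raw)    # "/files/100%25": the intended value
twice = unquote(once)  # "/files/100%": decoding again changes it

print(once)
print(twice)
```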
bdraco added a commit that referenced this pull request Sep 23, 2024
The minimum version must increase because we need ``URL.path_safe`` to be
able to fix #9267 and the original PR #8898
2 participants