Make METHRE RFC-7230 compliant #3235

kxepal · 2018-09-02T04:53:25Z

Definition of method is:

    method = token
    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
            "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
    token = 1*tchar

So we had two issues:

1. Not all the characters were allowed.
2. We actually allowed too many characters, since `$-_` was parsed as "all characters in the range between code 36 ($) and code 95 (_)" instead of "characters with codes 36, 45 and 95". So we matched methods like `[GET]`, which are malformed according to the spec.
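The two issues can be reproduced outside the parser. The snippet below is a minimal sketch that compares the old and new patterns from this PR's diff. It uses `fullmatch` purely for illustration; how the parser itself applies `METHRE` is not shown in this thread, and the names `OLD_METHRE`, `NEW_METHRE` and `results` are made up for the example.

```python
import re

# Old pattern: "$-_" inside the set is a range from "$" (36) to "_" (95).
OLD_METHRE = re.compile('[A-Z0-9$-_.]+')
# New pattern: the hyphen is escaped, and every RFC 7230 tchar is listed.
NEW_METHRE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# Hypothetical sample methods chosen to exercise both issues.
samples = ['GET', 'M-SEARCH', '[GET]', 'get', 'GET!']

results = {
    method: (
        OLD_METHRE.fullmatch(method) is not None,
        NEW_METHRE.fullmatch(method) is not None,
    )
    for method in samples
}

for method, (old_ok, new_ok) in results.items():
    print(f'{method!r}: old={old_ok} new={new_ok}')
```

`[GET]` is accepted only by the old pattern (issue 2), while `get` and `GET!` are accepted only by the new one (issue 1).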

kxepal · 2018-09-02T04:57:43Z

After reviewing #3233 I found this inconsistency between our regex and the spec. If the fix is OK, I will finish the remaining bits.

asvetlov · 2018-09-02T09:20:07Z

@kxepal could you look at the failed tests?

kxepal · 2018-09-02T09:30:15Z

Oh, I didn't run the full test suite, just test_http_parser. It should be fixed now.

codecov-io · 2018-09-02T09:59:23Z

Codecov Report

Merging #3235 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master    #3235   +/-   ##
=======================================
  Coverage   98.07%   98.07%           
=======================================
  Files          43       43           
  Lines        7856     7856           
  Branches     1354     1354           
=======================================
  Hits         7705     7705           
  Misses         59       59           
  Partials       92       92

| Impacted Files | Coverage Δ | |
|---|---|---|
| aiohttp/http_parser.py | `98.06% <100%> (ø)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7b71302...7351a1f.

webknjaz · 2018-09-02T10:06:32Z

aiohttp/http_parser.py

@@ -29,7 +29,7 @@
    'RawRequestMessage', 'RawResponseMessage')

 ASCIISET = set(string.printable)
-METHRE = re.compile('[A-Z0-9$-_.]+')
+METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


I think it might make sense to put a comment with that rule somewhere nearby:

    # method = token
    # tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
    #         "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
    # token = 1*tchar

There is git blame with that information, but having a copy in source code wouldn't hurt anyone. Added that one.

Yeah, I know about blame. But it's visually more convenient to see it next to the code. Like it's done in yarl, for example.

webknjaz · 2018-09-02T11:03:30Z

aiohttp/http_parser.py

+#     tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
+#             "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
+#     token = 1*tchar
+METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


I'd use a raw-string literal r''

Also, + and * might need escaping, and | needs escaping (otherwise it divides groups in []).

Oh, and . might stand for any single character.

Probably they have restricted meanings within [], but IMHO better safe than sorry.

All true, but not for [] sets.

    >>> import re
    >>> re.match('[.]', 't')
    >>> re.match('[.]', 't') is None
    True
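The same point can be checked for the other characters raised in this review. The following is a small sketch, not taken from the PR itself: it shows that `.`, `+`, `*` and `|` are literal inside a `[]` set, that `-` is the one character here that still needs care, and that a raw string only changes how the backslash is written. The variable names are illustrative.

```python
import re

# Inside [...] these characters lose their usual regex meaning.
literal_in_set = re.compile(r'[.+*|]')

for ch in '.+*|':
    assert literal_in_set.fullmatch(ch) is not None

# '.' in a set is a literal dot, not "any character".
assert literal_in_set.fullmatch('t') is None

# '-' is the exception: between two characters it forms a range,
# so it is escaped (or placed first or last) to be taken literally.
range_set = re.compile(r'[$-_]')     # codes 36..95
literal_set = re.compile(r'[$\-_]')  # exactly '$', '-' and '_'
assert range_set.fullmatch('A') is not None
assert literal_set.fullmatch('A') is None

# A raw string spells the same pattern without doubling the backslash.
assert r'[$\-_]' == '[$\\-_]'
```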

Digits can be replaced with \d (to match the style below).

No, they cannot. \d matches all Unicode decimal digits, not just the ASCII ones:

    >>> from hypothesis.strategies import from_regex
    >>> thing = from_regex('^\d$').example()
    >>> re.match('\d', thing)
    <_sre.SRE_Match object; span=(0, 1), match='𝟙'>
    >>> thing
    '𝟙'
    >>> unicodedata.category(thing)
    'Nd'
    >>> unicodedata.name(thing)
    'MATHEMATICAL DOUBLE-STRUCK DIGIT ONE'
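The same behaviour can be shown with the standard library alone. This is a minimal sketch that swaps the randomly generated example for a fixed character (ARABIC-INDIC DIGIT ONE, chosen here only for illustration) and compares `\d` with an explicit `[0-9]` range on `str` patterns.

```python
import re
import unicodedata

arabic_one = '\u0661'  # ARABIC-INDIC DIGIT ONE

# The character is a decimal digit (category Nd) but not an ASCII one.
assert unicodedata.category(arabic_one) == 'Nd'

# For str patterns, \d matches any Unicode decimal digit by default.
assert re.fullmatch(r'\d', arabic_one) is not None

# An explicit ASCII range does not match it.
assert re.fullmatch(r'[0-9]', arabic_one) is None

# The re.ASCII flag restricts \d to [0-9].
assert re.fullmatch(r'\d', arabic_one, flags=re.ASCII) is None
```

Spelling the range out as `0-9` keeps the pattern independent of flags, which fits a grammar defined purely in terms of ASCII characters.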

kxepal · 2018-09-02T11:09:59Z

tests/test_http_parser.py

@@ -540,7 +540,7 @@ def test_http_request_parser_two_slashes(parser):

 def test_http_request_parser_bad_method(parser):
    with pytest.raises(http_exceptions.BadStatusLine):
-        parser.feed_data(b'!12%()+=~$ /get HTTP/1.1\r\n\r\n')
+        parser.feed_data(b'=":<G>(e),[T];?" /get HTTP/1.1\r\n\r\n')


Why? These characters are valid.

Oh, it's a negative test. This reads confusingly. Let's change it to a parametrized test, something like:

    @pytest.mark.parametrize(
        'invalid_byte',
        list(b'=":<>(),[];?"'),
    )
    @pytest.raises(http_exceptions.BadStatusLine)
    def test_http_request_parser_bad_method(parser, invalid_byte):
        valid_method = b'Get'
        invalid_method = valid_method + invalid_byte
        invalid_request_line = invalid_method + b' /get HTTP/1.1\r\n\r\n'
        parser.feed_data(invalid_request_line)

With pytest.raises at least.
Personally, I don't insist on pytest parametrization.
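The per-byte idea can be sketched without the parser fixture. The snippet below is only an illustration under stated assumptions: `BadMethod` and `check_method` are hypothetical stand-ins for `http_exceptions.BadStatusLine` and the parser's method validation, and whole-string matching with `fullmatch` is assumed. Two details matter when building the cases: iterating over a `bytes` object yields integers, so each value is wrapped back into a one-byte `bytes` object, and the exception is caught inside the body (the `with pytest.raises(...)` form in a real test).

```python
import re

# Pattern from this PR's diff.
METHRE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


class BadMethod(ValueError):
    """Hypothetical stand-in for http_exceptions.BadStatusLine."""


def check_method(method: bytes) -> str:
    """Hypothetical helper: return the method as text or raise BadMethod."""
    text = method.decode('latin-1')
    if METHRE.fullmatch(text) is None:
        raise BadMethod(text)
    return text


# Iterating over bytes yields ints, so rebuild single-byte objects.
INVALID_BYTES = [bytes([b]) for b in b'=":<>(),[];?']


def rejected(method: bytes) -> bool:
    """Return True if check_method refuses the given method."""
    try:
        check_method(method)
    except BadMethod:
        return True
    return False


# One case per invalid byte, each appended to an otherwise valid method.
results = {b: rejected(b'Get' + b) for b in INVALID_BYTES}
```

Each entry of `INVALID_BYTES` could be fed to `pytest.mark.parametrize` in the same way, so a failure names the exact byte that was wrongly accepted.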

asvetlov · 2018-09-02T13:03:46Z

@kxepal please merge when ready

lock · 2019-10-28T02:03:48Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

kxepal requested a review from asvetlov September 2, 2018 04:54

kxepal force-pushed the fix-methodre branch 2 times, most recently from 24a60db to 881a7a0 on September 2, 2018 05:17

kxepal force-pushed the fix-methodre branch from f4116b3 to 9662f61 on September 2, 2018 09:29

webknjaz reviewed Sep 2, 2018

kxepal force-pushed the fix-methodre branch 2 times, most recently from 928df82 to 9272c20 on September 2, 2018 09:58

webknjaz reviewed Sep 2, 2018

kxepal force-pushed the fix-methodre branch from 9272c20 to 7ac7e4d on September 2, 2018 10:17

webknjaz reviewed Sep 2, 2018

kxepal force-pushed the fix-methodre branch 2 times, most recently from 69b5734 to 7351a1f on September 2, 2018 11:23

asvetlov approved these changes Sep 2, 2018

asvetlov merged commit de4daf3 into aio-libs:master Sep 8, 2018

lock bot added the outdated label Oct 28, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019

psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019
