-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce overhead to check if a host is an IP Address #9095
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## master #9095 +/- ##
=======================================
Coverage 98.30% 98.30%
=======================================
Files 107 107
Lines 34403 34403
Branches 4074 4081 +7
=======================================
Hits 33819 33819
Misses 412 412
Partials 172 172
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
If this is a performance concern, wouldn't it make more sense to make this a much simpler heuristic? I believe we're only trying to decide if the user is attempting a request via an IP or a domain. There doesn't seem to be any real need to validate that the syntax is correct. e.g. If we invert it to check if it looks like a domain, we could probably get away with something like |
Yeah I think that makes a lot more sense. I'll audit all usage to see if we actually care if its valid or not |
is_ip_address is only used in cookies Line 180 in 2b5a4b9
hostname comes from yarl as raw_host , it cannot contain a port
Line 339 in 2b5a4b9
hostname comes from yarl as raw_host , it cannot contain a port
Line 282 in 2b5a4b9
-- Line 211 in 2b5a4b9
yarl , will never contain a port-- Line 153 in 2b5a4b9
is_ipv4_address is never called directly is_ipv6_address is only used in client_reqrep to add aiohttp/aiohttp/client_reqrep.py Line 361 in 2b5a4b9
hostname comes from yarl as raw_host , it cannot contain a port
aiohttp/aiohttp/client_reqrep.py Line 615 in 2b5a4b9
hostname comes from yarl as raw_host , it cannot contain a port
I don't think any of the use cases need to validate |
Its the IPv6 regex that is heavy
import re
import timeit
_ipv4_pattern = (
r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)
_ipv6_pattern = (
r"^(?:(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}"
r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}$)(([0-9A-F]{1,4}:){0,5}|:)"
r"((:[0-9A-F]{1,4}){1,5}:|:)|::(?:[A-F0-9]{1,4}:){5})"
r"(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}"
r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])|(?:[A-F0-9]{1,4}:){7}"
r"[A-F0-9]{1,4}|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}$)"
r"(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)|(?:[A-F0-9]{1,4}:){7}"
r":|:(:[A-F0-9]{1,4}){7})$"
)
_ipv4_regex = re.compile(_ipv4_pattern)
_ipv6_regex = re.compile(_ipv6_pattern, flags=re.IGNORECASE)
_ipv4_regexb = re.compile(_ipv4_pattern.encode("ascii"))
_ipv6_regexb = re.compile(_ipv6_pattern.encode("ascii"), flags=re.IGNORECASE)
host_str = "1.2.3.4"
print (['ipv4 string - decode + isdigit', timeit.timeit('host_str.replace(".","").isdigit()',globals=locals())])
print (['ipv4 string - regex', timeit.timeit('_ipv4_regex.search(host_str)',globals=locals())])
host_bin = b"1.2.3.4"
print(['ipv4 binary - decode + isdigit', timeit.timeit('host_bin.decode("ascii").replace(".","").isdigit()',globals=locals())])
print(['ipv4 binary - regex', timeit.timeit('_ipv4_regexb.search(host_bin)',globals=locals())])
host_str = "2001:db8::ff00:42:8329"
print(['ipv6 string - check for :', timeit.timeit('":" in host_str',globals=locals())])
print(['ipv6 string - regex', timeit.timeit('_ipv6_regex.search(host_str)',globals=locals())])
host_bin = b"2001:db8::ff00:42:8329"
print(['ipv6 binary - check for :', timeit.timeit('b":" in host_bin',globals=locals())])
print(['ipv6 binary - regex', timeit.timeit('_ipv6_regexb.search(host_bin)',globals=locals())]) |
Strictly speaking, you could even do |
I don't think we can do that since domains are allowed to start with a number I'm not sure there any TLDs that end with a number so |
Oh, that actually violates the DNS syntax in the spec... |
Nevermind, it was updated in https://www.rfc-editor.org/rfc/rfc1101 (section 3.1) |
It says that |
|
Tested on a few production Home Assistant instances. There are lots of places that use cookies and ip addresses so this has some good real world use testing. |
Backport to 3.10: 💚 backport PR created✅ Backport PR branch: Backported as #9096 🤖 @patchback |
(cherry picked from commit ffcf9dc)
Backport to 3.11: 💚 backport PR created✅ Backport PR branch: Backported as #9097 🤖 @patchback |
(cherry picked from commit ffcf9dc)
… is an IP Address (#9096) Co-authored-by: J. Nick Koston <nick@koston.org>
… is an IP Address (#9097) Co-authored-by: J. Nick Koston <nick@koston.org>
What do these changes do?
The IP Address regexes are a bit complex and take up ~35% of the time spent in
update_headers
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client_reqrep.py#L361The checks for IP Addresses did not need to validate that the IP Address is valid, only that it was not a domain name #9095 (comment) which means they can be much simpler.
Timings (more details below)
Are there changes in behavior for the user?
no
Is it a substantial burden for the maintainers to support this?
no
before
after