-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-28009: Fix uuid.uuid1() and uuid.get_node() on AIX #8672
Conversation
correct the field seperator character of ethernet adapter MAC address on AIX
…sg306562 also, calculate correct value when the fields are not all two characters
Lib/uuid.py
Outdated
if word.count(_mac_delim) == 5: | ||
if len(word) == 17: | ||
mac = int(word.replace(_mac_delim, b''), 16) | ||
elif len(word) < 17 and len(word) >= 11: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change. Is it only needed on AIX? I would prefer to keep ==17 test on other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I do not have access to other platforms I was trying to write something that was generic.
Roughly - first test for the _MAC_DELIM count.
Then - if the overall length is 17 the characters will be the correct hex value - when evaluated, so the splitting into individual hex values and adding and shifting is not needed.
Else: each individual hex value must be added in separately AFTER an 8/bit left shift of the previous value.
Otherwise a macaddr from:
en0 1500 link#2 fa.d1.8c.f7.62.4 1601514231 0 664547436 0 0
returns as:
fad18cf7624 (fad 18cf 7624, dec 17236120008228) rather than as
fad18cf76204 (fad1 8cf7 6204 , dec 275777920131588)
If you still want it changed, please comment again after push in about 5 minutes)
Misc/NEWS.d/next/Library/2018-08-04-12-26-11.bpo-28009.4JcHZb.rst
Outdated
Show resolved
Hide resolved
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
pushed a small change because there was a "phase failure" in one of the tests (agent did not start). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes done, or comments added
…e getnode() Move _netstat_getnode() find alg so that it can be tested using unittest.mock Add new test `test_uuid/find_mac_netstat` for AIX and netstat
I have made the requested changes; please review again. |
Thanks for making the requested changes! @vstinner: please review the changes made to this pull request. |
Lib/test/test_uuid.py
Outdated
command='netstat', | ||
args='-ia', | ||
hw_identifiers=b'Address', | ||
get_index=lambda x: x * 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you need to use i+1 here. It looks like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess my replies did not make it here.
a) It is "* 1" not plus 1
b) it may indicate a bug (not in the test I fear, but elsewhere in core) - when I was trying to use the find_mac code to parse netstat output I first tried "get_index=lambda x: x", but it always evaluated as get_index=lambda x: x + 1. I may have also tried x + 0, but I do not recall. Before merging I can experiment with other values - and if they are not working that is a different bug - not the one I am trying to resolve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my recent changes, this LGTM.
I'd like to get approval from another core dev, though. Or perhaps @zestyping (the original author of the uuid module)?
This as per Victor Stinner's explicit request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Here are my two cents. I realize my suggestions involve substantial changes and I don't have prior context on the evolution of this PR, so feel free to take with a grain of salt. I hope they are somewhat helpful, though. I don't have authority over this file, merely opinions.
As a general note, it seems to me that it's more robust to write code that works for all cases (e.g. split on both colons and periods; always tolerate hex bytes that have one or two characters) rather than different behaviours for each platform, which might be more likely to break if the platform detection is incorrect or if the format changes slightly on a given platform.
Lib/uuid.py
Outdated
@@ -384,48 +394,103 @@ def _popen(command, *args): | |||
def _is_universal(mac): | |||
return not (mac & (1 << 41)) | |||
|
|||
def _find_mac(command, args, hw_identifiers, get_index): | |||
|
|||
# In the next two fucnctions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand the general approach here, and it makes sense. These two functions need a better explanation of what they do, though—the comments here aren't enough for a reader to understand how they're intended to work, and the argument names hw_identifiers
and f_index
are quite cryptic.
Here's an attempt at some better names and a docstring:
def _find_mac_near_keyword(command, args, keywords, get_word_index):
"""Searches the output of a command for a MAC address near a keyword.
Each line of words in the output is lowercased and searched for any of
the given keywords. Upon a match, get_word_index is invoked to pick a
word from the line, given the index of the match (e.g. lambda i: 0 gets the
first word on the line; lambda i: i - 1 gets the word preceding the keyword)."""
Given that _find_mac_nextlines
is supposed to look for words under column headings, there's no need for it to take an f_index
function, and the third argument can be named more clearly. Indeed, it would be misleading to call the third argument hw_identifiers
, as it's a single string, not a list of strings. Here's a try:
def _find_mac_under_heading(command, args, heading):
"""Looks for a MAC address under a heading in the output of a command.
The first line of words in the output is searched for the given heading.
Words at the same word index as the heading in subsequent lines are then
examined to see if they look like MAC addresses."""
Another concern I have is that the logic about what looks like a MAC address is mixed in with the logic about where to find it. It would be ideal to factor it out, like this:
def _find_near_keyword(command, args, keywords, get_word_index):
"""Yields selected nearby words for keywords found in the output of a command.
...
yield _parse_mac(word)
...
def _find_mac_under_heading(command, args, heading):
"""Yields words found under a heading in the output of a command.
The first line of words in the output is searched for the given heading.
For each subsequent line, the word at the same word index as the heading
in subsequent lines is then yielded."""
def _parse_mac(word):
"""Converts a string to a 48-bit integer if it looks like a MAC address, or returns None."""
# A MAC address is six bytes, each printed as one or two hex digits.
# They may be separated by ':' or '.' depending on the platform.
parts = word.replace('.', ':').split(':')
if len(parts) == 6 and all(len(part) <= 2 for part in parts):
hexstr = b''.join(part.rjust(2, b'0') for part in parts)
try:
return int(hexstr, 16)
except ValueError:
pass
def _select_best_mac(macs):
"""Chooses a MAC from the given iterable, preferring any universally administered
MAC address over locally administered MAC addresses."""
local_macs = []
for mac in macs:
if _is_universal(mac):
return mac
elif mac:
local_macs.append(mac)
if local_macs:
return local_macs[0]
Then:
_select_best_mac(_find_near_keyword('ifconfig', args, keywords, lambda i: i + 1))
_select_best_mac(_find_under_heading('netstat', '-ia', b'Address')
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Very interesting to read up better ways to do things, e.g., "parts = word.replace('.', ':').split(':')"
Thanks for the review, @zestyping! For now, I've implemented those of your suggestions which don't entail refactoring beyond the scope of adding support for AIX. I've also implemented @ncoghlan's suggestion of adding a At this point I've reviewed all existing comments on this PR and implemented all of those which seemed in scope. Assuming the tests pass, this is good to go IMO. @ncoghlan, @vstinner, what do you think? |
Also, should this be cherry-picked into 3.8? |
Also - @taleinat - many thanks for your assistance! |
Thanks, @taleinat!
That's a very reasonable constraint. Can you help me understand your thinking behind keeping |
I kept them due to @vstinner's request to have this change only affect AIX. Your suggestion could potentially have side-effects on all platforms. Also, the Windows-specific |
That makes sense. Thanks for explaining! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a few test skips that I think would be better off checking for the specific assumption(s) they're making, rather than checking for AIX as a platform, but I don't think those are a merge blocking issue.
Lib/test/test_uuid.py
Outdated
@@ -674,26 +674,57 @@ class TestUUIDWithExtModule(BaseTestUUID, unittest.TestCase): | |||
class BaseTestInternals: | |||
_uuid = py_uuid | |||
|
|||
# data is specific to AIX - with '.' as _MAC_DELIM | |||
# and strings shorter than 17 bytes (no leading 0) | |||
@unittest.skipUnless(_AIX, 'requires AIX') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that this is a valid test anywhere that those two conditions are true, so rather than testing for "AIX_" here, you can just test for the MAC notation conventions that the test cares about.
Are you looking at the latest version of this PR? I reworked all of those tests and there are no longer any such skip-checks. |
Also, assuming this is now ready to be merged, should it be back-ported to 3.8? (Seems like too big a change for 3.7.) |
@taleinat Indeed, I don't think GH was showing me the right thing (it's possible it was attempting to show me "changes since your last review" and the outcome was misleading). I've reviewed the actual latest version now, and my +1 still holds :) Agreed this is definitely out of scope for 3.7. The case with a 3.8 backport is murky - the challenge with all the AIX patches is that handling AIX properly is a new feature (since it behaves differently from other *nix platforms in some cases), so "fix bug that only occurs on AIX" isn't an obviously suitable candidate for backporting. Given that ambiguity, we should probably pass the question to @ambv as 3.8 release manager. |
The backport to 3.8 is not a show-stopper. Nice to have, as it was "developed" on 3.7 and 3.8, and time has moved on. Why not a show-stopper? The test suite does not see this as an error, so it does not affect the bot status. So, the backport is more about how much importance someone gives this. Thank you all for your time reviewing! |
FYI: linux et al use b':' to separate the 6 fields of a MAC address, also the fields are always two characters. On AIX the character b'.' is used, and when the decimal value is < 16, only one character is used.
Was: b'00:0a:10:20:30:40', on AIX - could be: b'0.a.10.20.30.40'
p.s. I hope this can be considered for backporting.
https://bugs.python.org/issue28009