-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-47040: improve document of checksum functions #31955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Since CPython 3.0.0, the checksums are always truncated to `unsigned int`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly: keep the versionchanged, just make the suggested minor edit to the text.
Lib/test/test_binascii.py
Outdated
@@ -241,6 +242,15 @@ def test_crc32(self): | |||
|
|||
self.assertRaises(TypeError, binascii.crc32) | |||
|
|||
def test_random_crc32(self): | |||
dat = random.randbytes(1234) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests that run on random data have a good chance of being flakey when a problem exists rather than reliably reproducing the problem. The specific data or the seed used to generate it needs to be recorded and present in test output to meaningfully debug the problem.
The flakiness aspect can be improved by using a lot of runs over different random data (statistically likely to always trigger the problem being tested for on at least one random input) or by using a smaller set of strategically chosen inputs with desirable output values to test against.
For crc32 I suggest just chosing a few strategic inputs that generate low and high crc32 values that might have illustrated this issue and need for the & mask in Python 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For crc32 I suggest just chosing a few strategic inputs that generate low and high crc32 values that might have illustrated this issue and need for the & mask in Python 2.
I ran an alder32()
/crc32()
test code, tested nearly 2 billion random 1KiB strings, no one result is greater than UINT32_MAX
, although the type of return value is uLong
(unsigned long).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because >UINT32_MAX is impossible. this is a 32bit value.
it isn't clear what the purpose of this test is. what does it reliably prevent from happening?
it will pass because everything is correct today. I'm more interested in when and why would it fail and does it do so is a consistent reliable reproducable manner that can be debugged from the test failure.
the thing I had assumed this test was testing was that a negative value is never returned, as would happen half the time in 32-bit python 2. effectively a regression test for an old situation that py3's implementation can never trigger. if you want it to be a regression test, it should use values that would've triggered the regression: Ensure it includes a hash in the range 0x8000_0000 - 0xffff_ffff. those are what would've been negative long ago. So rather than random data, I'd pick a known input that has a crc in that range to cover that case.
if you want a test that guarantees a crc will never be greater than 0xffff_ffff, there is no input you can give a crc or adler 32 algorithm that will generate such a thing.
For that to happen the underlying implementation itself would need to be fundamentally flawed. If that is what you intend this test to prevent, it could do so; but not reliably by using a single random input. so it doesn't seem worth having.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remove the unit-tests, since no code actually changed.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I modified the NEWS file: -Internal cleanup to :func:`zlib.crc32` / :func:`binascii.crc32` to not use
-an intermediate signed value. No functional change. Clarified the old Python
-versions compatiblity note in the docstrings.
+Clarified the old Python versions compatiblity note of :func:`binascii.crc32` /
+:func:`zlib.adler32` / :func:`zlib.crc32` functions.
This is an internal change, not visiable to users, so I removed it. Now this PR only modifies the doc and polishes the code slightly. |
This PR polished the code. |
…H-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org>
) (pythonGH-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
…H-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
) (pythonGH-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
Since CPython 3.0.0, the checksums are always truncated to
unsigned int
:zlib.adler32()
: https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930zlib.crc32()
: https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950binascii.crc32()
: https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035Also polish the code.
https://bugs.python.org/issue47040