Skip to content

bpo-47040: improve document of checksum functions #31955

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from Mar 19, 2022
Merged

bpo-47040: improve document of checksum functions #31955

merged 8 commits into from Mar 19, 2022

Conversation

ghost
Copy link

@ghost ghost commented Mar 17, 2022

@ghost ghost changed the title bpo-47040: remove an invalid document of zlib module bpo-47040: remove invalid document of checksum functions Mar 17, 2022
wjssz and others added 2 commits March 17, 2022 17:41
Since CPython 3.0.0, the checksums are always truncated to `unsigned int`.
Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly: keep the versionchanged, just make the suggested minor edit to the text.

@@ -241,6 +242,15 @@ def test_crc32(self):

self.assertRaises(TypeError, binascii.crc32)

def test_random_crc32(self):
dat = random.randbytes(1234)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests that run on random data have a good chance of being flakey when a problem exists rather than reliably reproducing the problem. The specific data or the seed used to generate it needs to be recorded and present in test output to meaningfully debug the problem.

The flakiness aspect can be improved by using a lot of runs over different random data (statistically likely to always trigger the problem being tested for on at least one random input) or by using a smaller set of strategically chosen inputs with desirable output values to test against.

For crc32 I suggest just chosing a few strategic inputs that generate low and high crc32 values that might have illustrated this issue and need for the & mask in Python 2.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For crc32 I suggest just chosing a few strategic inputs that generate low and high crc32 values that might have illustrated this issue and need for the & mask in Python 2.

I ran an alder32()/crc32() test code, tested nearly 2 billion random 1KiB strings, no one result is greater than UINT32_MAX, although the type of return value is uLong (unsigned long).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because >UINT32_MAX is impossible. this is a 32bit value.

it isn't clear what the purpose of this test is. what does it reliably prevent from happening?

it will pass because everything is correct today. I'm more interested in when and why would it fail and does it do so is a consistent reliable reproducable manner that can be debugged from the test failure.

the thing I had assumed this test was testing was that a negative value is never returned, as would happen half the time in 32-bit python 2. effectively a regression test for an old situation that py3's implementation can never trigger. if you want it to be a regression test, it should use values that would've triggered the regression: Ensure it includes a hash in the range 0x8000_0000 - 0xffff_ffff. those are what would've been negative long ago. So rather than random data, I'd pick a known input that has a crc in that range to cover that case.

if you want a test that guarantees a crc will never be greater than 0xffff_ffff, there is no input you can give a crc or adler 32 algorithm that will generate such a thing.

For that to happen the underlying implementation itself would need to be fundamentally flawed. If that is what you intend this test to prevent, it could do so; but not reliably by using a single random input. so it doesn't seem worth having.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove the unit-tests, since no code actually changed.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ghost ghost changed the title bpo-47040: remove invalid document of checksum functions bpo-47040: improve document of checksum functions Mar 18, 2022
@ghost
Copy link
Author

ghost commented Mar 19, 2022

I modified the NEWS file:

-Internal cleanup to :func:`zlib.crc32` / :func:`binascii.crc32` to not use
-an intermediate signed value. No functional change. Clarified the old Python
-versions compatiblity note in the docstrings.
+Clarified the old Python versions compatiblity note of :func:`binascii.crc32` /
+:func:`zlib.adler32` / :func:`zlib.crc32` functions.

Internal cleanup to :func:zlib.crc32 / :func:binascii.crc32 to not use an intermediate signed value. No functional change.

This is an internal change, not visiable to users, so I removed it.

Now this PR only modifies the doc and polishes the code slightly.
After this PR, let's fix the USE_ZLIB_CRC32 code path bug mentioned by issue47040 in another issue.

@ghost
Copy link
Author

ghost commented Mar 19, 2022

This PR polished the code.
If this PR doesn't fit to backport to the 3.9/3.10 branch, I'll make another PR with no code polished for backport.

@gpshead gpshead merged commit b3f2d4c into python:main Mar 19, 2022
@ghost ghost deleted the zlib_hash_doc branch March 20, 2022 04:41
gpshead added a commit that referenced this pull request Mar 20, 2022
…H-32002)

Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier.
Also cleans up an unnecessary intermediate variable in the implementation.

Authored-By: Ma Lin / animalize
Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 20, 2022
) (pythonGH-32002)

Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier.
Also cleans up an unnecessary intermediate variable in the implementation.

Authored-By: Ma Lin / animalize
Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit 6d290d5)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
miss-islington added a commit that referenced this pull request Mar 20, 2022
…H-32002)

Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier.
Also cleans up an unnecessary intermediate variable in the implementation.

Authored-By: Ma Lin / animalize
Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit 6d290d5)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
) (pythonGH-32002)

Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier.
Also cleans up an unnecessary intermediate variable in the implementation.

Authored-By: Ma Lin / animalize
Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit 6d290d5)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants