bpo-47040: improve document of checksum functions #31955

ghost · 2022-03-17T05:57:24Z

Since CPython 3.0.0, the checksums are always truncated to unsigned int:
zlib.adler32(): https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930
zlib.crc32(): https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950
binascii.crc32(): https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035
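A quick way to observe the Python 3 behaviour described above, as a sketch (the sample inputs and the helper name `check_unsigned` are arbitrary, chosen for illustration). Every checksum falls in `range(2**32)`, `binascii.crc32()` and `zlib.crc32()` agree, and the old `& 0xffffffff` mask changes nothing:

```python
import binascii
import zlib


def check_unsigned(data: bytes) -> None:
    """Assert the unsigned 32-bit invariants for one input (hypothetical helper)."""
    results = {
        "zlib.adler32": zlib.adler32(data),
        "zlib.crc32": zlib.crc32(data),
        "binascii.crc32": binascii.crc32(data),
    }
    for name, value in results.items():
        # In Python 3 the result is always an unsigned 32-bit value ...
        assert 0 <= value <= 0xFFFFFFFF, (name, value)
        # ... so the Python 2 era workaround is a no-op.
        assert value & 0xFFFFFFFF == value, (name, value)
    # Both modules compute the same CRC-32.
    assert results["zlib.crc32"] == results["binascii.crc32"]


for sample in (b"", b"hello world", b"123456789", bytes(range(256)) * 4):
    check_unsigned(sample)
print("all checksums are unsigned 32-bit values")
```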

Also polish the code.

https://bugs.python.org/issue47040

Modules/zlibmodule.c

Since CPython 3.0.0, the checksums are always truncated to `unsigned int`.

gpshead

Mostly: keep the `versionchanged` note; just make the suggested minor edit to the text.

Doc/library/binascii.rst

gpshead · 2022-03-18T00:47:17Z

Lib/test/test_binascii.py

@@ -241,6 +242,15 @@ def test_crc32(self):

        self.assertRaises(TypeError, binascii.crc32)

+    def test_random_crc32(self):
+        dat = random.randbytes(1234)


Tests that run on random data have a good chance of being flaky when a problem exists, rather than reliably reproducing the problem. The specific data, or the seed used to generate it, needs to be recorded and present in the test output to meaningfully debug the problem.

The flakiness aspect can be improved by using a lot of runs over different random data (statistically likely to always trigger the problem being tested for on at least one random input) or by using a smaller set of strategically chosen inputs with desirable output values to test against.
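A minimal sketch of the "many runs, recorded seed" approach, assuming Python 3.9+ (for `random.Random.randbytes`); the class name `RandomCrc32Test` and the run count are hypothetical. The seed is chosen once, every input is derived from it, and it appears in each failure message so a failing run can be replayed with `random.Random(seed)`:

```python
import random
import unittest
import zlib


class RandomCrc32Test(unittest.TestCase):
    # Hypothetical test; RUNS trades runtime against the chance of hitting a bug.
    RUNS = 200

    def test_crc32_range_random(self):
        # Choose one seed and derive all data from it so failures are replayable.
        seed = random.randrange(2**32)
        rng = random.Random(seed)
        for i in range(self.RUNS):
            data = rng.randbytes(1024)
            crc = zlib.crc32(data)
            # The seed and run index are part of the failure message.
            msg = f"seed={seed} run={i}"
            self.assertLessEqual(0, crc, msg=msg)
            self.assertLessEqual(crc, 0xFFFFFFFF, msg=msg)
```

Run it with `python -m unittest` as usual; on failure, the reported seed reproduces the exact inputs.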

For crc32 I suggest just choosing a few strategic inputs that generate low and high crc32 values, ones that would have illustrated this issue and the need for the `& 0xffffffff` mask in Python 2.
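A sketch of such a table-driven test, with hypothetical names (`KNOWN_CRC32`, `StrategicCrc32Test`). The inputs are well-known CRC-32 test strings: `b"123456789"` is the standard CRC-32 check string, whose value `0xCBF43926` is at or above `0x80000000` and so would have been negative in Python 2; the "quick brown fox" sentence gives a value below that threshold:

```python
import binascii
import unittest
import zlib

# Known CRC-32 values on both sides of 2**31.
KNOWN_CRC32 = [
    (b"", 0x00000000),
    (b"The quick brown fox jumps over the lazy dog", 0x414FA339),
    (b"123456789", 0xCBF43926),  # >= 0x80000000: negative in old Python 2
]


class StrategicCrc32Test(unittest.TestCase):
    def test_known_values(self):
        for func in (zlib.crc32, binascii.crc32):
            for data, expected in KNOWN_CRC32:
                with self.subTest(func=func, data=data):
                    self.assertEqual(func(data), expected)

    def test_table_covers_both_halves(self):
        # Guard against the table losing its "would have been negative" case.
        self.assertTrue(any(v < 0x80000000 for _, v in KNOWN_CRC32))
        self.assertTrue(any(v >= 0x80000000 for _, v in KNOWN_CRC32))
```

Unlike random data, a failure here names the exact input and expected value.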


I ran an adler32()/crc32() test on nearly 2 billion random 1 KiB strings; none of the results was greater than UINT32_MAX, even though the return type is uLong (unsigned long).

Because > UINT32_MAX is impossible: this is a 32-bit value.
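To make "this is a 32-bit value" concrete, here are pure-Python reference versions of both algorithms, written for illustration (they are not how zlib implements them, which uses table-driven C code). In the bitwise CRC-32, every step is an XOR with a value below 2**32 or a right shift, so no bit above bit 31 can ever be set. In Adler-32, both halves are reduced modulo 65521 < 2**16, so `(b << 16) | a` always fits in 32 bits:

```python
import zlib

CRC32_POLY = 0xEDB88320  # reflected CRC-32 polynomial (the one zlib uses)
MOD_ADLER = 65521        # largest prime below 2**16


def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32; `crc` is a previous result, as in zlib.crc32()."""
    crc ^= 0xFFFFFFFF  # pre-inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial if the bit shifted out was 1.
            crc = (crc >> 1) ^ (CRC32_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # post-inversion


def adler32_ref(data: bytes, value: int = 1) -> int:
    """Straightforward Adler-32; `value` is a previous result, as in zlib.adler32()."""
    a, b = value & 0xFFFF, value >> 16
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    # b < 65521 < 2**16, so the combined value is below 2**32.
    return (b << 16) | a


for sample in (b"", b"123456789", bytes(range(256)) * 3):
    assert crc32_bitwise(sample) == zlib.crc32(sample)
    assert adler32_ref(sample) == zlib.adler32(sample)
```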

It isn't clear what the purpose of this test is. What does it reliably prevent from happening?

It will pass because everything is correct today. I'm more interested in when and why it would fail, and whether it does so in a consistent, reliably reproducible manner that can be debugged from the test failure.

The thing I had assumed this test was testing was that a negative value is never returned, as would happen half the time in 32-bit Python 2: effectively a regression test for an old situation that Python 3's implementation can never trigger. If you want it to be a regression test, it should use values that would've triggered the regression: ensure it includes a checksum in the range 0x8000_0000 - 0xffff_ffff. Those are what would've been negative long ago. So rather than random data, I'd pick a known input that has a CRC in that range to cover that case.
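A sketch of why that range matters. The helper `as_signed32` is hypothetical: it reinterprets an unsigned 32-bit value as a two's-complement C `int`, mimicking the signed result old Python 2 could return; it is not a real Python API. `b"123456789"` is used because its CRC-32 (`0xCBF43926`) lies in the 0x8000_0000 - 0xffff_ffff range:

```python
import zlib


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit int (illustration only)."""
    return value - (1 << 32) if value & 0x80000000 else value


crc = zlib.crc32(b"123456789")  # 0xCBF43926, in the "would be negative" range
old_style = as_signed32(crc)    # what a signed Python 2 result would look like
assert old_style < 0
# The documented Python 2 workaround recovers the unsigned value ...
assert old_style & 0xFFFFFFFF == crc
# ... and is a harmless no-op on Python 3 results.
assert crc & 0xFFFFFFFF == crc
```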

If you want a test that guarantees a CRC will never be greater than 0xffff_ffff: there is no input you can give a CRC-32 or Adler-32 algorithm that will generate such a thing.

For that to happen, the underlying implementation itself would need to be fundamentally flawed. If that is what you intend this test to prevent, it could do so, but not reliably with a single random input, so it doesn't seem worth having.

I removed the unit tests, since the code's behavior didn't actually change.

bedevere-bot · 2022-03-18T00:52:19Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

ghost · 2022-03-19T03:42:02Z

I modified the NEWS file:

-Internal cleanup to :func:`zlib.crc32` / :func:`binascii.crc32` to not use
-an intermediate signed value. No functional change. Clarified the old Python
-versions compatiblity note in the docstrings.
+Clarified the old Python versions compatiblity note of :func:`binascii.crc32` /
+:func:`zlib.adler32` / :func:`zlib.crc32` functions.

Internal cleanup to :func:`zlib.crc32` / :func:`binascii.crc32` to not use an intermediate signed value. No functional change.

This is an internal change, not visible to users, so I removed it.

Now this PR only modifies the docs and polishes the code slightly.
After this PR, let's fix the USE_ZLIB_CRC32 code path bug mentioned in issue47040 in a separate issue.

ghost · 2022-03-19T05:52:05Z

This PR polishes the code.
If it isn't suitable for backporting to the 3.9/3.10 branches, I'll make another PR without the code polish for the backport.

…H-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org>

) (pythonGH-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>

…H-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>


bedevere-bot added the awaiting review label Mar 17, 2022

the-knights-who-say-ni added the CLA signed label Mar 17, 2022

JelleZijlstra reviewed Mar 17, 2022

Modules/zlibmodule.c (outdated)

ghost changed the title from "bpo-47040: remove an invalid document of zlib module" to "bpo-47040: remove invalid document of checksum functions" Mar 17, 2022

ghost commented Mar 17, 2022

Modules/zlibmodule.c

ghost commented Mar 17, 2022

Modules/zlibmodule.c

wjssz and others added 2 commits March 17, 2022 17:41

remove invalid document of checksum functions

aa812f8

Since CPython 3.0.0, the checksums are always truncated to `unsigned int`.

reworded.

03ec827

gpshead requested changes Mar 18, 2022


bedevere-bot added awaiting changes and removed awaiting review labels Mar 18, 2022

ghost changed the title from "bpo-47040: remove invalid document of checksum functions" to "bpo-47040: improve document of checksum functions" Mar 18, 2022

gpshead and others added 6 commits March 18, 2022 10:57

update versionchanged text to be more explicit.

c123b56

update the versionchanged text to be more explicit

2e70760

indentation

3bdee12

indentation

10e83fa

remove unit-tests

7db926d

improve NEWS

78a7940

gpshead approved these changes Mar 19, 2022


bedevere-bot added awaiting merge and removed awaiting changes labels Mar 19, 2022

gpshead merged commit b3f2d4c into python:main Mar 19, 2022

bedevere-bot removed the awaiting merge label Mar 19, 2022

ghost deleted the zlib_hash_doc branch March 20, 2022 04:41
