Skip to content

bpo-39039: tarfile raises descriptive exception from zlib.error #27766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 29, 2021

Conversation

jdevries3133
Copy link
Contributor

@jdevries3133 jdevries3133 commented Aug 14, 2021

  • during tarfile parsing, a zlib error indicates invalid data
  • tarfile.open now raises a descriptive exception from the zlib error
  • this makes it clear to the user that they may be trying to open a
    corrupted tar file

I think that this small changes achieves a fix for the bpo. Tests still pass,
so that's a good sign that maybe this change does not introduce side-effects,
but I am hoping that someone with knowledge of this library help me think
of some edge cases to try out.

Please note that I am a relative novice developer, and while this patch is
good to the best of my understanding, my understanding is not the best!

https://bugs.python.org/issue39039

* during tarfile parsing, a zlib error indicates invalid data
* tarfile.open now raises a descriptive exception from the zlib error
* this makes it clear to the user that they may be trying to open a
  corrupted tar file
Copy link
Contributor

@ambv ambv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. This will need some changes and a test.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ethanfurman ethanfurman self-assigned this Aug 17, 2021
@jdevries3133
Copy link
Contributor Author

@ambv I am having a hard time figuring out how to reproduce this other than just using the file originally attached on the bpo. See my comment on the bpo, and I appreciate if you can offer some help! https://bugs.python.org/issue39039

@jdevries3133 jdevries3133 marked this pull request as draft August 18, 2021 19:20
@jdevries3133
Copy link
Contributor Author

@ambv ok, I implemented all the suggested changes and also added a test. However, the test is obviously not good because I want the test to genuinely reproduce the bug condition by passing in a file which actually causes the zlib error to be raised. However, I can't do that because I don't really know how the file uploaded to the bpo can be reproduced.

I also haven't gone through with investigating lzma or bz2 files. Partly because I can't figure out how to corrupt files in a way to elicit the problem response. I haven't been able to make lzma or bz2 archives that fit the edge case because I don't know how.

I'm going to keep experimenting with this and try to learn more. If you or anyone else has any insight into how the corrupt bpo-uploaded file was created, I would appreciate any advice because I have had no luck.

Also, note that the history of what I have tried and what I (don't) understand is written out here in a bit more detail:

https://gist.github.com/jdevries3133/acbb5ba2a19093d3bcc214733ef85e5a

@ambv
Copy link
Contributor

ambv commented Aug 22, 2021

Since Ethan is interested in this, I won't be approving or further requesting changes.

I also haven't gone through with investigating lzma or bz2 files. Partly because I can't figure out how to corrupt files in a way to elicit the problem response. I haven't been able to make lzma or bz2 archives that fit the edge case because I don't know how.

In this case leave this for a future improvement. Thanks for trying it out, Jack!

@jdevries3133 jdevries3133 marked this pull request as ready for review August 22, 2021 19:22
Comment on lines +2359 to +2360
except ImportError:
raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: this particular raise will be pretty noisy since it will be a chained failure. But this is good, it's an unexpected issue for us and we want all the context we can get so that somebody might report a good bug to us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I figured. I could even do raise Exception('please submit a bug report!') from e but maybe that is too much.

@jdevries3133
Copy link
Contributor Author

@ambv ok, I all the feedback has been addressed.

Łukasz, if you do feel that there are more potential edge cases that I haven't been able to address, maybe you could summarize them on the bpo? The original poster of the bpo did add some additional context for how he made the file, so maybe in combination with that, others can look into further edge cases.

For the bot: I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@ambv: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from ambv August 22, 2021 19:27
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Sep 22, 2021
@ambv ambv removed the stale Stale PR or inactive for long period of time. label Sep 28, 2021
@ambv ambv merged commit b6fe857 into python:main Sep 29, 2021
@ambv ambv added needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes labels Sep 29, 2021
@miss-islington
Copy link
Contributor

Thanks @jdevries3133 for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @jdevries3133 for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @jdevries3133 and @ambv, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker b6fe8572509b77d2002eaddf99d718e9b4835684 3.9

@miss-islington miss-islington assigned ambv and unassigned ethanfurman Sep 29, 2021
@miss-islington
Copy link
Contributor

Sorry @jdevries3133 and @ambv, I had trouble checking out the 3.10 backport branch.
Please backport using cherry_picker on command line.
cherry_picker b6fe8572509b77d2002eaddf99d718e9b4835684 3.10

ambv pushed a commit to ambv/cpython that referenced this pull request Sep 29, 2021
pythonGH-27766)

* during tarfile parsing, a zlib error indicates invalid data
* tarfile.open now raises a descriptive exception from the zlib error
* this makes it clear to the user that they may be trying to open a
  corrupted tar file
(cherry picked from commit b6fe857)

Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
@bedevere-bot
Copy link

GH-28613 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Sep 29, 2021
ambv pushed a commit to ambv/cpython that referenced this pull request Sep 29, 2021
…pythonGH-27766)

* during tarfile parsing, a zlib error indicates invalid data
* tarfile.open now raises a descriptive exception from the zlib error
* this makes it clear to the user that they may be trying to open a
  corrupted tar file.
(cherry picked from commit b6fe857)

Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
@bedevere-bot
Copy link

GH-28614 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Sep 29, 2021
ambv added a commit that referenced this pull request Sep 29, 2021
GH-27766) (GH-28613)

* during tarfile parsing, a zlib error indicates invalid data
* tarfile.open now raises a descriptive exception from the zlib error
* this makes it clear to the user that they may be trying to open a
  corrupted tar file
(cherry picked from commit b6fe857)

Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
ambv added a commit that referenced this pull request Sep 29, 2021
…GH-27766) (GH-28614)

* during tarfile parsing, a zlib error indicates invalid data
* tarfile.open now raises a descriptive exception from the zlib error
* this makes it clear to the user that they may be trying to open a
  corrupted tar file.
(cherry picked from commit b6fe857)

Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE Fedora Stable 3.9 has failed when building commit 7bff4d3.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/397/builds/207) and take a look at the build logs.
  4. Check if the failure is related to this commit (7bff4d3) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/397/builds/207

Failed tests:

  • test_unicodedata

Failed subtests:

  • test_normalization - test.test_unicodedata.NormalizationTest

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

412 tests OK.

10 slowest tests:

  • test_hashlib: 6 min 23 sec
  • test_urllib2net: 6 min
  • test_gdb: 3 min 56 sec
  • test_concurrent_futures: 3 min 38 sec
  • test_codecmaps_kr: 3 min 34 sec
  • test_tokenize: 3 min 11 sec
  • test_multiprocessing_spawn: 3 min 11 sec
  • test_unicodedata: 3 min 2 sec
  • test_unparse: 2 min 54 sec
  • test_peg_generator: 2 min 38 sec

1 test failed:
test_unicodedata

12 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_startfile
test_tix test_tk test_ttk_guionly test_winconsoleio test_winreg
test_winsound test_zipfile64

1 re-run test:
test_unicodedata

Total duration: 13 min 45 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
    testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
    self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE RHEL7 3.9 has failed when building commit 7bff4d3.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/219/builds/233) and take a look at the build logs.
  4. Check if the failure is related to this commit (7bff4d3) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/219/builds/233

Failed tests:

  • test_unicodedata

Failed subtests:

  • test_normalization - test.test_unicodedata.NormalizationTest

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

411 tests OK.

10 slowest tests:

  • test_hashlib: 8 min 56 sec
  • test_codecmaps_jp: 4 min 5 sec
  • test_codecmaps_kr: 3 min 46 sec
  • test_codecmaps_cn: 3 min 35 sec
  • test_concurrent_futures: 3 min 29 sec
  • test_multiprocessing_spawn: 3 min 20 sec
  • test_unicodedata: 3 min 18 sec
  • test_tokenize: 3 min 17 sec
  • test_unparse: 3 min 3 sec
  • test_lib2to3: 2 min 20 sec

1 test failed:
test_unicodedata

12 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_startfile
test_tix test_tk test_ttk_guionly test_winconsoleio test_winreg
test_winsound test_zipfile64

2 re-run tests:
test_unicodedata test_urllib2net

1 test run no tests:
test_urllib2net

Total duration: 19 min 6 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
    self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
    testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE RHEL7 LTO + PGO 3.9 has failed when building commit 7bff4d3.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/532/builds/232) and take a look at the build logs.
  4. Check if the failure is related to this commit (7bff4d3) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/532/builds/232

Failed tests:

  • test_unicodedata

Failed subtests:

  • test_normalization - test.test_unicodedata.NormalizationTest

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

411 tests OK.

10 slowest tests:

  • test_hashlib: 7 min 39 sec
  • test_codecmaps_kr: 5 min 45 sec
  • test_codecmaps_jp: 4 min 57 sec
  • test_urllib2net: 4 min 6 sec
  • test_concurrent_futures: 3 min 15 sec
  • test_unicodedata: 2 min 55 sec
  • test_multiprocessing_spawn: 2 min 48 sec
  • test_tokenize: 2 min 45 sec
  • test_codecmaps_cn: 2 min 20 sec
  • test_unparse: 2 min 4 sec

1 test failed:
test_unicodedata

13 tests skipped:
test_devpoll test_gdb test_ioctl test_kqueue test_msilib
test_startfile test_tix test_tk test_ttk_guionly test_winconsoleio
test_winreg test_winsound test_zipfile64

1 re-run test:
test_unicodedata

Total duration: 17 min 26 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
    self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
    testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE RHEL7 LTO 3.9 has failed when building commit 7bff4d3.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/322/builds/233) and take a look at the build logs.
  4. Check if the failure is related to this commit (7bff4d3) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/322/builds/233

Failed tests:

  • test_unicodedata

Failed subtests:

  • test_normalization - test.test_unicodedata.NormalizationTest

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

412 tests OK.

10 slowest tests:

  • test_hashlib: 6 min 23 sec
  • test_codecmaps_kr: 4 min 12 sec
  • test_unicodedata: 3 min 34 sec
  • test_peg_generator: 3 min 31 sec
  • test_codecmaps_jp: 3 min 21 sec
  • test_codecmaps_cn: 3 min 19 sec
  • test_concurrent_futures: 3 min 15 sec
  • test_tokenize: 3 min 7 sec
  • test_multiprocessing_spawn: 2 min 55 sec
  • test_urllib2net: 2 min 19 sec

1 test failed:
test_unicodedata

12 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_startfile
test_tix test_tk test_ttk_guionly test_winconsoleio test_winreg
test_winsound test_zipfile64

1 re-run test:
test_unicodedata

Total duration: 12 min 13 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/http/client.py", line 596, in _readinto_chunked
    n = self._safe_readinto(mvb)
http.client.IncompleteRead: IncompleteRead(51 bytes read, 1893 more expected)


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
    testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
    testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
http.client.IncompleteRead: IncompleteRead(0 bytes read)


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
    self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE Fedora Stable LTO + PGO 3.9 has failed when building commit 7bff4d3.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/46/builds/210) and take a look at the build logs.
  4. Check if the failure is related to this commit (7bff4d3) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/46/builds/210

Failed tests:

  • test_unicodedata

Failed subtests:

  • test_normalization - test.test_unicodedata.NormalizationTest

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

411 tests OK.

10 slowest tests:

  • test_codecmaps_jp: 6 min 39 sec
  • test_hashlib: 6 min 38 sec
  • test_codecmaps_cn: 4 min 2 sec
  • test_unicodedata: 3 min 19 sec
  • test_concurrent_futures: 2 min 44 sec
  • test_codecmaps_tw: 2 min 11 sec
  • test_multiprocessing_spawn: 1 min 50 sec
  • test_tokenize: 1 min 25 sec
  • test_codecmaps_hk: 1 min 15 sec
  • test_multiprocessing_forkserver: 1 min 13 sec

1 test failed:
test_unicodedata

13 tests skipped:
test_devpoll test_gdb test_ioctl test_kqueue test_msilib
test_startfile test_tix test_tk test_ttk_guionly test_winconsoleio
test_winreg test_winsound test_zipfile64

1 re-run test:
test_unicodedata

Total duration: 9 min 43 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/multiprocessing/resource_tracker.py", line 201, in main
    cache[rtype].remove(name)
KeyError: '/psm_812a9b04'


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
    testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
http.client.IncompleteRead: IncompleteRead(0 bytes read)


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/multiprocessing/resource_tracker.py", line 201, in main
    cache[rtype].remove(name)
KeyError: '/psm_2b77cafb'


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
    self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
    testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/multiprocessing/resource_tracker.py", line 201, in main
    cache[rtype].remove(name)
KeyError: '/psm_11f35282'


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/http/client.py", line 596, in _readinto_chunked
    n = self._safe_readinto(mvb)
http.client.IncompleteRead: IncompleteRead(873 bytes read, 1780 more expected)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants