-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: bpo-34990: year 2038 problem in compileall.py #9892
Conversation
Use 'L' as an unsigned long for the timestamp
|
@bmwiedemann yep, I was in the plane from PySS and I just commited my code, and now, I just saw the test of @tirkarthi . I am going to include it. |
…-01-20 Co-authored-by: Karthikeyan Singaravelan <tir.karthi@gmail.com>
@matrixise Thanks for the PR. I think it's a NEWS entry worthy fix to add a short note that |
@matrixise Sorry, I just noticed you have added one :) |
@tirkarthi please, could you check my PR with the last commits. |
btw: the first bad mtime is 0x80000000 or 2147483648 or 2038-01-19T03:14:08+00:00 |
def test_compile_all_2038(self): | ||
with open(self.source_path, 'r') as f: | ||
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch | ||
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure, but maybe this is not a good place in this file to add the test in, because SOURCE_DATE_EPOCH changes the default .pyc type to be hash-based, which would not hit the error condition. See #9607
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, just copy the code of @tirkarthi . I will wait for a review from @vstinner or an other core-dev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be moved to CompileallTestsBase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@matrixise Thanks, Looks good to me from the context of the issue reported but I have limited understanding about the code so I hope someone else might have a better look. Seems there was some discussion about handling 2038 problem as part of the original PR where this code was added : #4575 (comment) |
I think the discussion in PR 4575 was about future-proofing the new .pyc format, but it failed to notice that by 2038, only 31 bits are exhausted and bit 32 (which was erroneously interpreted as the sign-bit here), allows to keep the format alive until 2106 - which we can safely assume out-of-scope for today. |
This is the reason why I move to 'L' instead of 'l' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check also other parts of the stdlib that work with mtime in pyc files. For example zipimport and checkpyc.py need updates. Maybe there are other parts.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be poked with soft cushions! |
Lib/compileall.py
Outdated
@@ -139,7 +139,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, | |||
if not force: | |||
try: | |||
mtime = int(os.stat(fullname).st_mtime) | |||
expect = struct.pack('<4sll', importlib.util.MAGIC_NUMBER, | |||
expect = struct.pack('<4slL', importlib.util.MAGIC_NUMBER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would be better to handle the second field as an unsigned integer too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
importlib uses int(mtime) & 0xFFFFFFFF, I suggest... int(mtime) & 0xFFFF_FFFF for readability :-D
@serhiy-storchaka last update for In https://github.com/python/cpython/blob/master/Lib/zipimport.py#L602 you convert the dword as a uint32 and LGTM. |
@vstinner and @serhiy-storchaka I have fixed mtime with 0xFFFF_FFFF and use an unsigned long, where I found |
Tools/scripts/checkpyc.py
Outdated
@@ -51,7 +51,7 @@ def main(): | |||
print('Bad MAGIC word in ".pyc" file', end=' ') | |||
print(repr(name_c)) | |||
continue | |||
mtime = get_long(mtime_str) | |||
mtime = get_long(mtime_str) & 0xFFFF_FFFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a NOP, because mtime_str
is just 4 bytes, that cannot encode anything larger than 0xFFFF_FFFF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I am not familiar with the operations on the bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zipimport needs truncating the timestamp to 32 bit for comparison.
Also update _code_to_timestamp_pyc() in _bootstrap_external.
@@ -51,7 +51,7 @@ def main(): | |||
print('Bad MAGIC word in ".pyc" file', end=' ') | |||
print(repr(name_c)) | |||
continue | |||
mtime = get_long(mtime_str) | |||
mtime = get_long(mtime_str) & 0xFFFF_FFFF | |||
if mtime in {0, -1}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mtime can't be -1 after adding & 0xFFFF_FFFF
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add the 0xFFFF_FFFF on the line 57?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size): | |||
else: | |||
mtime = int(-0x100000000 + int(mtime)) | |||
pyc = (importlib.util.MAGIC_NUMBER + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the above few lines can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the lines from 38->41?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF
if use an unsigned integer format.
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size): | |||
else: | |||
mtime = int(-0x100000000 + int(mtime)) | |||
pyc = (importlib.util.MAGIC_NUMBER + | |||
struct.pack("<iii", 0, int(mtime), size & 0xFFFFFFFF) + data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed format units are used.
Tools/scripts/checkpyc.py
Outdated
@@ -51,7 +51,7 @@ def main(): | |||
print('Bad MAGIC word in ".pyc" file', end=' ') | |||
print(repr(name_c)) | |||
continue | |||
mtime = get_long(mtime_str) | |||
mtime = get_long(mtime_str) & 0xFFFF_FFFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sees the definition of get_long(). It returns an unsigned 32-bit integer, and returns -1 only on error. You need to convert not it, but the file timestamp.
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size): | |||
else: | |||
mtime = int(-0x100000000 + int(mtime)) | |||
pyc = (importlib.util.MAGIC_NUMBER + | |||
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data) | |||
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this revert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just because @serhiy-storchaka indicated that the signed was used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that you need to update the format string.
Lib/importlib/_bootstrap_external.py
Outdated
@@ -586,7 +586,7 @@ def _code_to_timestamp_pyc(code, mtime=0, source_size=0): | |||
"Produce the data for a timestamp-based pyc." | |||
data = bytearray(MAGIC_NUMBER) | |||
data.extend(_pack_uint32(0)) | |||
data.extend(_pack_uint32(mtime)) | |||
data.extend(_pack_uint32(mtime) & 0xFFFF_FFFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is useless, _pack_uint32() already does "& 0xFFFFFFFF". importlib can be left unchanged.
Lib/test/test_compileall.py
Outdated
@@ -53,8 +53,8 @@ def add_bad_source_file(self): | |||
def timestamp_metadata(self): | |||
with open(self.bc_path, 'rb') as file: | |||
data = file.read(12) | |||
mtime = int(os.stat(self.source_path).st_mtime) | |||
compare = struct.pack('<4sll', importlib.util.MAGIC_NUMBER, 0, mtime) | |||
mtime = int(os.stat(self.source_path).st_mtime) & 0xFFFF_FFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I like the underscore: it's easier to spot bugs. You missed something :-)
def test_compile_all_2038(self): | ||
with open(self.source_path, 'r') as f: | ||
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch | ||
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be moved to CompileallTestsBase.
@@ -0,0 +1 @@ | |||
Use an unsigned long for the timestamp for the .pyc file in compileall.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You describe the fix, but you should describe the fixed bug instead: compileall now supports timestamps newer than year 2038.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -54,7 +54,7 @@ def main(): | |||
mtime = get_long(mtime_str) | |||
if mtime in {0, -1}: | |||
print('Bad ".pyc" file', repr(name_c)) | |||
elif mtime != st[ST_MTIME]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this code should be modified, but it is modified: should we also cast to int()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like that?
elif int(mtime) != int(st[ST_MTIME] & 0xFFFF_FFFF)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mtime
is already int. But st[ST_MTIME]
is float.
@serhiy-storchaka can you continue the review with me, because I am a little bit like a chicken without head ;-) I try to understand with |
_pack_uint32() already does "& 0xFFFFFFFF"
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size): | |||
else: | |||
mtime = int(-0x100000000 + int(mtime)) | |||
pyc = (importlib.util.MAGIC_NUMBER + | |||
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data) | |||
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that you need to update the format string.
@@ -51,7 +51,7 @@ def main(): | |||
print('Bad MAGIC word in ".pyc" file', end=' ') | |||
print(repr(name_c)) | |||
continue | |||
mtime = get_long(mtime_str) | |||
mtime = get_long(mtime_str) & 0xFFFF_FFFF | |||
if mtime in {0, -1}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
@@ -54,7 +54,7 @@ def main(): | |||
mtime = get_long(mtime_str) | |||
if mtime in {0, -1}: | |||
print('Bad ".pyc" file', repr(name_c)) | |||
elif mtime != st[ST_MTIME]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mtime
is already int. But st[ST_MTIME]
is float.
@@ -0,0 +1 @@ | |||
compileall now supports timestamps newer than year 2038 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use punctuation.
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size): | |||
else: | |||
mtime = int(-0x100000000 + int(mtime)) | |||
pyc = (importlib.util.MAGIC_NUMBER + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF
if use an unsigned integer format.
|
||
class CompileallTestsWithSourceEpoch(CompileallTestsBase, | ||
unittest.TestCase, | ||
metaclass=SourceDateEpochTestMeta, | ||
source_date_epoch=True): | ||
pass | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change.
Due to the WIP in the title I added the DO-NOT-MERGE label. |
@brettcannon Thank you, I will continue to work on this issue on the next week, this week I am at the hospital with my daughter. |
Only 19 years to go... Can we get this moving again? |
Ping @matrixise: they are still some comments of @serhiy-storchaka that you didn't address. |
@vstinner pong, ok thanks for the reminder. |
So, I am not serious with this bug, I am going to continue on this bug, I would like to close all my PRs. |
FYI I ignore this issue since it's still tagged as "WIP". I prefer to wait until it's no longer a WIP to review it. |
Removing the import team from reviewing as this is still a WIP. We can be added back when it's ready to go. |
@matrixise any chance to get this finished this year? |
It's ok as soon as it's fixed before 2038 :-D |
I am afraid there are way too many places where software is used unchanged for a long time. Even 18 years. And of course this is not the only timebomb waiting to trigger. |
It was a joke. Hint: see the emoticon. |
Since Stéphane said:
I opened up #19651 with the more permanent solution of using a 64-bit timestamp value as Benjamin originally suggested in their PEP 552 PR. |
I pinged @matrixise about this and they said I can feel free to take this on since they're too busy to move forward with it. #19708 is a continuation of this PR to make It should be possible to compare the code for both the solutions now. |
I guess this can be close since #19708 is merged |
Indeed. Thanks for noticing. |
Use 'L' as an unsigned long for the timestamp
https://bugs.python.org/issue34990