Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: bpo-34990: year 2038 problem in compileall.py #9892

Closed
wants to merge 12 commits into from

Conversation

matrixise
Copy link
Member

@matrixise matrixise commented Oct 15, 2018

Use 'L' as an unsigned long for the timestamp

https://bugs.python.org/issue34990

Use 'L' as an unsigned long for the timestamp
@bmwiedemann
Copy link
Contributor

Lib/test/test_compileall.py also contains 2 instances of <4sll that probably should be updated accordingly.
and you could add the test code provided by xtreak in the bug.

@matrixise
Copy link
Member Author

@bmwiedemann yep, I was in the plane from PySS and I just commited my code, and now, I just saw the test of @tirkarthi . I am going to include it.

@tirkarthi
Copy link
Member

@matrixise Thanks for the PR. I think it's a NEWS entry worthy fix to add a short note that compileall now supports files with modification time greater than or equal to 19-01-2038 (I guess)

@tirkarthi
Copy link
Member

@matrixise Sorry, I just noticed you have added one :)

@matrixise
Copy link
Member Author

@tirkarthi please, could you check my PR with the last commits.

@bmwiedemann
Copy link
Contributor

btw: the first bad mtime is 0x80000000 or 2147483648 or 2038-01-19T03:14:08+00:00

def test_compile_all_2038(self):
with open(self.source_path, 'r') as f:
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, but maybe this is not a good place in this file to add the test in, because SOURCE_DATE_EPOCH changes the default .pyc type to be hash-based, which would not hit the error condition. See #9607

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, just copy the code of @tirkarthi . I will wait for a review from @vstinner or an other core-dev.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be moved to CompileallTestsBase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@tirkarthi
Copy link
Member

@matrixise Thanks, Looks good to me from the context of the issue reported but I have limited understanding about the code so I hope someone else might have a better look. Seems there was some discussion about handling 2038 problem as part of the original PR where this code was added : #4575 (comment)

@bmwiedemann
Copy link
Contributor

I think the discussion in PR 4575 was about future-proofing the new .pyc format, but it failed to notice that by 2038, only 31 bits are exhausted and bit 32 (which was erroneously interpreted as the sign-bit here), allows to keep the format alive until 2106 - which we can safely assume out-of-scope for today.

@matrixise
Copy link
Member Author

This is the reason why I move to 'L' instead of 'l'

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check also other parts of the stdlib that work with mtime in pyc files. For example zipimport and checkpyc.py need updates. Maybe there are other parts.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@@ -139,7 +139,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
if not force:
try:
mtime = int(os.stat(fullname).st_mtime)
expect = struct.pack('<4sll', importlib.util.MAGIC_NUMBER,
expect = struct.pack('<4slL', importlib.util.MAGIC_NUMBER,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would be better to handle the second field as an unsigned integer too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib uses int(mtime) & 0xFFFFFFFF, I suggest... int(mtime) & 0xFFFF_FFFF for readability :-D

@matrixise
Copy link
Member Author

@serhiy-storchaka last update for Tools/scripts/cherrypyc.py by in June 2013, we only read the magic number and timestamp, just that: https://github.com/python/cpython/blob/master/Tools/scripts/checkpyc.py#L45-L46

In https://github.com/python/cpython/blob/master/Lib/zipimport.py#L602 you convert the dword as a uint32 and LGTM.

@matrixise matrixise requested a review from a team October 17, 2018 12:14
@matrixise
Copy link
Member Author

@vstinner and @serhiy-storchaka I have fixed mtime with 0xFFFF_FFFF and use an unsigned long, where I found <4slL but for the other files, I am not an expert of the API and I don't know where I could search, zipimport and checkpy.py seem to be good.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a NOP, because mtime_str is just 4 bytes, that cannot encode anything larger than 0xFFFF_FFFF

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I am not familiar with the operations on the bits.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zipimport needs truncating the timestamp to 32 bit for comparison.

Also update _code_to_timestamp_pyc() in _bootstrap_external.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
if mtime in {0, -1}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mtime can't be -1 after adding & 0xFFFF_FFFF.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the 0xFFFF_FFFF on the line 57?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the above few lines can be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lines from 38->41?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF if use an unsigned integer format.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime), size & 0xFFFFFFFF) + data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed format units are used.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sees the definition of get_long(). It returns an unsigned 32-bit integer, and returns -1 only on error. You need to convert not it, but the file timestamp.

@matrixise matrixise closed this Oct 17, 2018
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data)
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this revert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just because @serhiy-storchaka indicated that the signed was used here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that you need to update the format string.

@@ -586,7 +586,7 @@ def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
"Produce the data for a timestamp-based pyc."
data = bytearray(MAGIC_NUMBER)
data.extend(_pack_uint32(0))
data.extend(_pack_uint32(mtime))
data.extend(_pack_uint32(mtime) & 0xFFFF_FFFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is useless, _pack_uint32() already does "& 0xFFFFFFFF". importlib can be left unchanged.

@@ -53,8 +53,8 @@ def add_bad_source_file(self):
def timestamp_metadata(self):
with open(self.bc_path, 'rb') as file:
data = file.read(12)
mtime = int(os.stat(self.source_path).st_mtime)
compare = struct.pack('<4sll', importlib.util.MAGIC_NUMBER, 0, mtime)
mtime = int(os.stat(self.source_path).st_mtime) & 0xFFFF_FFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I like the underscore: it's easier to spot bugs. You missed something :-)

def test_compile_all_2038(self):
with open(self.source_path, 'r') as f:
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be moved to CompileallTestsBase.

@@ -0,0 +1 @@
Use an unsigned long for the timestamp for the .pyc file in compileall.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You describe the fix, but you should describe the fixed bug instead: compileall now supports timestamps newer than year 2038.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -54,7 +54,7 @@ def main():
mtime = get_long(mtime_str)
if mtime in {0, -1}:
print('Bad ".pyc" file', repr(name_c))
elif mtime != st[ST_MTIME]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this code should be modified, but it is modified: should we also cast to int()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like that?

elif int(mtime) != int(st[ST_MTIME] & 0xFFFF_FFFF)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mtime is already int. But st[ST_MTIME] is float.

@matrixise
Copy link
Member Author

@serhiy-storchaka can you continue the review with me, because I am a little bit like a chicken without head ;-) I try to understand with 0xFFFF_FFFF ;-)

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data)
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that you need to update the format string.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
if mtime in {0, -1}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@@ -54,7 +54,7 @@ def main():
mtime = get_long(mtime_str)
if mtime in {0, -1}:
print('Bad ".pyc" file', repr(name_c))
elif mtime != st[ST_MTIME]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mtime is already int. But st[ST_MTIME] is float.

@@ -0,0 +1 @@
compileall now supports timestamps newer than year 2038
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use punctuation.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF if use an unsigned integer format.


class CompileallTestsWithSourceEpoch(CompileallTestsBase,
unittest.TestCase,
metaclass=SourceDateEpochTestMeta,
source_date_epoch=True):
pass


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change.

@matrixise matrixise changed the title bpo-34990: year 2038 problem in compileall.py WIP: bpo-34990: year 2038 problem in compileall.py Oct 25, 2018
@brettcannon
Copy link
Member

Due to the WIP in the title I added the DO-NOT-MERGE label.

@matrixise
Copy link
Member Author

@brettcannon Thank you, I will continue to work on this issue on the next week, this week I am at the hospital with my daughter.

@bmwiedemann
Copy link
Contributor

Only 19 years to go... Can we get this moving again?

@vstinner
Copy link
Member

Ping @matrixise: they are still some comments of @serhiy-storchaka that you didn't address.

@matrixise
Copy link
Member Author

@vstinner pong, ok thanks for the reminder.

@matrixise
Copy link
Member Author

So, I am not serious with this bug, I am going to continue on this bug, I would like to close all my PRs.

@vstinner
Copy link
Member

@vstinner pong, ok thanks for the reminder.

FYI I ignore this issue since it's still tagged as "WIP". I prefer to wait until it's no longer a WIP to review it.

@brettcannon brettcannon removed the request for review from a team April 17, 2019 22:45
@brettcannon
Copy link
Member

Removing the import team from reviewing as this is still a WIP. We can be added back when it's ready to go.

@bmwiedemann
Copy link
Contributor

@matrixise any chance to get this finished this year?

@vstinner
Copy link
Member

@matrixise any chance to get this finished this year?

It's ok as soon as it's fixed before 2038 :-D

@bmwiedemann
Copy link
Contributor

It's ok as soon as it's fixed before 2038 :-D

I am afraid there are way too many places where software is used unchanged for a long time. Even 18 years. And of course this is not the only timebomb waiting to trigger.

@vstinner
Copy link
Member

It was a joke. Hint: see the emoticon.

@ammaraskar
Copy link
Member

Since Stéphane said:

I would like to close all my PRs.

I opened up #19651 with the more permanent solution of using a 64-bit timestamp value as Benjamin originally suggested in their PEP 552 PR.

@ammaraskar
Copy link
Member

I pinged @matrixise about this and they said I can feel free to take this on since they're too busy to move forward with it. #19708 is a continuation of this PR to make compileall use an unsigned int.

It should be possible to compare the code for both the solutions now.

@tirkarthi
Copy link
Member

I guess this can be close since #19708 is merged

@encukou
Copy link
Member

encukou commented Aug 24, 2021

Indeed. Thanks for noticing.

@encukou encukou closed this Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants