Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-1635741: Port _datetime module to multiphase initialization (PEP 489) #19122

Closed
wants to merge 1 commit into from

Conversation

phsilva
Copy link
Contributor

@phsilva phsilva commented Mar 23, 2020

@phsilva phsilva changed the title bpo-1635741: Port datetime module to multiphase initialization (PEP 489) bpo-1635741: Port _datetime module to multiphase initialization (PEP 489) Mar 23, 2020
@phsilva phsilva force-pushed the bpo-1635741-datetime branch from d45b8bd to 4b620ec Compare March 23, 2020 18:33
@pganssle
Copy link
Member

I'm not sure I totally understand what this multi-phase initialization is about, so it's hard for me to judge this change. PEP 489 gives a few reasons to adopt multi-phase initialization: subinterpreter support, making it easier to reload the module and storing arbitrary module-level resources in a garbage-collected manner. It also, curiously, says that in Py3 extension modules are not being added to sys.modules, but that does not seem to be the case for datetime?

>>> import sys
>>> import datetime
>>> "datetime" in sys.modules
True
>>> "_datetime" in sys.modules
True

For the other reasons, I'm not sure if this PR achieves them by simply mechanically separating the exec and init steps, because we still have things like the population of the global C API capsule. Do we have to do anything special for that? Is there any way to write a test that would check that we're doing this right?

I'm fine with migrating it over, but it's hard to tell if it's being done correctly if I don't have a clear idea of what we're looking to achieve here and what the consequences could be. Interestingly, I was just reading through PEP 489 the other day and considering porting over _datetimemodule.c, so this is quite a timely PR.

@vstinner
Copy link
Member

I would prefer to first land PR #19119 and then rebase this one on top of it.

@phsilva phsilva force-pushed the bpo-1635741-datetime branch from 4b620ec to 0ac3031 Compare March 25, 2020 00:46
@phsilva
Copy link
Contributor Author

phsilva commented Mar 25, 2020

I would prefer to first land PR #19119 and then rebase this one on top of it.

Just rebased on top of #19119.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the two issues that I spotted.

cc @corona10: that's a non-trivial module ;-)

@@ -6555,8 +6536,31 @@ PyInit__datetime(void)
us_per_day = PyLong_FromDouble(86400000000.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is wrong: if datetimemodule_exec() is executed more than once, the constant is initialized multiple times. I suggest to fix this code step by step. First step: only set a variable if it's NULL.

Something like:

    if (!us_per_ms) {
        us_per_ms = PyLong_FromLong(1000);
    }
    (...)
    if (us_per_ms == NULL || us_per_second == NULL ||
        us_per_minute == NULL || seconds_per_day == NULL) {
        return -1;
    }

Later we may put these constants in a new module state, so it would become possible to clear them at exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -6406,83 +6387,83 @@ PyInit__datetime(void)

x = new_delta(0, 0, 1, 0);
if (x == NULL || PyDict_SetItemString(d, "resolution", x) < 0)
return NULL;
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path leaks x reference if PyDict_SetItemString() fails. I propose to not fix this bug in this PR. Let's move step by step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, not fixing for now. Will add to the summary of future refactors.

m = PyModule_Create(&datetimemodule);
if (m == NULL)
return NULL;

PyTypeObject *types[] = {
&PyDateTime_DateType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static types are not really compatible with having two instances of the same C extension module.

Moreover, this function sets attributes of the type dictionary each time datetimemodule_exec() is called. Problem: type attributes are cached. If you run test_datetime twice, each run creates a fresh _datetime module: ./python -m test -v test_datetime test_datetime. But many tests fail at the second run. A workaround is to call explicitly PyType_ClearCache(); at the function exit wih a comment explaining that it's a workaround for static types.

Note: when running test_datetime, test_divide_and_round() fails: see https://bugs.python.org/issue40058 (it's not a regression of this PR).

Another way to see the bug:

import sys
for i in range(1, 3):
    print(f"== run #{i} ==")
    import _datetime
    print("resolution", _datetime.timedelta.resolution)
    print("min", _datetime.timedelta.min)
    print("max", _datetime.timedelta.max)
    _datetime = None
    del sys.modules['_datetime']

Output:

== run #1 ==
resolution 0:00:00.000001
min -999999999 days, 0:00:00
max 999999999 days, 23:59:59.999999
== run #2 ==
resolution -999999999 days, 0:00:00
min 999999999 days, 23:59:59.999999
max 0001-01-01

When a second instance of _datetime is created, _datetime.timedelta.max becomes a date object instead of a timedelta object. It's because the type cache isn't updated when the dict is modified directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added PyType_ClearCache workaround, will add the conversion to use heap-allocated PyTypeObjects to future PR.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@vstinner
Copy link
Member

@pganssle: With this change, when a second _datetime instance is created: the exec method is executed again.

Right now, Modules/_datetimemodule.c still uses static types, so the two instances still use the same types.

Example with this change applied:

$ ./python
Python 3.9.0a5+ (heads/pr/19122:0ac3031a80, Mar 25 2020, 02:25:19) 
>>> import _datetime as dt1
>>> import sys; del sys.modules['_datetime']
>>> import _datetime as dt2
>>> dt1.timedelta is dt2.timedelta
True
>>> dt2 is dt1
False

This PR prepares the code to be able to switch to types allocated on the heap.

One problem of the datetime module is that it exposes its internals through a PyCapsule object. If it becomes possible to unload the C extension module and destroy its types, the PyCapsule may be made of dangling pointers...

To switch to types allocated on the heap, the PyCapsule object should use strong references and use a destructor.

PyModule_AddIntMacro(m, MINYEAR);
PyModule_AddIntMacro(m, MAXYEAR);
PyModule_AddIntMacro(module, MINYEAR);
PyModule_AddIntMacro(module, MAXYEAR);

x = PyCapsule_New(&CAPI, PyDateTime_CAPSULE_NAME, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it's correct to override the capsule if PyInit__datetime() is called twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CAPI is not fully static, there is a heap-allocated type that is created on exec. Not sure we can change this right now.

9a42428#diff-0090febf24974047330d8a02d4dbdc79L6486-L6487

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we change it?

If it's important that we have two separate PyCapsule_New objects, doesn't that mean that if you have a C extension running in two subinterpreters that references this capsule, they'll get a different value for PyDateTimeAPI depending on when they look up the reference rather than depending on which subinterpreter they are running from?

Not sure I see a way around this without a breaking change to the API.

@phsilva phsilva force-pushed the bpo-1635741-datetime branch from 0ac3031 to 9a42428 Compare March 25, 2020 21:27
@phsilva
Copy link
Contributor Author

phsilva commented Mar 26, 2020

Summary

Fixed:

  • prevent multiple initializations of module state (us_per_ms and the like).
  • PyType_ClearCache workaround.

For future PR:

  • Remove static state and use proper PyModule_GetState.
  • Add destructor to the PyCapsule to allow deallocation
  • Move static PyTypeObjects to PyType_Spec heap-allocated types.
  • Fix possible refleak when creating min/max/resolutions constants.

cc: @vstinner

@phsilva
Copy link
Contributor Author

phsilva commented Mar 26, 2020

I have made the requested changes; please review again.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from vstinner March 26, 2020 02:21
@phsilva
Copy link
Contributor Author

phsilva commented Apr 1, 2020

@vstinner do you think we can proceed with the changes so far? In the previous comment I detailed the current status.

Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @phsilva.

Are there any tests you can add to ensure that this is working correctly?

I am not sure I can recommend putting them in datetimetester.py, because it does some funky stuff with imports, but a temporary test file (until we can sort out the datetimetester imports issue), or possibly some existing test file intended to test modules for PEP 489 compliance would be good.

I note that @vstinner already has at least one example of a test you can perform: induce two fresh imports of datetime and check that the constants are all right.

return m;
return -1;

// Workaround to allow static PyTypeObjects to have they dict redefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Workaround to allow static PyTypeObjects to have they dict redefined.
// Workaround to allow static PyTypeObjects to have their dict redefined.

PyModuleDef_HEAD_INIT,
"_datetime",
"Fast implementation of the datetime type.",
0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this implies that the module is safe for subinterpreters, but this module still has several process-wide globals. Is it actually appropriate to set m_size to 0 in this case?

(Yes, I realize that it cannot be -1 if you want multi-phase initialization).

return -1;

// Workaround to allow static PyTypeObjects to have they dict redefined.
// XXX: to be removed once _datetime moves to heap allocated PyTypeObjects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there negative consequences to this if we don't move to heap allocated PyTypeObjects?

Are there any downsides to using heap allocated types?

I don't want us to be in a situation where we have some workaround with negative consequences that we end up not being able to remove because we can't convert to heap allocated types.

PyModule_AddIntMacro(m, MINYEAR);
PyModule_AddIntMacro(m, MAXYEAR);
PyModule_AddIntMacro(module, MINYEAR);
PyModule_AddIntMacro(module, MAXYEAR);

x = PyCapsule_New(&CAPI, PyDateTime_CAPSULE_NAME, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we change it?

If it's important that we have two separate PyCapsule_New objects, doesn't that mean that if you have a C extension running in two subinterpreters that references this capsule, they'll get a different value for PyDateTimeAPI depending on when they look up the reference rather than depending on which subinterpreter they are running from?

Not sure I see a way around this without a breaking change to the API.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@phsilva
Copy link
Contributor Author

phsilva commented Apr 1, 2020

@pganssle Thanks for the extensive review. I will address each issue and let you know.

@vstinner
Copy link
Member

vstinner commented Sep 8, 2020

The PyCapsule C API must be enhanced first, before we can fix this issue. The API should be modified to pass the module instance. I close the PR.

If someone wants to work on that, please open a new issue about the PyCapsule C API.

@vstinner vstinner closed this Sep 8, 2020
@yol
Copy link

yol commented Nov 9, 2020

I can definitely confirm that this PR fixes bugs with the _datetime module and subinterpreters, so it would be awesome to have the module ported 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants