gh-108753: Enhance pystats #108754
Conversation
Currently, running Python always writes many lines of statistics to stderr at Python exit when it is built with `Py_STATS`. These logs are useless since stats gathering is off by default; it's just not convenient. With this PR, statistics are only dumped when `-X pystats` is used.

Using `python -X pystats`, logs are dumped but also gathered, so it's more useful :-) Statistics are not zero anymore.

With these changes, the test suite now passes on a Python built with `Py_STATS`, since statistics are no longer dumped to stderr by default.
Thanks for this!

> Using `python -X pystats`, logs are dumped but also gathered, so it's more useful :-)
Unfortunately, this makes it harder to not collect stats during startup. The way it's typically used in `pyperf`, for example, is for stats to automatically turn on and off only around the actual benchmarking code. `pyperf` could probably be updated to call `sys._stats_clear()` before the first `sys._stats_on()`, but it's almost never desirable to collect stats from an entire run -- that feels like the wrong default.
I think I'd prefer the flag/envvar to only control the dumping at exit. But, thinking even more about it, maybe we should just disable the printing to stderr altogether and just create the output directory if it doesn't exist (and raise an exception if that fails). We don't use stderr in the benchmarking infrastructure, and it doesn't play well with multiple processes (which pyperf almost always creates). That would remove the need for the envvar/flag, and there would only be two, not three, things users would need to do.
I like the refactor here outside of the specialization code where this began -- we definitely do anticipate this growing beyond that. And being able to pass the tests is great -- maybe (as a separate discussion) this config should be tested in a buildbot.
> * Garbage collections;
> * Objects visited;
> * Objects collected.
I like these docs. Should we also name the "keys" in the output corresponding to each of these? I know the output format isn't really "designed" and there are no guarantees of stability there, but still...
Currently, I have no preference for this question: I almost never used the stats output keys myself.
Ah, got it. Sorry I missed that. So for sure we want to decouple "start collecting as early as possible" from "dumping at exit". I think just removing dumping to stderr might be the easiest way to do that, but I'd certainly want to hear from others who use this -- maybe that's something really convenient someone relies on.
In general, I think that it's good practice to reset statistics and start gathering stats when the benchmark starts, and then turn off gathering when the benchmark stops. That would be compatible with the current behavior of this PR, no?
If I understand correctly, with this change, pyperf would have to be modified to clear stats the first time a benchmark is run (but not subsequent times a benchmark is run), in order to get the stats out at the end. Right now, it doesn't have to worry about stats prior to the first benchmark run. Also, the orchestration process would have to clear and turn off stats, which I don't think it currently does.
To run a benchmark, I don't think that using `-X pystats` fits; I would write something like:

```python
import math
import sys

def cpu_intensive():
    len(str(math.factorial(1000)))

def bench():
    sys._stats_clear()  # just in case it was already enabled
    sys._stats_on()
    cpu_intensive()
    sys._stats_off()
    sys._stats_dump()
    # not sure what happens if Python dumps stats again at exit
    # would sys._stats_clear() help here?

bench()
```
Sorry, I wasn't clear. My point is just that this is a breaking change for pyperf, which could be updated, of course. The pseudo code is more like:
I'm not opposed, but an alternative that wouldn't require updating `pyperf` would be nicer.
My use case is to get statistics on a command without having to modify the code. For me, it's a similar usage to running a program under a profiler, or anything else collecting data about the executed program: collect data and then write the output.

If you need less coarse statistics gathering than "the whole program", I would suggest not using the `-X pystats` option.

If you think that there is a use case for combining both -- ok, maybe there is a way to please all use cases. What if it were possible to disable writing statistics at exit using the existing API? For me, it sounds bad to write statistics at exit if the following functions were called: `sys._stats_off()` and `sys._stats_clear()`.
I modified my PR to not dump statistics if they are all equal to zero.
@mdboom: Does it solve your use case?
This script can also be used to test my PR. Run it with a Python built with `Py_STATS`:

```python
import sys
import gc

def heavy_code():
    for _ in range(10**4):
        pass
    for _ in range(3):
        gc.collect()

def mybench():
    sys._stats_clear()
    sys._stats_on()
    heavy_code()
    sys._stats_off()
    dumped = sys._stats_dump()
    print("stats dumped?", dumped)
    dumped = sys._stats_dump()
    print("oops, called twice! dumped?", dumped)

mybench()
print("exit")
```
This is fine, but I think pyperf will need to be updated to remain correct: it still will only dump at exit if `-X pystats` is used.

I think if you just revert this diff below (and update your new docs accordingly), it's probably ok -- it will dump stats at exit if any were collected (which should be the case when running inside pyperf as-is). Since it's now only dumping if there are any collected stats, it should also solve what I understand you set out to solve.
Good :-)
This change is only for Python 3.13, you have time to adapt pyperf if needed :-) pyperf spawns subprocesses; you can control whether they are run with `-X pystats`.
You're right. Always calling `_Py_PrintSpecializationStats()` works, since it does nothing if stats are all zeros.
Statistics gathering is now off by default. Use the `-X pystats` command line option or set the new `PYTHONSTATS` environment variable to `1` to turn statistics gathering on at Python startup. Statistics are no longer dumped at exit if statistics gathering was off or statistics have been cleared.

Changes:

* Add `PYTHONSTATS` environment variable.
* `sys._stats_dump()` now returns `False` if statistics are not dumped because they are all equal to zero.
* Add `PyConfig._pystats` member.
* Add tests on `sys` functions and on setting `PyConfig._pystats` to 1.
* Add `Include/cpython/pystats.h` and `Include/internal/pycore_pystats.h` header files.
* Rename the `_py_stats` variable to `_Py_stats`.
* Exclude `Include/cpython/pystats.h` from the `Py_LIMITED_API`.
* Move the `pystats.h` include from `object.h` to `Python.h`.
* Add `_Py_StatsOn()` and `_Py_StatsOff()` functions. Remove the `_py_stats_struct` variable from the API: make it static in `specialize.c`.
* Document the API in `Include/pystats.h` and `Include/cpython/pystats.h`.
* Complete the pystats documentation in `Doc/using/configure.rst`.
* Don't write "all zeros" stats: if `_stats_off()` and `_stats_clear()` or `_stats_dump()` were called.
* `_PyEval_Fini()` now always calls `_Py_PrintSpecializationStats()`, which does nothing if stats are all zeros.
A couple of minor nits, but other than that, looks good to me.
Thanks for pushing through for something that doesn't break pyperf!
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
@mdboom: Thanks for the review, I accepted your suggestions. Oops, I didn't notice that stats are dumped to stderr, not stdout.
📚 Documentation preview 📚: https://cpython-previews--108754.org.readthedocs.build/