
Allow the linux perf profiler to see Python calls #96143

Closed
pablogsal opened this issue Aug 20, 2022 · 8 comments
Assignees
Labels
3.12 bugs and security fixes type-feature A feature request or enhancement

Comments

@pablogsal
Member

The linux perf profiler is a very powerful tool, but unfortunately it is not able to see Python calls (only the C stack), and therefore neither it nor its very complete ecosystem can be used to profile Python applications and extensions.

It turns out that Node and the JVM have developed a way to leverage the perf profiler for Java and JavaScript frames. They use their JIT compilers to generate a unique area in memory where they place assembly code that in turn calls the frame evaluator function. These JIT-compiled areas are unique per function/code object. They then use perf maps: perf allows placing a map in /tmp/perf-PID.map with information mapping the JIT-ed areas to a string that identifies them. This lets perf map Java/JavaScript names to the JIT-ed areas, basically showing the non-native function names on the stack.

We can do a simple version of this idea in Python by using a very simple JIT compiler that compiles an assembly template which is then used to jump to PyEval_EvalFrameDefault, and by placing the code object names and filenames in the special perf file. This allows perf to see Python calls as well:
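As a rough sketch of the map file side of this (the helper below is illustrative, not CPython's actual implementation): each line of /tmp/perf-PID.map is `START SIZE symbolname`, with the start address and size in hexadecimal.

```python
import os

def write_perf_map_entry(start_addr, size, name, pid=None):
    """Append one 'START SIZE symbol' line to the perf map file.

    perf reads /tmp/perf-PID.map to symbolize JIT-ed code regions:
    the address and size are hexadecimal, and the symbol is free-form
    text (entries for Python frames use names like
    'py::funcname:/path/to/file.py').
    """
    pid = os.getpid() if pid is None else pid
    with open(f"/tmp/perf-{pid}.map", "a") as f:
        f.write(f"{start_addr:x} {size:x} {name}\n")
```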

(screenshot: perf_names, perf report showing Python function names)

And this works with all the tools in the perf ecosystem, like flamegraphs:

(screenshot: perf_flame, a flamegraph including Python frames)

See also:
https://www.brendangregg.com/Slides/KernelRecipes_Perf_Events.pdf

@kumaraditya303 kumaraditya303 added type-feature A feature request or enhancement 3.12 bugs and security fixes labels Aug 20, 2022
@pablogsal pablogsal self-assigned this Aug 20, 2022
@pablogsal
Member Author

It is also very easy to transform these into Python-only flamegraphs by filtering on the py:: prefix:
(screenshot: perf, a Python-only flamegraph)
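The filtering step can be sketched like this (a minimal, illustrative helper, assuming the usual collapsed-stack format that flamegraph tooling consumes, where each line is a semicolon-separated stack followed by a sample count):

```python
def python_only(collapsed_line):
    """Keep only the frames carrying the py:: prefix that the perf
    trampoline writes into the perf map, dropping native frames.

    Input is one collapsed-stack line: 'frame;frame;... count'.
    Returns the filtered line, or None if no Python frames remain.
    """
    stack, _, count = collapsed_line.rpartition(" ")
    frames = [f for f in stack.split(";") if f.startswith("py::")]
    return ";".join(frames) + " " + count if frames else None
```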

miss-islington pushed a commit that referenced this issue Aug 30, 2022
⚠️  ⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️ ⚠️ 

If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**. 

If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
pablogsal added a commit that referenced this issue Aug 30, 2022
…#96433)

* gh-96132: Add some comments and minor fixes missed in the original PR

* Update Doc/using/cmdline.rst

Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>

Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 30, 2022
@gpshead
Member

gpshead commented Aug 31, 2022

  • A Linux buildbot with PYTHONPERFSUPPORT=1 and the relevant CFLAGS=-fno-omit-frame-pointer needs to be set up.
  • Something needs to garbage collect its /tmp/perf-$pid.map files as well.

miss-islington pushed a commit that referenced this issue Sep 1, 2022
minor missed test cleanup to use the modern API from the big review.

Automerge-Triggered-By: GH:gpshead
@pablogsal
Member Author

pablogsal commented Sep 1, 2022

Something needs to garbage collect its /tmp/perf-$pid.map files as well.

That's really up to the user, unfortunately. The files must be available after the process finishes and at report time, so I don't see how we can clean these files automatically: we don't know when the user is finished with them.


Or do you mean in the buildbot? In that case, tests already delete the files they create, so they should not be polluting the machine and these won't pile up on the buildbots.

@gpshead
Member

gpshead commented Sep 2, 2022

Or do you mean in the buildbot? In that case, tests already delete the files they create, so they should not be polluting the machine and these won't pile up on the buildbots.

Yes, I was talking about the desired buildbot config. A tmpwatcher of some form that expires tmp files more than a few hours old is likely sufficient. With the environment variable set to enable perf everywhere, a single test run probably has hundreds if not thousands of PIDs. ;)

(This is where the buildbot design really shows age. A fresh container per buildbot worker test session would make sense.)

@pablogsal
Member Author

Yes, I was talking about the desired buildbot config. A tmpwatcher of some form that expires tmp files more than a few hours old is likely sufficient. With the environment variable set to enable perf everywhere, a single test run probably has hundreds if not thousands of PIDs. ;)

Ah, in that case I don't think it's needed. Tests check the perf files before and after running and delete any new file that matches PIDs spawned during the tests. I will revise this logic to ensure it works correctly when running parallel test suites, but I think that should be enough.

@pablogsal
Member Author

@gpshead We have a perf buildbot now:

https://buildbot.python.org/all/#/builders/1078

[buildbot@4142e9f43556 build]$ ./python -m test test_perf_profiler -v
== CPython 3.12.0a1+ (heads/main:0e15c31c7e, Nov 1 2022, 11:10:40) [GCC 12.2.0]
== Linux-5.4.0-131-generic-x86_64-with-glibc2.36 little-endian
== cwd: /buildbot/buildarea/3.x.pablogsal-arch-x86_64.perfbuild/build/build/test_python_21653
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 1.57 Run tests sequentially
0:00:00 load avg: 1.57 [1/1] test_perf_profiler
test_python_calls_appear_in_the_stack_if_perf_activated (test.test_perf_profiler.TestPerfProfiler.test_python_calls_appear_in_the_stack_if_perf_activated) ... ok
test_python_calls_do_not_appear_in_the_stack_if_perf_activated (test.test_perf_profiler.TestPerfProfiler.test_python_calls_do_not_appear_in_the_stack_if_perf_activated) ... ok
test_sys_api (test.test_perf_profiler.TestPerfTrampoline.test_sys_api) ... ok
test_sys_api_get_status (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_get_status) ... ok
test_sys_api_with_existing_trampoline (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_with_existing_trampoline) ... ok
test_sys_api_with_invalid_trampoline (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_with_invalid_trampoline) ... ok
test_trampoline_works (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works) ... ok
test_trampoline_works_with_forks (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works_with_forks) ... ok

----------------------------------------------------------------------
Ran 8 tests in 0.928s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 1.2 sec
Tests result: SUCCESS

here is the relevant log from the last build:

0:00:47 load avg: 1.03 [ 80/437] test_perf_profiler passed

@gpshead
Member

gpshead commented Nov 1, 2022

sweet!

FYI - our TensorFlow folks internally just re-worked their equivalent parallel perf-trampoline-hook Python profiling work to do what it needs using our backport of these changes to our internal 3.9 (soon to be 3.10) runtime. Once they start using it, hopefully it helps shake out any strange issues (of which I'm sure we all hope there are none). 😄

marking closed as I think everything for this is done at this point?

@gpshead gpshead closed this as completed Nov 1, 2022
@pablogsal
Member Author

marking closed as I think everything for this is done at this point?

Nothing critical that I can think of. I will probably add some extra examples to the docs, but I can open more issues for that.

FYI - our TensorFlow folks internally just re-worked their equivalent parallel perf-trampoline-hook Python profiling work to do what it needs using our backport of these changes to our internal 3.9 (soon to be 3.10) runtime. Once they start using it, hopefully it helps shake out any strange issues (of which I'm sure we all hope there are none).

That's wonderful. This will allow us to test this more heavily before the final release 👍

dependabot bot added a commit to ronaldoussoren/cpython that referenced this issue Nov 14, 2022