gh-118518: Improve perf docs #118708

Merged on May 7, 2024 (3 commits)
70 changes: 48 additions & 22 deletions Doc/howto/perf_profiling.rst
@@ -162,12 +162,12 @@ the :option:`!-X` option takes precedence over the environment variable.

Example, using the environment variable::

$ PYTHONPERFSUPPORT=1 python script.py
$ PYTHONPERFSUPPORT=1 perf record -F 9999 -g -o perf.data python script.py
$ perf report -g -i perf.data

Example, using the :option:`!-X` option::

$ python -X perf script.py
$ perf record -F 9999 -g -o perf.data python -X perf script.py
$ perf report -g -i perf.data

Example, using the :mod:`sys` APIs in file :file:`example.py`:
@@ -184,7 +184,7 @@ Example, using the :mod:`sys` APIs in file :file:`example.py`:

...then::

$ python ./example.py
$ perf record -F 9999 -g -o perf.data python ./example.py
$ perf report -g -i perf.data
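
As a minimal sketch of the :mod:`sys` approach, the trampoline can be scoped to one region of a program with a context manager. This assumes Python 3.12 or later, where ``sys.activate_stack_trampoline`` and ``sys.deactivate_stack_trampoline`` exist; the ``perf_trampoline`` helper name is hypothetical and not part of the standard library. It falls back to a no-op when the interpreter lacks perf support:

```python
import contextlib
import sys


@contextlib.contextmanager
def perf_trampoline(backend="perf"):
    """Enable the perf trampoline for the duration of a ``with`` block.

    Yields True if the trampoline was activated, and False if this
    interpreter does not support it (older Python, or a build without
    perf support). ``perf_trampoline`` is a hypothetical helper name.
    """
    # Look the functions up at call time so that older interpreters,
    # which lack them, simply take the no-op path.
    activate = getattr(sys, "activate_stack_trampoline", None)
    deactivate = getattr(sys, "deactivate_stack_trampoline", None)
    active = False
    if activate is not None and deactivate is not None:
        try:
            activate(backend)
            active = True
        except (ValueError, OSError):
            # Unknown backend, or the platform has no trampoline support.
            active = False
    try:
        yield active
    finally:
        if active:
            deactivate()
```

A script would wrap the code of interest in ``with perf_trampoline() as enabled:`` and then be run under ``perf record`` as shown above. The yielded flag lets the script report whether Python function names can be expected in the profile.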


@@ -210,31 +210,57 @@ of ``perf``.
How to work without frame pointers
----------------------------------

If you are working with a Python interpreter that has been compiled without frame pointers
you can still use the ``perf`` profiler but the overhead will be a bit higher because Python
needs to generate unwinding information for every Python function call on the fly. Additionally,
``perf`` will take more time to process the data because it will need to use the DWARF debugging
information to unwind the stack and this is a slow process.
If you are working with a Python interpreter that has been compiled without
frame pointers, you can still use the ``perf`` profiler, but the overhead will
be a bit higher because Python needs to generate unwinding information for
every Python function call on the fly. Additionally, ``perf`` will take more
time to process the data because it needs to use the DWARF debugging
information to unwind the stack, which is a slow process.

To enable this mode, you can use the environment variable :envvar:`PYTHON_PERF_JIT_SUPPORT` or the
:option:`-X perf_jit <-X>` option, which will enable the JIT mode for the ``perf`` profiler.
To enable this mode, you can use the environment variable
:envvar:`PYTHON_PERF_JIT_SUPPORT` or the :option:`-X perf_jit <-X>` option,
which will enable the JIT mode for the ``perf`` profiler.

When using the perf JIT mode, you need an extra step before you can run ``perf report``. You need to
call the ``perf inject`` command to inject the JIT information into the ``perf.data`` file.
.. note::

   Due to a bug in the ``perf`` tool, the JIT mode only works with ``perf``
   versions newer than v6.8. The fix was also backported to v6.7.2.

   When checking the version of the ``perf`` tool (by running
   ``perf version``), keep in mind that some distros add custom version
   numbers that include a ``-`` character. This means that ``perf 6.7-3`` is
   not necessarily ``perf 6.7.3``.
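
The version rule in the note can be sketched as a small check. This is an illustration only: the helper names are hypothetical, "higher than v6.8" is read here as including the 6.8 series (an assumption), and a distro-style string such as ``6.7-3`` is reported as "cannot tell" rather than guessed:

```python
import re
import shutil
import subprocess


def perf_version_output():
    """Return the text printed by ``perf version``, or None if perf is missing."""
    if shutil.which("perf") is None:
        return None
    result = subprocess.run(["perf", "version"], capture_output=True, text=True)
    return result.stdout


def parse_perf_version(output):
    """Parse version text into (numbers, has_suffix), or None if not found.

    ``numbers`` is a tuple of ints such as (6, 7, 2). ``has_suffix`` is True
    when extra text such as ``-3`` directly follows the dotted version.
    """
    match = re.search(r"(\d+(?:\.\d+)+)(\S*)", output)
    if match is None:
        return None
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, bool(match.group(2))


def jit_mode_supported(output):
    """Return True, False, or None (cannot tell) for perf JIT mode support."""
    parsed = parse_perf_version(output)
    if parsed is None:
        return None
    numbers, _has_suffix = parsed
    major_minor = numbers[:2]
    if major_minor >= (6, 8):
        # Assumption: the 6.8 series itself already carries the fix.
        return True
    if major_minor < (6, 7):
        return False
    # 6.7 series: only the backport in 6.7.2 and later works.
    if len(numbers) >= 3:
        return numbers[2] >= 2
    # "6.7" or "6.7-3": the patch level is unknown, so do not guess.
    return None
```

Calling ``jit_mode_supported(perf_version_output() or "")`` gives a three-way answer. A distro may also backport the fix to an older version, so a ``False`` result is a hint rather than proof.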

When using the perf JIT mode, you need an extra step before you can run
``perf report``: call the ``perf inject`` command to merge the JIT
information with the data recorded in ``perf.data``::

$ perf record -F 9999 -g --call-graph dwarf -o perf.data python -Xperf_jit my_script.py
$ perf inject -i perf.data --jit
$ perf report -g -i perf.data
$ perf inject -i perf.data --jit --output perf.jit.data
$ perf report -g -i perf.jit.data

or using the environment variable::

$ PYTHON_PERF_JIT_SUPPORT=1 perf record -F 9999 -g --call-graph dwarf -o perf.data python my_script.py
$ perf inject -i perf.data --jit
$ perf report -g -i perf.data

Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take snapshots of the stack of
the process being profiled and save the information in the ``perf.data`` file. By default the size of
the stack dump is 8192 bytes but the user can change the size by passing the size after comma like
``--call-graph dwarf,4096``. The size of the stack dump is important because if the size is too small
``perf`` will not be able to unwind the stack and the output will be incomplete.
$ perf inject -i perf.data --jit --output perf.jit.data
$ perf report -g -i perf.jit.data

The ``perf inject --jit`` command reads ``perf.data``, automatically picks up
the perf dump file that Python creates (in ``/tmp/perf-$PID.dump``), and then
creates ``perf.jit.data``, which merges all the JIT information together. It
should also create a lot of ``jitted-XXXX-N.so`` files in the current
directory; these are ELF images for all the JIT trampolines that were created
by Python.
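
To check that these files were actually produced, a short script can look for them. The sketch below is illustrative: ``perf_jit_artifacts`` is a hypothetical helper, and it assumes that the ``XXXX`` part of ``jitted-XXXX-N.so`` is the process ID of the profiled Python process, which the text above does not state:

```python
from pathlib import Path


def perf_jit_artifacts(pid, workdir=".", dump_dir="/tmp"):
    """Locate the files left behind by a perf JIT mode run.

    Returns ``(dump, images)``: ``dump`` is the path of the
    ``perf-<pid>.dump`` file, or None if it does not exist, and ``images``
    is a sorted list of ``jitted-<pid>-N.so`` files found in ``workdir``.
    Assumption: the number after ``jitted-`` is the profiled process ID.
    """
    dump = Path(dump_dir) / f"perf-{pid}.dump"
    images = sorted(Path(workdir).glob(f"jitted-{pid}-*.so"))
    return (dump if dump.exists() else None), images
```

An empty ``images`` list after ``perf inject --jit`` suggests that the dump file was not found or that the command was run from a different directory.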

.. warning::
   When using ``--call-graph dwarf``, the ``perf`` tool takes snapshots of
   the stack of the process being profiled and saves them in the
   ``perf.data`` file. By default, the size of the stack dump is 8192 bytes,
   but you can change it by passing the size after a comma, as in
   ``--call-graph dwarf,4096``. The size of the stack dump matters: if it is
   too small, ``perf`` will not be able to unwind the stack and the output
   will be incomplete; if it is too big, ``perf`` will not be able to sample
   the process as frequently as requested because the overhead is higher.
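
The record, inject, and report steps can be assembled programmatically, which makes the stack dump size easy to vary between runs. This is a sketch under stated assumptions: ``perf_jit_commands`` is a hypothetical helper, the flags are exactly the ones used in the examples above, and the commands are only built here, not executed:

```python
import shlex


def perf_jit_commands(script, stack_dump_size=None, frequency=9999,
                      data="perf.data", jit_data="perf.jit.data"):
    """Build the three command lines for a perf JIT mode profiling run.

    ``stack_dump_size`` is the optional size in bytes appended after the
    comma in ``--call-graph dwarf,SIZE``; None keeps perf's default.
    """
    if stack_dump_size is None:
        call_graph = "dwarf"
    else:
        if stack_dump_size <= 0:
            raise ValueError("stack_dump_size must be a positive number of bytes")
        call_graph = f"dwarf,{stack_dump_size}"
    record = ["perf", "record", "-F", str(frequency), "-g",
              "--call-graph", call_graph, "-o", data,
              "python", "-X", "perf_jit", script]
    inject = ["perf", "inject", "-i", data, "--jit", "--output", jit_data]
    report = ["perf", "report", "-g", "-i", jit_data]
    return [record, inject, report]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a shell.
    for command in perf_jit_commands("my_script.py", stack_dump_size=4096):
        print("$", shlex.join(command))
```

Each list can be passed to ``subprocess.run`` in order. Returning lists rather than one shell string avoids quoting problems when the script path contains spaces.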
