Skip to content

Conversation

@godlygeek
Copy link
Contributor

Up until now we've been greedily capturing the line number of each
Python frame as we first encounter it, but this is unnecessarily
inefficient. Since we've already been assuming that it's safe to call
PyFrame_GetLineNumber without the GIL held (and we'd already need to
change this code should that assumption become false in a future CPython
version), we can delay calculating the Python line number until we're
about to emit the frame. Since many frames are entered without ever
being emitted (because no allocation is performed under them), this
drastically reduces the number of calls to PyFrame_GetLineNumber,
which is a deceptively slow function.

@godlygeek godlygeek added the enhancement New feature or request label Apr 21, 2022
@godlygeek godlygeek self-assigned this Apr 21, 2022
@pablogsal pablogsal enabled auto-merge (rebase) April 21, 2022 10:03
@godlygeek godlygeek force-pushed the fetch_python_line_numbers_lazily branch from aa31877 to 57f69c6 Compare April 21, 2022 16:44
@godlygeek godlygeek marked this pull request as draft April 21, 2022 20:37
auto-merge was automatically disabled April 21, 2022 20:37

Pull request was converted to draft

@godlygeek
Copy link
Contributor Author

Flipping this to Draft - this is subtly broken in a way that our test suite doesn't catch. I'll need to circle back to this - and add a new test...

@godlygeek godlygeek force-pushed the fetch_python_line_numbers_lazily branch 4 times, most recently from 6cddca2 to c20dd97 Compare May 11, 2022 06:46
@godlygeek godlygeek marked this pull request as ready for review May 11, 2022 06:54
@godlygeek
Copy link
Contributor Author

OK, I think this is now actually correct... And the new test case covers the problem that the original version of this had, so that we know we won't regress on it.

@godlygeek godlygeek force-pushed the fetch_python_line_numbers_lazily branch 4 times, most recently from fa73200 to 55b8367 Compare May 11, 2022 19:00
@godlygeek godlygeek force-pushed the fetch_python_line_numbers_lazily branch from 55b8367 to 69de8a3 Compare August 16, 2022 23:39
godlygeek and others added 2 commits August 16, 2022 20:31
Up until now we've been greedily capturing the line number of each
Python frame as we first encounter it, but this is unnecessarily
inefficient. Since we've already been assuming that it's safe to call
`PyFrame_GetLineNumber` without the GIL held (and we'd already need to
change this code should that assumption become false in a future CPython
version), we can delay calculating the Python line number until we're
about to emit the frame. Since many frames are entered without ever
being emitted (because no allocation is performed under them), this
drastically reduces the number of calls to `PyFrame_GetLineNumber`,
which is a deceptively slow function.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
If we only update line numbers on Python frame pushes and pops we'll get
this wrong, as the line number to be tracked changes in between two
allocations without us ever entering or exiting a Python function.

Signed-off-by: Matt Wozniski <godlygeek@gmail.com>
@godlygeek godlygeek force-pushed the fetch_python_line_numbers_lazily branch from 69de8a3 to b50d319 Compare August 17, 2022 00:32
@pablogsal
Copy link
Member

pablogsal commented Aug 17, 2022

Experiments executed for python -m memray run -o /dev/null -m test test_list test_list test_list test_list test_list test_list test_list test_list test_list test_list for a total duration of 1.2595737380001992 seconds in baseline.

With this PR

      Calls to PyFrame_GetLineNumber: 25550
     0.03%     0.03%  python  python3.10  [.] PyFrame_GetLineNumber                                                                                ▒

Baseline

      Calls to PyFrame_GetLineNumber: 270090
     0.39%     0.39%  python  python3.10                               [.] PyFrame_GetLineNumber                                                   ◆

This means that in general the most we can save (that we are indeed saving it) is 0.39% of the total runtime (this is assuming no PYTHONMALLOC=malloc where most savings will be there since calls to PyFrame_GetLineNumber in pushPythonFrame are a bigger percentage of the runtime because our hooks are called less).

For the case where PYTHONMALLOC=malloc for a total duration of 3.0231499819997225 seconds in baseline:

PR

      Calls to PyFrame_GetLineNumber: 21057170
+    5.26%     5.26%  python  python3.10                               [.] PyFrame_GetLineNumber

Baseline

      Calls to PyFrame_GetLineNumber: 21125440
+    7.23%     7.23%  python  python3.10                               [.] PyFrame_GetLineNumber

This means that in the latter we are saving ~2% maximum. This is consistent with the timing results of 2.8712007330000233 (PR) against 2.935535135000009 (baseline).

PR

  Test duration: 2.8712007330000233
  Time (mean ± σ):      3.058 s ±  0.051 s    [User: 3.008 s, System: 0.056 s]
  Range (min … max):    2.983 s …  3.167 s    10000 runs

Baseline

    Test duration: 2.935535135000009
  Time (mean ± σ):      3.120 s ±  0.046 s    [User: 3.070 s, System: 0.056 s]
  Range (min … max):    3.064 s …  3.210 s    10000 runs

That is around 3.5% of the difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants