refactor(profiling): make current_tasks and current_greenlets non global
#15806
## Performance SLOs

Comparing candidate kowalski/refactor-profiling-make-current_tasks-and-current_greenlets-non-global (86b8892) with baseline kowalski/refactor-profiling-remove-some-globals (7f4d114)

All benchmarks listed below pass their SLOs (✅). 📈 marks a regression against the baseline; 🟡 marks a value close to its SLO. The percentage in parentheses is the margin relative to the SLO.

### 📈 Performance Regressions (3 suites)

#### 📈 iastaspects - 118/118

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| add_aspect | 18.043µs | <20.000µs (-9.8%) | 📈 +21.9% | 42.605MB | <43.250MB (🟡 -1.5%) | +4.8% |
| add_inplace_aspect | 14.923µs | <20.000µs (-25.4%) | +0.1% | 42.546MB | <43.250MB (🟡 -1.6%) | +4.6% |
| add_inplace_noaspect | 0.337µs | <10.000µs (-96.6%) | +0.5% | 42.546MB | <43.500MB (-2.2%) | +4.7% |
| add_noaspect | 0.546µs | <10.000µs (-94.5%) | +0.2% | 42.566MB | <43.500MB (-2.1%) | +4.6% |
| bytearray_aspect | 17.932µs | <30.000µs (-40.2%) | -0.2% | 42.605MB | <43.500MB (-2.1%) | +4.9% |
| bytearray_extend_aspect | 24.010µs | <30.000µs (-20.0%) | +0.6% | 42.585MB | <43.500MB (-2.1%) | +4.9% |
| bytearray_extend_noaspect | 2.736µs | <10.000µs (-72.6%) | ~same | 42.644MB | <43.500MB (🟡 -2.0%) | +5.1% |
| bytearray_noaspect | 1.471µs | <10.000µs (-85.3%) | +1.4% | 42.585MB | <43.500MB (-2.1%) | +5.0% |
| bytes_aspect | 16.713µs | <20.000µs (-16.4%) | +0.9% | 42.566MB | <43.500MB (-2.1%) | +4.9% |
| bytes_noaspect | 1.399µs | <10.000µs (-86.0%) | -1.1% | 42.566MB | <43.500MB (-2.1%) | +4.9% |
| bytesio_aspect | 55.518µs | <70.000µs (-20.7%) | -0.1% | 42.546MB | <43.500MB (-2.2%) | +4.7% |
| bytesio_noaspect | 3.280µs | <10.000µs (-67.2%) | +0.3% | 42.644MB | <43.500MB (🟡 -2.0%) | +5.1% |
| capitalize_aspect | 14.703µs | <20.000µs (-26.5%) | +0.5% | 42.605MB | <43.500MB (-2.1%) | +4.8% |
| capitalize_noaspect | 2.598µs | <10.000µs (-74.0%) | +0.3% | 42.526MB | <43.500MB (-2.2%) | +4.7% |
| casefold_aspect | 14.666µs | <20.000µs (-26.7%) | +0.3% | 42.585MB | <43.500MB (-2.1%) | +4.9% |
| casefold_noaspect | 3.151µs | <10.000µs (-68.5%) | -0.4% | 42.585MB | <43.500MB (-2.1%) | +4.9% |
| decode_aspect | 15.604µs | <30.000µs (-48.0%) | -0.4% | 42.546MB | <43.500MB (-2.2%) | +4.9% |
| decode_noaspect | 1.616µs | <10.000µs (-83.8%) | +1.3% | 42.703MB | <43.500MB (🟡 -1.8%) | +5.1% |
| encode_aspect | 18.146µs | <30.000µs (-39.5%) | 📈 +22.6% | 42.566MB | <43.500MB (-2.1%) | +4.7% |
| encode_noaspect | 1.499µs | <10.000µs (-85.0%) | +0.7% | 42.526MB | <43.500MB (-2.2%) | +4.4% |
| format_aspect | 171.318µs | <200.000µs (-14.3%) | +0.1% | 42.743MB | <43.250MB (🟡 -1.2%) | +5.0% |
| format_map_aspect | 191.152µs | <200.000µs (-4.4%) | ~same | 42.566MB | <43.500MB (-2.1%) | +4.6% |
| format_map_noaspect | 3.797µs | <10.000µs (-62.0%) | -0.3% | 42.566MB | <43.250MB (🟡 -1.6%) | +4.9% |
| format_noaspect | 3.179µs | <10.000µs (-68.2%) | +0.9% | 42.507MB | <43.250MB (🟡 -1.7%) | +4.7% |
| index_aspect | 15.336µs | <20.000µs (-23.3%) | -0.2% | 42.605MB | <43.250MB (🟡 -1.5%) | +5.0% |
| index_noaspect | 0.462µs | <10.000µs (-95.4%) | -0.5% | 42.585MB | <43.500MB (-2.1%) | +4.9% |
| join_aspect | 17.064µs | <20.000µs (-14.7%) | ~same | 42.585MB | <43.500MB (-2.1%) | +4.8% |
| join_noaspect | 1.555µs | <10.000µs (-84.5%) | -0.5% | 42.644MB | <43.250MB (🟡 -1.4%) | +4.9% |
| ljust_aspect | 20.815µs | <30.000µs (-30.6%) | -0.2% | 42.546MB | <43.250MB (🟡 -1.6%) | +4.9% |
| ljust_noaspect | 2.742µs | <10.000µs (-72.6%) | +1.0% | 42.644MB | <43.250MB (🟡 -1.4%) | +5.1% |
| lower_aspect | 17.925µs | <30.000µs (-40.2%) | +0.3% | 42.566MB | <43.500MB (-2.1%) | +5.0% |
| lower_noaspect | 2.414µs | <10.000µs (-75.9%) | ~same | 42.605MB | <43.250MB (🟡 -1.5%) | +4.9% |
| lstrip_aspect | 17.698µs | <30.000µs (-41.0%) | -0.1% | 42.566MB | <43.250MB (🟡 -1.6%) | +4.8% |
| lstrip_noaspect | 1.881µs | <10.000µs (-81.2%) | +1.4% | 42.526MB | <43.500MB (-2.2%) | +4.7% |
| modulo_aspect | 166.489µs | <200.000µs (-16.8%) | ~same | 42.723MB | <43.500MB (🟡 -1.8%) | +4.6% |
| modulo_aspect_for_bytearray_bytearray | 179.945µs | <200.000µs (-10.0%) | +2.9% | 42.723MB | <43.500MB (🟡 -1.8%) | +4.8% |
| modulo_aspect_for_bytes | 168.587µs | <200.000µs (-15.7%) | -0.2% | 42.723MB | <43.500MB (🟡 -1.8%) | +4.9% |
| modulo_aspect_for_bytes_bytearray | 172.181µs | <200.000µs (-13.9%) | +0.2% | 42.664MB | <43.500MB (🟡 -1.9%) | +4.4% |
| modulo_noaspect | 3.648µs | <10.000µs (-63.5%) | -0.9% | 42.605MB | <43.500MB (-2.1%) | +4.7% |
| replace_aspect | 211.982µs | <300.000µs (-29.3%) | +0.1% | 42.723MB | <44.000MB (-2.9%) | +4.8% |
| replace_noaspect | 2.908µs | <10.000µs (-70.9%) | ~same | 42.585MB | <43.500MB (-2.1%) | +4.7% |
| repr_aspect | 1.427µs | <10.000µs (-85.7%) | ~same | 42.585MB | <43.500MB (-2.1%) | +4.8% |
| repr_noaspect | 0.530µs | <10.000µs (-94.7%) | +0.9% | 42.546MB | <43.500MB (-2.2%) | +4.8% |
| rstrip_aspect | 19.031µs | <30.000µs (-36.6%) | +0.4% | 42.664MB | <43.500MB (🟡 -1.9%) | +5.0% |
| rstrip_noaspect | 2.059µs | <10.000µs (-79.4%) | +7.2% | 42.605MB | <43.500MB (-2.1%) | +5.0% |
| slice_aspect | 15.903µs | <20.000µs (-20.5%) | ~same | 42.605MB | <43.500MB (-2.1%) | +4.9% |
| slice_noaspect | 0.596µs | <10.000µs (-94.0%) | -0.4% | 42.566MB | <43.500MB (-2.1%) | +4.9% |
| stringio_aspect | 53.849µs | <80.000µs (-32.7%) | -0.5% | 42.546MB | <43.500MB (-2.2%) | +4.9% |
| stringio_noaspect | 3.617µs | <10.000µs (-63.8%) | -0.8% | 42.684MB | <43.500MB (🟡 -1.9%) | +5.2% |
| strip_aspect | 17.659µs | <20.000µs (-11.7%) | -0.3% | 42.507MB | <43.500MB (-2.3%) | +4.4% |
| strip_noaspect | 1.870µs | <10.000µs (-81.3%) | +0.5% | 42.625MB | <43.500MB (-2.0%) | +4.9% |
| swapcase_aspect | 18.453µs | <30.000µs (-38.5%) | -0.2% | 42.585MB | <43.500MB (-2.1%) | +4.8% |
| swapcase_noaspect | 2.795µs | <10.000µs (-72.0%) | +0.7% | 42.546MB | <43.500MB (-2.2%) | +4.8% |
| title_aspect | 18.219µs | <30.000µs (-39.3%) | -0.3% | 42.644MB | <43.000MB (🟡 -0.8%) | +5.0% |
| title_noaspect | 2.644µs | <10.000µs (-73.6%) | -1.0% | 42.546MB | <43.500MB (-2.2%) | +4.9% |
| translate_aspect | 24.357µs | <30.000µs (-18.8%) | 📈 +18.6% | 42.625MB | <43.500MB (-2.0%) | +5.1% |
| translate_noaspect | 4.355µs | <10.000µs (-56.5%) | +0.7% | 42.566MB | <43.500MB (-2.1%) | +5.0% |
| upper_aspect | 17.988µs | <30.000µs (-40.0%) | +1.0% | 42.605MB | <43.500MB (-2.1%) | +4.6% |
| upper_noaspect | 2.425µs | <10.000µs (-75.7%) | -0.4% | 42.644MB | <43.500MB (🟡 -2.0%) | +5.1% |

#### 📈 iastaspectsospath - 24/24

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| ospathbasename_aspect | 5.196µs | <10.000µs (-48.0%) | 📈 +21.1% | 42.546MB | <43.500MB (-2.2%) | +4.8% |
| ospathbasename_noaspect | 4.273µs | <10.000µs (-57.3%) | -0.6% | 42.644MB | <43.500MB (🟡 -2.0%) | +5.1% |
| ospathjoin_aspect | 6.235µs | <10.000µs (-37.6%) | +0.3% | 42.644MB | <43.500MB (🟡 -2.0%) | +5.1% |
| ospathjoin_noaspect | 6.302µs | <10.000µs (-37.0%) | -0.4% | 42.625MB | <43.500MB (-2.0%) | +4.9% |
| ospathnormcase_aspect | 3.576µs | <10.000µs (-64.2%) | ~same | 42.644MB | <43.500MB (🟡 -2.0%) | +5.0% |
| ospathnormcase_noaspect | 3.638µs | <10.000µs (-63.6%) | +1.2% | 42.625MB | <43.500MB (-2.0%) | +5.1% |
| ospathsplit_aspect | 4.926µs | <10.000µs (-50.7%) | +0.1% | 42.566MB | <43.500MB (-2.1%) | +5.0% |
| ospathsplit_noaspect | 5.039µs | <10.000µs (-49.6%) | +0.7% | 42.526MB | <43.500MB (-2.2%) | +4.8% |
| ospathsplitdrive_aspect | 3.765µs | <10.000µs (-62.3%) | +0.8% | 42.644MB | <43.500MB (🟡 -2.0%) | +4.9% |
| ospathsplitdrive_noaspect | 0.752µs | <10.000µs (-92.5%) | +0.2% | 42.566MB | <43.500MB (-2.1%) | +4.8% |
| ospathsplitext_aspect | 4.605µs | <10.000µs (-53.9%) | -0.4% | 42.526MB | <43.500MB (-2.2%) | +4.9% |
| ospathsplitext_noaspect | 4.617µs | <10.000µs (-53.8%) | -1.1% | 42.487MB | <43.500MB (-2.3%) | +4.8% |

#### 📈 telemetryaddmetric - 30/30

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| 1-count-metric-1-times | 3.408µs | <20.000µs (-83.0%) | 📈 +13.2% | 34.898MB | <35.500MB (🟡 -1.7%) | +4.9% |
| 1-count-metrics-100-times | 199.928µs | <220.000µs (-9.1%) | ~same | 34.937MB | <35.500MB (🟡 -1.6%) | +5.0% |
| 1-distribution-metric-1-times | 3.347µs | <20.000µs (-83.3%) | -0.8% | 34.859MB | <35.500MB (🟡 -1.8%) | +4.9% |
| 1-distribution-metrics-100-times | 214.751µs | <230.000µs (-6.6%) | +0.7% | 34.878MB | <35.500MB (🟡 -1.8%) | +4.7% |
| 1-gauge-metric-1-times | 2.211µs | <20.000µs (-88.9%) | -0.7% | 34.800MB | <35.500MB (🟡 -2.0%) | +4.7% |
| 1-gauge-metrics-100-times | 137.109µs | <150.000µs (-8.6%) | +0.6% | 34.819MB | <35.500MB (🟡 -1.9%) | +4.6% |
| 1-rate-metric-1-times | 3.175µs | <20.000µs (-84.1%) | +0.1% | 34.878MB | <35.500MB (🟡 -1.8%) | +5.0% |
| 1-rate-metrics-100-times | 214.115µs | <250.000µs (-14.4%) | ~same | 34.819MB | <35.500MB (🟡 -1.9%) | +4.6% |
| 100-count-metrics-100-times | 20.051ms | <22.000ms (-8.9%) | +1.0% | 34.859MB | <35.500MB (🟡 -1.8%) | +4.9% |
| 100-distribution-metrics-100-times | 2.232ms | <2.550ms (-12.5%) | -0.7% | 34.918MB | <35.500MB (🟡 -1.6%) | +5.0% |
| 100-gauge-metrics-100-times | 1.421ms | <1.550ms (-8.3%) | +0.3% | 34.780MB | <35.500MB (-2.0%) | +4.7% |
| 100-rate-metrics-100-times | 2.203ms | <2.550ms (-13.6%) | +1.5% | 34.760MB | <35.500MB (-2.1%) | +4.4% |
| flush-1-metric | 4.593µs | <20.000µs (-77.0%) | -0.5% | 34.957MB | <35.500MB (🟡 -1.5%) | +4.1% |
| flush-100-metrics | 174.683µs | <250.000µs (-30.1%) | +0.4% | 35.095MB | <35.500MB (🟡 -1.1%) | +4.1% |
| flush-1000-metrics | 2.161ms | <2.500ms (-13.6%) | ~same | 36.038MB | <36.500MB (🟡 -1.3%) | +4.7% |

### 🟡 Near SLO Breach (16 suites)

#### 🟡 coreapiscenario - 10/10 (1 unstable)
## Description

https://datadoghq.atlassian.net/browse/PROF-13364

This PR makes the `python_stack` variable non-global (and non-static) so that its lifetime is easier to understand and reason about. The change is trivial because `python_stack` is already used exactly as a local variable would be.

Since it is used by many parts of the thread unwinding logic, I have made it a field of `ThreadInfo`, so that every `ThreadInfo` method can use it directly. Functions outside `ThreadInfo` receive it as a reference argument.

_This is part of a broader effort to get rid of statics and globals in the Python Profiler._

- #15801
- #15806
- #15805

## Performance

Running the Profiler with these changes under the Full Host Profiler shows no noticeable change in CPU time. A marginal change would have been plausible, since the semantics and lifetime of `FrameStack` objects have changed.
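To make the ownership change described above concrete, here is a minimal sketch, not the profiler's actual code. `Frame`, the `FrameStack` alias, `unwind_frames`, and `ThreadInfo::unwind` are simplified, hypothetical stand-ins; only the names `ThreadInfo`, `FrameStack`, and `python_stack` come from the description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a single Python frame.
struct Frame {
    std::string name;
};

// Hypothetical stand-in for the profiler's FrameStack type.
using FrameStack = std::vector<Frame>;

// A helper living outside ThreadInfo: it receives the stack as a
// reference argument instead of reaching for a global.
void unwind_frames(const std::vector<std::string>& raw_frames, FrameStack& stack)
{
    stack.clear();
    for (const auto& name : raw_frames) {
        stack.push_back(Frame{ name });
    }
}

struct ThreadInfo {
    // Previously a global/static; now a field, so its lifetime is
    // tied to the thread it describes.
    FrameStack python_stack;

    // Methods of ThreadInfo use the field directly and hand it to
    // outside helpers by reference. Returns the number of frames captured.
    std::size_t unwind(const std::vector<std::string>& raw_frames)
    {
        unwind_frames(raw_frames, python_stack);
        return python_stack.size();
    }
};
```

With this layout, two `ThreadInfo` instances never share a stack, so unwinding one thread cannot clobber the frames captured for another.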
ed8e918 into kowalski/refactor-profiling-remove-some-globals
## Description

This PR removes several global variables that are no longer used (leftovers from when we moved echion into dd-trace-py) and replaces the global `runtime` variable (which was set to `&_PyRuntime`) with a local one.

_This is part of a broader effort to get rid of statics and globals in the Python Profiler._

- #15801
- #15805
- #15806
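The `runtime` change can be pictured with the following sketch. It does not use CPython's real `_PyRuntime`; `FakeRuntime`, `fake_runtime`, and `count_interpreters` are hypothetical names chosen so the example compiles on its own.

```cpp
// Hypothetical stand-in for the interpreter-wide runtime object
// (the real code points at CPython's _PyRuntime).
struct FakeRuntime {
    int interpreter_count;
};

FakeRuntime fake_runtime{ 1 };

// Before (sketch): a file-level pointer that any function could read or reassign.
//   static FakeRuntime* runtime = &fake_runtime;

// After (sketch): the pointer is a local, created where it is needed,
// so its scope and lifetime are visible at the point of use.
int count_interpreters()
{
    FakeRuntime* runtime = &fake_runtime;
    return runtime->interpreter_count;
}
```

The behaviour is the same in both versions; the local simply removes one piece of shared mutable state.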
## Description

This is part of a broader effort to get rid of statics and globals in the Python Profiler.

- `python_stack` local and non static #15801
- `current_tasks` and `current_greenlets` non global #15806