Skip to content

add automatic heap snapshots to help debug memory issues #5920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rochdev
Copy link
Member

@rochdev rochdev commented Jun 20, 2025

What does this PR do?

Add automatic heap snapshots to help debug memory issues.

Motivation

Sometimes heap dumps are needed to debug memory issues. This can be annoying for users as they have to manually add code to generate the heap dumps, and sometimes they are not generated correctly (wrong interval, only one sample, without garbage collection, etc). This change adds the ability to generate heap snapshots directly from dd-trace by simply passing an environment variable, making the whole process a lot easier.

Additional Notes

@BridgeAR I know you have been working on logic to write the heap snapshot manually so that we can also send it elsewhere, let me know if you want me to add it to this PR, otherwise I think it could be a future improvement.

I thought about using telemetry for this but I don't like the security implications of sending a full memory dump directly to us.

Copy link

github-actions bot commented Jun 20, 2025

Overall package size

Self size: 9.72 MB
Deduped: 106.23 MB
No deduping: 106.75 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB | | @datadog/native-appsec | 8.5.2 | 19.33 MB | 19.34 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @datadog/pprof | 5.8.2 | 9.56 MB | 9.93 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.3 | 2.95 MB | 5.6 MB | | @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.14.0 | 120.58 kB | 841.68 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.2 | 53.63 kB | 53.63 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | dc-polyfill | 0.1.9 | 25.11 kB | 25.11 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.2 | 23.54 kB | 23.54 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Copy link

codecov bot commented Jun 20, 2025

Codecov Report

Attention: Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.78%. Comparing base (42ce94b) to head (7a98495).

Files with missing lines Patch % Lines
packages/dd-trace/src/heap_snapshots.js 96.15% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5920      +/-   ##
==========================================
+ Coverage   80.75%   80.78%   +0.02%     
==========================================
  Files         462      463       +1     
  Lines       19923    19956      +33     
==========================================
+ Hits        16089    16121      +32     
- Misses       3834     3835       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rochdev rochdev marked this pull request as ready for review June 20, 2025 17:13
@rochdev rochdev requested a review from a team as a code owner June 20, 2025 17:13
@pr-commenter
Copy link

pr-commenter bot commented Jun 20, 2025

Benchmarks

Benchmark execution time: 2025-06-20 21:17:11

Comparing candidate commit 7a98495 in PR branch auto-heap-snapshots with baseline commit 42ce94b in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1272 metrics, 51 unstable metrics.

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jun 20, 2025

Datadog Report

Branch report: auto-heap-snapshots
Commit report: f1f1e6e
Test service: dd-trace-js-integration-tests

✅ 0 Failed, 1251 Passed, 0 Skipped, 18m 52.43s Total Time

Copy link
Collaborator

@watson watson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I don't like how you're using dynamic require in packages/dd-trace/src/heap_snapshots.js instead of just sticking them all at the top of the file. None of the ones referenced run any code upon being required, so they should be safe to just stick at the top of the file.

Comment on lines +50 to +60
if (config.heapSnapshot.count > 0 && globalThis.gc) {
const log = require('./log')
const folder = config.heapSnapshot.folder

try {
await scheduleSnapshot(config, 1)
log.debug('Wrote heap snapshots to %s.', folder)
} catch (e) {
log.error('Failed to write heap snapshots to %s.', folder, e)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Personally, I like to reverse the if-statement and return instead of indenting everything in one large block, but feel free to ignore

Suggested change
if (config.heapSnapshot.count > 0 && globalThis.gc) {
const log = require('./log')
const folder = config.heapSnapshot.folder
try {
await scheduleSnapshot(config, 1)
log.debug('Wrote heap snapshots to %s.', folder)
} catch (e) {
log.error('Failed to write heap snapshots to %s.', folder, e)
}
}
if (config.heapSnapshot.count === 0 || !globalThis.gc) return
const log = require('./log')
const folder = config.heapSnapshot.folder
try {
await scheduleSnapshot(config, 1)
log.debug('Wrote heap snapshots to %s.', folder)
} catch (e) {
log.error('Failed to write heap snapshots to %s.', folder, e)
}


globalThis.gc()
await setImmediate()
globalThis.gc() // Run full GC a second time for anything missed in first GC.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a know issue, I prefer if we reference a bug report/documentation/blog post specifying why this is required

Comment on lines +33 to +43
const filename = format(
'Heap-%s%s%s-%s%s%s-%s-%s.heapsnapshot',
date.getFullYear(),
pad(date.getMonth()),
pad(date.getDate()),
pad(date.getHours()),
pad(date.getMinutes()),
pad(date.getSeconds()),
process.pid,
threadId
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate that we have to replicate this from V8 just because we want to specify the folder. It would be nice if specifying a folder without having to specify the filename was supported by V8

const { join } = require('path')
const { start } = require('../src/heap_snapshots')

const folder = mkdtempSync(join(tmpdir(), 'dd-trace-heap-snapshot-'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it matters, but I assume the trailing - was a mistake?

Suggested change
const folder = mkdtempSync(join(tmpdir(), 'dd-trace-heap-snapshot-'))
const folder = mkdtempSync(join(tmpdir(), 'dd-trace-heap-snapshot'))


clearInterval(interval)

const pattern = /^Heap-\d{8}-\d{6}-\d+-\d+\.heapsnapshot$/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but just a slight improvement to the validation now that you are generating the filename manually (where threadId is imported at the top of the spec file). This would also catch cases where somehow files are reused between runs for some reason

Suggested change
const pattern = /^Heap-\d{8}-\d{6}-\d+-\d+\.heapsnapshot$/
const pattern = new RegExp(`^Heap-\\d{8}-\\d{6}-${process.pid}-${threadId}\\.heapsnapshot$`)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants