fix(sdk-metrics): Add missing catch and handle error in promise of PeriodicExportingMetricReader #5006

Merged

@jj22ee jj22ee commented Sep 21, 2024

Which problem is this PR solving?

PeriodicExportingMetricReader's doExport method throws an Error if the metrics export fails, and that error is expected to be caught by the surrounding try-catch.

The method is called in two places within the same if-else, but its error is only caught and handled in the else branch, not in the if branch. Although the if-else sits inside a try-catch, the doExport call in the if branch runs inside a Promise.then() callback, so its rejection is currently not caught.

This PR ensures that the error from doExport in Location 1 (the call inside Promise.then()) is caught.

Short description of the changes

  • Add a catch() to the promise chain that runs doExport in its then() callback
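
As an illustration of the problem and the fix described above, here is a minimal sketch. It is not the SDK's code: failingExport, waitForAttributes, runWithoutCatch, runWithCatch, and ErrorSink are hypothetical stand-ins. It shows that a try-catch around a promise chain that is not awaited does not see a rejection raised inside then(), while a catch() on the chain does.

```typescript
// All names below are hypothetical stand-ins, not part of @opentelemetry/sdk-metrics.
type ErrorSink = (err: unknown) => void;

// Simulates doExport when the exporter reports a failure: the promise rejects.
async function failingExport(): Promise<void> {
  throw new Error("metrics export failed");
}

// Simulates waiting for async resource attributes to settle.
async function waitForAttributes(): Promise<void> {}

// Shape before the fix: the chain is not awaited, so the try-catch only
// sees synchronous throws. The rejection from failingExport escapes it.
function runWithoutCatch(onError: ErrorSink): Promise<void> {
  try {
    return waitForAttributes().then(failingExport);
  } catch (err) {
    onError(err);
    return Promise.resolve();
  }
}

// Shape after the fix: catch() on the chain routes the rejection to a handler.
function runWithCatch(onError: ErrorSink): Promise<void> {
  try {
    return waitForAttributes().then(failingExport).catch(onError);
  } catch (err) {
    onError(err);
    return Promise.resolve();
  }
}
```

In this sketch the chains are returned so a caller can observe the outcome. If the chain from runWithoutCatch is left floating instead, recent Node.js versions by default treat the unhandled rejection as a fatal error, which matches the crash reported in the test section of this PR.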

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Tested in my application, which uses PeriodicExportingMetricReader with a small metric export interval and many resource detectors (which causes resourceMetrics.resource.asyncAttributesPending to be true):

export OTEL_METRIC_EXPORT_INTERVAL=50     # 50ms export interval makes it easier to trigger this error, but is unrealistic
export OTEL_NODE_RESOURCE_DETECTORS='process,env,host'
  • Before the fix, the process crashes:
Accessing resource attributes before async attributes settled
/Users/-------/Documents/sample_project/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:76
                throw new Error(`PeriodicExportingMetricReader: metrics export failed (error ${result.error})`);
                      ^

Error: PeriodicExportingMetricReader: metrics export failed (error Error: 14 UNAVAILABLE: No connection established. Last error: connect ECONNREFUSED 127.0.0.1:4318 (2024-09-20T22:43:31.324Z))
    at doExport (/Users/-------/Documents/sample_project/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:76:23)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at process.processImmediate (node:internal/timers:449:9)
    at process.callbackTrampoline (node:internal/async_hooks:130:17)
  • After the fix, the error is logged:
Accessing resource attributes before async attributes settled
{"stack":"Error: PeriodicExportingMetricReader: metrics export failed (error Error: 14 UNAVAILABLE: No connection established. Last error: connect ECONNREFUSED 127.0.0.1:4318 (2024-09-21T00:02:04.200Z))\n    at doExport (/Users/-------/Documents/sample_project/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:76:23)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at runNextTicks (node:internal/process/task_queues:64:3)\n    at process.processImmediate (node:internal/timers:449:9)\n    at process.callbackTrampoline (node:internal/async_hooks:130:17)\n    at async PeriodicExportingMetricReader._doRun (/Users/-------/Documents/sample_project/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:84:13)\n    at async PeriodicExportingMetricReader._runOnce (/Users/-------/Documents/sample_project/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:55:13)","message":"PeriodicExportingMetricReader: metrics export failed (error Error: 14 UNAVAILABLE: No connection established. Last error: connect ECONNREFUSED 127.0.0.1:4318 (2024-09-21T00:02:04.200Z))","name":"Error"}

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

@jj22ee jj22ee requested a review from a team as a code owner September 21, 2024 00:16

@pichlermarc pichlermarc left a comment

Ah, I see:

if the attributes are pending AND the promise from waitForAsyncAttributes() resolves, then doExport() is called, and IF that rejects (which happens when the exporter returns failure), then the promise rejection is unhandled.

Good catch 👍

I feel like the whole code-block should've really been:

    // Avoid scheduling a promise to make the behavior more predictable and easier to test
    if (resourceMetrics.resource.asyncAttributesPending) {
      try {
        await resourceMetrics.resource.waitForAsyncAttributes?.();
      } catch (err) {
        diag.debug('Error while resolving async portion of resource: ', err);
        return;
      }
    }

    await doExport();

to ensure that the error reported to globalErrorHandler actually has the correct stacktrace on it.

Please also add a changelog (./CHANGELOG.md) entry. Thanks for fixing this. 🙂
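
For readers following the review, the await-based restructuring suggested above can be sketched in self-contained form. This is an illustration under stated assumptions, not the merged code: ResourceLike, exportOnce, runOnce, and the log and onError callbacks are hypothetical stand-ins for the SDK's resource object, diag.debug, and globalErrorHandler.

```typescript
// Hypothetical, simplified stand-in for the resource object; only the two
// members used in the review snippet are modelled here.
interface ResourceLike {
  asyncAttributesPending?: boolean;
  waitForAsyncAttributes?: () => Promise<void>;
}

// Waits for pending attributes (if any), then exports. Because both steps
// are awaited, a rejection from doExport propagates to the caller.
async function exportOnce(
  resource: ResourceLike,
  doExport: () => Promise<void>,
  log: (msg: string, err: unknown) => void, // stand-in for a debug logger
): Promise<void> {
  if (resource.asyncAttributesPending) {
    try {
      await resource.waitForAsyncAttributes?.();
    } catch (err) {
      log("Error while resolving async portion of resource: ", err);
      return; // skip the export when the attributes could not be resolved
    }
  }
  await doExport();
}

// The caller's single try-catch now covers the export in both branches.
async function runOnce(
  resource: ResourceLike,
  doExport: () => Promise<void>,
  log: (msg: string, err: unknown) => void,
  onError: (err: unknown) => void, // stand-in for a global error handler
): Promise<void> {
  try {
    await exportOnce(resource, doExport, log);
  } catch (err) {
    onError(err);
  }
}
```

With this shape there is one export call site instead of two, so a single error-handling path applies whether or not async attributes were pending.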

@jj22ee jj22ee force-pushed the metric-reader-fix-error-handling branch from ee2ed87 to bd39689 Compare September 23, 2024 17:21
jj22ee commented Sep 23, 2024

ty! Added changelog.

codecov bot commented Sep 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.92%. Comparing base (f8ab559) to head (bd39689).
Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5006      +/-   ##
==========================================
+ Coverage   93.39%   93.92%   +0.53%     
==========================================
  Files          46      310     +264     
  Lines         712     8138    +7426     
  Branches      120     1633    +1513     
==========================================
+ Hits          665     7644    +6979     
- Misses         47      494     +447     
Files with missing lines Coverage Δ
...etrics/src/export/PeriodicExportingMetricReader.ts 92.98% <ø> (ø)

... and 265 files with indirect coverage changes

@pichlermarc pichlermarc added this pull request to the merge queue Sep 24, 2024
Merged via the queue into open-telemetry:main with commit 5627d84 Sep 24, 2024
21 checks passed