fix(tracing): make spans resilient to performance clock drift #3434

dyladan · 2022-11-21T18:14:02Z

This is a simpler alternative to #3384

Described by the comment here: #3279 (comment)

codecov · 2022-11-21T18:35:06Z

Codecov Report

Merging #3434 (830c799) into main (2dcc898) will increase coverage by 0.02%.
The diff coverage is 96.96%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3434      +/-   ##
==========================================
+ Coverage   93.78%   93.80%   +0.02%     
==========================================
  Files         249      249              
  Lines        7621     7637      +16     
  Branches     1588     1589       +1     
==========================================
+ Hits         7147     7164      +17     
+ Misses        474      473       -1

Impacted Files	Coverage Δ
packages/opentelemetry-sdk-trace-base/src/Span.ts	`98.58% <95.65%> (-1.42%)`	⬇️
...s/opentelemetry-instrumentation-fetch/src/fetch.ts	`97.02% <100.00%> (+0.01%)`	⬆️
...emetry-instrumentation-xml-http-request/src/xhr.ts	`97.59% <100.00%> (+0.01%)`	⬆️
packages/opentelemetry-core/src/common/time.ts	`98.55% <100.00%> (+2.96%)`	⬆️
...-trace-base/src/platform/node/RandomIdGenerator.ts	`93.75% <0.00%> (+6.25%)`	⬆️

packages/opentelemetry-sdk-trace-base/src/Span.ts

t2t2 · 2022-11-22T18:35:45Z

+      return epochMillisToHrTime(inp.valueOf());
+    }
+
+    if (isTimeInputHrTime(inp)) {


This version seems to be susceptible to drift in fetch instrumentation (undefined start + hrTime end):

opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts

Line 281 in 3290b25

const endTime = core.hrTime();

Because hrTime() computes getTimeOrigin() + performance.now(), it would generate a drifted HrTime.

This is because an HrTime-formatted timestamp is never corrected. Currently, the start time also uses the performance timing API indirectly: no start time is provided, so the performance clock is used. This means that in the current state the whole span is shifted anyway.
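To make the drift concrete, here is a minimal sketch of how a timestamp built from the time origin plus a performance-clock reading can fall behind the wall clock. The function names and sample numbers are illustrative assumptions, not the library's API, and the clock readings are passed in as plain numbers so the arithmetic is deterministic.

```typescript
// HrTime as a [seconds, nanoseconds] pair.
type HrTime = [number, number];

// Convert epoch milliseconds into an HrTime pair (hypothetical helper).
function millisToHrTimeSketch(epochMillis: number): HrTime {
  const seconds = Math.trunc(epochMillis / 1000);
  const nanos = Math.round((epochMillis - seconds * 1000) * 1e6);
  // Carry if rounding pushed the nanosecond part up to a full second.
  return nanos >= 1e9 ? [seconds + 1, nanos - 1e9] : [seconds, nanos];
}

// Performance-based timestamp: time origin plus elapsed performance-clock time.
function performanceBasedHrTime(timeOriginMillis: number, perfNowMillis: number): HrTime {
  return millisToHrTimeSketch(timeOriginMillis + perfNowMillis);
}

// Drift is the gap between the wall clock and the performance-based estimate.
function driftMillis(wallClockMillis: number, timeOriginMillis: number, perfNowMillis: number): number {
  return wallClockMillis - (timeOriginMillis + perfNowMillis);
}

// Assumed scenario: the performance clock lost 5 s relative to the wall clock
// (for example while the machine was asleep).
const sampleTimeOrigin = 1_669_000_000_000;
const samplePerfNowReading = 60_000;
const sampleWallClock = 1_669_000_065_000;

const driftedHrTime = performanceBasedHrTime(sampleTimeOrigin, samplePerfNowReading);
const observedDrift = driftMillis(sampleWallClock, sampleTimeOrigin, samplePerfNowReading);
console.log(driftedHrTime, observedDrift);
```

In this scenario the performance-based timestamp lags the wall clock by the full drift, and nothing downstream can tell that it needs correcting because it is already in HrTime form.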

My recommendation would be to:

1. Change core.hrTime to return an HrTime generated using Date.now
2. Update fetch instrumentation to not pass a time manually

Either of these changes would fix the issue here, but I actually recommend we do both.

An alternative to (2) would be to update fetch instrumentation to get start and end times from the performance timing API.
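The two recommendations can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `wallClockHrTime` and `SketchSpan` are hypothetical names, and the clock is injectable only so the example runs deterministically.

```typescript
// HrTime as a [seconds, nanoseconds] pair.
type HrTime = [number, number];

// Hypothetical wall-clock-based stand-in for a performance-based hrTime().
function wallClockHrTime(nowMillis: () => number = Date.now): HrTime {
  const ms = nowMillis();
  const seconds = Math.trunc(ms / 1000);
  return [seconds, (ms - seconds * 1000) * 1_000_000];
}

// Hypothetical minimal span: the caller passes no times, so start and end
// are both stamped by the span itself from the same clock.
class SketchSpan {
  readonly startTime: HrTime;
  endTime: HrTime | undefined = undefined;
  private readonly clock: () => number;

  constructor(clock: () => number = Date.now) {
    this.clock = clock;
    this.startTime = wallClockHrTime(clock);
  }

  end(): void {
    this.endTime = wallClockHrTime(this.clock);
  }
}

// Fake clock that advances 250 ms between reads.
let fakeNowMillis = 1_669_000_065_000;
const fakeClock = (): number => {
  const current = fakeNowMillis;
  fakeNowMillis += 250;
  return current;
};

const sketchSpan = new SketchSpan(fakeClock);
sketchSpan.end();
console.log(sketchSpan.startTime, sketchSpan.endTime);
```

Because the instrumentation never hands the span a precomputed HrTime, both ends of the span come from one clock and cannot be shifted relative to each other.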

@t2t2 does the updated version address your concern?

fetch and xhr have both been updated. I think this is resolved but I'll wait for @t2t2 to confirm.

t2t2 · 2022-11-22T18:38:18Z

As with the previous PR: a CodeSandbox with the changes, artificial drift, and the most common instrumentations (with the _performanceOffset issue from the first comment already fixed).

legendecas · 2023-01-09T03:02:56Z

The test (Span # should have an entered time for event) is unstable with the change.

dyladan · 2023-01-09T13:11:24Z

> The test (Span # should have an entered time for event) is unstable with the change.

Yeah I see that. I'm trying to work on it now. ~~I think it was related to changing span end to be a Date~~

edit: I see the test relates to events not end. Still looking into it

dyladan · 2023-01-09T16:08:09Z

Looks like the test was providing time 0 to startTime for the span which caused the span start to use Date.now to generate a start time. It then provided 123 to the event time which caused a drift correction. The test was flaky because it depended on the drift correction being millisecond accurate.
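One general way to keep a test like this from depending on millisecond-exact drift correction is to compare timestamps within a small tolerance rather than asserting exact equality. The helper below is a hypothetical sketch of that idea and is not necessarily how the test was fixed in this PR.

```typescript
// HrTime as a [seconds, nanoseconds] pair.
type HrTime = [number, number];

// Convert an HrTime pair to fractional milliseconds (hypothetical helper).
function hrTimeToMillisSketch(time: HrTime): number {
  return time[0] * 1000 + time[1] / 1e6;
}

// True when two timestamps differ by at most toleranceMillis.
function hrTimesAreClose(a: HrTime, b: HrTime, toleranceMillis: number): boolean {
  return Math.abs(hrTimeToMillisSketch(a) - hrTimeToMillisSketch(b)) <= toleranceMillis;
}

const expectedEventTime: HrTime = [1_669_000_000, 123_000_000];
const offByOneMillis: HrTime = [1_669_000_000, 124_000_000];
const offByTenMillis: HrTime = [1_669_000_000, 133_000_000];

const closeEnough = hrTimesAreClose(expectedEventTime, offByOneMillis, 2);
const tooFarApart = hrTimesAreClose(expectedEventTime, offByTenMillis, 2);
console.log(closeEnough, tooFarApart);
```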

dyladan · 2023-01-09T16:37:16Z

@legendecas should be stable now

dyladan · 2023-01-09T19:42:01Z

@t2t2 I updated the web instrumentations to use Date instead of performance.now to get span end times which I think should address your concerns. I would appreciate your review because you've been very helpful. Hopefully we can get this merged soon.

blumamir

The logic LGTM. Added a few non-blocking comments.

There seems to be a lot of logical branching here, and I was wondering whether the test coverage is sufficient and covers all the potential permutations.

Each time can be one of five types: undefined, Date.now(), an epoch number, a performance.now() number, and hrTime. A span receives times via the span constructor, events, and end. I think some of the common combinations, or those that are specifically addressed by the code, would benefit from a dedicated unit test.
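As a rough illustration of the permutations mentioned above, the five kinds of input can be enumerated with a small classifier and a table of cases. The names `TimeKind` and `classifyTimeInput` are hypothetical, and the performance-clock reading is passed in as a parameter (an assumption made here to keep the sketch deterministic).

```typescript
// HrTime as a [seconds, nanoseconds] pair.
type HrTime = [number, number];
type TimeInput = HrTime | number | Date;
type TimeKind = 'undefined' | 'date' | 'epoch-number' | 'performance-number' | 'hrtime';

// Classify a time input; a number below the current performance-clock
// reading is treated as a performance timestamp, otherwise as epoch millis.
function classifyTimeInput(inp: TimeInput | undefined, perfNowMillis: number): TimeKind {
  if (inp === undefined) return 'undefined';
  if (inp instanceof Date) return 'date';
  if (typeof inp === 'number') {
    return inp < perfNowMillis ? 'performance-number' : 'epoch-number';
  }
  return 'hrtime';
}

// Table of sample inputs and the kind each one is expected to map to.
const classifierPerfNow = 60_000;
const timeInputCases: Array<[TimeInput | undefined, TimeKind]> = [
  [undefined, 'undefined'],
  [new Date(1_669_000_000_000), 'date'],
  [1_669_000_000_000, 'epoch-number'],
  [1234.5, 'performance-number'],
  [[1_669_000_000, 0], 'hrtime'],
];

const classifiedKinds = timeInputCases.map(([inp]) => classifyTimeInput(inp, classifierPerfNow));
console.log(classifiedKinds);
```

A table like this could be crossed with the three entry points (constructor, event, end) to drive parameterized unit tests.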

packages/opentelemetry-sdk-trace-base/src/Span.ts

blumamir · 2023-01-10T08:25:56Z

@@ -203,15 +227,41 @@ export class Span implements api.Span, ReadableSpan {
    this._spanProcessor.onEnd(this);
  }

+  private _getTime(inp?: TimeInput): HrTime {
+    if (typeof inp === 'number' && inp < otperformance.now()) {


In core we have similar logic that uses the time origin for this check, not now:

// Must be a performance.now() if it's smaller than process start time.
if (time < getTimeOrigin()) {
  return hrTime(time);

I believe they both work fine, but it is better to be consistent within the project. Maybe even extract this logic into a function isTimeInputPerformanceNow(time: TimeInput), similar to the existing function isTimeInputHrTime. That would also be self-documenting and would make the comment below redundant.

dyladan · 2023-01-10

This change is actually very important to the functioning of the PR. I think the function in core should be deprecated.

The reason to use performance.now is that we can be much more sure that a given number is a performance timestamp. A number before timeOrigin may have been ingested from historical logs, or may come from an inconsistent time source that has since been corrected, in which case timeOrigin is wrong.

blumamir · 2023-01-10

That makes sense, thanks for the response.
My point is that we are doing the same computation in two different places in two different ways. This comment is a suggestion to unify it (and maybe extract the logic into a function while we are at it). It could be done in this PR, in a later one, or not at all.
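A minimal sketch of the suggested helper, next to the origin-based variant, shows where the two checks disagree. Both functions are hypothetical illustrations rather than the project's code, and the clock values are passed in as parameters (an assumption made so the comparison is deterministic).

```typescript
// HrTime as a [seconds, nanoseconds] pair.
type HrTime = [number, number];
type TimeInput = HrTime | number | Date;

// A number counts as a performance timestamp only if it is smaller than
// the current performance-clock reading.
function isTimeInputPerformanceNow(time: TimeInput | undefined, perfNowMillis: number): time is number {
  return typeof time === 'number' && time < perfNowMillis;
}

// Origin-based variant of the same check, for comparison.
function isBeforeTimeOrigin(time: TimeInput | undefined, timeOriginMillis: number): time is number {
  return typeof time === 'number' && time < timeOriginMillis;
}

const sampleOrigin2022 = 1_669_000_000_000;    // process started in Nov 2022
const samplePerfReading = 60_000;              // one minute after start
const historicalEpoch2020 = 1_577_836_800_000; // 2020-01-01T00:00:00Z, e.g. from old logs
const genuinePerfTimestamp = 1234.5;

// The historical epoch value is misread as a performance timestamp by the
// origin-based check, but not by the now-based check.
const nowBasedOnHistorical = isTimeInputPerformanceNow(historicalEpoch2020, samplePerfReading);
const originBasedOnHistorical = isBeforeTimeOrigin(historicalEpoch2020, sampleOrigin2022);
// Both checks agree on a genuine performance reading.
const nowBasedOnPerf = isTimeInputPerformanceNow(genuinePerfTimestamp, samplePerfReading);
const originBasedOnPerf = isBeforeTimeOrigin(genuinePerfTimestamp, sampleOrigin2022);
console.log(nowBasedOnHistorical, originBasedOnHistorical, nowBasedOnPerf, originBasedOnPerf);
```

Extracting the check into one function would also give both call sites the same behavior for inputs like the historical timestamp here.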

packages/opentelemetry-core/src/common/time.ts

dyladan requested a review from a team November 21, 2022 18:14

dyladan force-pushed the simple-fix-timestamps branch 2 times, most recently from abef3d2 to 5d8dd46 Compare November 21, 2022 18:18

fix(tracing): make spans resilient to performance clock drift

7083983

dyladan force-pushed the simple-fix-timestamps branch from 5d8dd46 to 7083983 Compare November 21, 2022 18:25

dyladan added 2 commits November 21, 2022 13:25

Fix changelog

5794fbc

Do not export getTimeOrigin

7890388

dyladan added 5 commits November 22, 2022 10:50

Apply shift to shim spans

a764b2b

Lint

e991a6c

Lint

4405fc0

Merge branch 'main' into simple-fix-timestamps

ee10895

Merge branch 'main' into simple-fix-timestamps

27d72df

t2t2 reviewed Nov 22, 2022

packages/opentelemetry-sdk-trace-base/src/Span.ts (outdated)

t2t2 reviewed Nov 22, 2022

dyladan added 3 commits November 22, 2022 14:38

Remove unused imports

44d9dc5

Merge branch 'simple-fix-timestamps' of github.com:dyladan/opentelemetry-js into simple-fix-timestamps

2a5fc2f

Fix drift calculation

0149f56

t2t2 mentioned this pull request Dec 6, 2022

Integrate otel's performance clock drift fix signalfx/splunk-otel-js-web#498

Merged

dyladan added 2 commits December 14, 2022 16:56

Merge remote-tracking branch 'origin/main' into simple-fix-timestamps

7c540fe

Remove bad import

7626ca2

JacksonWeber approved these changes Dec 15, 2022

MisterSquishy mentioned this pull request Dec 28, 2022

use Date.now() for instrument recording timestamps #3514

Merged

1 task

Merge remote-tracking branch 'origin/main' into simple-fix-timestamps

26a27f0

SimenB mentioned this pull request Jan 2, 2023

chore: release API 1.4.0 / SDK 1.9.0 / 0.35.0 #3516

Merged

legendecas mentioned this pull request Jan 3, 2023

fix(instrumentation): fix web instrumentation span clock drifts #3518

Closed

5 tasks

dyladan added 3 commits January 3, 2023 11:39

Merge remote-tracking branch 'origin/main' into simple-fix-timestamps

488b5da

Use Date.now for fetch span end

9cc92bf

Fetch changelog

8bddc96

dyladan added 5 commits January 6, 2023 11:16

Merge branch 'main' into simple-fix-timestamps

91cfd37

Test addHrTimes

63f3b2c

lint

e66b75f

lint

b848140

lint

f4c1be0

Fix flaky test

8c48a34

dyladan added 2 commits January 9, 2023 11:15

lint

6790d17

Use perf timer for perf API

3b61190

legendecas approved these changes Jan 9, 2023

dyladan added 3 commits January 9, 2023 14:17

Apply date fix to xhr

5614bcb

Changelog

f6d88aa

lint

e763118

pichlermarc approved these changes Jan 10, 2023

t2t2 approved these changes Jan 10, 2023

blumamir approved these changes Jan 10, 2023

dyladan added 3 commits January 10, 2023 08:21

Revert to previous addHrTimes impl

03c96c0

Review comments

fd2398d

Fix flaky test

830c799

dyladan merged commit 969bb62 into open-telemetry:main Jan 11, 2023

dyladan deleted the simple-fix-timestamps branch January 11, 2023 20:37

This was referenced Jan 24, 2023

fix(tracing): make spans resilient to performance clock drift #3384

Closed

Provide a supported way to get anchored clock times #3279

Closed

pichlermarc mentioned this pull request Mar 14, 2023

Logs SDK #3549

Merged

4 tasks

Flarna mentioned this pull request Apr 11, 2023

Fetch Instrumentation Start and End Times Can Be Off By Up to a Few Milliseconds #3719

Closed

dyladan mentioned this pull request May 17, 2023

feat(api-logs): add ObservedTimestamp to LogRecord #3787

Merged

4 tasks
