Description
Edit: duplicate of #961.
Requirement - what kind of business use case are you trying to solve?
The current behavior of clock skew adjustments is surprising and, while well-meant, not beneficial.
Problem - what in Jaeger blocks you from solving the requirement?
Excuse the pun, I'm aware of the existing issues.
The clock skew adjuster currently makes the following assumptions:
1. Spans come from different hosts, and parent and child spans have roughly the same duration.
2. Clock skew is something unknown to people who use monitoring.
3. It is best to hide clock skew and make things look “nice”, instead of relying on administrators to fix it and offering a best-guess workaround as an alternative.
Considering 1.:
The algorithm uses the poorly documented `ip` tag of the process to distinguish between hosts and assumes a missing tag indicates a different host.
People using OpenTracing or OpenCensus usually only know about the service name, so a missing `ip` tag is expected, meaning the algorithm compensates nearly all spans. See:
- Clock skew adjuster and spans from web/mobile #722
- Make clock skew adjustment transparent #961
- Adjuster fixes. #606
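To illustrate why a missing `ip` tag makes nearly every span a candidate for adjustment, here is a minimal sketch of the host check as I understand it from the description above. It is not Jaeger's actual code; the function names and the dict-based tag representation are hypothetical.

```python
from typing import Dict, Optional


def same_host(parent_tags: Dict[str, str], child_tags: Dict[str, str]) -> bool:
    """Return True only when both processes carry an identical ip tag."""
    parent_ip: Optional[str] = parent_tags.get("ip")
    child_ip: Optional[str] = child_tags.get("ip")
    # Assumption under discussion: a missing tag on either side is
    # treated as "different host", not as "unknown".
    if parent_ip is None or child_ip is None:
        return False
    return parent_ip == child_ip


def is_adjustment_candidate(parent_tags: Dict[str, str], child_tags: Dict[str, str]) -> bool:
    """Spans that are not provably on the same host are considered for adjustment."""
    return not same_host(parent_tags, child_tags)
```

With clients that only set a service name, `is_adjustment_candidate({"service": "a"}, {"service": "b"})` is true for every parent/child pair, even when both spans come from the same machine.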
Then it compares durations, which only makes sense if one assumes that spans have roughly the same duration.
Most child spans are shorter than their parent, so this is rarely the case. If a child span is a few nanoseconds longer, I don't care; if it is significantly longer, it should be flagged, but not shifted. It's wrong data, and it is really hard to diagnose once it has been processed.
Then the algorithm checks if the end time of the child span is after the end time of the parent. While technically wrong, it is pretty easy to close the child span a couple of nanoseconds too late - probably a non-issue considering tracing, and at most something that should be a warning instead of moving the span around.
Lastly it assumes that span duration somehow correlates to network latency.
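A rough sketch of what "flag, don't shift" could look like for the duration and end-time checks described above. This is only an illustration under my own assumptions: the `Span` type, the `flag_inconsistencies` function, and the `tolerance_ns` parameter are hypothetical and not part of Jaeger.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    start_ns: int
    duration_ns: int
    warnings: List[str] = field(default_factory=list)

    @property
    def end_ns(self) -> int:
        return self.start_ns + self.duration_ns


def flag_inconsistencies(parent: Span, child: Span, tolerance_ns: int = 0) -> List[str]:
    """Attach warnings to the child span; never modify its timestamps."""
    found: List[str] = []
    # A child that outlasts its parent by more than the tolerance is suspicious.
    if child.duration_ns > parent.duration_ns + tolerance_ns:
        found.append("child span is longer than its parent")
    # Closing the child slightly late is tolerated; anything beyond is flagged.
    if child.end_ns > parent.end_ns + tolerance_ns:
        found.append("child span ends after its parent")
    child.warnings.extend(found)
    return found
```

The point of the design is that the recorded start time and duration stay exactly as reported, so whoever debugs the trace sees the raw data plus a warning instead of silently shifted spans.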
Proposal - what do you suggest to solve the problem or improve the existing situation?
If clock skew is an issue, the span context should try to compensate.
Considering 2. and 3., it would IMHO be best to not mess with the data by default and to offer beautification as an opt-in workaround. This differs from the current solution, which turns the adjustment on by default and leaves users searching for a way to get rid of it.
Also, it might be useful to see clock skew by default, because it may affect other areas too.
Any open questions to address
I saw very few comments where this skew adjustment is actually wanted. Maybe some of the supporters can chime in?