add conventions for log correlation #114
@@ -0,0 +1,149 @@
# Conventions for Trace and Resource Association in Logs

Create standards for correlating traces and resources in existing text logs.

## Motivation

Traces and logs present two separate perspectives on what was occurring when an
operation was executed. To tie the two perspectives together, certain
information needs to be added to logs that are generated within an open span.
This document lays out the information that needs to be emitted in order to
correlate the two data types, and how the data can be added to existing text log
formats so that it can be recognized when the logs are parsed.

## Explanation

Two types of correlation need to happen to tie a log record into its full
execution context. The first is Request Correlation, which ties the record to
operations that were occurring when the record was created. The second is
Resource Correlation, which associates the entry with where the event occurred,
such as a host, a pod, or a virtual machine.

### Request Correlation

Request correlation is achieved primarily with two values, a trace identifier
and a span identifier. A trace may contain multiple spans, arranged as a tree,
and may also contain links to other related spans. The combination of a trace
identifier and a span identifier corresponds to a specific scope of work. In the
tracing API, that scope can also contain attributes and events that describe
what work the program being traced was currently performing.

In most cases when traces and logs are used in tandem, the attributes of the
current span do not need to be added to the log entry, because doing so would
duplicate transmission of those values. As one span is likely associated with
several (or many) log entries, it is more efficient to transmit span attributes
once with the span rather than many times with each log entry. As a result, for
most purposes the goal is to set the three values of traceid, spanid, and
traceflags in each log entry that should be associated with that span.

Request correlation fields only make sense when the log event occurred within
an open span, so the fields in the table below are only required when the log
event is to be correlated.

| Field      | Required | Format                                              |
| :--------- | :------- | :-------------------------------------------------- |
| traceid    | Yes      | 16-byte numeric value as Base16-encoded string      |
| spanid     | Yes      | 8-byte value represented as Base16-encoded string   |
| traceflags | No       | 8-bit numeric value as a Base16-encoded string      |

Review discussion on the `traceid` row:

At least in Python, the log format requires all fields to be present on the log record object or it throws an exception. Also, the log format cannot be changed dynamically depending on whether an active span (or fields on log record) is present or not. So in this case we can either leave the fields empty so that the formatted logs look like

@owais what would you prefer? Also, it seems like a fairly specific use case, could it be left to whoever configures the logger on how they want to represent the lack of trace context?

I have a slight preference for

It could but then backends will have to deal with multiple ways of representing nil values (0 vs '' vs null in JSON, etc). It would be nicer if the spec recommended one thing when the field cannot be omitted completely so analysis tools can have a standard way of detecting it.

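
The Python constraint raised in the review discussion can be sketched with the standard `logging` module: a filter sets the three fields on every record, so a fixed format string never fails on a missing attribute. This is only an illustration under stated assumptions. `TraceContextFilter`, `build_logger`, and the `get_context` callable are hypothetical names, and using empty strings for "no trace context" is one of the options discussed, not a settled convention.

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Set trace fields on every record so the format string never fails.

    `get_context` is a hypothetical callable returning a
    (traceid, spanid, traceflags) tuple of Base16 strings, or None when
    no span is open.
    """

    def __init__(self, get_context):
        super().__init__()
        self._get_context = get_context

    def filter(self, record):
        context = self._get_context()
        # Assumption: empty strings stand in for "no trace context".
        record.traceid, record.spanid, record.traceflags = context or ("", "", "")
        return True  # never drop the record, only annotate it


def build_logger(stream, get_context):
    """Return a logger that writes correlated key-value lines to `stream`."""
    logger = logging.getLogger("correlation-sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s %(message)s traceid=%(traceid)s "
        "spanid=%(spanid)s traceflags=%(traceflags)s"
    ))
    handler.addFilter(TraceContextFilter(get_context))
    logger.addHandler(handler)
    return logger
```

With `build_logger(io.StringIO(), lambda: None)`, a call to `info("no span")` produces a line whose three fields are present but empty, which is the case the reviewers are debating how to represent.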
`traceflags` is a numeric field that corresponds to the W3C trace flags
[definition](https://www.w3.org/TR/trace-context/#trace-flags), and the
definition for OpenTelemetry logs should track updates to that definition
as they are made.

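
As a minimal sketch of the field formats in the table, the helper below turns raw identifier bytes into the Base16 strings a text log would carry. The function name `encode_trace_fields` is hypothetical, and emitting lowercase hex is an assumption: the table only says Base16.

```python
def encode_trace_fields(trace_id: bytes, span_id: bytes, trace_flags: int = 0) -> dict:
    """Return traceid, spanid, and traceflags as Base16 strings."""
    if len(trace_id) != 16:
        raise ValueError("trace_id must be exactly 16 bytes")
    if len(span_id) != 8:
        raise ValueError("span_id must be exactly 8 bytes")
    if not 0 <= trace_flags <= 0xFF:
        raise ValueError("trace_flags must fit in 8 bits")
    return {
        "traceid": trace_id.hex(),                 # 32 hex characters
        "spanid": span_id.hex(),                   # 16 hex characters
        "traceflags": format(trace_flags, "02x"),  # 2 hex characters
    }
```

Validating lengths at encode time keeps malformed identifiers out of the log, where they would be harder to detect once the entries are parsed downstream.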
### Resource Correlation

Where a program was executing is just as important as what it was doing.
Resource correlation allows a log entry to be associated with an infrastructure
resource, and in turn with the system and program metrics that describe the
wider program state. The form that the resource takes is also more diverse than
the tracing scope: the resource may be a pod running in a Kubernetes cluster, a
virtual machine running in a cloud, a serverless lambda, or an old-school server
sitting in a data center. An application environment may be an orchestration of
multiple types of resource working together.

From a logging standpoint, a resource is also almost always a constant within an
application process: a container may not have the same identifier on every run,
but it does keep that identifier while it’s running, and that resource
identifier is constant for every log entry created on that resource. As a
result, resource correlation may not happen at the log entry level, so the
resource correlation information may or may not be put in the log entry itself.

Resource information may be managed as part of the log ingestion process; for
example, a Docker logging driver will know which container the logs came from.
Full resource information may also not be available, as when logs are
aggregated by syslog or a similar system that has less context available to it.
As a result, resource correlation information in the log entries themselves
should be considered optional. If included, the resource information should
follow the [semantic conventions for resources](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions).

If the log entry is expressed in key-value pairs, any resource keys should be
prefixed with `resource.`; for example, `resource.service.name="shoppingcart"`.
If the entry is expressed in JSON, the resource key-values should be placed in
an object named "resource" at the top level of the object.

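
The two placements described above could be sketched as follows. The helper names are hypothetical, and quoting every value in the key-value form is an assumption taken from the `resource.service.name` example; values containing quotes or spaces would need escaping that this sketch does not attempt.

```python
import json


def resource_key_values(resource: dict) -> str:
    """Render resource attributes as space-separated `resource.`-prefixed pairs."""
    return " ".join(f'resource.{key}="{value}"' for key, value in resource.items())


def resource_json(entry: dict, resource: dict) -> str:
    """Serialize a log entry with the resource attributes nested under "resource"."""
    return json.dumps({**entry, "resource": resource})
```

For example, `resource_key_values({"service.name": "shoppingcart"})` yields `resource.service.name="shoppingcart"`, while `resource_json` keeps the same attributes in a top-level `"resource"` object.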
### Correlation Context

A Correlation Context is a set of key-value pairs that is shared amongst the
spans of a distributed trace. Like span attributes, the correlation context may
already be carried with span information, so duplicating this information may be
redundant. In certain cases it may be important to associate this context with
log entries. When the context is embedded in a log entry, the key-value
pairs should be placed in a 'ctx' namespace. Where key-value pairs are
supported, embed the correlation key as "correlation.key_name". In JSON or
other formats that allow nested structures, the key-value pairs should be
placed in an object named 'correlation' at the top of the object.

## Internal details

### Examples

#### Key-Value Pairs

```text
2020-05-20 20:13:31 INFO Message logged. resource.hostname=myhost
ctx.user=djones traceid=0000000000000000354af75138b12921 spanid=00000014c902d73a traceflags=01
```

#### JSON

```json
{
  "time": "2020-05-20 20:13:31",
  "msg": "Message logged.",
  "level": "INFO",
  "ctx": {
    "user": "djones"
  },
  "resource": {
    "hostname": "myhost"
  },
  "traceid": "0000000000000000354af75138b12921",
  "spanid": "00000014c902d73a",
  "traceflags": "01"
}
```

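
On the consuming side, recognizing these fields when logs are parsed might look like the sketch below. The function names are hypothetical, and the key-value parser assumes unquoted values without spaces and an entry that has already been joined onto a single line.

```python
import json
import re

# A key=value token: the key is a dotted identifier at the start of a token,
# the value runs to the next whitespace.
_PAIR = re.compile(r"(?<!\S)(?P<key>[A-Za-z_][\w.]*)=(?P<value>\S+)")
_TRACE_FIELDS = ("traceid", "spanid", "traceflags")


def correlation_from_key_values(line: str) -> dict:
    """Extract the request correlation fields from a key-value log line."""
    pairs = {m["key"]: m["value"] for m in _PAIR.finditer(line)}
    return {name: pairs[name] for name in _TRACE_FIELDS if name in pairs}


def correlation_from_json(line: str) -> dict:
    """Extract the request correlation fields from a JSON log entry."""
    entry = json.loads(line)
    return {name: entry[name] for name in _TRACE_FIELDS if name in entry}
```

Both functions return an empty dict for entries logged outside a span, so callers can treat a missing `traceid` as "not correlated" rather than as an error.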
### Custom Format

Custom formats that don’t allow for automatic parsing of key-value pairs can be
used, but they require synchronization between the output format and the
extraction mechanism. These types of extractions may have the advantage of being
less verbose, but they have the disadvantage of requiring a custom extraction
process, and may be more fragile. Since this approach is vendor-dependent, there
is little guidance that can be provided by OpenTelemetry.

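
To illustrate the synchronization cost, here is a sketch of an extractor for one made-up custom layout, `<LEVEL> [<traceid>/<spanid>] <message>`. The layout and the `parse_custom` name are purely hypothetical and are not part of these conventions.

```python
import re

# Hypothetical layout: "<LEVEL> [<traceid>/<spanid>] <message>"
_CUSTOM = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"\[(?P<traceid>[0-9a-f]{32})/(?P<spanid>[0-9a-f]{16})\] "
    r"(?P<msg>.*)$"
)


def parse_custom(line: str):
    """Return the parsed fields, or None when the line does not match the layout."""
    match = _CUSTOM.match(line)
    return match.groupdict() if match else None
```

Any change to the emitting side, such as a different separator or an added field, silently turns every line into a non-match until the pattern is updated, which is the fragility described above.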
## Prior art and alternatives

Elastic Common Schema [has standards](https://www.elastic.co/guide/en/ecs/current/ecs-tracing.html#ecs-tracing)
for adding trace information to JSON log formats, but they do not support the
full OpenTelemetry correlation model.

## Future possibilities

A stronger specification should be created for logs that are generated by
OpenTelemetry instrumentation and adapters that support conversion from and
to OpenTelemetry's internal logging models. As that specification is created,
it should be kept in sync with these conventions.

This refers to base-16 encoded string. However, what if my logging medium is binary and allows proper representation of arbitrary byte sequences? Do I still have to use base-16 strings or can I just emit the bytes? If this recommendation is for a text medium (such as text log files) it may be worth calling out specifically.

I think that the underlying presumption with this document is that if certain fields are set in certain ways, we will be able to recognize them and use them when the log is processed. I don't think that's going to be true one way or another with a binary representation, so I'll add the text-based caveat to the introduction.

The semantics of Required are unclear here. Some logs may be emitted when there is no trace context present, so they cannot contain trace-id. And the whole feature of trace/log correlation is technically optional.

I'll clarify, thanks.