New component: logtospanconnector #23182 (Closed)

kamalmarhubi opened this issue Jun 6, 2023 · 6 comments

kamalmarhubi commented Jun 6, 2023

The purpose and use-cases of the new component

The logtospanconnector would emit spans based on logs. This can be useful if a program doesn't emit spans natively, or if its spans cannot be customised as much as its logs can. For example, many HTTP servers either do not include tracing, or do not offer as much customisation of attributes as they do for access logs. Similarly, database servers can be configured to log statements, and one can "smuggle" trace context via something like sqlcommenter or marginalia-style comments.
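For illustration, a sqlcommenter-style statement carries trace context in a comment appended to the SQL, which then shows up in the server's statement log (the traceparent value below is the example from the W3C Trace Context spec; the exact comment keys depend on how sqlcommenter is configured):

    SELECT id FROM users WHERE email = ? /*traceparent='00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01'*/

A connector like this could then recover the trace and span ids from the logged statement.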

Running the OpenTelemetry collector to process these logs into spans would make it possible to meet those programs where they are, and improve observability overall.

Example configuration for the component

There are a few main things that you need to extract from the log record that aren't part of the logs data model:

  • ParentSpanID
  • StartTime and EndTime (one of these is likely the log record's Timestamp though—probably EndTime)
  • Name

The span's trace context can be assumed to live in the log record's TraceID and SpanID fields.

I've gone back and forth on the configuration style, between:

  • a minimal connector that requires pairing with one or two instances of the transformprocessor (preprocess logs, postprocess spans), and possibly a routing processor to direct logs to the correct logtospan pipeline if the collector is receiving logs from multiple sources
    • this is more orthogonal, but I found it slightly annoying in practice: you almost always need some transform work to compute the span's start or end time from a duration field, and probably more to extract the trace and parent context and to compute a name
  • something like the transformprocessor's list of statements, but with a context that has both a log record and a span on it
    • this is nice because you don't have to add several components, but it duplicates some of what the other processors do
    • it also allows using a cache scratch space for work like computing the end time from a duration field
    • ideally the log record in the context would not allow mutation, so that the connector can honestly say MutatesData: false
      • this is easy to do by failing calls to Set during execution, but ideally configs that attempt to call Set on log record fields would be rejected at startup (a rough sketch of such a check follows this list)
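A rough sketch of that startup check, as a string-level heuristic over the configured statements (hypothetical helper; a real implementation would inspect the parsed OTTL paths rather than raw strings):

    package logtospan

    import (
        "fmt"
        "regexp"
    )

    // setOnLogField matches set(...) statements whose first argument targets
    // a log record field in the shared context.
    var setOnLogField = regexp.MustCompile(`^\s*set\(\s*log\.`)

    // validateStatements rejects configs that attempt to mutate log record
    // fields, so the connector can honestly report MutatesData: false.
    func validateStatements(statements []string) error {
        for _, s := range statements {
            if setOnLogField.MatchString(s) {
                return fmt.Errorf("statement %q writes to a log record field; log records are read-only", s)
            }
        }
        return nil
    }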

Assuming the more featureful version, a sidecar collector for an HTTP server could look something like:

receivers:
  otlp:
exporters:
  otlp:

connectors:
  logtospan:
    statements:
      - set(span.trace_id, log.trace_id)
      - set(span.span_id, log.span_id)
      - set(span.attributes, log.attributes)
      - set(cache["duration_nanos"], span.attributes["latencyMs"] * 1000000)
      - set(span.end_time_unix_nano, log.time_unix_nano)
      - set(span.start_time_unix_nano, span.end_time_unix_nano - cache["duration_nanos"])
      - delete(span.attributes["latencyMs"])
      - set(cache["parent_context"], span.attributes["http_traceparent"])
      - replace_pattern(cache["parent_context"], "^(?:[0-9a-f]{2})-(?:[0-9a-f]{32})-(?P<parent_span_id>[0-9a-f]{16})-(?:[0-9a-f]{2})$$", "$$parent_span_id")
      - set(span.parent_span_id.string, cache["parent_context"])
      - delete(span.attributes["http_traceparent"])

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [logtospan]
    traces:
      receivers: [logtospan]
      exporters: [otlp]
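For reference, the replace_pattern statement assumes http_traceparent holds a W3C traceparent value of the form version-trace_id-parent_id-trace_flags and keeps only the parent span id:

    input:  00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
    output: b7ad6b7169203331

(The doubled $$ escapes $, which collector configs would otherwise treat as the start of an environment variable expansion.)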

Telemetry data types supported

This is a connector that accepts logs and emits traces.

| Exporter Pipeline Type | Receiver Pipeline Type |
| ---------------------- | ---------------------- |
| logs                   | traces                 |

Is this a vendor-specific component?

  • [ ] This is a vendor-specific component
  • [ ] If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

We've been processing Google Cloud Load Balancer logs into spans with a custom pubsub client for about a year. I have been working on a replacement using a custom build of the OpenTelemetry collector with a connector like this, and realised it may be interesting to upstream it.

If this is interesting enough to upstream, I'll need to workshop the config style a bit. As mentioned above, I lean towards the transform-style, which is what I've got implemented so far.

@djaglowski (Member) commented:

This is a cool use case, exactly the kind for which connectors were designed.

It does seem as though the biggest challenge here is bridging the gap between the data models.

> There are a few main things that you need to extract from the log record that aren't part of the logs data model:
>
>   • ParentSpanID
>   • StartTime and EndTime (one of these is likely the log record's Timestamp though—probably EndTime)
>   • Name

Is there any set of sane defaults for these that would allow the connector to work "out of the box", i.e. with no pre-processing necessary? If there is, I assume there would be some limitations on the usefulness of the output, but it would still be nice if users could drop it in and then opt in to more refined capabilities.

> The span's trace context can be assumed to live in the log record's TraceID and SpanID fields.

Since these aren't required fields in the log data model, what would be the behavior if they are empty? Dropping such logs seems reasonable to me in this case but I'm curious if there is any alternative.
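Dropping could be a simple guard in the connector's log-consuming path; a minimal sketch using the pdata types (hypothetical connector internals):

    import "go.opentelemetry.io/collector/pdata/plog"

    // shouldConvert reports whether a log record carries the trace context a
    // span requires; records without it would be skipped, i.e. dropped.
    func shouldConvert(lr plog.LogRecord) bool {
        return !lr.TraceID().IsEmpty() && !lr.SpanID().IsEmpty()
    }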

> a minimal connector that requires pairing with one or two instances of the transformprocessor (preprocess logs, postprocess spans), and possibly a routing processor to direct logs to the correct logtospan pipeline if the collector is receiving logs from multiple sources

If the connector could work "out of the box", even in a limited form, this option might make sense because the user could start with just the connector, and then put in a pre-processor and iterate on it to take advantage of additional mapping capabilities. However, it seems the connector would be extremely limited, and depending on tight coupling with another component is not a great approach.

> something like the transformprocessor's list of statements, but with a context that has both a log record and a span on it

I like where you're going with this, but it may be overly ambitious to extend/recreate the necessary grammar.

There might be a middle road that allows the connector to be more self-contained but also minimizes complexity and overlap with processors. Rather than a generalized set of "statements", could the configuration directly specify how the mapping should work? I think we would still need to borrow OTTL's syntax for referring to fields, and there would be some cases where the user needs to pre-process.

  # keys are span fields. values are log fields
  logtospan:

    # defaults to 'trace_id'
    trace_id: trace_id

    # defaults to 'span_id'
    span_id: span_id 

    # defaults to all attributes. If user lists any attributes, only those are copied
    copy_attributes: []
    #  - attributes["foo"]
    #  - attributes["bar"]
    
    # specify exactly 2 of 'start_time_unix_nano', 'end_time_unix_nano', 'duration'
    # 'start_time_unix_nano' = 'end_time_unix_nano' - duration
    # 'end_time_unix_nano' = 'start_time_unix_nano' + duration
    # start_time_unix_nano:
    end_time_unix_nano: time_unix_nano
    duration:   # could support some basic parsing
      parse_from: attributes["latencyMs"]
      unit: ms

    # anything fancy here probably requires preprocessing, but maybe there's a creative solution?
    parent_span_id.string: attributes["parent_span_id"]

    # maybe a reasonable default?
    name: attributes["event.name"]

    # defaults to empty list. Applied after all other mapping is complete
    drop_attributes: []
    #  - attributes["parent_span_id"]
    #  - attributes["latencyMs"]

  # Same as above, without comments and defaults
  logtospan/min_config:
    end_time_unix_nano: time_unix_nano
    duration:   # could support some basic parsing
      parse_from: attributes["latencyMs"]
      unit: ms
    parent_span_id.string: attributes["parent_span_id"]
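The duration mapping with its unit field implies a small conversion step; a sketch of what that parsing might look like (hypothetical helper, not part of any existing component):

    import "fmt"

    // durationNanos converts a parsed numeric duration into nanoseconds
    // according to the configured unit, e.g. unit "ms" for attributes["latencyMs"].
    func durationNanos(value float64, unit string) (int64, error) {
        switch unit {
        case "ns":
            return int64(value), nil
        case "us":
            return int64(value * 1e3), nil
        case "ms":
            return int64(value * 1e6), nil
        case "s":
            return int64(value * 1e9), nil
        default:
            return 0, fmt.Errorf("unsupported duration unit %q", unit)
        }
    }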

What do you think?

@github-actions bot commented:

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


@kamalmarhubi (Author) commented:

@djaglowski

> Is there any set of sane defaults for these that would allow the connector to work "out of the box", i.e. with no pre-processing necessary?

I don't think there can be, unless we expect this to be used primarily for a root span. Even then we'd ideally want to get the duration or end time from somewhere.

> > something like the transformprocessor's list of statements, but with a context that has both a log record and a span on it
>
> I like where you're going with this, but it may be overly ambitious to extend/recreate the necessary grammar.

I hackishly implemented it this way a while back, though I didn't get around to setting it up in our pipeline. If you're interested, I could rebase, tidy up, and push that branch as a draft PR.

My concern with the middle-road approach is that it recreates a non-trivial amount of the OTTL transform functionality, but in an ad hoc way.

github-actions bot removed the Stale label on Sep 9, 2023
atoulme removed the needs triage label on Oct 4, 2023

github-actions bot commented Dec 4, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


github-actions bot commented Feb 5, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


github-actions bot commented Apr 5, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned on Apr 5, 2024