Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the mapping between Google Cloud Logging and OpenTelemetry #3774

Open
alexvanboxel opened this issue Nov 19, 2023 · 13 comments
Open
Assignees
Labels
spec:logs Related to the specification/logs directory triage:deciding:needs-info Not enough information. It is left open to provide the author with time to add more details

Comments

@alexvanboxel
Copy link

What are you trying to achieve?

With this draft collector PR the PubsubReceiver will be able to handle the native translation of Google Cloud LogEntry (this is a feature provided by Google Cloud Logging: see View logs routed to Pub/Sub). We need more details on all the mapping from the LogEntry to the OpenTelemetry fields and attributes.

Additional context.

See this issue on the collector: open-telemetry/opentelemetry-collector-contrib#23184

@alexvanboxel
Copy link
Author

I've opened a draft PR #3775 on the proposed clarifications. It would be great to have some involvement for Google contributors.

@jack-berg
Copy link
Member

This makes sense, but the document modified in #3775 isn't normative. Perhaps this type of language belongs in a new document in the compatibility directory. Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.

@jack-berg jack-berg assigned jsuereth and unassigned reyang Nov 29, 2023
@jsuereth
Copy link
Contributor

Hey @alexvanboxel,

I really like your suggested PR in that it adds the read side of the OLTP<-> Google Cloud Logging Mapping.

We've been having a lot of internal conversations around this mapping and have a (similar) proposed spec. @jack-berg and I discussed where this documentation might belong (see his comment above). I think we have a few options:

  1. We can create a new compatibility section that outlines this mapping.
  2. We could put this document in the collector near the receiver to denote what it does to GCP data.

Which would you prefer?

@alexvanboxel
Copy link
Author

I think a new compatibility section is a great idea. Does this mean that all the sections in the datamodel-appendix would move to the compatibility section?

@alexvanboxel
Copy link
Author

Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.

I don't think I've added anything new. I've used the semcon that was available (except the gcp.*). I want to go further and define the mapping for the HTTP structure, but although HTTP is described as stable, it is not clearly defined for logs yet.

@alexvanboxel
Copy link
Author

@jsuereth I've moved it in the branch to the compatibility section

@alexvanboxel
Copy link
Author

@jack-berg and @jsuereth doesn't it make sense to have a compatibility section in the logs directory itself and that all model-appendix move (one by one) to that directory?

@kamalmarhubi
Copy link

It was silly of me to comment on the already-closed PR, so pasting my comments here.

on sourceLocation:

looking at docs for LogEntrySourceLocation I this ought to be mapped to the equivalent semconvcode.* attributes as follows:

  • sourceLocation.file => code.filepath
  • sourceLocation.line => code.lineno

Less sure of what to do with sourceLocation.function, which is described as

Human-readable name of the function or method being invoked, with optional context such as the class or package name. This information may be used in contexts such as the logs viewer, where a file and line number are less meaningful. The format can vary by language. For example: qual.if.ied.Class.method (Java), dir/package.func (Go), function (Python).

it ought to go into the code.namespace and code.function attributes. if you happen to have a sample of data across different languages for which gcp has a logging integration, perhaps the right thing to do can be figured out based on that? failing that, a reasonable best effort might be (python style split / indexing):

  • code.function gets sourceLocation.function.split(".")[-1]
  • code.namespace gets ".".join(sourceLocation.function.split(".")[:-1])




on httpRequest:

could be worth attempting to map these to http semconv? not sure why I didn't do it initially.

  • requestMethod => http.request.method
  • requestUrl => url.full
  • requestSize => see [0] below
  • status => http.response.status_code
  • responseSize => see [0] below
  • userAgent => user_agent.original
  • remoteIp => client.address
  • serverIp => server.address
  • referer => http.request.header.referer[0]
  • protocol => net.protocol.name (lower case), net.protocol.version

The remaining ones can be mapped to gcp.*:

  • latency
  • cacheLookup
  • cacheHit
  • cacheValidatedWithOriginServer
  • cacheFillBytes

[0] I had previously opened open-telemetry/semantic-conventions#84 to add "whole-request" size attributes, ie headers + body. I will rebase / update the PR this week to see if it gets any traction.

@austinlparker
Copy link
Member

@open-telemetry/collector-contrib-maintainer can this get moved to collector-contrib repo?

@austinlparker austinlparker added sig-issue A specific SIG should look into this before discussing at the spec and removed [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR labels Apr 30, 2024
@alexvanboxel
Copy link
Author

I can add the mapping in the collector README, but I feel the documentation should be either in spec/compatibility or either semconv/compatibiltity.

@mx-psi
Copy link
Member

mx-psi commented May 6, 2024

I guess we can do the same we did on #3964 (review). I still think collector-contrib is not the right place for this long term, but for now it seems reasonable to me

@alexvanboxel
Copy link
Author

I'm actively working on improving the stability of the gcp pubsub log -> otlp log; I will include the documentation in the same commit. But a side note: It only makes sense to add the conventions I use in the receiver; I hope Google (or any other provider) will document their mapping.

@trask
Copy link
Member

trask commented Oct 15, 2024

hey @alexvanboxel, what's the latest status of this? thanks

@trask trask added triage:deciding:needs-info Not enough information. It is left open to provide the author with time to add more details and removed sig-issue A specific SIG should look into this before discussing at the spec labels Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:logs Related to the specification/logs directory triage:deciding:needs-info Not enough information. It is left open to provide the author with time to add more details
Projects
None yet
Development

No branches or pull requests

8 participants