Add support finding JSON after the line start for otlpjsonfilereceiver #33846

Closed

Member

@zeitlinger zeitlinger commented Jul 2, 2024

Description:

Add support finding JSON after the line start for otlpjsonfilereceiver

Example OTLP/JSON produced by the OTel Java Agent using the logging OTLP JSON exporter (https://opentelemetry.io/docs/languages/java/configuration/#logging-otlp-json-exporter):

[otel.javaagent 2024-07-02 10:11:07:368 +0000] [BatchLogRecordProcessor_WorkerThread-1] INFO io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - {"resource":{"attributes":[{"key":"container.id","value":{"stringValue":"075da3b7317ceccd6df58562684a8092040aacca8b5b0c49eacb33f1d2fe15b9"}},{"key":"deployment.environment","value":{"stringValue":"staging"}},{"key":"host.arch","value":{"stringValue":"amd64"}},{"key":"host.name","value":{"stringValue":"anti-fraud-7b498c4dcb-f5wqj"}},{"key":"os.description","value":{"stringValue":"Linux 6.5.0-41-generic"}},{"key":"os.type","value":{"stringValue":"linux"}},{"key":"process.command_args","value":{"arrayValue":{"values":[{"stringValue":"/opt/java/openjdk/bin/java"},{"stringValue":"-jar"},{"stringValue":"./app.jar"}]}}},{"key":"process.executable.path","value":{"stringValue":"/opt/java/openjdk/bin/java"}},{"key":"process.pid","value":{"intValue":"1"}},{"key":"process.runtime.description","value":{"stringValue":"Eclipse Adoptium OpenJDK 64-Bit Server VM 21.0.3+9-LTS"}},{"key":"process.runtime.name","value":{"stringValue":"OpenJDK Runtime Environment"}},{"key":"process.runtime.version","value":{"stringValue":"21.0.3+9-LTS"}},{"key":"service.instance.id","value":{"stringValue":"7e31966b-5668-4338-913a-5e2601d75e25"}},{"key":"service.name","value":{"stringValue":"anti-fraud"}},{"key":"service.namespace","value":{"stringValue":"shop"}},{"key":"service.version","value":{"stringValue":"1.1"}},{"key":"telemetry.distro.name","value":{"stringValue":"grafana-opentelemetry-java"}},{"key":"telemetry.distro.version","value":{"stringValue":"2.4.0-beta.1"}},{"key":"telemetry.sdk.language","value":{"stringValue":"java"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.38.0"}}]},"scopeLogs":[{"scope":{"name":"com.mycompany.antifraud.FraudDetectionController","attributes":[]},"logRecords":[{"timeUnixNano":"1719915066488000000","observedTimeUnixNano":"1719915066488267425","severityNumber":13,"severityText":"WARN","body":{"stringValue":"checkOrder(totalPrice=300, shippingCountry=, customerIpAddress=127.0.0.1) fraudScore=15, status=REJECTED"},"attributes":[{"key":"thread.id","value":{"intValue":"44"}},{"key":"thread.name","value":{"stringValue":"http-nio-8080-exec-1"}}],"flags":1,"traceId":"336f93f9f72b9fec3e4e01e38cb6a99c","spanId":"de97c85b1ee0669a"}]}]}

Testing:

unit tests

Documentation:

not needed - it just works in more cases, e.g. with OTel Java Agent
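
A minimal sketch of the idea in this description, skipping a log prefix and parsing the first JSON object that follows it. The helper name extract_json is hypothetical, and this is a Python illustration rather than the receiver's actual Go implementation:

```python
import json


def extract_json(line):
    """Return the first JSON object embedded in `line`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = line.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            obj, _end = decoder.raw_decode(line, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next candidate.
            start = line.find("{", start + 1)
    return None


line = (
    "[otel.javaagent 2024-07-02 10:11:07:368 +0000] INFO "
    'SomeExporter - {"resource":{"attributes":[]},"scopeLogs":[]}'
)
payload = extract_json(line)
print(sorted(payload))
```

Scanning for the next opening brace after a failed parse keeps the sketch tolerant of prefixes that themselves contain a stray "{".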

@zeitlinger zeitlinger changed the title Add support for regular expressions and signal disabling for otlpjsonfilereceiver Add support finding JSON after the line start for otlpjsonfilereceiver Jul 3, 2024
@zeitlinger
Member Author

updated the pr so that it doesn't require any config

component: otlpjsonfilereceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add support finding JSON after the line start for otlpjsonfilereceiver
Contributor

That doesn’t seem to be what this change is about

```
Contributor

Please revert this change

@@ -35,60 +36,6 @@ func TestDefaultConfig(t *testing.T) {
require.NoError(t, componenttest.CheckConfigStruct(cfg))
}

func TestFileTracesReceiver(t *testing.T) {
Contributor

Is there a reason you are deleting this test?

Member Author

this test is incorporated into TestFileReceiver as a parameterized test

@atoulme
Contributor

atoulme commented Jul 6, 2024

I don’t understand the rationale for this change. I am also not sure why the java format is not compliant with the spec.

@zeitlinger
Member Author

I don’t understand the rationale for this change.

In k8s, the log payload is packaged - see https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/container.md

The docker json encoding is not supported yet - I'll add that.

I am also not sure why the java format is not compliant with the spec.

it could be made compliant I guess, but a more lenient approach can support more existing implementations without any drawback - that's at least the idea.

@ChrsMark
Member

Since #33912 is affected by this one I would like to ensure that I understand the use-case.

From what I can see in the unit-test the target "logs" would be read from the disk and are wrapped in a container format.
Then the encapsulated message is an otlpjson record.

So it would be something like

2024-07-03T05:55:54.936088091Z stderr F [otel.javaagent 2024-07-03 05:55:54:935 +0000] [BatchLogRecordProcessor_WorkerThread-1] INFO io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - {otlpjson}

If that's correct, then the container format should be handled first using the container operator of the filelog receiver and apply any additional handling after. Something similar we would do in case we have an application which logs in json.
In that case the logs on the disk look like the following:

2024-07-03T05:55:54.936088091Z stderr F {json} 

In that case we would have the following:

    receivers:
      filelog:
        include:
        - /var/log/pods/*/*/*.log
        operators:
        - id: container-parser
          type: container
        - type: json_parser
          if: 'body matches "^{.*}$"'
          # add any additional settings

Alternatively, the additional json parsing could be performed in a transform processor using the ParseJSON(body) function.

This looks to be the "natural" approach for unwrapping this type of content, so for the current use case I think the container format handling should happen first, with the body/content then forwarded to any further processing.
As @djaglowski also mentioned in the SIG meeting, this otlpjson handling might fit best in a standalone processor/connector.

Let me know if I miss anything here:)
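
To make the "handle the container format first" step concrete, here is a rough Python sketch that peels the CRI-style wrapper (timestamp, stream, tag, message) off a line like the examples above. The regex and the parse_cri name are assumptions for illustration; the real container operator also covers other container log formats and partial lines, which this sketch ignores:

```python
import re

# <RFC3339 timestamp> <stdout|stderr> <F|P> <message>
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>[FP]) (?P<log>.*)$"
)


def parse_cri(line):
    """Split a CRI-formatted log line into its fields, or return None."""
    match = CRI_LINE.match(line)
    return match.groupdict() if match else None


entry = parse_cri("2024-07-03T05:55:54.936088091Z stderr F {json}")
print(entry["log"])
```

Whatever ends up in the log field is what later stages (a JSON parser, a transform processor, or a connector) would see as the body.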

@zeitlinger
Member Author

Thanks @jpkrohling for helping me understand how this would work

Here's a sample config of using a connector:

connectors:
    otlpjson:

processors:

receivers:
    filelog:

exporters:
    otlp:

service:
    pipelines:
        logs/raw:
            receivers:
            - filelog # raw log entry
            exporters:
            - otlpjson # converts raw into otlp json

        traces/otlp:
            receivers:
            - otlpjson # otlp json is visible -> converts to pdata
            exporters:
            - otlp
        metrics/otlp:
            receivers:
            - otlpjson
            exporters:
            - otlp
        logs/otlp:
            receivers:
            - otlpjson
            exporters:
            - otlp

wdyt?

@djaglowski
Member

djaglowski commented Jul 11, 2024

If that's correct, then the container format should be handled first using the container operator of the filelog receiver and apply any additional handling after. Something similar we would do in case we have an application which logs in json. In that case the logs on the disk look like the following:

2024-07-03T05:55:54.936088091Z stderr F {json} 

In that case we would have the following:

    receivers:
      filelog:
        include:
        - /var/log/pods/*/*/*.log
        operators:
        - id: container-parser
          type: container
        - type: json_parser
          if: 'body matches "^{.*}$"'
          # add any additional settings

Alternatively, the additional json parsing could be performed in a transform processor using the ParseJSON(body) function.

This looks to be the "natural" approach for unwrapping this type of content, so for the current use case I think the container format handling should happen first, with the body/content then forwarded to any further processing. As @djaglowski also mentioned in the SIG meeting, this otlpjson handling might fit best in a standalone processor/connector.

The problem is that the otlpjson represents an entire plog.Logs which may contain many log records. The filelog operators are meant to work with one log at a time so it will be very difficult to reconstruct a plog.Logs from the json in this way.

This is why in the discussion yesterday it was suggested that a new processor (i.e. otlpjsonprocessor) would be useful. Basically, the filelog receiver would read the lines and peel off the container format. Then the processor would unmarshal the complete otlpjson payload.

So, using the same example:

2024-07-03T05:55:54.936088091Z stderr F [otel.javaagent 2024-07-03 05:55:54:935 +0000] [BatchLogRecordProcessor_WorkerThread-1] INFO io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - {otlpjson containing 3 resources, each w/ a scope, each with 100 log records}

The following could be used to read and "unwrap" the otlpjson:

receivers:
  filelog:
    include:
    - /var/log/pods/*/*/*.log
    operators:
    - id: container-parser
      type: container

This gives you a plog.Logs that contains (in this example) only one plog.LogRecord, where the Body contains the complete otlpjson. Then, you can use the new otlpjsonprocessor to unmarshal that Body into a plog.Logs.

Notably, all of the information in the container wrapper would be discarded. The container parser populates some of these fields when unwrapping the otlpjson, but it is not meaningful when unmarshaling the otlpjson into a plog.Logs. In this use case, this "wrapper" information is basically saying "this is when this log was written", but the use case itself is to convey logs which are fully self-described.

Also, the current otlpjsonfilereceiver supports not only logs but also metrics and traces. Therefore, we actually need an otlpjsonconnector, which would allow a user to dump any type of otlpjson via docker logs, then reconstitute them in another collector. Again, the container wrapper is not meaningful in this case because each Body is actually a fully self-described ptrace.Traces | pmetric.Metrics | plog.Logs. The "logs" sent into the connector are dropped, and the output depends on which types of pipelines use the connector as a receiver. For example, if the user has a file or files that may contain any type of signal, they can decide to reconstitute only the traces, or alternatively to ship the traces and logs to different backends.

Edit: I'm describing the functionality with some assumptions, but additional configuration in the connector could be useful. For example, maybe we want to offer config options to keep attributes from the parent plog.Logs.
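
As a rough illustration of the routing described above, the following Python sketch decides which signal a log body carries by looking at its top-level OTLP/JSON key (resourceLogs, resourceMetrics, or resourceSpans). The detect_signal name and its return values are assumptions for illustration; an actual connector would unmarshal the body into pdata types in Go:

```python
import json

# Top-level keys of the OTLP/JSON encoding, one per signal.
SIGNAL_KEYS = {
    "resourceLogs": "logs",
    "resourceMetrics": "metrics",
    "resourceSpans": "traces",
}


def detect_signal(body):
    """Return "logs", "metrics", or "traces" for an OTLP/JSON body, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    for key, signal in SIGNAL_KEYS.items():
        if key in payload:
            return signal
    return None


print(detect_signal('{"resourceSpans":[]}'))
```

A body that matches none of the keys would simply be dropped in this sketch, mirroring the point that the wrapping "logs" are not forwarded.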

@zeitlinger
Member Author

@djaglowski sounds like you describe the same as I did in solution outline #33846 (comment)

Would this work?

@djaglowski
Member

@djaglowski sounds like you describe the same as I did in solution outline #33846 (comment)

Would this work?

Yes, I think we're aligned. I'm willing to sponsor the new connector if you'll write up a new issue for it.

@zeitlinger
Member Author

@djaglowski sounds like you describe the same as I did in solution outline #33846 (comment)
Would this work?

Yes, I think we're aligned. I'm willing to sponsor the new connector if you'll write up a new issue for it.

great - I just need to find someone to implement it 😄

@ChrsMark
Member

Verifying some details for my own benefit. Apologies if that's trivial.

Is it correct that the potential otlpjsonconnector should only handle valid OTLP JSON input (same as the otlpjsonfilereceiver)?

From the provided description's example

[otel.javaagent 2024-07-02 10:11:07:368 +0000] [BatchLogRecordProcessor_WorkerThread-1] INFO io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - {"resource":{"attributes":[{"key":"container.id","value":{"stringValue":"075da3b7317ceccd6df58562684a8092040aacca8b5b0c49eacb33f1d2fe15b9"}},{"key":"deployment.environment","value":{"stringValue":"staging"}},{"key":"host.arch","value":{"stringValue":"amd64"}},{"key":"host.name","value":{"stringValue":"anti-fraud-7b498c4dcb-f5wqj"}},{"key":"os.description","value":{"stringValue":"Linux 6.5.0-41-generic"}},{"key":"os.type","value":{"stringValue":"linux"}},{"key":"process.command_args","value":{"arrayValue":{"values":[{"stringValue":"/opt/java/openjdk/bin/java"},{"stringValue":"-jar"},{"stringValue":"./app.jar"}]}}},{"key":"process.executable.path","value":{"stringValue":"/opt/java/openjdk/bin/java"}},{"key":"process.pid","value":{"intValue":"1"}},{"key":"process.runtime.description","value":{"stringValue":"Eclipse Adoptium OpenJDK 64-Bit Server VM 21.0.3+9-LTS"}},{"key":"process.runtime.name","value":{"stringValue":"OpenJDK Runtime Environment"}},{"key":"process.runtime.version","value":{"stringValue":"21.0.3+9-LTS"}},{"key":"service.instance.id","value":{"stringValue":"7e31966b-5668-4338-913a-5e2601d75e25"}},{"key":"service.name","value":{"stringValue":"anti-fraud"}},{"key":"service.namespace","value":{"stringValue":"shop"}},{"key":"service.version","value":{"stringValue":"1.1"}},{"key":"telemetry.distro.name","value":{"stringValue":"grafana-opentelemetry-java"}},{"key":"telemetry.distro.version","value":{"stringValue":"2.4.0-beta.1"}},{"key":"telemetry.sdk.language","value":{"stringValue":"java"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.38.0"}}]},"scopeLogs":[{"scope":{"name":"com.mycompany.antifraud.FraudDetectionController","attributes":[]},"logRecords":[{"timeUnixNano":"1719915066488000000","observedTimeUnixNano":"1719915066488267425","severityNumber":13,"severityText":"WARN","body":{"stringValue":"checkOrder(totalPrice=300, shippingCountry=, customerIpAddress=127.0.0.1) fraudScore=15, status=REJECTED"},"attributes":[{"key":"thread.id","value":{"intValue":"44"}},{"key":"thread.name","value":{"stringValue":"http-nio-8080-exec-1"}}],"flags":1,"traceId":"336f93f9f72b9fec3e4e01e38cb6a99c","spanId":"de97c85b1ee0669a"}]}]}

the first part should be skipped (most probably by the filelog receiver), and only the part after the "-" should be considered OTLP JSON:

{"resource":{"attributes":[{"key":"container.id","value":{"stringValue":"075da3b7317ceccd6df58562684a8092040aacca8b5b0c49eacb33f1d2fe15b9"}},{"key":"deployment.environment","value":{"stringValue":"staging"}},{"key":"host.arch","value":{"stringValue":"amd64"}},{"key":"host.name","value":{"stringValue":"anti-fraud-7b498c4dcb-f5wqj"}},{"key":"os.description","value":{"stringValue":"Linux 6.5.0-41-generic"}},{"key":"os.type","value":{"stringValue":"linux"}},{"key":"process.command_args","value":{"arrayValue":{"values":[{"stringValue":"/opt/java/openjdk/bin/java"},{"stringValue":"-jar"},{"stringValue":"./app.jar"}]}}},{"key":"process.executable.path","value":{"stringValue":"/opt/java/openjdk/bin/java"}},{"key":"process.pid","value":{"intValue":"1"}},{"key":"process.runtime.description","value":{"stringValue":"Eclipse Adoptium OpenJDK 64-Bit Server VM 21.0.3+9-LTS"}},{"key":"process.runtime.name","value":{"stringValue":"OpenJDK Runtime Environment"}},{"key":"process.runtime.version","value":{"stringValue":"21.0.3+9-LTS"}},{"key":"service.instance.id","value":{"stringValue":"7e31966b-5668-4338-913a-5e2601d75e25"}},{"key":"service.name","value":{"stringValue":"anti-fraud"}},{"key":"service.namespace","value":{"stringValue":"shop"}},{"key":"service.version","value":{"stringValue":"1.1"}},{"key":"telemetry.distro.name","value":{"stringValue":"grafana-opentelemetry-java"}},{"key":"telemetry.distro.version","value":{"stringValue":"2.4.0-beta.1"}},{"key":"telemetry.sdk.language","value":{"stringValue":"java"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.38.0"}}]},"scopeLogs":[{"scope":{"name":"com.mycompany.antifraud.FraudDetectionController","attributes":[]},"logRecords":[{"timeUnixNano":"1719915066488000000","observedTimeUnixNano":"1719915066488267425","severityNumber":13,"severityText":"WARN","body":{"stringValue":"checkOrder(totalPrice=300, shippingCountry=, customerIpAddress=127.0.0.1) fraudScore=15, status=REJECTED"},"attributes":[{"key":"thread.id","value":{"intValue":"44"}},{"key":"thread.name","value":{"stringValue":"http-nio-8080-exec-1"}}],"flags":1,"traceId":"336f93f9f72b9fec3e4e01e38cb6a99c","spanId":"de97c85b1ee0669a"}]}]}

Note that even that part cannot be parsed by the otlpjsonfilereceiver. I tried this by writing it as-is into a file and then parsing it from there.
Shouldn't that be

{"resourceLogs":[{"resource":{"attributes":[{"key":"container.id","value":{"stringValue":"075da3b7317ceccd6df58562684a8092040aacca8b5b0c49eacb33f1d2fe15b9"}},{"key":"deployment.environment","value":{"stringValue":"staging"}},{"key":"host.arch","value":{"stringValue":"amd64"}},{"key":"host.name","value":{"stringValue":"anti-fraud-7b498c4dcb-f5wqj"}},{"key":"os.description","value":{"stringValue":"Linux 6.5.0-41-generic"}},{"key":"os.type","value":{"stringValue":"linux"}},{"key":"process.command_args","value":{"arrayValue":{"values":[{"stringValue":"/opt/java/openjdk/bin/java"},{"stringValue":"-jar"},{"stringValue":"./app.jar"}]}}},{"key":"process.executable.path","value":{"stringValue":"/opt/java/openjdk/bin/java"}},{"key":"process.pid","value":{"intValue":"1"}},{"key":"process.runtime.description","value":{"stringValue":"Eclipse Adoptium OpenJDK 64-Bit Server VM 21.0.3+9-LTS"}},{"key":"process.runtime.name","value":{"stringValue":"OpenJDK Runtime Environment"}},{"key":"process.runtime.version","value":{"stringValue":"21.0.3+9-LTS"}},{"key":"service.instance.id","value":{"stringValue":"7e31966b-5668-4338-913a-5e2601d75e25"}},{"key":"service.name","value":{"stringValue":"anti-fraud"}},{"key":"service.namespace","value":{"stringValue":"shop"}},{"key":"service.version","value":{"stringValue":"1.1"}},{"key":"telemetry.distro.name","value":{"stringValue":"grafana-opentelemetry-java"}},{"key":"telemetry.distro.version","value":{"stringValue":"2.4.0-beta.1"}},{"key":"telemetry.sdk.language","value":{"stringValue":"java"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.38.0"}}]},"scopeLogs":[{"scope":{"name":"com.mycompany.antifraud.FraudDetectionController","attributes":[]},"logRecords":[{"timeUnixNano":"1719915066488000000","observedTimeUnixNano":"1719915066488267425","severityNumber":13,"severityText":"WARN","body":{"stringValue":"checkOrder(totalPrice=300, shippingCountry=, customerIpAddress=127.0.0.1) fraudScore=15, status=REJECTED"},"attributes":[{"key":"thread.id","value":{"intValue":"44"}},{"key":"thread.name","value":{"stringValue":"http-nio-8080-exec-1"}}],"flags":1,"traceId":"336f93f9f72b9fec3e4e01e38cb6a99c","spanId":"de97c85b1ee0669a"}]}]}]}

?

Just wanted to verify this because trying out the original example caused me some confusion.
I think that was implied by @djaglowski through the {otlpjson containing 3 resources, each w/ a scope, each with 100 log records} example, but there is still some misalignment with the original one. I think we should verify the use case before moving on to the proposal/implementation.

ps: Happy to help with the implementation if you are still looking for someone.
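
The difference between the two payloads above is only the top-level wrapper. A small Python sketch of adding it, assuming the input is a single ResourceLogs object as emitted by the Java exporter (wrap_resource_logs is a hypothetical helper, not part of any collector component):

```python
import json


def wrap_resource_logs(fragment):
    """Wrap a bare ResourceLogs JSON object in the top-level resourceLogs list."""
    obj = json.loads(fragment)
    if "resourceLogs" in obj:
        # Already has the top-level key; leave it untouched.
        return fragment
    return json.dumps({"resourceLogs": [obj]}, separators=(",", ":"))


wrapped = wrap_resource_logs('{"resource":{},"scopeLogs":[]}')
print(wrapped)
```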

@djaglowski
Member

I didn't check that the original format was valid otlpjson, but I agree that the connector should accept otlpjson only, or at least by default. If there are other well established text formats then maybe those can be supported later too but then I think we're talking about a more generalized connector.

@ChrsMark
Member

ChrsMark commented Jul 18, 2024

I now see that the {resourceLogs: [ part is injected as part of this patch.

So correct me if I'm wrong here, but the issue is that even if we parse out the [otel.javaagent 2024-07-02 10:11:07:368 +0000] [BatchLogRecordProcessor_WorkerThread-1] INFO io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - part with a filelog operator (like the regex one), the remaining JSON does not look like valid otlpjson because the top-level resourceLogs|resourceMetrics|resourceSpans key would be missing.

@zeitlinger if you could provide more details about the specific use-case that would be helpful.

edit: I have worked on something simple to illustrate the point for this connector: https://github.com/ChrsMark/otlpjsonconnector. If we agree to ship this component, I'm happy to take it through the proper "New Component" process and make it part of the contrib repo.

@zeitlinger
Member Author

I didn't check that the original format was valid otlpjson, but I agree that the connector should accept otlpjson only, or at least by default.

yes, that's the idea
any processing to get valid otlpjson should be done in a previous operator, e.g. container or regex.

@zeitlinger
Member Author

the remaining JSON does not look like valid otlpjson because the top-level resourceLogs|resourceMetrics|resourceSpans key would be missing.

correct - so that would have to be fixed in a previous operator (maybe regex with an additional feature to reference capture groups) or by a change to the Java app
either way - this would be outside the scope of this ticket (now that I've learned more)

@zeitlinger if you could provide more details about the specific use-case that would be helpful.

basically https://opentelemetry.io/docs/specs/otel/protocol/file-exporter/
supporting anything more like the current java implementation is just a bonus

edit: I have worked on something simple to illustrate the point for this connector: https://github.com/ChrsMark/otlpjsonconnector. If we agree to ship this component, I'm happy to take it through the proper "New Component" process and make it part of the contrib repo.

that would be awesome 😄

@ChrsMark
Member

Thanks for clarifying @zeitlinger!

Just for the record (and in case we need this for further testing), this kind of log can be produced by the adService of the opentelemetry-demo by setting OTEL_LOGS_EXPORTER at https://github.com/open-telemetry/opentelemetry-helm-charts/blob/a7477afa3e2153155cda740295d0551366ee79eb/charts/opentelemetry-demo/values.yaml#L187 to logging-otlp:

2024-07-19T10:19:10.666154668Z stderr F [otel.javaagent 2024-07-19 10:19:10:665 +0000] [BatchLogRecordProcessor_WorkerThread-1] INFO io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - {"resource":{"attributes":[{"key":"container.id","value":{"stringValue":"74f29844c933d5844860485a10c830d3a0bd26b4493bd8f0f07fbe6238e8f0b6"}},{"key":"host.arch","value":{"stringValue":"amd64"}},{"key":"host.name","value":{"stringValue":"my-otel-demo-adservice-5c5f6df74b-bjvr5"}},{"key":"os.description","value":{"stringValue":"Linux 5.15.0-113-generic"}},{"key":"os.type","value":{"stringValue":"linux"}},{"key":"process.command_line","value":{"stringValue":"/opt/java/openjdk/bin/java -javaagent:/usr/src/app/opentelemetry-javaagent.jar oteldemo.AdService"}},{"key":"process.executable.path","value":{"stringValue":"/opt/java/openjdk/bin/java"}},{"key":"process.pid","value":{"intValue":"1"}},{"key":"process.runtime.description","value":{"stringValue":"Eclipse Adoptium OpenJDK 64-Bit Server VM 21.0.3+9-LTS"}},{"key":"process.runtime.name","value":{"stringValue":"OpenJDK Runtime Environment"}},{"key":"process.runtime.version","value":{"stringValue":"21.0.3+9-LTS"}},{"key":"service.instance.id","value":{"stringValue":"80efb175-53a0-4bc4-b3f2-60bbaf0e2713"}},{"key":"service.name","value":{"stringValue":"adservice"}},{"key":"service.namespace","value":{"stringValue":"opentelemetry-demo"}},{"key":"service.version","value":{"stringValue":"1.11.0"}},{"key":"telemetry.distro.name","value":{"stringValue":"elastic"}},{"key":"telemetry.distro.version","value":{"stringValue":"0.4.0"}},{"key":"telemetry.sdk.language","value":{"stringValue":"java"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.38.0"}}]},"scopeLogs":[{"scope":{"name":"oteldemo.AdService","attributes":[]},"logRecords":[{"timeUnixNano":"1721384350240315569","observedTimeUnixNano":"1721384350240331398","severityNumber":9,"severityText":"INFO","body":{"stringValue":"Targeted ad request received for [binoculars]"},"attributes":[],"flags":1,"traceId":"951843d689a86e3336ea6b872516c1ad","spanId":"548686bd37a34d7f"}]}],"schemaUrl":"https://opentelemetry.io/schemas/1.24.0"}

correct - so that would have to be fixed in a previous operator (maybe regex with an additional feature to reference capture groups) or by a change to the Java app

I'm not sure if OtlpJsonLoggingLogRecordExporter should add the top level resourceLogs key but maybe that's sth to be clarified.

@zeitlinger
Member Author

I'm not sure if OtlpJsonLoggingLogRecordExporter should add the top level resourceLogs key but maybe that's sth to be clarified.

I think it should - or a new exporter should. I can take care of figuring this out.

@zeitlinger
Member Author

I'm not sure if OtlpJsonLoggingLogRecordExporter should add the top level resourceLogs key but maybe that's sth to be clarified.

I think it should - or a new exporter should. I can take care of figuring this out.

See open-telemetry/opentelemetry-specification#3817

@ChrsMark ChrsMark mentioned this pull request Jul 23, 2024
3 tasks
@ChrsMark
Member

@zeitlinger using the new connector and having some transforms taking place does the trick for me.

Using the following config:

receivers:
  filelog:
    include:
      - /var/log/pods/prod_my-target-pod_49cc7c1fd3702c40b2686ea7486091d3/my-target-pod/1.log
    include_file_path: true
    operators:
    - id: container-parser
      type: container

exporters:
  debug:
    verbosity: detailed


processors: 
  transform:
    log_statements:
    - context: log
      statements:
      - merge_maps(cache,ExtractPatterns(body,"io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - (?P<log>.*)"), "upsert") where body != nil
      - set(body,cache["log"])
      - merge_maps(cache,ParseJSON(body), "upsert") where body!= nil
      - delete_key(cache, "schemaUrl")
      - set(body,Concat(["{\"resourceLogs\":[",cache,"]}"], ""))

connectors:
  otlpjson:

service:
  pipelines:
    logs/raw:
      receivers: [filelog]
      processors: [transform]
      exporters: [otlpjson]
    metrics/otlp:
      receivers: [ otlpjson ]
      exporters: [ debug ]
    logs/otlp:
      receivers: [ otlpjson ]
      exporters: [ debug ]
    traces/otlp:
      receivers: [ otlpjson ]
      exporters: [ debug ]
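
As a reading aid, the transform statements above do roughly the following. This is a Python approximation rather than OTTL itself: the function name to_otlp_json is hypothetical, and the regex escapes the dots in the class name, which the original pattern does not:

```python
import json
import re

PATTERN = re.compile(
    r"io\.opentelemetry\.exporter\.logging\.otlp\."
    r"OtlpJsonLoggingLogRecordExporter - (?P<log>.*)"
)


def to_otlp_json(body):
    """Turn a Java agent log line into an OTLP/JSON logs payload, or None."""
    match = PATTERN.search(body)          # ExtractPatterns(body, ...)
    if match is None:
        return None
    try:
        record = json.loads(match.group("log"))   # ParseJSON(body)
    except json.JSONDecodeError:
        return None
    record.pop("schemaUrl", None)         # delete_key(cache, "schemaUrl")
    # Concat(["{\"resourceLogs\":[", cache, "]}"], "")
    return json.dumps({"resourceLogs": [record]})


line = (
    "[otel.javaagent 2024-07-19 10:19:10:665 +0000] INFO "
    "io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - "
    '{"resource":{},"scopeLogs":[],"schemaUrl":"https://opentelemetry.io/schemas/1.24.0"}'
)
print(to_otlp_json(line))
```

Lines that do not match the exporter prefix return None here, whereas the config above relies on where clauses to skip them.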

Then write some sample logs in container format:

echo '2024-07-19T10:19:10.666154668Z stderr F [otel.javaagent 2024-07-19 10:19:10:665 +0000] [BatchLogRecordProcessor_WorkerThread-1] INFO io.opentelemetry.exporter.logging.otlp.OtlpJsonLoggingLogRecordExporter - {"resource":{"attributes":[{"key":"container.id","value":{"stringValue":"74f29844c933d5844860485a10c830d3a0bd26b4493bd8f0f07fbe6238e8f0b6"}},{"key":"host.arch","value":{"stringValue":"amd64"}},{"key":"host.name","value":{"stringValue":"my-otel-demo-adservice-5c5f6df74b-bjvr5"}},{"key":"os.description","value":{"stringValue":"Linux 5.15.0-113-generic"}},{"key":"os.type","value":{"stringValue":"linux"}},{"key":"process.command_line","value":{"stringValue":"/opt/java/openjdk/bin/java -javaagent:/usr/src/app/opentelemetry-javaagent.jar oteldemo.AdService"}},{"key":"process.executable.path","value":{"stringValue":"/opt/java/openjdk/bin/java"}},{"key":"process.pid","value":{"intValue":"1"}},{"key":"process.runtime.description","value":{"stringValue":"Eclipse Adoptium OpenJDK 64-Bit Server VM 21.0.3+9-LTS"}},{"key":"process.runtime.name","value":{"stringValue":"OpenJDK Runtime Environment"}},{"key":"process.runtime.version","value":{"stringValue":"21.0.3+9-LTS"}},{"key":"service.instance.id","value":{"stringValue":"80efb175-53a0-4bc4-b3f2-60bbaf0e2713"}},{"key":"service.name","value":{"stringValue":"adservice"}},{"key":"service.namespace","value":{"stringValue":"opentelemetry-demo"}},{"key":"service.version","value":{"stringValue":"1.11.0"}},{"key":"telemetry.distro.name","value":{"stringValue":"elastic"}},{"key":"telemetry.distro.version","value":{"stringValue":"0.4.0"}},{"key":"telemetry.sdk.language","value":{"stringValue":"java"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.38.0"}}]},"scopeLogs":[{"scope":{"name":"oteldemo.AdService","attributes":[]},"logRecords":[{"timeUnixNano":"1721384350240315569","observedTimeUnixNano":"1721384350240331398","severityNumber":9,"severityText":"INFO","body":{"stringValue":"Targeted ad request received for [binoculars]"},"attributes":[],"flags":1,"traceId":"951843d689a86e3336ea6b872516c1ad","spanId":"548686bd37a34d7f"}]}],"schemaUrl":"https://opentelemetry.io/schemas/1.24.0"}' >> /var/log/pods/prod_my-target-pod_49cc7c1fd3702c40b2686ea7486091d3/my-target-pod/1.log

I see:

2024-07-26T16:08:39.327+0300	info	ResourceLog #0
Resource SchemaURL: 
Resource attributes:
     -> container.id: Str(74f29844c933d5844860485a10c830d3a0bd26b4493bd8f0f07fbe6238e8f0b6)
     -> host.arch: Str(amd64)
     -> host.name: Str(my-otel-demo-adservice-5c5f6df74b-bjvr5)
     -> os.description: Str(Linux 5.15.0-113-generic)
     -> os.type: Str(linux)
     -> process.command_line: Str(/opt/java/openjdk/bin/java -javaagent:/usr/src/app/opentelemetry-javaagent.jar oteldemo.AdService)
     -> process.executable.path: Str(/opt/java/openjdk/bin/java)
     -> process.pid: Int(1)
     -> process.runtime.description: Str(Eclipse Adoptium OpenJDK 64-Bit Server VM 21.0.3+9-LTS)
     -> process.runtime.name: Str(OpenJDK Runtime Environment)
     -> process.runtime.version: Str(21.0.3+9-LTS)
     -> service.instance.id: Str(80efb175-53a0-4bc4-b3f2-60bbaf0e2713)
     -> service.name: Str(adservice)
     -> service.namespace: Str(opentelemetry-demo)
     -> service.version: Str(1.11.0)
     -> telemetry.distro.name: Str(elastic)
     -> telemetry.distro.version: Str(0.4.0)
     -> telemetry.sdk.language: Str(java)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.version: Str(1.38.0)
ScopeLogs #0
ScopeLogs SchemaURL: 
InstrumentationScope oteldemo.AdService 
LogRecord #0
ObservedTimestamp: 2024-07-19 10:19:10.240331398 +0000 UTC
Timestamp: 2024-07-19 10:19:10.240315569 +0000 UTC
SeverityText: INFO
SeverityNumber: Info(9)
Body: Str(Targeted ad request received for [binoculars])
Trace ID: 951843d689a86e3336ea6b872516c1ad
Span ID: 548686bd37a34d7f
Flags: 1
	{"kind": "exporter", "data_type": "logs", "name": "debug"}

There might be corner cases to handle through configuration, especially for the OTTL/transform part, but the point is that this specific case can now be supported.
I will also try it with the opentelemetry-demo by setting the OTEL_LOGS_EXPORTER once the distro's image is updated.
We can consider closing this PR if there is nothing pending here.
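
For anyone cross-checking the debug output against the input, the Timestamp and ObservedTimestamp lines are just the timeUnixNano and observedTimeUnixNano strings rendered in UTC. A small Python sketch of that conversion (format_unix_nano is a hypothetical helper; the output layout imitates the debug exporter lines above):

```python
from datetime import datetime, timezone


def format_unix_nano(value):
    """Render an OTLP timeUnixNano string as a UTC timestamp with nanoseconds."""
    # Split into whole seconds and the nanosecond remainder to avoid
    # losing precision in floating point.
    seconds, nanos = divmod(int(value), 1_000_000_000)
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{ts:%Y-%m-%d %H:%M:%S}.{nanos:09d} +0000 UTC"


print(format_unix_nano("1721384350240315569"))
```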

@zeitlinger
Member Author

@ChrsMark great - I didn't know that this is possible

for reference, I've added a working example here that also ignores other lines: https://github.com/zeitlinger/otelcol-cookbook/tree/main/otlp-json
