
[receiver otlp/http] Non-OTLP-conformant JSON messages are accepted with HTTP 200 OK and no error #10546

Closed
schlecl opened this issue Jul 5, 2024 · 3 comments


schlecl commented Jul 5, 2024

Describe the bug
Hi, currently the OTLP HTTP receiver accepts all incoming JSON log messages with an HTTP 200 status code, even when the JSON payload is not a valid OTLP message. There is also no error message in the collector logs about the failed parsing of the message; the invalid messages are just silently consumed.

Steps to reproduce

  1. Deploy the otel collector to Kubernetes using the OpenTelemetry Helm chart version 0.97.1 and the following values (a deployment command sketch follows the values block):
mode: statefulset
replicaCount: 1
ports:
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false

image:
  repository: "otel/opentelemetry-collector-contrib"

config:      
  extensions:
    health_check: {}
    
  receivers:
    nop:
    jaeger: null
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
    prometheus: null
    zipkin: null

  exporters:
    debug/logs:
      verbosity: detailed
    debug/traces:
      verbosity: basic
    debug/metrics:
      verbosity: basic
      
  service:
    telemetry:
      logs:
        level: "debug"
      traces: null
      metrics: null
    extensions: 
      - health_check
    pipelines:
      logs:     
        receivers:
          - otlp
        processors: null
        exporters:
          - debug/logs
      traces: 
        receivers:
          - nop
        exporters:
          - debug/traces
      metrics:
        receivers:
          - nop
        exporters:
          - debug/metrics
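
For completeness, a minimal deployment sketch; the release name otel-collector and the values file name values.yaml are assumptions (the release name matches the service name used in the curl commands below):

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-collector open-telemetry/opentelemetry-collector --version 0.97.1 -f values.yaml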
  2. This will result in the following otel collector configuration (a command to inspect the rendered config follows the block):
exporters:
  debug: {}
  debug/logs:
    verbosity: detailed
  debug/metrics:
    verbosity: basic
  debug/traces:
    verbosity: basic
extensions:
  health_check:
    endpoint: ${env:MY_POD_IP}:13133
processors:
  batch: {}
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
receivers:
  nop: null
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318
service:
  extensions:
  - health_check
  pipelines:
    logs:
      exporters:
      - debug/logs
      receivers:
      - otlp
    metrics:
      exporters:
      - debug/metrics
      processors:
      - memory_limiter
      - batch
      receivers:
      - nop
    traces:
      exporters:
      - debug/traces
      processors:
      - memory_limiter
      - batch
      receivers:
      - nop
  telemetry:
    logs:
      level: debug
    traces: null
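
To inspect the rendered manifests (including the ConfigMap that carries this collector config) without deploying, helm template can be used with the same assumed values file:

helm template otel-collector open-telemetry/opentelemetry-collector --version 0.97.1 -f values.yaml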
  3. Start a curl pod in your cluster: kubectl run -i --tty --rm debug --image=curlimages/curl --restart=Never -- sh
  4. Send a valid message to the otel collector service to verify that it is working (in my example the service name is "otel-collector-opentelemetry-collector" in the "default" namespace):
curl -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"postman_test"}},{"key":"service.namespace","value":{"stringValue":"postman_test_namespace"}},{"key":"service.version","value":{"stringValue":"1.0.0"}},{"key":"service.instance.id","value":{"stringValue":"e5f6g85-38e3-4dbd-86c7-bbad4kj549f"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.language","value":{"stringValue":"dotnet"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.7.0"}}]},"scopeLogs":[{"scope":{"name":"Postman.Test"},"logRecords":[{"timeUnixNano":"1718259678577989400","observedTimeUnixNano":"1718259678577989400","severityNumber":9,"severityText":"Information","body":{"stringValue":"Request\nMethod:GET\nPath:/v4/postman/test\n"},"attributes":[{"key":"{OriginalFormat}","value":{"stringValue":"{body}"}},{"key":"eventSubjectId","value":{"stringValue":"283guj73-2c73-4963-a0e0-66231vmz314f"}},{"key":"eventType","value":{"stringValue":"PostmanTest"}},{"key":"eventAction","value":{"stringValue":"Read"}}],"flags":1,"traceId":"209ea749a947dd9e8ff28537d13ce73a","spanId":"2d16328aed16dd87"},{"timeUnixNano":"1718259678592442800","observedTimeUnixNano":"1718259678592442800","severityNumber":9,"severityText":"Information","body":{"stringValue":"Response\nMethod:GET\nPath:/v4/postman/test\nStatusCode:200"},"attributes":[{"key":"{OriginalFormat}","value":{"stringValue":"{body}"}},{"key":"eventSubjectId","value":{"stringValue":"283guj73-2c73-4963-a0e0-66231vmz314f"}},{"key":"eventType","value":{"stringValue":"PostmanTest"}},{"key":"eventAction","value":{"stringValue":"Read"}}],"flags":1,"traceId":"209ea749a947dd9e8ff28537d13ce73a","spanId":"2d16328aed16dd87"}]}]}]}' -H "Content-Type: application/json" -X POST http://otel-collector-opentelemetry-collector.default.svc.cluster.local:4318/v1/logs

Returns HTTP 200, and the message is printed in the otel collector logs.
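
To confirm, the collector output can be tailed with kubectl; the pod name here assumes replica 0 of the statefulset created by the chart:

kubectl logs -f otel-collector-opentelemetry-collector-0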
  5. Send an invalid JSON message to the otel collector logs endpoint via curl:

curl -d '{"invalid_otlp_json":"test"}' -H "Content-Type: application/json" -X POST http://otel-collector-opentelemetry-collector.default.svc.cluster.local:4318/v1/logs

Returns HTTP 200, and nothing appears in the otel collector logs.
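
To make the returned status code explicit, curl's write-out option can be used (same endpoint and payload as above):

curl -s -o /dev/null -w "%{http_code}\n" -d '{"invalid_otlp_json":"test"}' -H "Content-Type: application/json" -X POST http://otel-collector-opentelemetry-collector.default.svc.cluster.local:4318/v1/logs

This prints 200 even though the payload carries no meaningful OTLP data.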

What did you expect to see?
I would expect to get an HTTP 400 back from the collector when sending messages in a wrong format that can't be parsed.

What did you see instead?
Messages are accepted with HTTP 200 and silently ignored, without any error.

What version did you use?
otel/opentelemetry-collector-contrib:0.104.0

What config did you use?
Collector config: the rendered configuration shown in step 2 of "Steps to reproduce" above.

Environment
Azure AKS cluster, Kubernetes version 1.27.9

Additional context
Probably related to the open bug #4335

schlecl added the bug ("Something isn't working") label on Jul 5, 2024

mx-psi commented Jul 5, 2024

I believe this is the expected behavior. The OTLP specification says:

Newer versions of OTLP may add new fields to messages that will be ignored by clients and servers that do not understand these fields.

The intent is to preserve forward compatibility. Since a future version of OTLP may legitimately add a key named invalid_otlp_json, the server should ignore the unknown key instead of erroring out, so that the OTLP client and the OTLP server don't have to be in sync.

This property is not specific to OTLP; it is common practice in all Protobuf-based communication.
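
As a quick illustration (some_future_field is a hypothetical name; the endpoint is the one from the reproduction above), a payload that mixes a known field with an unknown sibling key is still accepted, and the unknown key is simply dropped:

curl -d '{"resourceLogs":[],"some_future_field":true}' -H "Content-Type: application/json" -X POST http://otel-collector-opentelemetry-collector.default.svc.cluster.local:4318/v1/logs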

Does that answer your question?


schlecl commented Jul 5, 2024

Thank you for clarifying this :) I saw that sentence in the spec too, but the Bad Data definition also contains this sentence, which confused me:

If the processing of the request fails because the request contains data that cannot be decoded or is otherwise invalid

Because of this sentence I assumed that at least a check of the basic OTLP structure would be performed, but I guess "cannot be decoded" covers cases where no valid JSON structure is sent at all.
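
By contrast, a body that is not valid JSON at all cannot be decoded, so under the spec's Bad Data rules it should be rejected with HTTP 400 (same endpoint as above):

curl -s -o /dev/null -w "%{http_code}\n" -d 'this is not json' -H "Content-Type: application/json" -X POST http://otel-collector-opentelemetry-collector.default.svc.cluster.local:4318/v1/logs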

schlecl closed this as completed on Jul 5, 2024

mx-psi commented Jul 5, 2024

but I guess "cannot be decoded" covers cases where no valid JSON structure is sent at all.

Yes, I think that's indeed what the spec refers to.
