
Otel can't handle messages from Databricks Diagnostic Tool with Event Hubs #33280

Closed · dannyamaya opened this issue May 28, 2024 · 7 comments

@dannyamaya commented May 28, 2024

Describe the bug
Messages from Databricks sent through Event Hubs don't have the Time Grain value; you get this error for every new message.

Steps to reproduce
Activate the diagnostic settings for Databricks, connect them to an Event Hub, and then connect the Event Hub to an OTel instance.

What did you expect to see?
Messages should arrive without problems; the time grain parameter should be optional, with a default value.

What did you see instead?

azureeventhubreceiver@v0.101.0/azureresourcemetrics_unmarshaler.go:104	Unhandled Time Grain	{"kind": "receiver", "name": "azureeventhub", "data_type": "metrics", "timegrain": ""}

What version did you use?
The latest OTel Collector release (the error above shows azureeventhubreceiver v0.101.0).

What config did you use?

extensions:
  health_check:
  zpages:
    endpoint: localhost:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:

  fluentforward:
    endpoint: 0.0.0.0:8006

  prometheus:
    config:
      scrape_configs:
      - job_name: 'otelcol' # Gets mapped to service.name
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  prometheus/fluentd:
    config:
      scrape_configs:
      - job_name: 'fluentd' # Gets mapped to service.name
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:24231']
  
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      disk:
      filesystem:
      memory:
      network:
      # System load average metrics https://en.wikipedia.org/wiki/Load_(computing)
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Aggregated system process count metrics
      processes:
      # System processes metrics, disabled by default
      # process:  

  azureeventhub:
    connection: Endpoint=xxxxxxxx
    offset:
    format:

processors:
  batch: # Batches data when sending
  resourcedetection:
    detectors: [azure, system]
    timeout: 2s
    override: false
  groupbyattrs:
    keys:
    - service.name
    - service.version
    - host.name

  memory_limiter:
    check_interval: 2s
    limit_mib: 256              
 
exporters:
  splunk_hec/logs:
    token: "xxxxxxxxxxxxxx"
    endpoint: "xxxxxxxxxxxx"
    index: "telemetry_open_telemetry_log_event_nv"
    # max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""

  splunk_hec/traces:
    token: "xxxxxxxxxxxx"
    endpoint: "xxxxxxxxxxx"
    index: "telemetry_open_telemetry_trace_event_nv"
    # max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""
 
  splunk_hec/metrics:
    token: "xxxxxxxxxxxxxx"
    endpoint: "xxxxxxxxxxxxxx"
    index: "telemetry_open_telemetry_metric_nv"
    # max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""      

service:  
  extensions: []

  pipelines:
    logs:
      receivers: [otlp]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/logs]
    metrics:
      receivers: [hostmetrics, azureeventhub]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/metrics]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/traces]
  telemetry:
    logs:
      level: debug

Environment
Azure App Service running the latest OTel Collector version.

Additional context
I have already tried running OTel on plain Linux and on Kubernetes as well.

@dannyamaya dannyamaya added the bug Something isn't working label May 28, 2024
@mx-psi mx-psi transferred this issue from open-telemetry/opentelemetry-collector May 29, 2024

Pinging code owners for receiver/azureeventhub: @atoulme @cparkins. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme (Contributor) commented May 30, 2024

@cparkins could we ingest the data point without setting the start timestamp?

@cparkins (Contributor) commented

@atoulme
I think this issue may actually be a type mismatch.

@dannyamaya
When specifying the Diagnostic Settings for Databricks, are there options under 'Metrics' or only 'Logs'?

According to the documentation, only Logs are available:
https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/metrics-index

Also, when I looked I could only see 'Logs'.
If this is truly log data, attaching the Event Hub to a logs pipeline should resolve the issue, as the logs path does not require a time grain.
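
For example, something along these lines (an untested sketch adapted from the config posted above, keeping the same processors and splunk_hec exporters):

receivers:
  azureeventhub:
    connection: Endpoint=xxxxxxxx   # same Event Hub connection string as in the original config

service:
  pipelines:
    logs:
      # Databricks diagnostic data is log data, so route it through the logs
      # pipeline, where no time grain is required
      receivers: [otlp, azureeventhub]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/logs]
    metrics:
      # azureeventhub removed from the metrics pipeline
      receivers: [hostmetrics]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/metrics]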

@dannyamaya (Author) commented

Yes, you're right. Databricks doesn't support metrics as of the date of this post, so that's probably why OTel can't handle those messages and shows that error. My bad, thanks for clarifying.

@cparkins (Contributor) commented

No worries. It's probably not entirely clear from the documentation that the mapping is done by the pipeline data type, but that is how I wrote it to work.
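
In other words, the same azureeventhub receiver entry is unmarshaled according to the pipeline that references it; roughly (illustrative sketch, not the receiver's actual internals):

service:
  pipelines:
    metrics:
      receivers: [azureeventhub]   # payload parsed as Azure resource metrics -> time grain required
    logs:
      receivers: [azureeventhub]   # payload parsed as Azure resource logs -> no time grain needed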


This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 31, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 29, 2024