
Otel can't handle messages from Databricks Diagnostic Tool with Event Hubs #33280

Closed · dannyamaya opened this issue May 28, 2024 · 7 comments

@dannyamaya commented May 28, 2024

Describe the bug
Messages from Databricks sent through Event Hubs don't have the Time Grain value; you get this error for every new message.

Steps to reproduce
Activate the diagnostic settings for Databricks, connect them to an Event Hub, and then connect the Event Hub to an OTel instance.

What did you expect to see?
Messages should arrive without problems; the time grain parameter should be optional, with a default value.

What did you see instead?

azureeventhubreceiver@v0.101.0/azureresourcemetrics_unmarshaler.go:104	Unhandled Time Grain	{"kind": "receiver", "name": "azureeventhub", "data_type": "metrics", "timegrain": ""}

What version did you use?
The latest OTel Collector release (the error above shows azureeventhubreceiver v0.101.0).

What config did you use?

extensions:
  health_check:
  zpages:
    endpoint: localhost:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:

  fluentforward:
    endpoint: 0.0.0.0:8006

  prometheus:
    config:
      scrape_configs:
      - job_name: 'otelcol' # Gets mapped to service.name
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  prometheus/fluentd:
    config:
      scrape_configs:
      - job_name: 'fluentd' # Gets mapped to service.name
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:24231']
  
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      disk:
      filesystem:
      memory:
      network:
      # System load average metrics https://en.wikipedia.org/wiki/Load_(computing)
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Aggregated system process count metrics
      processes:
      # System processes metrics, disabled by default
      # process:  

  azureeventhub:
    connection: Endpoint=xxxxxxxx
    offset:
    format:

processors:
  batch: # Batches data when sending
  resourcedetection:
    detectors: [azure, system]
    timeout: 2s
    override: false
  groupbyattrs:
    keys:
    - service.name
    - service.version
    - host.name

  memory_limiter:
    check_interval: 2s
    limit_mib: 256              
 
exporters:
  splunk_hec/logs:
    token: "xxxxxxxxxxxxxx"
    endpoint: "xxxxxxxxxxxx"
    index: "telemetry_open_telemetry_log_event_nv"
    # max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""

  splunk_hec/traces:
    token: "xxxxxxxxxxxx"
    endpoint: "xxxxxxxxxxx"
    index: "telemetry_open_telemetry_trace_event_nv"
    # max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""
 
  splunk_hec/metrics:
    token: "xxxxxxxxxxxxxx"
    endpoint: "xxxxxxxxxxxxxx"
    index: "telemetry_open_telemetry_metric_nv"
    # max_connections: 20
    disable_compression: false
    timeout: 10s
    tls:
      insecure_skip_verify: true
      ca_file: ""
      cert_file: ""
      key_file: ""      

service:  
  extensions: []

  pipelines:
    logs:
      receivers: [otlp]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/logs]
    metrics:
      receivers: [hostmetrics, azureeventhub]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/metrics]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/traces]
  telemetry:
    logs:
      level: debug

Environment
Azure App Service running the latest OTel Collector version.

Additional context
I have already tried running OTel on plain Linux and on Kubernetes as well.

@dannyamaya dannyamaya added the bug Something isn't working label May 28, 2024
@mx-psi mx-psi transferred this issue from open-telemetry/opentelemetry-collector May 29, 2024

Pinging code owners for receiver/azureeventhub: @atoulme @cparkins. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme (Contributor) commented May 30, 2024

@cparkins could we ingest the data point without setting the start timestamp?

@cparkins (Contributor) commented

@atoulme
I think this issue may actually be a type mismatch.

@dannyamaya
When specifying the Diagnostic Settings for Databricks, are there options under 'Metrics' or only 'Logs'?

According to the documentation, only Logs are available:
https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/metrics-index

Also, when I looked I could only see 'Logs'.
If this is truly log data, attaching the Event Hub to a logs pipeline should resolve the issue, as the logs path does not require a time grain.
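
For example, something along these lines (an untested sketch adapted from the config posted above, keeping the same processors and splunk_hec exporters):

receivers:
  azureeventhub:
    connection: Endpoint=xxxxxxxx   # same Event Hub connection string as in the original config

service:
  pipelines:
    logs:
      # Databricks diagnostic data is log data, so route it through the logs
      # pipeline, where no time grain is required
      receivers: [otlp, azureeventhub]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/logs]
    metrics:
      # azureeventhub removed from the metrics pipeline
      receivers: [hostmetrics]
      processors: [resourcedetection, groupbyattrs, memory_limiter, batch]
      exporters: [splunk_hec/metrics]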

@dannyamaya (Author) commented

Yes, you're right. Databricks doesn't support metrics as of the date of this post, so that's probably why OTel can't handle those messages and shows that error. My bad, thanks for clarifying.

@cparkins (Contributor) commented

No worries. It's probably not entirely clear from the documentation that the mapping is done by the pipeline data type, but that is how I wrote it to work.
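
In other words, the same azureeventhub receiver entry is unmarshaled according to the pipeline that references it; roughly (illustrative sketch, not the receiver's actual internals):

service:
  pipelines:
    metrics:
      receivers: [azureeventhub]   # payload parsed as Azure resource metrics -> time grain required
    logs:
      receivers: [azureeventhub]   # payload parsed as Azure resource logs -> no time grain needed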


This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 31, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 29, 2024