Need an indication on how the LoadBalancingExporter works with a static list of hosts when a host is down #31209

Closed
alexchowle opened this issue Feb 13, 2024 · 6 comments
Labels
documentation Improvements or additions to documentation exporter/loadbalancing

Comments

@alexchowle
Contributor

Component(s)

exporter/loadbalancing

Describe the issue you're reporting

With a static configuration of hosts in the LoadBalancingExporter configuration:

exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - host1:4317
          - host2:4317

If host2 has its Collector stopped, it seems like all Spans that would normally be load-balanced to it would just be dropped instead of re-routing to host1. If that is the case then the README.md should make it clear.

@alexchowle alexchowle added the needs triage New item requiring triage label Feb 13, 2024

@alexchowle
Contributor Author

alexchowle commented Feb 13, 2024

Same question for DNS and K8s, to be fair. It feels like the k8s one will cope, but I'm unsure. Would the DNS one need the "A" record to be changed if a host went down?

@jpkrohling
Member

This was discussed via Slack, here: https://cloud-native.slack.com/archives/C01N6P7KR6W/p1707838013838759

Here's a long version of what happens behind the scenes. I'd appreciate it if this could be summarized and added to the readme:

The load-balancing exporter will create one exporter per endpoint, no matter the resolver (static, k8s, DNS). These exporters can be fine-tuned with options related to the sending queue and retry mechanisms. This means that if a network hiccup occurs and a data point cannot be delivered, the exporter will attempt to deliver it again periodically, and if the endpoint stays unreachable it might eventually give up and fail. The load-balancing exporter will NOT attempt to re-route to a healthy endpoint.

Concretely:

  • if a host from the static list is down, all telemetry routed to it will fail to be delivered
  • if a scaling event happens and an endpoint is removed, the in-flight data destined for that endpoint will likely be retried until it eventually fails. Therefore, for highly elastic environments, it's probably a good idea to tweak the sending queue and retry mechanisms, perhaps even disabling them altogether
  • when using k8s, DNS, and likely other future resolvers (AWS cloud map is close to being added), topology changes are eventually reflected in the load-balancing exporter. Some resolvers will pick up changes quicker than others (k8s is quicker than DNS), but there's still a window of time where the topology has changed and the load balancer hasn't been updated yet
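
As a rough illustration of the tuning mentioned above, the sketch below extends the static configuration from the issue with bounded retries and a smaller queue. It assumes the `otlp` block under `protocol` accepts the standard OTLP exporter `retry_on_failure` and `sending_queue` options, which are then applied to each per-endpoint exporter. The values are illustrative only, not recommendations.

```yaml
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
        # Assumption: standard OTLP exporter options, applied to every
        # per-endpoint exporter. Values are illustrative only.
        retry_on_failure:
          enabled: true
          initial_interval: 1s
          max_interval: 5s
          max_elapsed_time: 30s   # give up after 30s of failed attempts
        sending_queue:
          enabled: true
          queue_size: 500         # cap how much data waits per endpoint
    resolver:
      static:
        hostnames:
          - host1:4317
          - host2:4317
```

With settings like these, data routed to host2 while it is down would be retried for a bounded time and then dropped. It would not be sent to host1. Setting `enabled: false` on both blocks would correspond to disabling the mechanisms altogether.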

alexchowle added a commit to alexchowle/opentelemetry-collector-contrib that referenced this issue Feb 14, 2024
…o explain how topology changes can influence decisions around retry configuration, and how they can result in data loss.
@crobert-1 crobert-1 added the documentation Improvements or additions to documentation label Feb 26, 2024
@crobert-1
Member

Removing needs triage as it looks like the question has been answered, and the code owner has approved putting this in the README.

@alexchowle Would you be interested in posting a PR with README updates that explain this functionality?

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Feb 26, 2024
@alexchowle
Contributor Author

It's already been merged in

@crobert-1
Member

crobert-1 commented Feb 26, 2024

Thank you! Sorry, missed the PR: #31271

3 participants