
Multiple metric_readers causes deadlock #3697

Open
ekquasar opened this issue Nov 30, 2023 · 1 comment
Labels
bug Something isn't working

Comments


ekquasar commented Nov 30, 2023

Summary

Setting up multiple metric readers like this:

mprovider = MeterProvider(resource=resource, metric_readers=[console_metric_reader, otlp_metric_reader])

causes a deadlock.

Caveat

It happens rarely, so this may not be high priority, but I'm reporting it so that if someone else hits this, they can add to the body of evidence pointing at the underlying issue.

Environment

Python 3.11.6 on macOS Sonoma v14.1.1

Steps to reproduce

1. Run a simple Flask web server with otel instrumentation (a fuller sketch of the app is given after these steps):

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter  # or the proto.http variant

console_metric_reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
otlp_metric_reader = PeriodicExportingMetricReader(OTLPMetricExporter(endpoint=OTLP_ENDPOINT))
mprovider = MeterProvider(resource=resource, metric_readers=[console_metric_reader, otlp_metric_reader])
metrics.set_meter_provider(mprovider)
meter = metrics.get_meter("demo-meter")
2. Send requests to the endpoint

while true
  curl localhost:8080
  sleep 1
end

3. Wait for the deadlock

Unfortunately, it is not always reproducible. Sometimes the app must be running for >5 minutes. Sometimes it happens immediately.
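
For context, a minimal Flask app wired to the meter from step 1 might look like the sketch below; the route, counter name, and port are illustrative assumptions, not taken from the original report.

from flask import Flask

app = Flask(__name__)
# "meter" is the instance created in step 1
request_counter = meter.create_counter("demo.requests")

@app.route("/")
def index():
    request_counter.add(1)
    return "ok"

if __name__ == "__main__":
    app.run(port=8080)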

Expected behavior

  1. Console exporter prints metrics to stdout
  2. OTLP exporter errors, if any, are printed to stdout

What is the actual behavior?

When the deadlock occurs, the traceback looks like this:

...
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1123, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1139, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
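
The frames above only show a thread blocked in Thread.join; to capture every thread's stack at the moment of the hang (as requested below), the standard-library faulthandler module can dump all threads on a signal. A minimal sketch, assuming a POSIX system where SIGUSR1 is free:

import faulthandler
import signal

# Dump the stack of every thread to stderr when the process receives SIGUSR1,
# e.g. `kill -USR1 <pid>` once the server appears deadlocked.
faulthandler.register(signal.SIGUSR1, all_threads=True)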
ekquasar added the bug label Nov 30, 2023
aabmass transferred this issue from open-telemetry/opentelemetry-python-contrib Feb 21, 2024
Contributor

ocelotl commented Jun 28, 2024

If this happens again, please include the complete traceback; do not leave out any part, since that information can show us where in our code the failure is coming from.
