Open
Description
I am running Django coupled with gunicorn (gevent worker). When gunicorn worker is being auto-restarted (max-requests value is reached), APM client gets stuck most of the time:
Following is the stack trace:
[2023-01-05 13:32:29 +0000] [10] [INFO] Autorestarting worker after current request.
[2023-01-05 13:33:30 +0000] [10] [WARNING] Worker graceful timeout (pid:10)
Exception ignored in: <function AsyncResult.__del__ at 0x7f4e7c013520>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/celery/result.py", line 391, in __del__
self.backend.remove_pending_result(self)
File "/usr/local/lib/python3.10/site-packages/celery/backends/asynchronous.py", line 186, in remove_pending_result
self.on_result_fulfilled(result)
File "/usr/local/lib/python3.10/site-packages/celery/backends/asynchronous.py", line 194, in on_result_fulfilled
self.result_consumer.cancel_for(result.id)
File "/usr/local/lib/python3.10/site-packages/celery/backends/redis.py", line 175, in cancel_for
self._pubsub.unsubscribe(key)
File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 3600, in unsubscribe
return self.execute_command('UNSUBSCRIBE', *args)
File "/usr/local/lib/python3.10/site-packages/elasticapm/instrumentation/packages/base.py", line 211, in call_if_sampling
return self.call(module, method, wrapped, instance, args, kwargs)
File "/usr/local/lib/python3.10/site-packages/elasticapm/instrumentation/packages/redis.py", line 65, in call
with capture_span(wrapped_name, span_type="db", span_subtype="redis", span_action="query", leaf=True):
File "/usr/local/lib/python3.10/site-packages/elasticapm/traces.py", line 1076, in __exit__
self.handle_exit(exc_type, exc_val, exc_tb)
File "/usr/local/lib/python3.10/site-packages/elasticapm/traces.py", line 1103, in handle_exit
span = transaction.end_span(self.skip_frames, duration=self.duration, outcome=outcome)
File "/usr/local/lib/python3.10/site-packages/elasticapm/traces.py", line 373, in end_span
span.end(skip_frames=skip_frames, duration=duration)
File "/usr/local/lib/python3.10/site-packages/elasticapm/traces.py", line 698, in end
p.child_ended(self)
File "/usr/local/lib/python3.10/site-packages/elasticapm/traces.py", line 116, in child_ended
with self.compression_buffer_lock:
File "src/gevent/_semaphore.py", line 278, in gevent._gevent_c_semaphore.Semaphore.__enter__
File "src/gevent/_semaphore.py", line 279, in gevent._gevent_c_semaphore.Semaphore.__enter__
File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
File "/usr/local/lib/python3.10/site-packages/gevent/thread.py", line 121, in acquire
acquired = BoundedSemaphore.acquire(self, blocking, timeout)
File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_semaphore.py", line 249, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
greenlet.GreenletExit:
[2023-01-05 13:33:32 +0000] [10] [INFO] Worker exiting (pid: 10)
[2023-01-05 13:33:32 +0000] [160] [INFO] Booting worker with pid: 160
I suspect a problem of monkey patching (due to gevent worker being used). When I do not use apm, this is never stuck and gunicorn can autorestart without SIGKILL/SIGTERM.
I also got the following error which seems similar:
[2023-01-05 13:16:16 +0000] [12] [INFO] Autorestarting worker after current request.
[2023-01-05 13:17:16 +0000] [12] [WARNING] Worker graceful timeout (pid:12)
[2023-01-05 13:17:18 +0000] [12] [INFO] Worker exiting (pid: 12)
[2023-01-05 13:18:16 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:12)
Exception ignored in atexit callback: <bound method Client.close of <elasticapm.contrib.django.client.DjangoClient object at 0x7f4e6efece80>>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/elasticapm/base.py", line 331, in close
manager.stop_thread()
File "/usr/local/lib/python3.10/site-packages/elasticapm/metrics/base_metrics.py", line 118, in stop_thread
self.collect()
File "/usr/local/lib/python3.10/site-packages/elasticapm/metrics/base_metrics.py", line 101, in collect
self.client.queue(constants.METRICSET, data)
File "/usr/local/lib/python3.10/site-packages/elasticapm/base.py", line 288, in queue
self._transport.queue(event_type, data, flush)
File "/usr/local/lib/python3.10/site-packages/elasticapm/transport/base.py", line 106, in queue
self._event_queue.put((event_type, data, flush), block=False, **kwargs)
File "/usr/local/lib/python3.10/queue.py", line 133, in put
with self.not_full:
File "/usr/local/lib/python3.10/threading.py", line 265, in __enter__
return self._lock.__enter__()
File "src/gevent/_semaphore.py", line 278, in gevent._gevent_c_semaphore.Semaphore.__enter__
File "src/gevent/_semaphore.py", line 279, in gevent._gevent_c_semaphore.Semaphore.__enter__
File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
File "/usr/local/lib/python3.10/site-packages/gevent/thread.py", line 121, in acquire
acquired = BoundedSemaphore.acquire(self, blocking, timeout)
File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_semaphore.py", line 249, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
sys.exit(1)
SystemExit: 1
The command I use to launch gunicorn is the following:
gunicorn config.wsgi:application --bind 0.0.0.0:8000 \
--worker-class=gevent \
--worker-connections=100 \
--workers=${GUNICORN_WORKERS} \
--timeout=60 \
--graceful-timeout=60 \
--max-requests=500
Environment
- OS: Linux Ubuntu 20.04
- Python version: 3.10.6
- Framework and version: Django 3.2
- APM Server version: 8.5.2
- Agent version: 6.13.2
Additional context
-
Agent config options
Click to expand
ELASTIC_APM = dict() ELASTIC_APM['DEBUG'] = False ELASTIC_APM['ENVIRONMENT'] = APP_ENVIRONMENT ELASTIC_APM['SECRET_TOKEN'] = ### ELASTIC_APM['SERVER_URL'] = env.str('APM_HOST', default='http://localhost:8200') ELASTIC_APM['DJANGO_TRANSACTION_NAME_FROM_ROUTE'] = True ELASTIC_APM['SERVICE_NAME'] = env.str('APM_SERVICE_NAME', default='Django') APM_LOGGING_HANDLER_NAME = 'elasticapm' LOGGING['handlers'].update(**{ APM_LOGGING_HANDLER_NAME: { 'level': 'WARNING', 'class': 'elasticapm.contrib.django.handlers.LoggingHandler', } }) for logger in LOGGING['loggers'].keys(): LOGGING['loggers'][logger]['handlers'].append(APM_LOGGING_HANDLER_NAME)
-
requirements.txt
:Click to expand
Django==3.2.16 gunicorn==20.1.0 gevent==21.8.0 django-cors-headers==3.13.0 psycopg2-binary==2.9.5 djangorestframework==3.14.0 djangorestframework-gis==1.0 celery==5.1.2 redis==3.5.3 django-redis==5.2.0 django-environ==0.9.0 django-storages==1.13.2 sentry-sdk==1.12.1 elastic-apm==6.13.2 python-logstash-async==2.5.0 django-cleanup==6.0.0 google-cloud-pubsub==2.13.11 pyfcm==1.5.4 pymongo==4.3.3 packaging>=20.9