Garbage collector metrics lead to deadlock #363

@megabotan

Story:
We have the default GC metrics, for example python_gc_collected_objects_count.
Someone reads a metric value (for example, through the HTTP handler). This calls MutexValue.get(), which acquires the value's lock.
If the garbage collector decides to run at this moment, the callback that updates the GC metrics is triggered. The callback runs in the same thread and needs to update the metrics, but it can't acquire the lock because that thread already holds it (threading.Lock is not reentrant). Deadlock.
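The core mechanism can be shown without prometheus_client at all. The sketch below is a minimal stand-in, not the library's code: a plain threading.Lock plays the role of MutexValue._lock, and a hypothetical callback registered in gc.callbacks plays the role of the GC-metrics callback. Automatic collection is disabled so that only the explicit gc.collect() fires the callback, and the callback uses acquire(timeout=...) instead of a blocking acquire so the demo terminates rather than hanging.

```python
import gc
import threading

lock = threading.Lock()  # stand-in for MutexValue._lock (non-reentrant)
events = []              # records whether the callback could take the lock


def gc_callback(phase, info):
    # gc callbacks run synchronously in the thread that triggered the collection.
    if phase != "start":
        return
    # A plain lock.acquire() would block forever here, because this same
    # thread already holds the lock. The timeout lets the demo finish.
    acquired = lock.acquire(timeout=0.1)
    if acquired:
        lock.release()
    events.append(acquired)


was_enabled = gc.isenabled()
gc.disable()  # only the explicit gc.collect() below should trigger the callback
gc.callbacks.append(gc_callback)
try:
    with lock:        # like MutexValue.get() holding its lock
        gc.collect()  # like a collection that happens in the middle of get()
finally:
    gc.callbacks.remove(gc_callback)
    if was_enabled:
        gc.enable()

print(events)  # [False]: the callback could not acquire the lock
```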

Snippet reproducing this story (if you try to open localhost:8000, the request hangs):

```python
import gc
import time
from threading import Lock
from prometheus_client import values
from prometheus_client import start_http_server


# prometheus_client's MutexValue with a single line added to get().
class MutexValue(object):
    '''A float protected by a mutex.'''

    _multiprocess = False

    def __init__(self, typ, metric_name, name, labelnames, labelvalues, **kwargs):
        self._value = 0.0
        self._lock = Lock()

    def inc(self, amount):
        with self._lock:
            self._value += amount

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            gc.collect()  # <-- Added only this line: force a GC run while the lock is held
            return self._value


# Metric values created from now on use the patched class.
values.ValueClass = MutexValue

start_http_server(8000)
while True:
    time.sleep(1)
```

Do we really need the lock in the get() method?
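One possible mitigation, sketched under assumptions: if the value class used a threading.RLock instead of a Lock, the thread that already holds the lock in get() could re-acquire it from inside the GC callback instead of blocking. ReentrantMutexValue and on_gc below are hypothetical names with a simplified constructor, not prometheus_client's API; the forced gc.collect() inside get() mirrors the snippet above, and automatic collection is disabled so the result is deterministic.

```python
import gc
import threading


class ReentrantMutexValue:
    """Hypothetical MutexValue variant that uses an RLock instead of a Lock."""

    def __init__(self):
        self._value = 0.0
        self._lock = threading.RLock()  # the owning thread may re-acquire it

    def inc(self, amount):
        with self._lock:
            self._value += amount

    def get(self):
        with self._lock:
            gc.collect()  # force a collection while the lock is held, as in the snippet
            return self._value


value = ReentrantMutexValue()


def on_gc(phase, info):
    # Stand-in for the callback that updates the GC metrics.
    if phase == "stop":
        value.inc(1)  # re-enters the RLock already held by get(); no deadlock


was_enabled = gc.isenabled()
gc.disable()  # only the gc.collect() inside get() should fire the callback
gc.callbacks.append(on_gc)
try:
    result = value.get()
finally:
    gc.callbacks.remove(on_gc)
    if was_enabled:
        gc.enable()

print(result)  # 1.0: the callback's increment landed before get() returned
```

The trade-off of this choice: with a reentrant lock the callback can modify the value in the middle of a locked section. For a single float that is harmless, but a critical section spanning several steps would no longer be protected against code re-entering from the same thread.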
