Description
Hi there, an experience report and improvement suggestion.
I have a fairly simple exporter that queries a backend URL, parses the JSON response, puts the values into Prometheus metric families, and then yields them.
Sometimes it fails, but only when the request is being served, with a stack trace that points at finish_request rather than at the code that added the bad value:
TypeError: ("float() argument must be a string or a number, not 'NoneType'", Metric(bom_wind_speed, Wind speed (km/h) from the Bureau of Meterology, gauge, , [Sample(name='bom_wind_speed', labels={'location': 'Sydney Airport'}, value=None, timestamp=None, exemplar=None), Sample(name='bom_wind_speed', labels={'location': 'Sydney - Observatory Hill'}, value=20, timestamp=None, exemplar=None)]))
Traceback (most recent call last):
File "/usr/local/lib/python3.5/socketserver.py", line 625, in process_request_thread
self.finish_request(request, client_address)
File "/usr/local/lib/python3.5/socketserver.py", line 354, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/local/lib/python3.5/socketserver.py", line 681, in __init__
self.handle()
File "/usr/local/lib/python3.5/http/server.py", line 422, in handle
self.handle_one_request()
File "/usr/local/lib/python3.5/http/server.py", line 410, in handle_one_request
method()
File "/usr/local/lib/python3.5/site-packages/prometheus_client/exposition.py", line 152, in do_GET
output = encoder(registry)
File "/usr/local/lib/python3.5/site-packages/prometheus_client/openmetrics/exposition.py", line 56, in generate_latest
floatToGoString(s.value),
File "/usr/local/lib/python3.5/site-packages/prometheus_client/utils.py", line 8, in floatToGoString
d = float(d)
TypeError: ("float() argument must be a string or a number, not 'NoneType'", Metric(bom_wind_speed, Wind speed (km/h) from the Bureau of Meterology, gauge, , [Sample(name='bom_wind_speed', labels={'location': 'Sydney Airport'}, value=None, timestamp=None, exemplar=None), Sample(name='bom_wind_speed', labels={'location': 'Sydney - Observatory Hill'}, value=20, timestamp=None, exemplar=None)]))
I've hit this kind of bug a few times in different exporters (I guess it's to be expected to get type errors in Python sometimes).
How about eagerly converting the value passed to add_metric to a float? Then the stack trace would point at the exact cause.
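To illustrate the idea, here is a minimal sketch of what eager conversion could look like. It does not use prometheus_client at all: EagerGaugeFamily is a hypothetical stand-in class that I made up for this example, and the real add_metric may well be structured differently.

```python
class EagerGaugeFamily:
    """Hypothetical stand-in for a metric family; NOT prometheus_client code."""

    def __init__(self, name, documentation, labels=()):
        self.name = name
        self.documentation = documentation
        self.label_names = tuple(labels)
        self.samples = []

    def add_metric(self, label_values, value):
        # Convert eagerly: a bad value raises here, in the caller's frame,
        # instead of later when the metrics are serialized for a scrape.
        try:
            number = float(value)
        except (TypeError, ValueError) as exc:
            raise type(exc)(
                "invalid value {!r} for {} with labels {!r}".format(
                    value, self.name, list(label_values))
            ) from exc
        labels = dict(zip(self.label_names, label_values))
        self.samples.append((labels, number))


family = EagerGaugeFamily("bom_wind_speed", "Wind speed (km/h)", labels=["location"])
family.add_metric(["Sydney - Observatory Hill"], 20)

try:
    family.add_metric(["Sydney Airport"], None)
except TypeError as err:
    # The traceback for this error would include the add_metric call site.
    error_message = str(err)
```

With something like this, the None for Sydney Airport would be reported at the add_metric call in the exporter, and the error message would name the metric and labels directly.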
This might be a breaking change, which would be a reasonable reason to reject it. But any code that passes a bad value will likely fail soon afterwards anyway, as soon as an attempt is made to serialize the metric, so maybe it'd be worth the change for better debuggability? What do you think?