Panic errors in query_range calls #329
Comments
… datapoints from /api/v1/query_range This matches Prometheus behaviour. This should fix jacksontj/promxy#329
Thanks for the report! It seems that VM has fixed the issue on their side, but this is definitely something promxy should handle. I have a fix in #331 so it won't panic in that situation.
Thanks for the response and for fixing my issue! I am going to test it tomorrow. BTW: Is there any way to enable more detailed logs to make debugging similar problems easier? In this case only a stack trace was logged (I had set the log level to debug). I suspected there was a problem with parsing some of the responses returned from VictoriaMetrics, but without the actual response logs I couldn't find a way to prove it.
If you enable trace-level logging it will log the parsed output from the downstreams (in this case it would have shown that one had no points in it); getting more detailed than that would be difficult, as the upstream client lib (from prometheus) doesn't log anything else :/ So far trace-level logging has been sufficient for most issues I've seen.
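For reference, enabling trace logging is just a matter of raising the log-level flag. The exact flag name and config path below are assumptions; verify against `promxy --help` for your version:

```
# trace is the most verbose level; it logs parsed downstream responses
promxy --config=/etc/promxy/config.yaml --log-level=trace
```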
I tested the newly released version. Unfortunately the problem still occurs. Strangely, the stack trace looks the same:
I analyzed the code and it makes no sense to me. merge.go:189 contains the validation you added, and I can't see how an "out of bounds" error could occur on that line. I even built promxy myself from the master branch to be sure I was using the newest version.
The stack trace here seems to come from running the old code; line 189 is a return (meaning it can't generate that panic). When you tested with the new release version, I assume you pulled the binary off the releases page? If so, maybe something is wrong with the build :/
Yes, I downloaded the binary from the releases page, and I even verified its checksum against the checksum file available there. Then I built my own binary from the master branch. I'm aware that the stack trace suggests I'm still running an old binary, and I'm really confused :(. I will continue my research tomorrow.
It was a good idea to stop investigating this yesterday. In the morning I discovered an additional problem on our side: after log rotation, new log entries stopped appearing (even though the modification date of the log file kept updating!), which made it look as if the panic was still occurring. I found and fixed the logrotate problem, and now I can see everything works correctly with the binary you released. Sorry for bothering you, and thanks again for your help.
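For anyone hitting the same symptom: when a process keeps an open file descriptor to its log, plain rotation renames the file out from under it, and the process keeps writing to the rotated file — which is exactly why its modification date kept changing while the new log file stayed empty. A minimal logrotate stanza showing one common fix (the path and schedule here are assumptions, not promxy's actual setup):

```
/var/log/promxy/promxy.log {
    daily
    rotate 7
    compress
    missingok
    # copytruncate copies then truncates the live file in place, so a
    # process that never reopens its log keeps writing to the same fd
    copytruncate
}
```

The alternative is a `postrotate` script that signals the process to reopen its log file, which avoids the small window in which copytruncate can lose lines.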
No problem, I'm mostly just glad it actually fixed it — otherwise that was about to be the craziest bug ever :)
…On Wed, Jul 22, 2020 at 8:12 AM sfranek wrote: Closed #329.
We use Promxy with VictoriaMetrics as our metrics storage. After updating VictoriaMetrics to the latest version (1.38.1) we encountered problems with query_range calls: Promxy started crashing while executing such queries. We found the following entry in the Promxy log:
These crashes occur every few seconds. Luckily we started the VictoriaMetrics update in our dev environment, so we were able to discover the problem before going to production :) We use Promxy 0.0.57, but we also tried the latest version in our setup and the crashes still occur. We also tried to inspect the failing queries; unfortunately the crashes look nondeterministic. We executed the failing queries manually and were not able to reproduce the crashes.
An example of one of the failing queries: