Panic errors in query_range calls #329

Closed · franek1709 opened this issue Jul 19, 2020 · 8 comments · Fixed by #331

franek1709 commented Jul 19, 2020

We use Promxy with VictoriaMetrics as metrics storage. After updating VictoriaMetrics to the latest version (1.38.1) we encountered problems with query_range calls: Promxy started to crash while executing such queries. We found the following entry in the Promxy log:

panic: runtime error: index out of range [0] with length 0

goroutine 66626 [running]:
github.com/jacksontj/promxy/pkg/promhttputil.MergeSampleStream(0x0, 0xc001b19480, 0xc001377fe0, 0xc002100710, 0x1, 0x0)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promhttputil/merge.go:189 +0x7b9
github.com/jacksontj/promxy/pkg/promhttputil.MergeValues.func2(0xc001377fe0)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promhttputil/merge.go:150 +0xcb
github.com/jacksontj/promxy/pkg/promhttputil.MergeValues(0x0, 0x270b8a0, 0xc001e3fc40, 0x270b8a0, 0xc001498220, 0x1, 0xc000ef9260, 0x2744ce0, 0xc000455ab0)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promhttputil/merge.go:162 +0x282
github.com/jacksontj/promxy/pkg/promclient.(*MultiAPI).QueryRange(0xc000997720, 0x2733f60, 0xc001111580, 0xc000268a00, 0x32, 0x0, 0xed6a5ee47, 0x3940c20, 0x0, 0xed6a5ee83, ...)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promclient/multi_api.go:410 +0x7d8
github.com/jacksontj/promxy/pkg/promclient.(*TimeTruncate).QueryRange(0xc0003487e0, 0x2733f60, 0xc001111580, 0xc000268a00, 0x32, 0x0, 0xed6a5ee47, 0x3940c20, 0x0, 0xed6a5ee83, ...)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promclient/time_truncate.go:34 +0x1cd

These crashes occur every few seconds. Luckily we started the VictoriaMetrics update in our dev environment, so we were able to discover it before going to production :) We use Promxy 0.0.57, but we also tried the latest version in our setup and the crashes still occur. We also tried to inspect the failing queries; unfortunately the crashes look nondeterministic. We executed the failing queries manually and were not able to reproduce the restarts.

An example of one of the failing queries:

/api/v1/query_range?query=sum(process_descriptors_usage{service="test-service"}) by (instance, port)&end=1595148309&start=1595148249&step=60s
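
(For reference only, not from the original report: the same range query can be replayed against a single downstream with the Prometheus Go client to compare its raw response with what Promxy merges. This is a minimal sketch; the VictoriaMetrics address is a placeholder and it assumes a client_golang version whose QueryRange returns warnings.)

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder address for a single VictoriaMetrics downstream.
	client, err := api.NewClient(api.Config{Address: "http://victoriametrics:8428"})
	if err != nil {
		log.Fatal(err)
	}
	papi := v1.NewAPI(client)

	// Same query and time window as the failing call above.
	query := `sum(process_descriptors_usage{service="test-service"}) by (instance, port)`
	r := v1.Range{
		Start: time.Unix(1595148249, 0),
		End:   time.Unix(1595148309, 0),
		Step:  60 * time.Second,
	}

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	val, warnings, err := papi.QueryRange(ctx, query, r)
	if err != nil {
		log.Fatal(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	// val is a model.Matrix for range queries; a series with an empty value
	// list here is the shape of response that tripped Promxy's merge code.
	fmt.Println(val)
}
```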
valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Jul 20, 2020
… datapoints from /api/v1/query_range

This matches Prometheus behaviour.

This should fix jacksontj/promxy#329
jacksontj added a commit that referenced this issue Jul 20, 2020
@jacksontj jacksontj added the bug label Jul 20, 2020
@jacksontj
Owner

Thanks for the report! It seems that VM has fixed the issue on their side, but this is definitely something promxy should handle. I have a fix in #331 so it won't panic in that situation.
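
(The #331 diff itself is not quoted in this thread. As a rough illustration only, the kind of guard that prevents indexing an empty Values slice when one downstream returns a series with no datapoints might look like the sketch below, using the github.com/prometheus/common/model types; the helper name is hypothetical.)

```go
package promhttputil

import "github.com/prometheus/common/model"

// mergeSampleStream is a hypothetical helper: if either side has no
// datapoints, return the other side instead of touching Values[0].
func mergeSampleStream(a, b *model.SampleStream) *model.SampleStream {
	if a == nil || len(a.Values) == 0 {
		return b
	}
	if b == nil || len(b.Values) == 0 {
		return a
	}

	// Merge the two timestamp-sorted []model.SamplePair slices,
	// preferring a's value when both have a point at the same timestamp.
	merged := make([]model.SamplePair, 0, len(a.Values)+len(b.Values))
	i, j := 0, 0
	for i < len(a.Values) && j < len(b.Values) {
		switch {
		case a.Values[i].Timestamp < b.Values[j].Timestamp:
			merged = append(merged, a.Values[i])
			i++
		case a.Values[i].Timestamp > b.Values[j].Timestamp:
			merged = append(merged, b.Values[j])
			j++
		default:
			merged = append(merged, a.Values[i])
			i++
			j++
		}
	}
	merged = append(merged, a.Values[i:]...)
	merged = append(merged, b.Values[j:]...)

	return &model.SampleStream{Metric: a.Metric, Values: merged}
}
```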

jacksontj added a commit that referenced this issue Jul 20, 2020
jacksontj added a commit that referenced this issue Jul 20, 2020
@franek1709
Author

Thanks for the response and for fixing my issue! I am going to test it tomorrow. BTW: is there any way to enable more detailed logs to make debugging similar problems easier? In this case only the stack trace was logged (I set the log level to debug). I suspected there was a problem with parsing some of the responses returned from VictoriaMetrics, but without the actual response logs I couldn't find a way to prove it.

@jacksontj
Owner

If you enable trace-level logging it will log the parsed output from the downstreams (in this case it would have shown that one of them had no points in it); getting more detailed than that would be difficult, as the upstream client lib (from Prometheus) doesn't log anything else :/ So far trace-level logging has been sufficient for most issues I've seen.
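
(Sketch only; Promxy's actual flag names and log sites are not shown in this thread. Assuming a logrus-style logger, trace level sits one step below debug, which is why the parsed downstream values only show up there:)

```go
package main

import (
	"github.com/prometheus/common/model"
	"github.com/sirupsen/logrus"
)

func main() {
	// At debug level the Tracef line below is suppressed; trace level shows it.
	logrus.SetLevel(logrus.TraceLevel)

	// Pretend this is the parsed output from one downstream: an empty matrix,
	// like the no-datapoints series that triggered the panic.
	var val model.Value = model.Matrix{}

	logrus.Debugf("queried downstream")
	logrus.Tracef("downstream returned value_type=%s value=%v", val.Type(), val)
}
```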

@franek1709
Author

I tested the newly released version. Unfortunately the problem still occurs. It's quite strange that the stack trace looks similar:

panic: runtime error: index out of range [0] with length 0

goroutine 12045 [running]:
github.com/jacksontj/promxy/pkg/promhttputil.MergeSampleStream(0x0, 0xc0012d2c80, 0xc000f55b60, 0xc001db5450, 0x1, 0x0)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promhttputil/merge.go:189 +0x7b9
github.com/jacksontj/promxy/pkg/promhttputil.MergeValues.func2(0xc000f55b60)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promhttputil/merge.go:150 +0xcb
github.com/jacksontj/promxy/pkg/promhttputil.MergeValues(0x0, 0x270b8a0, 0xc0023df580, 0x270b8a0, 0xc0010e2a00, 0x1, 0xc001118540, 0x2744ce0, 0xc000288e00)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promhttputil/merge.go:162 +0x282
github.com/jacksontj/promxy/pkg/promclient.(*MultiAPI).QueryRange(0xc0003fad20, 0x2733f60, 0xc00024ca40, 0xc000f4a180, 0x32, 0x0, 0xed6a88d4b, 0x3940c20, 0x0, 0xed6a88d87, ...)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promclient/multi_api.go:410 +0x7d8
github.com/jacksontj/promxy/pkg/promclient.(*TimeTruncate).QueryRange(0xc0005f0820, 0x2733f60, 0xc00024ca40, 0xc000f4a180, 0x32, 0x0, 0xed6a88d4b, 0x3940c20, 0x0, 0xed6a88d87, ...)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/promclient/time_truncate.go:34 +0x1cd
github.com/jacksontj/promxy/pkg/proxystorage.(*ProxyStorage).NodeReplacer(0xc00019dc30, 0x2733f60, 0xc00024ca40, 0xc0006b3310, 0x26dd740, 0xc0001e8e80, 0xc0004b6f60, 0x0, 0xc00024ca00, 0xc000f50000)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/pkg/proxystorage/proxy.go:458 +0x1bbc
github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/promql.Walk(0x2733f60, 0xc00024ca40, 0x26dfd60, 0xc00024c600, 0xc0006b3310, 0x26dd740, 0xc0001e8e80, 0xc00025ad90, 0x7, 0x7, ...)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/promql/ast.go:275 +0x186f
github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/promql.Walk.func1(0x2733f60, 0xc00024ca40, 0x26dfd60, 0xc00024c600, 0xc0006b3310, 0xc00024c3c0, 0xc00019dd30, 0xc00117aa80, 0xc00025ad90, 0x7, ...)
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/promql/ast.go:328 +0xf1
created by github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/promql.Walk
	/home/tjackson/workspace/golang/src/github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/promql/ast.go:327 +0xd10

I analyzed the code and it makes no sense to me. At merge.go:189 there is the validation you added, so I can't see how an "out of bounds" error could come from that line. I even built Promxy myself from the master branch to be sure I was using the newest version.
I'm still investigating it. Tomorrow I will try to update VictoriaMetrics to the latest version built from its master branch.

@jacksontj
Owner

The stack trace here seems to come from the old code; line 189 is a return (meaning it can't generate that panic). When you tested with the new release version, I assume you pulled the binary off of the releases page? If so, maybe something is wrong with the build :/

@jacksontj jacksontj reopened this Jul 21, 2020
@franek1709
Author

Yes. I downloaded the binary from the releases page and even verified its checksum against the checksum file available there. Then I built my own binary from the master branch. I'm aware that the stack trace suggests I'm still running the old binary, and I'm really confused :(. I will continue my research tomorrow.

@franek1709
Author

It was a good idea to stop investigating this problem yesterday. This morning I discovered an additional problem on our side: it looks like after log rotation new log entries stopped appearing (even though the modification date of the log file kept being updated!), which is what made me think the panic was still occurring. I found and fixed the logrotate problem this morning, and now everything works correctly with the binary you released. Sorry for bothering you, and thanks again for your help.
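
(General background rather than a diagnosis from this thread: a common cause of log entries vanishing after rotation is that the process keeps writing through its old file descriptor to the rotated file. Either copytruncate in the logrotate config or reopening the log file on a signal avoids that. A minimal Go sketch of the reopen-on-SIGHUP pattern, with a placeholder path:)

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func openLog(path string) *os.File {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	return f
}

func main() {
	const path = "/var/log/promxy/promxy.log" // placeholder path
	f := openLog(path)
	log.SetOutput(f)

	// When logrotate sends SIGHUP, reopen the file so writes go to the
	// freshly created log instead of the renamed one.
	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)
	go func() {
		for range hup {
			newF := openLog(path)
			log.SetOutput(newF)
			f.Close()
			f = newF
		}
	}()

	select {} // stand-in for the real work loop
}
```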

@jacksontj
Owner

jacksontj commented Jul 22, 2020 via email
