-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Receive: fix serverAsClient.Series goroutines leak #6948
Receive: fix serverAsClient.Series goroutines leak #6948
Conversation
Thanks for this fix. Maybe you could also upload the goroutine dump when this leak happens? 🤔 just so people would be able to find this problem easily. I have also been chasing a goroutine leak but I'm not sure if it's the same problem. |
Yes sure, here is what I have:
|
56902d0
to
f7b0d36
Compare
You can use the |
I am not sure what you mean. The PR fixes the leak as tested in our staging environment. The leaked goroutine is already identified. We are using the Parca profiling tool to scrape the pprof endpoints. This is how it was found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to rebase and fix conflicts.
00c4da6
to
9350789
Compare
@fpetkovski PR is ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, but would beed @GiedriusS review since he is most familiar with this component.
Ack, let me know if you need anything @GiedriusS |
case r, ok := <-s.srv.recv: | ||
if !ok { | ||
return nil, io.EOF | ||
} | ||
return r, nil | ||
case err := <-s.srv.err: | ||
if err == nil { | ||
case err, ok := <-s.srv.err: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this lead to problems if s.srv.err
is closed while we are continuously calling Recv()
and there's something to read from s.srv.recv
? In other words, from my understanding select{}
chooses randomly if there are multiple possibilities so with s.srv.err
closed we could prematurely return io.EOF
, no? (https://go.dev/ref/spec#Select_statements, see second point). I think the same applies to the case r, ok
branch. My suggestion would be to give a priority to s.srv.recv
.
select {
case r, ok := <-s.srv.recv:
if !ok {
break
}
return r, nil
}
err, ok := <-s.srv.err
if ok {
return nil, err
}
return nil, io.EOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the race scenario you describe can happen here:
- The
s.srv.err
channel can only be closed whens.srv.Series(in, inSrv)
returns. - The
Series()
method only returns after it has finished pushing data into the unbuffereds.srv.recv
channel. - Consequently,
Series()
only returns once the last bits of data have been consumed by theRecv()
caller. - Making the suggested race impossible.
Regarding the proposed change to give priority to s.srv.recv
, I see a potential issue if the Series()
call returns an error:
Recv()
will block on readings.srv.recv
which is not yet closed.Series()
will block trying to write the error into the unbufferederr
channel.- This results in neither channel being closed, leading to a deadlock.
WDYT @GiedriusS ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. We actually hit this in prod too :( could you rebase and let's merge it? 🙏
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
8b239ca
to
64f38ec
Compare
No worries @GiedriusS , happy to see we're not the only ones having this issue 😅 . I just rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took some time to dig into all of this again and it looks correct to me. Thank you and sorry for taking a lot of time to merge this
No worries, thanks for merging this. |
* fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)
* fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (#7318) For #6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Series method in the serverAsClient struct incurs goroutines leak when the request is stopped due to a context cancellation.
When cancelling the context:
context.Canceled
This also happens in proxy_heap
lazyRespSet
andeagerRespSet
structures where the same context Done channel is again consumed in parallel to Recv() callsI have removed parallel
Recv()
andctx.Done()
consumptions for thestorepb.SeriesResponse
types of clients.I haven't included new tests because testing goroutines leaks in a deterministic manner is complicated, but don't hesitate if you have suggestions. However I validated this leak locally with a custom unit test.
This fix should reduce Receiver memory usage and unwanted crashes when OOMing.
Changes
Verification