Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receive: fix serverAsClient.Series goroutines leak #6948

Merged
merged 6 commits into from
May 20, 2024

Conversation

thibaultmg
Copy link
Contributor

@thibaultmg thibaultmg commented Dec 1, 2023

Series method in the serverAsClient struct incurs goroutines leak when the request is stopped due to a context cancellation.
When cancelling the context:

This also happens in proxy_heap lazyRespSet and eagerRespSet structures where the same context Done channel is again consumed in parallel to Recv() calls

I have removed parallel Recv() and ctx.Done() consumptions for the storepb.SeriesResponse types of clients.

I haven't included new tests because testing goroutines leaks in a deterministic manner is complicated, but don't hesitate if you have suggestions. However I validated this leak locally with a custom unit test.

This fix should reduce Receiver memory usage and unwanted crashes when OOMing.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@thibaultmg thibaultmg marked this pull request as ready for review December 1, 2023 10:57
@GiedriusS
Copy link
Member

GiedriusS commented Dec 1, 2023

Thanks for this fix. Maybe you could also upload the goroutine dump when this leak happens? 🤔 just so people would be able to find this problem easily. I have also been chasing a goroutine leak but I'm not sure if it's the same problem.

@thibaultmg
Copy link
Contributor Author

thibaultmg commented Dec 1, 2023

Thanks for this fix. Maybe you could also upload the goroutine dump when this leak happens? 🤔 just so people would be able to find this problem easily. I have also been chasing a goroutine leak but I'm not sure if it's the same problem.

Yes sure, here is what I have:

Flat Cumulative Name
449.31k 449.31k runtime.gopark
1 1 github.com/prometheus/prometheus/tsdb/encoding.(*Decbuf).Uvarint64
0 449.31k runtime.chansend
0 449.31k runtime.chansend1
0 449.31k github.com/thanos-io/thanos/pkg/store/storepb.serverAsClient.Series.func1
0 1 github.com/prometheus/prometheus/tsdb/encoding.(*Decbuf).Uvarint
0 1 github.com/prometheus/prometheus/tsdb/index.(*Decoder).Series
0 1 github.com/prometheus/prometheus/tsdb/index.(*Reader).Series
0 1 github.com/prometheus/prometheus/tsdb.blockIndexReader.Series
0 1 github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next
0 1 github.com/prometheus/prometheus/storage.(*genericMergeSeriesSet).Next
0 1 github.com/prometheus/prometheus/storage.(*lazyGenericSeriesSet).Next
0 1 github.com/thanos-io/thanos/pkg/store.(*TSDBStore).Series
0 1 github.com/thanos-io/thanos/pkg/store.(*recoverableStoreServer).Series

@thibaultmg thibaultmg changed the title Draft: Receive: fix serverAsClient.Series goroutines leak Receive: fix serverAsClient.Series goroutines leak Dec 1, 2023
@thibaultmg thibaultmg force-pushed the fix_goroutines_leak_series_req branch 2 times, most recently from 56902d0 to f7b0d36 Compare April 4, 2024 09:11
@thibaultmg
Copy link
Contributor Author

thibaultmg commented Apr 5, 2024

Hello, I have done a more in-depth study on the root cause of the goroutines leak, that still occurs with v34.1.
Here is what happens:

  • We have set storeAPI limits that get regularly triggered on the receive
  • While streaming the response, the limiter gets triggered, returns an error from limitedServer.Send()
  • This error in turn is propagated to ProxyStore.Series() that returns with an error
  • The Series() error is propagated to the gRPC server that returns an error to the client and cancels the context of the request.
  • At that point, there are three goroutines still running:
    • G1: lazyRespSet with handleRecvResponse()
    • G2: TSDBStore.Series() that blocks on inProcessStream.Send()
    • G3: spawned by serverAsClient.Series(), waiting for the Series() call to return in inSrv.err <- s.srv.Series(in, inSrv)

From there, we have several possible execution orders and several end up leaving G3 dangling such as:

  • G1 first reads the ctx.Done() channel and returns. G2 reads the ctx.Done() channel from the inProcessStream.Send() and returns. G3 has the Series() call return and blocks trying to write in the err channel that is not consumed anymore by G1. G3 is left dangling.

This can be seen on this graph (the container just started) where in ~50% of the cases where the request is limited, a goroutine is leaked.
Screenshot 2024-03-28 at 17 17 13

I have deployed this PR for the past 24h in our staging environment and it prevents the leak without breaking other parts.

My questions are then:

  • Do you want me to reinsert the unit test showing the leak (it is in one of the commits)?
  • Units test are failing for what seems to be unrelated reasons, what do you think I should do about it? Fixed with main merge.
  • Do you need additional information to make this PR move forward?

Thanks 😄

pkg/store/proxy_heap.go Outdated Show resolved Hide resolved
@fpetkovski
Copy link
Contributor

You can use the /debug/pprof/goroutine endpoint to see which goroutines are running when this happens.

@thibaultmg
Copy link
Contributor Author

You can use the /debug/pprof/goroutine endpoint to see which goroutines are running when this happens.

I am not sure what you mean. The PR fixes the leak as tested in our staging environment. The leaked goroutine is already identified. We are using the Parca profiling tool to scrape the pprof endpoints. This is how it was found.

fpetkovski
fpetkovski previously approved these changes Apr 12, 2024
Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to rebase and fix conflicts.

@thibaultmg thibaultmg force-pushed the fix_goroutines_leak_series_req branch 2 times, most recently from 00c4da6 to 9350789 Compare April 15, 2024 08:51
@thibaultmg
Copy link
Contributor Author

@fpetkovski PR is ready.

fpetkovski
fpetkovski previously approved these changes Apr 16, 2024
Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, but would beed @GiedriusS review since he is most familiar with this component.

@thibaultmg
Copy link
Contributor Author

Ack, let me know if you need anything @GiedriusS

case r, ok := <-s.srv.recv:
if !ok {
return nil, io.EOF
}
return r, nil
case err := <-s.srv.err:
if err == nil {
case err, ok := <-s.srv.err:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this lead to problems if s.srv.err is closed while we are continuously calling Recv() and there's something to read from s.srv.recv? In other words, from my understanding select{} chooses randomly if there are multiple possibilities so with s.srv.err closed we could prematurely return io.EOF, no? (https://go.dev/ref/spec#Select_statements, see second point). I think the same applies to the case r, ok branch. My suggestion would be to give a priority to s.srv.recv.

select {
  case r, ok := <-s.srv.recv:
    if !ok {
      break
    }
    return r, nil
}

err, ok := <-s.srv.err
if ok {
  return nil, err
}
return nil, io.EOF

Copy link
Contributor Author

@thibaultmg thibaultmg May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the race scenario you describe can happen here:

  • The s.srv.err channel can only be closed when s.srv.Series(in, inSrv) returns.
  • The Series() method only returns after it has finished pushing data into the unbuffered s.srv.recv channel.
  • Consequently, Series() only returns once the last bits of data have been consumed by the Recv() caller.
  • Making the suggested race impossible.

Regarding the proposed change to give priority to s.srv.recv, I see a potential issue if the Series() call returns an error:

  • Recv() will block on reading s.srv.recv which is not yet closed.
  • Series() will block trying to write the error into the unbuffered err channel.
  • This results in neither channel being closed, leading to a deadlock.

WDYT @GiedriusS ?

GiedriusS
GiedriusS previously approved these changes May 16, 2024
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. We actually hit this in prod too :( could you rebase and let's merge it? 🙏

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
@thibaultmg thibaultmg dismissed stale reviews from GiedriusS and fpetkovski via 64f38ec May 16, 2024 14:19
@thibaultmg thibaultmg force-pushed the fix_goroutines_leak_series_req branch from 8b239ca to 64f38ec Compare May 16, 2024 14:19
@thibaultmg
Copy link
Contributor Author

No worries @GiedriusS , happy to see we're not the only ones having this issue 😅 . I just rebased.

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took some time to dig into all of this again and it looks correct to me. Thank you and sorry for taking a lot of time to merge this

@GiedriusS GiedriusS merged commit 258154a into thanos-io:main May 20, 2024
20 checks passed
@thibaultmg
Copy link
Contributor Author

No worries, thanks for merging this.

GiedriusS pushed a commit to vinted/thanos that referenced this pull request May 22, 2024
* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
GiedriusS added a commit to vinted/thanos that referenced this pull request May 23, 2024
Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)
saswatamcode pushed a commit to saswatamcode/thanos that referenced this pull request May 28, 2024
* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
saswatamcode added a commit that referenced this pull request May 28, 2024
* compact: recover from panics (#7318)

For #6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
jnyi pushed a commit to jnyi/thanos that referenced this pull request Jun 1, 2024
* compact: recover from panics (thanos-io#7318)

For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (thanos-io#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (thanos-io#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
jnyi pushed a commit to jnyi/thanos that referenced this pull request Jun 4, 2024
* compact: recover from panics (thanos-io#7318)

For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (thanos-io#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (thanos-io#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* compact: recover from panics (thanos-io#7318)

For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (thanos-io#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (thanos-io#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* compact: recover from panics (thanos-io#7318)

For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (thanos-io#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (thanos-io#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* compact: recover from panics (thanos-io#7318)

For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (thanos-io#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (thanos-io#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* compact: recover from panics (thanos-io#7318)

For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (thanos-io#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (thanos-io#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
jnyi pushed a commit to jnyi/thanos that referenced this pull request Oct 17, 2024
* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
jnyi pushed a commit to jnyi/thanos that referenced this pull request Oct 17, 2024
* compact: recover from panics (thanos-io#7318)

For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Sidecar: wait for prometheus on startup (thanos-io#7323)

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948)

* fix serverAsClient goroutines leak

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix lint

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* delete invalid comment

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove temp dev test

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* remove timer channel drain

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* Receive: fix stats (thanos-io#7373)

If we account stats for remote write and local writes we will count them
twice since the remote write will be counted locally again by the remote
receiver instance.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382)

* *: Ensure objstore flag values are masked & disable debug/pprof/cmdline

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* small fix

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392)

If we have a new querier it will create query hints even without the
pushdown feature being present anymore. Old sidecars will then trigger
query pushdown which leads to broken max,min,max_over_time and
min_over_time.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Cut patch release v0.35.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants