receive: Add liveness and readiness probe #1537
37839f0 to 4d3d4ef
cc @FUSAKLA
LGTM, Thanks!
cmd/thanos/receive.go
Outdated
@@ -278,6 +284,7 @@ func runReceive(
	s := newStoreGRPCServer(logger, reg, tracer, tsdbStore, opts)

	level.Info(logger).Log("msg", "listening for StoreAPI gRPC", "address", grpcBindAddr)
	statusProber.SetReady()
I think the receiver should probably not be ready until the TSDB is ready? I'm also not sure about the hashring, and the receive interface is not guaranteed to be up at this point either.
Maybe this will require a more complex condition for the ready state 🤔
@FUSAKLA The TSDB is ready at this stage; see line 270. This code runs after the TSDB is open.
For the receive interface, I thought that if something goes south the liveness state would change, so a separate readiness check wouldn't be needed.
I guess I need to double-check the hashring readiness.
I'll have another look at it.
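The "more complex condition for the ready state" raised in this thread could be sketched as a small gate that reports ready only once every component has come up. This is a minimal illustration, not the code in this PR: `readinessGate`, `newReadinessGate`, `MarkUp`, and the component names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// readinessGate is a hypothetical helper (not part of Thanos). It reports
// ready only after every registered component has been marked as up.
type readinessGate struct {
	mu      sync.Mutex
	pending map[string]bool
}

// newReadinessGate registers the components that must be up before Ready
// returns true.
func newReadinessGate(components ...string) *readinessGate {
	g := &readinessGate{pending: make(map[string]bool, len(components))}
	for _, c := range components {
		g.pending[c] = true
	}
	return g
}

// MarkUp records that a component has finished starting.
func (g *readinessGate) MarkUp(component string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.pending, component)
}

// Ready reports whether no registered component is still pending.
func (g *readinessGate) Ready() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return len(g.pending) == 0
}

func main() {
	// Component names are assumptions taken from the review discussion.
	g := newReadinessGate("tsdb", "hashring", "receive-http")

	g.MarkUp("tsdb")
	fmt.Println("ready after TSDB only:", g.Ready())

	g.MarkUp("hashring")
	g.MarkUp("receive-http")
	fmt.Println("ready after all components:", g.Ready())
}
```

With a gate like this, the readiness call would move to the point where `Ready()` first returns true instead of sitting next to a single component's startup.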
105037a to 43c5a8a
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
43c5a8a to c4279bb
@@ -277,6 +290,8 @@ func runReceive(
	}
	s := newStoreGRPCServer(logger, reg, tracer, tsdbStore, opts)

	// Wait hashring to be ready before start serving metrics
	<-hashringReady
Why are we waiting for the hashring to be ready before serving metrics from the store? These things are entirely independent IMO
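One way to address this concern is to let the store start serving immediately and flip readiness from a separate goroutine once the hashring signals it is loaded. The sketch below only illustrates that idea and is not the change made in this PR: `status`, `gateReadiness`, and the channel wiring are assumptions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// status is a hypothetical stand-in for a readiness prober.
type status struct{ ready int32 }

// SetReady marks the component as ready.
func (s *status) SetReady() { atomic.StoreInt32(&s.ready, 1) }

// IsReady reports whether SetReady has been called.
func (s *status) IsReady() bool { return atomic.LoadInt32(&s.ready) == 1 }

// gateReadiness sets readiness once hashringReady is closed, without
// blocking the caller. The returned channel is closed after SetReady ran.
func gateReadiness(s *status, hashringReady <-chan struct{}) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		<-hashringReady
		s.SetReady()
	}()
	return done
}

func main() {
	s := &status{}
	hashringReady := make(chan struct{})
	done := gateReadiness(s, hashringReady)

	// The store server could start here; nothing above blocks it.
	fmt.Println("store serving, ready:", s.IsReady())

	// Simulate the hashring finishing its initial load.
	close(hashringReady)
	<-done
	fmt.Println("hashring loaded, ready:", s.IsReady())
}
```

The design choice here is that only the readiness signal depends on the hashring, while serving from the store does not wait on it.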
* Add prober to receive
  Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Add changelog entries
  Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Update README
  Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Remove default
  Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Wait hashring to be ready
  Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Some updates to compact docs
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* some formatting
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Update docs/components/compact.md accept PR suggestions
  Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Add metalmatze to list of maintainers (#1547)
  Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* resolve comments
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* resolve last comment
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* receive: Add liveness and readiness probe (#1537)
  * Add prober to receive
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Add changelog entries
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Update README
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Remove default
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Wait hashring to be ready
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* downsample: Add liveness and readiness probe (#1540)
  * Add readiness and liveness probes for downsampler
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Add changelog entry
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Remove default
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Set ready
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Update CHANGELOG
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  * Clean CHANGELOG
    Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Document the dnssrvnoa option (#1551)
  Signed-off-by: Antonio Santos <antonio@santosvelasco.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* feat store: added readiness and livenes prober (#1460)
  Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Add Hotstar to adopters. (#1553)
  It's the largest streaming service in India that does cricket and GoT for India. They have insane scale and are using Thanos to scale their Prometheus. Spoke to them offline about adding the logo and will get a signoff here too.
  Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Fix hotstar logo in the adoptor's list (#1558)
  Signed-off-by: Karthik Vijayaraju <karthik@hotstar.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)
  Signed-off-by: Callum Styan <callumstyan@gmail.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)
  * Ignore object if it is the current directory
    Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
  * Add full-stop
    Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Adding doc explaining the importance of groups for compactor (#1555)
  Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Add blank line for list (#1566)
  The format of these files is wrong in the web.
  Signed-off-by: dongwenjuan <dong.wenjuan@zte.com.cn>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* Refactor compactor constants, fix bucket column (#1561)
  * compact: unify different time constants
    Use downsample.* constants where possible. Move the downsampling time ranges into constants and use them as well.
    Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
  * bucket: refactor column calculation into compact
    Fix the column's name and name it UNTIL-DOWN because that is what it actually shows - time until the next downsampling. Move out the calculation into a separate function into the compact package.
    Ideally we could use the retention policies in this calculation as well but the `bucket` subcommand knows nothing about them :-(
    Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
  * compact: fix issues with naming
    Reorder the constants and fix mistakes.
    Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
* remove duplicate
  Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
This PR adds:
* /-/healthy endpoint for liveness checks.
* /-/ready endpoint for readiness checks.

Changes
* /-/healthy endpoint for liveness checks.
* /-/ready endpoint for readiness checks.
* prober.Prober for readiness and liveness endpoints.

Verification
* make test
* Started thanos receive and made a request to related endpoints.
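As a rough illustration of the two endpoints and the manual verification step, the sketch below wires `/-/healthy` and `/-/ready` into a plain `net/http` mux and queries them in-process. It is not the `prober.Prober` implementation: the `probe` type, its methods, `newMux`, `statusOf`, and the choice of 503 for the "not yet" state are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probe is a hypothetical stand-in for a prober holding two flags.
type probe struct{ healthy, ready int32 }

func (p *probe) SetHealthy() { atomic.StoreInt32(&p.healthy, 1) }
func (p *probe) SetReady()   { atomic.StoreInt32(&p.ready, 1) }

// handler answers 200 when the flag is set and 503 otherwise.
// The 503 status is an assumption of this sketch.
func (p *probe) handler(flag *int32) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if atomic.LoadInt32(flag) == 1 {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}

// newMux registers the liveness and readiness paths named in this PR.
func newMux(p *probe) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/-/healthy", p.handler(&p.healthy))
	mux.HandleFunc("/-/ready", p.handler(&p.ready))
	return mux
}

// statusOf sends an in-process GET to the handler and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	p := &probe{}
	mux := newMux(p)

	p.SetHealthy()
	fmt.Println("healthy:", statusOf(mux, "/-/healthy"), "ready:", statusOf(mux, "/-/ready"))

	p.SetReady()
	fmt.Println("ready:", statusOf(mux, "/-/ready"))
}
```

Against a running `thanos receive`, the same check would be a plain HTTP GET to each path on the component's HTTP address.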