add compare flags func to compare flags between prometheus and sidecar #838
Thanks! This generally looks awesome, just minor nits (:
Hi, @bwplotka
curl http://localhost:9090/version
{
"version": "2.1.0",
"revision": "85f23d82a045d103ea7f3c89a91fba4a93e6367a",
"branch": "HEAD",
"buildUser": "root@6e784304d3ff",
"buildDate": "20180119-12:01:23",
"goVersion": "go1.9.2"
}
If the version is 2.2.x or newer, check the flags. Otherwise, skip the check and print a warning.
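A minimal sketch of that version gate, assuming the Prometheus flags endpoint is only available from 2.2.0 onward. The helper names (parseVersion, shouldCheckFlags) and the 2.2.0 cut-off are assumptions for illustration. The PR itself imports github.com/hashicorp/go-version; this stdlib-only sketch simply compares MAJOR.MINOR.PATCH numerically.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minFlagsVersion is an assumed cut-off: the first Prometheus version
// taken to expose its command-line flags over the HTTP API.
var minFlagsVersion = [3]int{2, 2, 0}

// parseVersion is a hypothetical helper that parses "MAJOR.MINOR.PATCH",
// dropping any pre-release suffix such as "-rc.0".
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	core := strings.SplitN(s, "-", 2)[0]
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %v", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// shouldCheckFlags reports whether flag validation should run: true for
// versions at or above minFlagsVersion, false for older ones.
func shouldCheckFlags(s string) (bool, error) {
	v, err := parseVersion(s)
	if err != nil {
		return false, err
	}
	for i := range v {
		if v[i] != minFlagsVersion[i] {
			return v[i] > minFlagsVersion[i], nil
		}
	}
	return true, nil // Exactly the cut-off version.
}

func main() {
	for _, s := range []string{"2.1.0", "2.2.0", "2.8.1"} {
		ok, err := shouldCheckFlags(s)
		if err != nil {
			fmt.Println(s, "error:", err)
			continue
		}
		if ok {
			fmt.Println(s, "check flags")
		} else {
			fmt.Println(s, "skip check, print a warning")
		}
	}
}
```

An unparsable version (including an empty string) returns an error, so the caller can decide to skip validation with a warning rather than fail startup.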
This is great! Just a few suggestions on logging severity and some nitpicking.
Force-pushed from e02c52a to b0ce5e1.
Hello @bwplotka
Force-pushed from 998488e to b813c70.
This is a useful feature to have; let's try to get this merged :) The Prometheus binary used in the e2e tests doesn't seem to have any version data compiled into it, so we must handle that case as well. You can see that in the logs:
level=info ts=2019-03-06T18:10:40.854132599Z caller=main.go:238 msg="Starting Prometheus" version="(version=, branch=, revision=)"
So please handle that case and add a test for it.
To shed even more light on why this fails at the moment:
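One way to handle a binary built without version data is to treat an empty version as unknown and skip validation with a warning. This sketch decodes a body shaped like the /version output quoted earlier in this thread; versionInfo, versionFromBody and skipReason are hypothetical names, and the warning wording is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// versionInfo mirrors a subset of the /version JSON quoted earlier.
type versionInfo struct {
	Version  string `json:"version"`
	Revision string `json:"revision"`
}

// versionFromBody returns the reported version, or "" when the binary was
// built without version data (as the e2e Prometheus appears to be).
func versionFromBody(body []byte) (string, error) {
	var vi versionInfo
	if err := json.Unmarshal(body, &vi); err != nil {
		return "", fmt.Errorf("decode version response: %w", err)
	}
	return strings.TrimSpace(vi.Version), nil
}

// skipReason returns a warning message when validation must be skipped,
// or "" when it can proceed.
func skipReason(version string) string {
	if version == "" {
		return "Prometheus reported no version (built without version data), skipping flag validation"
	}
	return ""
}

func main() {
	for _, body := range []string{
		`{"version": "2.1.0", "revision": "85f23d8"}`,
		`{"version": "", "revision": ""}`,
	} {
		v, err := versionFromBody([]byte(body))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if r := skipReason(v); r != "" {
			fmt.Println("warn:", r)
			continue
		}
		fmt.Println("validate flags for version", v)
	}
}
```

Keeping decoding and the skip decision in separate functions makes the empty-version case easy to cover with its own test.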
Can we make this into a (hidden) option which is true by default?
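Thanos builds its CLI with kingpin, where a flag can be hidden from --help with Hidden() and defaulted to true with Default("true"). The stdlib sketch below only shows the "on by default, can be switched off" behaviour, because the standard flag package has no notion of hidden flags. The flag name prometheus.validate-flags is hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseValidateFlag parses a hypothetical boolean flag that turns the
// Prometheus flag validation on or off. It defaults to true.
func parseValidateFlag(args []string) (bool, error) {
	fs := flag.NewFlagSet("sidecar", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // Keep parse errors out of stderr in this demo.
	validate := fs.Bool("prometheus.validate-flags", true,
		"Validate Prometheus TSDB flags at startup.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *validate, nil
}

func main() {
	on, _ := parseValidateFlag(nil)
	off, _ := parseValidateFlag([]string{"--prometheus.validate-flags=false"})
	fmt.Println(on, off) // true false
}
```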
Force-pushed from 4e7c4bd to 3c227c0.
Force-pushed from 3c227c0 to 1a49b00.
cmd/thanos/sidecar.go (outdated)
@@ -125,6 +128,31 @@ func runSidecar(

ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
// Only check Prometheus's flags when upload is enabled.
if uploadCompacted {
This flag is for a different purpose 😄 It is true if we want to upload blocks that are already compacted. What we actually want to check is this: https://github.com/improbable-eng/thanos/blob/f800a367cddd0150ba94ec0514ece9f4e59814de/cmd/thanos/sidecar.go#L265 Could you move that check over here? 😃
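A sketch of the control flow being asked for, assuming (as later snippets in this review suggest) that uploads count as enabled when the object storage configuration is non-empty. startupChecks and errCompactionEnabled are hypothetical; in the sidecar the wrapping would use errors.Wrap from github.com/pkg/errors rather than fmt.Errorf.

```go
package main

import (
	"errors"
	"fmt"
)

// errCompactionEnabled is a hypothetical validation failure used in the demo.
var errCompactionEnabled = errors.New("compaction is enabled")

// startupChecks runs validate only when uploads are enabled, which here is
// assumed to mean a non-empty object storage configuration. It reports
// whether validation ran.
func startupChecks(objStoreConf []byte, validate func() error) (bool, error) {
	if len(objStoreConf) == 0 {
		// No uploads: Prometheus' compaction flags cannot affect the bucket.
		return false, nil
	}
	if err := validate(); err != nil {
		return true, fmt.Errorf("validate Prometheus flags: %w", err)
	}
	return true, nil
}

func main() {
	failing := func() error { return errCompactionEnabled }

	ran, err := startupChecks(nil, failing)
	fmt.Println(ran, err) // false <nil>

	ran, err = startupChecks([]byte("type: GCS"), failing)
	fmt.Println(ran, err, errors.Is(err, errCompactionEnabled))
}
```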
Thx. Can you review again?
cmd/thanos/sidecar.go (outdated)
return errors.Wrap(err, "validate Prometheus flags")
}
} else {
level.Info(logger).Log("msg", "uploading compacted blocks is disabled, skip validation")
We should probably change this message too accordingly 😄
Sorry for the low-quality patch. Done.
It's ok, no worries 😄 there is no rush and everything's ok 👍
Force-pushed from 25dcdc6 to 880bb8b.
Only one small nit, but overall LGTM. Thank you for the awesome work and for being persistent 👍 It's not easy.
pkg/promclient/promclient.go (outdated)
b, err := ioutil.ReadAll(resp.Body)
if err != nil {
return nil, errors.Errorf("failed to read body")
Very minor thing: errors.Errorf is not needed here since you aren't doing any formatting. Just use errors.New.
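To illustrate the distinction with the standard library: github.com/pkg/errors, which the PR uses, offers errors.New and errors.Errorf with the same split. Use errors.New for a fixed message and a formatting constructor only when there is something to format. readResponse and failingReader are hypothetical; io.ReadAll is the Go 1.16+ replacement for ioutil.ReadAll.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// failingReader is a hypothetical reader whose Read always fails.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) {
	return 0, errors.New("connection reset")
}

// readResponse is a hypothetical helper that checks a status code and
// reads the body.
func readResponse(status int, body io.Reader) ([]byte, error) {
	if status != 200 {
		// Something to format: use a formatting constructor.
		return nil, fmt.Errorf("unexpected status code %d", status)
	}
	b, err := io.ReadAll(body)
	if err != nil {
		// Fixed message, nothing to format: errors.New is enough.
		return nil, errors.New("failed to read body")
	}
	return b, nil
}

func main() {
	b, err := readResponse(200, strings.NewReader("ok"))
	fmt.Println(string(b), err) // ok <nil>

	_, err = readResponse(500, failingReader{})
	fmt.Println(err) // unexpected status code 500

	_, err = readResponse(200, failingReader{})
	fmt.Println(err) // failed to read body
}
```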
Thx!
Oh, one more thing: https://github.com/improbable-eng/thanos/blob/cd2061d479c6395c119bc385cd16e0e9ad884ce5/pkg/shipper/shipper.go#L142 Can we remove this as well? Sorry, I missed that part 😞
Force-pushed from b27a647 to 09f0184.
Yes, thanks for pointing this out. Since we now check compaction before the shipper logic runs, is it OK to skip the flags check here and return a shipper directly?
This needs a rebase; otherwise LGTM 👍
Force-pushed from e1526ae to 5828e4b.
@GiedriusS @povilasv @bwplotka PTAL
Awesome, good work. Some suggestions, but generally LGTM
cmd/thanos/sidecar.go (outdated)
@@ -9,6 +9,8 @@ import (
"sync"
"time"

"github.com/hashicorp/go-version"
"github.com/prometheus/common/model"
You forgot to run make format.
cmd/thanos/sidecar.go (outdated)
@@ -111,6 +113,17 @@ func runSidecar(
maxt: math.MaxInt64,
}

confContentYaml, err := objStoreConfig.Content()
if err != nil {
return errors.Wrap(err, "")
This is really wrong; what's the point of an empty wrap?
My fault, I missed it. IMO just returning err is OK; there's no need to wrap it.
Could we wrap it to add more context? Something like errors.Wrap(err, "error getting object store config").
Sure. But all other callers of objStoreConfig.Content just return err directly. Please consider adding this wrap later.
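For reference, errors.Wrap(err, msg) from github.com/pkg/errors produces an error whose message is msg followed by ": " and the cause, so an empty msg adds no context. Below is a stdlib sketch of the suggested wrapping using fmt.Errorf with %w, which keeps the cause matchable with errors.Is; objStoreContent and errMissingFile are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingFile is a hypothetical low-level failure.
var errMissingFile = errors.New("open bucket.yaml: no such file or directory")

// objStoreContent is a hypothetical wrapper around a config reader that adds
// context to any error while keeping the cause inspectable.
func objStoreContent(read func() ([]byte, error)) ([]byte, error) {
	b, err := read()
	if err != nil {
		return nil, fmt.Errorf("error getting object store config: %w", err)
	}
	return b, nil
}

func main() {
	_, err := objStoreContent(func() ([]byte, error) { return nil, errMissingFile })
	fmt.Println(err)
	fmt.Println(errors.Is(err, errMissingFile)) // true
}
```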
cmd/thanos/sidecar.go (outdated)
return errors.Wrap(err, "validate Prometheus flags")
}
} else {
level.Info(logger).Log("msg", "uploading blocks is disabled, skip validation")
Not needed; let's remove it.
cmd/thanos/sidecar.go (outdated)
@@ -298,13 +322,39 @@ func runSidecar(
return nil
}

func validatePrometheus(ctx context.Context, logger log.Logger, m *promMetadata) error {
if m.version == nil {
level.Warn(logger).Log("msg", "fetched version is nil or invalid, skip validation")
Is it worth mentioning in the log line why we are skipping?
cmd/thanos/sidecar.go (outdated)
return errors.Wrap(err, "failed to check flags")
}
// Check if min-block-time and max-block-time are 2h.
if flags.TSDBMinTime != model.Duration(2*time.Hour) || flags.TSDBMaxTime != model.Duration(2*time.Hour) {
So... there are people who still want to run non-2h blocks. I think we should:
- Check whether compaction is disabled by checking if flags.TSDBMinTime != flags.TSDBMaxTime, and fail if that is true.
- Check whether MinTime == 2h and only produce a WARNING if it is not.
What do you think? With that approach we are safe and we have warned people. This is also what this error line suggests: disabling compaction is a must-have, while 2h is only recommended.
cc @tarrall
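The two-step policy proposed above can be sketched as follows. tsdbFlags and checkBlockDurations are hypothetical. The sketch uses time.Duration where the PR uses model.Duration from github.com/prometheus/common/model, and it returns warnings instead of logging them. Error strings start lowercase, following the usual Go convention.

```go
package main

import (
	"fmt"
	"time"
)

// tsdbFlags is a hypothetical stand-in for the fetched Prometheus flags;
// the field names follow the snippets in this review.
type tsdbFlags struct {
	TSDBMinTime time.Duration // storage.tsdb.min-block-duration
	TSDBMaxTime time.Duration // storage.tsdb.max-block-duration
}

const recommendedBlockDuration = 2 * time.Hour

// checkBlockDurations fails when local compaction is enabled (min != max)
// and only warns when the block duration is not the recommended 2h.
func checkBlockDurations(f tsdbFlags) (warnings []string, err error) {
	if f.TSDBMinTime != f.TSDBMaxTime {
		return nil, fmt.Errorf("found that TSDB max time is %s and min time is %s; "+
			"compaction needs to be disabled (min-block-duration = max-block-duration)",
			f.TSDBMaxTime, f.TSDBMinTime)
	}
	if f.TSDBMinTime != recommendedBlockDuration {
		warnings = append(warnings, fmt.Sprintf(
			"found that TSDB block time is %s; only %s is recommended",
			f.TSDBMinTime, recommendedBlockDuration))
	}
	return warnings, nil
}

func main() {
	cases := []tsdbFlags{
		{2 * time.Hour, 2 * time.Hour},  // OK
		{2 * time.Hour, 36 * time.Hour}, // compaction enabled: error
		{time.Hour, time.Hour},          // not 2h: warning only
	}
	for _, f := range cases {
		w, err := checkBlockDurations(f)
		fmt.Println(w, err)
	}
}
```

Failing hard on min != max and only warning on a non-2h duration matches the reasoning in the comment: overlapping blocks are a correctness problem, while the 2h value is a recommendation.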
Fair point.
Force-pushed from 1cd4047 to 2960787.
LGTM
Still LGTM, but let's squash the commits.
Force-pushed from 2960787 to df6b4e3.
LGTM, just very small nits, but not blocking.
Thanks for good work @yeya24 and reviewers!
Done. Thanks for your patient reviews. PTAL @bwplotka @GiedriusS
Please rebase it against
cmd/thanos/sidecar.go (outdated)
// Check if compaction is disabled.
if flags.TSDBMinTime != flags.TSDBMaxTime {
return errors.Errorf("Found that TSDB Max time is %s and Min time is %s. "+
Please lowercase the first letter here too.
Done.
add compare flags func to compare flags between prom and sidecar
Force-pushed from b86ca3b to e5b28e0.
Awesome work, thanks for being persistent and getting through so many reviews 😄 🎉
* add compare flags func to compare flags between prometheus and sidecar (thanos-io#838) Original message: * update documentation for a max/min block duration add compare flags func to compare flags between prom and sidecar * fix some nits Functional change: now we check the configured flags (if possible) and error out if MinTime != MaxTime. We need to check this always since if that is not true then we will get overlapping blocks. Additionally, an error message is printed out if it is not equal to 2h (the recommended value).
fix #405.
Changes
Add a function to check the flags
Verification