
Fix querier try store gateways on different zones #5476


Merged
merged 6 commits into from
Jul 25, 2023

Conversation


@yeya24 yeya24 commented Jul 24, 2023

What this PR does:

This PR changes the GetClientsFor method of the BlocksStoreSet interface to take an additional map of the zones already attempted for each block.
Most of the change is in blocksStoreReplicationSet. If zone awareness is not enabled, the logic should be unchanged. If it is enabled, the algorithm is:

  1. Keep a map that tracks the number of attempts for each zone, per block.
  2. For each block, compute the minimum number of attempts over all zones (minAttempts).
  3. Iterate over all instances in the replication set and pick the first instance located in a zone whose attempt count equals minAttempts.
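
The three steps above can be sketched for a single block as follows. This is a minimal illustration rather than the code in this PR: the `instance` type, the `pickInstance` function, and the `excluded` and `attempts` parameters are hypothetical names, and the sketch assumes the minimum is taken over the zones of every instance in the replication set, including instances that are already excluded.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for a store-gateway entry in the ring.
type instance struct {
	Addr string
	Zone string
}

// pickInstance selects an instance for one block.
// attempts maps zone -> number of attempts already made for this block and is
// updated in place. excluded holds addresses that must not be picked again.
func pickInstance(instances []instance, excluded map[string]bool, attempts map[string]int, zoneAware bool) (instance, bool) {
	// Step 2: minimum attempt count over the zones in the replication set.
	// A zone that was never attempted has an implicit count of 0.
	minAttempts := -1
	for _, in := range instances {
		if n := attempts[in.Zone]; minAttempts < 0 || n < minAttempts {
			minAttempts = n
		}
	}

	// Step 3: pick the first non-excluded instance in a least-attempted zone.
	// Without zone awareness, the first non-excluded instance is picked.
	for _, in := range instances {
		if excluded[in.Addr] {
			continue
		}
		if !zoneAware || attempts[in.Zone] == minAttempts {
			attempts[in.Zone]++ // Step 1: record the attempt for this zone.
			return in, true
		}
	}
	return instance{}, false
}

func main() {
	set := []instance{
		{Addr: "a1", Zone: "zone-1"},
		{Addr: "a2", Zone: "zone-1"},
		{Addr: "b1", Zone: "zone-2"},
	}
	attempts := map[string]int{}
	excluded := map[string]bool{}

	first, _ := pickInstance(set, excluded, attempts, true)
	excluded[first.Addr] = true // pretend the first attempt failed

	retry, _ := pickInstance(set, excluded, attempts, true)
	fmt.Println("first:", first.Addr, first.Zone)
	fmt.Println("retry:", retry.Addr, retry.Zone)
}
```

In this sketch the retry skips a2, which shares a zone with the failed a1, and goes to b1 in the zone that has not been tried yet. With `zoneAware` set to false the retry would fall through to a2.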

Which issue(s) this PR fixes:
Fixes #5468

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@yeya24 yeya24 force-pushed the retry-different-zones branch from 812f143 to 963b10f Compare July 25, 2023 01:34
yeya24 added 4 commits July 25, 2023 09:50
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 force-pushed the retry-different-zones branch from df034a7 to 746375b Compare July 25, 2023 16:50

@harry671003 harry671003 left a comment


LGTM! Thanks Ben.

Signed-off-by: Ben Ye <benye@amazon.com>

yeya24 commented Jul 25, 2023

This seems concerning: https://github.com/cortexproject/cortex/actions/runs/5659654449/job/15333902668?pr=5476

query_fuzz_test.go:190: case 105 results mismatch.
        range query: sqrt(
          topk(
            --scalar(count_values by (job, series) ("value", -max({__name__="test_series_a"}))),
            count_values by (series, status_code, __name__) (
              "value",
              max_over_time({__name__="test_series_b"}[1h:1m] offset -3m38s)
            )
          )
        )
        res1: {series="3", status_code="200", value="65"} =>
        1 @[1690304582.945]
        {series="3", status_code="200", value="67"} =>
        1 @[1690304612.945]
        1 @[1690304642.945]
        {series="3", status_code="200", value="69"} =>
        1 @[1690304672.945]
        1 @[1690304702.945]
        {series="3", status_code="200", value="75"} =>
        1 @[1690304852.945]
        1 @[1690304882.945]
        {series="4", status_code="400", value="91"} =>
        1 @[1690304732.945]
        1 @[1690304762.945]
        {series="4", status_code="400", value="97"} =>
        1 @[1690304912.945]
        1 @[1690304942.945]
        {series="4", status_code="400", value="99"} =>
        1 @[1690304972.945]
        1 @[1690305002.945]
        1 @[1690305032.945]
        1 @[1690305062.945]
        1 @[1690305092.945]
        1 @[1690305122.945]
        {series="5", status_code="500", value="113"} =>
        1 @[1690304792.945]
        1 @[1690304822.945]
        res2: {series="3", status_code="200", value="69"} =>
        1 @[1690304672.945]
        1 @[1690304702.945]
        {series="3", status_code="200", value="73"} =>
        1 @[1690304792.945]
        1 @[1690304822.945]
        {series="4", status_code="400", value="85"} =>
        1 @[1690304582.945]
        {series="4", status_code="400", value="87"} =>
        1 @[1690304612.945]
        1 @[1690304642.945]
        {series="4", status_code="400", value="91"} =>
        1 @[1690304732.945]
        1 @[1690304762.945]
        {series="4", status_code="400", value="97"} =>
        1 @[1690304912.945]
        1 @[1690304942.945]
        {series="5", status_code="500", value="115"} =>
        1 @[1690304852.945]
        1 @[1690304882.945]
        {series="5", status_code="500", value="119"} =>
        1 @[1690304972.945]
        1 @[1690305002.945]
        1 @[1690305032.945]
        1 @[1690305062.945]
        1 @[1690305092.945]
        1 @[1690305122.945]
    query_fuzz_test.go:195: 
        	Error Trace:	/home/runner/work/cortex/cortex/integration/query_fuzz_test.go:195
        	Error:      	finished query fuzzing tests
        	Test:       	TestVerticalShardingFuzz
        	Messages:   	1 test cases failed

@alanprot

Thanks! LGTM

Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 merged commit e0bcca5 into cortexproject:master Jul 25, 2023
@yeya24 yeya24 deleted the retry-different-zones branch July 26, 2023 16:59

Successfully merging this pull request may close these issues.

Retries in blocksStoreQuerier.queryWithConsistencyCheck() doesn't query all zones