Query Partial Data #6526

Merged
merged 28 commits into from
Jan 28, 2025

@justinjung04 justinjung04 commented Jan 21, 2025

What this PR does:

This PR introduces two new per-tenant limits (tenant config), query_partial_data and rules_partial_data. When enabled, they allow tenants to receive a 2xx response with a warning message when ingester queries fail in more zones than the maximum number of unavailable zones, but in fewer than all available zones.

For example, in the following scenario,

  • Total zone count: 3
  • Max unavailable zone: 1
  • Zones where ingesters are failing: 2

Current behavior always returns 5xx:

[Screenshot, 2024-12-04 at 3:04:36 PM]

With this feature enabled, customers instead receive data along with the warning message "Query result may contain partial data".

[Screenshot, 2024-12-09 at 4:05:08 PM]

This is useful for Cortex setups that replicate ingested data across different zones. In such a setup, even if ingesters in several zones go down, there is a high probability that the full data is still available from ingesters in the remaining zone(s). In that case, tenants would rather receive a 2xx than a 5xx.

To summarize:

  1. Ingesters failing in 0 to MaxUnavailableZones zones will return status code 200
  2. Ingesters failing in MaxUnavailableZones + 1 to TotalZones - 1 zones will return status code 200 with a warning message
  3. Ingesters failing in all zones will return status code 500
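
The three cases above can be pictured with a small Go sketch. This is only an illustration of the decision table, not the code in this PR: the function name classifyResponse, its parameters, and the partialDataEnabled flag are hypothetical stand-ins for the real replication-set logic and the per-tenant limit.

```go
package main

import "net/http"

// partialDataWarning is the warning text quoted in the PR description.
const partialDataWarning = "Query result may contain partial data"

// classifyResponse is a hypothetical helper (an assumption for illustration,
// not Cortex's actual implementation). Given the number of zones with failing
// ingesters, it returns the HTTP status and an optional warning message.
func classifyResponse(failedZones, maxUnavailableZones, totalZones int, partialDataEnabled bool) (int, string) {
	switch {
	case failedZones <= maxUnavailableZones:
		// Case 1: failures are within the tolerated number of zones.
		return http.StatusOK, ""
	case partialDataEnabled && failedZones < totalZones:
		// Case 2: too many zones failed, but at least one zone still answered.
		return http.StatusOK, partialDataWarning
	default:
		// Case 3 (all zones failed), or the feature is disabled.
		return http.StatusInternalServerError, ""
	}
}
```

For the scenario in the description (3 zones, 1 max unavailable zone, 2 failing zones), calling classifyResponse(2, 1, 3, true) yields status 200 plus the warning, while classifyResponse(2, 1, 3, false) yields 500, matching the current behavior.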

Which issue(s) this PR fixes:
n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -3550,6 +3550,9 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# CLI flag: -frontend.max-queriers-per-tenant
[max_queriers_per_tenant: <float> | default = 0]

# Enable query to return partial data with warning message.

I'm assuming this defaults to getting response from at least 1 ingester.

Should this also be configurable, and always need to be lower than replication_factor? Would a user want 2 out of 5 replicas instead of 1 out of 5?

I think at least 1 is a good start. If the need emerges, we can reconsider.

Contributor Author

I agree. The base assumption of this feature is that zonal replication is reliable, so I wanted to keep the config nice and simple. We can definitely iterate on this in the future if needed.

@CharlieTLe
Member

Hello @justinjung04, thank you for opening this PR.

There is a release in progress. As such, please rebase your CHANGELOG entry on top of the master branch and move the CHANGELOG entry to the top under ## master / unreleased.

Thanks,
Charlie

Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04
Contributor Author

Thank you @CharlieTLe, I've rebased the change.

Signed-off-by: Justin Jung <jungjust@amazon.com>
…methods as well

Signed-off-by: Justin Jung <jungjust@amazon.com>
Contributor

@harry671003 harry671003 left a comment

Great work 👍

@alanprot
Member

Amazing!

LGTM! thanks for doing this!

@dosubot dosubot bot added the lgtm label (This PR has been approved by a maintainer) Jan 28, 2025
Contributor

@yeya24 yeya24 left a comment

Thanks. Looks good.
The only thing is that we might also want to apply the partial data response to the metric metadata and query exemplars calls.

@yeya24 yeya24 merged commit de5cfe1 into cortexproject:master Jan 28, 2025
17 checks passed
alexqyle pushed a commit to alexqyle/cortex that referenced this pull request Jan 31, 2025
* Create partial_data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix lazyquery so that warning message is returned

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Add QueryPartialData limit

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix broken mock

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Make response with warnings to be not cached

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Updated streamingSelect in distributor_queryable

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update query.go

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update replication_set

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint again

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Generated doc

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update config description

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Do not remove warnings from seriesSet

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Avoid cache only if the warning message contains partial data error

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Remove context usage for partial data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Refactor how partial data info is passed + apply to series and label methods as well

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint + fix tests

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix build

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Create separate config for ruler partial data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Genereta doc

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Add more tests

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Change error

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix test

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Nit

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Nit

Signed-off-by: Justin Jung <jungjust@amazon.com>

---------

Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>
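
The commits "Make response with warnings to be not cached" and "Avoid cache only if the warning message contains partial data error" in the list above suggest a caching rule along the following lines. This is a minimal sketch under stated assumptions, not the code from the PR: shouldCacheResponse and partialDataMsg are hypothetical names, and matching the warning by substring is an assumption about how the check might be done.

```go
package main

import "strings"

// partialDataMsg is the warning text quoted in the PR description.
const partialDataMsg = "Query result may contain partial data"

// shouldCacheResponse is a hypothetical helper (not Cortex's actual API).
// It reports whether a query response carrying the given warnings is safe to
// cache: responses flagged as partial are skipped so that incomplete results
// are not served again after the failing ingesters recover. Other warnings
// do not prevent caching.
func shouldCacheResponse(warnings []string) bool {
	for _, w := range warnings {
		if strings.Contains(w, partialDataMsg) {
			return false
		}
	}
	return true
}
```

Under this sketch, a response with no warnings or only unrelated warnings is still cached, which matches the narrower rule named in the second commit title.
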
@justinjung04 justinjung04 deleted the partial-data branch March 26, 2025 17:50
yeya24 added a commit that referenced this pull request Mar 27, 2025
* Purge expired postings cache items due inactivity (#6502)

* Purge expired postings cache items due inactivity

Signed-off-by: alanprot <alanprot@gmail.com>

* Fix comments

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update thanos to 4ba0ba403896 (#6503)

* Update thanos to 4ba0ba403896

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>

* run go mod vendor

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>

---------

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>
Co-authored-by: Daniel Sabsay <sabsay@adobe.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Bump the actions-dependencies group across 1 directory with 2 updates (#6505)

Bumps the actions-dependencies group with 2 updates in the / directory: [actions/upload-artifact](https://github.com/actions/upload-artifact) and [github/codeql-action](https://github.com/github/codeql-action).

Updates `actions/upload-artifact` from 4.5.0 to 4.6.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@6f51ac0...65c4c4a)

Updates `github/codeql-action` from 3.28.0 to 3.28.1
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@48ab28a...b6a472f)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-dependencies
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* calculate # of concurrency only once at the runner (#6506)

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Implement partition compaction planner (#6469)

* Implement partition compaction grouper

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix comment

Signed-off-by: Alex Le <leqiyue@amazon.com>

* replace level 1 compaction limits with ingestion replication factor

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix doc

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update compaction_visit_marker_timeout default value

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update default value for compactor_partition_index_size_limit_in_bytes

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactor code

Signed-off-by: Alex Le <leqiyue@amazon.com>

* address comments and refactor

Signed-off-by: Alex Le <leqiyue@amazon.com>

* address comment

Signed-off-by: Alex Le <leqiyue@amazon.com>

* address comment

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update config name

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Implement partition compaction planner

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix after rebase

Signed-off-by: Alex Le <leqiyue@amazon.com>

* addressed comments

Signed-off-by: Alex Le <leqiyue@amazon.com>

* updated doc and refactored metric

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix test

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add max tenant config to tenant federation (#6493)

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add cleaner logic to clean partition compaction blocks and related files (#6507)

* Add cleaner logic to clean partition compaction blocks and related files

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactored metrics

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactor

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update logs

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update RELEASE.md (#6511)

Maintainers would like an additional week to get the partition compactor changes in before the first release candidate for 1.19.

Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* update thanos version to 236777732278c64ca01c1c09d726f0f712c87164 (#6514)

Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fix race that can cause nil reference when using expanded postings (#6518)

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add more op label values to cortex_query_frontend_queries_total metric (#6519)

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Allow use of non-dualstack endpoints for S3 blocks storage (#6522)

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Expose grpc client connect timeout config and default to 5s (#6523)

* expose grpc client connect timeout config

Signed-off-by: yeya24 <benye@amazon.com>

* changelog

Signed-off-by: yeya24 <benye@amazon.com>

---------

Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Hook up partition compaction end to end implementation (#6510)

* Implemented partition compaction end to end with custom compaction lifecycle

Signed-off-by: Alex Le <leqiyue@amazon.com>

* removed unused variable

Signed-off-by: Alex Le <leqiyue@amazon.com>

* tweak test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* tweak test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactor according to comments

Signed-off-by: Alex Le <leqiyue@amazon.com>

* tweak test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* check context error inside sharded posting

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix lint

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix integration test for memberlist

Signed-off-by: Alex Le <leqiyue@amazon.com>

* make compactor initial wait cancellable

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Test for nil on expire expanded postings (#6521)

* Test for nil on expire expanded postings

Signed-off-by: alanprot <alanprot@gmail.com>

* stopping ingester

Signed-off-by: alanprot <alanprot@gmail.com>

* refactor the test to not timeout

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* log when a request starts running in querier (#6525)

* log when a request starts running in querier

Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>

* log when a request starts running in querier for frontend processor

Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>

---------

Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update build image according to 03a8f8c (#6508)

Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Deprecate -blocks-storage.tsdb.wal-compression-enabled flag

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fix test (#6537)

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Mark 1.19 release in progress

https://github.com/cortexproject/cortex/blob/master/RELEASE.md#show-that-a-release-is-in-progress

Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Prepare 1.19.0-rc.0

Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Revert "Prepare 1.19.0-rc.0"

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup (#6547)

* Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update tests

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Remove TransferChunks gRPC method (#6543)

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Uupdate Ppromqlsmith (#6557)

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Query Partial Data (#6526)

* Create partial_data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix lazyquery so that warning message is returned

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Add QueryPartialData limit

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix broken mock

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Make response with warnings to be not cached

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Updated streamingSelect in distributor_queryable

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update query.go

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update replication_set

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint again

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Generated doc

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update config description

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Do not remove warnings from seriesSet

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Avoid cache only if the warning message contains partial data error

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Remove context usage for partial data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Refactor how partial data info is passed + apply to series and label methods as well

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint + fix tests

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix build

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Create separate config for ruler partial data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Genereta doc

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Add more tests

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Change error

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix test

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Nit

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Nit

Signed-off-by: Justin Jung <jungjust@amazon.com>

---------

Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add timeout for dynamodb ring kv (#6544)

* add dynamodb kv with timeout enforced

Signed-off-by: yeya24 <benye@amazon.com>

* add tests

Signed-off-by: yeya24 <benye@amazon.com>

* docs

Signed-off-by: Ben Ye <benye@amazon.com>

* update changelog

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Bump the actions-dependencies group across 1 directory with 2 updates (#6564)

Bumps the actions-dependencies group with 2 updates in the / directory: [github/codeql-action](https://github.com/github/codeql-action) and [actions/setup-go](https://github.com/actions/setup-go).

Updates `github/codeql-action` from 3.28.1 to 3.28.7
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@b6a472f...6e54559)

Updates `actions/setup-go` from 5.2.0 to 5.3.0
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@3041bf5...f111f33)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-dependencies
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fix: expanded postings can cache wrong data when queries are issued "in the future" (#6562)

* improve fuzz test for expanded postings cache

Signed-off-by: alanprot <alanprot@gmail.com>

* create more tests on the expanded postings cache

Signed-off-by: alanprot <alanprot@gmail.com>

* adding get series call on the test

Signed-off-by: alanprot <alanprot@gmail.com>

* no use CachedBlockChunkQuerier when query time range is completely after the last sample added in the head

Signed-off-by: alanprot <alanprot@gmail.com>

* adding comments

Signed-off-by: alanprot <alanprot@gmail.com>

* increase the number of fuzz test from 100 to 300

Signed-off-by: alanprot <alanprot@gmail.com>

* add get series fuzzy testing

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Extend ShuffleSharding on READONLY ingesters (#6517)

* Filter readOnly ingesters when sharding

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Extend shard on READONLY

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Remove old code

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Fix test

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* update changelog

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

---------

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Create guide doc for partition compaction

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update docs/guides/partitioning-compactor.md

Co-authored-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <emoc1989@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* updated doc

Signed-off-by: Alex Le <leqiyue@amazon.com>

* clean white space

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>
Signed-off-by: Daniel Sabsay <sabsay@adobe.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com>
Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Alex Le <emoc1989@gmail.com>
Co-authored-by: Alan Protasio <approtas@amazon.com>
Co-authored-by: Daniel Sabsay <danielrsabsay@gmail.com>
Co-authored-by: Daniel Sabsay <sabsay@adobe.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: SungJin1212 <tjdwls1201@gmail.com>
Co-authored-by: Charlie Le <charlie_le@apple.com>
Co-authored-by: Ben Ye <benye@amazon.com>
Co-authored-by: Sam McBroom <86423878+sam-mcbr@users.noreply.github.com>
Co-authored-by: Ahmed Hassan <57634502+afhassan@users.noreply.github.com>
Co-authored-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
Co-authored-by: Daniel Blando <daniel@blando.com.br>
Co-authored-by: Justin Jung <jungjust@amazon.com>
justinjung04 added a commit to justinjung04/cortex that referenced this pull request Mar 27, 2025
* Purge expired postings cache items due inactivity (cortexproject#6502)

* Purge expired postings cache items due inactivity

Signed-off-by: alanprot <alanprot@gmail.com>

* Fix comments

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update thanos to 4ba0ba403896 (cortexproject#6503)

* Update thanos to 4ba0ba403896

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>

* run go mod vendor

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>

---------

Signed-off-by: Daniel Sabsay <sabsay@adobe.com>
Co-authored-by: Daniel Sabsay <sabsay@adobe.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Bump the actions-dependencies group across 1 directory with 2 updates (cortexproject#6505)

Bumps the actions-dependencies group with 2 updates in the / directory: [actions/upload-artifact](https://github.com/actions/upload-artifact) and [github/codeql-action](https://github.com/github/codeql-action).

Updates `actions/upload-artifact` from 4.5.0 to 4.6.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@6f51ac0...65c4c4a)

Updates `github/codeql-action` from 3.28.0 to 3.28.1
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@48ab28a...b6a472f)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-dependencies
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* calculate # of concurrency only once at the runner (cortexproject#6506)

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Implement partition compaction planner (cortexproject#6469)

* Implement partition compaction grouper

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix comment

Signed-off-by: Alex Le <leqiyue@amazon.com>

* replace level 1 compaction limits with ingestion replication factor

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix doc

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update compaction_visit_marker_timeout default value

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update default value for compactor_partition_index_size_limit_in_bytes

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactor code

Signed-off-by: Alex Le <leqiyue@amazon.com>

* address comments and refactor

Signed-off-by: Alex Le <leqiyue@amazon.com>

* address comment

Signed-off-by: Alex Le <leqiyue@amazon.com>

* address comment

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update config name

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Implement partition compaction planner

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix after rebase

Signed-off-by: Alex Le <leqiyue@amazon.com>

* addressed comments

Signed-off-by: Alex Le <leqiyue@amazon.com>

* updated doc and refactored metric

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix test

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add max tenant config to tenant federation (cortexproject#6493)

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add cleaner logic to clean partition compaction blocks and related files (cortexproject#6507)

* Add cleaner logic to clean partition compaction blocks and related files

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactored metrics

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactor

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update logs

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update RELEASE.md (cortexproject#6511)

Maintainers would like an additional week to get the partition compactor changes in before the first release candidate for 1.19.

Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* update thanos version to 236777732278c64ca01c1c09d726f0f712c87164 (cortexproject#6514)

Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fix race that can cause nil reference when using expanded postings (cortexproject#6518)

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add more op label values to cortex_query_frontend_queries_total metric (cortexproject#6519)

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Allow use of non-dualstack endpoints for S3 blocks storage (cortexproject#6522)

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Expose grpc client connect timeout config and default to 5s (cortexproject#6523)

* expose grpc client connect timeout config

Signed-off-by: yeya24 <benye@amazon.com>

* changelog

Signed-off-by: yeya24 <benye@amazon.com>

---------

Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Hook up partition compaction end to end implementation (cortexproject#6510)

* Implemented partition compaction end to end with custom compaction lifecycle

Signed-off-by: Alex Le <leqiyue@amazon.com>

* removed unused variable

Signed-off-by: Alex Le <leqiyue@amazon.com>

* tweak test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* tweak test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* refactor according to comments

Signed-off-by: Alex Le <leqiyue@amazon.com>

* tweak test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* check context error inside sharded posting

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix lint

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix integration test for memberlist

Signed-off-by: Alex Le <leqiyue@amazon.com>

* make compactor initial wait cancellable

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Test for nil on expire expanded postings (cortexproject#6521)

* Test for nil on expire expanded postings

Signed-off-by: alanprot <alanprot@gmail.com>

* stopping ingester

Signed-off-by: alanprot <alanprot@gmail.com>

* refactor the test to not timeout

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* log when a request starts running in querier (cortexproject#6525)

* log when a request starts running in querier

Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>

* log when a request starts running in querier for frontend processor

Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>

---------

Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update build image according to cortexproject@03a8f8c (cortexproject#6508)

Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Deprecate -blocks-storage.tsdb.wal-compression-enabled flag

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fix test (cortexproject#6537)

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Mark 1.19 release in progress

https://github.com/cortexproject/cortex/blob/master/RELEASE.md#show-that-a-release-is-in-progress

Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Prepare 1.19.0-rc.0

Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Revert "Prepare 1.19.0-rc.0"

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup (cortexproject#6547)

* Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup

Signed-off-by: Alex Le <leqiyue@amazon.com>

* update tests

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Remove TransferChunks gRPC method (cortexproject#6543)

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update promqlsmith (cortexproject#6557)

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Query Partial Data (cortexproject#6526)

* Create partial_data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix lazyquery so that warning message is returned

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Add QueryPartialData limit

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix broken mock

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Do not cache responses with warnings

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Updated streamingSelect in distributor_queryable

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update query.go

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update replication_set

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint again

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Generated doc

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update config description

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Do not remove warnings from seriesSet

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Avoid cache only if the warning message contains partial data error

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Remove context usage for partial data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Refactor how partial data info is passed + apply to series and label methods as well

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Lint + fix tests

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix build

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Create separate config for ruler partial data

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Generate doc

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Add more tests

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Change error

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Fix test

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Update changelog

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Nit

Signed-off-by: Justin Jung <jungjust@amazon.com>

* Nit

Signed-off-by: Justin Jung <jungjust@amazon.com>

---------

Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Add timeout for dynamodb ring kv (cortexproject#6544)

* add dynamodb kv with timeout enforced

Signed-off-by: yeya24 <benye@amazon.com>

* add tests

Signed-off-by: yeya24 <benye@amazon.com>

* docs

Signed-off-by: Ben Ye <benye@amazon.com>

* update changelog

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Bump the actions-dependencies group across 1 directory with 2 updates (cortexproject#6564)

Bumps the actions-dependencies group with 2 updates in the / directory: [github/codeql-action](https://github.com/github/codeql-action) and [actions/setup-go](https://github.com/actions/setup-go).

Updates `github/codeql-action` from 3.28.1 to 3.28.7
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@b6a472f...6e54559)

Updates `actions/setup-go` from 5.2.0 to 5.3.0
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@3041bf5...f111f33)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-dependencies
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Fix: expanded postings can cache wrong data when queries are issued "in the future" (cortexproject#6562)

* improve fuzz test for expanded postings cache

Signed-off-by: alanprot <alanprot@gmail.com>

* create more tests on the expanded postings cache

Signed-off-by: alanprot <alanprot@gmail.com>

* adding get series call on the test

Signed-off-by: alanprot <alanprot@gmail.com>

* do not use CachedBlockChunkQuerier when the query time range is completely after the last sample added in the head

Signed-off-by: alanprot <alanprot@gmail.com>

* adding comments

Signed-off-by: alanprot <alanprot@gmail.com>

* increase the number of fuzz tests from 100 to 300

Signed-off-by: alanprot <alanprot@gmail.com>

* add get series fuzzy testing

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Extend ShuffleSharding on READONLY ingesters (cortexproject#6517)

* Filter readOnly ingesters when sharding

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Extend shard on READONLY

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Remove old code

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* Fix test

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

* update changelog

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>

---------

Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Create guide doc for partition compaction

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Update docs/guides/partitioning-compactor.md

Co-authored-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: Alex Le <emoc1989@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* updated doc

Signed-off-by: Alex Le <leqiyue@amazon.com>

* clean white space

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>
Signed-off-by: Daniel Sabsay <sabsay@adobe.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com>
Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com>
Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Alex Le <emoc1989@gmail.com>
Co-authored-by: Alan Protasio <approtas@amazon.com>
Co-authored-by: Daniel Sabsay <danielrsabsay@gmail.com>
Co-authored-by: Daniel Sabsay <sabsay@adobe.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: SungJin1212 <tjdwls1201@gmail.com>
Co-authored-by: Charlie Le <charlie_le@apple.com>
Co-authored-by: Ben Ye <benye@amazon.com>
Co-authored-by: Sam McBroom <86423878+sam-mcbr@users.noreply.github.com>
Co-authored-by: Ahmed Hassan <57634502+afhassan@users.noreply.github.com>
Co-authored-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
Co-authored-by: Daniel Blando <daniel@blando.com.br>
Co-authored-by: Justin Jung <jungjust@amazon.com>
Labels
component/querier, lgtm, size/XL, type/feature

8 participants