Hook up partition compaction end to end implementation #6510
LGTM
LGTM
@@ -698,15 +754,26 @@ func (c *Compactor) stopping(_ error) error {
}

func (c *Compactor) running(ctx context.Context) error {
	// Ensure an initial cleanup occurred as first thing when running compactor.
	if err := services.StartAndAwaitRunning(ctx, c.blocksCleaner); err != nil {
Is there a specific reason why we have to move cleaning here?
Because a cleaner cycle can run for a while, depending on how many tenants there are and how large each tenant is. We don't want the compactor to become unhealthy in the ring because of a long-running cleaner process.
func (f *DisabledDeduplicateFilter) DuplicateIDs() []ulid.ULID {
	return nil
}
Can we make these types private?
Can you add a comment to DisabledDeduplicateFilter? Do we want to disable the duplicate filter because it makes no sense for the partitioning compactor, where we always have duplicates?
The DefaultDeduplicateFilter from Thanos marks blocks as duplicates if they are from the same group and have the same source blocks. In the partitioning compactor, partitions from the same time range always, or eventually, have the same sources; that is the nature of the partitioning compactor. We don't want those blocks to be filtered out when grouping for the next level of compaction.
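A simplified model of the problem, for illustration only: this is not the Thanos filter, and `blockMeta`, `sourceBasedDuplicates`, and `disabledDuplicates` are hypothetical names. It treats any block whose source set matches an earlier block's as a duplicate, which is enough to show why sibling partitions would be dropped and why a no-op filter avoids that.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// blockMeta is a hypothetical, stripped-down block descriptor.
type blockMeta struct {
	ID      string
	Sources []string // IDs of the blocks this block was compacted from
}

// sourceKey builds an order-independent key from a block's sources.
func sourceKey(sources []string) string {
	s := append([]string(nil), sources...) // copy so the caller's slice is not reordered
	sort.Strings(s)
	return strings.Join(s, ",")
}

// sourceBasedDuplicates reports every block whose source set was already
// seen on an earlier block. This is a simplification of source-based
// deduplication, not the real algorithm.
func sourceBasedDuplicates(blocks []blockMeta) []string {
	seen := map[string]bool{}
	var dups []string
	for _, b := range blocks {
		k := sourceKey(b.Sources)
		if seen[k] {
			dups = append(dups, b.ID)
			continue
		}
		seen[k] = true
	}
	return dups
}

// disabledDuplicates mirrors the idea of a no-op filter: it never reports duplicates.
func disabledDuplicates(_ []blockMeta) []string { return nil }

func main() {
	// Two partitions of the same time range share the same source blocks.
	partitions := []blockMeta{
		{ID: "partition-0", Sources: []string{"A", "B", "C"}},
		{ID: "partition-1", Sources: []string{"C", "A", "B"}},
	}
	fmt.Println("source-based:", sourceBasedDuplicates(partitions))
	fmt.Println("disabled:", disabledDuplicates(partitions))
}
```

With the source-based model, the second partition is flagged even though it holds different series from the first; the disabled variant leaves both available for the next compaction level.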
	globalMaxt := blocks[0].Meta().MaxTime
	g, _ := errgroup.WithContext(ctx)
	g.SetLimit(8)
Is this a sane default to set in Cortex?
In my tests, 8 is enough to keep the CPU busy during compaction. I am wondering whether this number is too high for end users: would it just cause CPU usage to stay pegged at 100% for longer?
…fecycle Signed-off-by: Alex Le <leqiyue@amazon.com>
5e89a40 to be06845
…#6510)
* Implemented partition compaction end to end with custom compaction lifecycle
* removed unused variable
* tweak test
* tweak test
* refactor according to comments
* tweak test
* check context error inside sharded posting
* fix lint
* fix integration test for memberlist
* make compactor initial wait cancellable
---------
Signed-off-by: Alex Le <leqiyue@amazon.com>
* Purge expired postings cache items due inactivity (#6502) * Purge expired postings cache items due inactivity Signed-off-by: alanprot <alanprot@gmail.com> * Fix comments Signed-off-by: alanprot <alanprot@gmail.com> --------- Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Update thanos to 4ba0ba403896 (#6503) * Update thanos to 4ba0ba403896 Signed-off-by: Daniel Sabsay <sabsay@adobe.com> * run go mod vendor Signed-off-by: Daniel Sabsay <sabsay@adobe.com> --------- Signed-off-by: Daniel Sabsay <sabsay@adobe.com> Co-authored-by: Daniel Sabsay <sabsay@adobe.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Bump the actions-dependencies group across 1 directory with 2 updates (#6505) Bumps the actions-dependencies group with 2 updates in the / directory: [actions/upload-artifact](https://github.com/actions/upload-artifact) and [github/codeql-action](https://github.com/github/codeql-action). Updates `actions/upload-artifact` from 4.5.0 to 4.6.0 - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@6f51ac0...65c4c4a) Updates `github/codeql-action` from 3.28.0 to 3.28.1 - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@48ab28a...b6a472f) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions-dependencies - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions-dependencies ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * calculate # of concurrency only once at the runner (#6506) Signed-off-by: SungJin1212 <tjdwls1201@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Implement partition compaction planner (#6469) * Implement partition compaction grouper Signed-off-by: Alex Le <leqiyue@amazon.com> * fix comment Signed-off-by: Alex Le <leqiyue@amazon.com> * replace level 1 compaction limits with ingestion replication factor Signed-off-by: Alex Le <leqiyue@amazon.com> * fix doc Signed-off-by: Alex Le <leqiyue@amazon.com> * update compaction_visit_marker_timeout default value Signed-off-by: Alex Le <leqiyue@amazon.com> * update default value for compactor_partition_index_size_limit_in_bytes Signed-off-by: Alex Le <leqiyue@amazon.com> * refactor code Signed-off-by: Alex Le <leqiyue@amazon.com> * address comments and refactor Signed-off-by: Alex Le <leqiyue@amazon.com> * address comment Signed-off-by: Alex Le <leqiyue@amazon.com> * address comment Signed-off-by: Alex Le <leqiyue@amazon.com> * update config name Signed-off-by: Alex Le <leqiyue@amazon.com> * Implement partition compaction planner Signed-off-by: Alex Le <leqiyue@amazon.com> * fix after rebase Signed-off-by: Alex Le <leqiyue@amazon.com> * addressed comments Signed-off-by: Alex Le <leqiyue@amazon.com> * updated doc and refactored metric Signed-off-by: Alex Le <leqiyue@amazon.com> * fix test Signed-off-by: Alex Le <leqiyue@amazon.com> --------- Signed-off-by: Alex Le <leqiyue@amazon.com> * Add max tenant config to tenant federation (#6493) Signed-off-by: SungJin1212 <tjdwls1201@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Add cleaner logic to clean partition compaction blocks and related files (#6507) * Add cleaner logic to clean partition compaction blocks and related files Signed-off-by: Alex Le <leqiyue@amazon.com> * refactored 
metrics Signed-off-by: Alex Le <leqiyue@amazon.com> * refactor Signed-off-by: Alex Le <leqiyue@amazon.com> * update logs Signed-off-by: Alex Le <leqiyue@amazon.com> --------- Signed-off-by: Alex Le <leqiyue@amazon.com> * Update RELEASE.md (#6511) Maintainers would like an additional week to get the partition compactor changes in before the first release candidate for 1.19. Signed-off-by: Charlie Le <charlie_le@apple.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * update thanos version to 236777732278c64ca01c1c09d726f0f712c87164 (#6514) Signed-off-by: yeya24 <benye@amazon.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Fix race that can cause nil reference when using expanded postings (#6518) Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Add more op label values to cortex_query_frontend_queries_total metric (#6519) Signed-off-by: Alex Le <leqiyue@amazon.com> * Allow use of non-dualstack endpoints for S3 blocks storage (#6522) Signed-off-by: Alex Le <leqiyue@amazon.com> * Expose grpc client connect timeout config and default to 5s (#6523) * expose grpc client connect timeout config Signed-off-by: yeya24 <benye@amazon.com> * changelog Signed-off-by: yeya24 <benye@amazon.com> --------- Signed-off-by: yeya24 <benye@amazon.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Hook up partition compaction end to end implementation (#6510) * Implemented partition compaction end to end with custom compaction lifecycle Signed-off-by: Alex Le <leqiyue@amazon.com> * removed unused variable Signed-off-by: Alex Le <leqiyue@amazon.com> * tweak test Signed-off-by: Alex Le <leqiyue@amazon.com> * tweak test Signed-off-by: Alex Le <leqiyue@amazon.com> * refactor according to comments Signed-off-by: Alex Le <leqiyue@amazon.com> * tweak test Signed-off-by: Alex Le <leqiyue@amazon.com> * check context error inside sharded posting Signed-off-by: Alex Le <leqiyue@amazon.com> * fix lint Signed-off-by: Alex Le <leqiyue@amazon.com> * fix 
integration test for memberlist Signed-off-by: Alex Le <leqiyue@amazon.com> * make compactor initial wait cancellable Signed-off-by: Alex Le <leqiyue@amazon.com> --------- Signed-off-by: Alex Le <leqiyue@amazon.com> * Test for nil on expire expanded postings (#6521) * Test for nil on expire expanded postings Signed-off-by: alanprot <alanprot@gmail.com> * stopping ingester Signed-off-by: alanprot <alanprot@gmail.com> * refactor the test to not timeout Signed-off-by: alanprot <alanprot@gmail.com> --------- Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * log when a request starts running in querier (#6525) * log when a request starts running in querier Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com> * log when a request starts running in querier for frontend processor Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com> --------- Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Update build image according to 03a8f8c (#6508) Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Deprecate -blocks-storage.tsdb.wal-compression-enabled flag Signed-off-by: SungJin1212 <tjdwls1201@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Fix test (#6537) Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Mark 1.19 release in progress https://github.com/cortexproject/cortex/blob/master/RELEASE.md#show-that-a-release-is-in-progress Signed-off-by: Charlie Le <charlie_le@apple.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Prepare 1.19.0-rc.0 Signed-off-by: Charlie Le <charlie_le@apple.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Revert "Prepare 1.19.0-rc.0" Signed-off-by: Alex Le <leqiyue@amazon.com> * Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup (#6547) * Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup Signed-off-by: Alex 
Le <leqiyue@amazon.com> * update tests Signed-off-by: Alex Le <leqiyue@amazon.com> --------- Signed-off-by: Alex Le <leqiyue@amazon.com> * Remove TransferChunks gRPC method (#6543) Signed-off-by: SungJin1212 <tjdwls1201@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Uupdate Ppromqlsmith (#6557) Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Query Partial Data (#6526) * Create partial_data Signed-off-by: Justin Jung <jungjust@amazon.com> * Fix lazyquery so that warning message is returned Signed-off-by: Justin Jung <jungjust@amazon.com> * Add QueryPartialData limit Signed-off-by: Justin Jung <jungjust@amazon.com> * Fix broken mock Signed-off-by: Justin Jung <jungjust@amazon.com> * Make response with warnings to be not cached Signed-off-by: Justin Jung <jungjust@amazon.com> * Updated streamingSelect in distributor_queryable Signed-off-by: Justin Jung <jungjust@amazon.com> * Update query.go Signed-off-by: Justin Jung <jungjust@amazon.com> * Update replication_set Signed-off-by: Justin Jung <jungjust@amazon.com> * Lint Signed-off-by: Justin Jung <jungjust@amazon.com> * Lint again Signed-off-by: Justin Jung <jungjust@amazon.com> * Generated doc Signed-off-by: Justin Jung <jungjust@amazon.com> * Changelog Signed-off-by: Justin Jung <jungjust@amazon.com> * Update config description Signed-off-by: Justin Jung <jungjust@amazon.com> * Do not remove warnings from seriesSet Signed-off-by: Justin Jung <jungjust@amazon.com> * Avoid cache only if the warning message contains partial data error Signed-off-by: Justin Jung <jungjust@amazon.com> * Remove context usage for partial data Signed-off-by: Justin Jung <jungjust@amazon.com> * Refactor how partial data info is passed + apply to series and label methods as well Signed-off-by: Justin Jung <jungjust@amazon.com> * Lint + fix tests Signed-off-by: Justin Jung <jungjust@amazon.com> * Fix build Signed-off-by: Justin Jung <jungjust@amazon.com> * Create separate config for ruler 
partial data Signed-off-by: Justin Jung <jungjust@amazon.com> * Genereta doc Signed-off-by: Justin Jung <jungjust@amazon.com> * Add more tests Signed-off-by: Justin Jung <jungjust@amazon.com> * Change error Signed-off-by: Justin Jung <jungjust@amazon.com> * Fix test Signed-off-by: Justin Jung <jungjust@amazon.com> * Update changelog Signed-off-by: Justin Jung <jungjust@amazon.com> * Update changelog Signed-off-by: Justin Jung <jungjust@amazon.com> * Nit Signed-off-by: Justin Jung <jungjust@amazon.com> * Nit Signed-off-by: Justin Jung <jungjust@amazon.com> --------- Signed-off-by: Justin Jung <jungjust@amazon.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Add timeout for dynamodb ring kv (#6544) * add dynamodb kv with timeout enforced Signed-off-by: yeya24 <benye@amazon.com> * add tests Signed-off-by: yeya24 <benye@amazon.com> * docs Signed-off-by: Ben Ye <benye@amazon.com> * update changelog Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: yeya24 <benye@amazon.com> Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Bump the actions-dependencies group across 1 directory with 2 updates (#6564) Bumps the actions-dependencies group with 2 updates in the / directory: [github/codeql-action](https://github.com/github/codeql-action) and [actions/setup-go](https://github.com/actions/setup-go). 
Updates `github/codeql-action` from 3.28.1 to 3.28.7 - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@b6a472f...6e54559) Updates `actions/setup-go` from 5.2.0 to 5.3.0 - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](actions/setup-go@3041bf5...f111f33) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions-dependencies - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Fix: expanded postings can cache wrong data when queries are issued "in the future" (#6562) * improve fuzz test for expanded postings cache Signed-off-by: alanprot <alanprot@gmail.com> * create more tests on the expanded postings cache Signed-off-by: alanprot <alanprot@gmail.com> * adding get series call on the test Signed-off-by: alanprot <alanprot@gmail.com> * no use CachedBlockChunkQuerier when query time range is completely after the last sample added in the head Signed-off-by: alanprot <alanprot@gmail.com> * adding comments Signed-off-by: alanprot <alanprot@gmail.com> * increase the number of fuzz test from 100 to 300 Signed-off-by: alanprot <alanprot@gmail.com> * add get series fuzzy testing Signed-off-by: alanprot <alanprot@gmail.com> --------- Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Extend ShuffleSharding on READONLY ingesters (#6517) * Filter readOnly ingesters when sharding Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com> * Extend shard on READONLY Signed-off-by: Daniel Deluiggi 
<ddeluigg@amazon.com> * Remove old code Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com> * Fix test Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com> * update changelog Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com> --------- Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * Create guide doc for partition compaction Signed-off-by: Alex Le <leqiyue@amazon.com> * Update docs/guides/partitioning-compactor.md Co-authored-by: Charlie Le <charlie_le@apple.com> Signed-off-by: Alex Le <emoc1989@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> * updated doc Signed-off-by: Alex Le <leqiyue@amazon.com> * clean white space Signed-off-by: Alex Le <leqiyue@amazon.com> --------- Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alex Le <leqiyue@amazon.com> Signed-off-by: Daniel Sabsay <sabsay@adobe.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: SungJin1212 <tjdwls1201@gmail.com> Signed-off-by: Charlie Le <charlie_le@apple.com> Signed-off-by: yeya24 <benye@amazon.com> Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com> Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Daniel Deluiggi <ddeluigg@amazon.com> Signed-off-by: Justin Jung <jungjust@amazon.com> Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Alex Le <emoc1989@gmail.com> Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Daniel Sabsay <danielrsabsay@gmail.com> Co-authored-by: Daniel Sabsay <sabsay@adobe.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: SungJin1212 <tjdwls1201@gmail.com> Co-authored-by: Charlie Le <charlie_le@apple.com> Co-authored-by: Ben Ye <benye@amazon.com> Co-authored-by: Sam McBroom <86423878+sam-mcbr@users.noreply.github.com> Co-authored-by: Ahmed Hassan <57634502+afhassan@users.noreply.github.com> Co-authored-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com> Co-authored-by: 
Daniel Blando <daniel@blando.com.br> Co-authored-by: Justin Jung <jungjust@amazon.com>
What this PR does:
Implements the partitioning compaction lifecycle functions needed to make partitioning compaction work end to end.
- `PartitionCompactionBlockDeletableChecker` makes sure no parent blocks are deleted after each compaction. The cleaner handles parent block cleanup for partitioning compaction.
- `ShardedBlockPopulator` uses `ShardedPosting` to include only particular series in the result block.
- `ShardedCompactionLifecycleCallback` emits partitioning compaction metrics at the beginning and end of each compaction. It also initializes `ShardedBlockPopulator` for each compaction.

Which issue(s) this PR fixes:
Fixes #
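For readers unfamiliar with the sharding idea behind `ShardedBlockPopulator` and `ShardedPosting` in the summary above, here is a minimal sketch. It assumes a series is assigned to a partition by hashing it and taking the hash modulo the partition count. The names `seriesHash`, `belongsToPartition`, and `filterSeries` are hypothetical, and FNV-1a over a string key is only a stand-in for however the real code hashes a series' labels. This is not the code in this PR.

```go
package main

import "hash/fnv"

// seriesHash returns a stable 64-bit hash for a series key.
// Assumption: FNV-1a over a string key stands in for the label-set
// hash used by the real implementation.
func seriesHash(seriesKey string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(seriesKey)) // Write on a hash.Hash never returns an error.
	return h.Sum64()
}

// belongsToPartition reports whether the series falls into partitionID
// when series are split across partitionCount partitions.
// A partitionCount of zero owns nothing (and avoids a division by zero).
func belongsToPartition(seriesKey string, partitionCount, partitionID uint64) bool {
	if partitionCount == 0 {
		return false
	}
	return seriesHash(seriesKey)%partitionCount == partitionID
}

// filterSeries keeps only the series owned by partitionID, preserving
// the input order. This mirrors, in spirit, a posting iterator that
// skips series belonging to other partitions.
func filterSeries(series []string, partitionCount, partitionID uint64) []string {
	var kept []string
	for _, s := range series {
		if belongsToPartition(s, partitionCount, partitionID) {
			kept = append(kept, s)
		}
	}
	return kept
}
```

Under this scheme every series lands in exactly one partition, so running `filterSeries` once per partition ID covers the whole input without overlap.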
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`