Optimize recompression for non-segmentby chunks #7632
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #7632      +/-   ##
==========================================
+ Coverage   80.06%   81.94%    +1.87%
==========================================
  Files         190      246       +56
  Lines       37181    45112     +7931
  Branches     9450    11254     +1804
==========================================
+ Hits        29770    36965     +7195
- Misses       2997     3727      +730
- Partials     4414     4420        +6

☔ View full report in Codecov by Sentry.
tsl/src/compression/recompress.c (Outdated)

@@ -168,6 +168,10 @@ recompress_chunk_segmentwise_impl(Chunk *uncompressed_chunk)

	CompressedSegmentInfo *current_segment = palloc0(sizeof(CompressedSegmentInfo) * n_keys);

	// For chunks with no segmentby settings, we can still do segmentwise recompression
	// The entire chunk is treated as a single segment
	elog(ts_guc_debug_compression_path_info ? INFO : DEBUG1, "using non-segmentby index for recompression");
This will claim that a non-segmentby index is being used every time it logs, but that's not always true. You should log the name of the index you are using instead (that makes it easy to check which index is actually being used).
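A minimal sketch of what that could look like, assuming the chosen index OID is available at this point; `log_recompression_index` is a hypothetical helper and the `extern` declaration stands in for the extension's GUC header, so this is not the actual patch:

```c
#include "postgres.h"
#include "utils/lsyscache.h" /* get_rel_name() */

/* Assumed to be declared in the extension's GUC header. */
extern bool ts_guc_debug_compression_path_info;

/*
 * Hypothetical helper: report which index the segmentwise recompression is
 * going to scan, instead of a fixed message that may not match reality.
 */
static void
log_recompression_index(Oid index_oid)
{
	char *index_name = get_rel_name(index_oid);

	elog(ts_guc_debug_compression_path_info ? INFO : DEBUG1,
		 "using index \"%s\" for recompression",
		 index_name ? index_name : "(unnamed)");
}
```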
@@ -291,6 +291,23 @@ insert into nullseg_many values (:'start_time', 1, NULL, NULL);
SELECT compress_chunk(:'chunk_to_compress');
select * from :compressed_chunk_name;

-- Test behaviour when no segmentby columns are present
SET timescaledb.debug_compression_path_info TO ON;
Let's enable this GUC for the complete test so we can verify that the correct index is being used for each recompression.
Why does the chunk numbering change in all those tests? Is it because we did a full decompress/compress before?
.unreleased/pr_7632 (Outdated)

@@ -0,0 +1 @@
Implements: #7632 Optimize recompression for non-segmentby chunks
Suggested change:
- Implements: #7632 Optimize recompression for non-segmentby chunks
+ Implements: #7632 Optimize recompression for chunks without segmentby
🎉
tsl/src/compression/recompress.c (Outdated)

@@ -210,6 +213,12 @@ recompress_chunk_segmentwise_impl(Chunk *uncompressed_chunk)
								 true /*need_bistate*/,
								 0 /*insert options*/);

	// For chunks with no segmentby settings, we can still do segmentwise recompression
For the sake of consistency, let's use the multiline comment style.
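For reference, a sketch of that comment in the multiline style, with the wording taken from the diff above:

```c
/*
 * For chunks with no segmentby settings, we can still do segmentwise
 * recompression: the entire chunk is treated as a single segment.
 */
```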
Enables the segmentwise recompression flow to be used for chunks without segmentby columns. This should be more performant than doing a full recompression.
## 2.19.0 (2025-03-12)

This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fix a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
## 2.19.0 (2025-03-18)

This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity.

* Improved concurrency of INSERT, UPDATE and DELETE operations on the columnstore by no longer blocking DML statements during the recompression of a chunk.
* Improved system performance during Continuous Aggregate refreshes by breaking them into smaller batches, which reduces system pressure and minimizes the risk of spilling to disk.
* Faster and more up-to-date results for queries against Continuous Aggregates by materializing the most recent data first (vs. old data first in prior versions).
* Faster analytical queries with SIMD vectorization of aggregations over text columns and GROUP BY over multiple columns.
* Enable optimizing chunk size for faster query performance on the columnstore by adding support for merging columnstore chunks to the merge_chunk API.

**Deprecation warning**

This is the last minor release supporting PostgreSQL 14. Starting with the next minor version of TimescaleDB, only PostgreSQL 15, 16 and 17 will be supported.

**Downgrading of 2.19.0**

This release introduces custom bool compression. If you enable this feature via `enable_bool_compression` and must downgrade to a previous version, please use the [following script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql) to convert the columns back to their previous state. TimescaleDB versions prior to 2.19.0 do not know how to handle this new type.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fix a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**GUCs**
* `enable_bool_compression`: enable the BOOL compression algorithm, default: `OFF`
* `enable_exclusive_locking_recompression`: enable exclusive locking during recompression (legacy mode), default: `OFF`

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

---------

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Signed-off-by: Ramon Guiu <ramon@timescale.com>
Co-authored-by: Ramon Guiu <ramon@timescale.com>
Enables the segmentwise recompression flow to be used for chunks without segmentby columns.
This should be more performant than doing a full recompression.
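As a rough illustration of the idea (not the actual implementation; all identifiers below are hypothetical): with no segmentby columns there is only one possible grouping of rows, so the segmentwise machinery can process the whole chunk as a single segment instead of falling back to a full decompress-and-compress cycle.

```c
/*
 * Conceptual sketch only -- SegmentPlan, plan_segments and the parameters are
 * hypothetical names, not identifiers from the TimescaleDB source.
 */
typedef struct SegmentPlan
{
	int n_segments; /* number of groups the recompression loop will process */
} SegmentPlan;

static SegmentPlan
plan_segments(int n_segmentby_keys, int n_distinct_segmentby_groups)
{
	SegmentPlan plan;

	if (n_segmentby_keys == 0)
	{
		/* No segmentby columns: the whole chunk is one segment, so the
		 * segmentwise path still applies and a full recompression is avoided. */
		plan.n_segments = 1;
	}
	else
	{
		/* One segment per distinct combination of segmentby values, as before. */
		plan.n_segments = n_distinct_segmentby_groups;
	}

	return plan;
}
```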