
Optimize recompression for non-segmentby chunks #7632

Merged: 1 commit merged into timescale:main from optimize-non-segmentby on Feb 14, 2025

Conversation

kpan2034 (Contributor)

Enables the segmentwise recompression flow to be used for chunks without segmentby columns.

This should be more performant than doing a full recompression.
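
As a rough illustration of the scenario this change targets (a minimal sketch; the table, column names, and data below are hypothetical and not taken from this PR), a hypertable can be compressed without any compress_segmentby setting and then recompressed after new inserts:

-- Hypothetical hypertable compressed with no segmentby column.
CREATE TABLE metrics (ts timestamptz NOT NULL, device int, value float8);
SELECT create_hypertable('metrics', 'ts');
ALTER TABLE metrics SET (timescaledb.compress);  -- no compress_segmentby configured

INSERT INTO metrics
SELECT t, 1, random()
FROM generate_series('2025-01-01'::timestamptz, '2025-01-02'::timestamptz,
                     '1 minute'::interval) AS t;
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;

-- New rows leave the chunk partially compressed; with this change, recompressing
-- it can take the segmentwise path instead of a full decompress and compress.
INSERT INTO metrics VALUES ('2025-01-01 12:00:30', 1, 42);
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;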


codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.94%. Comparing base (59f50f2) to head (d75ab39).
Report is 765 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7632      +/-   ##
==========================================
+ Coverage   80.06%   81.94%   +1.87%     
==========================================
  Files         190      246      +56     
  Lines       37181    45112    +7931     
  Branches     9450    11254    +1804     
==========================================
+ Hits        29770    36965    +7195     
- Misses       2997     3727     +730     
- Partials     4414     4420       +6     


@@ -168,6 +168,10 @@ recompress_chunk_segmentwise_impl(Chunk *uncompressed_chunk)

CompressedSegmentInfo *current_segment = palloc0(sizeof(CompressedSegmentInfo) * n_keys);

// For chunks with no segmentby settings, we can still do segmentwise recompression
// The entire chunk is treated as a single segment
elog(ts_guc_debug_compression_path_info ? INFO : DEBUG1, "using non-segmentby index for recompression");
Contributor

This will log on every recompression that the non-segmentby index is being used, but that's not always true. You should log the name of the index you are actually using instead (that makes it easy to check which index is being used).

@@ -291,6 +291,23 @@ insert into nullseg_many values (:'start_time', 1, NULL, NULL);
SELECT compress_chunk(:'chunk_to_compress');
select * from :compressed_chunk_name;

-- Test behaviour when no segmentby columns are present
SET timescaledb.debug_compression_path_info TO ON;
Contributor

Let's enable this GUC for the complete test so we can verify that the correct index is being used for each recompression.
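
A minimal sketch of that suggestion (the surrounding statements are placeholders for the existing test contents; :'chunk_to_compress' is the psql variable already used in this test):

-- Enable once near the top of the test file so every compress/recompress below
-- reports which compression path it takes.
SET timescaledb.debug_compression_path_info TO ON;
-- ... existing test statements ...
SELECT compress_chunk(:'chunk_to_compress');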

kpan2034 force-pushed the optimize-non-segmentby branch from 5ab9b12 to 151bbf9 on February 5, 2025 00:05
kpan2034 requested a review from antekresic on February 5, 2025 00:05
kpan2034 force-pushed the optimize-non-segmentby branch from 151bbf9 to 26e2e0b on February 6, 2025 23:12
kpan2034 marked this pull request as ready for review on February 6, 2025 23:12
svenklemm (Member)

Why does the chunk numbering change in all those tests? Is it because we did full decompress/compress before?

@@ -0,0 +1 @@
Implements: #7632 Optimize recompression for non-segmentby chunks
Member

Suggested change:
- Implements: #7632 Optimize recompression for non-segmentby chunks
+ Implements: #7632 Optimize recompression for chunks without segmentby

antekresic (Contributor) left a comment:

🎉

@@ -210,6 +213,12 @@ recompress_chunk_segmentwise_impl(Chunk *uncompressed_chunk)
true /*need_bistate*/,
0 /*insert options*/);

// For chunks with no segmentby settings, we can still do segmentwise recompression
Contributor

For the sake of consistency, let's use multiline comment style.

kpan2034 force-pushed the optimize-non-segmentby branch from 26e2e0b to f75ef02 on February 14, 2025 15:55
kpan2034 enabled auto-merge (squash) on February 14, 2025 15:55
Enables the segmentwise recompression flow to be used for chunks without
segmentby columns.

This should be more performant than doing a full recompression.
kpan2034 force-pushed the optimize-non-segmentby branch from f75ef02 to d75ab39 on February 14, 2025 20:42
kpan2034 merged commit 9b499aa into timescale:main on Feb 14, 2025
47 checks passed
philkra added a commit that referenced this pull request Mar 12, 2025
## 2.19.0 (2025-03-12)

This release contains performance improvements and bug fixes since 
the 2.18.2 release. We recommend that you upgrade at the next 
available opportunity.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
philkra added a commit that referenced this pull request Mar 18, 2025
## 2.19.0 (2025-03-18)

This release contains performance improvements and bug fixes since the
2.18.2 release. We recommend that you upgrade at the next available
opportunity.

* Improved concurrency of INSERT, UPDATE and DELETE operations on the columnstore by no longer blocking DML statements during the recompression of a chunk.
* Improved system performance during Continuous Aggregates refreshes by breaking them into smaller batches, which reduces system pressure and minimizes the risk of spilling to disk.
* Faster and more up-to-date results for queries against Continuous Aggregates by materializing the most recent data first (vs old data first in prior versions).
* Faster analytical queries with SIMD vectorization of aggregations over text columns and group by over multiple columns.
* Enable optimizing chunk size for faster query performance on the columnstore by adding support for merging columnstore chunks to the merge_chunk API.

**Deprecation warning**

This is the last minor release supporting PostgreSQL 14. Starting with the next minor version of TimescaleDB, only PostgreSQL 15, 16 and 17 will be supported.

**Downgrading of 2.19.0**

This release introduces custom bool compression. If you enable this feature via `enable_bool_compression` and must downgrade to a previous version, please use the [following script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql) to convert the columns back to their previous state. TimescaleDB versions prior to 2.19.0 do not know how to handle this new type.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**GUCs**
* `enable_bool_compression`: enable the BOOL compression algorithm, default: `OFF`
* `enable_exclusive_locking_recompression`: enable exclusive locking during recompression (legacy mode), default: `OFF`

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

---------

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Signed-off-by: Ramon Guiu <ramon@timescale.com>
Co-authored-by: Ramon Guiu <ramon@timescale.com>