
Optimize recompression for non-segmentby chunks #7632

Merged: 1 commit merged into timescale:main from optimize-non-segmentby on Feb 14, 2025

Conversation

kpan2034 (Contributor)

Enables the segmentwise recompression flow to be used for chunks without segmentby columns.

This should be more performant than doing a full recompression.
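
As a rough illustration of the scenario this change targets (a minimal sketch; the table, column names, and data below are hypothetical and not taken from this PR), a hypertable can be compressed without any compress_segmentby setting and then recompressed after new inserts:

-- Hypothetical hypertable compressed with no segmentby column.
CREATE TABLE metrics (ts timestamptz NOT NULL, device int, value float8);
SELECT create_hypertable('metrics', 'ts');
ALTER TABLE metrics SET (timescaledb.compress);  -- no compress_segmentby configured

INSERT INTO metrics
SELECT t, 1, random()
FROM generate_series('2025-01-01'::timestamptz, '2025-01-02'::timestamptz,
                     '1 minute'::interval) AS t;
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;

-- New rows leave the chunk partially compressed; with this change, recompressing
-- it can take the segmentwise path instead of a full decompress and compress.
INSERT INTO metrics VALUES ('2025-01-01 12:00:30', 1, 42);
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;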


codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.94%. Comparing base (59f50f2) to head (d75ab39).
Report is 765 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7632      +/-   ##
==========================================
+ Coverage   80.06%   81.94%   +1.87%     
==========================================
  Files         190      246      +56     
  Lines       37181    45112    +7931     
  Branches     9450    11254    +1804     
==========================================
+ Hits        29770    36965    +7195     
- Misses       2997     3727     +730     
- Partials     4414     4420       +6     


@@ -168,6 +168,10 @@ recompress_chunk_segmentwise_impl(Chunk *uncompressed_chunk)

CompressedSegmentInfo *current_segment = palloc0(sizeof(CompressedSegmentInfo) * n_keys);

// For chunks with no segmentby settings, we can still do segmentwise recompression
// The entire chunk is treated as a single segment
elog(ts_guc_debug_compression_path_info ? INFO : DEBUG1, "using non-segmentby index for recompression");
Contributor

This will log on every recompression that the non-segmentby index is being used, but that's not always true. You should log the name of the index you are actually using instead (that makes it easy to check which index is being used).

@@ -291,6 +291,23 @@ insert into nullseg_many values (:'start_time', 1, NULL, NULL);
SELECT compress_chunk(:'chunk_to_compress');
select * from :compressed_chunk_name;

-- Test behaviour when no segmentby columns are present
SET timescaledb.debug_compression_path_info TO ON;
Contributor

Let's enable this GUC for the complete test so we can verify that the correct index is being used for each recompression.
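
A minimal sketch of that suggestion (the surrounding statements are placeholders for the existing test contents; :'chunk_to_compress' is the psql variable already used in this test):

-- Enable once near the top of the test file so every compress/recompress below
-- reports which compression path it takes.
SET timescaledb.debug_compression_path_info TO ON;
-- ... existing test statements ...
SELECT compress_chunk(:'chunk_to_compress');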

kpan2034 force-pushed the optimize-non-segmentby branch from 5ab9b12 to 151bbf9 on February 5, 2025 00:05
kpan2034 requested a review from antekresic on February 5, 2025 00:05
kpan2034 force-pushed the optimize-non-segmentby branch from 151bbf9 to 26e2e0b on February 6, 2025 23:12
kpan2034 marked this pull request as ready for review on February 6, 2025 23:12
svenklemm (Member)

Why does the chunk numbering change in all those tests? Is it because we did full decompress/compress before?

@@ -0,0 +1 @@
Implements: #7632 Optimize recompression for non-segmentby chunks
Member

Suggested change:
- Implements: #7632 Optimize recompression for non-segmentby chunks
+ Implements: #7632 Optimize recompression for chunks without segmentby

antekresic (Contributor) left a comment:

🎉

@@ -210,6 +213,12 @@ recompress_chunk_segmentwise_impl(Chunk *uncompressed_chunk)
true /*need_bistate*/,
0 /*insert options*/);

// For chunks with no segmentby settings, we can still do segmentwise recompression
Contributor

For the sake of consistency, let's use multiline comment style.

kpan2034 force-pushed the optimize-non-segmentby branch from 26e2e0b to f75ef02 on February 14, 2025 15:55
kpan2034 enabled auto-merge (squash) on February 14, 2025 15:55
Enables the segmentwise recompression flow to be used for chunks without
segmentby columns.

This should be more performant than doing a full recompression.
kpan2034 force-pushed the optimize-non-segmentby branch from f75ef02 to d75ab39 on February 14, 2025 20:42
kpan2034 merged commit 9b499aa into timescale:main on Feb 14, 2025
47 checks passed
philkra added a commit that referenced this pull request Mar 12, 2025
## 2.19.0 (2025-03-12)

This release contains performance improvements and bug fixes since 
the 2.18.2 release. We recommend that you upgrade at the next 
available opportunity.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
philkra added a commit that referenced this pull request Mar 18, 2025
## 2.19.0 (2025-03-18)

This release contains performance improvements and bug fixes since the
2.18.2 release. We recommend that you upgrade at the next available
opportunity.

* Improved concurrency of INSERT, UPDATE and DELETE operations on the columnstore by no longer blocking DML statements during the recompression of a chunk.
* Improved system performance during Continuous Aggregates refreshes by breaking them into smaller batches, which reduces system pressure and minimizes the risk of spilling to disk.
* Faster and more up-to-date results for queries against Continuous Aggregates by materializing the most recent data first (vs old data first in prior versions).
* Faster analytical queries with SIMD vectorization of aggregations over text columns and group by over multiple columns.
* Enable optimizing chunk size for faster query performance on the columnstore by adding support for merging columnstore chunks to the merge_chunk API.

**Deprecation warning**

This is the last minor release supporting PostgreSQL 14. Starting with the next minor version of TimescaleDB, only PostgreSQL 15, 16 and 17 will be supported.

**Downgrading of 2.19.0**

This release introduces custom bool compression. If you enable this feature via `enable_bool_compression` and must downgrade to a previous version, please use the [following script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql) to convert the columns back to their previous state. TimescaleDB versions prior to 2.19.0 do not know how to handle this new type.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**GUCs**
* `enable_bool_compression`: enable the BOOL compression algorithm, default: `OFF`
* `enable_exclusive_locking_recompression`: enable exclusive locking during recompression (legacy mode), default: `OFF`

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

---------

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Signed-off-by: Ramon Guiu <ramon@timescale.com>
Co-authored-by: Ramon Guiu <ramon@timescale.com>