Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement bool compression (#7233) #7701

Merged
merged 1 commit into from
Feb 17, 2025

Conversation

dbeck
Copy link
Contributor

@dbeck dbeck commented Feb 12, 2025

Reusing the existing Simple8bRLE algorithm for bools. I added a new
compression type specifically for this case called 'bool'.

A new GUC is introduced so we can revert to the previous, array
compression for bools: timescaledb.enable_bool_compression.
It defaults to false.

To enable bool compression set the GUC:

timescaledb.enable_bool_compression=true

Fixes #7233

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 81.87311% with 60 lines in your changes missing coverage. Please review.

Project coverage is 81.93%. Comparing base (59f50f2) to head (d9e4d1b).
Report is 769 commits behind head on main.

Files with missing lines Patch % Lines
tsl/test/src/compression_unit_test.c 77.20% 0 Missing and 31 partials ⚠️
tsl/src/compression/algorithms/bool_compress.c 86.66% 4 Missing and 20 partials ⚠️
tsl/src/compression/compression.c 70.00% 3 Missing ⚠️
tsl/test/src/compression_sql_test.c 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7701      +/-   ##
==========================================
+ Coverage   80.06%   81.93%   +1.86%     
==========================================
  Files         190      247      +57     
  Lines       37181    45447    +8266     
  Branches     9450    11355    +1905     
==========================================
+ Hits        29770    37235    +7465     
- Misses       2997     3738     +741     
- Partials     4414     4474      +60     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

{
BoolDecompressionIterator *iterator = palloc(sizeof(*iterator));
decompression_iterator_init(iterator,
(void *) PG_DETOAST_DATUM(bool_compressed),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make detoasting the caller's responsibility? We have some faster functions for that in detoaster.c. You can change it to Assert(VARATT_IS_4B_U(bool_compresed)) -- probably, I forgot the TOAST types again...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I make this the caller's responsibility, then it will be inconsistent with the other compression algorithms. Do you want to change them too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised a separate issue to address this across all compression algorithms: #7716

{
CompressedDataHeaderFields;
uint8 has_nulls; /* 1 if this has a NULLs bitmap after the values, 0 otherwise */
uint8 padding[2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other compression algorithms had this alignment to 8 bytes. Do we not need it?

{
compressor->has_nulls = true;
simple8brle_compressor_append(&compressor->values, compressor->last_value);
simple8brle_compressor_append(&compressor->nulls, 1);
Copy link
Member

@akuzm akuzm Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're adding 1 for a null value, so this means you'll have to negate it after decompression to match the Arrow format? Let's switch to validity bitmap instead, so that we decompress straight to Arrow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it is in the other compression algorithms, but happy to change this.

Comment on lines +171 to +172
.bool_compressor_append = tsl_bool_compressor_append,
.bool_compressor_finish = tsl_bool_compressor_finish,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need them in cross-module functions, they shouldn't be called from the Apache part, right? For testing, you can add the TSL functions directly like it's done for ts_read_compressed_data_file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again for consistency with the other compression algos.

@dbeck dbeck force-pushed the bool-compress-reindent branch 2 times, most recently from e9372f0 to d566381 Compare February 17, 2025 13:51
Copy link
Member

@akuzm akuzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think we have discussed all the important points here and on Slack, so let's proceed with it.

@dbeck dbeck force-pushed the bool-compress-reindent branch 3 times, most recently from ebc550e to c756e51 Compare February 17, 2025 15:10
@dbeck dbeck force-pushed the bool-compress-reindent branch 2 times, most recently from 8333066 to 032cc4a Compare February 17, 2025 15:59
@dbeck dbeck force-pushed the bool-compress-reindent branch from 032cc4a to 76d7bb8 Compare February 17, 2025 16:21
@dbeck dbeck enabled auto-merge (rebase) February 17, 2025 17:40
Reusing the existing Simple8bRLE algorithm for bools. I added a new
compression type specifically for this case called 'bool'.

A new GUC is introduced so we can revert to the previous, array
compression for bools: `timescaledb.enable_bool_compression`.
It defaults to `false`.

To enable bool compression set the GUC:

`timescaledb.enable_bool_compression=true`

Fixes timescale#7233
@dbeck dbeck force-pushed the bool-compress-reindent branch from 76d7bb8 to d9e4d1b Compare February 17, 2025 18:35
@dbeck dbeck merged commit d7a8b4b into timescale:main Feb 17, 2025
67 checks passed
philkra added a commit that referenced this pull request Mar 12, 2025
## 2.19.0 (2025-03-12)

This release contains performance improvements and bug fixes since 
the 2.18.2 release. We recommend that you upgrade at the next 
available opportunity.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances with alter table added a new column with a default value, an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
This was referenced Mar 12, 2025
philkra added a commit that referenced this pull request Mar 18, 2025
## 2.19.0 (2025-03-18)

This release contains performance improvements and bug fixes since the
2.18.2 release. We recommend that you upgrade at the next available
opportunity.

* Improved concurrency of INSERT, UPDATE and DELETE operations on the
columnstore by no longer blocking DML statements during the
recompression of a chunk.
* Improved system performance during Continuous Aggregates refreshes by
breaking them into smaller batches which reduces systems pressure and
minimizes the risk of spilling to disk.
* Faster and more up-to-date results for queries against Continuous
Aggregates by materializing the most recent data first (vs old data
first in prior versions).
* Faster analytical queries with SIMD vectorization of aggregations over
text columns and group by over multiple column
* Enable optimizing chunk size for faster query performance on the
columnstore by adding support for merging columnstore chunks to the
merge_chunk API.

**Deprecation warning**

This is the last minor release supporting PostgreSQL 14. Starting with
the minor version of TimescaleDB only Postgres 15, 16 and 17 will be
supported.

**Downgrading of 2.19.0**

This release introduces custom bool compression, if you enable this
feature via the `enable_bool_compression` and must downgrade to a
previous, please use the [following
script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql)
to convert the columns back to their previous state. TimescaleDB
versions prior to 2.19.0 do not know how to handle this new type.

**Features**
* [#7586](#7586) Vectorized
aggregation with grouping by a single text column.
* [#7632](#7632) Optimize
recompression for chunks without segmentby
* [#7655](#7655) Support
vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add
support for merging compressed chunks
* [#7701](#7701) Implement
a custom compression algorithm for bool columns. It is experimental and
can undergo backwards-incompatible changes. For testing, enable it using
timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support
ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb
as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show
warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add
callback to mem_guard for background workers
* [#7789](#7789) Do not
recompress segmentwise when default order by is empty
* [#7790](#7790) Add
configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block
merging of frozen chunks
* [#7673](#7673) Don't
abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a
wrong result when compressed NULL values were confused with default
values. This happened in very special circumstances with alter table
added a new column with a default value, an update and compression in a
very particular order.
* [#7747](#7747) Block TAM
rewrites with incompatible GUC setting
* [#7748](#7748) Crash in
the segmentwise recompression
* [#7764](#7764) Fix
compression settings handling in Hypercore TAM
* [#7768](#7768) Remove
costing index scan of hypertable parent
* [#7799](#7799) Handle
DEFAULT table access name in ALTER TABLE

**GUCs**
* `enable_bool_compression`: enable the BOOL compression algorithm,
default: `OFF`
* `enable_exclusive_locking_recompression`: enable exclusive locking
during recompression (legacy mode), default: `OFF`

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT
DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with
DEFAULT

---------

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Signed-off-by: Ramon Guiu <ramon@timescale.com>
Co-authored-by: Ramon Guiu <ramon@timescale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement bool compression
3 participants