Implement bool compression (#7233) #7701

Merged
merged 1 commit into from
Feb 17, 2025
1 change: 1 addition & 0 deletions .unreleased/pr_7701
@@ -0,0 +1 @@
Implements: #7701 Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
3 changes: 3 additions & 0 deletions sql/updates/latest-dev.sql
@@ -6,3 +6,6 @@ CREATE FUNCTION _timescaledb_functions.compressed_data_has_nulls(_timescaledb_in
RETURNS BOOL
LANGUAGE C STRICT IMMUTABLE
AS '@MODULE_PATHNAME@', 'ts_update_placeholder';

INSERT INTO _timescaledb_catalog.compression_algorithm( id, version, name, description) values
( 5, 1, 'COMPRESSION_ALGORITHM_BOOL', 'bool');
1 change: 1 addition & 0 deletions sql/updates/reverse-dev.sql
@@ -6,3 +6,4 @@ ALTER TABLE _timescaledb_internal.bgw_job_stat_history

DROP FUNCTION IF EXISTS _timescaledb_functions.compressed_data_has_nulls(_timescaledb_internal.compressed_data);

DELETE FROM _timescaledb_catalog.compression_algorithm WHERE id = 5 AND version = 1 AND name = 'COMPRESSION_ALGORITHM_BOOL';
4 changes: 4 additions & 0 deletions src/cross_module_fn.c
@@ -77,6 +77,8 @@ CROSSMODULE_WRAPPER(dictionary_compressor_append);
CROSSMODULE_WRAPPER(dictionary_compressor_finish);
CROSSMODULE_WRAPPER(array_compressor_append);
CROSSMODULE_WRAPPER(array_compressor_finish);
CROSSMODULE_WRAPPER(bool_compressor_append);
CROSSMODULE_WRAPPER(bool_compressor_finish);
CROSSMODULE_WRAPPER(create_compressed_chunk);
CROSSMODULE_WRAPPER(compress_chunk);
CROSSMODULE_WRAPPER(decompress_chunk);
@@ -419,6 +421,8 @@ TSDLLEXPORT CrossModuleFunctions ts_cm_functions_default = {
.dictionary_compressor_finish = error_no_default_fn_pg_community,
.array_compressor_append = error_no_default_fn_pg_community,
.array_compressor_finish = error_no_default_fn_pg_community,
.bool_compressor_append = error_no_default_fn_pg_community,
.bool_compressor_finish = error_no_default_fn_pg_community,
.hypercore_handler = process_hypercore_handler,
.hypercore_proxy_handler = process_hypercore_proxy_handler,
.is_compressed_tid = error_no_default_fn_pg_community,
2 changes: 2 additions & 0 deletions src/cross_module_fn.h
@@ -150,6 +150,8 @@ typedef struct CrossModuleFunctions
PGFunction dictionary_compressor_finish;
PGFunction array_compressor_append;
PGFunction array_compressor_finish;
PGFunction bool_compressor_append;
PGFunction bool_compressor_finish;
PGFunction hypercore_handler;
PGFunction hypercore_proxy_handler;
PGFunction is_compressed_tid;
12 changes: 12 additions & 0 deletions src/guc.c
@@ -149,6 +149,7 @@ TSDLLEXPORT bool ts_guc_auto_sparse_indexes = true;
TSDLLEXPORT bool ts_guc_default_hypercore_use_access_method = false;
bool ts_guc_enable_chunk_skipping = false;
TSDLLEXPORT bool ts_guc_enable_segmentwise_recompression = true;
TSDLLEXPORT bool ts_guc_enable_bool_compression = false;

/* Enable of disable columnar scans for columnar-oriented storage engines. If
* disabled, regular sequence scans will be used instead. */
@@ -746,6 +747,17 @@ _guc_init(void)
NULL,
NULL);

DefineCustomBoolVariable(MAKE_EXTOPTION("enable_bool_compression"),
"Enable experimental bool compression functionality",
"Enable bool compression",
&ts_guc_enable_bool_compression,
false,
PGC_USERSET,
0,
NULL,
NULL,
NULL);

/*
* Define the limit on number of invalidation-based refreshes we allow per
* refresh call. If this limit is exceeded, fall back to a single refresh that
1 change: 1 addition & 0 deletions src/guc.h
@@ -69,6 +69,7 @@ extern TSDLLEXPORT bool ts_guc_enable_delete_after_compression;
extern TSDLLEXPORT bool ts_guc_enable_merge_on_cagg_refresh;
extern bool ts_guc_enable_chunk_skipping;
extern TSDLLEXPORT bool ts_guc_enable_segmentwise_recompression;
extern TSDLLEXPORT bool ts_guc_enable_bool_compression;

#ifdef USE_TELEMETRY
typedef enum TelemetryLevel
9 changes: 9 additions & 0 deletions tsl/src/compression/README.md
@@ -67,6 +67,15 @@ structure and does not actually compress it (though TOAST-based compression
can be applied on top). It is the compression mechanism used when no other
compression mechanism works. It can store any type of data.

### Bool Compressor

The bool compressor is a simple compression algorithm that stores boolean values
using only the simple8b_rle algorithm, without any additional processing. During
decompression it unpacks the data into an in-memory bitmap, and the row-based
iterators then walk through that bitmap. The bool compressor differs from the
other compressors in that it stores the last non-null value as a placeholder for
each null value. This is done to make vectorization easier.
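
The placeholder idea described above can be illustrated with a minimal sketch.
This is not the code from `bool_compress.c`: the type `BoolBatchSketch`, the
functions `sketch_init`, `sketch_append`, and `sketch_get`, and the `MAX_ROWS`
limit are all hypothetical names invented for this illustration. The sketch also
assumes that nulls are tracked in a separate validity bitmap, which the text
above does not state, and it leaves out the simple8b_rle encoding step entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ROWS 1024 /* hypothetical batch size for this sketch */

/* Hypothetical in-memory layout: one value bit and one validity bit per row. */
typedef struct BoolBatchSketch
{
	uint64_t values[MAX_ROWS / 64];   /* null rows repeat the last non-null value */
	uint64_t validity[MAX_ROWS / 64]; /* assumption: 1 = not null, 0 = null */
	size_t num_rows;
	bool last_value; /* last non-null value seen; false before any */
} BoolBatchSketch;

static void
sketch_init(BoolBatchSketch *b)
{
	memset(b, 0, sizeof(*b));
}

/* Set the bit for `row` if `bit` is true; bitmaps start zeroed. */
static void
set_bit(uint64_t *bitmap, size_t row, bool bit)
{
	if (bit)
		bitmap[row / 64] |= UINT64_C(1) << (row % 64);
}

static bool
get_bit(const uint64_t *bitmap, size_t row)
{
	return (bitmap[row / 64] >> (row % 64)) & 1;
}

/* Append one row. A null row stores the last non-null value as a placeholder,
 * so value bit i always lines up with row i. */
static void
sketch_append(BoolBatchSketch *b, bool value, bool is_null)
{
	assert(b->num_rows < MAX_ROWS);
	if (!is_null)
		b->last_value = value;
	set_bit(b->values, b->num_rows, b->last_value);
	set_bit(b->validity, b->num_rows, !is_null);
	b->num_rows++;
}

/* Row-based read: returns false for a null row and leaves *value untouched. */
static bool
sketch_get(const BoolBatchSketch *b, size_t row, bool *value)
{
	assert(row < b->num_rows);
	if (!get_bit(b->validity, row))
		return false;
	*value = get_bit(b->values, row);
	return true;
}
```

Because every row owns exactly one bit in `values`, a consumer can process the
value bitmap 64 rows at a time and combine it with `validity` afterwards,
without skipping over gaps left by nulls. Repeating the previous value also
keeps runs of identical bits unbroken, which should suit a run-length encoder.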

# Merging chunks while compressing #

## Setup ##
3 changes: 2 additions & 1 deletion tsl/src/compression/algorithms/CMakeLists.txt
@@ -3,5 +3,6 @@ set(SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/datum_serialize.c
${CMAKE_CURRENT_SOURCE_DIR}/deltadelta.c
${CMAKE_CURRENT_SOURCE_DIR}/dictionary.c
- ${CMAKE_CURRENT_SOURCE_DIR}/gorilla.c)
+ ${CMAKE_CURRENT_SOURCE_DIR}/gorilla.c
+ ${CMAKE_CURRENT_SOURCE_DIR}/bool_compress.c)
target_sources(${TSL_LIBRARY_NAME} PRIVATE ${SOURCES})