Incremental CAgg Refresh Policy #7790
Conversation
**Codecov Report**

Attention: Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #7790      +/-   ##
==========================================
+ Coverage   80.06%   81.89%    +1.82%
==========================================
  Files         190      247       +57
  Lines       37181    45685     +8504
  Branches     9450    11431     +1981
==========================================
+ Hits        29770    37412     +7642
- Misses       2997     3776      +779
- Partials     4414     4497       +83
```
Some minor comments. Since it is still in draft, I will wait to approve until you have the final version.
A few questions about some parts of the code where I am not sure whether they are correct.
The default behavior will process all batches; this option makes sense ONLY when …
Related to this PR: timescale/timescaledb#7790 Signed-off-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
## 2.19.0 (2025-03-12)

This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using `timescaledb.enable_bool_compression = on`.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow `tsdb` as alias for `timescaledb` in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting the first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances: an ALTER TABLE that added a new column with a default value, an update, and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
## 2.19.0 (2025-03-18)

This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity.

* Improved concurrency of INSERT, UPDATE, and DELETE operations on the columnstore by no longer blocking DML statements during the recompression of a chunk.
* Improved system performance during Continuous Aggregate refreshes by breaking them into smaller batches, which reduces system pressure and minimizes the risk of spilling to disk.
* Faster and more up-to-date results for queries against Continuous Aggregates by materializing the most recent data first (vs. old data first in prior versions).
* Faster analytical queries with SIMD vectorization of aggregations over text columns and GROUP BY over multiple columns.
* Enable optimizing chunk size for faster query performance on the columnstore by adding support for merging columnstore chunks to the merge_chunk API.

**Deprecation warning**

This is the last minor release supporting PostgreSQL 14. Starting with the next minor version of TimescaleDB, only PostgreSQL 15, 16, and 17 will be supported.

**Downgrading from 2.19.0**

This release introduces custom bool compression. If you enable this feature via the `enable_bool_compression` GUC and must downgrade to a previous version, please use the [following script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql) to convert the columns back to their previous state. TimescaleDB versions prior to 2.19.0 do not know how to handle this new type.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using `timescaledb.enable_bool_compression = on`.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow `tsdb` as alias for `timescaledb` in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting the first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances: an ALTER TABLE that added a new column with a default value, an update, and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**GUCs**
* `enable_bool_compression`: enable the BOOL compression algorithm, default: `OFF`
* `enable_exclusive_locking_recompression`: enable exclusive locking during recompression (legacy mode), default: `OFF`

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

---------

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Signed-off-by: Ramon Guiu <ramon@timescale.com>
Co-authored-by: Ramon Guiu <ramon@timescale.com>
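The release notes above mention the experimental bool compression and its `timescaledb.enable_bool_compression` GUC. A minimal sketch of trying it out; the GUC name comes from the notes, while the table and column names are illustrative only:

```sql
-- Enable the experimental bool compression algorithm (default: OFF).
SET timescaledb.enable_bool_compression = on;

-- Illustrative hypertable with a boolean column (names are hypothetical).
CREATE TABLE events (
    "time"    timestamptz NOT NULL,
    device_id integer,
    is_alert  boolean
);
SELECT create_hypertable('events', 'time');

-- Enable compression, then compress the existing chunks; with the GUC on,
-- boolean columns are compressed with the new algorithm.
ALTER TABLE events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
SELECT compress_chunk(c) FROM show_chunks('events') AS c;
```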
Currently, a Continuous Aggregate refresh policy processes the entire refresh window in a single pass, regardless of how large that window is. For example, if you have a hypertable with a huge number of rows, refreshing a CAgg can take a long time and require a lot of resources in terms of CPU, memory, and I/O, and the aggregated data becomes visible to users only when the refresh policy completes its execution.

This PR adds the capability for a CAgg refresh policy to be executed incrementally in "batches". Each "batch" is an individual transaction that processes a small fraction of the entire refresh window, and once a "batch" finishes executing, the refreshed data is already visible to users, even before the policy execution ends.

To tweak and control the incremental refresh, some new options were added to the `add_continuous_aggregate_policy` API (see the example after this list):

* `buckets_per_batch`: number of buckets to be refreshed per "batch". This value is multiplied by the CAgg bucket width to determine the size of each batch's refresh range. The default value is `0` (zero), which keeps the current behavior of single-batch execution. Values less than `0` (zero) are not allowed.
* `max_batches_per_execution`: maximum number of batches to run per policy execution. This option limits the number of batches processed by a single policy execution; if some batches remain, they will be processed the next time the policy runs. The default value is `10` (ten), meaning each job execution processes at most ten batches. To make it unlimited, set the value to `0` (zero). Values less than `0` (zero) are not allowed.
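A minimal sketch of a policy using the new options. The option names `buckets_per_batch` and `max_batches_per_execution` come from this PR; the `conditions` table, the `conditions_hourly` view, and the interval values are illustrative only:

```sql
-- Illustrative source hypertable.
CREATE TABLE conditions (
    "time"      timestamptz NOT NULL,
    device_id   integer,
    temperature double precision
);
SELECT create_hypertable('conditions', 'time');

-- CAgg with a 1 hour bucket width.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', "time") AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device_id
WITH NO DATA;

-- Refresh incrementally: each batch covers 12 buckets (12 x 1 hour), and a
-- single policy execution processes at most 5 batches; remaining batches are
-- picked up by the next scheduled run.
SELECT add_continuous_aggregate_policy('conditions_hourly',
    start_offset              => INTERVAL '30 days',
    end_offset                => INTERVAL '1 hour',
    schedule_interval         => INTERVAL '1 hour',
    buckets_per_batch         => 12,
    max_batches_per_execution => 5);
```

With these settings each transaction materializes a 12-hour range, so newly refreshed data becomes visible batch by batch instead of only when the whole job finishes.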