-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support vectorized aggregation on Hypercore TAM #7655
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7655 +/- ##
==========================================
+ Coverage 80.06% 81.98% +1.91%
==========================================
Files 190 246 +56
Lines 37181 45067 +7886
Branches 9450 11242 +1792
==========================================
+ Hits 29770 36948 +7178
- Misses 2997 3710 +713
+ Partials 4414 4409 -5 ☔ View full report in Codecov by Sentry. |
486e2b1
to
270fd54
Compare
efd1d0e
to
20de575
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No significant comments. A request to add verbose
so that we can make sure that we do not accidentally introduce bugs and also wondering about the coverage, so would be good to investigate this.
600808e
to
0dba341
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Note that I just merged it with main through the github interface, so you might need to fetch if you want to update something. This should also test the text columns, because I just merged the text column grouping into the main branch.
There was just a cosmetic conflict in the workflows, otherwise it merged automatically. |
It needs to know the value size.
Add support for running VectorAgg on top of scans on Hypercore TAM. Currently, only ColumnarScan can run below VectorAgg when Hypercore TAM is used. In theory, a SeqScan or IndexScan reading from Hypercore TAM should also work because they would produce Arrow slots. However, only ColumnarScan performs vectorized filtering, which is currently assumed to happen before the VectorAgg node. In ColumnarScan, it is necessary to turn off projection when VectorAgg is used. Otherwise, it would project the arrow slot into a virtual slot, thus losing the vector data. Ideally, a projection should never be planned to begin with, but this isn't possible since VectorAgg modifies existing non-vectorized Agg plans that already includes projections.
14c2891
to
6409603
Compare
## 2.19.0 (2025-03-12) This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity. **Features** * [#7586](#7586) Vectorized aggregation with grouping by a single text column. * [#7632](#7632) Optimize recompression for chunks without segmentby * [#7655](#7655) Support vectorized aggregation on Hypercore TAM * [#7669](#7669) Add support for merging compressed chunks * [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on. * [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks * [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses * [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration * [#7788](#7788) Add callback to mem_guard for background workers * [#7789](#7789) Do not recompress segmentwise when default order by is empty * [#7790](#7790) Add configurable Incremental CAgg Refresh Policy **Bugfixes** * [#7665](#7665) Block merging of frozen chunks * [#7673](#7673) Don't abort additional INSERTs when hitting first conflict * [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances with alter table added a new column with a default value, an update and compression in a very particular order. * [#7747](#7747) Block TAM rewrites with incompatible GUC setting * [#7748](#7748) Crash in the segmentwise recompression * [#7764](#7764) Fix compression settings handling in Hypercore TAM * [#7768](#7768) Remove costing index scan of hypertable parent * [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE **Thanks** * @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks * @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
## 2.19.0 (2025-03-18) This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity. * Improved concurrency of INSERT, UPDATE and DELETE operations on the columnstore by no longer blocking DML statements during the recompression of a chunk. * Improved system performance during Continuous Aggregates refreshes by breaking them into smaller batches which reduces systems pressure and minimizes the risk of spilling to disk. * Faster and more up-to-date results for queries against Continuous Aggregates by materializing the most recent data first (vs old data first in prior versions). * Faster analytical queries with SIMD vectorization of aggregations over text columns and group by over multiple column * Enable optimizing chunk size for faster query performance on the columnstore by adding support for merging columnstore chunks to the merge_chunk API. **Deprecation warning** This is the last minor release supporting PostgreSQL 14. Starting with the minor version of TimescaleDB only Postgres 15, 16 and 17 will be supported. **Downgrading of 2.19.0** This release introduces custom bool compression, if you enable this feature via the `enable_bool_compression` and must downgrade to a previous, please use the [following script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql) to convert the columns back to their previous state. TimescaleDB versions prior to 2.19.0 do not know how to handle this new type. **Features** * [#7586](#7586) Vectorized aggregation with grouping by a single text column. * [#7632](#7632) Optimize recompression for chunks without segmentby * [#7655](#7655) Support vectorized aggregation on Hypercore TAM * [#7669](#7669) Add support for merging compressed chunks * [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on. * [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks * [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses * [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration * [#7788](#7788) Add callback to mem_guard for background workers * [#7789](#7789) Do not recompress segmentwise when default order by is empty * [#7790](#7790) Add configurable Incremental CAgg Refresh Policy **Bugfixes** * [#7665](#7665) Block merging of frozen chunks * [#7673](#7673) Don't abort additional INSERTs when hitting first conflict * [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances with alter table added a new column with a default value, an update and compression in a very particular order. * [#7747](#7747) Block TAM rewrites with incompatible GUC setting * [#7748](#7748) Crash in the segmentwise recompression * [#7764](#7764) Fix compression settings handling in Hypercore TAM * [#7768](#7768) Remove costing index scan of hypertable parent * [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE **GUCs** * `enable_bool_compression`: enable the BOOL compression algorithm, default: `OFF` * `enable_exclusive_locking_recompression`: enable exclusive locking during recompression (legacy mode), default: `OFF` **Thanks** * @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks * @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT --------- Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com> Signed-off-by: Ramon Guiu <ramon@timescale.com> Co-authored-by: Ramon Guiu <ramon@timescale.com>
Add support for vectorized aggregation over Hypercore TAM. This includes some refactoring of the VectorAgg node in order to plan vectorized aggregation on top of ColumnarScans.
Currently, only ColumnarScan can run below VectorAgg, because it is doing qual filtering. In theory, a SeqScan reading from Hypercore TAM should also work because it would produce Arrow slots. However, a SeqScan doesn't do vectorized filtering, which is currently assumed to be done before the VectorAgg node.
In ColumnarScan, it necessary to turn off projection when VectorAgg is used. Otherwise, it would project the arrow slot into a virtual slot, thus losing the vector data. Ideally, a projection should never be planned to begin with, but this isn't possible since VectorAgg relies on replacing existing non-vectorized Agg plans added by PostgreSQL.
The existing vectoragg tests can be run with TAM enabled by default and in that case the tests produces the same vectoragg results as without TAM.
Closes: #7654
Disable-check: commit-count