Support vectorized aggregation on Hypercore TAM #7655

erimatnor · 2025-02-04T15:09:42Z

Add support for vectorized aggregation over Hypercore TAM. This includes some refactoring of the VectorAgg node in order to plan vectorized aggregation on top of ColumnarScans.

Currently, only ColumnarScan can run below VectorAgg, because it performs vectorized qual filtering. In theory, a SeqScan reading from Hypercore TAM should also work because it would produce Arrow slots. However, a SeqScan doesn't do vectorized filtering, which VectorAgg currently assumes has already happened in the node below it.
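The eligibility rule described above can be sketched as a small predicate. This is only an illustrative model, not the planner code from this PR: `ScanKind`, `ChildScan`, `does_vectorized_filtering` and `can_plan_vector_agg` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scan kinds; these are NOT TimescaleDB's actual node tags. */
typedef enum ScanKind
{
	SCAN_SEQ,
	SCAN_COLUMNAR
} ScanKind;

/* Hypothetical description of the scan node sitting below the aggregate. */
typedef struct ChildScan
{
	ScanKind kind;
	bool produces_arrow_slots; /* true when reading from Hypercore TAM */
} ChildScan;

/* Assumption taken from the description: only ColumnarScan filters in a
 * vectorized fashion. */
static bool
does_vectorized_filtering(const ChildScan *scan)
{
	return scan->kind == SCAN_COLUMNAR;
}

/* VectorAgg needs Arrow slots as input and assumes that vectorized qual
 * filtering has already been applied by the child scan. */
static bool
can_plan_vector_agg(const ChildScan *scan)
{
	assert(scan != NULL);
	return scan->produces_arrow_slots && does_vectorized_filtering(scan);
}
```

In this model a SeqScan over Hypercore TAM is rejected even though it produces Arrow slots, because the filtering assumption does not hold for it.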

In ColumnarScan, it is necessary to turn off projection when VectorAgg is used. Otherwise, it would project the Arrow slot into a virtual slot, thus losing the vector data. Ideally, a projection should never be planned to begin with, but this isn't possible since VectorAgg relies on replacing existing non-vectorized Agg plans added by PostgreSQL.
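Why projection loses the vector data can be shown with a toy model. The sketch below is an assumption-laden simplification: `Slot`, `project_to_virtual` and `scan_emit` are hypothetical names, and real PostgreSQL tuple slots are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot model: an "arrow" slot references a whole column of
 * values, while a "virtual" slot only carries the value of one row. */
typedef struct Slot
{
	bool is_arrow;
	const int *column; /* column data, set only for arrow slots */
	size_t nrows;      /* number of rows in the column */
	int value;         /* single row value, used by virtual slots */
} Slot;

/* Projecting copies one row out of the column; the reference to the
 * column data is not carried over. */
static Slot
project_to_virtual(const Slot *arrow, size_t row)
{
	assert(arrow->is_arrow && row < arrow->nrows);
	Slot v = { .is_arrow = false, .column = NULL, .nrows = 0, .value = arrow->column[row] };
	return v;
}

/* When a vectorized aggregate sits on top, the scan skips projection and
 * hands the arrow slot through unchanged. */
static Slot
scan_emit(const Slot *arrow, size_t row, bool skip_projection)
{
	return skip_projection ? *arrow : project_to_virtual(arrow, row);
}
```

With `skip_projection` set, the consumer still sees the full column; without it, only a single row value survives, which is what a vectorized aggregate cannot work with.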

The existing vectoragg tests can be run with TAM enabled by default, and in that case the tests produce the same vectoragg results as without TAM.

Closes: #7654

Disable-check: commit-count

codecov · 2025-02-04T15:20:20Z

Codecov Report

Attention: Patch coverage is 79.79798% with 20 lines in your changes missing coverage. Please review.

Project coverage is 81.98%. Comparing base (59f50f2) to head (6409603).
Report is 762 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tsl/src/nodes/vector_agg/vector_slot.h | 59.37% | 11 Missing and 2 partials ⚠️ |
| tsl/src/nodes/vector_agg/exec.c | 85.36% | 0 Missing and 6 partials ⚠️ |
| tsl/src/nodes/vector_agg/plan_tam.c | 93.75% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7655      +/-   ##
==========================================
+ Coverage   80.06%   81.98%   +1.91%     
==========================================
  Files         190      246      +56     
  Lines       37181    45067    +7886     
  Branches     9450    11242    +1792     
==========================================
+ Hits        29770    36948    +7178     
- Misses       2997     3710     +713     
+ Partials     4414     4409       -5

mkindahl

No significant comments. One request: add verbose so that we can make sure that we do not accidentally introduce bugs. I am also wondering about the coverage, so it would be good to investigate that.

akuzm

LGTM. Note that I just merged it with main through the github interface, so you might need to fetch if you want to update something. This should also test the text columns, because I just merged the text column grouping into the main branch.

akuzm · 2025-02-13T13:16:31Z

> I just merged it with main through the github interface

There was just a cosmetic conflict in the workflows, otherwise it merged automatically.

Add support for running VectorAgg on top of scans on Hypercore TAM. Currently, only ColumnarScan can run below VectorAgg when Hypercore TAM is used. In theory, a SeqScan or IndexScan reading from Hypercore TAM should also work because they would produce Arrow slots. However, only ColumnarScan performs vectorized filtering, which is currently assumed to happen before the VectorAgg node. In ColumnarScan, it is necessary to turn off projection when VectorAgg is used. Otherwise, it would project the Arrow slot into a virtual slot, thus losing the vector data. Ideally, a projection should never be planned to begin with, but this isn't possible since VectorAgg modifies existing non-vectorized Agg plans that already include projections.

## 2.19.0 (2025-03-12)

This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances with alter table added a new column with a default value, an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>

## 2.19.0 (2025-03-18)

This release contains performance improvements and bug fixes since the 2.18.2 release. We recommend that you upgrade at the next available opportunity.

* Improved concurrency of INSERT, UPDATE and DELETE operations on the columnstore by no longer blocking DML statements during the recompression of a chunk.
* Improved system performance during Continuous Aggregates refreshes by breaking them into smaller batches, which reduces system pressure and minimizes the risk of spilling to disk.
* Faster and more up-to-date results for queries against Continuous Aggregates by materializing the most recent data first (vs old data first in prior versions).
* Faster analytical queries with SIMD vectorization of aggregations over text columns and group by over multiple columns
* Enable optimizing chunk size for faster query performance on the columnstore by adding support for merging columnstore chunks to the merge_chunk API.

**Deprecation warning**

This is the last minor release supporting PostgreSQL 14. Starting with the next minor version of TimescaleDB, only Postgres 15, 16 and 17 will be supported.

**Downgrading of 2.19.0**

This release introduces custom bool compression. If you enable this feature via the `enable_bool_compression` setting and must downgrade to a previous version, please use the [following script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql) to convert the columns back to their previous state. TimescaleDB versions prior to 2.19.0 do not know how to handle this new type.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fixes a wrong result when compressed NULL values were confused with default values. This happened in very special circumstances with alter table added a new column with a default value, an update and compression in a very particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**GUCs**
* `enable_bool_compression`: enable the BOOL compression algorithm, default: `OFF`
* `enable_exclusive_locking_recompression`: enable exclusive locking during recompression (legacy mode), default: `OFF`

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

---------

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Signed-off-by: Ramon Guiu <ramon@timescale.com>
Co-authored-by: Ramon Guiu <ramon@timescale.com>

erimatnor added table-access-method hypercore labels Feb 4, 2025

erimatnor requested a review from mkindahl February 4, 2025 15:09

github-actions bot assigned erimatnor Feb 4, 2025

erimatnor force-pushed the hypercore-vectoragg branch from 486e2b1 to 270fd54 Compare February 4, 2025 15:20

akuzm reviewed Feb 4, 2025

tsl/src/nodes/vector_agg/plan.c (resolved)

erimatnor force-pushed the hypercore-vectoragg branch 9 times, most recently from efd1d0e to 20de575 Compare February 6, 2025 13:47

erimatnor marked this pull request as ready for review February 6, 2025 16:53

erimatnor requested a review from akuzm February 6, 2025 16:53

akuzm reviewed Feb 10, 2025

tsl/src/nodes/vector_agg/exec.c (resolved)

akuzm reviewed Feb 10, 2025

tsl/src/nodes/vector_agg/plan.c (outdated, resolved)

akuzm reviewed Feb 10, 2025

tsl/src/nodes/vector_agg/vector_slot.h (outdated, resolved)

akuzm reviewed Feb 10, 2025

tsl/src/nodes/vector_agg/exec.c (outdated, resolved)

mkindahl approved these changes Feb 11, 2025

tsl/src/nodes/vector_agg/exec.h (resolved)

tsl/test/expected/hypercore_vectoragg.out (outdated, resolved)

erimatnor force-pushed the hypercore-vectoragg branch 7 times, most recently from 600808e to 0dba341 Compare February 12, 2025 12:36

erimatnor requested a review from akuzm February 12, 2025 14:48

akuzm approved these changes Feb 13, 2025

akuzm and others added 2 commits February 13, 2025 18:30

Interface changes for the serialized hashing strategy

f66768c

It needs to know the value size.

erimatnor force-pushed the hypercore-vectoragg branch from 14c2891 to 6409603 Compare February 13, 2025 17:33

erimatnor enabled auto-merge (rebase) February 13, 2025 18:03

erimatnor merged commit af64c7b into timescale:main Feb 13, 2025
49 of 50 checks passed

erimatnor deleted the hypercore-vectoragg branch February 13, 2025 18:29

erimatnor mentioned this pull request Feb 14, 2025

Interface changes for the serialized hashing strategy #7657

Closed

This was referenced Mar 12, 2025

CHANGELOG for 2.19.0 #7824

Closed

CHANGELOG for 2.19.0 #7829

Merged

bayandin mentioned this pull request Mar 21, 2025

timescaledb 2.19.0 bayandin/homebrew-tap#255

Closed
