
Can btree index be used to speed up sorting operations of columnar table? #187

Closed
ZhiXingHeYiApple opened this issue Oct 29, 2023 · 9 comments


@ZhiXingHeYiApple

What's wrong?

From the documentation, we know that columnar tables currently support btree and hash indexes.
I have a columnar table with a little more than six million rows. The table's id column is the primary key (which automatically creates a btree index). But when I run a query sorted by id on this table, it is extremely slow even though the query plan shows an index scan is used. The root cause appears to be Buffers: shared hit=65743485 read=4259

select count(*) from log_delta_flow_stat;
count  |
-------+
6088351|


explain (analyze true, buffers true) select id from log_delta_flow_stat order by id offset 0 limit 10;

QUERY PLAN                                                                                                                                                           |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Limit  (cost=0.43..0.70 rows=10 width=8) (actual time=45106.868..45106.869 rows=0 loops=1)                                                                           |
  Buffers: shared hit=65743485 read=4259                                                                                                                             |
  ->  Index Scan using log_delta_flow_stat_pkey on log_delta_flow_stat  (cost=0.43..162174.32 rows=6086603 width=8) (actual time=45106.866..45106.867 rows=0 loops=1)|
        Buffers: shared hit=65743485 read=4259                                                                                                                       |
Planning:                                                                                                                                                            |
  Buffers: shared hit=884                                                                                                                                            |
Planning Time: 0.572 ms                                                                                                                                              |
Execution Time: 45139.394 ms                                                                                                                                         |

When I remove the order by id clause, the query (which uses ColumnarScan) is fast.

explain (analyze true, buffers true) select id from log_delta_flow_stat offset 0 limit 10;

QUERY PLAN                                                                                                                                     |
-----------------------------------------------------------------------------------------------------------------------------------------------+
Limit  (cost=0.00..1.00 rows=10 width=8) (actual time=0.667..0.671 rows=10 loops=1)                                                            |
  Buffers: shared hit=67 read=4                                                                                                                |
  ->  Custom Scan (ColumnarScan) on log_delta_flow_stat  (cost=0.00..610628.95 rows=6094581 width=8) (actual time=0.666..0.668 rows=10 loops=1)|
        Columnar Projected Columns: id                                                                                                         |
        Cache Hits: 0                                                                                                                          |
        Cache Misses: 1                                                                                                                        |
        Cache Evictions: 0                                                                                                                     |
        Cache Writes: 1                                                                                                                        |
        Cache Maximum Size: 80000                                                                                                              |
        Cache Ending Size: 80000                                                                                                               |
        Total Cache Entries: 1                                                                                                                 |
        Buffers: shared hit=67 read=4                                                                                                          |
Planning:                                                                                                                                      |
  Buffers: shared hit=683                                                                                                                      |
Planning Time: 0.624 ms                                                                                                                        |
Execution Time: 0.719 ms                                                                                                                       |

Thanks for your reply.

@ZhiXingHeYiApple ZhiXingHeYiApple added the bug Something isn't working label Oct 29, 2023
@mkaruza
Contributor

mkaruza commented Oct 30, 2023

Hi @ZhiXingHeYiApple

  • Columnar storage does not work well with random access; there is overhead in fetching and decompressing a whole chunk just to find a specific tuple. Currently, a chunk that has been read stays in memory only if the next tuple fetched is in the same chunk; otherwise it is removed from memory.

  • If your table has many columns, an index scan will instruct the storage engine to retrieve all columns rather than just the specific one needed, and that also causes overhead.

In short, these are known limitations of columnar storage with index scans.

@wuputah wuputah changed the title [Bug]: Whether btree index can speed up sorting operations of columnar table? Can btree index be used to speed up sorting operations of columnar table? Oct 30, 2023
@wuputah wuputah removed the bug Something isn't working label Oct 30, 2023
@leoyvens

I've been measuring BTree index scans on Hydra tables vs heap tables, and I've found that their performance can be improved by lowering stripe_row_limit and rebuilding the table.
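As a minimal sketch of the tuning just described, assuming Hydra exposes the citus-style `alter_columnar_table_set()` helper (the table name is illustrative):

```sql
-- Sketch only: assumes the citus-style alter_columnar_table_set() helper
-- is available; 'my_table' is an illustrative name.
SELECT alter_columnar_table_set('my_table', stripe_row_limit => 10000);

-- Rewrite the table so existing data is restored into the new stripe layout.
VACUUM FULL my_table;
```

The new stripe limit only applies to data written after the setting change, which is why the rewrite step is needed for existing rows.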

In my results, with 10k rows per stripe, for a bigint column, heap outperforms by 10x. Not so bad for Hydra, but this point from @mkaruza could explain why it doesn't match heap table performance:

If your table has many columns, an index scan will instruct the storage engine to retrieve all columns rather than just the specific one needed, and that also causes overhead.

My table has 20 columns. This seems like an unfortunate interaction between btree indexes and Hydra, and I have some follow-up questions. How can I tell if this is happening? Is there some DB config I can set to prevent the index scan from doing this? Could Hydra itself implement an optimization for this, or are you blocked by limitations in PG core?

@JerrySievert
Contributor

You can enable column decompression caching, which might help with random querying like you are doing.

You can read more about enabling and tuning it at https://docs.hydra.so/concepts/optimizing-query-performance#column-cache
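For reference, enabling the cache is roughly the following; the GUC names here are an assumption based on the linked Hydra docs and may differ by version, so verify against your installation:

```sql
-- Assumed GUC names from the Hydra column-cache docs; verify against
-- your version before relying on them.
SET columnar.enable_column_cache = 't';
SET columnar.column_cache_size = '2000MB';  -- cache budget, tune to workload
```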

@leoyvens

Thanks, though from my experiments my impression is that having the disk page in the OS or PG buffer cache has more impact than having it in the decompression cache; decompression didn't seem like a major bottleneck. My questions were more specific to the situation where columns are over-fetched on an index scan.

@JerrySievert
Contributor

The decompression cache is meant to help when you are actively decompressing the same chunk repeatedly, since the PG/OS caches only hold the compressed chunks. Its benefit depends entirely on how often the same chunks get decompressed. If you enable it, the cache counts show up in EXPLAIN ANALYZE, letting you quickly determine whether it is worth enabling.

@leoyvens

I'm on Hydra Cloud, so that must already be on. The cache statistics don't show up in Index Scan query plans. Anyway, for my case I'm measuring the performance of 'cold' reads of rows not in any cache.

@JerrySievert
Contributor

The cache lives only for the duration of the select query itself, even a cold one; it is not a general-purpose cache, and it only helps when you are continually reading the same chunks.

@mkaruza
Contributor

mkaruza commented Oct 31, 2023

Hi @leoyvens

My table has 20 columns. This seems like an unfortunate interaction between btree indexes and Hydra, and I have some follow-up questions. How can I tell if this is happening? Is there some DB config I can set to prevent the index scan from doing this? Could Hydra itself implement an optimization for this, or are you blocked by limitations in PG core?

There is no config to control this, and it always happens: the index scan node requests a tuple from the storage layer (heap, columnar, or something else). For heap this is cheap, since it locates the row tuple and returns it, while for columnar we need to fetch all columns (each holding a default of 10k values per chunk) and decompress them just to return the single tuple requested by the index scan.
We are working on an improvement that will address this issue, but it will not fix all cases; the benefit will depend on the query and the plan that gets executed.

@wuputah
Member

wuputah commented Dec 13, 2023

Addressed in #205. Please give it a try when you can and let us know if it helps. You will need to SET columnar.enable_columnar_index_scan = true; to use the new functionality.
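To try it against the reporter's original query, a session-level check might look like this (the SET is the one named above; the query is the one from the issue report):

```sql
-- Enable the new columnar index scan (from #205) for this session,
-- then re-check the plan for the original slow query.
SET columnar.enable_columnar_index_scan = true;

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM log_delta_flow_stat ORDER BY id OFFSET 0 LIMIT 10;
```

If the setting takes effect, the shared-buffer hit count should drop well below the 65M reported in the original plan.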

@wuputah wuputah closed this as completed Dec 13, 2023