
Can btree index be used to speed up sorting operations of columnar table? #187

Closed
ZhiXingHeYiApple opened this issue Oct 29, 2023 · 9 comments


@ZhiXingHeYiApple

What's wrong?

From the documentation, we know that columnar tables currently support btree and hash indexes.
I have a columnar table with a little more than six million rows. The table's id column is the primary key (which automatically creates a btree index). But when I run a query sorted by id on this table, it is extremely slow even though the query plan shows an index scan is used. The root cause appears to be Buffers: shared hit=65743485 read=4259

select count(*) from log_delta_flow_stat;
count  |
-------+
6088351|


explain (analyze true, buffers true) select id from log_delta_flow_stat order by id offset 0 limit 10;

QUERY PLAN                                                                                                                                                           |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Limit  (cost=0.43..0.70 rows=10 width=8) (actual time=45106.868..45106.869 rows=0 loops=1)                                                                           |
  Buffers: shared hit=65743485 read=4259                                                                                                                             |
  ->  Index Scan using log_delta_flow_stat_pkey on log_delta_flow_stat  (cost=0.43..162174.32 rows=6086603 width=8) (actual time=45106.866..45106.867 rows=0 loops=1)|
        Buffers: shared hit=65743485 read=4259                                                                                                                       |
Planning:                                                                                                                                                            |
  Buffers: shared hit=884                                                                                                                                            |
Planning Time: 0.572 ms                                                                                                                                              |
Execution Time: 45139.394 ms                                                                                                                                         |

When I remove the order by id clause, the query (which uses ColumnarScan) is fast.

explain (analyze true, buffers true) select id from log_delta_flow_stat offset 0 limit 10;

QUERY PLAN                                                                                                                                     |
-----------------------------------------------------------------------------------------------------------------------------------------------+
Limit  (cost=0.00..1.00 rows=10 width=8) (actual time=0.667..0.671 rows=10 loops=1)                                                            |
  Buffers: shared hit=67 read=4                                                                                                                |
  ->  Custom Scan (ColumnarScan) on log_delta_flow_stat  (cost=0.00..610628.95 rows=6094581 width=8) (actual time=0.666..0.668 rows=10 loops=1)|
        Columnar Projected Columns: id                                                                                                         |
        Cache Hits: 0                                                                                                                          |
        Cache Misses: 1                                                                                                                        |
        Cache Evictions: 0                                                                                                                     |
        Cache Writes: 1                                                                                                                        |
        Cache Maximum Size: 80000                                                                                                              |
        Cache Ending Size: 80000                                                                                                               |
        Total Cache Entries: 1                                                                                                                 |
        Buffers: shared hit=67 read=4                                                                                                          |
Planning:                                                                                                                                      |
  Buffers: shared hit=683                                                                                                                      |
Planning Time: 0.624 ms                                                                                                                        |
Execution Time: 0.719 ms                                                                                                                       |

Thanks for your reply.

@ZhiXingHeYiApple ZhiXingHeYiApple added the bug Something isn't working label Oct 29, 2023
@mkaruza
Contributor

mkaruza commented Oct 30, 2023

Hi @ZhiXingHeYiApple

  • Columnar storage does not work well with random access; there is overhead in fetching and decompressing a whole chunk just to find a specific tuple. Currently, a chunk that has been read stays in memory only if the next tuple fetched is in the same chunk; otherwise it is removed from memory.

  • If your table has many columns, an index scan will instruct the storage engine to retrieve all columns rather than just the specific one needed, and that also causes overhead.

In short, these are known limitations of columnar storage with index scans.

@wuputah wuputah changed the title [Bug]: Whether btree index can speed up sorting operations of columnar table? Can btree index be used to speed up sorting operations of columnar table? Oct 30, 2023
@wuputah wuputah removed the bug Something isn't working label Oct 30, 2023
@leoyvens

I've been measuring BTree index scans on Hydra tables vs heap tables, and I've found that their performance can be improved by lowering stripe_row_limit and rebuilding the table.
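As a minimal sketch of the tuning just described, assuming Hydra exposes the citus-style `alter_columnar_table_set()` helper (the table name is illustrative):

```sql
-- Sketch only: assumes the citus-style alter_columnar_table_set() helper
-- is available; 'my_table' is an illustrative name.
SELECT alter_columnar_table_set('my_table', stripe_row_limit => 10000);

-- Rewrite the table so existing data is restored into the new stripe layout.
VACUUM FULL my_table;
```

The new stripe limit only applies to data written after the setting change, which is why the rewrite step is needed for existing rows.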

In my results, with 10k rows per stripe, for a bigint column, heap outperforms by 10x. Not so bad for Hydra, but this point from @mkaruza could explain why it doesn't match heap table performance:

If your table has many columns, an index scan will instruct the storage engine to retrieve all columns rather than just the specific one needed, and that also causes overhead.

My table has 20 columns. This seems like an unfortunate interaction between btree indexes and Hydra, and I have some follow-up questions. How can I tell if this is happening? Is there some DB config I can set to prevent the index scan from doing this? Could Hydra itself implement an optimization for this, or are you blocked by limitations in PG core?

@JerrySievert
Contributor

You can enable column decompression caching, which might help with random querying like you are doing.

You can read more about enabling and tuning it at https://docs.hydra.so/concepts/optimizing-query-performance#column-cache
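For reference, enabling the cache is roughly the following; the GUC names here are an assumption based on the linked Hydra docs and may differ by version, so verify against your installation:

```sql
-- Assumed GUC names from the Hydra column-cache docs; verify against
-- your version before relying on them.
SET columnar.enable_column_cache = 't';
SET columnar.column_cache_size = '2000MB';  -- cache budget, tune to workload
```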

@leoyvens

Thanks, though from my experiments my impression is that having the disk page in the OS or PG buffer cache has more impact than having it in the decompression cache; decompression didn't seem like a major bottleneck. My questions were more specific to the situation where columns are over-fetched on an index scan.

@JerrySievert
Contributor

The decompression cache is meant to help when you are actively decompressing the same chunk repeatedly, since the PG/OS caches only hold the compressed chunks. Its benefit depends entirely on how often the same chunks get decompressed. If you enable it, the cache counts show up in EXPLAIN ANALYZE, letting you quickly determine whether it is worth enabling.

@leoyvens

I'm on Hydra Cloud, so that must already be on. The cache statistics don't show up in Index Scan query plans. Anyway, for my case I'm measuring the performance of 'cold' reads of rows not in any cache.

@JerrySievert
Contributor

The cache lives only for the duration of the select query itself, even a cold one; it is not a general-purpose cache, and it only helps when you are continually reading the same chunks.

@mkaruza
Contributor

mkaruza commented Oct 31, 2023

Hi @leoyvens

My table has 20 columns. This seems like an unfortunate interaction between btree indexes and Hydra, and I have some follow-up questions. How can I tell if this is happening? Is there some DB config I can set to prevent the index scan from doing this? Could Hydra itself implement an optimization for this, or are you blocked by limitations in PG core?

There is no config to control this, and it always happens: the index scan node requests a tuple from the storage layer (heap, columnar, or something else). For heap this is cheap, since it locates the row tuple and returns it, while for columnar we need to fetch all columns (each holding a default of 10k values per chunk) and decompress them just to return the single tuple requested by the index scan.
We are working on an improvement that will address this issue, but it will not fix all cases; the benefit will depend on the query and the plan that gets executed.

@wuputah
Member

wuputah commented Dec 13, 2023

Addressed in #205. Please give it a try when you can and let us know if it helps. You will need to SET columnar.enable_columnar_index_scan = true; to use the new functionality.
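To try it against the reporter's original query, a session-level check might look like this (the SET is the one named above; the query is the one from the issue report):

```sql
-- Enable the new columnar index scan (from #205) for this session,
-- then re-check the plan for the original slow query.
SET columnar.enable_columnar_index_scan = true;

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM log_delta_flow_stat ORDER BY id OFFSET 0 LIMIT 10;
```

If the setting takes effect, the shared-buffer hit count should drop well below the 65M reported in the original plan.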

@wuputah wuputah closed this as completed Dec 13, 2023