
Optimal query for partitioned PostgreSQL table #8333

@APPLECHOI

Description


Is your proposal related to a problem?

Hey, I want to discuss that the Hasura-generated query is not optimal.
For example, suppose I have a table partitioned by range (height) with a schema like the one below.
The transaction table may grow without bound in the future, spread across an ever-growing set of partitions of roughly 200k rows each:

CREATE TABLE block
(
    height           BIGINT UNIQUE PRIMARY KEY,
...
);

CREATE INDEX block_index ON block (height);

CREATE TABLE transaction
(
    height         BIGINT NOT NULL REFERENCES block (height),
    transaction_id TEXT NOT NULL,
...
) PARTITION BY RANGE (height);

CREATE INDEX transaction_index ON transaction (height);

CREATE TABLE transaction_default PARTITION OF transaction DEFAULT;
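
For reference, the non-default partitions that show up in the execution plan below can be created like this (a sketch; the exact bounds are assumptions inferred from the partition names in the plan):

```sql
-- Hypothetical range partitions matching the names seen in the plan;
-- the bounds are inferred from the partition names and may differ.
CREATE TABLE transaction_23850000_23900000 PARTITION OF transaction
    FOR VALUES FROM (23850000) TO (23900000);
CREATE TABLE transaction_23900000_23950000 PARTITION OF transaction
    FOR VALUES FROM (23900000) TO (23950000);
```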

And I query a certain height together with its transactions:

query MyQuery {
  block(order_by: {height: desc}, limit: 10, where: {height: {_eq: "25984463"}}) {
    height
    id
    parent_id
    timestamp
    transactions {
      height
      transaction_id
    }
  }
}

The Hasura-generated query scans every partition and joins the block table with each of them.
The execution plan looks like this:

Aggregate  (cost=171243.44..171243.45 rows=1 width=32)
  ->  Sort  (cost=171243.42..171243.43 rows=1 width=40)
        Sort Key: block.height DESC
        ->  Nested Loop Left Join  (cost=171235.34..171243.41 rows=1 width=40)
              ->  Limit  (cost=0.29..8.31 rows=1 width=178)
                    ->  Index Scan using block_pkey on block  (cost=0.29..8.31 rows=1 width=178)
                          Index Cond: (height = '25984463'::bigint)
              ->  Aggregate  (cost=171235.05..171235.06 rows=1 width=32)
                    ->  Append  (cost=0.00..171234.65 rows=26 width=73)
                          ->  Seq Scan on transaction_23850000_23900000  (cost=0.00..53677.18 rows=7 width=73)
                                Filter: (block.height = height)
                          ->  Seq Scan on transaction_23900000_23950000  (cost=0.00..60130.75 rows=11 width=73)
                                Filter: (block.height = height)
                          ->  Seq Scan on transaction_default  (cost=0.00..57426.60 rows=8 width=73)
                                Filter: (block.height = height)
                    SubPlan 2
                      ->  Result  (cost=0.00..0.01 rows=1 width=32)
              SubPlan 1
                ->  Result  (cost=0.00..0.01 rows=1 width=32)
JIT:
  Functions: 33
  Options: Inlining false, Optimization false, Expressions true, Deforming true

Response time: 11166 ms

Hence the complexity is roughly (number of partitions) × log₂(n).

But if I write the optimal SQL query by hand:

SELECT block.height, transaction.transaction_id
FROM block
LEFT JOIN transaction ON block.height = transaction.height
WHERE block.height = 25984463
ORDER BY block.height DESC
LIMIT 10;

The execution plan would be:

                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Limit  (cost=1000.29..56725.60 rows=7 width=73)
   ->  Nested Loop Left Join  (cost=1000.29..56725.60 rows=7 width=73)
         Join Filter: (block.height = transaction_default.height)
         ->  Index Only Scan using block_pkey on block  (cost=0.29..8.31 rows=1 width=8)
               Index Cond: (height = 25984463)
         ->  Gather  (cost=1000.00..56717.20 rows=7 width=73)
               Workers Planned: 2
               ->  Parallel Seq Scan on transaction_default  (cost=0.00..55716.50 rows=3 width=73)
                     Filter: (height = 25984463)
(9 rows)

It joins only the single partition that can contain the given height, so the complexity is roughly (number of partitions) + log₂(n).

Since JOIN is an expensive operation, performance suffers badly when the Hasura-generated query joins every partition internally.
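The difference comes down to partition pruning: when the planner sees a constant predicate on the partition key, it can prune unmatched partitions at plan time, whereas the join filter `block.height = height` in the generated plan is applied as a per-row scan filter on every partition. A quick way to check whether pruning kicks in (a sketch, assuming the schema above):

```sql
-- With a constant on the partition key, EXPLAIN should list
-- only the matching partition (plan-time pruning):
EXPLAIN (COSTS OFF)
SELECT * FROM transaction WHERE height = 25984463;

-- Pruning is controlled by this setting, on by default in PostgreSQL 11+:
SHOW enable_partition_pruning;
```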

I happened to run a stress test with a large number of partitions containing only a few rows each, and Hasura's query speed was really slow because the generated query is not good enough. If the row count keeps growing indefinitely as we keep ingesting data, an n·log₂(n) complexity is not sustainable.
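As a possible workaround in the meantime (an assumption on my side, not a confirmed Hasura behavior), repeating the constant filter on the nested relationship should put a constant predicate on the partition key in the generated SQL, which would let PostgreSQL prune the other partitions:

```graphql
query MyQuery {
  block(order_by: {height: desc}, limit: 10, where: {height: {_eq: "25984463"}}) {
    height
    transactions(where: {height: {_eq: "25984463"}}) {
      height
      transaction_id
    }
  }
}
```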

Describe the solution you'd like

Please generate optimal queries for joins involving partitioned tables.
