Description
Is your proposal related to a problem?
Hey, I want to discuss that the Hasura-generated query is not optimal.
For example, suppose I have a table partitioned by range on height, with the schema below.
The transaction table may grow without bound in the future, spread over an ever-growing number of partitions of ~200k rows each:
CREATE TABLE block
(
height BIGINT UNIQUE PRIMARY KEY,
...
);
CREATE INDEX block_index ON block (height);
CREATE TABLE transaction
(
height BIGINT NOT NULL REFERENCES block (height),
transaction_id TEXT NOT NULL,
...
) PARTITION BY RANGE(height);
CREATE INDEX transaction_index ON transaction (height);
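-- (Assumption, not part of the original schema: the range partitions that show up
--  in the execution plans below, with boundaries inferred from their names; listed
--  here only so the example is reproducible.)
CREATE TABLE transaction_23850000_23900000 PARTITION OF transaction
    FOR VALUES FROM (23850000) TO (23900000);
CREATE TABLE transaction_23900000_23950000 PARTITION OF transaction
    FOR VALUES FROM (23900000) TO (23950000);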
CREATE TABLE transaction_default PARTITION OF transaction DEFAULT;
And I query a certain height together with its transactions:
query MyQuery {
block(order_by: {height: desc}, limit: 10, where: {height: {_eq: "25984463"}}) {
height
id
parent_id
timestamp
transactions {
height
transaction_id
}
}
}
The Hasura-generated query scans every partition and joins the block table with each of them.
The execution plan looks like this:
Aggregate (cost=171243.44..171243.45 rows=1 width=32)
-> Sort (cost=171243.42..171243.43 rows=1 width=40)
Sort Key: block.height DESC
-> Nested Loop Left Join (cost=171235.34..171243.41 rows=1 width=40)
-> Limit (cost=0.29..8.31 rows=1 width=178)
-> Index Scan using block_pkey on block (cost=0.29..8.31 rows=1 width=178)
Index Cond: (height = '25984463'::bigint)
-> Aggregate (cost=171235.05..171235.06 rows=1 width=32)
-> Append (cost=0.00..171234.65 rows=26 width=73)
-> Seq Scan on transaction_23850000_23900000 (cost=0.00..53677.18 rows=7 width=73)
Filter: (block.height = height)
-> Seq Scan on transaction_23900000_23950000 (cost=0.00..60130.75 rows=11 width=73)
Filter: (block.height = height)
-> Seq Scan on transaction_default (cost=0.00..57426.60 rows=8 width=73)
Filter: (block.height = height)
SubPlan 2
-> Result (cost=0.00..0.01 rows=1 width=32)
SubPlan 1
-> Result (cost=0.00..0.01 rows=1 width=32)
JIT:
Functions: 33
Options: Inlining false, Optimization false, Expressions true, Deforming true
RESPONSE TIME: 11166 ms
Hence, complexity = (number of partitions) * log2(n).
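Based on the plan above, the generated SQL seems to have roughly the following shape (a hand-written sketch for illustration, not the literal SQL Hasura emits). Because the filter on transaction.height only comes from the outer block row, the planner cannot prune partitions at plan time:
-- Sketch of the apparent shape of the generated query (inferred from the plan
-- above): the transactions are fetched via a correlated aggregate subquery per
-- matching block row.
SELECT
  block.height,
  (
    SELECT json_agg(json_build_object('height', t.height,
                                      'transaction_id', t.transaction_id))
    FROM transaction AS t
    WHERE t.height = block.height  -- value only known per outer row,
                                   -- so every partition must be considered
  ) AS transactions
FROM block
WHERE block.height = 25984463
ORDER BY block.height DESC
LIMIT 10;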
But if I write the optimal SQL query by hand:
select block.height, transaction.transaction_id from block left join transaction on block.height = transaction.height where block.height = 25984463 order by block.height desc limit 10;
The execution plan would be:
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (cost=1000.29..56725.60 rows=7 width=73)
-> Nested Loop Left Join (cost=1000.29..56725.60 rows=7 width=73)
Join Filter: (block.height = transaction_default.height)
-> Index Only Scan using block_pkey on block (cost=0.29..8.31 rows=1 width=8)
Index Cond: (height = 25984463)
-> Gather (cost=1000.00..56717.20 rows=7 width=73)
Workers Planned: 2
-> Parallel Seq Scan on transaction_default (cost=0.00..55716.50 rows=3 width=73)
Filter: (height = 25984463)
(9 rows)
It only joins the partition that contains the requested height, so complexity = (number of partitions) + log2(n).
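As far as I understand, plan-time partition pruning only happens when the partition key is compared with a value known at planning time, and it is controlled by the setting below (on by default since PostgreSQL 11). In the hand-written query the planner can derive transaction.height = 25984463 from the WHERE clause and prune; in the correlated form it cannot:
-- Plan-time pruning needs a constant (or planner-known) filter on the
-- partition key, and this setting enabled:
SHOW enable_partition_pruning;  -- on by default since PostgreSQL 11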
JOIN is an expensive operation, so performance suffers a lot when the Hasura-generated query joins every partition internally.
I accidentally ran a stress test with lots of partitions holding only a few rows each, and Hasura's query speed was really slow because the generated query is not good enough. If the number of rows keeps growing as we keep parsing data, an n * log2(n) complexity is not sustainable.
Describe the solution you'd like
Please generate an optimal query for the join clause when the joined table is partitioned, so that partition pruning can kick in.
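For example (only a sketch of one possible shape, not a concrete proposal for Hasura's SQL generator), pushing the constant filter down onto the partition key of the joined table lets the planner prune partitions:
-- Sketch: the constant on the partition key (also derivable from the equality
-- with block.height) allows plan-time partition pruning.
SELECT
  block.height,
  json_agg(json_build_object('height', t.height,
                             'transaction_id', t.transaction_id))
    FILTER (WHERE t.height IS NOT NULL) AS transactions  -- keep NULL when there are no transactions
FROM block
LEFT JOIN transaction AS t
       ON t.height = block.height
      AND t.height = 25984463            -- constant pushed to the partition key
WHERE block.height = 25984463
GROUP BY block.height
ORDER BY block.height DESC
LIMIT 10;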