[BACKPORT 2024.1][#21141,21036] YSQL: Enable remote filters for Bitmap Scans

Summary:
Original commit: 773869c / D32651
Pushing down filters prevents us from fetching unnecessary rows. In the bitmap scan case, if we can push down a filter at the Bitmap Index Scan level, it saves us from
1. retrieving the ybctid
2. using that ybctid in any bitmap operations (AND, OR)
3. retrieving the row corresponding to the ybctid.

With this diff, both YB Bitmap Table Scans and Bitmap Index Scans are capable of pushing down filters.

== YB Bitmap Table Scan pushdown ==

YB Bitmap Table Scans can push down any clause. The pushdown logic is similar to the recheck logic - it only pushes down a filter if it is necessary. For example,
```lang=sql
/*+ BitmapScan(test_limit) */ EXPLAIN SELECT * FROM test_limit WHERE a < 1000 OR b < 1000;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 YB Bitmap Table Scan on test_limit  (cost=6.91..11.21 rows=10 width=12)
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0)
         ->  Bitmap Index Scan on test_limit_a_idx  (cost=0.00..3.45 rows=10 width=0)
               Index Cond: (a < 1000)
         ->  Bitmap Index Scan on test_limit_b_idx  (cost=0.00..3.45 rows=10 width=0)
               Index Cond: (b < 1000)
(6 rows)
```
does not require a remote or local filter because it knows that the index scans will return only correct results.
```lang=sql
/*+ BitmapScan(test_limit) */ EXPLAIN SELECT * FROM test_limit WHERE a < 1000 OR b < 1000 AND c < 1;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 YB Bitmap Table Scan on test_limit  (cost=7.06..11.41 rows=10 width=12)
   Remote Filter: ((a < 1000) OR ((b < 1000) AND (c < 1)))
   ->  BitmapOr  (cost=7.06..7.06 rows=20 width=0)
         ->  Bitmap Index Scan on test_limit_a_idx  (cost=0.00..3.53 rows=10 width=0)
               Index Cond: (a < 1000)
         ->  Bitmap Index Scan on test_limit_b_idx  (cost=0.00..3.53 rows=10 width=0)
               Index Cond: (b < 1000)
(7 rows)
```
This requires a filter, and this filter can be pushed down.
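To make the difference between the two plans concrete, here is a minimal C sketch. It is not YugabyteDB code: the `Row` struct and the function names are hypothetical, chosen only for illustration. It compares what the BitmapOr of the two index conditions can guarantee (`a < 1000 OR b < 1000`) against the full clause of the second query, and counts the rows the bitmap would fetch that the clause then rejects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row of test_limit; not a YugabyteDB structure. */
typedef struct { int a, b, c; } Row;

/* What the BitmapOr of the two Bitmap Index Scans guarantees. */
static bool index_conds(const Row *r)
{
    return r->a < 1000 || r->b < 1000;
}

/* Full WHERE clause of the second query. */
static bool full_clause(const Row *r)
{
    return r->a < 1000 || (r->b < 1000 && r->c < 1);
}

/* Count rows that the bitmap selects but the full clause rejects. */
static size_t count_false_positives(const Row *rows, size_t n)
{
    size_t extra = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (index_conds(&rows[i]) && !full_clause(&rows[i]))
            extra++;
    return extra;
}
```

For the first query the two predicates are identical, so the count is always zero and no filter is needed. For the second query a row such as `{5000, 5, 7}` passes the index conditions but fails `c < 1`, which is why the plan carries a filter.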

== Bitmap Index Scan pushdown ==

There are three cases where filters can be pushed down for bitmap index scans.
=== 1. If the filter clause is at the same nesting level as the index qual in the boolean operation tree ===
During planning, when the OR is broken up into separate indexes, the pushdown conditions are extracted for each branch of the OR. This allows us to get useful filters for pushdown, knowing that a particular index scan is concerned only with a portion of the whole condition.

For example, in the query below, the first Bitmap Index Scan has an index condition `unique1 < 5` and a remote filter `unique1 % 2 = 0`.
```lang=sql
/*+ BitmapScan(tenk1) */ EXPLAIN (ANALYZE, DIST, COSTS OFF) SELECT * FROM tenk1 WHERE (unique1 < 5 AND unique1 % 2 = 0) OR
unique2 < 3;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 YB Bitmap Table Scan on tenk1 (actual rows=6 loops=1)
   Remote Filter: (((unique1 < 5) AND ((unique1 % 2) = 0)) OR (unique2 < 3))
   Storage Table Read Requests: 1
   Storage Table Rows Scanned: 6
   ->  BitmapOr (actual rows=6 loops=1)
         ->  Bitmap Index Scan on tenk1_unique1 (actual rows=3 loops=1)
               Index Cond: (unique1 < 5)
               Remote Index Filter: ((unique1 % 2) = 0)
               Storage Index Read Requests: 1
               Storage Index Rows Scanned: 5
         ->  Bitmap Index Scan on tenk1_unique2 (actual rows=3 loops=1)
               Index Cond: (unique2 < 3)
               Storage Index Read Requests: 1
               Storage Index Rows Scanned: 3
 Storage Read Requests: 3
 Storage Rows Scanned: 14
 Storage Write Requests: 0
 Storage Flush Requests: 0
(18 rows)
```

When creating a bitmap scan plan, the planner breaks the OR into two parts.

a: `(unique1 < 5 AND unique1 % 2 = 0)`
b: `unique2 < 3`

If it can find an index scan for each part, then a bitmap scan is a valid plan. (If there were a third clause on an unindexed field, c: `string1 = 'hi'`, there would be no index to answer every part of the OR, so a bitmap scan could not be used.)

The default PG behaviour is to extract index clauses from each branch, so PG would identify `unique1 < 5` for a and `unique2 < 3` for b. With this diff, Yugabyte goes a step further. If a condition is not a valid index condition (e.g. `unique1 % 2 = 0`), then we test if it can be pushed down using the index columns. If yes, then we store it as a valid pushdown filter for a bitmap index scan.

So we identify:

| | index qual | filter qual |
| a | `unique1 < 5` | `unique1 % 2 = 0` |
| b | `unique2 < 3` | |

Since we found an index path for each branch of the OR, we've found a valid bitmap scan path.

=== 2. All columns referenced by the top-level condition are contained in the index ===
In this case, we can push down the entire condition (or a portion of the top-level AND clause) to the index with the included columns. We cannot partially push down top-level ORs because this index doesn't know enough about the other branches, but that case was already handled above.

For example, in the query below, the first Bitmap Index Scan evaluates the entire condition on `multi_c_a_idx`: `c <= 100` is the index condition and `(a < 50) AND ((a % 2) = 0) AND ((c % 3) = 0)` is pushed down as a storage index filter, because every column in the condition can be checked by this index.
```lang=sql
/*+ BitmapScan(multi) */ EXPLAIN (ANALYZE, DIST, COSTS OFF)
SELECT * FROM multi WHERE (a < 50 AND a % 2 = 0) AND (c <= 100 AND c % 3 = 0);
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on multi (actual time=3.655..3.666 rows=16 loops=1)
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 1.014 ms
   Storage Table Rows Scanned: 16
   ->  BitmapAnd (actual time=2.492..2.492 rows=16 loops=1)
         ->  Bitmap Index Scan on multi_c_a_idx (actual time=2.550..2.550 rows=16 loops=1)
               Index Cond: (c <= 100)
               Storage Index Filter: ((a < 50) AND ((a % 2) = 0) AND ((c % 3) = 0))
               Storage Index Read Requests: 1
               Storage Index Read Execution Time: 1.778 ms
               Storage Index Rows Scanned: 33
         ->  Bitmap Index Scan on multi_pkey (actual time=1.288..1.288 rows=24 loops=1)
               Index Cond: (a < 50)
               Storage Index Filter: ((a % 2) = 0)
               Storage Table Read Requests: 1
               Storage Table Read Execution Time: 1.164 ms
               Storage Table Rows Scanned: 49
 Planning Time: 15.273 ms
 Execution Time: 4.362 ms
 Storage Read Requests: 2
 Storage Read Execution Time: 2.942 ms
 Storage Rows Scanned: 82
 Storage Write Requests: 0
 Catalog Read Requests: 19
 Catalog Read Execution Time: 31.753 ms
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 34.695 ms
 Peak Memory Usage: 119 kB
(26 rows)
```
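The column-coverage test behind this plan can be sketched in a few lines of C. The sketch assumes, based only on the index names and the plan output above, that `multi_c_a_idx` contains columns `c` and `a` and that `multi_pkey` contains only `a`. The function name and the bitmask encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column identifiers for the multi table. */
enum { COL_A = 1u << 0, COL_C = 1u << 1 };

/*
 * Given the columns referenced by each implicitly ANDed clause, return a
 * bitmask with bit i set when clause i references only columns that are
 * present in the index, i.e. when the index can evaluate clause i.
 */
static unsigned pushable_clauses(const unsigned *clause_cols, size_t n,
                                 unsigned index_cols)
{
    unsigned mask = 0;

    assert(n <= 32);
    for (size_t i = 0; i < n; i++)
        if ((clause_cols[i] & ~index_cols) == 0)
            mask |= 1u << i;
    return mask;
}
```

With the clauses `a < 50`, `a % 2 = 0`, `c <= 100`, `c % 3 = 0` encoded as `{COL_A, COL_A, COL_C, COL_C}`, an index on `(c, a)` covers all four clauses while an index on `a` alone covers only the first two. That is the split the two Bitmap Index Scans show in the plan.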

=== 3. All columns referenced by one of the top-level clauses (implicitly ANDed) are contained in the index ===

Consider the `tenk1` table:
```lang=sql
\d tenk1
                  Table "public.tenk1"
...
Indexes:
    "tenk1_hundred" lsm (hundred ASC)
    "tenk1_thous_tenthous" lsm (thousand ASC, tenthous ASC)
    "tenk1_unique1" lsm (unique1 ASC)
    "tenk1_unique2" lsm (unique2 ASC)
```

Then the query below can push down the `tenthous % 2 = 0` condition to the `thousand` bitmap index scan. Even though the condition will still need to be validated at the Table Scan layer (to check the results from the `unique2` bitmap index scan), it is efficient to reduce the number of rows retrieved.

```lang=sql
explain (analyze, costs off, dist) /*+ bitmapscan(tenk1) */ select * from tenk1 where thousand < 10 and unique2 < 1000 and tenthous % 2 = 0;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on tenk1 (actual time=17.917..17.929 rows=4 loops=1)
   Storage Filter: ((tenthous % 2) = 0)
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 1.170 ms
   Storage Table Rows Scanned: 4
   ->  BitmapAnd (actual time=16.422..16.422 rows=4 loops=1)
         ->  Bitmap Index Scan on tenk1_unique2 (actual time=13.315..13.315 rows=1000 loops=1)
               Index Cond: (unique2 < 1000)
               Storage Index Read Requests: 1
               Storage Index Read Execution Time: 10.981 ms
               Storage Index Rows Scanned: 1000
         ->  Bitmap Index Scan on tenk1_thous_tenthous (actual time=2.845..2.845 rows=50 loops=1)
               Index Cond: (thousand < 10)
               Storage Index Filter: ((tenthous % 2) = 0)
               Storage Index Read Requests: 1
               Storage Index Read Execution Time: 2.407 ms
               Storage Index Rows Scanned: 100
 Planning Time: 0.894 ms
 Execution Time: 18.702 ms
 Storage Read Requests: 3
 Storage Read Execution Time: 14.559 ms
 Storage Rows Scanned: 1104
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 14.559 ms
 Peak Memory Usage: 180 kB
(28 rows)
```
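The effect of the pushed-down filter on the `tenk1_thous_tenthous` scan can be reproduced with a small C simulation. It assumes the standard PostgreSQL regression data for `tenk1`, where `thousand = unique1 % 1000` and `tenthous = unique1 % 10000`. The struct and function names are hypothetical and nothing here calls YugabyteDB.

```c
#include <assert.h>

/* Hypothetical counters mirroring "Rows Scanned" and "actual rows". */
typedef struct { int scanned; int returned; } IndexScanStats;

/*
 * Simulate the bitmap index scan on (thousand, tenthous).
 * Assumption: thousand = unique1 % 1000 and tenthous = unique1 % 10000,
 * as in the PostgreSQL regression data set.
 */
static IndexScanStats scan_thous_tenthous(int nrows, int push_filter)
{
    IndexScanStats s = {0, 0};

    assert(nrows >= 0);
    for (int unique1 = 0; unique1 < nrows; unique1++) {
        int thousand = unique1 % 1000;
        int tenthous = unique1 % 10000;

        if (thousand >= 10)
            continue;               /* Index Cond: (thousand < 10) */
        s.scanned++;
        if (push_filter && tenthous % 2 != 0)
            continue;               /* Storage Index Filter */
        s.returned++;
    }
    return s;
}
```

For 10,000 rows the scan touches 100 index entries either way, but with the filter pushed down only 50 ybctids are returned to the BitmapAnd instead of 100, in line with `Storage Index Rows Scanned: 100` and `rows=50` in the plan above.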
Jira: DB-10083

Test Plan:

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbBitmapScans'
```

Tested with random query generator

Reviewers: amartsinchyk, tnayak

Reviewed By: tnayak

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34006
timothy-e committed Apr 11, 2024
1 parent e2877a1 commit ab3d7ea
Showing 40 changed files with 1,695 additions and 263 deletions.
54 changes: 43 additions & 11 deletions src/postgres/src/backend/commands/explain.c
@@ -1929,6 +1929,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_BitmapIndexScan:
pname = sname = "Bitmap Index Scan";
break;
case T_YbBitmapIndexScan:
pname = sname = "Bitmap Index Scan";
break;
case T_BitmapHeapScan:
pname = sname = "Bitmap Heap Scan";
break;
@@ -2187,6 +2190,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
break;
case T_BitmapIndexScan:
case T_YbBitmapIndexScan:
{
BitmapIndexScan *bitmapindexscan = (BitmapIndexScan *) plan;
const char *indexname =
@@ -2485,7 +2489,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
"Index Cond", planstate, ancestors, es);
if (IsYugaByteEnabled() && es->rpc && es->analyze)
break;
case T_YbBitmapIndexScan:
show_scan_qual(((YbBitmapIndexScan *) plan)->indexqualorig,
"Index Cond", planstate, ancestors, es);
show_scan_qual(((YbBitmapIndexScan *) plan)->yb_idx_pushdown.quals,
"Storage Index Filter", planstate, ancestors, es);
if (es->rpc && es->analyze)
show_yb_rpc_stats(planstate, es);
break;
case T_BitmapHeapScan:
@@ -2502,22 +2512,44 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_YbBitmapTableScan:
if (((YbBitmapTableScanState *) planstate)->recheck_required)
show_scan_qual(((YbBitmapTableScan *) plan)->bitmapqualorig,
"Recheck Cond", planstate, ancestors, es);
if (((YbBitmapTableScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
planstate, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
{
YbBitmapTableScanState *bitmapscanstate =
(YbBitmapTableScanState *) planstate;
YbBitmapTableScan *bitmapplan = (YbBitmapTableScan *) plan;
List *storage_filter = bitmapscanstate->work_mem_exceeded
? bitmapplan->fallback_pushdown.quals
: bitmapplan->rel_pushdown.quals;
List *local_filter = bitmapscanstate->work_mem_exceeded
? bitmapplan->fallback_local_quals
: plan->qual;

/* Storage filters are applied first, so they are output first. */
if (bitmapscanstate->recheck_required)
show_scan_qual(bitmapplan->recheck_pushdown.quals,
"Storage Recheck Cond", planstate, ancestors,
es);
show_scan_qual(storage_filter, "Storage Filter", planstate,
ancestors, es);

if (bitmapscanstate->recheck_required)
{
show_scan_qual(bitmapplan->recheck_local_quals, "Recheck Cond",
planstate, ancestors, es);
if (bitmapplan->recheck_local_quals)
show_instrumentation_count("Rows Removed by Index Recheck",
2, planstate, es);
}

show_scan_qual(local_filter, "Filter", planstate, ancestors, es);
if (local_filter)
show_instrumentation_count("Rows Removed by Filter", 1,
planstate, es);
if (es->rpc && es->analyze)
show_yb_rpc_stats(planstate, es);
if (es->analyze)
show_ybtidbitmap_info((YbBitmapTableScanState *) planstate, es);
show_ybtidbitmap_info(bitmapscanstate, es);
break;

}
case T_SampleScan:
show_tablesample(((SampleScan *) plan)->tablesample,
planstate, ancestors, es);
2 changes: 1 addition & 1 deletion src/postgres/src/backend/executor/Makefile
@@ -31,6 +31,6 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
nodeTableFuncscan.o ybcExpr.o ybcFunction.o ybc_fdw.o ybcModifyTable.o \
nodeYbSeqscan.o nodeYbBitmapTablescan.o
nodeYbBitmapIndexscan.o nodeYbSeqscan.o nodeYbBitmapTablescan.o

include $(top_srcdir)/src/backend/common.mk
6 changes: 6 additions & 0 deletions src/postgres/src/backend/executor/execAmi.c
@@ -58,6 +58,7 @@
#include "executor/nodeValuesscan.h"
#include "executor/nodeWindowAgg.h"
#include "executor/nodeWorktablescan.h"
#include "executor/nodeYbBitmapIndexscan.h"
#include "executor/nodeYbSeqscan.h"
#include "nodes/nodeFuncs.h"
#include "nodes/relation.h"
@@ -196,6 +197,11 @@ ExecReScan(PlanState *node)
ExecReScanBitmapIndexScan((BitmapIndexScanState *) node);
break;


case T_YbBitmapIndexScanState:
ExecReScanYbBitmapIndexScan((YbBitmapIndexScanState *) node);
break;

case T_BitmapHeapScanState:
ExecReScanBitmapHeapScan((BitmapHeapScanState *) node);
break;
15 changes: 15 additions & 0 deletions src/postgres/src/backend/executor/execProcnode.c
@@ -114,6 +114,7 @@
#include "executor/nodeWindowAgg.h"
#include "executor/nodeWorktablescan.h"
#include "executor/nodeYbBatchedNestloop.h"
#include "executor/nodeYbBitmapIndexscan.h"
#include "executor/nodeYbBitmapTablescan.h"
#include "executor/nodeYbSeqscan.h"
#include "nodes/nodeFuncs.h"
@@ -237,6 +238,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;

case T_YbBitmapIndexScan:
result = (PlanState *) ExecInitYbBitmapIndexScan((YbBitmapIndexScan *) node,
estate, eflags);
break;

case T_BitmapHeapScan:
result = (PlanState *) ExecInitBitmapHeapScan((BitmapHeapScan *) node,
estate, eflags);
@@ -525,6 +531,11 @@ MultiExecProcNode(PlanState *node)
result = MultiExecBitmapIndexScan((BitmapIndexScanState *) node);
break;

case T_YbBitmapIndexScanState:
result = MultiExecYbBitmapIndexScan(
(YbBitmapIndexScanState *) node);
break;

case T_BitmapAndState:
result = MultiExecBitmapAnd((BitmapAndState *) node);
break;
@@ -655,6 +666,10 @@ ExecEndNode(PlanState *node)
ExecEndBitmapIndexScan((BitmapIndexScanState *) node);
break;

case T_YbBitmapIndexScanState:
ExecEndYbBitmapIndexScan((YbBitmapIndexScanState *) node);
break;

case T_BitmapHeapScanState:
ExecEndBitmapHeapScan((BitmapHeapScanState *) node);
break;
54 changes: 7 additions & 47 deletions src/postgres/src/backend/executor/nodeBitmapIndexscan.c
@@ -51,11 +51,10 @@ ExecBitmapIndexScan(PlanState *pstate)
Node *
MultiExecBitmapIndexScan(BitmapIndexScanState *node)
{
TupleBitmap bitmap;
TIDBitmap *bitmap;
IndexScanDesc scandesc;
double nTuples = 0;
bool doscan;
bool is_yb_bitmap_scan;

/* must provide our own instrumentation support */
if (node->ss.ps.instrument)
@@ -66,9 +65,6 @@ MultiExecBitmapIndexScan(BitmapIndexScanState *node)
*/
scandesc = node->biss_ScanDesc;

is_yb_bitmap_scan = IsYugaByteEnabled() &&
IsYBRelation(scandesc->indexRelation);

/*
* If we have runtime keys and they've not already been set up, do it now.
* Array keys are also treated as runtime keys; note that if ExecReScan
@@ -92,41 +88,24 @@ MultiExecBitmapIndexScan(BitmapIndexScanState *node)
*/
if (node->biss_result)
{
if (is_yb_bitmap_scan)
{
Assert(IsA(node->biss_result, YbTIDBitmap));
bitmap.ybtbm = node->biss_result;
}
else
bitmap.tbm = node->biss_result;
bitmap = node->biss_result;

node->biss_result = NULL; /* reset for next time */
}
else if (is_yb_bitmap_scan)
{
bitmap.ybtbm = yb_tbm_create(work_mem * 1024L);
}
else
{
/* XXX should we use less than work_mem for this? */
bitmap.tbm = tbm_create(work_mem * 1024L,
((BitmapIndexScan *) node->ss.ps.plan)->isshared
? node->ss.ps.state->es_query_dsa : NULL);
bitmap = tbm_create(work_mem * 1024L,
((BitmapIndexScan *) node->ss.ps.plan)->isshared
? node->ss.ps.state->es_query_dsa : NULL);
}

/*
* Get TIDs from index and insert into bitmap
*/
while (doscan)
{
/*
* For Yugabyte-based index, call the variant of index_getbitmap that
* takes a YbTIDBitmap instead of a TIDBitmap
*/
if (is_yb_bitmap_scan)
nTuples += (double) yb_index_getbitmap(scandesc, bitmap.ybtbm);
else
nTuples += (double) index_getbitmap(scandesc, bitmap.tbm);
nTuples += (double) index_getbitmap(scandesc, bitmap);

CHECK_FOR_INTERRUPTS();

@@ -142,7 +121,7 @@ MultiExecBitmapIndexScan(BitmapIndexScanState *node)
if (node->ss.ps.instrument)
InstrStopNode(node->ss.ps.instrument, nTuples);

return is_yb_bitmap_scan ? (Node *) bitmap.ybtbm : (Node *) bitmap.tbm;
return (Node *) bitmap;
}

/* ----------------------------------------------------------------
@@ -223,13 +202,6 @@ ExecEndBitmapIndexScan(BitmapIndexScanState *node)
index_endscan(indexScanDesc);
if (indexRelationDesc)
index_close(indexRelationDesc, NoLock);

if (IsYugaByteEnabled())
{
Relation relation = node->ss.ss_currentRelation;
if (relation)
ExecCloseScanRelation(relation);
}
}

/* ----------------------------------------------------------------
@@ -351,18 +323,6 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
estate->es_snapshot,
indexstate->biss_NumScanKeys);

if (IsYugaByteEnabled())
{
if (IsYBRelation(indexstate->biss_RelationDesc))
indexstate->ss.ss_currentRelation =
ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);

indexstate->biss_ScanDesc->heapRelation =
indexstate->ss.ss_currentRelation;
indexstate->biss_ScanDesc->yb_exec_params = &estate->yb_exec_params;
indexstate->biss_ScanDesc->fetch_ybctids_only = true;
}

/*
* If no run-time keys to calculate, go ahead and pass the scankeys to the
* index AM.
41 changes: 19 additions & 22 deletions src/postgres/src/backend/executor/nodeBitmapOr.c
@@ -143,36 +143,33 @@ MultiExecBitmapOr(BitmapOrState *node)
*/
if (IsA(subnode, BitmapIndexScanState))
{
bool is_yugabyte = IsYugaByteEnabled() &&
IsYBRelation(
((BitmapIndexScanState *) subnode)->ss.ss_currentRelation);

if (result.tbm == NULL) /* first subplan */
{
if (is_yugabyte)
result.ybtbm = yb_tbm_create(work_mem * 1024L);
else
/* XXX should we use less than work_mem for this? */
result.tbm = tbm_create(work_mem * 1024L,
((BitmapOr *) node->ps.plan)->isshared
? node->ps.state->es_query_dsa
: NULL);
/* XXX should we use less than work_mem for this? */
result.tbm = tbm_create(work_mem * 1024L,
((BitmapOr *) node->ps.plan)->isshared
? node->ps.state->es_query_dsa
: NULL);
}

if (is_yugabyte)
{
((BitmapIndexScanState *) subnode)->biss_result = result.ybtbm;
subresult.ybtbm = (YbTIDBitmap *) MultiExecProcNode(subnode);
}
else
{
((BitmapIndexScanState *) subnode)->biss_result = result.tbm;
subresult.tbm = (TIDBitmap *) MultiExecProcNode(subnode);
}
((BitmapIndexScanState *) subnode)->biss_result = result.tbm;
subresult.tbm = (TIDBitmap *) MultiExecProcNode(subnode);

if (subresult.tbm != result.tbm)
elog(ERROR, "unrecognized result from subplan");
}
/* We do the same for YbBitmapIndexScan children */
else if (IsA(subnode, YbBitmapIndexScanState))
{
if (result.ybtbm == NULL) /* first subplan */
result.ybtbm = yb_tbm_create(work_mem * 1024L);

((YbBitmapIndexScanState *) subnode)->biss_result = result.ybtbm;
subresult.ybtbm = (YbTIDBitmap *) MultiExecProcNode(subnode);

if (subresult.ybtbm != result.ybtbm)
elog(ERROR, "unrecognized result from subplan");
}
else
{
subresult.tbm = (TIDBitmap *) MultiExecProcNode(subnode);