Skip to content

[YSQL] Primary Key Selectivity is inaccurate for when the filter condition contains equality on hash key and inequality on range key #12941

@ramsrivatsa

Description

@ramsrivatsa

Jira Link: DB-2677

Description

Consider the tables temp1 and temp2 with the following schemas.

yugabyte=# \d temp1
               Table "public.temp1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 hk     | integer |           | not null |
 rk     | integer |           | not null |
Indexes:
    "temp1_pkey" PRIMARY KEY, lsm (hk HASH, rk ASC)

yugabyte=# \d temp2
               Table "public.temp2"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 hk     | integer |           | not null |
 rk     | integer |           | not null |
Indexes:
    "temp2_pkey" PRIMARY KEY, lsm (hk HASH, rk ASC)

The tables have been populated with the following insert commands.

INSERT INTO temp1  SELECT i%100, i FROM (SELECT generate_series(1, 10000) i) t;

INSERT INTO temp2  SELECT i/100, i FROM (SELECT generate_series(1, 10000) i) t;

Hence the contents of the table temp1 looks like this

 hk |  rk
----+-------
  1 |     1
  2 |     2
  3 |     3
  4 |     4
  5 |     5
  ...
 98 |    98
 99 |    99
  0 |   100
  1 |   101
  2 |   102
  3 |   103
  ...
  ...
 95 |  9995
 96 |  9996
 97 |  9997
 98 |  9998
 99 |  9999
  0 | 10000

while the contents of table temp2 looks like this

 hk  |  rk
-----+-------
   0 |     1
   0 |     2
   0 |     3
   0 |     4
  ....
   1 |   100
   1 |   101
   1 |   102
   1 |   103
  ...
  ...
   2 |   200
   2 |   201
   2 |   202
   2 |   203
   ...
   ...
   ...
  99 |  9996
  99 |  9997
  99 |  9998
  99 |  9999
 100 | 10000

While executing queries with a PK index lookup, selectivities are pretty similar in both situations. And hence the number of rows estimated to be returned is the same for both situations. This is because we multiply the selectivities on the hash component and range component and use them to estimate the number of rows. This needs to be changed.

yugabyte=# EXPLAIN ANALYZE SELECT * FROM temp1 where hk = 1 AND rk < 200;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using temp1_pkey on temp1  (cost=0.00..4.23 rows=2 width=8) (actual time=0.415..0.417 rows=2 loops=1)
   Index Cond: ((hk = 1) AND (rk < 200))
 Planning Time: 1.937 ms
 Execution Time: 0.438 ms
(4 rows)

yugabyte=# EXPLAIN ANALYZE SELECT * FROM temp2 where hk = 1 AND rk < 200;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using temp2_pkey on temp2  (cost=0.00..4.23 rows=2 width=8) (actual time=1.167..1.202 rows=100 loops=1)
   Index Cond: ((hk = 1) AND (rk < 200))
 Planning Time: 2.422 ms
 Execution Time: 1.230 ms
(4 rows)

Metadata

Metadata

Labels

area/ysqlYugabyte SQL (YSQL)kind/enhancementThis is an enhancement of an existing featurepriority/mediumMedium priority issue

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions