Support `STORING` columns INDEX #18616

zhangjinpeng87 · 2020-07-16T03:15:58Z

Feature Request

Is your feature request related to a problem? Please describe:

Assume we have a table named t, and there is a secondary index on column col3 and col4:

CREATE TABLE t (
  id INT primary key,
  col2 INT,
  col3 INT,
  col4 char(32), 
  key idx_col2_col3(col2, col3),
);

SELECT col4 from t WHERE col2=1 AND col3>5, this query will search the related id by the index idx_col2_col3 at first, and then read the value of col4, we call this process as double read.

Describe the feature you'd like:

If we store the column col4 in the index idx_col2_col3, then we can read the value of col4 directly. The CREATE statement looks like that:

CREATE TABLE t (
  id INT primary key,
  col2 INT,
  col3 INT,
  col4 char(32), 
  key idx_col2_col3(col2, col3) STORING (col4),
);

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

The text was updated successfully, but these errors were encountered:

zz-jason · 2020-07-24T04:42:54Z

@zhangjinpeng1987 Are there any other references about the storing column index?

scsldb · 2020-08-27T03:50:39Z

col4 add to idx_col2_col3 index, maybe the problem is solved.

zhangjinpeng87 · 2020-08-30T13:38:40Z

@scsldb They are different if there are several columns that need to retrieve or the value of these columns is large. Add columns to index means these columns will be encoded to the index key(large keys are not friendly to the storage engine), but if storing columns, these columns will be stored in the value.

@zz-jason https://www.cockroachlabs.com/docs/stable/indexes.html#storing-columns

zhangjinpeng87 · 2020-09-08T07:17:45Z

@scsldb @zz-jason Cloud-Spanner also has such a feature

Following description copy from https://cloud.google.com/spanner/docs/whitepapers/optimizing-schema-design

_STORING index clause
Secondary indexes allow you to find rows by attributes other than the primary key. If all the data requested is in the index itself, it can be consulted on its own without reading the primary record. This can save significant resources as no join is required.

Unfortunately, index keys are limited to 16 in number and 8 KiB in aggregate size, restricting what can be put in them. To compensate for these limitations, Cloud Spanner has the ability to store extra data in any index, via the STORING clause. STORING a column in an index results in its values being duplicated, with a copy stored in the index. You can think of an index with STORING as a simple single table materialized view (views are not natively supported in Cloud Spanner at this time).

Another useful application of STORING is as part of a NULL_FILTERED index. This allows you to define what is effectively a materialized view of a sparse subset of a table that you can scan efficiently. For example, you might create such an index on the is_unread column of a mailbox to be able to serve the unread messages view in a single table scan, but without paying for a complete copy of every mailbox._

zz-jason · 2020-09-08T07:35:06Z

This feature is awesome:

It reduces the read performance of queries on a secondary index by eliminating a probe operation on the table records.
It allows the user to choose which columns to be stored, i.e. clustered, together with the secondary index.

But in order not to waste the storage and increase the write latency caused by storing unnecessary columns, which columns to be stored in the secondary index should be well-designed.

tirsen · 2021-02-02T14:27:15Z

This indeed looks very interesting!

It will also help in the following scenario:

CREATE TABLE payment (
  id           BIGINT PRIMARY KEY,
  sender_id    BIGINT,
  recipient_id BIGINT,
  amount_cents BIGINT
);

SELECT SUM(amount_cents) AS amount_sent
FROM payment
WHERE sender_id = ?;

SELECT SUM(amount_cents) AS amount_received
FROM payment
WHERE recipient_id = ?;

Both of these queries will do a double read and if the table is very large a large fan out scatter-gather query to fetch all the amount_cents data. Since the scatter-gather query has such a wide fan out we will not get benefit of pushing the aggregation operation down in tikv.

If we added these indexes:

ALTER TABLE payment ADD KEY idx_sender_id (sender_id) STORING (amount_cents);
ALTER TABLE payment ADD KEY idx_recipient_id (recipient_id) STORING (amount_cents);

This is very difficult to achieve in a traditionally sharded system where you can only pick one column to shard by (recipient or sender) and the other side will be punished. Regular clustered primary key indexes such as those that will be released in 5.0 will also not solve for both of these queries, only one.

zhangjinpeng87 added type/feature-request Categorizes issue or PR as related to a new feature. type/performance labels Jul 16, 2020

zz-jason added feature/reviewing This feature request is reviewing by product managers feature/discussing This feature request is discussing among product managers and removed feature/reviewing This feature request is reviewing by product managers labels Aug 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support `STORING` columns INDEX #18616

Support `STORING` columns INDEX #18616

zhangjinpeng87 commented Jul 16, 2020

zz-jason commented Jul 24, 2020 •

edited

Loading

scsldb commented Aug 27, 2020

zhangjinpeng87 commented Aug 30, 2020

zhangjinpeng87 commented Sep 8, 2020

zz-jason commented Sep 8, 2020

tirsen commented Feb 2, 2021 •

edited

Loading

Support STORING columns INDEX #18616

Support STORING columns INDEX #18616

Comments

zhangjinpeng87 commented Jul 16, 2020

Feature Request

zz-jason commented Jul 24, 2020 • edited Loading

scsldb commented Aug 27, 2020

zhangjinpeng87 commented Aug 30, 2020

zhangjinpeng87 commented Sep 8, 2020

zz-jason commented Sep 8, 2020

tirsen commented Feb 2, 2021 • edited Loading

Support `STORING` columns INDEX #18616

Support `STORING` columns INDEX #18616

zz-jason commented Jul 24, 2020 •

edited

Loading

tirsen commented Feb 2, 2021 •

edited

Loading