-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support STORING
columns INDEX
#18616
Comments
@zhangjinpeng1987 Are there any other references about the |
col4 add to |
@scsldb They are different if there are several columns that need to retrieve or the value of these columns is large. Add columns to index means these columns will be encoded to the index key(large keys are not friendly to the storage engine), but if storing columns, these columns will be stored in the value. @zz-jason https://www.cockroachlabs.com/docs/stable/indexes.html#storing-columns |
@scsldb @zz-jason Cloud-Spanner also has such a feature Following description copy from https://cloud.google.com/spanner/docs/whitepapers/optimizing-schema-design _STORING index clause Unfortunately, index keys are limited to 16 in number and 8 KiB in aggregate size, restricting what can be put in them. To compensate for these limitations, Cloud Spanner has the ability to store extra data in any index, via the STORING clause. STORING a column in an index results in its values being duplicated, with a copy stored in the index. You can think of an index with STORING as a simple single table materialized view (views are not natively supported in Cloud Spanner at this time). Another useful application of STORING is as part of a NULL_FILTERED index. This allows you to define what is effectively a materialized view of a sparse subset of a table that you can scan efficiently. For example, you might create such an index on the is_unread column of a mailbox to be able to serve the unread messages view in a single table scan, but without paying for a complete copy of every mailbox._ |
This feature is awesome:
But in order not to waste the storage and increase the write latency caused by storing unnecessary columns, which columns to be stored in the secondary index should be well-designed. |
This indeed looks very interesting! It will also help in the following scenario:
Both of these queries will do a double read and if the table is very large a large fan out scatter-gather query to fetch all the If we added these indexes:
This is very difficult to achieve in a traditionally sharded system where you can only pick one column to shard by (recipient or sender) and the other side will be punished. Regular clustered primary key indexes such as those that will be released in 5.0 will also not solve for both of these queries, only one. |
Feature Request
Is your feature request related to a problem? Please describe:
Assume we have a table named
t
, and there is a secondary index on columncol3
andcol4
:SELECT col4 from t WHERE col2=1 AND col3>5
, this query will search the relatedid
by the indexidx_col2_col3
at first, and then read the value ofcol4
, we call this process asdouble read
.Describe the feature you'd like:
If we store the column
col4
in the indexidx_col2_col3
, then we can read the value ofcol4
directly. The CREATE statement looks like that:Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
The text was updated successfully, but these errors were encountered: