-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] YBOperation.partition_key encoding should respect null value #12190
Comments
Another example.
2 Create a table and a unique index
Tablet and tablets info after (1) and (2) have been executed:
Shortened dump:
Last eight records in hexadecimal format:
Please mention how null is persisted on disk – it is
Index tablets info after splitting has been executed:
Tablet partition boundary in hex: Table Content of tablets is looking as expected with respect to split key (matches partitions bound, see above):
The rows count gives the following results:
But it is expected to have |
…ith range partitioning Summary: Issue #12190 gives a potential data loss, a temporary disabling for both dynamic and manual splitting for index tables with range partitioning is required at master server side. When #12190 and #12191 are done tablet splitting should be enabled for those tables. Additional tablet splitting disabling is added at tablet server side, other checks and cases will be covered in #12189. Additionally, an index versioning support is implemented by adding a new field `partition_key_version` into schema's `TableProperties` and `TablePropertiesPB` -- the default version should be updated in the context of #12190, after that tablet splitting becomes enabled automatically for index tables with range partitioning. Test Plan: Main: 1. unit test: ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "PgTabletSplitTest.ManualSplitIndexTablet" 2. manual test: 2.1) create a cluster ``` ./bin/yb-ctl create --master_flags="enable_automatic_tablet_splitting=true,\ tablet_split_low_phase_shard_count_per_node=16,\ tablet_split_low_phase_size_threshold_bytes=13421" \ --tserver_flags="yb_num_shards_per_tserver=1" ``` 2.2) run `./bin/ysqlsh` and execute ``` create table employees(id int primary key, name text, description text); create index idx1 on employees(description ASC); ``` 2.3) open tablet server web ui and make sure only one tablet exists for index table 2.4) download `collect_insert_data.txt` from #10926 and rename to `collect_insert_data.py` 2.5) execute `./collect_insert_data.py` to generate sql script 2.6) execute `./bin/ysqlsh -f sql_data.sql` 2.7) make sure only one tablet still exists for index table, while the table has several tablets 2.8) flush `idx1` index tablet: `./bin/yb-ts-cli flush_tablet <UUID>` 2.9) try manual split: `./bin/yb-admin split_tablet -master_addresses localhost:7100 <UUID>` the following error should appear: ``` Error running split_tablet: Not implemented (yb/master/tablet_split_manager.cc:176): Unable to start split of tablet <UUID>: Tablet splitting is not supported for the index table "idx1" with table_id "000033e5000030008000000000004005". Please, rebuild the index! ``` 3. Additional tests should not give unexpected results: ybd --cxx-test tablet_tablet-split-test ybd --cxx-test integration-tests_tablet-split-itest ybd --cxx-test integration-tests_tablet_server-itest ybd --cxx-test integration-tests_cql-tablet-split-test ybd --cxx-test integration-xcluster-tablet-split-itest ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestSchema ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestCreateProjection ybd --cxx-test master_catalog_manager-test --gtest_filter "TestCatalogManager.CheckIfCanDeleteSingleTablet" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "PgTabletSplitTest.SplitDuringLongRunningTransaction" ybd --cxx-test pgwrapper_pg_mini-test --gtest_filter "PgMiniTest.TabletSplitSecondaryIndexYSQL" Reviewers: timur, rthallam Reviewed By: timur, rthallam Subscribers: zyu, ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D16738
… index tables with range partitioning Summary: Issue #12190 gives a potential data loss, a temporary disabling for both dynamic and manual splitting for index tables with range partitioning is required at master server side. When #12190 and #12191 are done tablet splitting should be enabled for those tables. Additional tablet splitting disabling is added at tablet server side, other checks and cases will be covered in #12189. Additionally, an index versioning support is implemented by adding a new field `partition_key_version` into schema's `TableProperties` and `TablePropertiesPB` -- the default version should be updated in the context of #12190, after that tablet splitting becomes enabled automatically for index tables with range partitioning. Original commit: 2ef55c8 / D16738 Test Plan: Main: 1. unit test: ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "PgTabletSplitTest.ManualSplitIndexTablet" 2. manual test: 2.1) create a cluster ``` ./bin/yb-ctl create --master_flags="enable_automatic_tablet_splitting=true,\ tablet_split_low_phase_shard_count_per_node=16,\ tablet_split_low_phase_size_threshold_bytes=13421" \ --tserver_flags="yb_num_shards_per_tserver=1" ``` 2.2) run `./bin/ysqlsh` and execute ``` create table employees(id int primary key, name text, description text); create index idx1 on employees(description ASC); ``` 2.3) open tablet server web ui and make sure only one tablet exists for index table 2.4) download `collect_insert_data.txt` from #10926 and rename to `collect_insert_data.py` 2.5) execute `./collect_insert_data.py` to generate sql script 2.6) execute `./bin/ysqlsh -f sql_data.sql` 2.7) make sure only one tablet still exists for index table, while the table has several tablets 2.8) flush `idx1` index tablet: `./bin/yb-ts-cli flush_tablet <UUID>` 2.9) try manual split: `./bin/yb-admin split_tablet -master_addresses localhost:7100 <UUID>` the following error should appear: ``` Error running split_tablet: Not implemented (yb/master/tablet_split_manager.cc:176): Unable to start split of tablet <UUID>: Tablet splitting is not supported for the index table "idx1" with table_id "000033e5000030008000000000004005". Please, rebuild the index! ``` 3. Additional tests should not give unexpected results: ybd --cxx-test tablet_tablet-split-test ybd --cxx-test integration-tests_tablet-split-itest ybd --cxx-test integration-tests_tablet_server-itest ybd --cxx-test integration-tests_cql-tablet-split-test ybd --cxx-test integration-xcluster-tablet-split-itest ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestSchema ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestCreateProjection ybd --cxx-test master_catalog_manager-test --gtest_filter "TestCatalogManager.CheckIfCanDeleteSingleTablet" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "PgTabletSplitTest.SplitDuringLongRunningTransaction" ybd --cxx-test pgwrapper_pg_mini-test --gtest_filter "PgMiniTest.TabletSplitSecondaryIndexYSQL" Reviewers: rthallam, timur Reviewed By: rthallam, timur Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17001
…pect null value Summary: #12190: YBOperation.partition_key encodes NULLs with no respect to column's sorting type. This leads to a situation when YBOperation with such partition_key is forwarded to a wrong tablet. In case of index tables such incorrect forwarding leads to a data loss for the dynamic tablet splitting: the YBOperation is skipped during applying of intents due to filtering out by tablet's key_bounds. The intent of the fix is to update YBOperation.partition_key generation algo to take column's sorting type into account and make the operation be forwarded correctly. #12191: Index tables contain a hidden system column `ybuniqueidxkeysuffix` or `ybidxbasectid` in a primary key that is not taken into account for partitions calculation for `split at` statement. This leads to a structure inconsistency between tablet partition bounds, tablet key bouds (splitting related, row's DocKey structure and YBOperation.partition_key, while the docdb layer expects all these entities structure to be the same. The fix extends tablet partition bound keys to set up `-Inf` for the hidden column explicitly -- same way it is done for the range columns whose values are not explicitly specified by `split at` statement, refer to `YBTransformPartitionSplitPoints()` Additionally, `GetRangeComponents()` fixed to remove a small regression added with D15429 (commit a2aa0d3): originally `docdb::ValueType::kHighest` had been added to the `upper_bounds` after `GetRangeComponents ()` returned the range components, but due to mentioned update in the method, `docdb::ValueType::kHighest` was begun to be added after each range component but the last one (!). This change also returns the original intention to add `docdb::ValueType::kHighest` only once at the end, confirmed with @tnayak. Additionally, partitioning_version is used for backward compatibility, to distinguish old and new tables to be able to read incorrectly persisted data in the past (but the splitting of such old tablets is kept disabled). The issue and the fix are mainly impact index tables and regular tables cannot have NULLs in a primary key. Test Plan: ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.IndexRowsPersistenceAfterManualSplit/*" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.UniqueIndexRowsPersistenceAfterManualSplit/*" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.ManualSplit/*" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.SplitAt/*" ybd --cxx-test tools_yb-backup-test_ent --gtest_filter "YBBackupTest.TestYCQLPartitioningVersion" ybd --cxx-test tools_yb-backup-test_ent --gtest_filter "YBBackupTest.TestYSQLPartitioningVersion" ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestSchema ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestCreateProjection ybd --cxx-test pgwrapper_pg_mini-test --gtest_filter "PgMiniTest.TabletSplitSecondaryIndexYSQL" Reviewers: timur, rskannan, tnayak, yguan, oleg Reviewed By: tnayak, oleg Subscribers: zyu, ybase, rthallam, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D17276
keeping opened to not forget to backport after version mismatch issue (#13168) is fixed. |
…ding should respect null value Summary: Original commit: 61985b6 / D17276 [#12190]: YBOperation.partition_key encodes NULLs with no respect to column's sorting type. This leads to a situation when YBOperation with such partition_key is forwarded to a wrong tablet. In case of index tables such incorrect forwarding leads to a data loss for the dynamic tablet splitting: the YBOperation is skipped during applying of intents due to filtering out by tablet's key_bounds. The intent of the fix is to update YBOperation.partition_key generation algo to take column's sorting type into account and make the operation be forwarded correctly. [#12191]: Index tables contain a hidden system column ybuniqueidxkeysuffix or ybidxbasectid in a primary key that is not taken into account for partitions calculation for split at statement. This leads to a structure inconsistency between tablet partition bounds, tablet key bouds (splitting related, row's DocKey structure and YBOperation.partition_key, while the docdb layer expects all these entities structure to be the same. The fix extends tablet partition bound keys to set up -Inf for the hidden column explicitly -- same way it is done for the range columns whose values are not explicitly specified by split at statement, refer to YBTransformPartitionSplitPoints() Additionally, GetRangeComponents() fixed to remove a small regression added with D15429 (commit a2aa0d3): originally docdb::ValueType::kHighest had been added to the upper_bounds after GetRangeComponents () returned the range components, but due to mentioned update in the method, docdb::ValueType::kHighest was begun to be added after each range component but the last one (!). This change also returns the original intention to add docdb::ValueType::kHighest only once at the end, confirmed with @tnayak. Additionally, partitioning_version is used for backward compatibility, to distinguish old and new tables to be able to read incorrectly persisted data in the past (but the splitting of such old tablets is kept disabled). The issue and the fix are mainly impact index tables and regular tables cannot have NULLs in a primary key. Test Plan: ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.IndexRowsPersistenceAfterManualSplit/*" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.UniqueIndexRowsPersistenceAfterManualSplit/*" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.ManualSplit/*" ybd --cxx-test pgwrapper_pg_tablet_split-test --gtest_filter "*PgPartitioningVersionTest.SplitAt/*" ybd --cxx-test tools_yb-backup-test_ent --gtest_filter "YBBackupTest.TestYCQLPartitioningVersion" ybd --cxx-test tools_yb-backup-test_ent --gtest_filter "YBBackupTest.TestYSQLPartitioningVersion" ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestSchema ybd --cxx-test common_schema-test --gtest_filter TestSchema.TestCreateProjection ybd --cxx-test pgwrapper_pg_mini-test --gtest_filter "PgMiniTest.TabletSplitSecondaryIndexYSQL" Reviewers: oleg, rthallam Reviewed By: oleg, rthallam Subscribers: tnayak, ybase, asrivastava, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D20627
Backported to 2.14, hence closing. |
Jira Link: [DB-363](https://yugabyte.atlassian.net/browse/DB-363)
Description
Let's assume the following example
and encoded partition bound looks like
5370000021
in hexIn accordance with documentation column
ybuniqueidxkeysuffix
may be ofkNullLow
,kNullHigh
orbase table ctid
values. Hence the following possibleYBOperation.partition_key
should be concerned as correctly encoded values (in hex):In all of these cases the write operation will be forwarded to a correct tablet as all possible values for the first byte are greater than the encoded value of
kGroupEnd == 0x21
.But as of now, first and second options are encoded with
537000000021
<5370000021
. Mainly, this is happening becauseQLValuePB::VALUE_NOT_SET
is treated askLowest
for aYBOperation.partition_key
, but should not. There is a stack of calls that leads to the inconsistency:GetRangeComponents()
GetRangePartitionKey()
InitWritePartitionKey()
The fact the
YBOperation.partition_key
has incorrect format is also confirmed by a manual decoding (via unit test), the following error appears:Error when decoding range components of a document key: Expected a primitive value type, got kLowest.
Note. Refer to KeyEntryValue::NullValue() and PostgreSQL documentation for details on how null should be encoded. By default NULL LAST approach is used or ASC ordering and NULL FIRST for DESC ordering.
Step to reproduce the issue:
enable_automatic_tablet_splitting=true
:Tablets info after statements have been executed:
YBOperation is forwarded to the first tablet
d75c1be78e94456d9803758d6650a897
asYBOperation.partition_key
is less than the second partition lower bound:537000000021
<5370000021
. But the operation should have been forwarded to the second tablet in accordance with range bounds. Tablets dump confirms the inserted row is persisted in the first tablet (with respect to DocDB format):The text was updated successfully, but these errors were encountered: