Next-gen columnar read fails with Schema out of date col:-3 on partition table late materialization #10862

@JaySon-Huang

Description

Bug Report

1. Minimal reproduce step (Required)

Run the fullstack test in a next-gen columnar environment:

cd tests/fullstack-test-next-gen
./compose.sh exec -T tiflash-cn0 bash -c \
  'cd /tests && ENABLE_NEXT_GEN=true verbose=true ./run-test.sh fullstack-test/mpp/late_materialization_extra_table_id_column.test'

The test case (tests/fullstack-test/mpp/late_materialization_extra_table_id_column.test) does the following:

  1. Create a range-partitioned table test.t(id, age, t time, key(id)).
  2. Bulk-load data, set TiFlash replica, and run ANALYZE.
  3. In a single transaction:
    • insert into test.t values (11, 10, ...), (12, 11, ...);
    • select * from test.t where id > 10;
    • select hour(t) as hour, sum(age) from test.t where id > 10 group by hour;

Key conditions that trigger the bug:

  • Partition table scan with dynamic partition pruning (tidb_partition_prune_mode='dynamic')
  • Late materialization (the scan for select * includes the virtual column _tidb_tid, column ID -3)
  • Uncommitted rows still in memtable (insert + select in the same transaction)

2. What did you expect to see? (Required)

Both queries in the transaction should succeed and return:

+------+------+-----------+
| id   | age  | t         |
+------+------+-----------+
|   11 |   10 | 700:11:11 |
|   12 |   11 | 710:11:11 |
+------+------+-----------+
+------+----------+
| hour | sum(age) |
+------+----------+
|  710 |       11 |
|  700 |       10 |
+------+----------+

The virtual _tidb_tid column (EXTRA_PHYSICAL_TABLE_ID_COL_ID = -3) should be filled by TiFlash locally, not read from storage.

3. What did you see instead (Required)

The select * query fails. TiFlash logs:

read_block failed in tiflash-proxy
Read block from proxy failed

Proxy (kvengine) logs:

ffi_read_block failed: table error Schema out of date: tbl:2568 col:-3 read null for not null column

The failure happens on the partition region that contains the newly inserted rows, which has a large memtable (e.g. region 1297 with a ~2.5 MB memtable). Other partition regions return 0 rows and do not hit the error. The follow-up group by query can succeed because its scan does not include column -3.

4. What is your TiFlash version? (Required)

Observed on v9.0.0-beta.2.pre-170-g8ec02509ae (next-gen columnar / cloud-storage-engine path).


Root cause analysis (current understanding)

There is an inconsistency between TiFlash and kvengine on how virtual column -3 (_tidb_tid / EXTRA_PHYSICAL_TABLE_ID_COL_ID) is handled in the next-gen columnar read path.

TiFlash side (correct design):

  • genColumnDefinesForDisaggregatedRead() excludes extra_table_id_col_id from columns sent to the read path and only records extra_table_id_index.
  • When deserializing proxy blocks, TiFlash skips column -3 and fills it locally via action.fill(block, physical_table_id).
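The intended TiFlash-side handling (skip column -3 when reading, then fill it locally) can be modeled in a few lines of C++. This is a minimal sketch under stated assumptions: ColumnDefine, Column, Block, splitVirtualColumn() and fillExtraTableId() are hypothetical stand-ins, not TiFlash's real types or signatures; only column ID -3 and the skip-then-fill idea come from this report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for TiFlash types; not TiFlash's real definitions.
constexpr std::int64_t kExtraTableIdColId = -3; // EXTRA_PHYSICAL_TABLE_ID_COL_ID (_tidb_tid)

struct ColumnDefine {
    std::int64_t id;
    std::string name;
};

// A block is modeled as a list of named Int64 columns.
struct Column {
    std::string name;
    std::vector<std::int64_t> data;
};
using Block = std::vector<Column>;

struct ReadColumns {
    std::vector<ColumnDefine> to_read;               // columns requested from the proxy
    std::optional<std::size_t> extra_table_id_index; // position of _tidb_tid in the output
};

// Models the idea behind genColumnDefinesForDisaggregatedRead(): drop column -3
// from the read set and remember where it belongs in the output block.
ReadColumns splitVirtualColumn(const std::vector<ColumnDefine> & requested) {
    ReadColumns out;
    for (std::size_t i = 0; i < requested.size(); ++i) {
        if (requested[i].id == kExtraTableIdColId)
            out.extra_table_id_index = i;
        else
            out.to_read.push_back(requested[i]);
    }
    return out;
}

// Models the idea behind action.fill(block, physical_table_id): re-insert the
// virtual column as a constant column at its recorded position.
void fillExtraTableId(Block & block, const ReadColumns & cols, std::int64_t physical_table_id) {
    if (!cols.extra_table_id_index)
        return;
    std::size_t index = *cols.extra_table_id_index;
    assert(index <= block.size());
    std::size_t rows = block.empty() ? 0 : block.front().data.size();
    block.insert(block.begin() + static_cast<std::ptrdiff_t>(index),
                 Column{"_tidb_tid", std::vector<std::int64_t>(rows, physical_table_id)});
}
```

For a scan over (id, age, t, _tidb_tid), this model requests only the first three columns from the proxy and re-inserts _tidb_tid at position 3 as a constant column holding the partition's physical table ID.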

Bug:

  • RNProxyReader::createProxyReader() builds table_info from table_scan_pb columns and only skips generated columns. It still passes column -3 to fn_get_columnar_reader().
  • kvengine new_schema_from_columns() filters out handle column -1 but not -3, so the columnar/row decoder tries to read a not-null column that does not exist in the memtable/row data.
  • Decoding fails with a SchemaOutOfDate error (read null for not null column).
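The failure mode can be reproduced in a simplified model. This C++ sketch is hypothetical: kvengine is a separate codebase, and schemaFromColumnsBuggy(), decodeRow(), ColumnSpec and the Row layout are illustrative assumptions. It shows only the logic described above: a filter that drops -1 but not -3 leaves -3 in the decode schema, and because stored rows never contain -3, decoding a not-null -3 column fails.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the proxy-side decode; not kvengine's real code.
constexpr std::int64_t kHandleColId = -1;       // HANDLE_COL_ID
constexpr std::int64_t kExtraTableIdColId = -3; // EXTRA_PHYSICAL_TABLE_ID_COL_ID

struct ColumnSpec {
    std::int64_t id;
    bool not_null;
};

// Buggy filter: models the reported behavior of new_schema_from_columns(),
// which drops the handle column (-1) but keeps the virtual column (-3).
std::vector<ColumnSpec> schemaFromColumnsBuggy(const std::vector<ColumnSpec> & cols) {
    std::vector<ColumnSpec> schema;
    for (const auto & c : cols)
        if (c.id != kHandleColId)
            schema.push_back(c);
    return schema;
}

// A stored row holds only real columns, keyed by column ID.
using Row = std::map<std::int64_t, std::int64_t>;

// Decode one row against the schema; a missing not-null column is an error.
std::vector<std::int64_t> decodeRow(const std::vector<ColumnSpec> & schema, const Row & row) {
    std::vector<std::int64_t> values;
    for (const auto & c : schema) {
        auto it = row.find(c.id);
        if (it == row.end()) {
            if (c.not_null)
                throw std::runtime_error("read null for not null column");
            values.push_back(0); // placeholder for NULL in this simplified model
            continue;
        }
        values.push_back(it->second);
    }
    return values;
}
```

With a requested column list of {-1, 1, 2, -3}, the buggy filter keeps {1, 2, -3}, and decoding any stored row then throws, matching the error string in the proxy log.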

Suggested fix:

  1. TiFlash (primary): In createProxyReader(), skip MutSup::extra_table_id_col_id when building table_info, consistent with genColumnDefinesForDisaggregatedRead().
  2. kvengine (defensive): In new_schema_from_columns() / CloudColumnarReader::new, filter EXTRA_PHYSICAL_TABLE_ID_COL_ID (-3) similar to HANDLE_COL_ID (-1).
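Both suggested fixes amount to one extra condition in a column filter. The sketch below models them in C++ for consistency with the TiFlash side, although the kvengine change would live in a separate codebase. ScanColumn, columnsForProxyReader() and schemaColumnIds() are hypothetical names; they are not the actual createProxyReader() or new_schema_from_columns() code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; not the real protobuf or kvengine types.
constexpr std::int64_t kHandleColId = -1;       // HANDLE_COL_ID
constexpr std::int64_t kExtraTableIdColId = -3; // MutSup::extra_table_id_col_id

struct ScanColumn {
    std::int64_t column_id;
    bool is_generated;
};

// Fix 1 (TiFlash, primary): build the column list passed to the proxy reader,
// skipping generated columns (existing behavior) and column -3 (the fix).
std::vector<std::int64_t> columnsForProxyReader(const std::vector<ScanColumn> & scan_columns) {
    std::vector<std::int64_t> ids;
    for (const auto & c : scan_columns) {
        if (c.is_generated || c.column_id == kExtraTableIdColId)
            continue;
        ids.push_back(c.column_id);
    }
    return ids;
}

// Fix 2 (proxy, defensive): even if -3 arrives, drop it alongside -1
// before building the decode schema.
std::vector<std::int64_t> schemaColumnIds(const std::vector<std::int64_t> & ids) {
    std::vector<std::int64_t> out;
    for (std::int64_t id : ids)
        if (id != kHandleColId && id != kExtraTableIdColId)
            out.push_back(id);
    return out;
}
```

Applying the check on both sides means the TiFlash change alone keeps -3 out of the request, while the defensive filter would stop a TiFlash build that still forwards -3 from triggering the decode error.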

Legacy DM read path does not have this issue because virtual columns are handled entirely on the TiFlash side.

Metadata
Labels: type/bug (the issue is confirmed as a bug)