Background
Follow-up from code review on the columnar read path introduced for #10844 (Support new columnar storage as data source).
When region splitting yields no remote regions, RNProxyReadTask::buildProxyReadTask and StorageDisaggregated::readThroughColumnar do not handle the empty-scan case safely.
Original review comment: #10842 (comment)
Problems
1. Divide by zero in buildProxyReadTask
File: dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Function: RNProxyReadTask::buildProxyReadTask (~lines 672–677)
After splitKeyRangesByLocations, if all_remote_regions_by_region is empty:
- region_num == 0
- real_num_streams = std::min(num_streams, region_num) == 0
- regions_per_reader = (region_num + real_num_streams - 1) / real_num_streams, which divides by zero
This can happen when remote_table_ranges is empty, when DAG remote_regions is empty, or when splitting returns no locations for the given key ranges.
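The arithmetic above can be reproduced in isolation. The following is a minimal sketch, not the TiFlash code: regionsPerReader is a hypothetical helper, and the extra check for num_streams == 0 is an assumption added for safety rather than something the review asked for.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>

// Hypothetical helper, not part of TiFlash: computes how many regions each
// reader handles, or std::nullopt when there is nothing to read.
std::optional<std::size_t> regionsPerReader(std::size_t num_streams, std::size_t region_num)
{
    // Guard first: with region_num == 0 the std::min below yields 0 and the
    // ceiling division would divide by zero. The num_streams check is an
    // extra assumption of this sketch.
    if (region_num == 0 || num_streams == 0)
        return std::nullopt;

    const std::size_t real_num_streams = std::min(num_streams, region_num);

    // Ceiling division; safe because real_num_streams >= 1 at this point.
    return (region_num + real_num_streams - 1) / real_num_streams;
}
```

With 4 streams and 10 regions this yields 3 regions per reader; with 0 regions it yields no value instead of dividing by zero.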
2. readThroughColumnar assumes a non-empty pipeline / group builder
Same file, StorageDisaggregated::readThroughColumnar (both overloads, ~lines 238–321):
If buildProxyReadTaskWithBackoff returns no tasks, the loop over tasks creates no streams/operators. Subsequent code still calls the following before building DAGExpressionAnalyzer, which crashes on an empty pipeline/group:
- pipeline.firstStream()->getHeader() (BlockInputStreams path, ~line 261)
- group_builder.getCurrentHeader() (Pipeline path, ~line 311)
Expected behavior
An empty remote region set is a valid scan (zero rows), consistent with DAGStorageInterpreter::executeImpl stage 3:
- Return early from buildProxyReadTask before computing regions_per_reader when region_num == 0.
- In readThroughColumnar, before creating the analyzer, install a null source (NullBlockInputStream or NullSourceOp) with the table-scan header from genNamesAndTypesForTableScan, then continue with generated-column placeholder, cast, and filter handling.
Do not paper over this with std::max(real_num_streams, 1) when there are no regions.
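The null-source fallback can be illustrated with a toy model. Everything in this sketch is hypothetical: Header, Stream, Pipeline, and buildPipeline are simplified stand-ins that only mimic the shape of a header-carrying pipeline and are not the TiFlash classes named above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical): a header is just a list of column names.
using Header = std::vector<std::string>;

struct Stream
{
    Header header;
    bool is_null_source = false; // true for a source that yields zero rows
};

struct Pipeline
{
    std::vector<std::shared_ptr<Stream>> streams;

    // Mirrors the hazard in the issue: undefined on an empty pipeline.
    const Header & firstHeader() const { return streams.front()->header; }
};

// Builds one stream per task. If there are no tasks, installs a single null
// source carrying the table-scan header so that firstHeader() stays valid.
Pipeline buildPipeline(std::size_t task_count, const Header & table_scan_header)
{
    Pipeline pipeline;
    for (std::size_t i = 0; i < task_count; ++i)
        pipeline.streams.push_back(std::make_shared<Stream>(Stream{table_scan_header, false}));

    if (pipeline.streams.empty())
        pipeline.streams.push_back(std::make_shared<Stream>(Stream{table_scan_header, true}));

    return pipeline;
}
```

The point of the design is that later stages (placeholder, cast, filter) can always ask for a header and do not need their own empty-pipeline checks.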
Suggested fix
- buildProxyReadTask: After collecting regions, if all_remote_regions_by_region.empty(), log and return {} (scan_context is already registered at function entry).
- readThroughColumnar: If read_proxy_tasks.empty(), append a null stream/op with the header from genNamesAndTypesForTableScan(table_scan); otherwise build tasks as today.
- Optional: Skip buildProxyReadTaskWithBackoff when buildRemoteTableRanges() reports region_num == 0, but keep the guard in buildProxyReadTask for the case where ranges are non-empty but the split returns no regions.
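To show why both the caller-side skip and the callee-side guard are useful, here is a rough sketch with hypothetical stand-ins (KeyRange, Region, ReadTask, splitByLocations, buildTasks, readTasksFor). The split rule used here, where a range with start >= end maps to no region, is invented purely so that non-empty input can produce an empty result.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for key ranges, regions, and read tasks.
struct KeyRange { int start; int end; };
struct Region { int id; };
struct ReadTask { std::vector<Region> regions; };

// Stand-in for the split step (invented rule): a range with start >= end
// maps to no region, so non-empty ranges can still yield no regions.
std::vector<Region> splitByLocations(const std::vector<KeyRange> & ranges)
{
    std::vector<Region> regions;
    for (const auto & r : ranges)
        if (r.start < r.end)
            regions.push_back(Region{r.start});
    return regions;
}

// Callee guard: return no tasks when the split yields no regions.
std::vector<ReadTask> buildTasks(const std::vector<KeyRange> & ranges, std::size_t num_streams)
{
    const std::vector<Region> regions = splitByLocations(ranges);
    if (regions.empty() || num_streams == 0)
        return {};

    const std::size_t real_num_streams = std::min(num_streams, regions.size());
    const std::size_t per_reader = (regions.size() + real_num_streams - 1) / real_num_streams;

    std::vector<ReadTask> tasks;
    for (std::size_t i = 0; i < regions.size(); i += per_reader)
    {
        const std::size_t end = std::min(i + per_reader, regions.size());
        std::vector<Region> slice(
            regions.begin() + static_cast<std::ptrdiff_t>(i),
            regions.begin() + static_cast<std::ptrdiff_t>(end));
        tasks.push_back(ReadTask{slice});
    }
    return tasks;
}

// Caller-side fast path: skip the build entirely when there are no ranges.
std::vector<ReadTask> readTasksFor(const std::vector<KeyRange> & ranges, std::size_t num_streams)
{
    if (ranges.empty())
        return {};
    return buildTasks(ranges, num_streams);
}
```

The caller-side check only catches the case with no ranges at all; the callee guard still covers non-empty ranges whose split comes back empty.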
Acceptance criteria
DAGStorageInterpreter empty pipeline / empty group_builder handling