Next-gen columnar: handle empty remote region scan #10848

@JaySon-Huang

Background

Follow-up from code review on the columnar read path introduced for #10844 (Support new columnar storage as data source).

When region splitting yields no remote regions, RNProxyReadTask::buildProxyReadTask and StorageDisaggregated::readThroughColumnar do not handle the empty-scan case safely.

Original review comment: #10842 (comment)

Problems

1. Divide by zero in buildProxyReadTask

File: dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Function: RNProxyReadTask::buildProxyReadTask (~lines 672–677)

After splitKeyRangesByLocations, if all_remote_regions_by_region is empty:

  • region_num == 0
  • real_num_streams = std::min(num_streams, region_num) == 0
  • regions_per_reader = (region_num + real_num_streams - 1) / real_num_streams divides by zero

This can happen when remote_table_ranges is empty, when DAG remote_regions is empty, or when splitting returns no locations for the given key ranges.
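The arithmetic above can be sketched in isolation. This is a minimal illustration, not the TiFlash code: `ReaderPlan` and `planReaders` are hypothetical names, and only `num_streams`, `region_num`, `real_num_streams`, and `regions_per_reader` come from the issue text. It shows the early return taken before the ceiling division, so the divisor is never zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for the reader-partitioning arithmetic in
// RNProxyReadTask::buildProxyReadTask; not the real TiFlash types.
struct ReaderPlan
{
    size_t real_num_streams;
    size_t regions_per_reader;
};

// Returns std::nullopt for an empty region set instead of dividing by zero.
std::optional<ReaderPlan> planReaders(size_t num_streams, size_t region_num)
{
    if (region_num == 0)
        return std::nullopt; // empty scan: nothing to partition

    const size_t real_num_streams = std::min(num_streams, region_num);
    // Assumption: the caller always requests at least one stream.
    assert(real_num_streams > 0);

    // Ceiling division: each reader takes at most this many regions.
    const size_t regions_per_reader
        = (region_num + real_num_streams - 1) / real_num_streams;
    return ReaderPlan{real_num_streams, regions_per_reader};
}
```

With `region_num == 0` the function returns before `real_num_streams` is computed, which is the behavior requested for `buildProxyReadTask`.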

2. readThroughColumnar assumes a non-empty pipeline / group builder

Same file, StorageDisaggregated::readThroughColumnar (both overloads, ~lines 238–321):

If buildProxyReadTaskWithBackoff returns no tasks, the loop over tasks creates no streams/operators. Subsequent code still calls:

  • pipeline.firstStream()->getHeader() (BlockInputStreams path, ~line 261)
  • group_builder.getCurrentHeader() (Pipeline path, ~line 311)

Both calls run before DAGExpressionAnalyzer is built, and both crash when the pipeline or group builder is empty.

Expected behavior

An empty remote region set is a valid scan (zero rows), consistent with DAGStorageInterpreter::executeImpl stage 3:

  • Return early from buildProxyReadTask before computing regions_per_reader when region_num == 0.
  • In readThroughColumnar, before creating the analyzer, install a null source with the table-scan header (genNamesAndTypesForTableScan) — NullBlockInputStream or NullSourceOp — then continue with generated-column placeholder, cast, and filter handling.

Do not paper over this with std::max(real_num_streams, 1) when there are no regions.

Suggested fix

  1. buildProxyReadTask: After collecting regions, if all_remote_regions_by_region.empty(), log and return {} (scan_context is already registered at function entry).
  2. readThroughColumnar: If read_proxy_tasks.empty(), append a null stream/op with header from genNamesAndTypesForTableScan(table_scan); otherwise build tasks as today.
  3. Optional: Skip buildProxyReadTaskWithBackoff when buildRemoteTableRanges() reports region_num == 0, but keep the guard in buildProxyReadTask for the case where ranges are non-empty but split returns no regions.
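Step 2 can be sketched with simplified stand-in types. Everything here is hypothetical (`Header`, `Stream`, `Pipeline`, `ProxyReadTask`, and `buildPipeline` are not the TiFlash classes); in the real code the null source would be NullBlockInputStream or NullSourceOp, and the header would come from genNamesAndTypesForTableScan(table_scan). The point is only that the pipeline always holds at least one stream before its header is read.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; a header is modelled as a list of column names.
using Header = std::vector<std::string>;

struct Stream
{
    Header header;
    bool is_null_source; // true for the zero-row placeholder stream
};

struct Pipeline
{
    std::vector<std::shared_ptr<Stream>> streams;

    // Mirrors pipeline.firstStream()->getHeader(): requires a non-empty pipeline.
    const Header & firstHeader() const
    {
        assert(!streams.empty());
        return streams.front()->header;
    }
};

struct ProxyReadTask
{
    Header header;
};

// Builds one stream per task; falls back to a null source carrying the
// table-scan header when there are no tasks.
Pipeline buildPipeline(const std::vector<ProxyReadTask> & read_proxy_tasks,
                       const Header & table_scan_header)
{
    Pipeline pipeline;
    for (const auto & task : read_proxy_tasks)
        pipeline.streams.push_back(std::make_shared<Stream>(Stream{task.header, false}));

    if (pipeline.streams.empty())
        pipeline.streams.push_back(std::make_shared<Stream>(Stream{table_scan_header, true}));

    return pipeline;
}
```

Because the fallback is installed before any header lookup, the later analyzer, cast, and filter steps see the same header shape whether or not any regions were scanned.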

Acceptance criteria

  • No divide-by-zero when region splitting returns zero regions
  • Empty scan returns a valid zero-row result (null stream/op + analyzer path) instead of crashing
  • Behavior aligned with DAGStorageInterpreter empty pipeline / empty group_builder handling
