Background
Follow-up from code review on the columnar read path introduced for #10844 (Support new columnar storage as data source).
In RNProxyReader::createProxyReader (dbms/src/Storages/StorageDisaggregatedColumnar.cpp), the ColumnarReaderErrorType::RegionError and LockedError branches have several robustness issues that can mis-classify errors or lose diagnostics.
Original review comment: #10842 (comment)
Problems
1. Unchecked ParseFromString
region_error.ParseFromString(error_msg) (and lock_info.ParseFromString(error_msg)) return values are ignored. If the proxy returns a malformed payload, we proceed with a default-constructed protobuf, fall into the else branch of the RegionError handler, and may throw RegionException with NOT_FOUND and no useful diagnostic.
Suggested fix: Check the return value; on failure, log payload size (and optionally a short debug hint) and throw Exception with ErrorCodes::COLUMNAR_SNAPSHOT_ERROR (consistent with other columnar reader error paths that backoff-retry).
2. Lossy region_id_ver on epoch_not_match
In the epoch_not_match branch, region_id_ver is overwritten on every iteration of current_regions(), so RegionException's extra message only reflects the last region even when unavailable_regions contains many.
Additionally, the string currently uses the outer request's region_ver instead of each region's region.region_epoch().version().
Suggested fix: Build a joined diagnostic string (e.g. via FmtBuffer) listing all affected regions as id:version:conf_ver, using each region's epoch from current_regions(). Ensure the string outlives the RegionException constructor (store in a local String before throw).
3. Dead retry_regions variable
retry_regions is populated in both the epoch_not_match and region_not_found sub-branches but never read. This looks like an incomplete port from StorageDisaggregatedRemote.cpp, where retry_regions is passed to dropRegionCache.
Suggested fix (minimal): Remove the dead variable if dropping only region_ver_id (the stale request epoch) is intentional.
Optional follow-up: If split/merge scenarios require invalidating cache for all regions in current_regions(), wire explicit dropRegion calls instead of keeping an unused set (align with pingcap region-cache semantics).
4. (Optional) Tighter classification in the RegionError else branch
When parsing succeeds but the error is neither epoch_not_match nor region_not_found, we still throw RegionException with NOT_FOUND and may leave unavailable_regions empty. Consider mapping known errorpb::Error variants (as in LearnerReadWorker) or using OTHER / COLUMNAR_SNAPSHOT_ERROR for unknown cases, with ShortDebugString() in logs and extra_msg.
Location
- File:
dbms/src/Storages/StorageDisaggregatedColumnar.cpp
- Function:
RNProxyReader::createProxyReader (~lines 461–537)
Acceptance criteria
Background
Follow-up from code review on the columnar read path introduced for #10844 (Support new columnar storage as data source).
In
RNProxyReader::createProxyReader(dbms/src/Storages/StorageDisaggregatedColumnar.cpp), theColumnarReaderErrorType::RegionErrorandLockedErrorbranches have several robustness issues that can mis-classify errors or lose diagnostics.Original review comment: #10842 (comment)
Problems
1. Unchecked
ParseFromStringregion_error.ParseFromString(error_msg)(andlock_info.ParseFromString(error_msg)) return values are ignored. If the proxy returns a malformed payload, we proceed with a default-constructed protobuf, fall into theelsebranch of the RegionError handler, and may throwRegionExceptionwithNOT_FOUNDand no useful diagnostic.Suggested fix: Check the return value; on failure, log payload size (and optionally a short debug hint) and throw
ExceptionwithErrorCodes::COLUMNAR_SNAPSHOT_ERROR(consistent with other columnar reader error paths that backoff-retry).2. Lossy
region_id_veronepoch_not_matchIn the
epoch_not_matchbranch,region_id_veris overwritten on every iteration ofcurrent_regions(), soRegionException's extra message only reflects the last region even whenunavailable_regionscontains many.Additionally, the string currently uses the outer request's
region_verinstead of each region'sregion.region_epoch().version().Suggested fix: Build a joined diagnostic string (e.g. via
FmtBuffer) listing all affected regions asid:version:conf_ver, using each region's epoch fromcurrent_regions(). Ensure the string outlives theRegionExceptionconstructor (store in a localStringbeforethrow).3. Dead
retry_regionsvariableretry_regionsis populated in both theepoch_not_matchandregion_not_foundsub-branches but never read. This looks like an incomplete port fromStorageDisaggregatedRemote.cpp, whereretry_regionsis passed todropRegionCache.Suggested fix (minimal): Remove the dead variable if dropping only
region_ver_id(the stale request epoch) is intentional.Optional follow-up: If split/merge scenarios require invalidating cache for all regions in
current_regions(), wire explicitdropRegioncalls instead of keeping an unused set (align with pingcap region-cache semantics).4. (Optional) Tighter classification in the RegionError
elsebranchWhen parsing succeeds but the error is neither
epoch_not_matchnorregion_not_found, we still throwRegionExceptionwithNOT_FOUNDand may leaveunavailable_regionsempty. Consider mapping knownerrorpb::Errorvariants (as inLearnerReadWorker) or usingOTHER/COLUMNAR_SNAPSHOT_ERRORfor unknown cases, withShortDebugString()in logs and extra_msg.Location
dbms/src/Storages/StorageDisaggregatedColumnar.cppRNProxyReader::createProxyReader(~lines 461–537)Acceptance criteria
ParseFromStringfailures are handled explicitly for both RegionError and LockedError payloadsepoch_not_matchexceptions include complete, correct epoch info for all regions incurrent_regions()retry_regions(either removed or used for cache invalidation)NOT_FOUND