Root cause in code
In contrib/tiflash-columnar-hub/hub-runtime/src/run.rs, Columnar Hub registers its store to PD without retry and panics on the first failure:
    pd_client.put_store(store).unwrap_or_else(|err| {
        panic!(
            "failed to register TiFlash Columnar Hub store {} to PD: {}",
            store_id, err
        )
    });
When TiFlash starts in parallel with TiKV, this call can happen before TiKV bootstraps the PD cluster.
2. What did you expect to see? (Required)
TiFlash should start successfully even when it comes up at roughly the same time as TiKV.
Columnar Hub should retry (or wait) until PD reports the cluster is bootstrapped, then register its store.
TiFlash should listen on tcp_port (9000) and integration tests should proceed normally.
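The retry behavior expected above could look roughly like the following sketch. It is not the actual hub-runtime code: `PdError`, `register_with_retry`, and `simulate_startup_race` are hypothetical names, and the real error type returned by `pd_client.put_store` is not shown in this report. The sketch only illustrates retrying on the "not bootstrapped" error until a deadline, while still failing fast on any other error.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type; the real PD client error in hub-runtime may differ.
#[derive(Debug, Clone, PartialEq)]
enum PdError {
    /// PD knows the cluster ID but no store has bootstrapped it yet.
    NotBootstrapped(u64),
    /// Any other failure, which is not retried.
    Other(String),
}

/// Calls `register` until it succeeds. Retries only while PD reports that the
/// cluster is not bootstrapped, and gives up once `timeout` has elapsed.
/// Returns the number of attempts that were needed.
fn register_with_retry<F>(
    mut register: F,
    interval: Duration,
    timeout: Duration,
) -> Result<u32, PdError>
where
    F: FnMut() -> Result<(), PdError>,
{
    let deadline = Instant::now() + timeout;
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        match register() {
            Ok(()) => return Ok(attempts),
            // Retry only for the startup race, and only before the deadline.
            Err(PdError::NotBootstrapped(cluster_id)) if Instant::now() < deadline => {
                eprintln!(
                    "cluster {} is not bootstrapped yet (attempt {}), retrying",
                    cluster_id, attempts
                );
                sleep(interval);
            }
            // Other errors, or a timeout, are returned to the caller.
            Err(err) => return Err(err),
        }
    }
}

/// Simulates the race from this report: PD rejects the first two
/// registrations, then accepts the third once TiKV has bootstrapped.
fn simulate_startup_race() -> Result<u32, PdError> {
    let mut calls = 0;
    register_with_retry(
        || {
            calls += 1;
            if calls < 3 {
                Err(PdError::NotBootstrapped(7642623563617450895))
            } else {
                Ok(())
            }
        },
        Duration::from_millis(10),
        Duration::from_secs(1),
    )
}
```

In a real fix, the closure would wrap the existing `put_store` call, and the caller would decide whether to panic or shut down cleanly once the timeout is exceeded. The interval and timeout values here are placeholders, not recommendations.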
3. What did you see instead (Required)
TiFlash aborts during startup (~8 seconds after launch). The proxy thread panics inside Columnar Hub, which triggers SIGABRT in the main process.
Container stdout (compose logs tiflash-cn0):
thread '<unnamed>' panicked at hub-runtime/src/run.rs:1200:13:
failed to register TiFlash Columnar Hub store 1 to PD: cluster 7642623563617450895 is not bootstrapped
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at library/core/src/panicking.rs:218:5:
panic in a function that cannot unwind
thread caused non-unwinding panic. aborting.
TiFlash error log:
[ERROR] [BaseDaemon.cpp:368] ["(from thread 2) Received signal Aborted(6)."]
Stack trace points to run_raftstore_proxy_ffi at contrib/tiflash-columnar-hub/hub-runtime/src/lib.rs:70, called from ProxyStateMachine.h:272.
After the crash, TiFlash does not listen on port 9000. Running tests fails immediately with:
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Build TiFlash with next-gen columnar enabled (includes contrib/tiflash-columnar-hub, introduced by "*: refactor proxy to hub lib for columnar", #10849).
Copy the binary into the integration-test mount path:
Start a fresh next-gen disaggregated test cluster (TiFlash compute node with use_columnar = true):
    cd tests/fullstack-test-next-gen
    ./compose.sh down --remove-orphans
    rm -rf data log
    ./compose.sh up -d
Wait only a few seconds (do not manually restart TiFlash), then verify TiFlash is down:
Or run any integration test immediately:
Environment notes
tests/docker/next-gen-config/tiflash_cn.toml with [flash] use_columnar = true.
tests/fullstack-test-next-gen/disagg_tiflash.rocky9.yaml.
tiflash-cn0 has depends_on: [minio0, tikv0], but this only waits for the TiKV container to start, not for PD cluster bootstrap to finish.
Workaround (confirms this is a startup race)
After TiKV has bootstrapped PD (typically 15–30 seconds after compose up), manually restarting TiFlash succeeds:
At failure time, PD already has a cluster ID but no bootstrapped store yet; after TiKV bootstrap, PD /pd/api/v1/stores shows the TiKV store as Up.
Suggested fix directions
Retry put_store when PD returns "cluster is not bootstrapped" (similar to TiKV startup behavior).
4. What is your TiFlash version? (Required)
Related upstream change: #10849 (*: refactor proxy to hub lib for columnar).