Description
Is your feature request related to a problem? Please describe.
The StarRocks connector currently uses only the MySQL wire protocol (port 9030) for all query execution. The MySQL protocol serializes data row-by-row in a text-based format, which becomes a significant performance bottleneck for large result sets commonly encountered in dashboard queries:
- 10K+ rows: MySQL serialization overhead dominates query latency (~54ms for 100K rows on a full table scan).
- High memory allocations: The row-based deserialization path produces ~1.2M allocations per 100K-row query, increasing GC pressure in the Rill runtime process.
- Throughput ceiling: MySQL protocol tops out at ~1.8M rows/sec for typical analytical queries, limiting dashboard responsiveness for large datasets.
StarRocks 3.0+ natively supports Apache Arrow Flight SQL, a columnar transport protocol designed for high-throughput analytical workloads, but Rill does not currently leverage it.
Describe the solution you'd like
Add Arrow Flight SQL as an opt-in query transport for the StarRocks connector, configurable via a `transport: "flight_sql"` property. The implementation should:

- Route `Query()` and `QuerySchema()` through Arrow Flight SQL when configured, while keeping `Exec()` and `DryRun` on MySQL (DDL/DML operations don't benefit from columnar transfer).
- Automatically fall back to MySQL for parameterized queries (`stmt.Args`), since Arrow Flight SQL does not support query parameter binding at the protocol level.
- Handle StarRocks' FE/BE split architecture: the FE handles `Execute()` (query planning), while BE nodes serve `DoGet()` (data retrieval). The connector must parse endpoint Location URIs from `FlightInfo` to route `DoGet` to the correct BE node.
- Pool BE clients to avoid per-query gRPC connection overhead, with eviction on connection failure for resilience during rolling restarts.
- Limit concurrency via a configurable semaphore (`flight_sql_max_conns`) to prevent exhausting the StarRocks FE's per-user connection limit.
- Support `ExecutionTimeout` via `context.WithTimeout`, consistent with other OLAP drivers (ClickHouse, Druid, Pinot).
- Provide a `flight_sql_be_addr` override for environments where BE nodes advertise internal addresses not reachable from the client (e.g., Docker, NAT).
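The routing rules above reduce to a small decision function. A minimal Go sketch — `Config` and `Stmt` are hypothetical stand-ins for the connector's real types, not the actual driver API:

```go
package main

import "fmt"

// Config mirrors the proposed connector property (struct itself is illustrative).
type Config struct {
	Transport string // "mysql" (default) or "flight_sql"
}

// Stmt is a minimal stand-in for the driver's statement type.
type Stmt struct {
	Query string
	Args  []any
}

// chooseTransport applies the proposed routing rules: Flight SQL is used only
// when explicitly configured, and parameterized queries always fall back to
// MySQL because Flight SQL lacks protocol-level parameter binding.
func chooseTransport(cfg Config, stmt Stmt) string {
	if cfg.Transport != "flight_sql" {
		return "mysql" // default transport
	}
	if len(stmt.Args) > 0 {
		return "mysql" // automatic fallback for parameterized queries
	}
	return "flight_sql"
}

func main() {
	cfg := Config{Transport: "flight_sql"}
	fmt.Println(chooseTransport(cfg, Stmt{Query: "SELECT 1"}))                 // flight_sql
	fmt.Println(chooseTransport(cfg, Stmt{Query: "SELECT ?", Args: []any{1}})) // mysql
}
```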
Expected configuration
```yaml
type: connector
driver: starrocks
host: "starrocks-fe.example.com"
port: 9030
username: "analyst"
password: "{{ .env.STARROCKS_PASSWORD }}"
database: "my_database"
transport: "flight_sql"          # "mysql" (default) or "flight_sql"
flight_sql_port: 9408            # FE Arrow Flight SQL port (default: 9408)
flight_sql_be_addr: "host:port"  # Optional: override BE address for NAT/Docker
flight_sql_max_conns: 100        # Optional: max concurrent Flight SQL queries (default: 100)
```

Expected performance improvement
| Query Size | MySQL | Flight SQL | Improvement |
|---|---|---|---|
| 1 row | 0.67ms | 2.28ms | MySQL 3.4x faster (gRPC overhead) |
| 100 rows | 2.75ms | 2.94ms | ~same |
| 1K rows | 3.69ms | 2.97ms | Flight SQL 1.2x faster |
| 10K rows | 9.55ms | 4.87ms | Flight SQL 2.0x faster |
| 100K rows | 54.5ms | 19.5ms | Flight SQL 2.8x faster |
The crossover point is ~1K rows. Below this threshold, MySQL is faster due to per-query gRPC connection overhead.
Describe alternatives you've considered
- **MySQL protocol with compression**: MySQL supports `zlib` and `zstd` compression, which could reduce network overhead. However, this does not address the row-by-row serialization cost on the client side, nor the high allocation count. Benchmarks with compression showed <10% improvement for typical analytical queries.
- **JDBC with Arrow extensions**: Some JDBC drivers (e.g., Snowflake) support fetching results as Arrow batches. StarRocks' JDBC driver does not currently support this, and JDBC adds Java dependency overhead.
Additional context
Requirements
- StarRocks 4.0.4+: Earlier versions have Arrow Flight serialization bugs for certain types (e.g., VARBINARY). See StarRocks PR #65889.
- FE/BE config: `arrow_flight_port` must be set in both `fe.conf` (default 9408) and `be.conf` (default 9419).
- Network: The FE Arrow Flight port must be accessible from the Rill runtime. The BE Arrow Flight port must also be accessible unless FE proxy mode is enabled (`arrow_flight_proxy_enabled = true` in `fe.conf`).
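Based on the defaults above, a minimal config sketch (example values only; the proxy flag is optional and only needed when BE ports are not reachable from the client):

```
# fe.conf
arrow_flight_port = 9408
# Optional: let the FE proxy DoGet traffic so BE ports need not be exposed
arrow_flight_proxy_enabled = true

# be.conf
arrow_flight_port = 9419
```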
Architecture
```
Query() / QuerySchema()
 |
 +-- ExecutionTimeout -> context.WithTimeout (if set)
 |
 +-- transport=mysql (default) -> MySQL protocol (port 9030)
 |
 +-- transport=flight_sql
       +-- stmt.Args non-empty? -> fallback to MySQL
       +-- flightSem.Acquire() -> concurrency limit
       +-- Execute() -> FE (port 9408)
       +-- DoGet()  -> BE (via endpoint Location URI, with pooled clients)
             +-- on failure: evict cached client, next query reconnects

Exec() / DryRun -> always MySQL
```
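The FE/BE split means each `FlightInfo` endpoint Location must be translated into a dialable BE address, honoring the proposed `flight_sql_be_addr` override. A minimal Go sketch — the `grpc+tcp://host:port` URI shape follows Arrow Flight's location convention, and the function name and error handling are illustrative, not the actual connector code:

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveBEAddr picks the host:port to dial for DoGet. uri is an endpoint
// Location taken from FlightInfo (e.g. "grpc+tcp://10.0.0.5:9419"); override
// is the proposed flight_sql_be_addr setting for Docker/NAT environments
// where the advertised address is not reachable from the client.
func resolveBEAddr(uri, override string) (string, error) {
	if override != "" {
		return override, nil // operator-supplied address wins
	}
	u, err := url.Parse(uri)
	if err != nil {
		return "", fmt.Errorf("parse endpoint location %q: %w", uri, err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("endpoint location %q has no host", uri)
	}
	return u.Host, nil // host:port of the BE node
}

func main() {
	addr, _ := resolveBEAddr("grpc+tcp://10.0.0.5:9419", "")
	fmt.Println(addr) // 10.0.0.5:9419
	addr, _ = resolveBEAddr("grpc+tcp://10.0.0.5:9419", "localhost:9419")
	fmt.Println(addr) // localhost:9419
}
```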
Full benchmark results (100K rows, full table scan)
| Metric | MySQL | Flight SQL | Improvement |
|---|---|---|---|
| Throughput | 1.79M rows/sec | 4.77M rows/sec | 2.7x faster |
| Memory | 17.2MB/op | 12.7MB/op | 26% less |
| Allocations | 1.2M allocs/op | 469K allocs/op | 61% fewer |
Scope
In scope:
- `transport` config property (`mysql`/`flight_sql`)
- Flight SQL client initialization with Basic Auth over gRPC
- `DoGet` BE routing via endpoint Location URIs
- BE client pooling with eviction on failure
- Arrow RecordBatch to `drivers.Result` type conversion
- `flightRows` implementing the `Rows` interface (`Next`/`MapScan`/`Scan`/`Close`)
- Concurrency limiter (`flight_sql_max_conns` semaphore)
- `ExecutionTimeout` support
- Automatic MySQL fallback for parameterized queries
- `flight_sql_be_addr` override for Docker/NAT environments
- Unified integration tests (MySQL + Flight SQL, single container)
- Performance benchmarks

Out of scope:
- Routing `Exec()` through Flight SQL (no performance benefit for DDL/DML)
- Multi-endpoint consumption (StarRocks returns a single endpoint for normal SQL queries)
- TLS certificate pinning for BE connections (follows FE SSL setting)