This project demonstrates a client-server application for executing SQL read queries using the Apache Arrow Flight SQL protocol. The server uses a DuckDB database file as its backend, and the client communicates with it to fetch query results efficiently.
The core idea is to showcase a minimal but functional implementation of the Flight SQL specification, which provides a high-performance interface for data services that speak SQL.
- Server: Listens for SQL queries from clients.
- Client: Sends a SQL query to the server and prints the results.
- Generator: A utility to create a sample database for the server to use.
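For readers new to Flight SQL, the exchange between these components typically happens in two phases: the client first asks for flight info about its query and receives a ticket, then redeems that ticket to stream the results. The sketch below is a toy, in-memory model of that flow using only the Rust standard library. `ToyServer`, `Ticket`, and `Batch` are hypothetical names chosen for illustration; this is neither the arrow-flight API nor this project's implementation.

```rust
use std::collections::HashMap;

/// Opaque handle the server returns in place of the results themselves.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Ticket(u64);

/// Stand-in for an Arrow RecordBatch: rows of string cells in this toy model.
type Batch = Vec<Vec<String>>;

/// Hypothetical in-memory "server" that mimics the two-phase flow.
struct ToyServer {
    next_id: u64,
    pending: HashMap<Ticket, String>,
}

impl ToyServer {
    fn new() -> Self {
        ToyServer { next_id: 0, pending: HashMap::new() }
    }

    /// Phase 1 (GetFlightInfo in Flight terms): remember the query and
    /// hand back a ticket that identifies it.
    fn get_flight_info(&mut self, sql: &str) -> Ticket {
        let ticket = Ticket(self.next_id);
        self.next_id += 1;
        self.pending.insert(ticket.clone(), sql.to_string());
        ticket
    }

    /// Phase 2 (DoGet in Flight terms): redeem the ticket for a stream of
    /// batches. Returns `None` for an unknown or already used ticket.
    fn do_get(&mut self, ticket: &Ticket) -> Option<std::vec::IntoIter<Batch>> {
        let sql = self.pending.remove(ticket)?;
        // A real server would execute `sql` against DuckDB here; the toy
        // just echoes the query back as a single one-cell batch.
        let batches: Vec<Batch> = vec![vec![vec![format!("result of: {sql}")]]];
        Some(batches.into_iter())
    }
}
```

A caller would create a `ToyServer`, pass a query to `get_flight_info`, and then iterate over what `do_get` returns for that ticket. Splitting the two phases lets the server describe a result before any data moves.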
The project is organized into three main binaries, defined in Cargo.toml:
- duckdb-generator: Creates the db.duckdb database file and populates it with sample data.
- sql-flight-server: The main Arrow Flight SQL server that listens for client connections and executes queries against db.duckdb.
- sql-flight-client: A client application that connects to the server, sends a SQL query, and prints the results.
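As a rough illustration of how three binaries can be declared in Cargo.toml, the fragment below uses Cargo's `[[bin]]` tables. The binary names come from the list above; the source paths are assumptions made for this sketch and may not match the project's actual layout.

```toml
# Hypothetical layout: the `path` values are assumptions, not taken from the project.
[[bin]]
name = "duckdb-generator"
path = "src/bin/duckdb_generator.rs"

[[bin]]
name = "sql-flight-server"
path = "src/bin/server.rs"

[[bin]]
name = "sql-flight-client"
path = "src/bin/client.rs"
```

Each `name` is what `cargo run --bin <name>` selects.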
- You must have the Rust and Cargo toolchain installed.
Follow these steps to run the full client-server flow. Each command should be run from the root of the project directory in a separate terminal window.
Step 1: Generate the Database
First, create the DuckDB database file. This command only needs to be run once.
cargo run --bin duckdb-generator

This will create a file named db.duckdb in the project root, containing a table with sample data.
Step 2: Start the Server
Next, start the Arrow Flight SQL server. It will listen for incoming client connections on 0.0.0.0:50051.
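A short note on that address: `0.0.0.0` is the unspecified IPv4 address, so binding to it accepts connections on every network interface, while a client still needs a concrete destination. The standard-library sketch below parses the listen address and derives a loopback address for a client on the same machine. `LISTEN_ADDR`, `listen_addr`, and `local_client_addr` are hypothetical names, and the address the project's client actually dials is not stated here, so the loopback choice is an assumption.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Listen address quoted in the text; `0.0.0.0` means "all IPv4 interfaces".
const LISTEN_ADDR: &str = "0.0.0.0:50051";

/// Parse the listen address into the typed form that socket APIs expect.
fn listen_addr() -> SocketAddr {
    LISTEN_ADDR
        .parse()
        .expect("LISTEN_ADDR is a valid socket address")
}

/// Address a client on the same machine could dial: loopback plus the same port.
/// Hypothetical helper; `0.0.0.0` is a bind address, not a destination.
fn local_client_addr() -> SocketAddr {
    SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), listen_addr().port())
}
```

The server itself is started with: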
cargo run --bin sql-flight-server

Step 3: Run the Client
Finally, with the server running, you can use the client to execute a SQL query. The client will connect, send the user's query, and print the resulting Arrow RecordBatch to the console.
cargo run --bin sql-flight-client "SELECT books.book_title, authors.author_name FROM books INNER JOIN authors ON books.author_id = authors.author_id"

TODO: Work out how to stream RecordBatches after capturing info.
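The client command in Step 3 passes the whole query as one quoted argument. The following is a minimal sketch, using only the standard library, of how a client might read and validate that argument. `query_from_args` is a hypothetical helper and its error messages are illustrative; the project's client may handle its arguments differently.

```rust
/// Extract the SQL query from the argument list (program name already skipped).
/// Exactly one argument is expected, so an unquoted query that the shell split
/// into several words is rejected rather than silently truncated.
fn query_from_args<I>(mut args: I) -> Result<String, String>
where
    I: Iterator<Item = String>,
{
    // The first argument is the query; its absence is a usage error.
    let query = args
        .next()
        .ok_or("usage: sql-flight-client \"<SQL query>\"")?;

    // Any further argument suggests the query was not quoted.
    if args.next().is_some() {
        return Err("expected a single argument; wrap the query in quotes".to_string());
    }

    // Reject queries that are empty or contain only whitespace.
    let query = query.trim();
    if query.is_empty() {
        return Err("the query must not be empty".to_string());
    }
    Ok(query.to_string())
}
```

In a binary this would be called as `query_from_args(std::env::args().skip(1))`, where `skip(1)` drops the program name.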