Strake is a high-performance federated SQL engine built on Apache Arrow DataFusion. It enables users to query across disparate data sources—including PostgreSQL, Parquet, and JSON—using a single SQL interface without the need for data movement or ETL.
📚 Full Documentation: Check out the complete documentation for installation, architecture, and API references.
Strake acts as an "Intelligent Pipe," sitting between your data sources and your analysis tools. It focuses on operational stability, ensuring that federated queries are executed efficiently and safely through aggressive pushdown optimization and memory-limit enforcement.
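For example, a single statement can join a Postgres-backed table with a Parquet dataset, with filters pushed down to each source before the join executes. The table names below are illustrative, not a fixed Strake namespace:

```sql
-- Illustrative federated join: "pg.orders" (Postgres) and
-- "lake.events" (Parquet) are hypothetical table names.
-- The date filter is pushed down to the Postgres source.
SELECT o.customer_id, COUNT(*) AS event_count
FROM pg.orders AS o
JOIN lake.events AS e ON e.order_id = o.id
WHERE o.created_at >= DATE '2024-01-01'
GROUP BY o.customer_id;
```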
Key features:

- GitOps Native: Manage your data mesh configuration as code. Version control your sources, policies, and metrics.
- Developer First: Built for engineers. Type-safe configuration, rich CLI tooling, and local development workflows.
- High Performance: Sub-second latency for federated joins using Apache Arrow.
- Pluggable Sources: Postgres, S3, Local Files, REST, gRPC, and more.
- Enterprise Governance: Row-Level Security (RLS), Column Masking, Data Contracts, and OIDC Authentication (Enterprise Edition).
- Python Native: Zero-copy integration with Pandas and Polars via PyO3.
- Observability: Built-in OpenTelemetry tracing and Prometheus metrics.
```bash
# Install via the install script
curl -sSfL https://strakedata.com/install.sh | sh

# Or build the CLI and server from source
cargo install --path crates/cli
cargo install --path crates/server

# Install the Python bindings
pip install strake
```

Initialize and apply your data source configuration:
```bash
# Initialize a new config
strake-cli init

# Validate configuration
strake-cli validate sources.yaml

# Apply to the metadata store (Sync)
strake-cli apply sources.yaml --force
```
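As a rough illustration of configuration-as-code, a `sources.yaml` might look something like the sketch below; the field names here are assumptions, so consult the documentation for the authoritative schema:

```yaml
# Hypothetical sources.yaml sketch -- field names are assumptions,
# not Strake's authoritative schema.
sources:
  - name: orders_db
    type: postgres
    connection: postgres://analytics:secret@db.internal:5432/orders
  - name: events
    type: parquet
    path: s3://my-bucket/events/
```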
Strake provides a seamless interface for data scientists and engineers:

```python
import strake
import polars as pl

# Connect using a local configuration file (Embedded Mode)
LOCAL_CONFIG = "config/strake.yaml"
conn = strake.StrakeConnection(LOCAL_CONFIG)

print("\n--- List Tables ---")
print(conn.describe())

print("\n--- Describe Table ---")
print(conn.describe("measurements"))

print("\n--- Query PyArrow Table ---")
data = conn.sql("SELECT * FROM measurements LIMIT 5")

# Convert to Polars DataFrame
print(pl.from_arrow(data))
```
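Since the query result is a PyArrow table, the standard Arrow-to-Pandas conversion works as well (a short sketch reusing `data` from the snippet above):

```python
# "data" is the PyArrow table returned by conn.sql above;
# Table.to_pandas() is the standard PyArrow conversion.
df = data.to_pandas()
print(df.head())
```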
Strake is organized as a set of Rust crates:

| Component | Description |
|---|---|
| strake-runtime | Orchestration layer (Federation Engine, Sidecar). |
| strake-connectors | Data source implementations (Postgres, S3, REST, etc.). |
| strake-sql | SQL Dialects, Query Optimization, and Substrait generation. |
| strake-common | Shared types, configuration, and error handling. |
| strake-server | Arrow Flight SQL server implementation. |
| strake-cli | GitOps CLI for managing data mesh configurations. |
| strake-python | Python bindings for high-performance data access. |
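Because strake-server implements Arrow Flight SQL, any standard Flight SQL client should be able to connect. A minimal sketch using Python's ADBC Flight SQL driver, where the endpoint and table name are assumptions:

```python
# Hypothetical client session against a running strake-server.
# The grpc://localhost:50051 endpoint is an assumption; use the
# address and port your server is actually configured with.
import adbc_driver_flightsql.dbapi as flight_sql

with flight_sql.connect("grpc://localhost:50051") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM measurements LIMIT 5")
        print(cur.fetch_arrow_table())
```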
We welcome contributions! Please see our Contributing Guidelines for details on how to get started.
Strake is licensed under the Apache 2.0 license.
