The Data Layer for Agents. A high-performance federated SQL engine that gives AI agents governed, zero-copy access to your entire data stack (Postgres, S3, APIs).

Strake

High-Performance Federated SQL Engine


Strake is a high-performance federated SQL engine built on Apache Arrow DataFusion. It enables users to query across disparate data sources—including PostgreSQL, Parquet, and JSON—using a single SQL interface without the need for data movement or ETL.

📚 Full Documentation: Check out the complete documentation for installation, architecture, and API references.


Overview

Strake acts as an "Intelligent Pipe," sitting between your data sources and your analysis tools. It focuses on operational stability, ensuring that federated queries are executed efficiently and safely through aggressive pushdown optimization and memory-limit enforcement.

Key Features

  • GitOps Native: Manage your data mesh configuration as code. Version control your sources, policies, and metrics.
  • Developer First: Built for engineers. Type-safe configuration, rich CLI tooling, and local development workflows.
  • High Performance: Sub-second latency for federated joins using Apache Arrow.
  • Pluggable Sources: Postgres, S3, Local Files, REST, gRPC, and more.
  • Enterprise Governance: Row-Level Security (RLS), Column Masking, Data Contracts, and OIDC Authentication (Enterprise Edition).
  • Python Native: Zero-copy integration with Pandas and Polars via PyO3.
  • Observability: Built-in OpenTelemetry tracing and Prometheus metrics.

Quick Start

1. Installation

Quick Install (Linux/macOS)

curl -sSfL https://strakedata.com/install.sh | sh

Install via Cargo (Rust)

cargo install --path crates/cli
cargo install --path crates/server

Python Client

pip install strake

2. Configuration (GitOps)

Initialize and apply your data source configuration:

# Initialize a new config
strake-cli init

# Validate configuration
strake-cli validate sources.yaml

# Apply to the metadata store (Sync)
strake-cli apply sources.yaml --force
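A sources.yaml could look like the following. The exact schema is not shown in this README, so every field name below (version, sources, kind, connection, path) is illustrative only:

```yaml
# Illustrative sources.yaml -- field names are hypothetical, not the documented schema.
version: 1
sources:
  - name: analytics_db
    kind: postgres
    connection:
      host: localhost
      port: 5432
      database: analytics
  - name: events
    kind: parquet
    path: s3://my-bucket/events/
```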

3. Usage (Python)

Strake provides a seamless interface for data scientists and engineers:

import strake
import polars as pl

# Connect using a local configuration file (Embedded Mode)
LOCAL_CONFIG = "config/strake.yaml" 
conn = strake.StrakeConnection(LOCAL_CONFIG)

print("\n--- List Tables ---")
print(conn.describe())

print("\n--- Describe Table ---")
print(conn.describe("measurements"))

print("\n--- Query PyArrow Table ---")
data = conn.sql("SELECT * FROM measurements LIMIT 5")
# Convert to Polars DataFrame
print(pl.from_arrow(data))

Project Structure

Component          Description
strake-runtime     Orchestration layer (Federation Engine, Sidecar).
strake-connectors  Data source implementations (Postgres, S3, REST, etc.).
strake-sql         SQL dialects, query optimization, and Substrait generation.
strake-common      Shared types, configuration, and error handling.
strake-server      Arrow Flight SQL server implementation.
strake-cli         GitOps CLI for managing data mesh configurations.
strake-python      Python bindings for high-performance data access.

Contributing

We welcome contributions! Please see our Contributing Guidelines for details on how to get started.

License

Strake is licensed under the Apache 2.0 license.
