Project structure
This document provides an overview of the Embucket project structure, with a particular focus on the crates/ directory, which contains the core components of this Snowflake-compatible lakehouse platform.
Embucket is organized as a Rust workspace containing multiple crates, each with a specific responsibility. The project follows a modular architecture with clear separation of concerns, allowing components to be developed and tested independently.
The top-level directory layout is as follows:
embucket/
├── crates/ # Core Rust crates (libraries and binaries)
├── test/ # Test frameworks and SQL query tests
└── ui/ # Frontend user interface code
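The crates/ directory is the heart of the project: it contains all of the Rust code, organized into focused, single-responsibility crates. These fall into four broad groups, identified by name prefix: API crates (api-*), core crates (core-*), DataFusion extensions (df-*), and the server binary (embucketd).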
API crates expose REST interfaces for different clients and protocols:
- api-iceberg-rest: Implements the Iceberg REST catalog API, enabling standard Iceberg clients to interact with Embucket's tables and metadata.
- api-snowflake-rest: Provides an interface compatible with the Snowflake V1 SQL API, allowing tools built for Snowflake to connect to Embucket without modification.
- api-internal-rest: Implements internal REST endpoints for application-specific operations and management; these are not typically exposed to end users.
- api-sessions: Manages user and API sessions, including authentication and session creation, storage, and retrieval. api-ui and api-snowflake-rest use it to create sessions (as defined in core-executor) for users automatically.
- api-ui: Backend for the Embucket web user interface, handling UI-specific logic and serving the API endpoints that the frontend calls.
- api-ui-static-assets: Bundles the compiled frontend assets (CSS, JavaScript, images) into the application at compile time, making them available to the server.
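The session lifecycle that api-sessions handles (create on first use, store, retrieve) can be pictured with a minimal in-memory sketch. Everything here is hypothetical: SessionStore, Session, get_or_create, get, and remove are illustrative names, not the crate's API, and authentication is left out entirely.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical session record; the real session type is defined in core-executor.
#[derive(Debug)]
pub struct Session {
    pub id: String,
    pub user: String,
}

/// Hypothetical in-memory session store (not the api-sessions API).
/// Sessions are created on first use and then shared via Arc.
#[derive(Default)]
pub struct SessionStore {
    sessions: Mutex<HashMap<String, Arc<Session>>>,
}

impl SessionStore {
    /// Returns the session for `id`, creating it for `user` if it does not exist yet.
    /// An existing session is returned unchanged, even if `user` differs.
    pub fn get_or_create(&self, id: &str, user: &str) -> Arc<Session> {
        let mut sessions = self.sessions.lock().expect("session map poisoned");
        let session = sessions.entry(id.to_string()).or_insert_with(|| {
            Arc::new(Session {
                id: id.to_string(),
                user: user.to_string(),
            })
        });
        Arc::clone(session)
    }

    /// Looks up an existing session without creating one.
    pub fn get(&self, id: &str) -> Option<Arc<Session>> {
        let sessions = self.sessions.lock().expect("session map poisoned");
        sessions.get(id).cloned()
    }

    /// Removes a session (for example, on logout); returns true if it existed.
    pub fn remove(&self, id: &str) -> bool {
        let mut sessions = self.sessions.lock().expect("session map poisoned");
        sessions.remove(id).is_some()
    }
}
```

Handing out Arc clones lets several concurrent requests share one session without copying it; expiry and persistence are deliberately omitted from this sketch.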
Core crates handle fundamental functionality and abstractions:
- core-executor: The central query execution engine, built on Apache DataFusion and responsible for SQL parsing, planning, optimization, and execution.
- core-metastore: Manages metadata persistence and provides abstractions for interacting with the underlying storage system. It defines the models for catalogs/volumes, databases, schemas, tables, and related objects.
- core-history: Records and manages query execution history for auditing, monitoring, and user reference. It also persists query results and worksheet models.
- core-utils: Provides common utility functions, data structures, and helpers shared across crates.
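To make the object hierarchy that core-metastore models more concrete, here is a minimal in-memory sketch of volumes, databases, schemas, and tables. The types (Metastore, Database, Table, TableIdent), the rule that each database references a volume, and the storage-location layout are all assumptions made for illustration; core-metastore's actual models and persistence layer are not reproduced here.

```rust
use std::collections::BTreeMap;

/// Hypothetical fully qualified name: database.schema.table.
#[derive(Debug, Clone, PartialEq)]
pub struct TableIdent {
    pub database: String,
    pub schema: String,
    pub table: String,
}

impl TableIdent {
    /// Parses "db.schema.table"; returns None unless there are exactly three non-empty parts.
    pub fn parse(name: &str) -> Option<Self> {
        let parts: Vec<&str> = name.split('.').collect();
        match parts.as_slice() {
            [db, schema, table] if !db.is_empty() && !schema.is_empty() && !table.is_empty() => {
                Some(Self {
                    database: db.to_string(),
                    schema: schema.to_string(),
                    table: table.to_string(),
                })
            }
            _ => None,
        }
    }
}

/// Hypothetical table metadata; real metadata would carry columns, format, and more.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub location: String,
}

/// Hypothetical database model: it points at a volume and owns schemas of tables.
#[derive(Debug)]
pub struct Database {
    pub volume: String,
    pub schemas: BTreeMap<String, BTreeMap<String, Table>>,
}

/// Hypothetical in-memory metastore: volume name -> storage root, plus databases.
#[derive(Debug, Default)]
pub struct Metastore {
    volumes: BTreeMap<String, String>,
    databases: BTreeMap<String, Database>,
}

impl Metastore {
    pub fn create_volume(&mut self, name: &str, root: &str) {
        self.volumes.insert(name.to_string(), root.to_string());
    }

    /// Fails if the referenced volume does not exist.
    pub fn create_database(&mut self, name: &str, volume: &str) -> Result<(), String> {
        if !self.volumes.contains_key(volume) {
            return Err(format!("volume '{volume}' not found"));
        }
        let db = Database { volume: volume.to_string(), schemas: BTreeMap::new() };
        self.databases.insert(name.to_string(), db);
        Ok(())
    }

    pub fn create_schema(&mut self, database: &str, schema: &str) -> Result<(), String> {
        let db = self
            .databases
            .get_mut(database)
            .ok_or_else(|| format!("database '{database}' not found"))?;
        db.schemas.entry(schema.to_string()).or_default();
        Ok(())
    }

    /// Registers a table; its location (volume root + db/schema/table) is an assumed layout.
    pub fn create_table(&mut self, ident: &TableIdent) -> Result<Table, String> {
        let db = self
            .databases
            .get_mut(&ident.database)
            .ok_or_else(|| format!("database '{}' not found", ident.database))?;
        let root = self
            .volumes
            .get(&db.volume)
            .ok_or_else(|| format!("volume '{}' not found", db.volume))?;
        let tables = db
            .schemas
            .get_mut(&ident.schema)
            .ok_or_else(|| format!("schema '{}' not found", ident.schema))?;
        let table = Table {
            location: format!("{root}/{}/{}/{}", ident.database, ident.schema, ident.table),
        };
        tables.insert(ident.table.clone(), table.clone());
        Ok(table)
    }

    /// Resolves "db.schema.table" top-down: database, then schema, then table.
    pub fn get_table(&self, name: &str) -> Option<&Table> {
        let ident = TableIdent::parse(name)?;
        self.databases
            .get(&ident.database)?
            .schemas
            .get(&ident.schema)?
            .get(&ident.table)
    }
}
```

Resolving names top-down mirrors the hierarchy described for core-metastore; the real crate also persists these objects, which this sketch does not attempt.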
The df-* crates extend and customize the Apache DataFusion query engine:
- df-catalog: Implements DataFusion's CatalogProvider and related traits, connecting the query engine to both Embucket's internal catalog (backed by the core-metastore metadata store) and external catalogs (currently only AWS S3 Tables is supported).
- df-builtins: Defines custom User-Defined Functions (UDFs) and User-Defined Aggregate Functions (UDAFs).
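The bridging role of df-catalog can be illustrated with a std-only sketch: the query engine resolves catalog.schema.table names through one registry, without knowing whether a catalog is backed by the metastore or by an external service. The Catalog trait here is a hypothetical, simplified stand-in for DataFusion's CatalogProvider (the real trait hands out SchemaProvider objects and has more methods), and the types MetastoreCatalog, ExternalCatalog, and CatalogRegistry, as well as the catalog names used with them, are invented for illustration.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

/// Hypothetical, simplified stand-in for DataFusion's CatalogProvider trait.
pub trait Catalog: Send + Sync {
    fn schema_names(&self) -> Vec<String>;
    fn table_exists(&self, schema: &str, table: &str) -> bool;
}

/// Hypothetical internal catalog whose contents would come from core-metastore.
pub struct MetastoreCatalog {
    pub schemas: BTreeMap<String, Vec<String>>, // schema -> table names
}

impl Catalog for MetastoreCatalog {
    fn schema_names(&self) -> Vec<String> {
        self.schemas.keys().cloned().collect()
    }
    fn table_exists(&self, schema: &str, table: &str) -> bool {
        self.schemas
            .get(schema)
            .map_or(false, |tables| tables.iter().any(|t| t == table))
    }
}

/// Hypothetical external catalog (for example, one backed by S3 Tables); a real
/// implementation would query the remote service instead of a fixed list.
pub struct ExternalCatalog {
    pub namespace: String,
    pub tables: Vec<String>,
}

impl Catalog for ExternalCatalog {
    fn schema_names(&self) -> Vec<String> {
        vec![self.namespace.clone()]
    }
    fn table_exists(&self, schema: &str, table: &str) -> bool {
        schema == self.namespace && self.tables.iter().any(|t| t == table)
    }
}

/// Hypothetical registry the engine consults; it is agnostic to each catalog's backend.
#[derive(Default)]
pub struct CatalogRegistry {
    catalogs: HashMap<String, Arc<dyn Catalog>>,
}

impl CatalogRegistry {
    pub fn register(&mut self, name: &str, catalog: Arc<dyn Catalog>) {
        self.catalogs.insert(name.to_string(), catalog);
    }

    /// Resolves "catalog.schema.table" against whichever catalog owns the name.
    pub fn resolve(&self, qualified: &str) -> bool {
        let parts: Vec<&str> = qualified.split('.').collect();
        match parts.as_slice() {
            [catalog, schema, table] => self
                .catalogs
                .get(*catalog)
                .map_or(false, |c| c.table_exists(schema, table)),
            _ => false,
        }
    }
}
```

Because the registry stores trait objects, supporting another kind of external catalog would only need a new Catalog implementation; the resolution logic stays unchanged.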
Finally, the workspace contains the server binary:
- embucketd: The main executable that brings everything together. It initializes components, orchestrates services, and provides the entry point for the Embucket server.
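The way embucketd brings components together can be sketched as shared-handle wiring: each component is built once and passed as an Arc to the services that need it. The component types, the bootstrap function, and the specific dependency edges (for example, the executor holding the history store, or the Iceberg API holding the metastore) are hypothetical; the sketch leaves out configuration, networking, and the actual server loop.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the components built by the core crates.
pub struct Metastore;
pub struct History;

/// Hypothetical executor that depends on the metastore and history components.
pub struct Executor {
    pub metastore: Arc<Metastore>,
    pub history: Arc<History>,
}

// Hypothetical API services, each holding only the handles it needs.
pub struct IcebergApi {
    pub metastore: Arc<Metastore>,
}
pub struct SnowflakeApi {
    pub executor: Arc<Executor>,
}
pub struct UiApi {
    pub executor: Arc<Executor>,
    pub history: Arc<History>,
}

pub struct App {
    pub iceberg: IcebergApi,
    pub snowflake: SnowflakeApi,
    pub ui: UiApi,
}

/// Hypothetical startup order: storage-facing components first, then the
/// executor on top of them, then the API services that share these handles.
pub fn bootstrap() -> App {
    let metastore = Arc::new(Metastore);
    let history = Arc::new(History);
    let executor = Arc::new(Executor {
        metastore: Arc::clone(&metastore),
        history: Arc::clone(&history),
    });
    App {
        iceberg: IcebergApi { metastore },
        snowflake: SnowflakeApi { executor: Arc::clone(&executor) },
        ui: UiApi { executor, history },
    }
}
```

Building each component once and cloning Arc handles keeps a single copy of shared state, such as metadata, while each service depends only on the components it actually uses.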
The UI code in the ui/ directory is a TypeScript/React application, organized into components, hooks, and modules that interact with the backend APIs.