Project structure

This document provides a comprehensive overview of the Embucket project structure, with a particular focus on the crates/ directory, which contains the core components of this Snowflake-compatible lakehouse platform.

Overview

Embucket is organized as a Rust workspace containing multiple crates, each with a specific responsibility. The project follows a modular architecture with clear separation of concerns, allowing components to be developed and tested independently.

The project structure is organized as follows:

embucket/
├── crates/         # Core Rust crates (libraries and binaries)
├── test/           # Test frameworks and SQL query tests
└── ui/             # Frontend user interface code

Crates directory

The crates/ directory is the heart of the project. It contains all of the Rust code, organized into focused, single-responsibility crates. These crates fall into four groups:

API crates (`api-*`)

API crates provide REST API interfaces for different clients and protocols:

api-iceberg-rest: Implements the Iceberg REST catalog API, enabling standard Iceberg clients to interact with Embucket's tables and metadata.
api-snowflake-rest: Provides an interface compatible with the Snowflake V1 SQL API, allowing tools built for Snowflake to connect to Embucket without modification.
api-internal-rest: Implements internal REST endpoints for application-specific operations and management. These endpoints are not typically exposed to end users.
api-sessions: Manages user and API sessions, including authentication, session creation, storage, and retrieval. It is used by api-ui and api-snowflake-rest to create sessions for users automatically (the session type itself is defined in core-executor).
api-ui: Backend implementation for the Embucket web user interface, handling UI-specific logic and serving API endpoints for the frontend.
api-ui-static-assets: Bundles compiled frontend assets (CSS, JavaScript, images) into the application at compile time, making them available to the server.
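The automatic session creation described for api-sessions can be pictured as a get-or-create lookup keyed by session ID. The following is a minimal, std-only sketch under that assumption: `SessionStore`, `UserSession`, `get_or_create`, and `session_count` are hypothetical names, not the crate's actual API, the real session type lives in core-executor, and authentication and persistent storage are omitted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the session type defined in core-executor.
#[derive(Debug)]
pub struct UserSession {
    pub session_id: String,
    pub user: String,
}

/// Hypothetical in-memory session store shared by API handlers.
#[derive(Default)]
pub struct SessionStore {
    sessions: Mutex<HashMap<String, Arc<UserSession>>>,
}

impl SessionStore {
    /// Returns the existing session for `session_id`, creating it on first use.
    pub fn get_or_create(&self, session_id: &str, user: &str) -> Arc<UserSession> {
        let mut sessions = self.sessions.lock().expect("session map poisoned");
        let session = sessions
            .entry(session_id.to_string())
            .or_insert_with(|| {
                Arc::new(UserSession {
                    session_id: session_id.to_string(),
                    user: user.to_string(),
                })
            });
        Arc::clone(session)
    }

    /// Number of sessions currently held in memory.
    pub fn session_count(&self) -> usize {
        self.sessions.lock().expect("session map poisoned").len()
    }
}

fn main() {
    let store = SessionStore::default();
    // The first request creates the session; the second reuses the same one.
    let first = store.get_or_create("sess-1", "alice");
    let second = store.get_or_create("sess-1", "alice");
    println!(
        "same session: {}, user: {}, id: {}",
        Arc::ptr_eq(&first, &second),
        first.user,
        first.session_id
    );
}
```

Returning an `Arc` lets several concurrent requests on the same session share one object without holding the map lock while a query runs.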

Core crates (`core-*`)

Core crates handle fundamental functionality and abstractions:

core-executor: The central query execution engine built on Apache DataFusion, responsible for SQL parsing, planning, optimization, and execution.
core-metastore: Manages metadata persistence and provides abstractions over the underlying storage system. It defines the models for catalogs/volumes, databases, schemas, tables, and related objects.
core-history: Records and manages query execution history for auditing, monitoring, and user reference. It also provides persistence for query results and worksheet models.
core-utils: Provides common utility functions, data structures, and helpers used across multiple crates.
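The nesting of the metastore models can be illustrated with a small in-memory version. This sketch assumes that each database references a volume and that tables are addressed as `database.schema.table`. `Metastore`, `TableIdent`, and `MetastoreError` are hypothetical names rather than the core-metastore API, and nothing here is persisted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical fully qualified table identifier: database.schema.table.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TableIdent {
    pub database: String,
    pub schema: String,
    pub table: String,
}

impl TableIdent {
    /// Parses "db.schema.table"; returns None for any other shape.
    pub fn parse(fq: &str) -> Option<Self> {
        let parts: Vec<&str> = fq.split('.').collect();
        match parts.as_slice() {
            [d, s, t] if !d.is_empty() && !s.is_empty() && !t.is_empty() => Some(Self {
                database: d.to_string(),
                schema: s.to_string(),
                table: t.to_string(),
            }),
            _ => None,
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum MetastoreError {
    UnknownVolume(String),
    UnknownDatabase(String),
    UnknownSchema(String),
}

/// In-memory stand-in: each level must exist before its children can be created.
#[derive(Default)]
pub struct Metastore {
    volumes: HashSet<String>,
    databases: HashMap<String, String>, // database name -> volume name (assumed)
    schemas: HashSet<(String, String)>, // (database, schema)
    tables: HashSet<TableIdent>,
}

impl Metastore {
    pub fn create_volume(&mut self, name: &str) {
        self.volumes.insert(name.to_string());
    }

    pub fn create_database(&mut self, name: &str, volume: &str) -> Result<(), MetastoreError> {
        if !self.volumes.contains(volume) {
            return Err(MetastoreError::UnknownVolume(volume.to_string()));
        }
        self.databases.insert(name.to_string(), volume.to_string());
        Ok(())
    }

    pub fn create_schema(&mut self, database: &str, schema: &str) -> Result<(), MetastoreError> {
        if !self.databases.contains_key(database) {
            return Err(MetastoreError::UnknownDatabase(database.to_string()));
        }
        self.schemas.insert((database.to_string(), schema.to_string()));
        Ok(())
    }

    pub fn create_table(&mut self, ident: TableIdent) -> Result<(), MetastoreError> {
        if !self.databases.contains_key(&ident.database) {
            return Err(MetastoreError::UnknownDatabase(ident.database));
        }
        let key = (ident.database.clone(), ident.schema.clone());
        if !self.schemas.contains(&key) {
            return Err(MetastoreError::UnknownSchema(format!("{}.{}", key.0, key.1)));
        }
        self.tables.insert(ident);
        Ok(())
    }

    /// Resolves the volume backing a registered table through its database.
    pub fn volume_for(&self, ident: &TableIdent) -> Option<&str> {
        if !self.tables.contains(ident) {
            return None;
        }
        self.databases.get(&ident.database).map(String::as_str)
    }
}

fn main() {
    let mut ms = Metastore::default();
    ms.create_volume("vol1");
    ms.create_database("analytics", "vol1").unwrap();
    ms.create_schema("analytics", "public").unwrap();
    let ident = TableIdent::parse("analytics.public.events").unwrap();
    ms.create_table(ident.clone()).unwrap();
    println!("{} is stored on volume {:?}", ident.table, ms.volume_for(&ident));
}
```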

DataFusion extensions (`df-*`)

These crates extend and customize the Apache DataFusion query engine:

df-catalog: Implements DataFusion's CatalogProvider and related traits, connecting the query engine to both Embucket's internal catalog (backed by the core-metastore metadata store) and external catalogs (currently, only AWS S3 Tables is supported).
df-builtins: Defines custom User-Defined Functions (UDFs) and User-Defined Aggregate Functions (UDAFs).
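The bridging role of df-catalog comes down to routing a catalog name to the right backend. The sketch below deliberately uses a simplified, synchronous, std-only trait in place of DataFusion's real `CatalogProvider` and `SchemaProvider` traits, which also handle async table lookup and Arrow schemas. `CatalogBackend`, `InternalCatalog`, `S3TablesCatalog`, and `CatalogList` are hypothetical names, and the S3 Tables backend is a stub.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for a catalog backend.
/// Not DataFusion's `CatalogProvider`; it only models name-based routing.
pub trait CatalogBackend: Send + Sync {
    fn kind(&self) -> &'static str;
    fn table_names(&self, schema: &str) -> Vec<String>;
}

/// Internal catalog backed by metastore data (here, a plain map of schema -> tables).
pub struct InternalCatalog {
    pub tables: HashMap<String, Vec<String>>,
}

impl CatalogBackend for InternalCatalog {
    fn kind(&self) -> &'static str {
        "internal (metastore)"
    }
    fn table_names(&self, schema: &str) -> Vec<String> {
        self.tables.get(schema).cloned().unwrap_or_default()
    }
}

/// External catalog stub; a real version would query AWS S3 Tables.
pub struct S3TablesCatalog;

impl CatalogBackend for S3TablesCatalog {
    fn kind(&self) -> &'static str {
        "external (S3 Tables)"
    }
    fn table_names(&self, _schema: &str) -> Vec<String> {
        Vec::new()
    }
}

/// Registry the query engine consults to resolve a catalog name to a backend.
#[derive(Default)]
pub struct CatalogList {
    catalogs: HashMap<String, Arc<dyn CatalogBackend>>,
}

impl CatalogList {
    pub fn register(&mut self, name: &str, backend: Arc<dyn CatalogBackend>) {
        self.catalogs.insert(name.to_string(), backend);
    }
    pub fn catalog(&self, name: &str) -> Option<Arc<dyn CatalogBackend>> {
        self.catalogs.get(name).cloned()
    }
}

fn main() {
    let mut tables = HashMap::new();
    tables.insert("public".to_string(), vec!["events".to_string()]);

    let mut list = CatalogList::default();
    list.register("embucket", Arc::new(InternalCatalog { tables }));
    list.register("s3_tables", Arc::new(S3TablesCatalog));

    for name in ["embucket", "s3_tables"] {
        if let Some(catalog) = list.catalog(name) {
            println!("{name}: {} -> {:?}", catalog.kind(), catalog.table_names("public"));
        }
    }
}
```

Using trait objects keeps the engine-facing side identical for both backends, so adding another external catalog type would only require a new implementation and a `register` call.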

Main application (`embucketd`)

embucketd: The main executable that brings everything together, initializing components, orchestrating services, and providing the main entry point for the Embucket server.
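A common way for a server binary to bring components together is to build shared state once at startup and give each service reference-counted handles to the parts it needs. The sketch below illustrates that pattern only. Every type name is a hypothetical placeholder for types in core-metastore, core-executor, core-history, and the API crates, and the assignment of handles to services is illustrative, not taken from embucketd.

```rust
use std::sync::Arc;

// Hypothetical placeholders for types provided by core-metastore,
// core-executor, and core-history.
pub struct Metastore;
pub struct QueryExecutor {
    pub metastore: Arc<Metastore>,
}
pub struct HistoryStore;

/// Components built once at startup and shared by every service.
pub struct AppState {
    pub metastore: Arc<Metastore>,
    pub executor: Arc<QueryExecutor>,
    pub history: Arc<HistoryStore>,
}

impl AppState {
    pub fn init() -> Self {
        let metastore = Arc::new(Metastore);
        // The executor resolves tables through the same metastore handle.
        let executor = Arc::new(QueryExecutor { metastore: Arc::clone(&metastore) });
        let history = Arc::new(HistoryStore);
        Self { metastore, executor, history }
    }
}

// Hypothetical API services; each receives only the handles it uses.
pub struct IcebergRestService {
    pub metastore: Arc<Metastore>,
}
pub struct UiService {
    pub executor: Arc<QueryExecutor>,
    pub history: Arc<HistoryStore>,
}

fn main() {
    let state = AppState::init();
    let iceberg = IcebergRestService { metastore: Arc::clone(&state.metastore) };
    let ui = UiService {
        executor: Arc::clone(&state.executor),
        history: Arc::clone(&state.history),
    };
    // Both services observe the same underlying metastore instance.
    println!(
        "shared metastore: {}",
        Arc::ptr_eq(&iceberg.metastore, &ui.executor.metastore)
    );
}
```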

UI structure

The UI code in the ui/ directory is organized as a modern TypeScript/React application with components, hooks, and modules that interact with the backend APIs.
