Project structure

Sergei Turukin edited this page May 20, 2025 · 1 revision

This document provides a comprehensive overview of the Embucket project structure, with a particular focus on the crates/ directory, which contains the core components of this Snowflake-compatible lakehouse platform.

Overview

Embucket is organized as a Rust workspace containing multiple crates, each with a specific responsibility. The project follows a modular architecture with clear separation of concerns, allowing components to be developed and tested independently.

The top-level layout is:

embucket/
├── crates/         # Core Rust crates (libraries and binaries)
├── test/           # Test frameworks and SQL query tests
└── ui/             # Frontend user interface code

Crates directory

The crates/ directory is the heart of the project, containing all Rust code organized into focused, single-responsibility libraries. The crates fall into four groups: API crates, core crates, DataFusion extensions, and the main application.

API crates (api-*)

API crates provide REST API interfaces for different clients and protocols:

  • api-iceberg-rest: Implements the Iceberg REST catalog API, enabling standard Iceberg clients to interact with Embucket's tables and metadata.

  • api-snowflake-rest: Provides a Snowflake V1 SQL API compatible interface, allowing tools built for Snowflake to connect to Embucket without modification.

  • api-internal-rest: Implements internal REST endpoints used for application-specific operations and management, not typically exposed to end users.

  • api-sessions: Manages user and API sessions, including authentication, session creation, storage, and retrieval. It is used by api-ui and api-snowflake-rest to automatically create sessions (as defined in core-executor) for users.

  • api-ui: Backend implementation for the Embucket web user interface, handling UI-specific logic and serving API endpoints for the frontend.

  • api-ui-static-assets: Bundles compiled frontend assets (CSS, JavaScript, images) into the application at compile time, making them available to the server.
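
The session handling described for api-sessions (creation, storage, and retrieval) can be pictured with a minimal in-memory sketch. Everything here is hypothetical: the `Session` and `SessionStore` types, their fields, and the `session-N` id scheme are illustrative stand-ins, not the actual types from api-sessions or core-executor, and authentication is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical session record. The real session type is defined in
/// core-executor and will carry more state than this.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub id: String,
    pub user: String,
}

/// Illustrative in-memory store keyed by session id.
#[derive(Default)]
pub struct SessionStore {
    sessions: HashMap<String, Session>,
    next_id: u64,
}

impl SessionStore {
    /// Create and store a new session for `user`.
    /// Ids come from a simple counter (an assumption made for this sketch).
    pub fn create(&mut self, user: &str) -> Session {
        self.next_id += 1;
        let session = Session {
            id: format!("session-{}", self.next_id),
            user: user.to_string(),
        };
        self.sessions.insert(session.id.clone(), session.clone());
        session
    }

    /// Look up an existing session by id.
    pub fn get(&self, id: &str) -> Option<&Session> {
        self.sessions.get(id)
    }

    /// Return the session named by `id` if it exists; otherwise create one.
    /// This mirrors the "automatically create sessions" behaviour in spirit only.
    pub fn get_or_create(&mut self, id: Option<&str>, user: &str) -> Session {
        if let Some(existing) = id.and_then(|id| self.sessions.get(id)) {
            return existing.clone();
        }
        self.create(user)
    }
}
```

In this sketch a request handler would call `get_or_create` with whatever session id the client presented (if any), so a first request without an id transparently receives a fresh session.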

Core crates (core-*)

Core crates handle fundamental functionality and abstractions:

  • core-executor: The central query execution engine built on Apache DataFusion, responsible for SQL parsing, planning, optimization, and execution.

  • core-metastore: Manages metadata persistence and provides abstractions for interacting with the underlying storage system, defining models for catalogs/volumes, databases, schemas, tables, etc.

  • core-history: Records and manages query execution history for auditing, monitoring, and user reference. Provides persistence for the query result and worksheet models.

  • core-utils: Provides common utility functions, data structures, and helpers used across multiple crates.
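
To make the metadata hierarchy mentioned for core-metastore (volumes, databases, schemas, tables) more concrete, here is a small sketch of how such models could nest. The type names, fields, and the assumption that each database records the volume backing it are illustrative only; the real models in core-metastore are persisted and considerably richer.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical schema model: just a set of table names.
#[derive(Debug, Default)]
pub struct Schema {
    pub tables: BTreeSet<String>,
}

/// Hypothetical database model.
#[derive(Debug, Default)]
pub struct Database {
    /// Name of the volume assumed to back this database (an assumption of this sketch).
    pub volume: String,
    pub schemas: BTreeMap<String, Schema>,
}

/// Illustrative in-memory metastore; nothing here is persisted.
#[derive(Debug, Default)]
pub struct Metastore {
    pub databases: BTreeMap<String, Database>,
}

impl Metastore {
    /// Register a table, creating the database and schema on demand.
    /// Returns true if the table was newly added, false if it already existed.
    pub fn create_table(&mut self, volume: &str, database: &str, schema: &str, table: &str) -> bool {
        let db = self
            .databases
            .entry(database.to_string())
            .or_insert_with(|| Database {
                volume: volume.to_string(),
                ..Default::default()
            });
        db.schemas
            .entry(schema.to_string())
            .or_default()
            .tables
            .insert(table.to_string())
    }

    /// Check whether `database.schema.table` is known to the metastore.
    pub fn table_exists(&self, database: &str, schema: &str, table: &str) -> bool {
        self.databases
            .get(database)
            .and_then(|db| db.schemas.get(schema))
            .map_or(false, |s| s.tables.contains(table))
    }
}
```

The three-level lookup in `table_exists` corresponds to resolving a fully qualified name such as `analytics.public.events`.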

DataFusion extensions (df-*)

These crates extend and customize the Apache DataFusion query engine:

  • df-catalog: Implements DataFusion's CatalogProvider and related traits, connecting the query engine to both Embucket's internal, metastore-backed catalog (core-metastore) and external catalogs (currently only AWS S3 tables are supported).

  • df-builtins: Defines custom User-Defined Functions (UDFs) and User-Defined Aggregate Functions (UDAFs).
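
The bridging role described for df-catalog can be sketched without depending on DataFusion itself. The `CatalogSource` trait below is a hypothetical stand-in and is not DataFusion's `CatalogProvider`; likewise `InternalCatalog`, `ExternalCatalog`, and `CatalogList` are invented names. The sketch only shows the idea of exposing differently backed catalogs to the engine through one common interface.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a catalog interface.
/// This is NOT DataFusion's CatalogProvider trait.
pub trait CatalogSource {
    fn schema_names(&self) -> Vec<String>;
}

/// Stand-in for a catalog backed by the internal metastore.
pub struct InternalCatalog {
    pub schemas: Vec<String>,
}

/// Stand-in for a catalog backed by an external service.
pub struct ExternalCatalog {
    pub schemas: Vec<String>,
}

impl CatalogSource for InternalCatalog {
    fn schema_names(&self) -> Vec<String> {
        self.schemas.clone()
    }
}

impl CatalogSource for ExternalCatalog {
    fn schema_names(&self) -> Vec<String> {
        // A real implementation would query the external catalog here.
        self.schemas.clone()
    }
}

/// Registry the query engine would consult; both kinds of catalog
/// are reachable through the same trait object.
pub struct CatalogList {
    catalogs: HashMap<String, Box<dyn CatalogSource>>,
}

impl CatalogList {
    pub fn new() -> Self {
        CatalogList { catalogs: HashMap::new() }
    }

    /// Register a catalog under `name`, replacing any previous entry.
    pub fn register(&mut self, name: &str, catalog: Box<dyn CatalogSource>) {
        self.catalogs.insert(name.to_string(), catalog);
    }

    /// List schemas of the named catalog, or None if it is not registered.
    pub fn schema_names(&self, catalog: &str) -> Option<Vec<String>> {
        self.catalogs.get(catalog).map(|c| c.schema_names())
    }
}
```

Using trait objects keeps the caller unaware of where a catalog's metadata actually lives, which is the separation the crate description implies.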

Main application (embucketd)

  • embucketd: The main executable that brings everything together, initializing components, orchestrating services, and providing the main entry point for the Embucket server.

UI structure

The UI code in the ui/ directory is a TypeScript/React application organized into components, hooks, and modules that call the backend APIs.