A high-performance, asynchronous log ingestion service written in Rust. xem-log receives OTLP/gRPC log streams, parses them with zero-copy techniques, and persists them as partitioned Apache Parquet files in S3-compatible storage.
The service acts as a bridge between high-frequency log producers and long-term analytical storage. It minimizes overhead by leveraging the Rust ownership model and the Tokio asynchronous runtime.
- Ingestion: Receives logs via gRPC using the OpenTelemetry Protocol (OTLP).
- Processing: Implements zero-copy parsing to transform raw protobuf data into structured internal formats without unnecessary allocations.
- Buffering: Manages an in-memory buffer that triggers a flush based on configurable time-intervals or batch-size thresholds.
- Storage: Encodes batches into Apache Parquet (columnar format) and uploads them to S3/MinIO for efficient downstream analysis.
- Observability: Provides real-time telemetry via a dedicated Prometheus metrics endpoint.
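The buffering step above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the `LogBuffer` type, its field names, and the use of `String` records are all assumptions made for the example. It shows the two flush triggers described above, a batch-size threshold and a maximum age since the last flush.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory buffer; the names here are illustrative and not taken from xem-log.
pub struct LogBuffer {
    records: Vec<String>,
    max_records: usize,
    max_age: Duration,
    last_flush: Instant,
}

impl LogBuffer {
    /// `now` is passed in explicitly so the flush logic stays deterministic and testable.
    pub fn new(max_records: usize, max_age: Duration, now: Instant) -> Self {
        Self {
            records: Vec::with_capacity(max_records),
            max_records,
            max_age,
            last_flush: now,
        }
    }

    /// Appends a record and returns the whole batch if a flush threshold was reached.
    pub fn push(&mut self, record: String, now: Instant) -> Option<Vec<String>> {
        self.records.push(record);
        if self.should_flush(now) {
            Some(self.take(now))
        } else {
            None
        }
    }

    /// A flush is due when the buffer is non-empty and is either full or too old.
    pub fn should_flush(&self, now: Instant) -> bool {
        !self.records.is_empty()
            && (self.records.len() >= self.max_records
                || now.duration_since(self.last_flush) >= self.max_age)
    }

    /// Hands the buffered records to the caller and resets the flush timer.
    pub fn take(&mut self, now: Instant) -> Vec<String> {
        self.last_flush = now;
        std::mem::take(&mut self.records)
    }
}
```

In an asynchronous service, a periodic timer task would presumably call `should_flush` and `take` so that a quiet stream is still flushed on time; that wiring is omitted here.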
- Asynchronous I/O: Fully powered by `tokio` and `tonic` for non-blocking network operations.
- Zero-Copy Design: Utilizes `serde` and specialized memory management to ensure high throughput and low CPU usage.
- Columnar Efficiency: Direct conversion to Parquet ensures that stored logs occupy minimal space and remain highly queryable.
- Containerized & Orchestrated: Multi-stage Docker builds ensure a minimal runtime footprint (Debian Slim), with full orchestration via Docker Compose.
- CI/CD Ready: Integrated GitHub Actions pipeline for automated linting, unit testing, and integration testing with localized infrastructure.
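To illustrate the zero-copy idea mentioned in the list above, here is a small sketch using only the standard library. It assumes a made-up `severity|body` text format rather than the protobuf payloads the service actually handles, and `LogView` and `parse_record` are hypothetical names. The point is only that the parsed fields borrow from the input buffer instead of allocating new strings.

```rust
/// Hypothetical borrowed view over one raw record of the form "severity|body".
/// Both fields are slices into the original input; nothing is copied.
#[derive(Debug, PartialEq)]
pub struct LogView<'a> {
    pub severity: &'a str,
    pub body: &'a str,
}

/// Splits a raw record at the first '|' and returns borrowed slices.
/// Returns `None` when the delimiter is missing.
pub fn parse_record(raw: &str) -> Option<LogView<'_>> {
    let (severity, body) = raw.split_once('|')?;
    Some(LogView { severity, body })
}
```

The lifetime `'a` ties each `LogView` to the buffer it was parsed from, so the compiler rejects any use of the view after that buffer is dropped.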
- Rust Toolchain (1.75+ recommended)
- Docker and Docker Compose
- Configure Environment: Set up the required variables.
  `cp .env.example .env`
- Start Infrastructure: Spin up MinIO (S3 emulator) and Prometheus.
  `cd infra && docker-compose up -d`
- Launch xem-log: Build and run the ingester.
  `docker-compose up --build -d`
The service will be available at:
- gRPC Ingest: `localhost:4317`
- Prometheus Metrics: `localhost:9091/metrics`
- MinIO Console: `localhost:9001` (Credentials: `minioadmin`/`minioadmin`)
Configuration is managed via environment variables. You can customize the behavior by editing the .env file in the root directory:
| Variable | Description | Default |
|---|---|---|
| `XEMLOG_GRPC_PORT` | Port for the OTLP/gRPC listener | `4317` |
| `XEMLOG_METRICS_PORT` | Port for the Prometheus metrics server | `9090` |
| `XEMLOG_S3_ENDPOINT` | URL for S3-compatible storage | `http://host.docker.internal:9000` |
| `XEMLOG_S3_BUCKET` | Target bucket for Parquet files | `xemlog-bucket` |
| `BATCH_SIZE_THRESHOLD` | Max logs in memory before flushing | `1000` |
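As a rough sketch of how these variables might be read at startup, the following uses only the standard library. The `Config` struct, its field names, and the `from_lookup` helper are assumptions for illustration; only the variable names and defaults come from the table above. The lookup is passed in as a closure so the parsing logic can be exercised without touching the process environment.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical configuration struct mirroring the documented variables.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub grpc_port: u16,
    pub metrics_port: u16,
    pub s3_endpoint: String,
    pub s3_bucket: String,
    pub batch_size_threshold: usize,
}

/// Parses an optional raw value, falling back to `default` when the variable is unset.
fn parse_or<T: FromStr>(value: Option<String>, key: &str, default: T) -> Result<T, String> {
    match value {
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map_err(|_| format!("invalid value for {key}: {raw:?}")),
        None => Ok(default),
    }
}

impl Config {
    /// Builds a config from any key lookup function (assumed helper, handy for tests).
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        Ok(Self {
            grpc_port: parse_or(lookup("XEMLOG_GRPC_PORT"), "XEMLOG_GRPC_PORT", 4317)?,
            metrics_port: parse_or(lookup("XEMLOG_METRICS_PORT"), "XEMLOG_METRICS_PORT", 9090)?,
            s3_endpoint: lookup("XEMLOG_S3_ENDPOINT")
                .unwrap_or_else(|| "http://host.docker.internal:9000".to_string()),
            s3_bucket: lookup("XEMLOG_S3_BUCKET").unwrap_or_else(|| "xemlog-bucket".to_string()),
            batch_size_threshold: parse_or(
                lookup("BATCH_SIZE_THRESHOLD"),
                "BATCH_SIZE_THRESHOLD",
                1000,
            )?,
        })
    }

    /// Reads the real process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Calling `Config::from_env()` with none of the variables set yields the defaults from the table, while a malformed value such as a non-numeric port is reported as an error instead of being silently replaced.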
The project is designed for extensibility. Planned features include:
- DuckDB Analytical CLI: A lightweight companion tool to perform SQL queries directly on the S3 Parquet files without requiring a full OLAP database.
- Continuous Delivery (CD) Pipeline: Automated GitHub Actions to build and push optimized images to GitHub Container Registry (GHCR).
- Write-Ahead Log (WAL): Implementation of a local persistent buffer to ensure zero data loss in the event of an unexpected service interruption.
- Dynamic Filtering: A DSL (Domain Specific Language) to filter or mask sensitive log data before it reaches the storage layer.
- S3 Partitioning Strategy: Enhanced path logic to organize files by `YYYY/MM/DD/HH` for optimized data discovery.
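Since the partitioning strategy is still a planned feature, the following is only one possible sketch of how a `YYYY/MM/DD/HH` object key could be derived from a Unix timestamp (UTC). The function names and the example file name are hypothetical. To stay dependency-free it uses the well-known days-to-civil-date arithmetic for the Gregorian calendar; a real implementation might rely on a date/time crate instead.

```rust
/// Converts days since 1970-01-01 into a (year, month, day) UTC civil date.
fn civil_from_days(days: u64) -> (u64, u64, u64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z / 146_097; // number of complete 400-year cycles
    let doe = z % 146_097; // day of era, in [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, in [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year counted from March 1
    let mp = (5 * doy + 2) / 153; // month index counted from March, in [0, 11]
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the following calendar year in this scheme.
    let year = yoe + era * 400 + u64::from(month <= 2);
    (year, month, day)
}

/// Builds a hypothetical object key of the form `YYYY/MM/DD/HH/<file_name>`.
pub fn partition_key(unix_secs: u64, file_name: &str) -> String {
    let (year, month, day) = civil_from_days(unix_secs / 86_400);
    let hour = (unix_secs % 86_400) / 3_600;
    format!("{year:04}/{month:02}/{day:02}/{hour:02}/{file_name}")
}
```

For instance, `partition_key(1_700_000_000, "batch.parquet")` yields `2023/11/14/22/batch.parquet`, so query engines can prune by prefix when only a given day or hour is needed.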
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for the full text.