A web-based platform for converting raw knowledge base files into deployable vector database snapshots for the Gaia Network. Upload your documents, get AI-powered embeddings, and deploy instantly to your Gaia Node.
- Multi-Format Support: Process TXT, Markdown, PDF, and CSV files
- AI-Powered Embeddings: Uses the gte-Qwen2-1.5B model with 1536-dimensional vectors
- WasmEdge Acceleration: High-performance processing when available
- Drag & Drop Interface: Intuitive web-based file upload
- Real-Time Progress: Live updates during processing
- Auto-Deploy Ready: Generates Hugging Face URLs for instant Gaia Node deployment
- Cloud & Local: Works with Qdrant Cloud or local instances
 
- Python 3.8 or higher
- Qdrant instance (local or cloud)
- Hugging Face account (for uploads)
- Optional: WasmEdge for performance optimization
 
1. Clone the repository:

   ```bash
   git clone https://github.com/your-org/gaia-console.git
   cd gaia-console
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

4. Configure your environment:

   ```env
   # Qdrant Configuration
   QDRANT_URL=http://localhost:6333
   QDRANT_API_KEY=your_qdrant_api_key

   # Hugging Face Configuration
   HF_TOKEN=your_hugging_face_token
   HF_DATASET_NAME=your_username/your_dataset_name

   # Optional Settings
   MAX_FILE_SIZE=10485760  # 10MB
   ```

5. Start the server:

   ```bash
   uvicorn main:app --host 0.0.0.0 --port 8000 --reload
   ```

6. Open your browser and navigate to http://localhost:8000
1. Upload Files: Drag and drop your knowledge base files (TXT, MD, PDF, CSV)
2. Process: Click "Generate Snapshot" and watch real-time progress
3. Deploy: Copy the generated Hugging Face URL and configuration commands
4. Use with Gaia Node:

   ```bash
   gaianet config \
     --snapshot YOUR_SNAPSHOT_URL \
     --embedding-url https://huggingface.co/gaianet/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-f16.gguf \
     --embedding-ctx-size 8192
   gaianet init
   gaianet start
   ```
 
| Format | Description | Processing Method |
|---|---|---|
| TXT | Plain text files | Paragraph-based chunking |
| MD | Markdown documents | Header-based sectioning |
| PDF | PDF documents | Converted to Markdown via markitdown |
| CSV | Tabular data | Row-based processing |
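The per-format chunking strategies in the table can be sketched roughly as follows; `chunk_paragraphs` and `chunk_markdown_sections` are hypothetical names for illustration, not the project's actual functions:

```python
import re

def chunk_paragraphs(text: str) -> list[str]:
    # Plain text: split on blank lines, dropping empty chunks.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def chunk_markdown_sections(text: str) -> list[str]:
    # Markdown: split immediately before each level-1 or level-2 header.
    parts = re.split(r"(?m)^(?=#{1,2} )", text)
    return [p.strip() for p in parts if p.strip()]
```

Each resulting chunk would then be embedded as one vector, so chunk boundaries directly shape retrieval quality.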
- Maximum file size: 10MB per file
- No limit on number of files per session
- Batch processing supported
 
| Variable | Description | Required | Default |
|---|---|---|---|
| QDRANT_URL | Qdrant instance URL | Yes | http://localhost:6333 |
| QDRANT_API_KEY | Qdrant API key (for cloud) | No | - |
| HF_TOKEN | Hugging Face write token | Yes | - |
| HF_DATASET_NAME | Target HF dataset | Yes | - |
| MAX_FILE_SIZE | Max file size in bytes | No | 10485760 |
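Loading these variables might look like the following sketch; `load_config` is a hypothetical helper mirroring the table above, not part of the project's API:

```python
import os

def load_config(env=os.environ) -> dict:
    # Required settings raise a KeyError early; optional ones fall back
    # to the defaults listed in the table.
    return {
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "qdrant_api_key": env.get("QDRANT_API_KEY"),  # optional, cloud only
        "hf_token": env["HF_TOKEN"],
        "hf_dataset_name": env["HF_DATASET_NAME"],
        "max_file_size": int(env.get("MAX_FILE_SIZE", 10_485_760)),  # 10MB
    }
```

Failing fast on missing required settings keeps misconfiguration errors at startup rather than mid-upload.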
Local (Docker):

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Qdrant Cloud:

1. Sign up at Qdrant Cloud
2. Create a cluster
3. Get your API key and URL
4. Update your `.env` file

Hugging Face:

1. Create a Hugging Face account at huggingface.co
2. Generate an access token with write permissions
3. Create a dataset or use an existing one for snapshots
4. Update your `.env` with the token and dataset name
For optimal performance, install WasmEdge:

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
source ~/.bashrc
```

The application will work without WasmEdge but with reduced performance.
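A minimal sketch of how such an availability check could work, assuming the `wasmedge` binary is on `PATH` (the project's actual detection logic behind `/check-wasm` may differ):

```python
import shutil

def wasmedge_available() -> bool:
    # Probe for the WasmEdge CLI on PATH; callers can fall back to the
    # slower non-accelerated path when it is absent.
    return shutil.which("wasmedge") is not None
```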
- `GET /` - Web interface
- `POST /process` - Process uploaded files
- `GET /process-stream` - Server-sent events for progress updates
- `GET /check-wasm` - Check WasmEdge availability
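`/process-stream` delivers progress as server-sent events; a minimal client-side parser for the `data:` lines could look like this (`parse_sse` and the payload format are assumptions for illustration, not the project's documented protocol):

```python
def parse_sse(stream: str) -> list[str]:
    # Extract the payload of each `data:` line from a raw SSE stream.
    events = []
    for line in stream.splitlines():
        if line.startswith("data:"):
            events.append(line[len("data:"):].strip())
    return events
```

In practice the stream would be read incrementally from the HTTP response rather than from a complete string.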
```bash
curl -X POST "http://localhost:8000/process" \
  -F "files=@document1.txt" \
  -F "files=@document2.pdf" \
  -F "session_id=my_session_123"
```

Response:

```json
{
  "status": "success",
  "snapshot_url": "https://huggingface.co/datasets/your_dataset/resolve/main/snapshots/snapshot_20241201_143022.tar.gz",
  "message": "Processed 2 files with 150 embeddings"
}
```

```
┌─────────────────┐    ┌──────────────┐    ┌─────────────┐
│   Web Browser   │◄──►│  FastAPI     │◄──►│   Qdrant    │
│   (Frontend)    │    │  (Backend)   │    │ (VectorDB)  │
└─────────────────┘    └──────────────┘    └─────────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │  WasmEdge +      │
                    │  gte-Qwen2-1.5B  │
                    │  (AI Processing) │
                    └──────────────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │  Hugging Face    │
                    │  (Storage)       │
                    └──────────────────┘
```
1. File Upload → Temporary storage with validation
2. Format Detection → Route to appropriate processor
3. Content Extraction → Text extraction and cleaning
4. Embedding Generation → AI model creates vectors
5. Vector Storage → Qdrant collection creation
6. Snapshot Creation → Database export and compression
7. Upload & Deploy → Hugging Face hosting
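Step 1 (upload validation) can be illustrated with a short sketch; `validate_upload` and `ALLOWED_EXTENSIONS` are hypothetical names based on the format and size limits stated above:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".csv"}
MAX_FILE_SIZE = 10_485_760  # 10MB, matching the default limit

def validate_upload(filename: str, size: int) -> None:
    # Reject unsupported formats and oversized files before any processing.
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or 'none'}")
    if size > MAX_FILE_SIZE:
        raise ValueError(f"file too large: {size} bytes")
```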
 
Problem: "Qdrant connection failed"

```bash
# Solution: Check if Qdrant is running
docker ps | grep qdrant
# Or test connection manually
curl http://localhost:6333/collections
```

Problem: "WasmEdge not available" warning

```bash
# Solution: Install WasmEdge (optional but recommended)
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
```

Problem: "Hugging Face upload failed"

```bash
# Solution: Check token permissions
huggingface-cli whoami
# Ensure dataset exists and you have write access
```

Problem: PDF processing fails

```bash
# Solution: Install additional markitdown dependencies
pip install markitdown[all]
```

Enable debug logging:

```bash
export PYTHONPATH=.
python -c "
import logging
logging.basicConfig(level=logging.DEBUG)
import main
"
```

- Use WasmEdge for 3-5x faster embedding generation
- Batch related files for better context understanding
- Clean documents before upload (remove headers/footers)
- Use local Qdrant for faster processing (if possible)
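The "clean documents" tip can be partially automated: lines repeated on most pages of a document are usually running headers or footers. A rough sketch of that idea (not the project's code; `strip_repeated_lines` is a hypothetical helper):

```python
from collections import Counter

def strip_repeated_lines(pages: list[str], threshold: float = 0.6) -> list[str]:
    # Count how many pages contain each non-blank line, then drop lines
    # that appear on more than `threshold` of the pages.
    counts = Counter()
    for page in pages:
        counts.update(set(page.splitlines()))
    limit = threshold * len(pages)
    return [
        "\n".join(
            line for line in page.splitlines()
            if counts[line] <= limit or not line.strip()
        )
        for page in pages
    ]
```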
 
```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass: `pytest`
6. Commit your changes: `git commit -m 'Add amazing feature'`
7. Push to the branch: `git push origin feature/amazing-feature`
8. Open a Pull Request
 
This project uses:

- Black for code formatting
- isort for import sorting
- flake8 for linting

```bash
# Format code
black .
isort .
flake8 .
```

- File size limits prevent DoS attacks
- Input validation on all file types
- Temporary file cleanup after processing
- No sensitive data stored permanently
- Secure token handling for external services
 
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Gaia Network: Official Website
 
Built with ❤️ for the Gaia Network ecosystem