A web-based platform for converting raw knowledge base files into deployable vector database snapshots for the Gaia Network. Upload your documents, get AI-powered embeddings, and deploy instantly to your Gaia Node.
- Multi-Format Support: Process TXT, Markdown, PDF, and CSV files
- AI-Powered Embeddings: Uses the gte-Qwen2-1.5B model with 1536-dimensional vectors
- WasmEdge Acceleration: High-performance processing when available
- Drag & Drop Interface: Intuitive web-based file upload
- Real-Time Progress: Live updates during processing
- Auto-Deploy Ready: Generates Hugging Face URLs for instant Gaia Node deployment
- Cloud & Local: Works with Qdrant Cloud or local instances
 
- Python 3.8 or higher
- Qdrant instance (local or cloud)
- Hugging Face account (for uploads)
- Optional: WasmEdge for performance optimization
 
1. Clone the repository:

   ```bash
   git clone https://github.com/your-org/gaia-console.git
   cd gaia-console
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

4. Configure your environment:

   ```env
   # Qdrant Configuration
   QDRANT_URL=http://localhost:6333
   QDRANT_API_KEY=your_qdrant_api_key

   # Hugging Face Configuration
   HF_TOKEN=your_hugging_face_token
   HF_DATASET_NAME=your_username/your_dataset_name

   # Optional Settings
   MAX_FILE_SIZE=10485760  # 10MB
   ```

5. Start the server:

   ```bash
   uvicorn main:app --host 0.0.0.0 --port 8000 --reload
   ```

6. Open your browser and navigate to http://localhost:8000
1. Upload Files: Drag and drop your knowledge base files (TXT, MD, PDF, CSV)
2. Process: Click "Generate Snapshot" and watch real-time progress
3. Deploy: Copy the generated Hugging Face URL and configuration commands
4. Use with Gaia Node:

   ```bash
   gaianet config \
     --snapshot YOUR_SNAPSHOT_URL \
     --embedding-url https://huggingface.co/gaianet/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-f16.gguf \
     --embedding-ctx-size 8192
   gaianet init
   gaianet start
   ```
 
| Format | Description | Processing Method |
|---|---|---|
| TXT | Plain text files | Paragraph-based chunking |
| MD | Markdown documents | Header-based sectioning |
| PDF | PDF documents | Converted to Markdown via markitdown |
| CSV | Tabular data | Row-based processing |
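The per-format chunking strategies in the table can be sketched roughly as follows; `chunk_paragraphs` and `chunk_markdown_sections` are hypothetical names for illustration, not the project's actual functions:

```python
import re

def chunk_paragraphs(text: str) -> list[str]:
    # Plain text: split on blank lines, dropping empty chunks.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def chunk_markdown_sections(text: str) -> list[str]:
    # Markdown: split immediately before each level-1 or level-2 header.
    parts = re.split(r"(?m)^(?=#{1,2} )", text)
    return [p.strip() for p in parts if p.strip()]
```

Each resulting chunk would then be embedded as one vector, so chunk boundaries directly shape retrieval quality.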
- Maximum file size: 10MB per file
- No limit on number of files per session
- Batch processing supported
 
| Variable | Description | Required | Default |
|---|---|---|---|
| QDRANT_URL | Qdrant instance URL | Yes | http://localhost:6333 |
| QDRANT_API_KEY | Qdrant API key (for cloud) | No | - |
| HF_TOKEN | Hugging Face write token | Yes | - |
| HF_DATASET_NAME | Target HF dataset | Yes | - |
| MAX_FILE_SIZE | Max file size in bytes | No | 10485760 |
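Loading these variables might look like the following sketch; `load_config` is a hypothetical helper mirroring the table above, not part of the project's API:

```python
import os

def load_config(env=os.environ) -> dict:
    # Required settings raise a KeyError early; optional ones fall back
    # to the defaults listed in the table.
    return {
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "qdrant_api_key": env.get("QDRANT_API_KEY"),  # optional, cloud only
        "hf_token": env["HF_TOKEN"],
        "hf_dataset_name": env["HF_DATASET_NAME"],
        "max_file_size": int(env.get("MAX_FILE_SIZE", 10_485_760)),  # 10MB
    }
```

Failing fast on missing required settings keeps misconfiguration errors at startup rather than mid-upload.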
Local (Docker):

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Qdrant Cloud:

1. Sign up at Qdrant Cloud
2. Create a cluster
3. Get your API key and URL
4. Update your `.env` file

Hugging Face:

1. Create a Hugging Face account at huggingface.co
2. Generate an access token with write permissions
3. Create a dataset or use an existing one for snapshots
4. Update your `.env` with the token and dataset name
For optimal performance, install WasmEdge:

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
source ~/.bashrc
```

The application will work without WasmEdge but with reduced performance.
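A minimal sketch of how such an availability check could work, assuming the `wasmedge` binary is on `PATH` (the project's actual detection logic behind `/check-wasm` may differ):

```python
import shutil

def wasmedge_available() -> bool:
    # Probe for the WasmEdge CLI on PATH; callers can fall back to the
    # slower non-accelerated path when it is absent.
    return shutil.which("wasmedge") is not None
```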
- `GET /` - Web interface
- `POST /process` - Process uploaded files
- `GET /process-stream` - Server-sent events for progress updates
- `GET /check-wasm` - Check WasmEdge availability
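`/process-stream` delivers progress as server-sent events; a minimal client-side parser for the `data:` lines could look like this (`parse_sse` and the payload format are assumptions for illustration, not the project's documented protocol):

```python
def parse_sse(stream: str) -> list[str]:
    # Extract the payload of each `data:` line from a raw SSE stream.
    events = []
    for line in stream.splitlines():
        if line.startswith("data:"):
            events.append(line[len("data:"):].strip())
    return events
```

In practice the stream would be read incrementally from the HTTP response rather than from a complete string.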
```bash
curl -X POST "http://localhost:8000/process" \
  -F "files=@document1.txt" \
  -F "files=@document2.pdf" \
  -F "session_id=my_session_123"
```

Response:

```json
{
  "status": "success",
  "snapshot_url": "https://huggingface.co/datasets/your_dataset/resolve/main/snapshots/snapshot_20241201_143022.tar.gz",
  "message": "Processed 2 files with 150 embeddings"
}
```

```
┌─────────────────┐    ┌──────────────┐    ┌─────────────┐
│   Web Browser   │◄──►│  FastAPI     │◄──►│   Qdrant    │
│   (Frontend)    │    │  (Backend)   │    │ (VectorDB)  │
└─────────────────┘    └──────────────┘    └─────────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │  WasmEdge +      │
                    │  gte-Qwen2-1.5B  │
                    │  (AI Processing) │
                    └──────────────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │  Hugging Face    │
                    │  (Storage)       │
                    └──────────────────┘
```
1. File Upload → Temporary storage with validation
2. Format Detection → Route to appropriate processor
3. Content Extraction → Text extraction and cleaning
4. Embedding Generation → AI model creates vectors
5. Vector Storage → Qdrant collection creation
6. Snapshot Creation → Database export and compression
7. Upload & Deploy → Hugging Face hosting
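Step 1 (upload validation) can be illustrated with a short sketch; `validate_upload` and `ALLOWED_EXTENSIONS` are hypothetical names based on the format and size limits stated above:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".csv"}
MAX_FILE_SIZE = 10_485_760  # 10MB, matching the default limit

def validate_upload(filename: str, size: int) -> None:
    # Reject unsupported formats and oversized files before any processing.
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or 'none'}")
    if size > MAX_FILE_SIZE:
        raise ValueError(f"file too large: {size} bytes")
```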
 
Problem: "Qdrant connection failed"

```bash
# Solution: Check if Qdrant is running
docker ps | grep qdrant
# Or test connection manually
curl http://localhost:6333/collections
```

Problem: "WasmEdge not available" warning

```bash
# Solution: Install WasmEdge (optional but recommended)
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
```

Problem: "Hugging Face upload failed"

```bash
# Solution: Check token permissions
huggingface-cli whoami
# Ensure dataset exists and you have write access
```

Problem: PDF processing fails

```bash
# Solution: Install additional markitdown dependencies
pip install markitdown[all]
```

Enable debug logging:

```bash
export PYTHONPATH=.
python -c "
import logging
logging.basicConfig(level=logging.DEBUG)
import main
"
```

- Use WasmEdge for 3-5x faster embedding generation
- Batch related files for better context understanding
- Clean documents before upload (remove headers/footers)
- Use local Qdrant for faster processing (if possible)
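The "clean documents" tip can be partially automated: lines repeated on most pages of a document are usually running headers or footers. A rough sketch of that idea (not the project's code; `strip_repeated_lines` is a hypothetical helper):

```python
from collections import Counter

def strip_repeated_lines(pages: list[str], threshold: float = 0.6) -> list[str]:
    # Count how many pages contain each non-blank line, then drop lines
    # that appear on more than `threshold` of the pages.
    counts = Counter()
    for page in pages:
        counts.update(set(page.splitlines()))
    limit = threshold * len(pages)
    return [
        "\n".join(
            line for line in page.splitlines()
            if counts[line] <= limit or not line.strip()
        )
        for page in pages
    ]
```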
 
```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass: `pytest`
6. Commit your changes: `git commit -m 'Add amazing feature'`
7. Push to the branch: `git push origin feature/amazing-feature`
8. Open a Pull Request
 
This project uses:

- Black for code formatting
- isort for import sorting
- flake8 for linting

```bash
# Format code
black .
isort .
flake8 .
```

- File size limits prevent DoS attacks
- Input validation on all file types
- Temporary file cleanup after processing
- No sensitive data stored permanently
- Secure token handling for external services
 
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Gaia Network: Official Website
 
Built with ❤️ for the Gaia Network ecosystem