Skip to content

flamehaven01/Flamehaven-Filesearch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

36 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ”₯ FLAMEHAVEN FileSearch

Open-source semantic document search you can self-host in minutes.

CI/CD PyPI Python Versions License: MIT

The lightweight RAG stack that makes your documents searchable in minutes

Quick Start β€’ Features β€’ Documentation β€’ API Reference β€’ Examples


🎯 Why Flamehaven?

⚑ Fast

From zero to production in under 5 minutes. No complex infrastructure.

πŸ”’ Private

100% self-hosted. Your data never leaves your servers.

πŸ’° Affordable

Leverages Gemini's generous free tier. Process thousands of docs free.

πŸ†š Comparison with Alternatives

Feature Flamehaven Pinecone Weaviate Custom RAG
Setup Time < 5 min ~20 min ~30 min Days
Self-Hosted βœ… ❌ βœ… βœ…
Free Tier Generous Limited Yes N/A
Code Complexity Low Medium High Very High
Maintenance Minimal None Medium High
Best For Quick POCs, SMBs Enterprise scale ML teams Full control

✨ Features

Feature Description
πŸ“„ Multi-Format PDF, DOCX, TXT, MD files up to 50MB
πŸ” Semantic Search Natural language queries with AI-powered answers
πŸ“Ž Source Citations Every answer links back to source documents
πŸ—‚οΈ Store Management Organize documents into separate collections
πŸ”Œ Dual Interface Python SDK + REST API with Swagger UI
🐳 Docker Ready One-command deployment with persistence

πŸ†• New in v1.1.0 (Production-Ready)

Feature Description Impact
⚑ LRU Caching 1-hour TTL, 1000 items 99% faster on cache hits (<10ms)
πŸ›‘οΈ Rate Limiting Per-endpoint limits 10/min uploads, 100/min searches
πŸ“Š Prometheus Metrics 17 metrics exported Real-time monitoring & alerting
πŸ”’ Security Headers OWASP-compliant CSP, HSTS, X-Frame-Options
πŸ“ JSON Logging Structured logs ELK/Splunk compatible
🎯 Request Tracing X-Request-ID headers Distributed tracing support

v1.1.0 Highlights: 40-60% cost reduction β€’ Zero critical vulnerabilities β€’ SIDRCE Certified (0.94) β†’ Full Changelog


⚑ Quick Start

1️⃣ Install

pip install flamehaven-filesearch[api]  # ~30 seconds

2️⃣ Set API Key

export GEMINI_API_KEY="your-google-gemini-key"

πŸ’‘ Get your free key at Google AI Studio (2 min signup)

3️⃣ Start Searching

Option A: Python SDK

from flamehaven_filesearch import FlamehavenFileSearch

fs = FlamehavenFileSearch()
fs.upload_file("company-handbook.pdf")

result = fs.search("What is our vacation policy?")
print(result["answer"])
# Expected: "Employees receive 15 days of paid vacation annually..."
print(f"πŸ“Ž Sources: {result['sources'][0]['filename']}, page {result['sources'][0]['page']}")

Option B: REST API

# Start server
flamehaven-api

# Upload (in new terminal)
curl -X POST "http://localhost:8000/upload" -F "file=@handbook.pdf"

# Search
curl "http://localhost:8000/search?q=vacation+policy"

🌐 Interactive Docs: Visit http://localhost:8000/docs

⚠️ Troubleshooting:

  • ModuleNotFoundError: Run pip install -U pip first
  • API errors: Check your key has no spaces
  • More solutions β†’

πŸ’‘ Usage Examples

Organize Documents by Type

fs = FlamehavenFileSearch()

# Separate stores for different contexts
fs.create_store("hr-docs")
fs.create_store("engineering")

fs.upload_file("handbook.pdf", store="hr-docs")
fs.upload_file("api-spec.md", store="engineering")

# Search specific context
result = fs.search("PTO policy", store="hr-docs")

Batch Upload Directory

import glob

for pdf in glob.glob("./documents/*.pdf"):
    print(f"πŸ“€ {pdf}...")
    fs.upload_file(pdf, store="company-docs")

βš™οΈ Configuration

Configure via environment variables or .env file:

Variable Default Description
GEMINI_API_KEY required Your Google Gemini API key
ENVIRONMENT production Logging mode: production (JSON) or development (readable)
DATA_DIR ./data Document storage location
MAX_FILE_SIZE_MB 50 Maximum file size (Gemini limit)
MAX_SOURCES 5 Number of source citations
DEFAULT_MODEL gemini-2.5-flash Gemini model to use
HOST 0.0.0.0 API server host (v1.1.0+)
PORT 8000 API server port (v1.1.0+)
WORKERS 1 Number of workers (v1.1.0+)

Example .env:

GEMINI_API_KEY=AIza...your-key-here
ENVIRONMENT=production        # JSON logs for production
DATA_DIR=/var/flamehaven/data
MAX_SOURCES=3
WORKERS=4                     # Production deployment

β†’ Complete configuration reference


πŸ“¦ Installation

Prerequisites

  • Python 3.8 or higher
  • Google Gemini API key (Get it free)

Install from PyPI

# Basic installation
pip install flamehaven-filesearch

# With API Server
pip install flamehaven-filesearch[api]

Development Setup

git clone https://github.com/flamehaven01/Flamehaven-Filesearch.git
cd Flamehaven-Filesearch
pip install -e .[dev,api]
pytest tests/

🐳 Docker Deployment

Quick Start:

docker run -e GEMINI_API_KEY="your-key" \
  -p 8000:8000 \
  -v flamehaven-data:/app/data \
  flamehaven/filesearch:latest

Docker Compose:

version: '3.8'
services:
  flamehaven:
    image: flamehaven/filesearch:latest
    ports:
      - "8000:8000"
    environment:
      - GEMINI_API_KEY=${GEMINI_API_KEY}
    volumes:
      - ./data:/app/data
    restart: unless-stopped

β†’ Production deployment guide


πŸ“‘ API Reference

Core Endpoints

POST /upload

Upload a document to a store.

Request:

curl -X POST "http://localhost:8000/upload" \
  -F "file=@report.pdf" \
  -F "store=default"

Response:

{
  "filename": "report.pdf",
  "store": "default",
  "file_id": "abc123...",
  "status": "uploaded"
}

GET /search

Search documents with natural language.

Request:

curl "http://localhost:8000/search?q=vacation+policy&store=default&max_sources=3"

Response:

{
  "answer": "Employees receive 15 days of paid vacation annually...",
  "sources": [
    {
      "filename": "handbook.pdf",
      "page": 42,
      "excerpt": "Vacation Policy: All full-time employees..."
    }
  ],
  "query": "vacation policy",
  "model": "gemini-2.5-flash"
}

GET /stores | DELETE /stores/{name}

Manage document stores.

GET /prometheus (v1.1.0+)

Prometheus metrics endpoint for monitoring.

Exported Metrics:

  • HTTP requests, duration, active requests
  • Upload/search counts, duration, results
  • Cache hits/misses, size
  • Rate limit exceeded events
  • System metrics (CPU, memory, disk)

Setup Prometheus scraping:

scrape_configs:
  - job_name: 'flamehaven'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: /prometheus

GET /metrics (Enhanced in v1.1.0)

Service metrics with cache statistics.

Response:

{
  "stores_count": 3,
  "uptime_seconds": 3600,
  "system": {"cpu_percent": 25.3, "memory_percent": 45.2},
  "cache": {
    "search_cache": {
      "hits": 89,
      "misses": 42,
      "hit_rate_percent": 67.94,
      "current_size": 127,
      "max_size": 1000
    }
  }
}

Error Handling

All errors return:

{
  "error": "Error message",
  "code": "ERROR_CODE"
}
Code Status Solution
FILE_TOO_LARGE 413 Reduce file size or increase limit
INVALID_API_KEY 401 Check your Gemini API key
STORE_NOT_FOUND 404 Create store first
RATE_LIMIT_EXCEEDED 429 Wait or upgrade API plan

Python SDK

from flamehaven_filesearch import FlamehavenFileSearch

fs = FlamehavenFileSearch(
    api_key="your-key",        # Optional if env var set
    data_dir="./data",         # Custom storage
    max_file_size_mb=100       # Override defaults
)

# API methods
fs.create_store(name)
fs.list_stores()
fs.delete_store(name)
fs.upload_file(path, store="default")
fs.search(query, store="default", max_sources=5)

β†’ Complete API documentation


πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client     β”‚
β”‚ (SDK/REST)   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  FlamehavenFileSearch Core          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Document Processor            β”‚ β”‚
β”‚  β”‚  β†’ PDF/DOCX/TXT/MD parsing     β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚               β–Ό                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Google Gemini API             β”‚ β”‚
β”‚  β”‚  β†’ Embedding & Generation      β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚               β–Ό                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Store Manager                 β”‚ β”‚
β”‚  β”‚  β†’ SQLite + File System        β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β†’ Detailed architecture docs


πŸ“Š Performance

Test Environment: Ubuntu 22.04, 2 vCPU, 4GB RAM, SSD

Operation Time (v1.0.0) Time (v1.1.0) Improvement
Upload 10MB PDF ~5s ~5s Same (no cache)
Search query (first) ~2-3s ~2-3s Same (cache miss)
Search query (repeat) ~2-3s <10ms 99% faster ⚑
Batch 3Γ—5MB ~12s ~12s Same (sequential)

v1.1.0 Caching Impact:

  • Cache Hit Rate: 40-60% (typical usage)
  • Response Time (P50): <100ms (down from 2-3s)
  • API Cost Reduction: 40-60% fewer Gemini calls
  • Throughput: ~100 cached searches/sec (vs ~10 non-cached)

Throughput: ~100 cached searches/sec β€’ ~10 API searches/sec β€’ ~2MB/s processing β†’ Detailed benchmarks


πŸ”’ Security

v1.1.0 Security Features

  • βœ… Path Traversal Protection: File upload sanitization with os.path.basename()
  • βœ… Rate Limiting: Per-endpoint limits prevent abuse (10/min uploads, 100/min searches)
  • βœ… OWASP Security Headers: CSP, HSTS, X-Frame-Options, X-Content-Type-Options
  • βœ… Input Validation: XSS/SQL injection detection, filename sanitization
  • βœ… Request Tracing: X-Request-ID headers for audit trails
  • βœ… Zero Critical CVEs: Patched CVE-2024-47874, CVE-2025-54121 (Starlette)

Best Practices

  • API Keys: Use environment variables, never commit to git
  • Data Privacy: All documents stored locally in DATA_DIR
  • Network: Run behind reverse proxy with SSL in production
  • Encryption: Implement encryption at rest for sensitive docs
  • Monitoring: Track rate_limit_exceeded and errors_total Prometheus metrics

β†’ Security guide β€’ Security audit results


❓ Troubleshooting

Common issues:

Problem Solution
ModuleNotFoundError pip install flamehaven-filesearch[api]
"API key invalid" Verify key: echo $GEMINI_API_KEY
Slow uploads Check file size, enable debug logs
Irrelevant results Reduce max_sources, lower temperature

Debug Mode:

export FLAMEHAVEN_DEBUG=1
flamehaven-api

β†’ Full troubleshooting guide


πŸ—ΊοΈ Roadmap

v1.1.0 (Released 2025-11-13): βœ… Caching β€’ Rate limiting β€’ Security fixes β€’ Monitoring v1.2.0 (Q1 2025): Authentication β€’ Batch API β€’ WebSocket streaming v2.0.0 (Q2 2025): Multi-language β€’ Analytics β€’ Custom embeddings

Recent Releases:

  • v1.1.0: Production-ready with caching, rate limiting, Prometheus metrics
  • v1.0.0: Initial release with core file search capabilities

β†’ Full changelog β€’ Roadmap & voting


🀝 Contributing

We welcome contributions!

Quick start:

  1. Fork & clone the repo
  2. Install: pip install -e .[dev,api]
  3. Create branch: git checkout -b feature/amazing
  4. Add tests & commit changes
  5. Open Pull Request

β†’ Contributing guidelines β€’ Good first issues


πŸ“š Resources

Documentation

  • Wiki - Guides, recipes, best practices
  • API Docs - Interactive Swagger UI
  • Examples - Code samples & use cases

Community

Contact


πŸ™ Acknowledgments

Built with: FastAPI β€’ Google Gemini API β€’ PyPDF2 β€’ python-docx


πŸ“„ License

MIT License - see LICENSE for details.


If Flamehaven helps you, please ⭐ the repo!

Made with ❀️ by the Flamehaven Team

⬆️ Back to Top