Technical Overview
Complete technical architecture documentation for Indexa.
Indexa is built on a foundation of performance, privacy, and simplicity. The technical stack prioritizes embedded solutions, local processing, and minimal external dependencies to deliver a fast, secure search experience.
| Component | Version | Purpose |
|---|---|---|
| Node.js | 18.0.0+ | Asynchronous I/O, event-driven architecture |
| Fastify | 4.26 | High-performance HTTP server, plugin-based |
| TypeScript | 5.3.3 | Full type safety, ES2022 target |
SQLite Primary Database
Embedded relational database with zero configuration:
- Synchronous bindings for predictable performance
- Asynchronous operations for non-blocking tasks
- Write-Ahead Logging (WAL) mode for improved concurrency
- FTS5 extension with BM25 ranking algorithm
- Built-in full-text search capabilities
- No separate database server required
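
FTS5 computes BM25 internally, so Indexa does not need its own scorer. To make the ranking idea concrete, here is a minimal standalone sketch of BM25 over pre-tokenized documents. The names `buildBm25Corpus` and `bm25Score` are hypothetical, the IDF uses the common "+1 inside the log" variant, and `k1 = 1.2`, `b = 0.75` are conventional defaults rather than values taken from Indexa.

```typescript
// Illustrative only: a tiny BM25 scorer, not Indexa's or SQLite's implementation.
interface Bm25Corpus {
  docs: string[][];              // tokenized documents
  avgLen: number;                // average document length in tokens
  docFreq: Map<string, number>;  // number of documents containing each term
}

function buildBm25Corpus(docs: string[][]): Bm25Corpus {
  const docFreq = new Map<string, number>();
  let totalLen = 0;
  for (const doc of docs) {
    totalLen += doc.length;
    // Count each term once per document.
    for (const term of Array.from(new Set(doc))) {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    }
  }
  return { docs, avgLen: docs.length > 0 ? totalLen / docs.length : 0, docFreq };
}

function bm25Score(corpus: Bm25Corpus, query: string[], docIndex: number, k1 = 1.2, b = 0.75): number {
  const doc = corpus.docs[docIndex];
  if (doc === undefined || corpus.avgLen === 0) return 0;
  const n = corpus.docs.length;
  let score = 0;
  for (const term of query) {
    const tf = doc.filter((t) => t === term).length;
    if (tf === 0) continue;
    const df = corpus.docFreq.get(term) ?? 0;
    // Rare terms get a larger inverse document frequency.
    const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
    // Term frequency saturates with k1 and is normalized by document length with b.
    const lengthNorm = k1 * (1 - b + (b * doc.length) / corpus.avgLen);
    score += idf * ((tf * (k1 + 1)) / (tf + lengthNorm));
  }
  return score;
}
```

A document that repeats a query term scores higher than one of the same length that mentions it once, and a document without the term scores zero.
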
Redis In-Memory Store
Advanced caching and distributed system coordination:
- Multi-tier caching architecture
- Distributed job queue management
- FIFO list operations for crawl coordination
- Set and hash data structures for efficient lookups
- High-performance read operations
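
The FIFO list and set usage above can be sketched without a Redis connection. The class below is a hypothetical in-memory stand-in (`InMemoryFrontier` is not an Indexa name): the array plays the role of a Redis list used with RPUSH/LPOP, and the `Set` plays the role of a Redis set used with SADD to avoid queueing a URL twice. A real deployment would issue the equivalent commands through an asynchronous Redis client.

```typescript
// Illustrative stand-in for a Redis-backed crawl frontier (assumption, not Indexa's code).
class InMemoryFrontier {
  private readonly queue: string[] = [];      // stands in for a Redis list
  private readonly seen = new Set<string>();  // stands in for a Redis set

  // Like SADD followed by RPUSH: only URLs not seen before are appended to the tail.
  enqueue(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.queue.push(url);
    return true;
  }

  // Like LPOP: removes and returns the oldest URL, or undefined when empty.
  dequeue(): string | undefined {
    return this.queue.shift();
  }

  get pending(): number {
    return this.queue.length;
  }
}
```

Pushing at one end and popping at the other is what gives first-in, first-out ordering, so pages are crawled in the order they were discovered.
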
Transformer Models
On-device neural network inference:
- ONNX runtime for cross-platform compatibility
- BGE-base-en-v1.5 model for English text embeddings
- Int8 quantization for roughly 75% memory reduction relative to float32 weights
- No external API dependencies
- Fully offline operation for privacy
- Zero per-query costs
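
The 75% figure follows from storage width: an int8 value takes 1 byte where a float32 takes 4. The sketch below shows symmetric per-buffer quantization as one simple way to get there. It is an assumption for illustration; the actual scheme is decided by the quantized ONNX model, and `quantizeInt8` / `dequantizeInt8` are hypothetical names.

```typescript
// Illustrative symmetric int8 quantization of a float32 buffer (not the ONNX runtime's code).
interface QuantizedVector {
  values: Int8Array;
  scale: number; // multiply an int8 value by this to approximate the original float
}

function quantizeInt8(input: Float32Array): QuantizedVector {
  let maxAbs = 0;
  for (let i = 0; i < input.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(input[i] ?? 0));
  }
  // Map the largest magnitude onto 127; an all-zero buffer keeps scale 1.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    values[i] = Math.round((input[i] ?? 0) / scale);
  }
  return { values, scale };
}

function dequantizeInt8(q: QuantizedVector): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = (q.values[i] ?? 0) * q.scale;
  }
  return out;
}
```

The price of the smaller footprint is a rounding error of at most half a quantization step (`scale / 2`) per value.
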
Vector Search Implementation
Custom approximate nearest neighbor search:
- Hierarchical Navigable Small World (HNSW) graph structure
- Optimized for high-dimensional vector spaces
- Approximately O(log N) search complexity on typical data
- Custom implementation for full control
- Graph persistence separate from main database
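
A full HNSW index has several layers and a candidate list at the bottom layer, which is more than a short example can cover. The sketch below shows only the core step used while descending the upper layers: a greedy walk that keeps moving to whichever neighbor is closer to the query. The function names and the adjacency-list layout are assumptions, not Indexa's internal structures.

```typescript
// Illustrative greedy search on a single graph layer (one building block of HNSW).
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    sum += diff * diff;
  }
  return sum;
}

// vectors[i] is the embedding of node i; neighbors[i] lists the nodes linked to i.
// Returns the index of the node where the walk can no longer get closer to the query.
function greedySearchLayer(vectors: number[][], neighbors: number[][], query: number[], entry: number): number {
  let current = entry;
  let best = squaredDistance(vectors[current] ?? [], query);
  let improved = true;
  while (improved) {
    improved = false;
    for (const candidate of neighbors[current] ?? []) {
      const vec = vectors[candidate];
      if (vec === undefined) continue;
      const dist = squaredDistance(vec, query);
      if (dist < best) {
        // Strict improvement guarantees the walk terminates.
        best = dist;
        current = candidate;
        improved = true;
      }
    }
  }
  return current;
}
```

In a layered index, the node found on a sparse upper layer becomes the entry point for the denser layer below, which is where the roughly logarithmic search cost comes from.
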
Dufus AI Assistant
Local large language model support:
- Native LLaMA model integration
- Streaming response generation
- Server-Sent Events (SSE) for real-time output
- Context aggregation from search results
- Intent classification for query understanding
- Multi-query decomposition for complex questions
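
Streaming over Server-Sent Events comes down to writing small text frames onto a response with content type `text/event-stream`. The helper below is a minimal sketch of that framing; `formatSseEvent` and the `token` event name are hypothetical, not part of Indexa's API.

```typescript
// Illustrative SSE frame formatter. Each payload line needs its own "data:" field,
// and a blank line marks the end of the event.
function formatSseEvent(data: string, eventName?: string): string {
  const lines: string[] = [];
  if (eventName !== undefined && eventName !== "") {
    lines.push(`event: ${eventName}`);
  }
  for (const line of data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}
```

A server would write one such frame per generated chunk, for example `formatSseEvent("Hello", "token")`, and the browser's `EventSource` reassembles multi-line payloads on the other end.
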
Request Handling
- Automatic retry on transient failures
- Custom timeouts and connection pooling
- HTTP/2 support for performance
- Compression handling (gzip, brotli)
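
Automatic retry is usually paired with exponential backoff so a struggling server is not hit again immediately. The following is a minimal sketch under assumed defaults (three attempts, 200 ms base delay, 5 s cap); `withRetry`, `backoffDelayMs`, and the `isTransient` predicate are hypothetical names, and the real crawler's policy may differ.

```typescript
// Illustrative retry helper with capped exponential backoff.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  // attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, and so on, never above capMs.
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  task: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up on permanent errors or once the attempt budget is spent.
      if (attempt + 1 >= maxAttempts || !isTransient(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Passing `sleep` as a parameter keeps the helper easy to test, and the caller decides which failures count as transient (timeouts or HTTP 503, for instance).
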
Content Parsing
- jQuery-like HTML parsing for structure extraction
- DOM implementation for JavaScript execution
- Mozilla Readability algorithm for article extraction
- Robots.txt compliance checking and enforcement
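
Robots.txt enforcement can be illustrated with a simplified checker. This sketch handles `User-agent`, `Allow`, and `Disallow` with plain prefix matching, where the longest matching rule wins and `Allow` wins ties. It deliberately ignores `*` and `$` patterns inside paths, and `isPathAllowed` and the `IndexaBot` agent name are assumptions for the example.

```typescript
// Illustrative robots.txt check (prefix rules only, no wildcard patterns).
interface RobotsRule {
  allow: boolean;
  prefix: string;
}

function isPathAllowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const ua = userAgent.toLowerCase();
  const specific: RobotsRule[] = [];
  const wildcard: RobotsRule[] = [];
  let matchedSpecificGroup = false;
  let groupAgents: string[] = [];
  let previousWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim();
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; anything else starts a new one.
      if (!previousWasAgent) groupAgents = [];
      groupAgents.push(value.toLowerCase());
      previousWasAgent = true;
    } else if (field === "allow" || field === "disallow") {
      previousWasAgent = false;
      const rule: RobotsRule = { allow: field === "allow", prefix: value };
      for (const agent of groupAgents) {
        if (agent === "*") {
          if (value !== "") wildcard.push(rule); // an empty value restricts nothing
        } else if (agent !== "" && ua.includes(agent)) {
          matchedSpecificGroup = true;
          if (value !== "") specific.push(rule);
        }
      }
    }
  }

  // A group naming this crawler takes precedence over the "*" group.
  const rules = matchedSpecificGroup ? specific : wildcard;
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    if (best === undefined || rule.prefix.length > best.prefix.length ||
        (rule.prefix.length === best.prefix.length && rule.allow)) {
      best = rule;
    }
  }
  return best === undefined ? true : best.allow;
}
```

A crawler would call this before every fetch and skip the URL when it returns `false`.
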
Browser Automation
Headless browser for modern web applications:
- Chromium-based rendering engine
- Custom browser pool with resource controls
- Two-tier render cache (memory and persistent storage)
- Anti-detection measures for reliability
- Selective rendering based on content analysis
Rendering Strategy
Two-phase intelligent rendering:
- Static HTML processing for traditional websites
- Browser rendering for single-page applications
- Detection logic determines appropriate method
- Performance optimization through caching
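
The source does not describe the detection logic itself, so the following is only one plausible heuristic, not Indexa's rule: treat a page as needing browser rendering when it ships scripts but the static HTML carries very little visible text. The function name `needsBrowserRender` and the 200-character threshold are assumptions.

```typescript
// Hypothetical heuristic for choosing between static parsing and browser rendering.
function needsBrowserRender(html: string, minTextChars = 200): boolean {
  const hasScripts = /<script\b/i.test(html);
  if (!hasScripts) return false; // nothing could fill the page in later

  // Drop script and style bodies, then all remaining tags, to approximate visible text.
  const visibleText = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  return visibleText.length < minTextChars;
}
```

Pages that pass the check go to the headless browser pool, while everything else stays on the cheaper static path.
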
Duplicate Detection
SimHash fingerprinting for near-duplicate identification:
- 64-bit fingerprints for similarity comparison
- Custom implementation for web-scale deduplication
- Efficient comparison operations
- Configurable similarity thresholds
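
As a rough illustration of 64-bit SimHash, the sketch below hashes each token, lets every hash vote on each of the 64 bit positions, and compares fingerprints by Hamming distance. The token hash is an FNV-1a style function over UTF-16 code units chosen for brevity. Indexa's own hash, tokenization, and weighting are not specified in this document, so all names here are hypothetical.

```typescript
// Illustrative 64-bit SimHash using BigInt (requires an ES2020+ target).
function fnv1a64(token: string): bigint {
  let hash = 0xcbf29ce484222325n; // FNV 64-bit offset basis
  for (let i = 0; i < token.length; i++) {
    hash ^= BigInt(token.charCodeAt(i));
    hash = BigInt.asUintN(64, hash * 0x100000001b3n); // FNV 64-bit prime, wrapped to 64 bits
  }
  return hash;
}

function simhash64(tokens: string[]): bigint {
  const votes = new Array<number>(64).fill(0);
  for (const token of tokens) {
    const hash = fnv1a64(token);
    for (let bit = 0; bit < 64; bit++) {
      const isSet = ((hash >> BigInt(bit)) & 1n) === 1n;
      votes[bit] = (votes[bit] ?? 0) + (isSet ? 1 : -1);
    }
  }
  // A fingerprint bit is 1 when more tokens voted for it than against it.
  let fingerprint = 0n;
  for (let bit = 0; bit < 64; bit++) {
    if ((votes[bit] ?? 0) > 0) fingerprint |= 1n << BigInt(bit);
  }
  return fingerprint;
}

function hammingDistance64(a: bigint, b: bigint): number {
  let diff = BigInt.asUintN(64, a ^ b);
  let count = 0;
  while (diff > 0n) {
    count += Number(diff & 1n);
    diff >>= 1n;
  }
  return count;
}
```

Two pages count as near-duplicates when the distance between their fingerprints falls at or below the configured threshold. A small value such as 3 is a common starting point, though it is not a documented Indexa setting.
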
Structured Data
Schema.org extraction for rich results:
- Recipe, Product, Event, and Article types
- JSON-LD and Microdata format support
- Automatic structured data discovery
- Enhanced result presentation
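
JSON-LD discovery can be sketched as pulling `application/ld+json` script blocks out of the HTML and parsing them. This minimal version uses a regular expression instead of a real HTML parser and skips malformed blocks; `extractJsonLd` and `schemaTypeOf` are hypothetical helper names, and Microdata is not covered.

```typescript
// Illustrative JSON-LD extraction (regex-based, for brevity).
function extractJsonLd(html: string): unknown[] {
  const pattern = /<script\b[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const items: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const parsed: unknown = JSON.parse(match[1] ?? "");
      // A block may hold a single object or an array of objects.
      if (Array.isArray(parsed)) {
        items.push(...parsed);
      } else {
        items.push(parsed);
      }
    } catch {
      // Malformed JSON-LD is skipped rather than failing the whole page.
    }
  }
  return items;
}

function schemaTypeOf(item: unknown): string | undefined {
  if (typeof item !== "object" || item === null) return undefined;
  const type = (item as Record<string, unknown>)["@type"];
  return typeof type === "string" ? type : undefined;
}
```

Filtering the extracted items by `schemaTypeOf` is then enough to route Recipe, Product, Event, and Article objects to their own result templates.
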
Secure user authentication infrastructure:
Password Security
- Industry-standard password hashing algorithm (bcrypt)
- Salt generation for each password
- Configurable work factor for computational cost
- Secure password comparison
Session Management
- Token-based authentication
- Secure cookie handling
- Manual cookie parsing for control and security
- Separate authentication database
- Session expiration and renewal
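
Parsing the `Cookie` request header by hand takes only a few lines. The sketch below is one way to do it, not Indexa's actual code: `parseCookieHeader` is a hypothetical name, and `session` in the usage note is an assumed cookie name.

```typescript
// Illustrative manual parser for the Cookie request header.
function parseCookieHeader(header: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // ignore fragments without a name=value shape
    const cookieName = part.slice(0, eq).trim();
    let value = part.slice(eq + 1).trim();
    if (cookieName === "" || cookies.has(cookieName)) continue; // first occurrence wins
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    try {
      cookies.set(cookieName, decodeURIComponent(value));
    } catch {
      // Keep the raw value when percent-decoding fails.
      cookies.set(cookieName, value);
    }
  }
  return cookies;
}
```

Returning a `Map` rather than a plain object avoids surprises with cookie names such as `__proto__`. A request handler would read the token with something like `parseCookieHeader(header).get("session")`.
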
Security Practices
- No plaintext password storage
- Secure token generation
- HTTPS-ready configuration
- Protection against common vulnerabilities
Minimal framework overhead for maximum performance:
Native JavaScript
- Modern ES6+ language features
- Fetch API for HTTP requests
- Promise-based asynchronous operations
- Native async/await patterns
- No framework bundle overhead
UI Components
- Modern icon library via CDN (Lucide)
- Syntax highlighting for code blocks
- Custom font loading optimization
- Responsive design patterns
- Progressive enhancement approach
Production Optimization
- JavaScript minification for reduced payload
- CSS minification and compression
- HTML minification with configuration
- Code transformation for production
- Asset optimization
Large File Processing
- BZ2 decompression for Wikipedia dumps
- Streaming XML parsing for memory efficiency
- Gzip support for sitemap processing
- Progressive parsing to avoid memory spikes
Concurrency Control
- Configurable concurrency limits
- Rate limiting for external requests
- Resource pooling for efficiency
- Backpressure handling
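
Rate limiting of outbound requests is often built on a token bucket, sketched below as one possible approach (the source does not say which algorithm Indexa uses). The class name `CrawlTokenBucket` and its parameters are hypothetical. The caller passes the current time explicitly so the behavior is deterministic and easy to test.

```typescript
// Illustrative token-bucket rate limiter with an injected clock.
class CrawlTokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes one token if a request may be sent at time nowMs.
  tryAcquire(nowMs: number): boolean {
    if (nowMs > this.lastRefillMs) {
      const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
      this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
      this.lastRefillMs = nowMs;
    }
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

One bucket per target host, for example `new CrawlTokenBucket(2, 1, Date.now())`, allows short bursts while holding the long-run rate to the refill speed. A `false` result is the signal for the caller to wait, which is also a simple form of backpressure.
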
| Algorithm | Purpose | Key Characteristic |
|---|---|---|
| PageRank | Authority scoring | Iterative with convergence detection |
| Domain Scoring | Trust calculation | 0-100 scale normalization |
| SimHash | Duplicate detection | 64-bit fingerprints |
| BM25 | Text relevance | Probabilistic ranking |
| HNSW | Vector search | Approximately O(log N) search |
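
To make "iterative with convergence detection" concrete, here is a minimal PageRank sketch using power iteration. The damping factor 0.85, the tolerance, the iteration cap, and the even redistribution of rank from pages without outgoing links are conventional choices assumed for the example, not values taken from Indexa.

```typescript
// Illustrative PageRank: outLinks[i] lists the pages that page i links to.
function pageRank(outLinks: number[][], damping = 0.85, tolerance = 1e-6, maxIterations = 100): number[] {
  const n = outLinks.length;
  if (n === 0) return [];
  let ranks = new Array<number>(n).fill(1 / n);

  for (let iteration = 0; iteration < maxIterations; iteration++) {
    const next = new Array<number>(n).fill((1 - damping) / n);
    let danglingMass = 0;
    for (let page = 0; page < n; page++) {
      const targets = outLinks[page] ?? [];
      const rank = ranks[page] ?? 0;
      if (targets.length === 0) {
        danglingMass += rank; // no outgoing links: share this rank with everyone
        continue;
      }
      const share = (damping * rank) / targets.length;
      for (const target of targets) {
        next[target] = (next[target] ?? 0) + share;
      }
    }
    const danglingShare = (damping * danglingMass) / n;
    let delta = 0;
    for (let page = 0; page < n; page++) {
      const updated = (next[page] ?? 0) + danglingShare;
      delta += Math.abs(updated - (ranks[page] ?? 0));
      next[page] = updated;
    }
    ranks = next;
    // Convergence detection: stop once the total change is negligible.
    if (delta < tolerance) break;
  }
  return ranks;
}
```

The returned scores sum to 1. Mapping them onto a 0-100 scale, as the Domain Scoring row suggests, would be a separate normalization step.
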
| Setting | Value |
|---|---|
| Target | ES2022 |
| Module | CommonJS |
| Strict | Enabled |
| Source Maps | Enabled |
| Incremental | Supported |
Hot Reload System
- Automatic restart on file changes
- Direct TypeScript execution
- Transpile-only mode for speed
- Fast iteration cycles
- Error reporting and logging
Production Runtime
- Automatic restart on crashes
- Process clustering for multi-core utilization
- Log aggregation and management
- Environment variable configuration
- Zero-downtime deployment support
- Request logging and metrics
- Error tracking and reporting
- Performance monitoring
- Resource utilization tracking
Manifest V3 Specification
- Modern Chrome extension API
- New tab page integration
- Minimal permissions model
- Native JavaScript implementation
- No build process required
| Decision | Rationale |
|---|---|
| SQLite over PostgreSQL | Zero config, single-file, excellent read performance, built-in FTS |
| Local AI over Cloud APIs | Complete privacy, zero per-query costs, offline functionality |
| Native JS over Frameworks | Faster load times, smaller bundles, direct browser API access |
| TypeScript Strict Mode | Compile-time errors, enhanced IDE support, self-documenting code |
| Redis for Scaling | Distributed crawling, multi-tier caching, job queues |
| Custom HNSW | Full algorithm control, tailored performance, custom persistence |
| Aspect | Target |
|---|---|
| Query Response | Sub-second |
| Cached Queries | 10-20ms |
| Uncached Queries | 150-300ms |
| Cache Hit Rate | 80-85% |
| Crawl Rate | 1000+ pages/sec (10 workers) |
Technical Inquiries: tech@indexa.site
Developer Support: developers@indexa.site
Technology choices reflect ongoing optimization and may evolve based on performance requirements and best practices.
© 2026 Indexa Inc. • Privacy Policy • Terms of Service • indexa.site