Technical Overview

Yohaan Dsouza edited this page Feb 1, 2026 · 1 revision

Complete technical architecture documentation for Indexa.

Architecture Philosophy

Indexa is built on a foundation of performance, privacy, and simplicity. The technical stack prioritizes embedded solutions, local processing, and minimal external dependencies to deliver a fast, secure search experience.


Backend Infrastructure

Runtime Environment

Component   Version  Purpose
Node.js     18.0.0+  Asynchronous I/O, event-driven architecture
Fastify     4.26     High-performance, plugin-based HTTP server
TypeScript  5.3.3    Full type safety, ES2022 target

Data Storage

SQLite Primary Database

Embedded relational database with zero configuration:

  • Synchronous bindings for predictable performance
  • Asynchronous operations for non-blocking tasks
  • Write-Ahead Logging (WAL) mode for improved concurrency
  • Built-in FTS5 full-text search extension with BM25 relevance ranking
  • No separate database server required
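For reference, the BM25 scoring that FTS5 applies can be sketched in a few lines. This is a minimal, illustrative version over an in-memory corpus, not SQLite's code. It uses k1 = 1.2 and b = 0.75, the same constants FTS5's `bm25()` uses, but it swaps in the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)), so its absolute scores will not match FTS5's. The `tokenize` and `bm25Rank` helpers are hypothetical.

```typescript
// Minimal Okapi BM25 sketch (illustrative; not SQLite's FTS5 implementation).
type Doc = { id: string; text: string };

// Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function bm25Rank(docs: Doc[], query: string, k1 = 1.2, b = 0.75): { id: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const N = docs.length;
  const avgdl = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(N, 1);

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  const queryTerms = [...new Set(tokenize(query))];
  return docs
    .map((doc, i) => {
      const tokens = tokenized[i];
      let score = 0;
      for (const term of queryTerms) {
        const n = df.get(term) ?? 0;
        const tf = tokens.filter((t) => t === term).length;
        if (n === 0 || tf === 0) continue;
        const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
        // k1 saturates repeated terms; b normalizes for document length.
        score += (idf * (tf * (k1 + 1))) / (tf + k1 * (1 - b + (b * tokens.length) / avgdl));
      }
      return { id: doc.id, score };
    })
    .sort((x, y) => y.score - x.score);
}

const ranked = bm25Rank(
  [
    { id: "a", text: "SQLite full-text search with FTS5" },
    { id: "b", text: "Redis caching layer" },
  ],
  "fts5 search",
);
console.log(ranked[0].id); // a
```

With FTS5 itself, `bm25()` returns the score multiplied by -1, so the best matches come first under an ascending `ORDER BY`.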

Redis In-Memory Store

Advanced caching and distributed system coordination:

  • Multi-tier caching architecture
  • Distributed job queue management
  • FIFO list operations for crawl coordination
  • Set and hash data structures for efficient lookups
  • High-performance read operations
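The FIFO-list-plus-set pattern for crawl coordination can be sketched in memory. The class below is a hypothetical stand-in that mirrors the Redis commands a crawl frontier typically uses (`SADD` on a set for de-duplication, `LPUSH`/`RPOP` on a list for FIFO order); it does not talk to Redis and is not Indexa's code.

```typescript
// In-memory stand-in for a Redis-backed crawl frontier (hypothetical).
// Redis equivalents: enqueue ~ SADD seen <url>, then LPUSH queue <url> only if SADD returned 1;
// dequeue ~ RPOP queue.
class CrawlFrontier {
  private queue: string[] = []; // FIFO: push at the tail, shift from the head
  private seen = new Set<string>(); // every URL ever enqueued

  // Returns true if the URL was new and got queued (like SADD returning 1).
  enqueue(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.queue.push(url);
    return true;
  }

  // Oldest URL first; undefined when empty (like RPOP returning nil).
  dequeue(): string | undefined {
    return this.queue.shift();
  }

  get size(): number {
    return this.queue.length;
  }
}

const frontier = new CrawlFrontier();
frontier.enqueue("https://example.com/");
frontier.enqueue("https://example.com/about");
frontier.enqueue("https://example.com/"); // duplicate, ignored
console.log(frontier.dequeue()); // https://example.com/
```

In Redis each `SADD` is atomic, so several crawl workers can share one frontier this way without queuing the same URL twice.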

Artificial Intelligence & Machine Learning

Local Embedding Generation

Transformer Models

On-device neural network inference:

  • ONNX runtime for cross-platform compatibility
  • BGE-base-en-v1.5 model for English text embeddings
  • Int8 quantization, cutting weight memory by 75% relative to 32-bit floats
  • No external API dependencies
  • Fully offline operation for privacy
  • Zero per-query costs
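The 75% figure follows from storage size: a 32-bit float takes 4 bytes and an int8 value takes 1. The sketch below shows generic symmetric per-tensor int8 quantization to make that concrete; it illustrates the technique and is not necessarily the exact scheme used by the quantized ONNX model Indexa ships.

```typescript
// Symmetric per-tensor int8 quantization (generic sketch).
function quantizeInt8(weights: Float32Array): { q: Int8Array; scale: number } {
  let maxAbs = 0;
  for (const w of weights) maxAbs = Math.max(maxAbs, Math.abs(w));
  // Map [-maxAbs, maxAbs] onto [-127, 127]; fall back to 1 for an all-zero tensor.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) q[i] = Math.round(weights[i] / scale);
  return { q, scale };
}

// Recover approximate float values; the error per weight is at most scale / 2.
function dequantizeInt8(q: Int8Array, scale: number): Float32Array {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) out[i] = q[i] * scale;
  return out;
}

const weights = new Float32Array([0.5, -1.27, 0.02, 1.0]);
const { q, scale } = quantizeInt8(weights);
console.log(weights.byteLength, q.byteLength); // 16 4
```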

Vector Search Implementation

Custom approximate nearest neighbor search:

  • Hierarchical Navigable Small World (HNSW) graph structure
  • Optimized for high-dimensional vector spaces
  • Approximately O(log N) query complexity in practice
  • Custom implementation for full control
  • Graph persistence separate from main database
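The core of an HNSW query is a greedy descent: start at an entry point on the sparsest top layer, keep hopping to a neighbor that is closer to the query, and drop to the next layer when no neighbor improves. The sketch below shows only that descent (effectively ef = 1 on every layer) over a small hand-built graph; graph construction, the wider ef beam search on the bottom layer, and persistence are omitted, and none of the names come from Indexa's implementation.

```typescript
// Greedy multi-layer descent as used in HNSW queries (ef = 1 simplification).
type Vec = number[];
// layers[0] is the TOP (sparsest) layer; each layer maps node id -> neighbor ids.
type Layers = Map<number, number[]>[];

function sqDist(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return s;
}

function greedySearch(vectors: Vec[], layers: Layers, entry: number, query: Vec): number {
  let current = entry;
  let currentDist = sqDist(vectors[current], query);
  for (const layer of layers) {
    // Hill-climb on this layer until no neighbor is closer, then descend.
    let improved = true;
    while (improved) {
      improved = false;
      for (const nb of layer.get(current) ?? []) {
        const d = sqDist(vectors[nb], query);
        if (d < currentDist) {
          current = nb;
          currentDist = d;
          improved = true;
        }
      }
    }
  }
  return current;
}

// Eight 1-D points; the top layer links 0 and 4, the bottom layer is a chain.
const points: Vec[] = Array.from({ length: 8 }, (_, i) => [i]);
const topLayer = new Map<number, number[]>([[0, [4]], [4, [0]]]);
const bottomLayer = new Map<number, number[]>(
  points.map((_, i) => [i, [i - 1, i + 1].filter((j) => j >= 0 && j < points.length)] as [number, number[]]),
);
console.log(greedySearch(points, [topLayer, bottomLayer], 0, [6.2])); // 6
```

Squared Euclidean distance keeps the example short; for unit-length embeddings it orders neighbors exactly as cosine similarity does, since ||a - b||^2 = 2 - 2 cos(a, b).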

Language Model Integration

Dufus AI Assistant

Local large language model support:

  • Native LLaMA model integration
  • Streaming response generation
  • Server-Sent Events (SSE) for real-time output
  • Context aggregation from search results
  • Intent classification for query understanding
  • Multi-query decomposition for complex questions
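SSE frames are plain text: an optional `event:` line, one `data:` line per line of payload, and a blank line that tells the client to dispatch the event. A minimal formatter for streaming model tokens might look like the following; the `token` and `done` event names and the JSON payload shape are hypothetical, not Dufus's actual protocol.

```typescript
// Format one Server-Sent Events frame (WHATWG HTML event-stream format).
function sseFrame(data: string, event?: string): string {
  let frame = event ? `event: ${event}\n` : "";
  // Each payload line needs its own "data:" prefix; the browser rejoins them with "\n".
  for (const line of data.split(/\r\n|\r|\n/)) frame += `data: ${line}\n`;
  return frame + "\n"; // the blank line dispatches the event
}

// Hypothetical usage: stream generated tokens, then signal completion.
const frames = ["Hello", " world"].map((t) => sseFrame(JSON.stringify({ token: t }), "token"));
frames.push(sseFrame("[DONE]", "done"));
console.log(frames.join(""));
```

In a Fastify route, frames like these would be written to `reply.raw` after setting `Content-Type: text/event-stream`, so each token reaches the browser's `EventSource` as soon as it is generated.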

Web Crawling & Content Processing

HTTP Operations

Request Handling

  • Automatic retry on transient failures
  • Custom timeouts and connection pooling
  • HTTP/2 support for performance
  • Compression handling (gzip, brotli)
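Retrying transient failures usually means exponential backoff with a cap. The sketch below assumes the request function throws a hypothetical `HttpError` for bad statuses, treats 429/502/503/504 and any other thrown error (assumed to be a network failure) as transient, and adds full jitter; the status list, delays, and retry count are placeholders rather than Indexa's configuration.

```typescript
// Hypothetical retry helper: capped exponential backoff with full jitter.
const TRANSIENT_STATUS = new Set([429, 502, 503, 504]);

class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Backoff ceiling before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 200, maxMs = 5_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function isTransient(err: unknown): boolean {
  // Non-HttpError failures are assumed to be network errors and therefore retryable.
  return err instanceof HttpError ? TRANSIENT_STATUS.has(err.status) : true;
}

async function withRetry<T>(op: () => Promise<T>, retries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      // Full jitter: wait a random time up to the backoff ceiling.
      const wait = Math.random() * backoffDelay(attempt);
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}
```

Jitter spreads retries from many crawl workers over time, so a briefly overloaded host is not hit by a synchronized burst.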

Content Parsing

  • jQuery-like HTML parsing for structure extraction
  • DOM implementation for JavaScript execution
  • Mozilla Readability algorithm for article extraction
  • Robots.txt compliance checking and enforcement
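Robots.txt matching is standardized in RFC 9309: within the group that applies to the crawler, the longest matching rule wins, and `Allow` wins a tie with `Disallow`. The simplified checker below handles literal path prefixes only (no `*` or `$` wildcards, no merging of duplicate groups); the function names and the `IndexaBot` user-agent are hypothetical.

```typescript
type Rule = { allow: boolean; path: string };

// Collect the rules for the group matching `agent`, falling back to the "*" group.
function rulesFor(robotsTxt: string, agent: string): Rule[] {
  const groups = new Map<string, Rule[]>();
  let currentAgents: string[] = [];
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim();
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) currentAgents = [];
      currentAgents.push(value.toLowerCase());
      if (!groups.has(value.toLowerCase())) groups.set(value.toLowerCase(), []);
      lastWasAgent = true;
    } else if (key === "allow" || key === "disallow") {
      lastWasAgent = false;
      if (value === "") continue; // an empty rule matches nothing
      for (const a of currentAgents) groups.get(a)!.push({ allow: key === "allow", path: value });
    }
  }
  return groups.get(agent.toLowerCase()) ?? groups.get("*") ?? [];
}

function isAllowed(robotsTxt: string, agent: string, path: string): boolean {
  if (path === "/robots.txt") return true; // always fetchable
  let best: Rule | undefined;
  for (const rule of rulesFor(robotsTxt, agent)) {
    if (!path.startsWith(rule.path)) continue;
    // Longest match wins; on equal length, Allow beats Disallow.
    if (!best || rule.path.length > best.path.length || (rule.path.length === best.path.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

const robots = `User-agent: *
Disallow: /private/
Allow: /private/public-page

User-agent: IndexaBot
Disallow: /drafts/`;
console.log(isAllowed(robots, "OtherBot", "/private/x")); // false
```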

JavaScript Rendering

Browser Automation

Headless browser for modern web applications:

  • Chromium-based rendering engine
  • Custom browser pool with resource controls
  • Two-tier render cache (memory and persistent storage)
  • Anti-detection measures for reliability
  • Selective rendering based on content analysis
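A two-tier render cache checks a small in-memory LRU first, falls back to a larger persistent store, and promotes persistent hits back into memory. In the sketch below the persistent tier is a hypothetical `PersistentStore` interface, backed by a plain `Map` so the example runs; in practice it might be a SQLite table or Redis, and the capacity is an arbitrary placeholder.

```typescript
interface PersistentStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Memory tier: an LRU built on Map, whose iteration order is insertion order.
class TwoTierCache {
  private memory = new Map<string, string>();
  private readonly capacity: number;
  private readonly disk: PersistentStore;

  constructor(capacity: number, disk: PersistentStore) {
    this.capacity = capacity;
    this.disk = disk;
  }

  get(key: string): string | undefined {
    const hit = this.memory.get(key);
    if (hit !== undefined) {
      this.remember(key, hit); // refresh recency
      return hit;
    }
    const fromDisk = this.disk.get(key);
    if (fromDisk !== undefined) this.remember(key, fromDisk); // promote to memory
    return fromDisk;
  }

  set(key: string, html: string): void {
    this.disk.set(key, html);
    this.remember(key, html);
  }

  inMemory(key: string): boolean {
    return this.memory.has(key);
  }

  private remember(key: string, html: string): void {
    this.memory.delete(key);
    this.memory.set(key, html); // re-insert as most recently used
    if (this.memory.size > this.capacity) {
      // Evict the least recently used entry (first in iteration order).
      const oldest = this.memory.keys().next().value;
      if (oldest !== undefined) this.memory.delete(oldest);
    }
  }
}

const renderCache = new TwoTierCache(2, new Map<string, string>());
renderCache.set("https://a.example/", "<html>a</html>");
renderCache.set("https://b.example/", "<html>b</html>");
renderCache.set("https://c.example/", "<html>c</html>"); // evicts a from memory only
console.log(renderCache.get("https://a.example/")); // <html>a</html> (served from the persistent tier)
```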

Rendering Strategy

Two-phase rendering:

  1. Static HTML processing for traditional websites
  2. Browser rendering for single-page applications

Detection logic decides which phase each page needs, and rendered output is cached so that expensive browser work is not repeated.
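The detection step can be approximated with a cheap check on the static HTML: if the page has very little visible text but contains an empty single-page-app mount point or a `<noscript>` warning, it probably needs a browser. The sketch below is hypothetical; the marker patterns and the 200-character threshold are illustrative guesses, not Indexa's tuned logic.

```typescript
// Hypothetical heuristic: is the static HTML enough, or does this page need a headless browser?
function needsBrowserRender(html: string, minTextChars = 200): boolean {
  // Rough visible-text estimate: drop scripts and styles, strip tags, collapse whitespace.
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  // Empty mount points commonly used by client-side frameworks.
  const emptyMountPoint = /<div[^>]+id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);
  const noscriptWarning = /<noscript[^>]*>[\s\S]*?enable javascript/i.test(html);
  return text.length < minTextChars && (emptyMountPoint || noscriptWarning);
}

const spaShell =
  '<html><body><noscript>You need to enable JavaScript to run this app.</noscript>' +
  '<div id="root"></div><script src="/main.js"></script></body></html>';
const staticArticle =
  "<html><body><article><p>" + "Static article text. ".repeat(20) + "</p></article></body></html>";
console.log(needsBrowserRender(spaShell), needsBrowserRender(staticArticle)); // true false
```

Keeping this check cheap matters because it runs on every fetched page, while the browser path it guards is the expensive one.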

Content Quality

Duplicate Detection

SimHash fingerprinting for near-duplicate identification:

  • 64-bit fingerprints for similarity comparison
  • Custom implementation for web-scale deduplication
  • Efficient comparison operations
  • Configurable similarity thresholds
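SimHash combines per-token hashes into one fingerprint: every token's hash casts a +1 or -1 vote for each of the 64 bit positions, and a fingerprint bit is set where the votes sum to a positive number. Near-duplicate pages then differ in only a few bits, measured by Hamming distance. The sketch below assumes whitespace tokenization, equal token weights, and 64-bit FNV-1a as the token hash; these are illustrative choices, not necessarily Indexa's.

```typescript
const MASK64 = (1n << 64n) - 1n;

// 64-bit FNV-1a hash over the string's UTF-8 bytes.
function fnv1a64(s: string): bigint {
  let h = 0xcbf29ce484222325n;
  for (const byte of new TextEncoder().encode(s)) {
    h ^= BigInt(byte);
    h = (h * 0x100000001b3n) & MASK64;
  }
  return h;
}

function simhash64(text: string): bigint {
  const votes = new Array<number>(64).fill(0);
  for (const token of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    const h = fnv1a64(token);
    // Each token votes +1 where its hash bit is 1, and -1 where it is 0.
    for (let bit = 0; bit < 64; bit++) votes[bit] += (h >> BigInt(bit)) & 1n ? 1 : -1;
  }
  let fp = 0n;
  for (let bit = 0; bit < 64; bit++) if (votes[bit] > 0) fp |= 1n << BigInt(bit);
  return fp;
}

// Number of differing bits between two fingerprints.
function hamming64(a: bigint, b: bigint): number {
  let x = a ^ b;
  let count = 0;
  while (x) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}

const fpA = simhash64("the quick brown fox jumps over the lazy dog");
const fpB = simhash64("the quick brown fox jumped over the lazy dog");
console.log(hamming64(fpA, fpB));
```

The configurable part is the distance threshold: pairs whose fingerprints differ by no more than a small number of bits (3 is a common choice for 64-bit fingerprints) are treated as near-duplicates.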

Structured Data

Schema.org extraction for rich results:

  • Recipe, Product, Event, and Article types
  • JSON-LD and Microdata format support
  • Automatic structured data discovery
  • Enhanced result presentation
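JSON-LD discovery amounts to finding `<script type="application/ld+json">` blocks, parsing them, and keeping objects whose `@type` is one of the supported types. For brevity the sketch below uses a regular expression where a crawler would normally use its HTML parser; it handles single objects, top-level arrays, and `@graph` containers, but skips Microdata and prefixed types such as `schema:Recipe`. It is an illustration, not Indexa's extractor.

```typescript
const SUPPORTED_TYPES = new Set(["Recipe", "Product", "Event", "Article"]);

type JsonLdNode = { "@type"?: string | string[]; [key: string]: unknown };

function extractJsonLd(html: string): JsonLdNode[] {
  const results: JsonLdNode[] = [];
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(re)) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1]);
    } catch {
      continue; // malformed blocks are common on the web; skip them
    }
    // A block may hold one object, an array of objects, or an object with an @graph array.
    const nodes: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const node of nodes) {
      if (!node || typeof node !== "object") continue;
      const graph = (node as { "@graph"?: unknown })["@graph"];
      const candidates: unknown[] = Array.isArray(graph) ? graph : [node];
      for (const c of candidates) {
        if (!c || typeof c !== "object") continue;
        const t = (c as JsonLdNode)["@type"];
        const types = Array.isArray(t) ? t : [t];
        if (types.some((x) => typeof x === "string" && SUPPORTED_TYPES.has(x))) results.push(c as JsonLdNode);
      }
    }
  }
  return results;
}

const page = `
<script type="application/ld+json">{"@type": "Recipe", "name": "Pancakes"}</script>
<script type="application/ld+json">{"@type": "Organization", "name": "Acme"}</script>
<script type="application/ld+json">{"@graph": [{"@type": ["Article"], "headline": "Hello"}]}</script>
<script type="application/ld+json">{not valid json</script>`;
console.log(extractJsonLd(page).map((n) => n["@type"]));
```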

Authentication & Security

OwlGuard System

Secure user authentication infrastructure:

Password Security

  • Industry-standard password hashing algorithm (bcrypt)
  • Salt generation for each password
  • Configurable work factor for computational cost
  • Secure password comparison
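The properties listed above (a fresh salt per password, a tunable work factor, constant-time verification) can be illustrated with Node's built-in `crypto` module. Note that this sketch swaps in scrypt, which ships with Node, for the bcrypt that OwlGuard uses, so it shows the pattern rather than OwlGuard's code; the cost value and the `salt:hash` storage format are assumptions.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Work factor: scrypt's CPU/memory cost (a power of two). Raising it slows every guess.
const COST = 16384;
const KEY_LEN = 64;

function hashPassword(password: string): string {
  const salt = randomBytes(16); // fresh random salt for each password
  const hash = scryptSync(password, salt, KEY_LEN, { cost: COST });
  return `${salt.toString("hex")}:${hash.toString("hex")}`; // salt is stored alongside the hash
}

function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length !== KEY_LEN) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), KEY_LEN, { cost: COST });
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(actual, expected);
}

const stored = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", stored)); // true
```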

Session Management

  • Token-based authentication
  • Secure cookie handling
  • Manual cookie parsing for control and security
  • Separate authentication database
  • Session expiration and renewal
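Token generation and manual cookie handling fit in a few lines. The sketch below creates a 256-bit random token, serializes it into a `Set-Cookie` value with common hardening attributes, and parses a raw `Cookie` header by hand; the `owl_session` cookie name and the 7-day lifetime are hypothetical.

```typescript
import { randomBytes } from "node:crypto";

const SESSION_COOKIE = "owl_session"; // hypothetical cookie name
const SESSION_TTL_SECONDS = 7 * 24 * 60 * 60; // hypothetical 7-day lifetime

// 32 random bytes -> URL-safe token with 256 bits of entropy.
function newSessionToken(): string {
  return randomBytes(32).toString("base64url");
}

// HttpOnly blocks script access, Secure requires HTTPS, SameSite=Lax limits cross-site sends.
function sessionSetCookie(token: string): string {
  return `${SESSION_COOKIE}=${token}; Path=/; Max-Age=${SESSION_TTL_SECONDS}; HttpOnly; Secure; SameSite=Lax`;
}

// Manual parser for a Cookie request header: "a=1; b=2" -> Map { a => "1", b => "2" }.
function parseCookieHeader(header: string | undefined): Map<string, string> {
  const cookies = new Map<string, string>();
  if (!header) return cookies;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // skip malformed pairs
    const key = part.slice(0, eq).trim();
    if (!key || cookies.has(key)) continue; // ignore empty names; first occurrence wins
    cookies.set(key, part.slice(eq + 1).trim());
  }
  return cookies;
}

const token = newSessionToken();
console.log(sessionSetCookie(token));
console.log(parseCookieHeader(`theme=dark; ${SESSION_COOKIE}=${token}`).get(SESSION_COOKIE) === token); // true
```

Session renewal then amounts to issuing a new token with a fresh `Max-Age` and invalidating the old one in the authentication database.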

Security Practices

  • No plaintext password storage
  • Secure token generation
  • HTTPS-ready configuration
  • Protection against common vulnerabilities

Frontend Architecture

Design Philosophy

Minimal framework overhead for maximum performance:

Native JavaScript

  • Modern ES6+ language features
  • Fetch API for HTTP requests
  • Promise-based asynchronous operations
  • Native async/await patterns
  • No framework bundle overhead

UI Components

  • Modern icon library via CDN (Lucide)
  • Syntax highlighting for code blocks
  • Custom font loading optimization
  • Responsive design patterns
  • Progressive enhancement approach

Build Pipeline

Production Optimization

  • JavaScript minification for reduced payload
  • CSS minification and compression
  • Configurable HTML minification
  • Code transformation for production
  • Asset optimization

Data Processing

Compression & Streaming

Large File Processing

  • BZ2 decompression for Wikipedia dumps
  • Streaming XML parsing for memory efficiency
  • Gzip support for sitemap processing
  • Progressive parsing to avoid memory spikes
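Gzipped sitemaps can be processed without holding the whole file in memory by piping the compressed stream through `zlib` and reading it line by line. The sketch below uses only Node built-ins and assumes each `<loc>` element sits on a single line, which holds for typical generated sitemaps but is not guaranteed; a streaming XML parser, as the list above describes for Wikipedia dumps, removes that assumption. Node's `zlib` has no BZ2 codec, so dump decompression needs a separate library.

```typescript
import { createGunzip, gzipSync } from "node:zlib";
import { createInterface } from "node:readline";
import { Readable } from "node:stream";

// Yield <loc> URLs from a gzipped sitemap stream without buffering the whole file.
async function* sitemapUrls(gzipped: Readable): AsyncGenerator<string> {
  const lines = createInterface({ input: gzipped.pipe(createGunzip()), crlfDelay: Infinity });
  for await (const line of lines) {
    // Assumes each <loc>...</loc> is on one line; several per line are fine.
    // XML entity decoding (e.g. &amp; to &) is omitted for brevity.
    for (const m of line.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)) yield m[1];
  }
}

// Usage with a small in-memory sample; a crawler would pass an HTTP response stream instead.
const sample = gzipSync(
  "<urlset><url><loc>https://example.com/a</loc></url>\n" +
    "<url><loc>https://example.com/b</loc></url></urlset>\n",
);
(async () => {
  for await (const url of sitemapUrls(Readable.from(sample))) console.log(url);
})();
```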

Concurrency Control

  • Configurable concurrency limits
  • Rate limiting for external requests
  • Resource pooling for efficiency
  • Backpressure handling
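A configurable concurrency limit can be as simple as a fixed number of worker loops pulling from a shared cursor. The helper below is a hypothetical sketch, not Indexa's code: it runs at most `limit` tasks at once and keeps results in input order. Because each worker awaits its current task before taking the next item, a slow downstream automatically slows intake, which is a basic form of backpressure.

```typescript
// Run `task` over `items` with at most `limit` promises in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JS callbacks run on a single thread

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical usage: fetch six pages, two at a time.
mapWithConcurrency([1, 2, 3, 4, 5, 6], 2, async (n) => {
  await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for an HTTP request
  return n * 2;
}).then((out) => console.log(out));
```

Rate limiting (requests per second per host) is a separate control and would sit inside `task`.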

Algorithmic Components

Algorithm       Purpose              Characteristics
PageRank        Authority scoring    Iterative, with convergence detection
Domain Scoring  Trust calculation    Normalized to a 0-100 scale
SimHash         Duplicate detection  64-bit fingerprints
BM25            Text relevance       Probabilistic ranking
HNSW            Vector search        Approximately O(log N) query complexity
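PageRank's iterate-until-convergence loop is short enough to show in full. The sketch below uses the conventional damping factor of 0.85, spreads the rank of dangling pages (pages with no outlinks) evenly across all pages, and stops when the total (L1) change between iterations falls below a tolerance; the tolerance and iteration cap are placeholders, since Indexa's actual parameters are not documented here.

```typescript
// Power-iteration PageRank. links[i] lists the pages that page i links to.
function pageRank(links: number[][], damping = 0.85, tol = 1e-9, maxIter = 100): number[] {
  const n = links.length;
  let rank = new Array<number>(n).fill(1 / n);
  for (let iter = 0; iter < maxIter; iter++) {
    // Rank held by dangling pages is redistributed uniformly, so total rank stays 1.
    let dangling = 0;
    for (let i = 0; i < n; i++) if (links[i].length === 0) dangling += rank[i];
    const next = new Array<number>(n).fill((1 - damping) / n + (damping * dangling) / n);
    for (let i = 0; i < n; i++) {
      const out = links[i];
      for (const j of out) next[j] += (damping * rank[i]) / out.length;
    }
    // Convergence check: L1 distance between successive rank vectors.
    let delta = 0;
    for (let i = 0; i < n; i++) delta += Math.abs(next[i] - rank[i]);
    rank = next;
    if (delta < tol) break;
  }
  return rank;
}

// Pages 0 and 2 link to page 1; page 1 links back to page 0.
console.log(pageRank([[1], [0], [1]]));
```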

Development Infrastructure

TypeScript Configuration

Setting      Value
Target       ES2022
Module       CommonJS
Strict       Enabled
Source Maps  Enabled
Incremental  Supported

Development Workflow

Hot Reload System

  • Automatic restart on file changes
  • Direct TypeScript execution
  • Transpile-only mode for speed
  • Fast iteration cycles
  • Error reporting and logging

Deployment & Operations

Process Management

Production Runtime

  • Automatic restart on crashes
  • Process clustering for multi-core utilization
  • Log aggregation and management
  • Environment variable configuration
  • Zero-downtime deployment support

Monitoring & Logging

  • Request logging and metrics
  • Error tracking and reporting
  • Performance monitoring
  • Resource utilization tracking

Browser Extension

Extension Architecture

Manifest V3 Specification

  • Modern Chrome extension API
  • New tab page integration
  • Minimal permissions model
  • Native JavaScript implementation
  • No build process required

Key Technical Decisions

Decision                   Rationale
SQLite over PostgreSQL     Zero configuration, single file, excellent read performance, built-in FTS
Local AI over Cloud APIs   Complete privacy, zero per-query costs, offline functionality
Native JS over Frameworks  Faster load times, smaller bundles, direct browser API access
TypeScript Strict Mode     Compile-time error detection, enhanced IDE support, self-documenting code
Redis for Scaling          Distributed crawling, multi-tier caching, job queues
Custom HNSW                Full algorithm control, tailored performance, custom persistence

Performance Characteristics

Aspect            Target
Query Response    Sub-second
Cached Queries    10-20 ms
Uncached Queries  150-300 ms
Cache Hit Rate    80-85%
Crawl Rate        1000+ pages/sec (10 workers)
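As a rough consistency check on these targets, and assuming the hit rate applies uniformly across queries, expected latency is the hit-rate-weighted average of the cached and uncached figures, with h the cache hit rate:

```latex
\begin{aligned}
\mathbb{E}[T] &= h\,T_{\text{hit}} + (1-h)\,T_{\text{miss}} \\
\text{best case } (h = 0.85,\ T_{\text{hit}} = 10,\ T_{\text{miss}} = 150):\quad
\mathbb{E}[T] &= 0.85(10) + 0.15(150) = 8.5 + 22.5 = 31\ \text{ms} \\
\text{worst case } (h = 0.80,\ T_{\text{hit}} = 20,\ T_{\text{miss}} = 300):\quad
\mathbb{E}[T] &= 0.80(20) + 0.20(300) = 16 + 60 = 76\ \text{ms}
\end{aligned}
```

The listed figures therefore imply a mean of roughly 31-76 ms per query, well inside the sub-second target, while individual uncached queries still take 150-300 ms. Likewise, 1000 pages/sec across 10 workers works out to about 100 pages/sec per worker.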

Contact

Technical Inquiries: tech@indexa.site
Developer Support: developers@indexa.site


Technology choices reflect ongoing optimization and may evolve based on performance requirements and best practices.
