Production-ready multi-backend ML inference engine for HarperDB.
Run ONNX, TensorFlow.js, Transformers.js, and Ollama models with unified API, async model fetching, profile-based management, and performance benchmarking.
Harper Edge AI is a complete inference engine that makes it easy to deploy and run ML models in production:
- Multi-Backend Support - Use the best runtime for each model (ONNX, TensorFlow.js, Transformers.js, Ollama)
- Async Model Fetching - Download models from HuggingFace, HTTP, or filesystem with progress tracking
- Profile Management - Deploy model sets with simple configuration files
- Performance Benchmarking - Compare equivalent models across backends
- Deployment Automation - Scripts for deploy, verify, benchmark, and demo
- Production Ready - Full test coverage, pre-commit hooks, comprehensive docs
# Node.js 18+
node --version
# Harper 4.0+
# Install: https://docs.harperdb.io/docs/getting-started/installation
harper --version
# Ollama (optional, for LLM inference)
# Download: https://ollama.ai/
ollama --versiongit clone https://github.com/HarperDB/harper-edge-ai-example.git
cd harper-edge-ai-example
npm install
# Start Harper server
npm run dev # Server at http://localhost:9926
# Load test models
npm run preload:testing
# Run tests
npm test# Run a quick demo
./demo.sh
# Or make a direct API call
curl -X POST http://localhost:9926/Predict \
-H "Content-Type: application/json" \
-d '{
"modelName": "test-transformers-embedding",
"modelVersion": "v1",
"inputs": {"text": "This is a test"}
}'Run models on the optimal backend for your use case:
| Backend | Use Cases | Status | Package |
|---|---|---|---|
| ONNX Runtime | Optimized production models | ✅ Ready | onnxruntime-node 1.20.1 |
| TensorFlow.js | Universal Sentence Encoder, Keras models | ✅ Ready | @tensorflow/tfjs-node |
| Transformers.js | Hugging Face models (WASM) | ✅ Ready | @xenova/transformers 2.17.2 |
| Ollama | Local LLMs (chat, embeddings) | ✅ Ready | External service |
Features:
- Automatic backend routing based on framework
- LRU model caching with configurable size
- Unified prediction API
- File-backed blob storage for large models (86MB+)
Download models from multiple sources with full job tracking:
# Fetch from HuggingFace
harper-ai fetch huggingface Xenova/all-MiniLM-L6-v2 \
--name minilm --version v1 --framework transformers
# Check job status
harper-ai jobs --status pending
# View model inventory
harper-ai modelsCapabilities:
- Multi-source downloads (HuggingFace, HTTP URLs, filesystem)
- Async job queue with real-time progress
- Retry logic with exponential backoff
- Optional token authentication
- Complete CLI tool
Full Guide: Model Fetch System Documentation
Deploy model sets with configuration files:
// profiles/testing.json
{
"name": "testing",
"description": "One model per backend for CI/CD",
"models": [
{
"modelName": "test-transformers-embedding",
"framework": "transformers",
"modelBlob": {
"modelName": "Xenova/all-MiniLM-L6-v2",
"taskType": "feature-extraction"
}
}
]
}# Deploy testing profile
npm run preload:testing
# Deploy benchmarking profile
npm run preload:benchmarking
# Deploy custom profile
node scripts/preload-models.js --profile productionAvailable Profiles:
testing- One model per backend for CI/CDbenchmarking- Equivalence groups for performance comparisondevelopment- Full test suiteproduction- Production-ready models onlyminimal- Single model for quick tests
Full Guide: Profile Testing Documentation
Compare equivalent models across backends to optimize deployment:
# Deploy benchmarking models
npm run preload:benchmarking
# Run benchmarks
./benchmark.sh --iterations 100
# View results
cat benchmark-*.json | jq '.winner'Capabilities:
- Cross-backend comparison (ONNX vs TensorFlow vs Transformers.js vs Ollama)
- Equivalence group validation (compatible output dimensions)
- Statistical metrics (avg, p50, p95, p99 latency, error rates)
- Historical benchmark tracking
- Automated test data generation
Full Guide: Benchmarking Documentation
Four focused scripts for complete deployment lifecycle:
| Script | Purpose | Example |
|---|---|---|
deploy.sh |
Deploy code to Harper instance | ./deploy.sh --remote |
verify.sh |
Verify deployed system | ./verify.sh --full |
benchmark.sh |
Run performance benchmarks | ./benchmark.sh --iterations 500 |
demo.sh |
Interactive demonstrations | ./demo.sh --remote |
All scripts use .env configuration (no hardcoded values).
Full Guide: Scripts Reference
├── src/
│ ├── resources.js # Harper resource definitions
│ ├── core/
│ │ ├── backends/
│ │ │ ├── Onnx.js # ONNX Runtime backend
│ │ │ ├── TensorFlow.js # TensorFlow.js backend
│ │ │ ├── Transformers.js # Transformers.js backend
│ │ │ └── Ollama.js # Ollama backend
│ │ ├── InferenceEngine.js # Unified inference router
│ │ └── MonitoringBackend.js # Telemetry tracking
│ └── workers/
│ └── ModelFetchWorker.js # Async model download worker
│
├── scripts/
│ ├── preload-models.js # Profile-based model deployment
│ ├── cli/harper-ai.js # CLI tool for model management
│ └── lib/
│ ├── model-fetch-client.js # Model Fetch API client
│ └── shell-utils.sh # Shared shell utilities
│
├── tests/
│ ├── unit/ # 11 unit test files (63 tests)
│ └── integration/ # 10 integration test files
│
├── profiles/ # Model profile definitions
│ ├── minimal.json # Minimal profile (1 model)
│ ├── testing.json # Testing profile (4 models)
│ ├── benchmarking.json # Benchmarking profile (equivalence groups)
│ ├── development.json # Development profile (8 models)
│ └── production.json # Production profile
│
├── deploy.sh # Deployment script
├── verify.sh # Verification script
├── benchmark.sh # Benchmarking script
└── demo.sh # Demo script
// Text embedding
const response = await fetch('http://localhost:9926/Predict', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
modelName: 'test-transformers-embedding',
modelVersion: 'v1',
inputs: { text: 'product search query' },
}),
});
const { embedding } = await response.json();
console.log('Embedding:', embedding); // [0.123, -0.456, ...]// List all deployed models
const models = await fetch('http://localhost:9926/Model/').then((r) => r.json());
// Get specific model
const model = await fetch('http://localhost:9926/Model/minilm:v1').then((r) => r.json());
// Delete model
await fetch('http://localhost:9926/Model/minilm:v1', { method: 'DELETE' });// Start async fetch job
const job = await fetch('http://localhost:9926/FetchModel', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
source: 'huggingface',
sourceReference: 'Xenova/all-MiniLM-L6-v2',
modelName: 'minilm',
modelVersion: 'v1',
framework: 'transformers',
}),
}).then((r) => r.json());
// Check job status
const status = await fetch(`http://localhost:9926/ModelFetchJobs?id=${job.jobId}`).then((r) => r.json());# Development
npm run dev # Start Harper server
npm run preload # Load development models
npm run preload:testing # Load testing profile
npm run preload:benchmarking # Load benchmarking profile
# Testing
npm test # Run unit tests
npm run test:integration # Run integration tests
npm run test:all # Run all tests
# Deployment
./deploy.sh # Deploy to remote Harper
./verify.sh --full # Verify deployment
./benchmark.sh # Run benchmarks
./demo.sh # Interactive demo
# Linting & Formatting
npm run lint # Check code style
npm run lint:fix # Auto-fix issues
npm run format # Format all files
npm run format:check # Check formatting- Node.js 18.0.0 or higher
- npm 9.0.0 or higher
- Harper 4.0.0 or higher
- Operating System: macOS or Linux
- Ollama - For LLM inference (chat and embeddings)
- GPU - For accelerated ONNX inference (optional)
onnxruntime-node@1.20.1- ONNX Runtime@xenova/transformers@2.17.2- Transformers.js@tensorflow/tfjs-node- TensorFlow.js (optional)uuid@11.0.3- ID generationsharp@0.32.6- Image processing (optional)
- Quick Start - Get running in 5 minutes
- Deployment Guide - Local and remote deployment
- Scripts Reference - deploy.sh, verify.sh, benchmark.sh, demo.sh
- Model Fetch System - Async model downloads (HuggingFace, HTTP, filesystem)
- Benchmarking - Performance comparison across backends
- Profile Testing - Profile-based model management
- Model Metadata - Metadata conventions for benchmarking
- ONNX Runtime Guide - ONNX backend details
- Roadmap - Future plans and milestones
- Contributing - How to contribute
- Unit Tests: 11 files, 63 tests (all passing)
- Integration Tests: 10 files covering end-to-end workflows
- Coverage: ~70% (unit + integration), 20% integration-only
- Pre-commit Hooks: Lint, format, and test before every commit
npm test # Run unit tests (63 tests, ~2s)
npm run test:integration # Run integration tests (~30s)
npm run test:all # Run all testsWe welcome contributions! Here's how to get started:
- Read the docs - Understand current capabilities
- Check the roadmap - See ROADMAP.md for planned features
- Follow conventions - ESLint + Prettier configured
- Write tests - All code must have tests
- Update docs - Keep documentation current
See CONTRIBUTING.md for detailed guidelines.
Apache License 2.0 - See LICENSE for details.
- Documentation: docs/
- Issues: GitHub Issues
- Harper Docs: https://docs.harperdb.io/
Built with:
- HarperDB - Application and database platform
- ONNX Runtime - Optimized inference
- Transformers.js - Hugging Face models in JS
- Ollama - Local LLMs
- TensorFlow.js - Universal Sentence Encoder