Harper Edge AI

Production-ready multi-backend ML inference engine for HarperDB.

Run ONNX, TensorFlow.js, Transformers.js, and Ollama models with a unified API, async model fetching, profile-based management, and performance benchmarking.

What It Does

Harper Edge AI is a complete inference engine that makes it easy to deploy and run ML models in production:

Multi-Backend Support - Use the best runtime for each model (ONNX, TensorFlow.js, Transformers.js, Ollama)
Async Model Fetching - Download models from Hugging Face, HTTP URLs, or the filesystem with progress tracking
Profile Management - Deploy model sets with simple configuration files
Performance Benchmarking - Compare equivalent models across backends
Deployment Automation - Scripts for deploy, verify, benchmark, and demo
Production Ready - Unit and integration tests, pre-commit hooks, comprehensive docs

Quick Start (5 Minutes)

Prerequisites

# Node.js 18+
node --version

# Harper 4.0+
# Install: https://docs.harperdb.io/docs/getting-started/installation
harper --version

# Ollama (optional, for LLM inference)
# Download: https://ollama.ai/
ollama --version

Installation

git clone https://github.com/HarperDB/harper-edge-ai-example.git
cd harper-edge-ai-example
npm install

# Start Harper server
npm run dev              # Server at http://localhost:9926

# Load test models
npm run preload:testing

# Run tests
npm test

First Inference

# Run a quick demo
./demo.sh

# Or make a direct API call
curl -X POST http://localhost:9926/Predict \
  -H "Content-Type: application/json" \
  -d '{
    "modelName": "test-transformers-embedding",
    "modelVersion": "v1",
    "inputs": {"text": "This is a test"}
  }'

Core Features

1. Multi-Backend Inference

Run models on the optimal backend for your use case:

Backend	Use Cases	Status	Package
ONNX Runtime	Optimized production models	✅ Ready	onnxruntime-node 1.20.1
TensorFlow.js	Universal Sentence Encoder, Keras models	✅ Ready	@tensorflow/tfjs-node
Transformers.js	Hugging Face models (WASM)	✅ Ready	@xenova/transformers 2.17.2
Ollama	Local LLMs (chat, embeddings)	✅ Ready	External service

Features:

Automatic backend routing based on framework
LRU model caching with configurable size
Unified prediction API
File-backed blob storage for large models (86MB+)
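
To make the LRU caching idea concrete, here is a minimal sketch. It is not the engine's actual implementation: `LruModelCache` and its methods are hypothetical names, and the sketch relies only on the insertion-order guarantee of the built-in `Map`.

```javascript
// Hypothetical sketch of an LRU cache for loaded models (not the project's code).
class LruModelCache {
	constructor(maxSize = 3) {
		this.maxSize = maxSize;
		this.entries = new Map(); // Map iterates keys in insertion order
	}

	get(key) {
		if (!this.entries.has(key)) return undefined;
		const value = this.entries.get(key);
		// Re-insert so this key becomes the most recently used
		this.entries.delete(key);
		this.entries.set(key, value);
		return value;
	}

	set(key, value) {
		if (this.entries.has(key)) this.entries.delete(key);
		this.entries.set(key, value);
		if (this.entries.size > this.maxSize) {
			// The first key in iteration order is the least recently used
			const oldest = this.entries.keys().next().value;
			this.entries.delete(oldest);
		}
	}
}

const cache = new LruModelCache(2);
cache.set('minilm:v1', { loaded: true });
cache.set('use:v1', { loaded: true });
cache.get('minilm:v1'); // touch, so 'use:v1' is now the oldest entry
cache.set('llama:v1', { loaded: true }); // evicts 'use:v1'
console.log([...cache.entries.keys()]); // [ 'minilm:v1', 'llama:v1' ]
```

The cache keys above follow the `name:version` form used by the Model endpoints; the stored values are placeholders for whatever a backend returns when it loads a model.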

2. Model Fetch System

Download models from multiple sources with full job tracking:

# Fetch from HuggingFace
harper-ai fetch huggingface Xenova/all-MiniLM-L6-v2 \
  --name minilm --version v1 --framework transformers

# Check job status
harper-ai jobs --status pending

# View model inventory
harper-ai models

Capabilities:

Multi-source downloads (Hugging Face, HTTP URLs, filesystem)
Async job queue with real-time progress
Retry logic with exponential backoff
Optional token authentication
Complete CLI tool
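
Retry with exponential backoff, as listed above, can be sketched roughly as follows. This is an illustration under assumptions, not the fetch worker's code: `withRetry`, its option names, and the default delays are all hypothetical.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff.
// `sleep` can be injected (for example in tests) to avoid real waiting.
async function withRetry(operation, { retries = 3, baseDelayMs = 500, sleep } = {}) {
	const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
	let lastError;
	for (let attempt = 0; attempt <= retries; attempt++) {
		try {
			return await operation(attempt);
		} catch (error) {
			lastError = error;
			if (attempt === retries) break;
			// Delay doubles on each failed attempt: base, 2x base, 4x base, ...
			await wait(baseDelayMs * 2 ** attempt);
		}
	}
	throw lastError;
}

// Usage sketch: wrap any download step that may fail transiently.
// const bytes = await withRetry(() => downloadModelFile(url), { retries: 5 });
// (`downloadModelFile` is a placeholder, not a function from this repository.)
```

With `baseDelayMs` of 500 and three retries, the waits between attempts are 500 ms, 1000 ms, and 2000 ms before the last error is rethrown.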

Full Guide: Model Fetch System Documentation

3. Profile-Based Model Management

Deploy model sets with configuration files:

// profiles/testing.json
{
	"name": "testing",
	"description": "One model per backend for CI/CD",
	"models": [
		{
			"modelName": "test-transformers-embedding",
			"framework": "transformers",
			"modelBlob": {
				"modelName": "Xenova/all-MiniLM-L6-v2",
				"taskType": "feature-extraction"
			}
		}
	]
}

# Deploy testing profile
npm run preload:testing

# Deploy benchmarking profile
npm run preload:benchmarking

# Deploy any profile by name
node scripts/preload-models.js --profile production

Available Profiles:

testing - One model per backend for CI/CD
benchmarking - Equivalence groups for performance comparison
development - Full test suite
production - Production-ready models only
minimal - Single model for quick tests
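
Before deploying a custom profile, it can help to sanity-check its shape. The sketch below checks only the fields visible in the `profiles/testing.json` example above (`name`, `models`, `modelName`, `framework`); `validateProfile` is a hypothetical helper, and the real preload script may enforce different or additional rules.

```javascript
// Hypothetical validator for a profile object shaped like profiles/testing.json.
// Returns a list of problems; an empty list means the basic shape looks fine.
function validateProfile(profile) {
	const errors = [];
	if (typeof profile?.name !== 'string' || profile.name === '') {
		errors.push('profile.name must be a non-empty string');
	}
	if (!Array.isArray(profile?.models) || profile.models.length === 0) {
		errors.push('profile.models must be a non-empty array');
		return errors;
	}
	profile.models.forEach((model, index) => {
		for (const field of ['modelName', 'framework']) {
			if (typeof model?.[field] !== 'string' || model[field] === '') {
				errors.push(`models[${index}].${field} must be a non-empty string`);
			}
		}
	});
	return errors;
}

const testingProfile = {
	name: 'testing',
	description: 'One model per backend for CI/CD',
	models: [
		{
			modelName: 'test-transformers-embedding',
			framework: 'transformers',
			modelBlob: { modelName: 'Xenova/all-MiniLM-L6-v2', taskType: 'feature-extraction' },
		},
	],
};

console.log(validateProfile(testingProfile)); // []
```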

Full Guide: Profile Testing Documentation

4. Performance Benchmarking

Compare equivalent models across backends to optimize deployment:

# Deploy benchmarking models
npm run preload:benchmarking

# Run benchmarks
./benchmark.sh --iterations 100

# View results
cat benchmark-*.json | jq '.winner'

Capabilities:

Cross-backend comparison (ONNX vs TensorFlow vs Transformers.js vs Ollama)
Equivalence group validation (compatible output dimensions)
Statistical metrics (avg, p50, p95, p99 latency, error rates)
Historical benchmark tracking
Automated test data generation
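
As an illustration of the latency metrics listed above, the following sketch computes avg, p50, p95, and p99 from a list of per-request latencies. It uses the nearest-rank percentile method, which is an assumption; `benchmark.sh` may calculate percentiles differently, and `summarizeLatencies` is a hypothetical name.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted, p) {
	const rank = Math.ceil((p / 100) * sorted.length);
	return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical summary of one benchmark run.
// latenciesMs: latencies of successful requests; errorCount: failed requests.
function summarizeLatencies(latenciesMs, errorCount = 0) {
	if (latenciesMs.length === 0) throw new Error('no latency samples');
	const sorted = [...latenciesMs].sort((a, b) => a - b);
	const total = sorted.reduce((sum, value) => sum + value, 0);
	return {
		avg: total / sorted.length,
		p50: percentile(sorted, 50),
		p95: percentile(sorted, 95),
		p99: percentile(sorted, 99),
		errorRate: errorCount / (sorted.length + errorCount),
	};
}

const samples = [12, 15, 11, 40, 13, 14, 12, 16, 13, 14];
console.log(summarizeLatencies(samples));
// { avg: 16, p50: 13, p95: 40, p99: 40, errorRate: 0 }
```

The single 40 ms outlier pulls the average above the median, which is why tail percentiles are reported alongside the mean.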

Full Guide: Benchmarking Documentation

5. Deployment Automation

Four focused scripts cover the complete deployment lifecycle:

Script	Purpose	Example
`deploy.sh`	Deploy code to Harper instance	`./deploy.sh --remote`
`verify.sh`	Verify deployed system	`./verify.sh --full`
`benchmark.sh`	Run performance benchmarks	`./benchmark.sh --iterations 500`
`demo.sh`	Interactive demonstrations	`./demo.sh --remote`

All scripts use .env configuration (no hardcoded values).

Full Guide: Scripts Reference

Project Structure

├── src/
│   ├── resources.js              # Harper resource definitions
│   ├── core/
│   │   ├── backends/
│   │   │   ├── Onnx.js          # ONNX Runtime backend
│   │   │   ├── TensorFlow.js    # TensorFlow.js backend
│   │   │   ├── Transformers.js  # Transformers.js backend
│   │   │   └── Ollama.js        # Ollama backend
│   │   ├── InferenceEngine.js   # Unified inference router
│   │   └── MonitoringBackend.js # Telemetry tracking
│   └── workers/
│       └── ModelFetchWorker.js  # Async model download worker
│
├── scripts/
│   ├── preload-models.js        # Profile-based model deployment
│   ├── cli/harper-ai.js         # CLI tool for model management
│   └── lib/
│       ├── model-fetch-client.js # Model Fetch API client
│       └── shell-utils.sh        # Shared shell utilities
│
├── tests/
│   ├── unit/                    # 11 unit test files (63 tests)
│   └── integration/             # 10 integration test files
│
├── profiles/                    # Model profile definitions
│   ├── minimal.json             # Minimal profile (1 model)
│   ├── testing.json             # Testing profile (4 models)
│   ├── benchmarking.json        # Benchmarking profile (equivalence groups)
│   ├── development.json         # Development profile (8 models)
│   └── production.json          # Production profile
│
├── deploy.sh                    # Deployment script
├── verify.sh                    # Verification script
├── benchmark.sh                 # Benchmarking script
└── demo.sh                      # Demo script

API Examples

Inference API

// Text embedding
const response = await fetch('http://localhost:9926/Predict', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({
		modelName: 'test-transformers-embedding',
		modelVersion: 'v1',
		inputs: { text: 'product search query' },
	}),
});

const { embedding } = await response.json();
console.log('Embedding:', embedding); // [0.123, -0.456, ...]
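
For repeated calls, the request above could be wrapped in a small helper that also surfaces HTTP failures. This is a sketch, not part of the project: `predict` is a hypothetical name, and it assumes the server signals errors with a non-2xx status code.

```javascript
// Hypothetical wrapper around POST /Predict.
// `fetchImpl` defaults to the global fetch (Node.js 18+) and can be replaced in tests.
async function predict({ modelName, modelVersion, inputs }, { baseUrl = 'http://localhost:9926', fetchImpl = fetch } = {}) {
	const response = await fetchImpl(`${baseUrl}/Predict`, {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({ modelName, modelVersion, inputs }),
	});
	if (!response.ok) {
		// Assumption: failed predictions come back with a non-2xx status
		throw new Error(`Predict failed with HTTP ${response.status}`);
	}
	return response.json();
}

// Usage sketch (requires a running server with the testing profile loaded):
// const result = await predict({
// 	modelName: 'test-transformers-embedding',
// 	modelVersion: 'v1',
// 	inputs: { text: 'product search query' },
// });
```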

Model Management

// List all deployed models
const models = await fetch('http://localhost:9926/Model/').then((r) => r.json());

// Get specific model
const model = await fetch('http://localhost:9926/Model/minilm:v1').then((r) => r.json());

// Delete model
await fetch('http://localhost:9926/Model/minilm:v1', { method: 'DELETE' });

Model Fetch API

// Start async fetch job
const job = await fetch('http://localhost:9926/FetchModel', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({
		source: 'huggingface',
		sourceReference: 'Xenova/all-MiniLM-L6-v2',
		modelName: 'minilm',
		modelVersion: 'v1',
		framework: 'transformers',
	}),
}).then((r) => r.json());

// Check job status
const status = await fetch(`http://localhost:9926/ModelFetchJobs?id=${job.jobId}`).then((r) => r.json());
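
Because fetch jobs run asynchronously, a caller usually polls until the job leaves the pending state. The sketch below shows one way to do that. `pollJob` is a hypothetical helper, and the shape of the job response (here a `status` field) is an assumption, so the completion check is passed in by the caller.

```javascript
// Hypothetical polling helper; adjust `isFinished` to the real job response shape.
async function pollJob(fetchStatus, isFinished, { intervalMs = 2000, maxAttempts = 30, sleep } = {}) {
	const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
	for (let attempt = 1; attempt <= maxAttempts; attempt++) {
		const job = await fetchStatus();
		if (isFinished(job)) return job;
		// Skip the final wait, since no further check follows it
		if (attempt < maxAttempts) await wait(intervalMs);
	}
	throw new Error(`Job did not finish after ${maxAttempts} checks`);
}

// Usage sketch against the endpoint shown above (assumes a `status` field):
// const finished = await pollJob(
// 	() => fetch(`http://localhost:9926/ModelFetchJobs?id=${job.jobId}`).then((r) => r.json()),
// 	(current) => current.status !== 'pending'
// );
```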

Scripts

# Development
npm run dev                      # Start Harper server
npm run preload                  # Load development models
npm run preload:testing          # Load testing profile
npm run preload:benchmarking     # Load benchmarking profile

# Testing
npm test                         # Run unit tests
npm run test:integration         # Run integration tests
npm run test:all                 # Run all tests

# Deployment
./deploy.sh                      # Deploy code to a Harper instance
./verify.sh --full               # Verify deployment
./benchmark.sh                   # Run benchmarks
./demo.sh                        # Interactive demo

# Linting & Formatting
npm run lint                     # Check code style
npm run lint:fix                 # Auto-fix issues
npm run format                   # Format all files
npm run format:check             # Check formatting

Requirements

System Requirements

Node.js 18.0.0 or higher
npm 9.0.0 or higher
Harper 4.0.0 or higher
Operating System: macOS or Linux

Optional Dependencies

Ollama - For LLM inference (chat and embeddings)
GPU - For accelerated ONNX inference (optional)

Installed Packages

onnxruntime-node@1.20.1 - ONNX Runtime
@xenova/transformers@2.17.2 - Transformers.js
@tensorflow/tfjs-node - TensorFlow.js (optional)
uuid@11.0.3 - ID generation
sharp@0.32.6 - Image processing (optional)

Documentation

Getting Started

Quick Start - Get running in 5 minutes
Deployment Guide - Local and remote deployment
Scripts Reference - deploy.sh, verify.sh, benchmark.sh, demo.sh

Features

Model Fetch System - Async model downloads (Hugging Face, HTTP, filesystem)
Benchmarking - Performance comparison across backends
Profile Testing - Profile-based model management
Model Metadata - Metadata conventions for benchmarking

Technical

ONNX Runtime Guide - ONNX backend details
Roadmap - Future plans and milestones
Contributing - How to contribute

Test Coverage

Unit Tests: 11 files, 63 tests (all passing)
Integration Tests: 10 files covering end-to-end workflows
Coverage: ~70% (unit + integration), 20% integration-only
Pre-commit Hooks: Lint, format, and test before every commit

npm test                 # Run unit tests (63 tests, ~2s)
npm run test:integration # Run integration tests (~30s)
npm run test:all         # Run all tests

Contributing

We welcome contributions! Here's how to get started:

Read the docs - Understand current capabilities
Check the roadmap - See ROADMAP.md for planned features
Follow conventions - ESLint + Prettier configured
Write tests - All code must have tests
Update docs - Keep documentation current

See CONTRIBUTING.md for detailed guidelines.

License

Apache License 2.0 - See LICENSE for details.

Support

Documentation: docs/
Issues: GitHub Issues
Harper Docs: https://docs.harperdb.io/

Acknowledgments

Built with:

HarperDB - Application and database platform
ONNX Runtime - Optimized inference
Transformers.js - Hugging Face models in JS
Ollama - Local LLMs
TensorFlow.js - Universal Sentence Encoder
