TensorGate

Production-grade ASP.NET Core middleware for AI safety — a zero-allocation YARP reverse proxy with local ONNX inference for real-time LLM payload inspection, prompt injection detection, and semantic sanitization.

Overview

TensorGate is an out-of-process containerized sidecar that intercepts, evaluates, and sanitizes Large Language Model (LLM) traffic in real time. It sits between your application and upstream LLM providers as a YARP reverse proxy, running local INT8-quantized ONNX classification models to detect prompt injections and adversarial payloads within a strict sub-50ms latency budget on pure CPU hardware.

Key Design Principles

Zero-Allocation Pipeline — From raw HTTP bytes to ONNX tensor evaluation, the hot path avoids managed heap allocations using Span<T>, ArrayPool<T>, and Utf8JsonReader/Utf8JsonWriter to eliminate GC pauses under high concurrency.
CPU-Only Inference — A statically quantized INT8 all-MiniLM-L6-v2 model targets 8–12 ms classification latency using AVX-512 VNNI; at roughly 23 MB, the weights can stay resident in L3 cache on CPUs whose L3 is at least that large.
SSE Stream Preservation — Transparent forwarding of text/event-stream responses without buffering, maintaining real-time token streaming from upstream providers.
Lock-Free Hot Reload — Atomic reference-counted model swapping via RefCountDisposable pattern enables zero-downtime weight updates without race conditions or access violations.
NIST AI RMF Alignment — Architecture maps to the Govern, Map, Measure, and Manage functions of the NIST AI Risk Management Framework, as profiled for generative AI in NIST AI 600-1.
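
The lock-free hot reload principle can be illustrated with a small reference-counting sketch. This is a hand-rolled approximation of the idea, not the project's actual code and not the Rx `RefCountDisposable` type; the names `RefCounted<TModel>` and `ModelSlot<TModel>` are hypothetical, and `TModel` stands in for whatever object owns the loaded ONNX session.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical handle: counts in-flight users of one loaded model.
public sealed class RefCounted<TModel> where TModel : class, IDisposable
{
    private readonly TModel _model;
    private int _refs = 1; // 1 = the slot's own reference; 0 = disposed

    public RefCounted(TModel model) => _model = model;

    // CAS loop: take a reference unless the count has already hit zero.
    public bool TryAcquire(out TModel? model)
    {
        while (true)
        {
            int seen = Volatile.Read(ref _refs);
            if (seen == 0) { model = null; return false; }
            if (Interlocked.CompareExchange(ref _refs, seen + 1, seen) == seen)
            {
                model = _model;
                return true;
            }
        }
    }

    // Drop one reference; whoever releases last disposes the model.
    public void Release()
    {
        if (Interlocked.Decrement(ref _refs) == 0) _model.Dispose();
    }
}

// Hypothetical slot holding the currently published model.
public sealed class ModelSlot<TModel> where TModel : class, IDisposable
{
    private RefCounted<TModel> _current;

    public ModelSlot(TModel initial) => _current = new RefCounted<TModel>(initial);

    // Readers retry if they raced with a swap that retired the handle they read.
    public RefCounted<TModel> Acquire(out TModel model)
    {
        while (true)
        {
            RefCounted<TModel> handle = Volatile.Read(ref _current);
            if (handle.TryAcquire(out TModel? m)) { model = m!; return handle; }
        }
    }

    // Publish the new model, then drop the slot's reference on the old one.
    public void Swap(TModel next)
    {
        RefCounted<TModel> old = Interlocked.Exchange(ref _current, new RefCounted<TModel>(next));
        old.Release();
    }
}
```

A request would call `Acquire`, run inference, and call `Release` on the returned handle in a `finally` block. A model that has been swapped out is disposed only after its last in-flight request releases it, which is what prevents use-after-dispose during a reload.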

Architecture

┌──────────────┐     ┌──────────────────────────────────────────┐     ┌──────────────┐
│ Application  │────▶│            TensorGate Sidecar            │────▶│ LLM Provider │
│  (Internal)  │◀────│                                          │◀────│  (Upstream)  │
└──────────────┘     │  ┌────────┐  ┌───────────┐  ┌─────────┐  │     └──────────────┘
                     │  │  YARP  │─▶│ Tokenizer │─▶│  ONNX   │  │
                     │  │ Proxy  │  │ (Zero-    │  │ Runtime │  │
                     │  │        │  │  Alloc)   │  │ (INT8)  │  │
                     │  └────────┘  └───────────┘  └─────────┘  │
                     └──────────────────────────────────────────┘

Pipeline Flow

Network Interception — YARP captures outbound LLM API traffic via AddRequestTransform
Zero-Alloc JSON Parsing — Utf8JsonReader state machine extracts prompt fields directly from the byte stream
Tokenization — Microsoft.ML.Tokenizers (BertTokenizer/WordPiece) encodes over ReadOnlySpan<char> without intermediate string allocations
Tensor Binding — Buffers leased from ArrayPool<long> are pinned and wrapped as input tensors via OrtValue.CreateTensorValueFromMemory
Classification — Single forward pass through INT8 MiniLM yields Safe/Malicious probability in 8–12ms
Decision Gate — Malicious payloads are blocked synchronously; safe payloads stream through unmodified
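
The JSON parsing step above might look roughly like the following. This is a minimal sketch that assumes an OpenAI-style chat body in which prompt text lives in string properties named `content`; the `PromptExtractor` class and that field name are illustrative assumptions, not taken from the TensorGate source. It needs .NET 7 or later for `Utf8JsonReader.CopyString` and C# 11 for the `u8` literal.

```csharp
using System;
using System.Text.Json;

public static class PromptExtractor
{
    // Copies every string value whose property name is "content" into
    // `destination`, separated by '\n', and returns the number of chars written.
    // No strings are created: text is unescaped straight into the caller's buffer.
    // Throws if `destination` is too small for the extracted text.
    public static int ExtractContent(ReadOnlySpan<byte> utf8Json, Span<char> destination)
    {
        var reader = new Utf8JsonReader(utf8Json);
        int written = 0;

        while (reader.Read())
        {
            // Skip everything except properties named "content" (assumed field name).
            if (reader.TokenType != JsonTokenType.PropertyName ||
                !reader.ValueTextEquals("content"u8))
            {
                continue;
            }

            // Only plain string values are handled in this sketch.
            if (!reader.Read() || reader.TokenType != JsonTokenType.String)
            {
                continue;
            }

            if (written > 0) destination[written++] = '\n';
            written += reader.CopyString(destination.Slice(written));
        }

        return written;
    }
}
```

In a proxy, `destination` would typically be a slice of a buffer rented from `ArrayPool<char>`, and the resulting span would be passed on to the tokenizer. Bodies that arrive in several network segments would need the `ReadOnlySequence<byte>` constructor of `Utf8JsonReader` instead, which this sketch does not cover.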

Technology Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Reverse Proxy | YARP | Traffic interception and SSE stream forwarding |
| JSON Processing | `Utf8JsonReader` / `Utf8JsonWriter` | Zero-allocation payload parsing |
| Tokenization | Microsoft.ML.Tokenizers | Allocation-free BPE/WordPiece encoding |
| Inference | ONNX Runtime | INT8 quantized CPU inference |
| Model | all-MiniLM-L6-v2 | Sequence classification (22.7M params) |
| Concurrency | `Interlocked` / `Volatile` / CAS loops | Lock-free reference counting |
| Validation | HarmBench | Adversarial red-team evaluation |

Performance Targets

| Metric | Target | Mechanism |
| --- | --- | --- |
| End-to-end latency | < 50 ms | INT8 quantization + AVX-512 VNNI |
| Inference latency | 8–12 ms | Static quantization, L3 cache residency |
| Heap allocations | 0 bytes on hot path | `Span<T>`, `ArrayPool<T>`, `Utf8JsonReader` |
| Model memory | ~23 MB | INT8 weight compression |
| Model hot-reload | Zero downtime | Atomic `RefCountDisposable` double buffering |
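
To make the heap allocation target more concrete, here is a minimal sketch of leasing a tensor input buffer from `ArrayPool<long>` instead of allocating a new array per request. The `TensorBuffers` class and the sequence length of 128 are assumptions for illustration; the real length depends on how the model was exported.

```csharp
using System;
using System.Buffers;

public static class TensorBuffers
{
    // Hypothetical fixed sequence length (assumption, not from the project).
    public const int MaxSequenceLength = 128;

    // Rents a pooled buffer, copies the token ids into it, and zero-pads the rest.
    // Ids beyond MaxSequenceLength are truncated. The caller must hand the
    // array back with Return once inference has finished.
    public static long[] RentInputIds(ReadOnlySpan<int> tokenIds)
    {
        long[] buffer = ArrayPool<long>.Shared.Rent(MaxSequenceLength);
        int count = Math.Min(tokenIds.Length, MaxSequenceLength);

        for (int i = 0; i < count; i++)
        {
            buffer[i] = tokenIds[i];
        }

        // Rented arrays can be longer than requested and may hold stale data.
        Array.Clear(buffer, count, buffer.Length - count);
        return buffer;
    }

    public static void Return(long[] buffer) => ArrayPool<long>.Shared.Return(buffer);
}
```

Once the pool is warm, a rent/return pair normally reuses an existing array. One way to check the target in a test is to compare `GC.GetAllocatedBytesForCurrentThread()` before and after a request on the hot path.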

Project Status

Sprint 1 scaffolding is in progress: the solution builds, YARP proxies /v1/* to a configurable OpenAI-compatible upstream, and /health is exposed for orchestration probes.

cd ~/TensorGate
./scripts/setup-local-dev.sh
./scripts/smoke-yarp.sh   # mock upstream on :9090, proxy on :8080
dotnet run --project src/TensorGate.Proxy

This project is under active development following a structured sprint cadence:

| Sprint | Focus | Duration |
| --- | --- | --- |
| Sprint 1 | Foundational Scaffolding & Proxy Mechanics | Days 1–14 |
| Sprint 2 | Memory Optimization & Inference Engines | Days 15–28 |
| Sprint 3 | Concurrency, Hot-Swapping & Validation | Days 29–42 |

Track progress on the TensorGate Project Board.

Getting Started

Prerequisites: .NET 10.0 SDK (LTS), Docker (optional for sidecar deployment)

Language policy: TensorGate tracks the latest stable C# language version via central build settings.

# Clone the repository
git clone https://github.com/TensorGateLabs/TensorGate.git
cd TensorGate

# Build
dotnet build

# Run tests
dotnet test

# Run the sidecar
dotnet run --project src/TensorGate.Proxy

Contributing

Contributions are welcome. Please read the Contributing Guidelines before submitting a pull request.

Engineering runbooks in this repository:

Operating Pipeline — issue/PR flow and quality gates
Phase 2 Intelligent Orchestration — issue/PR workflow design
Agentic Shipping Research and Playbook — local-first shipping practices
Phase 2.3 Traceability and Evidence — requirement-to-merge automation
Agentic Development Research 2026 — maturity recommendations
Workflow Cost Optimization Policy — CI minute policy
Repository Organization — org board and automation

License

This project is licensed under the MIT License — see the LICENSE file for details.
