Releases: elbruno/ElBruno.LocalLLMs
v0.16.0 — BitNet Auto-Download & Native NuGet Packages
What's New
BitNet Auto-Download from HuggingFace
- GGUF models now auto-download on first run — zero manual setup for model files
- `BitNetModelDownloader` with cache-first logic via `ElBruno.HuggingFace.Downloader`
- `BitNetChatClient.CreateAsync()` factory with progress reporting
- `EnsureModelDownloaded` option (default: true)
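The cache-first behavior can be sketched in plain C#. This is an illustrative stand-in, not the package's actual API: the file names, cache location, and fetch delegate below are all hypothetical; the real downloader pulls the GGUF from HuggingFace.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Sketch of cache-first model resolution: return the cached GGUF if present,
// otherwise fetch it once and cache it for every later run.
static async Task<string> EnsureModelAsync(
    string cacheDir, string fileName, Func<string, Task> fetchAsync)
{
    Directory.CreateDirectory(cacheDir);
    var path = Path.Combine(cacheDir, fileName);
    if (!File.Exists(path))          // cache miss: download once
        await fetchAsync(path);
    return path;                     // cache hit: zero network traffic
}

var dir = Path.Combine(Path.GetTempPath(), "bitnet-cache-demo");
var p1 = await EnsureModelAsync(dir, "model.gguf",
    path => File.WriteAllTextAsync(path, "fake-gguf"));   // simulated download
var p2 = await EnsureModelAsync(dir, "model.gguf",
    path => throw new InvalidOperationException("should not re-download"));
Console.WriteLine(p1 == p2 ? "cache hit" : "cache miss");
```

On the second call the file already exists, so the fetch delegate is never invoked, which is what "zero manual setup" relies on.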
Platform-Specific Native NuGet Packages
- `ElBruno.LocalLLMs.BitNet.Native.win-x64` — Windows x64 native library (llama.dll)
- `ElBruno.LocalLLMs.BitNet.Native.linux-x64` — Linux x64 native library (libllama.so)
- `ElBruno.LocalLLMs.BitNet.Native.osx-arm64` — macOS ARM64 native library (libllama.dylib)
- NativeLibraryLoader probes `runtimes/{rid}/native/` paths (NuGet convention)
- Improved error messages with platform-specific NuGet package suggestions
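The probing convention can be illustrated with the BCL's `NativeLibrary` API. The library names come from the package list above; the loader's actual internals are an assumption, so treat this as a sketch of the idea rather than the shipped implementation:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Sketch: build the NuGet runtimes/{rid}/native probe path and try to load
// the platform library from it, mirroring the NuGet runtime-asset convention.
var rid = RuntimeInformation.RuntimeIdentifier;            // e.g. "win-x64"
var libName = OperatingSystem.IsWindows() ? "llama.dll"
            : OperatingSystem.IsMacOS()   ? "libllama.dylib"
            : "libllama.so";
var probePath = Path.Combine(AppContext.BaseDirectory, "runtimes", rid, "native", libName);
Console.WriteLine($"probing {probePath}");

if (NativeLibrary.TryLoad(probePath, out var handle))
{
    Console.WriteLine("loaded");
    NativeLibrary.Free(handle);
}
else
{
    // Platform-specific hint, in the spirit of the improved error messages.
    Console.WriteLine($"native library not found; try: dotnet add package ElBruno.LocalLLMs.BitNet.Native.{rid}");
}
```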
CI/CD
- `build-bitnet-native.yml` — cross-platform bitnet.cpp build workflow (3 runners)
- `publish-bitnet-native.yml` — native NuGet publish via OIDC trusted publishing
Tests
- 58 new BitNet tests (NativeLibraryLoaderTests + NativePackageValidationTests)
- Total: 229 BitNet tests passing
Full Changelog: v0.15.0...v0.16.0
v0.15.0 — BitNet 1.58-bit Model Support
What's New in v0.15.0
New Package: ElBruno.LocalLLMs.BitNet
Run Microsoft's BitNet 1.58-bit ternary models in .NET through IChatClient.
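The "1.58-bit" label refers to ternary weights: each weight is −1, 0, or +1, which carries log2(3) ≈ 1.58 bits of information. A toy absmean-style quantizer shows the idea (a sketch only, not bitnet.cpp's kernel code):

```csharp
using System;
using System.Linq;

// Toy sketch of BitNet-style ternary ("1.58-bit") weight quantization:
// scale by the mean absolute weight, then round each weight to {-1, 0, +1}.
static (sbyte[] q, double scale) QuantizeTernary(double[] w)
{
    var scale = w.Select(Math.Abs).Average() + 1e-9;   // absmean scale
    var q = w.Select(x => (sbyte)Math.Clamp(Math.Round(x / scale), -1, 1)).ToArray();
    return (q, scale);
}

var (q, s) = QuantizeTernary(new[] { 0.9, -0.4, 0.02, -1.1 });
Console.WriteLine(string.Join(",", q));   // each weight is now -1, 0, or +1
```

Storing three states instead of 16-bit floats is why the smallest models fit in roughly 150 MB and run well on CPU.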
Install:
dotnet add package ElBruno.LocalLLMs.BitNet
Highlights:
- BitNetChatClient implementing IChatClient from Microsoft.Extensions.AI
- Wraps bitnet.cpp (llama.cpp fork with ternary kernels) via P/Invoke
- 5-model catalog: BitNet 0.7B, 2B-4T (default), 3B, Falcon3 1B, Falcon3 3B
- Models as small as 150 MB with excellent CPU performance
- Streaming, DI registration, platform-specific native lib loading
- 155 unit tests
Samples:
- BitNetChat - basic chat completion
- BitNetPerformance - benchmark BitNet vs ONNX models
Docs:
- BitNet Guide: docs/bitnet-guide.md
- Blog Post: docs/blog-bitnet-launch.md
v0.11.0 — RAG XML Docs + Comprehensive Tests
What's Changed
Fixed
- XML Documentation (Issue #12): Added comprehensive XML doc comments to all 13 public types in ElBruno.LocalLLMs.Rag, eliminating all 116 CS1591 warnings
Added
- 60 new unit tests (Issue #11): Comprehensive test coverage for the RAG package
- RagRecordTests.cs (27 tests) — record types construction, equality, immutability
- SqliteDocumentStoreTests.cs (14 tests) — SQLite persistence layer
- RagServiceExtensionsTests.cs (13 tests) — DI registration
- LocalRagPipelineConstructorTests.cs (6 tests) — constructor validation
Stats
- 0 build warnings (was 116)
- 813 total tests (718 xUnit + 95 MSTest), all pass
- ElBruno.LocalLLMs: 0.10.0 → 0.11.0
- ElBruno.LocalLLMs.Rag: 0.1.0 → 0.2.0
v0.10.0 — Zero-Cloud RAG Sample
What's New in v0.10.0
🔌 Zero-Cloud RAG Sample (Closes #9)
A complete offline RAG pipeline — no cloud APIs needed. Combines three ElBruno packages:
- `ElBruno.LocalEmbeddings` — real ONNX-based vector embeddings (all-MiniLM model)
- `ElBruno.LocalLLMs` — local LLM inference via Phi-3.5-mini-instruct
- `ElBruno.LocalLLMs.Rag` — chunking, storage, and retrieval pipeline
dotnet run --project src/samples/ZeroCloudRag

The sample loads documents, chunks them, generates embeddings, indexes them in an in-memory vector store, retrieves relevant context for a query, and streams a grounded answer from the local LLM. 11 steps, zero cloud.
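The chunk → embed → index → retrieve flow can be sketched in a few lines of plain C#. This is a runnable stand-in, not the sample's code: the real pipeline uses ONNX embeddings from ElBruno.LocalEmbeddings, while the hashed bag-of-words embedding here just keeps the sketch self-contained.

```csharp
using System;
using System.Linq;

// Toy embedding: hashed bag-of-words into a fixed-size vector.
static double[] Embed(string text)
{
    var v = new double[64];
    foreach (var tok in text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
        v[Math.Abs(tok.GetHashCode()) % 64] += 1;
    return v;
}

static double Cosine(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-9);
}

// 1) chunk documents, 2) embed and index in memory, 3) retrieve by similarity.
var chunks = new[] { "bitnet runs ternary models on cpu", "phi-3.5 is a small local llm", "sqlite stores rag records" };
var index = chunks.Select(c => (Text: c, Vec: Embed(c))).ToList();

var query = Embed("which model is a local llm");
var best = index.OrderByDescending(e => Cosine(query, e.Vec)).First().Text;
Console.WriteLine($"retrieved: {best}");   // this chunk would ground the LLM prompt
```

The retrieved chunk is what gets injected into the prompt so the local LLM's answer stays grounded in the indexed documents.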
🧪 Tests
27 new tests across 3 files:
- 10 MSTest unit tests for `LocalRagPipeline`
- 4 MSTest E2E integration tests (gated behind `RUN_INTEGRATION_TESTS`)
- 13 xUnit tests for RAG record types
Total: 757 tests, all pass.
📚 Documentation
- `docs/rag-guide.md` — new Zero-Cloud RAG section with DI architecture
- `README.md` — ZeroCloudRag added to samples table
- `docs/supported-models.md` — RAG model recommendations
Full Changelog: v0.9.0...v0.10.0
v0.9.0 — Qwen2.5-Coder-7B ONNX + OpenAI Server
What's New in v0.9.0
🧑‍💻 Qwen2.5-Coder-7B-Instruct — Code Assistant Model
The first code-specialized model in the library! Converted to ONNX INT4 (6.3 GB) and published to elbruno/Qwen2.5-Coder-7B-Instruct-onnx on HuggingFace.
- Same Qwen chat template as the rest of the Qwen2.5 family
- Supports tool calling for agent-based coding workflows
- Works out of the box — `HasNativeOnnx = true`

```csharp
var options = new LocalLLMsOptions { Model = KnownModels.Qwen25Coder_7BInstruct };
var client = new LocalChatClient(options);
var response = await client.GetResponseAsync("Write a quicksort in C#");
```

🌐 OpenAI-Compatible HTTP Server Sample
New src/samples/OpenAiServer — a minimal ASP.NET Core server that exposes local ONNX models via OpenAI-compatible REST endpoints.
- `POST /v1/chat/completions` (streaming SSE + non-streaming)
- `GET /v1/models` (list available models)
- Works with VS Code Copilot custom models, Continue, Cody, and any OpenAI SDK client
- Includes `chatLanguageModels.json` config for VS Code integration
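For reference, any OpenAI-style client talks to the sample by posting a standard chat-completions body to `POST /v1/chat/completions`. A minimal request might look like this (the model id is illustrative; use one returned by `GET /v1/models`):

```json
{
  "model": "phi-3.5-mini-instruct",
  "stream": true,
  "messages": [
    { "role": "user", "content": "Write a haiku about local inference." }
  ]
}
```

With `"stream": true` the server replies with SSE chunks, matching what OpenAI SDK clients and VS Code Copilot custom models expect.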
📋 Blocked Models Documentation
- Codestral 22B v0.1 — MNPL-0.1 license prohibits production use
- Devstral Small 2 (24B) — no ONNX conversion path (custom Tekken tokenizer, FP8 quantization)
✅ Quality
- 705 tests pass (8 new for Qwen2.5-Coder model definition)
- 0 warnings, 0 errors across entire solution
Full Changelog: v0.8.0...v0.9.0
v0.7.2
v0.7.1
v0.7.0
What's Changed
Full Changelog: v0.6.0...v0.7.0