Releases: elbruno/ElBruno.LocalLLMs

v0.16.0 — BitNet Auto-Download & Native NuGet Packages

17 Apr 13:32

What's New

BitNet Auto-Download from HuggingFace

  • GGUF models now auto-download on first run — zero manual setup for model files
  • BitNetModelDownloader with cache-first logic via ElBruno.HuggingFace.Downloader
  • BitNetChatClient.CreateAsync() factory with progress reporting
  • EnsureModelDownloaded option (default: true)
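
The download flow above can be sketched roughly as follows. Only `BitNetChatClient.CreateAsync()` and `EnsureModelDownloaded` are named in these notes; the options type name and the progress parameter shown here are assumptions, not the confirmed API:

```csharp
using Microsoft.Extensions.AI;

// Hypothetical options shape: only EnsureModelDownloaded is confirmed above.
var client = await BitNetChatClient.CreateAsync(
    new BitNetChatClientOptions { EnsureModelDownloaded = true },  // true is the default
    progress: new Progress<double>(p => Console.WriteLine($"Downloading: {p:P0}")));

Console.WriteLine(await client.GetResponseAsync("Hello from BitNet!"));
```

On first run the GGUF model is fetched from HuggingFace; on later runs the cache-first logic skips the download.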

Platform-Specific Native NuGet Packages

  • ElBruno.LocalLLMs.BitNet.Native.win-x64 — Windows x64 native library (llama.dll)
  • ElBruno.LocalLLMs.BitNet.Native.linux-x64 — Linux x64 native library (libllama.so)
  • ElBruno.LocalLLMs.BitNet.Native.osx-arm64 — macOS ARM64 native library (libllama.dylib)
  • NativeLibraryLoader probes runtimes/{rid}/native/ paths (NuGet convention)
  • Improved error messages with platform-specific NuGet package suggestions
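
For context, the `runtimes/{rid}/native/` probing convention looks roughly like this (an illustrative sketch of the NuGet layout, not the actual NativeLibraryLoader code):

```csharp
using System.Runtime.InteropServices;

// Build the NuGet-convention probe path for the current platform.
static string ProbePath(string baseDir)
{
    string rid = RuntimeInformation.RuntimeIdentifier;  // e.g. "win-x64", "osx-arm64"
    string file = OperatingSystem.IsWindows() ? "llama.dll"
                : OperatingSystem.IsMacOS()   ? "libllama.dylib"
                : "libllama.so";
    return Path.Combine(baseDir, "runtimes", rid, "native", file);
}

Console.WriteLine(ProbePath(AppContext.BaseDirectory));
```

Installing the matching `ElBruno.LocalLLMs.BitNet.Native.{rid}` package places the library at that path automatically.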

CI/CD

  • build-bitnet-native.yml — cross-platform bitnet.cpp build workflow (3 runners)
  • publish-bitnet-native.yml — native NuGet publish via OIDC trusted publishing

Tests

  • 58 new BitNet tests (NativeLibraryLoaderTests + NativePackageValidationTests)
  • Total: 229 BitNet tests passing

Full Changelog: v0.15.0...v0.16.0

v0.15.0 — BitNet 1.58-bit Model Support

16 Apr 22:30
8ff6130

What's New in v0.15.0

New Package: ElBruno.LocalLLMs.BitNet

Run Microsoft's BitNet 1.58-bit ternary models in .NET through IChatClient.

Install:

dotnet add package ElBruno.LocalLLMs.BitNet

Highlights:

  • BitNetChatClient implementing IChatClient from Microsoft.Extensions.AI
  • Wraps bitnet.cpp (llama.cpp fork with ternary kernels) via P/Invoke
  • Catalog of 5 models: BitNet 0.7B, 2B-4T (default), 3B, Falcon3 1B, Falcon3 3B
  • Models as small as 150 MB with excellent CPU performance
  • Streaming, DI registration, platform-specific native lib loading
  • 155 unit tests
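
A minimal streaming sketch against the standard IChatClient abstraction. The parameterless CreateAsync call is an assumption; check the BitNet guide for the exact factory signature:

```csharp
using Microsoft.Extensions.AI;

IChatClient client = await BitNetChatClient.CreateAsync();

// Stream the answer token by token through the Microsoft.Extensions.AI abstraction.
await foreach (var update in client.GetStreamingResponseAsync("Explain 1.58-bit ternary weights"))
{
    Console.Write(update.Text);
}
```

Because BitNetChatClient implements IChatClient, it drops into any code already written against Microsoft.Extensions.AI.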

Samples:

  • BitNetChat - basic chat completion
  • BitNetPerformance - benchmark BitNet vs ONNX models

Docs:

  • BitNet Guide: docs/bitnet-guide.md
  • Blog Post: docs/blog-bitnet-launch.md

v0.11.0 — RAG XML Docs + Comprehensive Tests

04 Apr 15:55

What's Changed

Fixed

  • XML Documentation (Issue #12): Added comprehensive XML doc comments to all 13 public types in ElBruno.LocalLLMs.Rag, eliminating all 116 CS1591 warnings

Added

  • 60 new unit tests (Issue #11): Comprehensive test coverage for the RAG package
    • RagRecordTests.cs (27 tests) — record types construction, equality, immutability
    • SqliteDocumentStoreTests.cs (14 tests) — SQLite persistence layer
    • RagServiceExtensionsTests.cs (13 tests) — DI registration
    • LocalRagPipelineConstructorTests.cs (6 tests) — constructor validation

Stats

  • 0 build warnings (was 116)
  • 813 total tests (718 xUnit + 95 MSTest), all pass
  • ElBruno.LocalLLMs: 0.10.0 → 0.11.0
  • ElBruno.LocalLLMs.Rag: 0.1.0 → 0.2.0

v0.10.0 — Zero-Cloud RAG Sample

04 Apr 15:19
1153540

What's New in v0.10.0

🔌 Zero-Cloud RAG Sample (Closes #9)

A complete offline RAG pipeline — no cloud APIs needed. Combines three ElBruno packages:

  • ElBruno.LocalEmbeddings — real ONNX-based vector embeddings (all-MiniLM model)
  • ElBruno.LocalLLMs — local LLM inference via Phi-3.5-mini-instruct
  • ElBruno.LocalLLMs.Rag — chunking, storage, and retrieval pipeline

Run the sample:

dotnet run --project src/samples/ZeroCloudRag

The sample loads documents, chunks them, generates embeddings, indexes in an in-memory vector store, retrieves relevant context for a query, and streams a grounded answer from the local LLM. 11 steps, zero cloud.
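
In code, that flow looks roughly like the sketch below. LocalRagPipeline is the pipeline type exercised by the tests, but the registration extension and method names here (AddLocalRag, IngestAsync, QueryAsync) are assumptions based on the described steps, not the confirmed API; see docs/rag-guide.md for the real surface:

```csharp
using Microsoft.Extensions.DependencyInjection;

// AddLocalRag() is hypothetical; the real DI registration extension may differ.
var services = new ServiceCollection().AddLocalRag().BuildServiceProvider();
var pipeline = services.GetRequiredService<LocalRagPipeline>();

await pipeline.IngestAsync("docs/manual.md");   // load, chunk, embed, index (names assumed)

await foreach (var token in pipeline.QueryAsync("How do I configure logging?"))
    Console.Write(token);                       // grounded answer streamed from the local LLM
```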

🧪 Tests

27 new tests across 3 files:

  • 10 MSTest unit tests for LocalRagPipeline
  • 4 MSTest E2E integration tests (gated behind RUN_INTEGRATION_TESTS)
  • 13 xUnit tests for RAG record types

Total: 757 tests, all pass.

📚 Documentation

  • docs/rag-guide.md — new Zero-Cloud RAG section with DI architecture
  • README.md — ZeroCloudRag added to samples table
  • docs/supported-models.md — RAG model recommendations

Full Changelog: v0.9.0...v0.10.0

v0.9.0 — Qwen2.5-Coder-7B ONNX + OpenAI Server

04 Apr 01:18

What's New in v0.9.0

🧑‍💻 Qwen2.5-Coder-7B-Instruct — Code Assistant Model

The first code-specialized model in the library! Converted to ONNX INT4 (6.3 GB) and published to elbruno/Qwen2.5-Coder-7B-Instruct-onnx on HuggingFace.

  • Same Qwen chat template as the rest of the Qwen2.5 family
  • Supports tool calling for agent-based coding workflows
  • Works out of the box — HasNativeOnnx = true

Example:

var options = new LocalLLMsOptions { Model = KnownModels.Qwen25Coder_7BInstruct };
var client = new LocalChatClient(options);
var response = await client.GetResponseAsync("Write a quicksort in C#");

🌐 OpenAI-Compatible HTTP Server Sample

New src/samples/OpenAiServer — a minimal ASP.NET Core server that exposes local ONNX models via OpenAI-compatible REST endpoints.

  • POST /v1/chat/completions (streaming SSE + non-streaming)
  • GET /v1/models (list available models)
  • Works with VS Code Copilot custom models, Continue, Cody, and any OpenAI SDK client
  • Includes chatLanguageModels.json config for VS Code integration
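
Any OpenAI-style client can point at the server. For example, with the official OpenAI .NET SDK (the port and model id below are assumptions; check the sample's launch settings and GET /v1/models for the real values):

```csharp
using System.ClientModel;
using OpenAI;
using OpenAI.Chat;

var chat = new ChatClient(
    model: "phi-3.5-mini-instruct",                      // assumed local model id
    credential: new ApiKeyCredential("unused-locally"),  // local server ignores the key
    options: new OpenAIClientOptions { Endpoint = new Uri("http://localhost:5000/v1") });

// Non-streaming completion against the local POST /v1/chat/completions endpoint.
ChatCompletion completion = chat.CompleteChat("Say hello from a local model.");
Console.WriteLine(completion.Content[0].Text);
```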

📋 Blocked Models Documentation

  • Codestral 22B v0.1 — MNPL-0.1 license prohibits production use
  • Devstral Small 2 (24B) — no ONNX conversion path (custom Tekken tokenizer, FP8 quantization)

✅ Quality

  • 705 tests pass (8 new for Qwen2.5-Coder model definition)
  • 0 warnings, 0 errors across entire solution

Full Changelog: v0.8.0...v0.9.0

v0.7.2

28 Mar 17:33

What's Changed

  • feat: DX improvements — Issue #7 fix + exceptions + logging + diagnostics + builder (#7) by @elbruno in #8

Full Changelog: v0.7.0...v0.7.2

v0.7.1

28 Mar 15:53
17192f2

What's Changed

  • fix: MaxSequenceLength reports effective runtime limit by @elbruno in #6

Full Changelog: v0.6.0...v0.7.1

v0.7.0

28 Mar 15:52
17192f2

What's Changed

  • fix: MaxSequenceLength reports effective runtime limit by @elbruno in #6

Full Changelog: v0.6.0...v0.7.0

v0.6.1

28 Mar 15:13

What's Changed

  • feat: expose model metadata (context window, model name) via LocalChatClient by @elbruno in #4

Full Changelog: v0.5.0...v0.6.1

v0.6.0

28 Mar 15:12

What's Changed

  • feat: expose model metadata (context window, model name) via LocalChatClient by @elbruno in #4

Full Changelog: v0.5.0...v0.6.0