Releases: elbruno/ElBruno.LocalLLMs
v0.16.0 — BitNet Auto-Download & Native NuGet Packages
What's New
BitNet Auto-Download from HuggingFace
- GGUF models now auto-download on first run — zero manual setup for model files
- `BitNetModelDownloader` with cache-first logic via `ElBruno.HuggingFace.Downloader`
- `BitNetChatClient.CreateAsync()` factory with progress reporting
- `EnsureModelDownloaded` option (default: true)
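The cache-first behavior can be sketched in plain C#. This is an illustrative stand-in, not the package's actual API: the file names, cache location, and fetch delegate below are all hypothetical; the real downloader pulls the GGUF from HuggingFace.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Sketch of cache-first model resolution: return the cached GGUF if present,
// otherwise fetch it once and cache it for every later run.
static async Task<string> EnsureModelAsync(
    string cacheDir, string fileName, Func<string, Task> fetchAsync)
{
    Directory.CreateDirectory(cacheDir);
    var path = Path.Combine(cacheDir, fileName);
    if (!File.Exists(path))          // cache miss: download once
        await fetchAsync(path);
    return path;                     // cache hit: zero network traffic
}

var dir = Path.Combine(Path.GetTempPath(), "bitnet-cache-demo");
var p1 = await EnsureModelAsync(dir, "model.gguf",
    path => File.WriteAllTextAsync(path, "fake-gguf"));   // simulated download
var p2 = await EnsureModelAsync(dir, "model.gguf",
    path => throw new InvalidOperationException("should not re-download"));
Console.WriteLine(p1 == p2 ? "cache hit" : "cache miss");
```

On the second call the file already exists, so the fetch delegate is never invoked, which is what "zero manual setup" relies on.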
Platform-Specific Native NuGet Packages
- `ElBruno.LocalLLMs.BitNet.Native.win-x64` — Windows x64 native library (llama.dll)
- `ElBruno.LocalLLMs.BitNet.Native.linux-x64` — Linux x64 native library (libllama.so)
- `ElBruno.LocalLLMs.BitNet.Native.osx-arm64` — macOS ARM64 native library (libllama.dylib)
- NativeLibraryLoader probes `runtimes/{rid}/native/` paths (NuGet convention)
- Improved error messages with platform-specific NuGet package suggestions
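The probing convention can be illustrated with the BCL's `NativeLibrary` API. The library names come from the package list above; the loader's actual internals are an assumption, so treat this as a sketch of the idea rather than the shipped implementation:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Sketch: build the NuGet runtimes/{rid}/native probe path and try to load
// the platform library from it, mirroring the NuGet runtime-asset convention.
var rid = RuntimeInformation.RuntimeIdentifier;            // e.g. "win-x64"
var libName = OperatingSystem.IsWindows() ? "llama.dll"
            : OperatingSystem.IsMacOS()   ? "libllama.dylib"
            : "libllama.so";
var probePath = Path.Combine(AppContext.BaseDirectory, "runtimes", rid, "native", libName);
Console.WriteLine($"probing {probePath}");

if (NativeLibrary.TryLoad(probePath, out var handle))
{
    Console.WriteLine("loaded");
    NativeLibrary.Free(handle);
}
else
{
    // Platform-specific hint, in the spirit of the improved error messages.
    Console.WriteLine($"native library not found; try: dotnet add package ElBruno.LocalLLMs.BitNet.Native.{rid}");
}
```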
CI/CD
- `build-bitnet-native.yml` — cross-platform bitnet.cpp build workflow (3 runners)
- `publish-bitnet-native.yml` — native NuGet publish via OIDC trusted publishing
Tests
- 58 new BitNet tests (NativeLibraryLoaderTests + NativePackageValidationTests)
- Total: 229 BitNet tests passing
Full Changelog: v0.15.0...v0.16.0
v0.15.0 — BitNet 1.58-bit Model Support
What's New in v0.15.0
New Package: ElBruno.LocalLLMs.BitNet
Run Microsoft's BitNet 1.58-bit ternary models in .NET through IChatClient.
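The "1.58-bit" label refers to ternary weights: each weight is −1, 0, or +1, which carries log2(3) ≈ 1.58 bits of information. A toy absmean-style quantizer shows the idea (a sketch only, not bitnet.cpp's kernel code):

```csharp
using System;
using System.Linq;

// Toy sketch of BitNet-style ternary ("1.58-bit") weight quantization:
// scale by the mean absolute weight, then round each weight to {-1, 0, +1}.
static (sbyte[] q, double scale) QuantizeTernary(double[] w)
{
    var scale = w.Select(Math.Abs).Average() + 1e-9;   // absmean scale
    var q = w.Select(x => (sbyte)Math.Clamp(Math.Round(x / scale), -1, 1)).ToArray();
    return (q, scale);
}

var (q, s) = QuantizeTernary(new[] { 0.9, -0.4, 0.02, -1.1 });
Console.WriteLine(string.Join(",", q));   // each weight is now -1, 0, or +1
```

Storing three states instead of 16-bit floats is why the smallest models fit in roughly 150 MB and run well on CPU.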
Install:
dotnet add package ElBruno.LocalLLMs.BitNet
Highlights:
- BitNetChatClient implementing IChatClient from Microsoft.Extensions.AI
- Wraps bitnet.cpp (llama.cpp fork with ternary kernels) via P/Invoke
- 5-model catalog: BitNet 0.7B, 2B-4T (default), 3B, Falcon3 1B, Falcon3 3B
- Models as small as 150 MB with excellent CPU performance
- Streaming, DI registration, platform-specific native lib loading
- 155 unit tests
Samples:
- BitNetChat - basic chat completion
- BitNetPerformance - benchmark BitNet vs ONNX models
Docs:
- BitNet Guide: docs/bitnet-guide.md
- Blog Post: docs/blog-bitnet-launch.md
v0.11.0 — RAG XML Docs + Comprehensive Tests
What's Changed
Fixed
- XML Documentation (Issue #12): Added comprehensive XML doc comments to all 13 public types in ElBruno.LocalLLMs.Rag, eliminating all 116 CS1591 warnings
Added
- 60 new unit tests (Issue #11): Comprehensive test coverage for the RAG package
- RagRecordTests.cs (27 tests) — record types construction, equality, immutability
- SqliteDocumentStoreTests.cs (14 tests) — SQLite persistence layer
- RagServiceExtensionsTests.cs (13 tests) — DI registration
- LocalRagPipelineConstructorTests.cs (6 tests) — constructor validation
Stats
- 0 build warnings (was 116)
- 813 total tests (718 xUnit + 95 MSTest), all pass
- ElBruno.LocalLLMs: 0.10.0 → 0.11.0
- ElBruno.LocalLLMs.Rag: 0.1.0 → 0.2.0
v0.10.0 — Zero-Cloud RAG Sample
What's New in v0.10.0
🔌 Zero-Cloud RAG Sample (Closes #9)
A complete offline RAG pipeline — no cloud APIs needed. Combines three ElBruno packages:
- `ElBruno.LocalEmbeddings` — real ONNX-based vector embeddings (all-MiniLM model)
- `ElBruno.LocalLLMs` — local LLM inference via Phi-3.5-mini-instruct
- `ElBruno.LocalLLMs.Rag` — chunking, storage, and retrieval pipeline
dotnet run --project src/samples/ZeroCloudRag

The sample loads documents, chunks them, generates embeddings, indexes them in an in-memory vector store, retrieves relevant context for a query, and streams a grounded answer from the local LLM. 11 steps, zero cloud.
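The chunk → embed → index → retrieve flow can be sketched in a few lines of plain C#. This is a runnable stand-in, not the sample's code: the real pipeline uses ONNX embeddings from ElBruno.LocalEmbeddings, while the hashed bag-of-words embedding here just keeps the sketch self-contained.

```csharp
using System;
using System.Linq;

// Toy embedding: hashed bag-of-words into a fixed-size vector.
static double[] Embed(string text)
{
    var v = new double[64];
    foreach (var tok in text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
        v[Math.Abs(tok.GetHashCode()) % 64] += 1;
    return v;
}

static double Cosine(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-9);
}

// 1) chunk documents, 2) embed and index in memory, 3) retrieve by similarity.
var chunks = new[] { "bitnet runs ternary models on cpu", "phi-3.5 is a small local llm", "sqlite stores rag records" };
var index = chunks.Select(c => (Text: c, Vec: Embed(c))).ToList();

var query = Embed("which model is a local llm");
var best = index.OrderByDescending(e => Cosine(query, e.Vec)).First().Text;
Console.WriteLine($"retrieved: {best}");   // this chunk would ground the LLM prompt
```

The retrieved chunk is what gets injected into the prompt so the local LLM's answer stays grounded in the indexed documents.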
🧪 Tests
27 new tests across 3 files:
- 10 MSTest unit tests for `LocalRagPipeline`
- 4 MSTest E2E integration tests (gated behind `RUN_INTEGRATION_TESTS`)
- 13 xUnit tests for RAG record types
Total: 757 tests, all pass.
📚 Documentation
- `docs/rag-guide.md` — new Zero-Cloud RAG section with DI architecture
- `README.md` — ZeroCloudRag added to samples table
- `docs/supported-models.md` — RAG model recommendations
Full Changelog: v0.9.0...v0.10.0
v0.9.0 — Qwen2.5-Coder-7B ONNX + OpenAI Server
What's New in v0.9.0
🧑‍💻 Qwen2.5-Coder-7B-Instruct — Code Assistant Model
The first code-specialized model in the library! Converted to ONNX INT4 (6.3 GB) and published to elbruno/Qwen2.5-Coder-7B-Instruct-onnx on HuggingFace.
- Same Qwen chat template as the rest of the Qwen2.5 family
- Supports tool calling for agent-based coding workflows
- Works out of the box — `HasNativeOnnx = true`

```csharp
var options = new LocalLLMsOptions { Model = KnownModels.Qwen25Coder_7BInstruct };
var client = new LocalChatClient(options);
var response = await client.GetResponseAsync("Write a quicksort in C#");
```

🌐 OpenAI-Compatible HTTP Server Sample
New src/samples/OpenAiServer — a minimal ASP.NET Core server that exposes local ONNX models via OpenAI-compatible REST endpoints.
- `POST /v1/chat/completions` (streaming SSE + non-streaming)
- `GET /v1/models` (list available models)
- Works with VS Code Copilot custom models, Continue, Cody, and any OpenAI SDK client
- Includes `chatLanguageModels.json` config for VS Code integration
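For reference, any OpenAI-style client talks to the sample by posting a standard chat-completions body to `POST /v1/chat/completions`. A minimal request might look like this (the model id is illustrative; use one returned by `GET /v1/models`):

```json
{
  "model": "phi-3.5-mini-instruct",
  "stream": true,
  "messages": [
    { "role": "user", "content": "Write a haiku about local inference." }
  ]
}
```

With `"stream": true` the server replies with SSE chunks, matching what OpenAI SDK clients and VS Code Copilot custom models expect.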
📋 Blocked Models Documentation
- Codestral 22B v0.1 — MNPL-0.1 license prohibits production use
- Devstral Small 2 (24B) — no ONNX conversion path (custom Tekken tokenizer, FP8 quantization)
✅ Quality
- 705 tests pass (8 new for Qwen2.5-Coder model definition)
- 0 warnings, 0 errors across entire solution
Full Changelog: v0.8.0...v0.9.0
v0.7.2
v0.7.1
v0.7.0
What's Changed
Full Changelog: v0.6.0...v0.7.0