tag:github.com,2008:https://github.com/elbruno/ElBruno.LocalLLMs/releases
Release notes from ElBruno.LocalLLMs
2026-04-17T13:31:47Z
tag:github.com,2008:Repository/1184789679/v0.16.0
2026-04-17T13:32:00Z
v0.16.0 — BitNet Auto-Download & Native NuGet Packages
<h2>What's New</h2>
<h3>BitNet Auto-Download from HuggingFace</h3>
<ul>
<li>GGUF models now auto-download on first run — zero manual setup for model files</li>
<li><code>BitNetModelDownloader</code> with cache-first logic via <code>ElBruno.HuggingFace.Downloader</code></li>
<li><code>BitNetChatClient.CreateAsync()</code> factory with progress reporting</li>
<li><code>EnsureModelDownloaded</code> option (default: true)</li>
</ul>
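<p>Putting these together, a first run might look like the following sketch. Note this is illustrative: the options type name (<code>BitNetOptions</code>) and the shape of the progress callback are assumptions, not confirmed API; only <code>BitNetChatClient.CreateAsync</code> and <code>EnsureModelDownloaded</code> are named in these notes.</p>
<pre><code>// First run: downloads the default GGUF model from HuggingFace,
// then reuses the local cache on subsequent runs.
var client = await BitNetChatClient.CreateAsync(
    new BitNetOptions // hypothetical options type
    {
        EnsureModelDownloaded = true // default; set false to require a pre-downloaded model
    },
    progress: p => Console.WriteLine($"Downloading model: {p:P0}")); // assumed callback shape

var reply = await client.GetResponseAsync("Hello from BitNet!");
Console.WriteLine(reply);
</code></pre>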
<h3>Platform-Specific Native NuGet Packages</h3>
<ul>
<li><code>ElBruno.LocalLLMs.BitNet.Native.win-x64</code> — Windows x64 native library (llama.dll)</li>
<li><code>ElBruno.LocalLLMs.BitNet.Native.linux-x64</code> — Linux x64 native library (libllama.so)</li>
<li><code>ElBruno.LocalLLMs.BitNet.Native.osx-arm64</code> — macOS ARM64 native library (libllama.dylib)</li>
<li><code>NativeLibraryLoader</code> probes <code>runtimes/{rid}/native/</code> paths (NuGet convention)</li>
<li>Improved error messages with platform-specific NuGet package suggestions</li>
</ul>
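<p>In practice this means pairing the managed package with the native package that matches your runtime identifier, for example:</p>
<pre><code>dotnet add package ElBruno.LocalLLMs.BitNet
dotnet add package ElBruno.LocalLLMs.BitNet.Native.win-x64   # or linux-x64 / osx-arm64
</code></pre>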
<h3>CI/CD</h3>
<ul>
<li><code>build-bitnet-native.yml</code> — cross-platform bitnet.cpp build workflow (3 runners)</li>
<li><code>publish-bitnet-native.yml</code> — native NuGet publish via OIDC trusted publishing</li>
</ul>
<h3>Tests</h3>
<ul>
<li>58 new BitNet tests (NativeLibraryLoaderTests + NativePackageValidationTests)</li>
<li>Total: 229 BitNet tests passing</li>
</ul>
<hr>
<p><strong>Full Changelog:</strong> <a href="https://github.com/elbruno/ElBruno.LocalLLMs/compare/v0.15.0...v0.16.0"><code>v0.15.0...v0.16.0</code></a></p>
elbruno
tag:github.com,2008:Repository/1184789679/v0.15.0
2026-04-16T22:31:01Z
v0.15.0 — BitNet 1.58-bit Model Support
<h2>What's New in v0.15.0</h2>
<h3>New Package: ElBruno.LocalLLMs.BitNet</h3>
<p>Run Microsoft's BitNet 1.58-bit ternary models in .NET through <code>IChatClient</code>.</p>
<p><strong>Install:</strong></p>
<pre><code>dotnet add package ElBruno.LocalLLMs.BitNet
</code></pre>
<p><strong>Highlights:</strong></p>
<ul>
<li><code>BitNetChatClient</code> implementing <code>IChatClient</code> from Microsoft.Extensions.AI</li>
<li>Wraps bitnet.cpp (a llama.cpp fork with ternary kernels) via P/Invoke</li>
<li>Catalog of 5 models: BitNet 0.7B, 2B-4T (default), 3B, Falcon3 1B, Falcon3 3B</li>
<li>Models as small as 150 MB with excellent CPU performance</li>
<li>Streaming, DI registration, platform-specific native library loading</li>
<li>155 unit tests</li>
</ul>
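<p>A minimal sketch of the DI registration mentioned above. The extension method name <code>AddBitNetChatClient</code> is an assumption for illustration, not a confirmed API; check the package docs for the actual registration call.</p>
<pre><code>// Register BitNetChatClient as the app's IChatClient implementation,
// then resolve it and stream a response.
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddBitNetChatClient(); // hypothetical extension method

var app = builder.Build();
var chat = app.Services.GetRequiredService&lt;IChatClient&gt;();

await foreach (var update in chat.GetStreamingResponseAsync("Summarize BitNet in one line"))
    Console.Write(update);
</code></pre>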
<p><strong>Samples:</strong></p>
<ul>
<li>BitNetChat - basic chat completion</li>
<li>BitNetPerformance - benchmark BitNet vs ONNX models</li>
</ul>
<p><strong>Docs:</strong></p>
<ul>
<li>BitNet Guide: docs/bitnet-guide.md</li>
<li>Blog Post: docs/blog-bitnet-launch.md</li>
</ul>
github-actions[bot]
tag:github.com,2008:Repository/1184789679/v0.11.0
2026-04-04T16:12:23Z
v0.11.0 — RAG XML Docs + Comprehensive Tests
<h2>What's Changed</h2>
<h3>Fixed</h3>
<ul>
<li><strong>XML Documentation (Issue <a href="https://github.com/elbruno/ElBruno.LocalLLMs/issues/12">#12</a>):</strong> Added comprehensive XML doc comments to all 13 public types in ElBruno.LocalLLMs.Rag, eliminating all 116 CS1591 warnings</li>
</ul>
<h3>Added</h3>
<ul>
<li><strong>60 new unit tests (Issue <a href="https://github.com/elbruno/ElBruno.LocalLLMs/issues/11">#11</a>):</strong> Comprehensive test coverage for the RAG package
<ul>
<li>RagRecordTests.cs (27 tests) — record types construction, equality, immutability</li>
<li>SqliteDocumentStoreTests.cs (14 tests) — SQLite persistence layer</li>
<li>RagServiceExtensionsTests.cs (13 tests) — DI registration</li>
<li>LocalRagPipelineConstructorTests.cs (6 tests) — constructor validation</li>
</ul>
</li>
</ul>
<h3>Stats</h3>
<ul>
<li><strong>0 build warnings</strong> (was 116)</li>
<li><strong>813 total tests</strong> (718 xUnit + 95 MSTest), all pass</li>
<li>ElBruno.LocalLLMs: 0.10.0 → 0.11.0</li>
<li>ElBruno.LocalLLMs.Rag: 0.1.0 → 0.2.0</li>
</ul>
elbruno
tag:github.com,2008:Repository/1184789679/v0.10.0
2026-04-04T15:19:48Z
v0.10.0 — Zero-Cloud RAG Sample
<h2>What's New in v0.10.0</h2>
<h3>🔌 Zero-Cloud RAG Sample (Closes <a href="https://github.com/elbruno/ElBruno.LocalLLMs/issues/9">#9</a>)</h3>
<p>A complete offline RAG pipeline — no cloud APIs needed. Combines three ElBruno packages:</p>
<ul>
<li><strong><code>ElBruno.LocalEmbeddings</code></strong> — real ONNX-based vector embeddings (all-MiniLM model)</li>
<li><strong><code>ElBruno.LocalLLMs</code></strong> — local LLM inference via Phi-3.5-mini-instruct</li>
<li><strong><code>ElBruno.LocalLLMs.Rag</code></strong> — chunking, storage, and retrieval pipeline</li>
</ul>
<pre><code>dotnet run --project src/samples/ZeroCloudRag
</code></pre>
<p>The sample loads documents, chunks them, generates embeddings, indexes in an in-memory vector store, retrieves relevant context for a query, and streams a grounded answer from the local LLM. 11 steps, zero cloud.</p>
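<p>As a rough sketch of that flow (the method names below are illustrative only; see the sample source for the actual <code>LocalRagPipeline</code> API):</p>
<pre><code>// Index once: load documents -&gt; chunk -&gt; embed -&gt; store in the vector store.
await pipeline.IngestAsync("docs/"); // hypothetical method name

// Query: retrieve relevant chunks, then stream a grounded answer from the local LLM.
await foreach (var token in pipeline.AskStreamingAsync("What is BitNet?")) // hypothetical
    Console.Write(token);
</code></pre>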
<h3>🧪 Tests</h3>
<p>27 new tests across 3 files:</p>
<ul>
<li>10 MSTest unit tests for <code>LocalRagPipeline</code></li>
<li>4 MSTest E2E integration tests (gated behind <code>RUN_INTEGRATION_TESTS</code>)</li>
<li>13 xUnit tests for RAG record types</li>
</ul>
<p><strong>Total: 757 tests, all pass.</strong></p>
<h3>📚 Documentation</h3>
<ul>
<li><code>docs/rag-guide.md</code> — new Zero-Cloud RAG section with DI architecture</li>
<li><code>README.md</code> — ZeroCloudRag added to samples table</li>
<li><code>docs/supported-models.md</code> — RAG model recommendations</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/elbruno/ElBruno.LocalLLMs/compare/v0.9.0...v0.10.0"><code>v0.9.0...v0.10.0</code></a></p>
elbruno
tag:github.com,2008:Repository/1184789679/v0.9.0
2026-04-04T01:18:49Z
v0.9.0 — Qwen2.5-Coder-7B ONNX + OpenAI Server
<h2>What's New in v0.9.0</h2>
<h3>🧑‍💻 Qwen2.5-Coder-7B-Instruct — Code Assistant Model</h3>
<p>The first code-specialized model in the library! Converted to ONNX INT4 (6.3 GB) and published to <a href="https://huggingface.co/elbruno/Qwen2.5-Coder-7B-Instruct-onnx" rel="nofollow">elbruno/Qwen2.5-Coder-7B-Instruct-onnx</a> on HuggingFace.</p>
<ul>
<li>Same Qwen chat template as the rest of the Qwen2.5 family</li>
<li>Supports tool calling for agent-based coding workflows</li>
<li>Works out of the box — <code>HasNativeOnnx = true</code></li>
</ul>
<pre><code>var options = new LocalLLMsOptions { Model = KnownModels.Qwen25Coder_7BInstruct };
var client = new LocalChatClient(options);
var response = await client.GetResponseAsync("Write a quicksort in C#");
</code></pre>
<h3>🌐 OpenAI-Compatible HTTP Server Sample</h3>
<p>New <code>src/samples/OpenAiServer</code> — a minimal ASP.NET Core server that exposes local ONNX models via OpenAI-compatible REST endpoints.</p>
<ul>
<li><code>POST /v1/chat/completions</code> (streaming SSE + non-streaming)</li>
<li><code>GET /v1/models</code> (list available models)</li>
<li>Works with <strong>VS Code Copilot</strong> custom models, Continue, Cody, and any OpenAI SDK client</li>
<li>Includes <code>chatLanguageModels.json</code> config for VS Code integration</li>
</ul>
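<p>Once the server is running, any OpenAI-style client can call it. A raw request might look like the following; the port and model id shown here are illustrative (check the sample's launch settings and <code>GET /v1/models</code> for the real values):</p>
<pre><code>curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "phi-3.5-mini", "messages": [ { "role": "user", "content": "Hi" } ], "stream": true }'
</code></pre>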
<h3>📋 Blocked Models Documentation</h3>
<ul>
<li><strong>Codestral 22B v0.1</strong> — MNPL-0.1 license prohibits production use</li>
<li><strong>Devstral Small 2 (24B)</strong> — no ONNX conversion path (custom Tekken tokenizer, FP8 quantization)</li>
</ul>
<h3>✅ Quality</h3>
<ul>
<li><strong>705 tests pass</strong> (8 new for Qwen2.5-Coder model definition)</li>
<li><strong>0 warnings, 0 errors</strong> across entire solution</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/elbruno/ElBruno.LocalLLMs/compare/v0.8.0...v0.9.0"><code>v0.8.0...v0.9.0</code></a></p>
elbruno
tag:github.com,2008:Repository/1184789679/v0.8.0
2026-04-03T20:42:22Z
v0.8.0 — Gemma 4 model family support
<p>New features:</p>
<ul>
<li>4 Gemma 4 model definitions (E2B, E4B, 26B MoE, 31B Dense)</li>
<li>215+ new tests (model definitions, tool-calling, multilingual)</li>
<li>Dedicated ONNX conversion scripts for Gemma 4</li>
<li>Blog post announcing Gemma 4 support</li>
<li>Comprehensive documentation updates</li>
</ul>
<p>ONNX conversion status: Pending onnxruntime-genai runtime support<br>
for Gemma 4's PLE architecture.</p>
elbruno
tag:github.com,2008:Repository/1184789679/v0.7.2
2026-03-28T17:33:56Z
v0.7.2
<h2>What's Changed</h2>
<ul>
<li>feat: DX improvements — Issue <a href="https://github.com/elbruno/ElBruno.LocalLLMs/issues/7">#7</a> fix + exceptions + logging + diagnostics + builder (<a href="https://github.com/elbruno/ElBruno.LocalLLMs/issues/7">#7</a>) by <a href="https://github.com/elbruno">@elbruno</a> in <a href="https://github.com/elbruno/ElBruno.LocalLLMs/pull/8">#8</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/elbruno/ElBruno.LocalLLMs/compare/v0.7.0...v0.7.2"><code>v0.7.0...v0.7.2</code></a></p>
elbruno
tag:github.com,2008:Repository/1184789679/v0.7.1
2026-03-28T15:53:17Z
v0.7.1
<h2>What's Changed</h2>
<ul>
<li>fix: MaxSequenceLength reports effective runtime limit by <a href="https://github.com/elbruno">@elbruno</a> in <a href="https://github.com/elbruno/ElBruno.LocalLLMs/pull/6">#6</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/elbruno/ElBruno.LocalLLMs/compare/v0.6.0...v0.7.1"><code>v0.6.0...v0.7.1</code></a></p>
elbruno
tag:github.com,2008:Repository/1184789679/v0.7.0
2026-03-28T15:52:57Z
v0.7.0
<h2>What's Changed</h2>
<ul>
<li>fix: MaxSequenceLength reports effective runtime limit by <a href="https://github.com/elbruno">@elbruno</a> in <a href="https://github.com/elbruno/ElBruno.LocalLLMs/pull/6">#6</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/elbruno/ElBruno.LocalLLMs/compare/v0.6.0...v0.7.0"><code>v0.6.0...v0.7.0</code></a></p>
github-actions[bot]
tag:github.com,2008:Repository/1184789679/v0.6.1
2026-03-28T15:13:16Z
v0.6.1
<h2>What's Changed</h2>
<ul>
<li>feat: expose model metadata (context window, model name) via <code>LocalChatClient</code> by <a href="https://github.com/elbruno">@elbruno</a> in <a href="https://github.com/elbruno/ElBruno.LocalLLMs/pull/4">#4</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/elbruno">@elbruno</a> made their first contribution in <a href="https://github.com/elbruno/ElBruno.LocalLLMs/pull/4">#4</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/elbruno/ElBruno.LocalLLMs/compare/v0.5.0...v0.6.1"><code>v0.5.0...v0.6.1</code></a></p>
elbruno