Skip to content

Tags: allozaur/llama.cpp

Tags

b6904

Toggle b6904's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: Volta tensor core support for MMF (ggml-org#16843)

* CUDA: Volta tensor core support for MMF

* more generic checks for hardware support

* Update ggml/src/ggml-cuda/mmf.cuh

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

b6820

Toggle b6820's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
webui: introduce OpenAI-compatible model selector in JSON payload (gg…

…ml-org#16562)

* webui: introduce OpenAI-compatible model selector in JSON payload

* webui: restore OpenAI-Compatible model source of truth and unify metadata capture

This change re-establishes a single, reliable source of truth for the active model:
fully aligned with the OpenAI-Compat API behavior

It introduces a unified metadata flow that captures the model field from both
streaming and non-streaming responses, wiring a new onModel callback through ChatService
The model name is now resolved directly from the API payload rather than relying on
server /props or UI assumptions

ChatStore records and persists the resolved model for each assistant message during
streaming, ensuring consistency across the UI and database
Type definitions for API and settings were also extended to include model metadata
and the onModel callback, completing the alignment with OpenAI-Compat semantics

* webui: address review feedback from allozaur

* webui: move model selector into ChatForm (idea by @allozaur)

* webui: make model selector more subtle and integrated into ChatForm

* webui: replaced the Flowbite selector with a native Svelte dropdown

* webui: add developer setting to toggle the chat model selector

* webui: address review feedback from allozaur

Normalized streamed model names during chat updates
by trimming input and removing directory components before saving
or persisting them, so the conversation UI shows only the filename

Forced model names within the chat form selector dropdown to render as
a single-line, truncated entry with a tooltip revealing the full name

* webui: toggle displayed model source for legacy vs OpenAI-Compat modes

When the selector is disabled, it falls back to the active server model name from /props

When the model selector is enabled, the displayed model comes from the message metadata
(the one explicitly selected and sent in the request)

* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormActions.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/constants/localstorage-keys.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/services/chat.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/services/chat.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: refactor model selector and persistence helpers

- Replace inline portal and event listeners with proper Svelte bindings
- Introduce 'persisted' store helper for localStorage sync without runes
- Extract 'normalizeModelName' utils + Vitest coverage
- Simplify ChatFormModelSelector structure and cleanup logic

Replaced the persisted store helper's use of '$state/$effect' runes with
a plain TS implementation to prevent orphaned effect runtime errors
outside component context

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: document normalizeModelName usage with inline examples

* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/stores/models.svelte.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/stores/models.svelte.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: extract ModelOption type into dedicated models.d.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: refine ChatMessageAssistant displayedModel source logic

* webui: stabilize dropdown, simplify model extraction, and init assistant model field

* chore: update webui static build

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm format, update webui static build

* webui: align sidebar trigger position, remove z-index glitch

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

b6807

Toggle b6807's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Prevent premature submission on IME input (ggml-org#16673)

* fix: Prevent premature submission on IME input

* chore: update webui static build

* refactor: Put IME completion checker in a helper function and add checking for `KeyboardEvent.eventKey === 229`

* chore: update webui static build

* chore: update webui static build

* chore: update webui static build

b6800

Toggle b6800's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
model : add Granite Hybrid types (ggml-org#16635)

add Granite 4 models mapping their embedding dimensions to the # of
parameters.

Information taken from https://huggingface.co/ibm-granite/granite-4.0-h-tiny

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

b6782

Toggle b6782's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
mtmd : support home-cooked Mistral Small Omni (ggml-org#14928)

b6779

Toggle b6779's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CANN: format code using .clang-format (ggml-org#15863)

This commit applies .clang-format rules to all source files under the
ggml-cann directory to ensure consistent coding style and readability.
The .clang-format option `SortIncludes: false` has been set to disable
automatic reordering of include directives.
No functional changes are introduced.

Co-authored-by: hipudding <huafengchun@gmail.com>

b6765

Toggle b6765's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
metal : avoid using Metal's gpuAddress property (ggml-org#16576)

* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check

b6749

Toggle b6749's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix: add remark plugin to render raw HTML as literal text (ggml-org#1…

…6505)

* fix: add remark plugin to render raw HTML as literal text

Implemented a missing MDAST stage to neutralize raw HTML like major LLM WebUIs
do ensuring consistent and safe Markdown rendering

Introduced 'remarkLiteralHtml', a plugin that converts raw HTML nodes in the
Markdown AST into plain-text equivalents while preserving indentation and
line breaks. This ensures consistent rendering and prevents unintended HTML
execution, without altering valid Markdown structure

Kept 'remarkRehype' in the pipeline since it performs the required conversion
from MDAST to HAST for KaTeX, syntax highlighting, and HTML serialization

Refined the link-enhancement logic to skip unnecessary DOM rewrites,
fixing a subtle bug where extra paragraphs were injected after the first
line due to full innerHTML reconstruction, and ensuring links open in new
tabs only when required

Final pipeline: remarkGfm -> remarkMath -> remarkBreaks -> remarkLiteralHtml
-> remarkRehype -> rehypeKatex -> rehypeHighlight -> rehypeStringify

* fix: address review feedback from allozaur

* chore: update webui build output

b6746

Toggle b6746's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CANN: Update several operators to support FP16 data format (ggml-org#…

…16251)

Many Ascend operators internally use FP16 precision for computation.
If input data is in FP32, it must first be cast to FP16 before
computation, and then cast back to FP32 after computation, which
introduces unnecessary cast operations. Moreover, FP16 computation
requires significantly less workload compared to FP32, leading to
noticeable efficiency improvements.

In this change, `get_rows`, `rms_norm`, and `flash_attn_ext` are extended
to support multiple data types. Validation on the Qwen2 0.5b model shows
correct accuracy and about 10% performance gain in concurrent scenarios.

Co-authored-by: noemotiovon <757486878@qq.com>

b6725

Toggle b6725's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
webui: updated the chat service to only include max_tokens in the req… (

ggml-org#16489)

* webui: updated the chat service to only include max_tokens in the request payload when the setting is explicitly provided, while still mapping explicit zero or null values to the infinite-token sentinel

* chore: update webui build output