39 changes: 36 additions & 3 deletions AGENTS.md
@@ -9,6 +9,8 @@ Current architecture:
- Global shortcut handling goes through `KGlobalAccel`
- Audio capture uses Qt Multimedia
- Transcription is in-process through vendored `whisper.cpp`
+- Native Mutterkey model packages are now the canonical model artifact; raw
+whisper.cpp-compatible `.bin` files remain only as a migration/import path
- The public runtime seam is streaming-first through app-owned chunks, events, and compatibility helpers
- Static backend support lives in `BackendCapabilities`, while runtime/device/model inspection lives in `RuntimeDiagnostics`
- Clipboard writes prefer `KSystemClipboard` with `QClipboard` fallback
@@ -35,6 +37,11 @@ This repository is intentionally kept minimal:
- `src/clipboardwriter.*`: clipboard integration, preferring KDE system clipboard support
- `src/audio/recordingnormalizer.*`: conversion to runtime-ready mono `float32` at `16 kHz`
- `src/transcription/audiochunker.*`: deterministic chunking of normalized audio for the streaming runtime path
+- `src/transcription/modelpackage.*`: product-owned manifest and validated package value types
+- `src/transcription/modelvalidator.*`: package integrity, compatibility, and bounds validation
+- `src/transcription/modelcatalog.*`: model artifact inspection and resolution
+- `src/transcription/rawwhisperprobe.*`: lightweight raw whisper.cpp header inspection used for migration compatibility
+- `src/transcription/rawwhisperimporter.*`: import path from raw Whisper `.bin` files into native Mutterkey packages
- `src/transcription/transcriptassembler.*`: final transcript assembly from streaming transcript events
- `src/transcription/transcriptioncompat.*`: compatibility wrapper that routes one-shot recordings through the streaming runtime seam
- `src/transcription/whispercpptranscriber.*`: in-process Whisper integration and whisper-specific engine construction
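A minimal standard-C++ sketch of how the streaming pieces listed above could fit together: deterministic chunking of normalized mono `float32` audio, a session that ingests chunks and emits events, final transcript assembly, and a one-shot compatibility wrapper on top. The names `RuntimeChunk`, `TranscriptEvent`, `StreamingSession`, `chunkAudio`, `assembleTranscript`, and `transcribeOneShot` are hypothetical stand-ins, not Mutterkey's actual API. The fixed chunk size and the rule that only final events reach the transcript are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical value types for illustration; not Mutterkey's real transcriptiontypes.h.
struct RuntimeChunk {
    std::size_t streamOffsetFrames = 0; // index of the chunk's first sample in the recording
    std::vector<float> samples;          // mono float32 at 16 kHz
};

struct TranscriptEvent {
    bool isFinal = false; // assumption: only final events contribute to the transcript
    std::string text;
};

// Hypothetical session seam: mutable decode state stays behind this interface.
class StreamingSession {
public:
    virtual ~StreamingSession() = default;
    virtual std::vector<TranscriptEvent> pushChunk(const RuntimeChunk &chunk) = 0;
    virtual std::vector<TranscriptEvent> finish() = 0;
};

// Deterministic chunking: the same input and chunk size always yield the same chunks.
std::vector<RuntimeChunk> chunkAudio(const std::vector<float> &samples, std::size_t chunkFrames)
{
    std::vector<RuntimeChunk> chunks;
    if (chunkFrames == 0) {
        return chunks; // invalid chunk size: produce nothing instead of looping forever
    }
    for (std::size_t offset = 0; offset < samples.size(); offset += chunkFrames) {
        const std::size_t end = std::min(samples.size(), offset + chunkFrames);
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(offset);
        const auto last = samples.begin() + static_cast<std::ptrdiff_t>(end);
        chunks.push_back(RuntimeChunk{offset, std::vector<float>(first, last)});
    }
    return chunks;
}

// Joins the text of final events with single spaces, skipping partial and empty events.
std::string assembleTranscript(const std::vector<TranscriptEvent> &events)
{
    std::string transcript;
    for (const TranscriptEvent &event : events) {
        if (!event.isFinal || event.text.empty()) {
            continue;
        }
        if (!transcript.empty()) {
            transcript += ' ';
        }
        transcript += event.text;
    }
    return transcript;
}

// Compatibility wrapper: routes a one-shot recording through the streaming seam.
std::string transcribeOneShot(StreamingSession &session, const std::vector<float> &samples, std::size_t chunkFrames)
{
    std::vector<TranscriptEvent> events;
    for (const RuntimeChunk &chunk : chunkAudio(samples, chunkFrames)) {
        const std::vector<TranscriptEvent> chunkEvents = session.pushChunk(chunk);
        events.insert(events.end(), chunkEvents.begin(), chunkEvents.end());
    }
    const std::vector<TranscriptEvent> tail = session.finish();
    events.insert(events.end(), tail.begin(), tail.end());
    return assembleTranscript(events);
}
```

Because chunking is deterministic, a fake session in a headless test sees the same chunk boundaries on every run, which keeps such tests free of vendored whisper linkage.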
@@ -112,7 +119,7 @@ QT_QPA_PLATFORM=offscreen "$BUILD_DIR/mutterkey" diagnose 1

Notes:

-- `once` mode requires microphone access and a valid Whisper model path
+- `once` mode requires microphone access and a valid model artifact path
- Real transcription verification needs a configured model in `~/.config/mutterkey/config.json` or a custom config path
- A small `Qt Test` + `CTest` suite exists for config loading, audio normalization, streaming-runtime helpers, and transcription-worker orchestration, including malformed JSON, wrong-type config inputs, recording-normalizer edge cases, and fake streaming backend behavior
- Repo-owned test cases are expected to carry `WHAT/HOW/WHY` comments near the start of each real test body; `scripts/check-test-commentary.sh` and `scripts/check-release-hygiene.sh` enforce that convention
@@ -130,6 +137,9 @@ Notes:
- Use `cmake --build "$BUILD_DIR" --target docs` when touching repo-owned public headers, Doxygen config, the Doxygen main page, or CI/docs wiring
- If install rules or licensing files change, confirm the temporary install contains the expected files under `share/licenses/mutterkey`
- If you add or change public methods in repo-owned headers, expect `cmake --build "$BUILD_DIR" --target docs` to fail until the new API is documented; treat that as part of the normal implementation loop, not follow-up polish
+- Newly added repo-owned public structs and free functions in public headers also
+need Doxygen comments immediately; the `docs` target treats undocumented new
+API surface as a real failure, not optional cleanup

## Tooling Best Practices

@@ -166,11 +176,17 @@ Notes:
- Avoid introducing optional backends, plugin systems, or cross-platform abstractions unless the task requires them
- Keep the audio path explicit: recorder output may not already match Whisper input requirements, so preserve normalization behavior
- Prefer product-owned naming such as runtime audio, chunks, events, diagnostics, and compatibility wrappers over backend-shaped naming when touching app-owned code
+- Prefer product-owned model terminology too: package, manifest, catalog, metadata,
+compatibility marker, and model artifact path are the primary nouns now;
+reserve backend-shaped wording for the whisper adapter or raw-file migration path
- Prefer narrow shared value types across subsystems; for example, consumers that only need captured audio should include `src/audio/recording.h`, not the full recorder class
- Keep JSON and other transport details at subsystem boundaries; prefer typed C++ snapshots/results once data crosses into app-owned control, tray, or service code
- Prefer dependency injection for tray-shell and control-surface code from the first implementation so headless Qt tests stay simple
- When preparing the transcription path for future runtime work, prefer app-owned engine/session seams and injected sessions over leaking concrete backend types into CLI, service, or worker orchestration. Keep immutable capability reporting on the engine side, keep runtime inspection data in `RuntimeDiagnostics`, and keep the session side focused on mutable decode state, warmup, chunk ingestion, finish, and cancellation
- Prefer product-owned runtime interfaces, model/session separation, and deterministic backend selection before adding new inference backends or widening cross-platform support
+- Keep model validation, metadata extraction, and compatibility checks app-owned.
+`whisper.cpp` should not be the first component that tells Mutterkey whether a
+model artifact is obviously malformed, incompatible, or oversized
- Keep compatibility shims explicit in naming. If a one-shot daemon/CLI path is implemented on top of the streaming runtime seam, name it as a compatibility wrapper rather than making the old one-shot shape look like the primary contract
- Keep backend-specific validation out of `src/config.*` when practical. Product config parsing should normalize and preserve user input, while backend support checks should live in the app-owned runtime layer near `src/transcription/*`
- Preserve the current product direction: embedded `whisper.cpp`, KDE-first, CLI/service-first
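A minimal sketch of the kind of app-owned pre-load checks the guidance above asks for, run before `whisper.cpp` ever sees the artifact. It assumes a hypothetical, already-parsed manifest with a compatibility marker, a package-relative asset path, and a recorded byte size. `ManifestView`, `validateManifest`, the `mutterkey.whisper-ggml` marker, and the 4 GiB bound are illustrative, not Mutterkey's actual schema.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, already-parsed view of a package's model.json.
struct ManifestView {
    std::string format;               // compatibility marker (assumed value below)
    std::string assetPath;            // weights path, relative to the package root
    std::uint64_t assetSizeBytes = 0; // size recorded at import time
};

// Assumed upper bound; a real limit would come from product policy.
constexpr std::uint64_t kMaxAssetBytes = 4ULL * 1024 * 1024 * 1024; // 4 GiB

// Collects every problem; an empty result means the artifact may be handed to the backend.
std::vector<std::string> validateManifest(const ManifestView &manifest, std::uint64_t actualAssetBytes)
{
    std::vector<std::string> problems;
    if (manifest.format != "mutterkey.whisper-ggml") {
        problems.push_back("unsupported compatibility marker: " + manifest.format);
    }
    // Reject absolute paths and parent-directory escapes.
    // Conservative: any ".." substring is rejected, including names like "a..b".
    if (manifest.assetPath.empty() || manifest.assetPath.front() == '/' ||
        manifest.assetPath.find("..") != std::string::npos) {
        problems.push_back("asset path must stay inside the package: " + manifest.assetPath);
    }
    if (manifest.assetSizeBytes == 0 || manifest.assetSizeBytes > kMaxAssetBytes) {
        problems.push_back("declared asset size is out of bounds");
    }
    if (actualAssetBytes != manifest.assetSizeBytes) {
        problems.push_back("asset size on disk does not match the manifest");
    }
    return problems;
}
```

Returning every problem instead of stopping at the first one lets inspection tooling report all issues with an artifact in a single pass.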
@@ -199,6 +215,9 @@ Apply the C++ Core Guidelines selectively and pragmatically. For this repo, the
- `scripts/update-whisper.sh` requires a clean Git work tree before it will fetch or run subtree operations
- Treat `third_party/whisper.cpp` as subtree-managed vendor content and update it through the helper script rather than manual directory replacement
- Prefer changing app-side integration code before patching vendored dependency code
+- Prefer resolving model-package, metadata, and import work entirely in app-owned
+code. Raw whisper.cpp `.bin` support is now a compatibility/import concern, not
+the canonical product contract
- Prefer keeping fake runtime tests and app-owned helpers free of vendored whisper linkage unless the test is specifically about the whisper adapter or engine factory
- Prefer fixing vendored target metadata from the top-level CMake when the issue is Mutterkey packaging or warning noise, instead of patching upstream vendored files directly
- If you must modify vendored code, document why in the final response and record the deviation in `third_party/whisper.cpp.UPSTREAM.md`
@@ -209,6 +228,9 @@ Apply the C++ Core Guidelines selectively and pragmatically. For this repo, the
- Repo-owned source is MIT-licensed in `LICENSE`
- Third-party licensing and provenance notes live in `THIRD_PARTY_NOTICES.md`
- `whisper.cpp` model files are not bundled; do not add model binaries to the repository
+- Native Mutterkey model packages also must not be committed to the repository;
+if a release needs to ship one, include it only in the release artifact or as a
+separate release asset outside Git
- Do not introduce machine-specific home-directory paths, absolute local Markdown links, or generated build artifacts into tracked files
- If a task changes install layout or shipped assets, keep the CMake install rules and license installs aligned with the new behavior
- The installed shared-library payload is runtime-focused; do not start installing vendored upstream public headers unless the package contract intentionally changes
@@ -232,15 +254,24 @@ Default config path:
Typical model location:

```text
-~/.local/share/mutterkey/models/ggml-base.en.bin
+~/.local/share/mutterkey/models/<package-id>
```

+Current `transcriber.model_path` semantics:
+
+- a package directory is the canonical target
+- a path to a package's `model.json` manifest is also accepted
+- raw whisper.cpp-compatible `.bin` files are accepted only as a migration
+compatibility path
+
## Agent Workflow

- Read `README.md` first, especially `Overview`, `Quick Start`, `Run As Service`, and `Development`, then read the touched source files before editing
- Prefer targeted changes over speculative cleanup
- If a change grows daemon, tray, or control-plane behavior, prefer extracting or extending repo-owned libraries under `src/app/`, `src/control/`, or other focused modules instead of piling more orchestration into `src/main.cpp`
- Update `README.md` and `config.example.json` when behavior or setup changes
+- Update `RELEASE_CHECKLIST.md` too when release-facing model packaging, shipped
+assets, or release-bundle guidance changes
- Update `contrib/mutterkey.service` and `contrib/org.mutterkey.mutterkey.desktop` when service/desktop behavior changes
- Update `LICENSE`, `THIRD_PARTY_NOTICES.md`, CMake install rules, and `third_party/whisper.cpp.UPSTREAM.md` when packaging, licensing, or vendored dependency behavior changes
- Keep `README.md`, `AGENTS.md`, and any relevant local skills aligned with the current `scripts/update-whisper.sh` workflow when the vendor-update process changes
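A minimal sketch of how an app-owned resolver could apply the `transcriber.model_path` rules above with `std::filesystem`, checking the package forms first and treating a raw `.bin` file as the migration fallback. It assumes the manifest is always named `model.json` and that raw files are recognized by their `.bin` extension. `ModelArtifactKind`, `ResolvedModel`, and `resolveModelPath` are hypothetical, not the actual `modelcatalog` API.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result types for illustration.
enum class ModelArtifactKind { Package, RawWhisperCompat, Invalid };

struct ResolvedModel {
    ModelArtifactKind kind = ModelArtifactKind::Invalid;
    fs::path manifestPath; // set for packages
    fs::path rawPath;      // set for the raw migration path
    std::string error;     // set when kind == Invalid
};

ResolvedModel resolveModelPath(const fs::path &configured)
{
    ResolvedModel result;
    std::error_code ec;
    if (!fs::exists(configured, ec)) {
        result.error = "Model artifact not found: " + configured.string();
        return result;
    }
    // Canonical case: a package directory; a direct path to its model.json is also accepted.
    fs::path manifest;
    if (fs::is_directory(configured, ec)) {
        manifest = configured / "model.json";
    } else if (configured.filename() == fs::path("model.json")) {
        manifest = configured;
    }
    if (!manifest.empty()) {
        if (!std::ifstream(manifest).good()) {
            result.error = "Package manifest is missing or unreadable: " + manifest.string();
            return result;
        }
        result.kind = ModelArtifactKind::Package;
        result.manifestPath = manifest;
        return result;
    }
    // Migration compatibility only: a raw whisper.cpp-compatible .bin file.
    if (fs::is_regular_file(configured, ec) && configured.extension() == ".bin") {
        result.kind = ModelArtifactKind::RawWhisperCompat;
        result.rawPath = configured;
        return result;
    }
    result.error = "Unsupported model artifact: " + configured.string();
    return result;
}
```

Checking the package forms before the `.bin` branch keeps the raw file an explicit fallback, matching its migration-only status.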
@@ -262,7 +293,9 @@ Typical model location:
- Prefer the `lint` target for a full pre-handoff analyzer pass, and use the individual analyzer targets when iterating on one class of warnings
- Run `bash scripts/run-valgrind.sh "$BUILD_DIR"` before handoff when the task is specifically about memory, ownership, lifetime, shutdown, or release hardening
- Run `bash scripts/check-release-hygiene.sh` before handoff when the task touches publication-facing files or repository metadata
-- Remember that the release-hygiene script now also enforces test commentary coverage, so changes to test structure or helper scripts may need both test updates and commentary updates
+- Remember that the release-hygiene script now also enforces test commentary
+coverage and rejects tracked `.bin` / `.gguf` artifacts, so release-facing or
+helper-script changes may need both commentary updates and binary-artifact policy checks
- If `QT_QPA_PLATFORM=offscreen "$BUILD_DIR/mutterkey" diagnose 1` fails in a headless environment after model loading or during KDE/session-dependent startup, note that limitation explicitly rather than assuming the runtime seam or docs-only change regressed behavior
- A headless `diagnose 1` failure after whisper model loading still does not necessarily indicate a streaming-runtime regression; separate runtime-contract changes from KDE/session or headless-environment limits
- Do not leave generated artifacts in the repository tree at the end of the task
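The release-hygiene script enforces the no-tracked-`.bin`/`.gguf` rule in shell. The same policy restated as a small C++ filter over a list of tracked paths (for example, what `git ls-files` prints), to make the matching rule explicit. `findForbiddenArtifacts` is a hypothetical helper, and the real script's matching may differ.

```cpp
#include <string>
#include <vector>

// True when `path` ends with `suffix`.
bool endsWith(const std::string &path, const std::string &suffix)
{
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns every tracked path that violates the "no model binaries in Git" policy.
std::vector<std::string> findForbiddenArtifacts(const std::vector<std::string> &trackedPaths)
{
    static const std::vector<std::string> forbiddenSuffixes = {".bin", ".gguf"};
    std::vector<std::string> offenders;
    for (const std::string &path : trackedPaths) {
        for (const std::string &suffix : forbiddenSuffixes) {
            if (endsWith(path, suffix)) {
                offenders.push_back(path);
                break; // report each offending path once
            }
        }
    }
    return offenders;
}
```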
10 changes: 10 additions & 0 deletions CMakeLists.txt
@@ -47,6 +47,16 @@ set(MUTTERKEY_CORE_SOURCES
src/transcription/transcriptionengine.h
src/transcription/audiochunker.cpp
src/transcription/audiochunker.h
+src/transcription/modelcatalog.cpp
+src/transcription/modelcatalog.h
+src/transcription/modelpackage.cpp
+src/transcription/modelpackage.h
+src/transcription/modelvalidator.cpp
+src/transcription/modelvalidator.h
+src/transcription/rawwhisperimporter.cpp
+src/transcription/rawwhisperimporter.h
+src/transcription/rawwhisperprobe.cpp
+src/transcription/rawwhisperprobe.h
src/transcription/transcriptassembler.cpp
src/transcription/transcriptassembler.h
src/transcription/transcriptioncompat.cpp
51 changes: 39 additions & 12 deletions README.md
@@ -68,7 +68,7 @@ Build requirements:

Runtime requirements:

-1. a local Whisper model file
+1. a local Mutterkey model package, or a raw Whisper `.bin` file for migration compatibility
2. a config file at `~/.config/mutterkey/config.json` or a custom `--config` path

Optional developer tooling:
@@ -81,9 +81,9 @@ Optional developer tooling:
- `valgrind`
- `libc6-dbg` on Debian-family systems so Valgrind Memcheck can start cleanly

-The repository vendors `whisper.cpp`, but it does not bundle Whisper model
-files. Any model file you download separately may be subject to its own license
-or usage terms.
+The repository vendors `whisper.cpp`, but it does not bundle speech model
+artifacts. Any model file you download separately may be subject to its own
+license or usage terms.

If CMake fails before compilation starts, the most common cause is missing Qt 6
development packages for `Core`, `Gui`, `Multimedia`, or KDE Frameworks
@@ -165,9 +165,30 @@ Notes:
- `MUTTERKEY_ENABLE_WHISPER_BLAS=ON` improves CPU inference speed rather than enabling GPU execution
- these options are forwarded to the vendored `whisper.cpp` / `ggml` build and install any resulting backend libraries alongside Mutterkey

-### 2. Put a Whisper model on disk
+### 2. Put a model on disk

-Example location:
+Preferred path:

+1. place a raw Whisper `.bin` file somewhere temporary
+2. import it into a native Mutterkey package:
+
+```bash
+~/.local/bin/mutterkey model import /path/to/ggml-base.en.bin
+```
+
+This creates a package directory under:
+
+```text
+~/.local/share/mutterkey/models/<package-id>/
+```
+
+You can inspect a package or a legacy raw file with:
+
+```bash
+~/.local/bin/mutterkey model inspect /path/to/ggml-base.en.bin
+```
+
+Legacy compatibility path:
+
```text
~/.local/share/mutterkey/models/ggml-base.en.bin
@@ -176,7 +197,7 @@ Example location:
### 3. Create the config file

```bash
-mutterkey config init --model-path ~/.local/share/mutterkey/models/ggml-base.en.bin
+mutterkey config init --model-path ~/.local/share/mutterkey/models/<package-id>
```

`mutterkey config init` writes the Linux config file to:
@@ -213,7 +234,7 @@ Minimal example:
"sequence": "F8"
},
"transcriber": {
-"model_path": "/absolute/path/to/ggml-base.en.bin",
+"model_path": "/absolute/path/to/mutterkey-model-package",
"language": "en",
"translate": false,
"threads": 0,
@@ -228,6 +249,7 @@ Config notes:

- `transcriber.threads: 0` means auto-detect based on the local machine
- `transcriber.language` accepts a Whisper language code such as `en` or `fi`, or `auto` for language detection
+- `transcriber.model_path` may point to a native Mutterkey package directory, a `model.json` manifest, or a legacy raw Whisper `.bin` file
- invalid numeric values fall back to safe defaults and log a warning
- invalid `transcriber.language` values fall back to the default and log a warning
- empty `shortcut.sequence` or `transcriber.model_path` values fall back to defaults and log a warning
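A minimal sketch of the config fallback behavior described above, applied to `transcriber.threads`. It assumes that negative values are the invalid case and that their safe default is auto-detect, and it clamps an unknown core count to one thread. `normalizeThreadCount` and the warning list are hypothetical, not Mutterkey's config API.

```cpp
#include <string>
#include <thread>
#include <vector>

// Resolves the configured thread count to the value handed to the runtime.
// 0 means auto-detect; negative values are treated as invalid and fall back to auto-detect.
int normalizeThreadCount(int configured, unsigned detectedCores, std::vector<std::string> &warnings)
{
    if (configured < 0) {
        warnings.push_back("transcriber.threads must be >= 0; falling back to auto-detect");
        configured = 0;
    }
    if (configured > 0) {
        return configured;
    }
    // std::thread::hardware_concurrency() may report 0 when the count is unknown.
    return detectedCores > 0 ? static_cast<int>(detectedCores) : 1;
}

// Production entry point in this sketch: uses the machine's reported core count.
int normalizeThreadCount(int configured, std::vector<std::string> &warnings)
{
    return normalizeThreadCount(configured, std::thread::hardware_concurrency(), warnings);
}
```

Passing the detected core count as a parameter keeps the three-argument overload deterministic under test.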
@@ -306,7 +328,8 @@ installed setup looks like:
Useful config commands:

```bash
-~/.local/bin/mutterkey config init --model-path ~/.local/share/mutterkey/models/ggml-base.en.bin
+~/.local/bin/mutterkey config init --model-path ~/.local/share/mutterkey/models/<package-id>
+~/.local/bin/mutterkey model inspect ~/.local/share/mutterkey/models/<package-id>
~/.local/bin/mutterkey config set shortcut.sequence Meta+F8
~/.local/bin/mutterkey config set transcriber.language fi
```
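The `<package-id>` directories these commands point at come from `mutterkey model import`. A minimal sketch of what an import step could do, using the package layout the release checklist requires (`model.json` plus `assets/model.bin`). The manifest fields, the `mutterkey.whisper-ggml` marker, and `importRawWhisper` are hypothetical; a real importer would also validate the header, escape JSON strings, and record provenance metadata.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Copies a raw whisper.cpp-compatible file into a new package directory:
//   <packageDir>/model.json
//   <packageDir>/assets/model.bin
// Returns an empty string on success, otherwise an error message.
std::string importRawWhisper(const fs::path &rawFile, const fs::path &packageDir)
{
    std::error_code ec;
    if (!fs::is_regular_file(rawFile, ec)) {
        return "raw model not found: " + rawFile.string();
    }
    if (fs::exists(packageDir, ec)) {
        return "refusing to overwrite existing package: " + packageDir.string();
    }
    fs::create_directories(packageDir / "assets", ec);
    if (ec) {
        return "cannot create package directory: " + ec.message();
    }
    const fs::path asset = packageDir / "assets" / "model.bin";
    fs::copy_file(rawFile, asset, ec);
    if (ec) {
        return "cannot copy model asset: " + ec.message();
    }
    const std::uintmax_t size = fs::file_size(asset, ec);
    if (ec) {
        return "cannot stat model asset: " + ec.message();
    }
    // Hypothetical manifest shape; the file name is written without JSON escaping in this sketch.
    std::ofstream manifest(packageDir / "model.json");
    manifest << "{\n"
             << "  \"format\": \"mutterkey.whisper-ggml\",\n"
             << "  \"asset\": \"assets/model.bin\",\n"
             << "  \"asset_size_bytes\": " << size << ",\n"
             << "  \"source_file\": \"" << rawFile.filename().string() << "\"\n"
             << "}\n";
    manifest.close(); // close() sets failbit if the final flush fails
    return manifest ? std::string() : std::string("cannot write model.json");
}
```

Refusing to write into an existing directory keeps a repeated import from silently replacing a package that a config may already point at.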
@@ -329,10 +352,9 @@ journalctl --user -u mutterkey.service -f

Common failures:

-`Embedded Whisper model not found: ...`
+`Model artifact not found: ...`

-- the embedded backend is active
-- the configured model path does not exist
+- the configured package path, manifest path, or raw compatibility artifact does not exist
- fix `transcriber.model_path`

`Recorder returned no audio`
@@ -375,6 +397,11 @@ Repository layout:
- `src/transcription/audiochunker.*`: fixed-size normalized streaming chunk generation
- `src/transcription/transcriptassembler.*`: final transcript assembly from streaming events
- `src/transcription/transcriptioncompat.*`: compatibility wrapper from one-shot recordings to the streaming runtime path
+- `src/transcription/modelpackage.*`: product-owned manifest and validated package value types
+- `src/transcription/modelvalidator.*`: package integrity and compatibility validation
+- `src/transcription/modelcatalog.*`: model artifact inspection and resolution
+- `src/transcription/rawwhisperprobe.*`: lightweight raw Whisper header inspection
+- `src/transcription/rawwhisperimporter.*`: migration path from raw Whisper files to native packages
- `src/transcription/whispercpptranscriber.*`: embedded Whisper integration behind the app-owned runtime seam
- `src/transcription/transcriptionworker.*`: worker object on a dedicated `QThread`
- `src/transcription/transcriptiontypes.h`: runtime diagnostics, normalized-audio, chunk, event, and error value types
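`rawwhisperprobe` is described as lightweight raw header inspection. A minimal sketch of such a probe, assuming the legacy whisper.cpp `ggml` file layout: a little-endian `uint32` magic `0x67676d6c` ("ggml") followed by `int32` hyperparameters in the order `n_vocab`, `n_audio_ctx`, `n_audio_state`, `n_audio_head`, `n_audio_layer`, `n_text_ctx`, `n_text_state`, `n_text_head`, `n_text_layer`, `n_mels`, `ftype`. Check that layout against the vendored loader before relying on it; `RawWhisperHeader` and `probeRawWhisper` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>

// Assumed legacy whisper.cpp ggml magic (0x67676d6c, "ggml"), stored little-endian.
constexpr std::uint32_t kGgmlMagic = 0x67676d6cU;

// Hypothetical subset of the header that an app-owned probe might surface.
struct RawWhisperHeader {
    std::int32_t nVocab = 0;
    std::int32_t nAudioLayer = 0;
    std::int32_t nTextLayer = 0;
    std::int32_t nMels = 0;
    std::int32_t ftype = 0;
};

// Decodes a little-endian 32-bit value regardless of host byte order.
std::uint32_t readLe32(const unsigned char *bytes)
{
    return static_cast<std::uint32_t>(bytes[0]) | (static_cast<std::uint32_t>(bytes[1]) << 8U) |
           (static_cast<std::uint32_t>(bytes[2]) << 16U) | (static_cast<std::uint32_t>(bytes[3]) << 24U);
}

// Reads only the fixed-size prefix (magic + 11 hyperparameters); tensor data is never touched.
std::optional<RawWhisperHeader> probeRawWhisper(const std::filesystem::path &path)
{
    constexpr std::size_t kWords = 12;
    std::array<unsigned char, kWords * 4> buffer{};
    std::ifstream in(path, std::ios::binary);
    if (!in.read(reinterpret_cast<char *>(buffer.data()), static_cast<std::streamsize>(buffer.size()))) {
        return std::nullopt; // missing file or truncated header
    }
    if (readLe32(buffer.data()) != kGgmlMagic) {
        return std::nullopt; // not a legacy ggml Whisper file
    }
    // Hyperparameter i is stored in word i + 1, right after the magic.
    const auto hparam = [&buffer](std::size_t index) {
        return static_cast<std::int32_t>(readLe32(buffer.data() + 4 * (index + 1)));
    };
    RawWhisperHeader header;
    header.nVocab = hparam(0);      // n_vocab
    header.nAudioLayer = hparam(4); // n_audio_layer
    header.nTextLayer = hparam(8);  // n_text_layer
    header.nMels = hparam(9);       // n_mels
    header.ftype = hparam(10);      // ftype
    // Reject obviously malformed headers before anything reaches the backend.
    if (header.nVocab <= 0 || header.nAudioLayer <= 0 || header.nTextLayer <= 0 || header.nMels <= 0) {
        return std::nullopt;
    }
    return header;
}
```

Reading only the fixed-size prefix keeps the probe cheap even for multi-gigabyte model files.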
53 changes: 51 additions & 2 deletions RELEASE_CHECKLIST.md
@@ -21,8 +21,10 @@ bash scripts/check-release-hygiene.sh
- Review [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md) for accuracy.
- Review [third_party/whisper.cpp.UPSTREAM.md](third_party/whisper.cpp.UPSTREAM.md)
and make sure the recorded upstream version/ref is current.
-- Confirm no Whisper model binaries or other large third-party artifacts are
-tracked in the repository.
+- Confirm no speech model binaries, native model packages, or other large
+third-party artifacts are tracked in the repository source tree.
+- If the release is intended to ship a model, treat that as a release-bundle or
+release-asset decision, not a Git-tracked source-tree decision.

## Build And Test

@@ -150,11 +152,58 @@ cmake --install "$BUILD_DIR" --prefix "$INSTALL_DIR"
install rules ship the runtime libraries but intentionally clear vendored
`PUBLIC_HEADER` metadata to avoid upstream header-install warnings.

+## Model Packaging For Releases
+
+- Decide explicitly whether the release ships:
+  - no model at all
+  - a separate downloadable model package
+  - a release bundle that includes a model package alongside the binaries
+- Keep model artifacts out of Git history even when the release ships one.
+The repository source tree should stay free of raw Whisper `.bin` files and
+native Mutterkey model packages.
+- If you need a model for the release, start from a raw whisper.cpp-compatible
+`ggml` `.bin` file and import it into a native Mutterkey package:
+
+```bash
+MODEL_SRC="/path/to/ggml-base.en.bin"
+MODEL_OUT="$(mktemp -d /tmp/mutterkey-release-model-XXXXXX)/base-en"
+"$BUILD_DIR/mutterkey" model import "$MODEL_SRC" --output "$MODEL_OUT"
+```
+
+- Inspect the resulting package before shipping it:
+
+```bash
+"$BUILD_DIR/mutterkey" model inspect "$MODEL_OUT"
+```
+
+- Confirm the package contains at least:
+  - `model.json`
+  - `assets/model.bin`
+- Review the inspected metadata and make sure the release notes record:
+  - model family / size
+  - language profile
+  - source provenance
+  - any separate model license or usage terms
+- If the release bundle is meant to include a model, add the package directory
+to the release artifact outside the Git source tree. Preferred locations are:
+  - a separate downloadable release asset such as `mutterkey-model-base-en.tar.zst`
+  - a bundled runtime tree under `share/mutterkey/models/<package-id>/`
+- If you include a model in an installable release bundle, validate the final
+staged tree after copying the package in:
+  - the package directory is intact
+  - `mutterkey model inspect <bundled-package-path>` succeeds
+  - release notes and packaging docs tell users where `transcriber.model_path`
+should point
+- Do not commit the raw `.bin` source file, the generated native package, or
+any unpacked release-bundle copy back into the repository.
+
## Documentation And User Flow

- Review [README.md](README.md) for consistency with current behavior.
- Review `docs/mainpage.md` and `docs/Doxyfile.in` if the release touched
repo-owned API docs or docs/CI wiring.
+- Confirm the docs describe native Mutterkey model packages as the canonical
+artifact and raw Whisper `.bin` files as migration compatibility only.
- Confirm the documented recommended path is still the `systemd --user` service.
- Confirm [contrib/mutterkey.service](contrib/mutterkey.service) matches the
recommended installed-binary setup.
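A small sketch of the staged-package completeness check from the release steps above, working over a listing of package-relative file paths. It assumes only the two required entries named in the checklist. `listPackageFiles` and `missingRequiredEntries` are hypothetical helpers, and `mutterkey model inspect` remains the real validation step.

```cpp
#include <filesystem>
#include <set>
#include <string>
#include <system_error>
#include <vector>

// Collects regular files under `packageDir` as generic relative paths such as "assets/model.bin".
// A missing or unreadable directory yields an empty set.
std::set<std::string> listPackageFiles(const std::filesystem::path &packageDir)
{
    std::set<std::string> files;
    std::error_code ec;
    for (std::filesystem::recursive_directory_iterator it(packageDir, ec), end; !ec && it != end; it.increment(ec)) {
        if (it->is_regular_file()) {
            files.insert(std::filesystem::relative(it->path(), packageDir).generic_string());
        }
    }
    return files;
}

// Reports which required entries are absent from the staged package.
std::vector<std::string> missingRequiredEntries(const std::set<std::string> &presentFiles)
{
    static const std::vector<std::string> required = {"model.json", "assets/model.bin"};
    std::vector<std::string> missing;
    for (const std::string &entry : required) {
        if (presentFiles.count(entry) == 0) {
            missing.push_back(entry);
        }
    }
    return missing;
}
```

Usage: `missingRequiredEntries(listPackageFiles(stagedPackageDir))` returning an empty vector means the minimum layout is present; the inspect command still has to succeed on the same path.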