8 changes: 7 additions & 1 deletion AGENTS.md
@@ -33,6 +33,7 @@ This repository is intentionally kept minimal:
- `src/clipboardwriter.*`: clipboard integration, preferring KDE system clipboard support
- `src/audio/recordingnormalizer.*`: conversion to Whisper-ready mono `float32` at `16 kHz`
- `src/transcription/whispercpptranscriber.*`: in-process Whisper integration
+- `src/transcription/transcriptionengine.*`: app-owned engine/session seam for backend selection and future runtime evolution
- `src/transcription/transcriptionworker.*`: worker object hosted on a dedicated `QThread`
- `src/transcription/transcriptiontypes.h`: normalized audio and transcription result value types
- `src/config.*`: JSON config loading and defaults
@@ -136,6 +137,7 @@ Notes:
- When validating inside a restricted sandbox, be ready to disable `ccache` with `CCACHE_DISABLE=1` if the cache location is read-only; that is an execution-environment issue, not a Mutterkey build failure
- Prefer fixing the code over weakening `.clang-tidy` or the Clazy check set; only relax tool config when the warning is clearly low-value for this repo
- In this Qt-heavy repo, treat `misc-include-cleaner` and `readability-redundant-access-specifiers` as low-value `clang-tidy` noise unless the underlying tool behavior improves; they conflict with Qt header-provider reality and `signals` / `slots` / `Q_SLOTS` sectioning more than they improve safety
+- Prefer anonymous-namespace `Q_LOGGING_CATEGORY` for file-local logging categories; `Q_STATIC_LOGGING_CATEGORY` is not portable enough across the Qt versions this repo may build against
- Do not add broad Valgrind suppressions by default; only add narrow suppressions after reproducing stable third-party noise and keep them clearly scoped
- When adding tests, prefer small `Qt Test` cases that run headlessly under `CTest` and avoid microphone, clipboard, or KDE session dependencies unless the task is specifically integration-focused
- For tool-driven cleanups, preserve the existing design and behavior; do not perform broad rewrites just to satisfy style-oriented recommendations
@@ -146,14 +148,16 @@ Notes:
- Stay within the existing style and structure; do not reformat unrelated code
- Prefer small, direct classes over adding abstraction layers without a concrete need
- Keep Qt usage idiomatic: `QObject` ownership, signal/slot wiring, and `QThread` boundaries should remain explicit
-- Prefer `Q_STATIC_LOGGING_CATEGORY` for translation-unit-local logging categories instead of global `Q_LOGGING_CATEGORY` declarations when no cross-file declaration is needed
+- Prefer anonymous-namespace `Q_LOGGING_CATEGORY` for translation-unit-local logging categories when no cross-file declaration is needed; keep the pattern compatible with older Qt builds used in CI
- When refactoring Qt class declarations, remember that `moc` still cares about section structure: keep explicit `signals`, `slots`, `Q_SLOTS`, and access sections valid for Qt even if a generic style check suggests flattening them
- Prefer explicit validation and safe fallback behavior for config-driven runtime values
- Avoid introducing optional backends, plugin systems, or cross-platform abstractions unless the task requires them
- Keep the audio path explicit: recorder output may not already match Whisper input requirements, so preserve normalization behavior
- Prefer narrow shared value types across subsystems; for example, consumers that only need captured audio should include `src/audio/recording.h`, not the full recorder class
- Keep JSON and other transport details at subsystem boundaries; prefer typed C++ snapshots/results once data crosses into app-owned control, tray, or service code
- Prefer dependency injection for tray-shell and control-surface code from the first implementation so headless Qt tests stay simple
+- When preparing the transcription path for future runtime work, prefer app-owned engine/session seams and injected sessions over leaking concrete backend types into CLI, service, or worker orchestration
+- Prefer product-owned runtime interfaces, model/session separation, and deterministic backend selection before adding new inference backends or widening cross-platform support
- Preserve the current product direction: embedded `whisper.cpp`, KDE-first, CLI/service-first

## C++ Core Guidelines Priorities
@@ -225,9 +229,11 @@ Typical model location:
- Update `LICENSE`, `THIRD_PARTY_NOTICES.md`, CMake install rules, and `third_party/whisper.cpp.UPSTREAM.md` when packaging, licensing, or vendored dependency behavior changes
- Keep `README.md`, `AGENTS.md`, and any relevant local skills aligned with the current `scripts/update-whisper.sh` workflow when the vendor-update process changes
- Store upcoming feature plans in `next_feature/` as Markdown files, and update the existing plan there when refining the same upcoming feature instead of scattering notes across the repo
+- Keep architecture-evolution plans grounded in incremental slices that preserve the current shipped `whisper.cpp` path while moving ownership of interfaces, tests, and packaging toward repo-owned code
- Treat `mutterkey-tray` as a shipped artifact once it is installed or validated in CI; keep install rules, README/setup notes, release checklist items, and workflow checks aligned with that status
- Verify with a fresh CMake build when the change affects compilation or linkage
- Run `ctest` when touching covered code in `src/config.*` or `src/audio/recordingnormalizer.*`, and extend the deterministic headless tests when practical
+- When touching transcription orchestration or backend seams, prefer small headless tests with fake/injected sessions over model-dependent integration tests
- When adding or fixing Qt GUI tests, make the `CTest` registration itself headless with `QT_QPA_PLATFORM=offscreen` so CI does not try to load `xcb`
- Prefer expanding tests around pure parsing, value normalization, and other environment-independent logic before adding KDE-session or device-heavy coverage
- Use `-DMUTTERKEY_ENABLE_ASAN=ON` and `-DMUTTERKEY_ENABLE_UBSAN=ON` for fast iteration on memory and UB bugs, and use the repo-owned Valgrind lane as the slower release-focused confirmation step
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -43,6 +43,8 @@ set(MUTTERKEY_CORE_SOURCES
src/service.cpp
src/service.h
src/transcription/transcriptiontypes.h
+src/transcription/transcriptionengine.cpp
+src/transcription/transcriptionengine.h
src/transcription/transcriptionworker.cpp
src/transcription/transcriptionworker.h
src/transcription/whispercpptranscriber.cpp
13 changes: 7 additions & 6 deletions src/app/applicationcommands.cpp
@@ -6,8 +6,8 @@
#include "clipboardwriter.h"
#include "control/daemoncontrolserver.h"
#include "service.h"
+#include "transcription/transcriptionengine.h"
#include "transcription/transcriptiontypes.h"
-#include "transcription/whispercpptranscriber.h"

#include <QClipboard>
#include <QCoreApplication>
@@ -59,18 +59,19 @@ int runDaemon(QGuiApplication &app, const AppConfig &config, const QString &conf
int runOnce(QGuiApplication &app, const AppConfig &config, double seconds)
{
AudioRecorder recorder(config.audio);
-WhisperCppTranscriber transcriber(config.transcriber);
+const std::unique_ptr<TranscriptionEngine> transcriptionEngine = createTranscriptionEngine(config.transcriber);
+std::unique_ptr<TranscriptionSession> transcriber = transcriptionEngine->createSession();
ClipboardWriter clipboardWriter(QGuiApplication::clipboard());

if (config.transcriber.warmupOnStart) {
QString warmupError;
-if (!transcriber.warmup(&warmupError)) {
+if (!transcriber->warmup(&warmupError)) {
qCCritical(appLog) << "Failed to warm up transcriber:" << warmupError;
return 1;
}
}

-QTimer::singleShot(0, &app, [&app, &recorder, &transcriber, &clipboardWriter, seconds]() {
+QTimer::singleShot(0, &app, [&app, &recorder, transcriber = transcriber.get(), &clipboardWriter, seconds]() {
QString errorMessage;
if (!recorder.start(&errorMessage)) {
qCCritical(appLog) << "Failed to start one-shot recording:" << errorMessage;
@@ -79,15 +80,15 @@ int runOnce(QGuiApplication &app, const AppConfig &config, double seconds)
}

qCInfo(appLog) << "Recording for" << seconds << "seconds";
-QTimer::singleShot(static_cast<int>(seconds * 1000), &app, [&app, &recorder, &transcriber, &clipboardWriter]() {
+QTimer::singleShot(static_cast<int>(seconds * 1000), &app, [&app, &recorder, transcriber, &clipboardWriter]() {
const Recording recording = recorder.stop();
if (!recording.isValid()) {
qCCritical(appLog) << "Recorder returned no audio";
QGuiApplication::exit(1);
return;
}

-const TranscriptionResult result = transcriber.transcribe(recording);
+const TranscriptionResult result = transcriber->transcribe(recording);
if (!result.success) {
qCCritical(appLog) << "One-shot transcription failed:" << result.error;
QGuiApplication::exit(1);
37 changes: 37 additions & 0 deletions src/transcription/transcriptionengine.cpp
@@ -0,0 +1,37 @@
#include "transcription/transcriptionengine.h"

#include "transcription/whispercpptranscriber.h"

#include <memory>
#include <utility>

namespace {

class WhisperCppTranscriptionEngine final : public TranscriptionEngine
{
public:
explicit WhisperCppTranscriptionEngine(TranscriberConfig config)
: m_config(std::move(config))
{
}

[[nodiscard]] QString backendName() const override
{
return WhisperCppTranscriber::backendNameStatic();
}

[[nodiscard]] std::unique_ptr<TranscriptionSession> createSession() const override
{
return std::make_unique<WhisperCppTranscriber>(m_config);
}

private:
TranscriberConfig m_config;
};

} // namespace

std::unique_ptr<TranscriptionEngine> createTranscriptionEngine(const TranscriberConfig &config)
{
return std::make_unique<WhisperCppTranscriptionEngine>(config);
}
90 changes: 90 additions & 0 deletions src/transcription/transcriptionengine.h
@@ -0,0 +1,90 @@
#pragma once

#include "config.h"
#include "transcription/transcriptiontypes.h"

#include <memory>

struct Recording;

/**
* @file
* @brief Stable engine/session boundary for embedded transcription backends.
*/

/**
* @brief Mutable per-session transcription interface.
*
* Sessions own backend state that may be warmed up, reused, and kept isolated
* per thread or request flow.
*/
class TranscriptionSession
{
public:
virtual ~TranscriptionSession() = default;
TranscriptionSession(const TranscriptionSession &) = delete;
TranscriptionSession &operator=(const TranscriptionSession &) = delete;
TranscriptionSession(TranscriptionSession &&) = delete;
TranscriptionSession &operator=(TranscriptionSession &&) = delete;

/**
* @brief Returns the backend identifier for this live session.
* @return Short backend name used for logs and diagnostics.
*/
[[nodiscard]] virtual QString backendName() const = 0;

/**
* @brief Performs optional backend warmup for this session.
* @param errorMessage Optional destination for a human-readable failure reason.
* @return `true` if the session is ready for transcription, otherwise `false`.
*/
virtual bool warmup(QString *errorMessage = nullptr) = 0;

/**
* @brief Transcribes a single captured recording.
* @param recording Captured audio payload to normalize and transcribe.
* @return Normalized transcription result for the provided recording.
*/
[[nodiscard]] virtual TranscriptionResult transcribe(const Recording &recording) = 0;

protected:
TranscriptionSession() = default;
};

/**
* @brief Immutable engine configuration that creates backend sessions.
*
* The engine boundary keeps future backend selection and model-loading policy
* out of the app/service orchestration layers.
*/
class TranscriptionEngine
{
public:
virtual ~TranscriptionEngine() = default;
TranscriptionEngine(const TranscriptionEngine &) = delete;
TranscriptionEngine &operator=(const TranscriptionEngine &) = delete;
TranscriptionEngine(TranscriptionEngine &&) = delete;
TranscriptionEngine &operator=(TranscriptionEngine &&) = delete;

/**
* @brief Returns the backend identifier for sessions created by this engine.
* @return Short backend name used for logs and diagnostics.
*/
[[nodiscard]] virtual QString backendName() const = 0;

/**
* @brief Creates a new isolated transcription session.
* @return Newly constructed session that owns its backend state.
*/
[[nodiscard]] virtual std::unique_ptr<TranscriptionSession> createSession() const = 0;

protected:
TranscriptionEngine() = default;
};

/**
* @brief Creates the configured embedded transcription engine.
* @param config Backend configuration copied into the engine.
* @return Engine suitable for creating isolated transcription sessions.
*/
[[nodiscard]] std::unique_ptr<TranscriptionEngine> createTranscriptionEngine(const TranscriberConfig &config);
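
The engine/session split declared above can be exercised without Qt or whisper.cpp. The block below is a minimal sketch under stated assumptions: `SketchConfig`, `SketchSession`, `SketchEngine`, `EchoSession`, `EchoEngine`, and `createSketchEngine` are hypothetical stand-ins that use `std::string` in place of `QString`, and the empty-model-path failure is invented purely for illustration. It mirrors the shape of the header (an immutable engine that copies its config and hands out independently owned sessions), not the real implementation.

```cpp
// Qt-free sketch of the engine/session seam. Every type here is an
// illustrative stand-in, not part of the Mutterkey code base.
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct SketchConfig {
    std::string modelPath; // hypothetical field, stands in for TranscriberConfig
};

// Mutable per-session interface (analog of TranscriptionSession).
class SketchSession {
public:
    virtual ~SketchSession() = default;
    virtual std::string backendName() const = 0;
    virtual bool warmup(std::string *errorMessage = nullptr) = 0;
};

// Immutable factory interface (analog of TranscriptionEngine).
class SketchEngine {
public:
    virtual ~SketchEngine() = default;
    virtual std::string backendName() const = 0;
    virtual std::unique_ptr<SketchSession> createSession() const = 0;
};

namespace {

class EchoSession final : public SketchSession {
public:
    explicit EchoSession(SketchConfig config) : m_config(std::move(config)) {}

    // Static name lets the engine report the backend without a live session.
    static std::string backendNameStatic() { return "echo"; }
    std::string backendName() const override { return backendNameStatic(); }

    bool warmup(std::string *errorMessage) override {
        // Assumed failure mode for the sketch: an empty model path.
        if (m_config.modelPath.empty()) {
            if (errorMessage != nullptr) {
                *errorMessage = "model path is empty";
            }
            return false;
        }
        return true;
    }

private:
    SketchConfig m_config; // each session owns its own copy
};

class EchoEngine final : public SketchEngine {
public:
    explicit EchoEngine(SketchConfig config) : m_config(std::move(config)) {}
    std::string backendName() const override { return EchoSession::backendNameStatic(); }
    std::unique_ptr<SketchSession> createSession() const override {
        return std::make_unique<EchoSession>(m_config);
    }

private:
    SketchConfig m_config;
};

} // namespace

// Single entry point; callers never name the concrete backend type.
std::unique_ptr<SketchEngine> createSketchEngine(const SketchConfig &config)
{
    return std::make_unique<EchoEngine>(config);
}
```

A caller would hold only `SketchEngine` and `SketchSession` pointers, which is the property the PR relies on to keep `WhisperCppTranscriber` out of CLI, service, and worker code.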
17 changes: 12 additions & 5 deletions src/transcription/transcriptionworker.cpp
@@ -1,26 +1,33 @@
#include "transcription/transcriptionworker.h"

+#include <cassert>
#include <utility>

-TranscriptionWorker::TranscriptionWorker(TranscriberConfig config, QObject *parent)
+TranscriptionWorker::TranscriptionWorker(const TranscriberConfig &config, QObject *parent)
+: TranscriptionWorker(createTranscriptionEngine(config)->createSession(), parent)
+{
+}
+
+TranscriptionWorker::TranscriptionWorker(std::unique_ptr<TranscriptionSession> transcriber, QObject *parent)
: QObject(parent)
-, m_transcriber(std::move(config))
+, m_transcriber(std::move(transcriber))
{
+assert(m_transcriber != nullptr);
}

QString TranscriptionWorker::backendName() const
{
-return WhisperCppTranscriber::backendName();
+return m_transcriber->backendName();
}

bool TranscriptionWorker::warmup(QString *errorMessage)
{
-return m_transcriber.warmup(errorMessage);
+return m_transcriber->warmup(errorMessage);
}

void TranscriptionWorker::transcribe(const Recording &recording)
{
-const TranscriptionResult result = m_transcriber.transcribe(recording);
+const TranscriptionResult result = m_transcriber->transcribe(recording);
if (!result.success) {
emit transcriptionFailed(result.error);
return;
13 changes: 10 additions & 3 deletions src/transcription/transcriptionworker.h
@@ -2,9 +2,10 @@

#include "audio/recording.h"
#include "config.h"
-#include "transcription/whispercpptranscriber.h"
+#include "transcription/transcriptionengine.h"

#include <QObject>
+#include <memory>

/**
* @file
@@ -28,7 +29,13 @@ class TranscriptionWorker final : public QObject
* @param config Transcriber settings copied into the owned backend.
* @param parent Optional QObject parent.
*/
-explicit TranscriptionWorker(TranscriberConfig config, QObject *parent = nullptr);
+explicit TranscriptionWorker(const TranscriberConfig &config, QObject *parent = nullptr);
+/**
+* @brief Creates a worker around an already-constructed session.
+* @param transcriber Owned session implementation.
+* @param parent Optional QObject parent.
+*/
+explicit TranscriptionWorker(std::unique_ptr<TranscriptionSession> transcriber, QObject *parent = nullptr);
~TranscriptionWorker() override = default;

Q_DISABLE_COPY_MOVE(TranscriptionWorker)
@@ -67,5 +74,5 @@ class TranscriptionWorker final : public QObject

private:
/// Owned transcription backend implementation.
-WhisperCppTranscriber m_transcriber;
+std::unique_ptr<TranscriptionSession> m_transcriber;
};
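
The new session-taking constructor is what makes the "fake/injected sessions" guidance in `AGENTS.md` practical. The block below is a hedged, Qt-free sketch of that testing pattern: `SketchResult`, `SketchSession`, `FakeSession`, and `SketchWorker` are hypothetical stand-ins, `std::function` callbacks replace Qt signals, and the `onReady` callback name is invented because the worker's success signal is not visible in this diff.

```cpp
// Sketch of injecting a fake session into a worker for headless tests.
// All names are illustrative; callbacks stand in for Qt signals.
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

struct SketchResult {
    bool success;
    std::string text;
    std::string error;
};

// Analog of the TranscriptionSession interface, reduced to one method.
class SketchSession {
public:
    virtual ~SketchSession() = default;
    virtual SketchResult transcribe(const std::string &audio) = 0;
};

// Fake backend: returns a canned result, needs no model or microphone.
class FakeSession final : public SketchSession {
public:
    explicit FakeSession(SketchResult canned) : m_canned(std::move(canned)) {}
    SketchResult transcribe(const std::string &) override { return m_canned; }

private:
    SketchResult m_canned;
};

// Analog of TranscriptionWorker: owns whichever session it is given.
class SketchWorker {
public:
    explicit SketchWorker(std::unique_ptr<SketchSession> session)
        : m_session(std::move(session))
    {
        assert(m_session != nullptr);
    }

    std::function<void(const std::string &)> onReady;  // hypothetical success hook
    std::function<void(const std::string &)> onFailed; // mirrors transcriptionFailed

    void transcribe(const std::string &audio)
    {
        const SketchResult result = m_session->transcribe(audio);
        if (!result.success) {
            if (onFailed) {
                onFailed(result.error);
            }
            return;
        }
        if (onReady) {
            onReady(result.text);
        }
    }

private:
    std::unique_ptr<SketchSession> m_session;
};
```

In the real code base the equivalent would be a test-only class deriving from `TranscriptionSession`, passed to `TranscriptionWorker(std::unique_ptr<TranscriptionSession>, QObject *)`, with `QSignalSpy` or direct connections observing the worker's signals.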
7 changes: 6 additions & 1 deletion src/transcription/whispercpptranscriber.cpp
@@ -76,11 +76,16 @@ void WhisperCppTranscriber::freeContext(whisper_context *context) noexcept
}
}

-QString WhisperCppTranscriber::backendName()
+QString WhisperCppTranscriber::backendNameStatic()
{
return QStringLiteral("whisper.cpp");
}

+QString WhisperCppTranscriber::backendName() const
+{
+return backendNameStatic();
+}
+
bool WhisperCppTranscriber::warmup(QString *errorMessage)
{
return ensureInitialized(errorMessage);
12 changes: 7 additions & 5 deletions src/transcription/whispercpptranscriber.h
@@ -2,6 +2,7 @@

#include "audio/recordingnormalizer.h"
#include "config.h"
+#include "transcription/transcriptionengine.h"
#include "transcription/transcriptiontypes.h"

#include <memory>
@@ -20,7 +21,7 @@ struct whisper_context;
* RAII-managed smart pointer so service shutdown and worker teardown stay
* deterministic.
*/
-class WhisperCppTranscriber final
+class WhisperCppTranscriber final : public TranscriptionSession
{
public:
/**
@@ -32,7 +33,7 @@ class WhisperCppTranscriber final
/**
* @brief Releases the owned whisper.cpp context.
*/
-~WhisperCppTranscriber();
+~WhisperCppTranscriber() override;

WhisperCppTranscriber(const WhisperCppTranscriber &) = delete;
WhisperCppTranscriber &operator=(const WhisperCppTranscriber &) = delete;
@@ -43,21 +44,22 @@ class WhisperCppTranscriber final
* @brief Returns the backend name used in diagnostics.
* @return Human-readable backend identifier.
*/
-[[nodiscard]] static QString backendName();
+[[nodiscard]] static QString backendNameStatic();
+[[nodiscard]] QString backendName() const override;

/**
* @brief Eagerly initializes the whisper.cpp context.
* @param errorMessage Optional output for initialization failures.
* @return `true` when the backend is ready for transcription.
*/
-bool warmup(QString *errorMessage = nullptr);
+bool warmup(QString *errorMessage = nullptr) override;

/**
* @brief Normalizes and transcribes one captured recording.
* @param recording Captured audio payload and format metadata.
* @return Structured transcription result.
*/
-[[nodiscard]] TranscriptionResult transcribe(const Recording &recording);
+[[nodiscard]] TranscriptionResult transcribe(const Recording &recording) override;

private:
/**