whisper-api: Asynchronous Speech-to-Text for Emacs

whisper-api is a lightweight asynchronous client for OpenAI’s Whisper API.

Installation

Make sure you have ffmpeg installed on your system. For example, on Ubuntu:

$ sudo apt update && sudo apt install ffmpeg

If you use straight.el and general.el, add this to your configuration:

(use-package whisper-api
  :straight (:host github :repo "ileixe/whisper-api")
  :config
  (setq whisper-api-openai-token "YOUR-API-KEY") ;; or use an .authinfo file instead
  :general
  ;; Global bindings for normal, visual and insert states
  (:states '(normal visual insert)
           "C-c w" 'whisper-api-record-dwim
           "C-c c" 'whisper-api-cancel)
  ;; Bindings for the minibuffer-keymaps so that they also work inside the minibuffer.
  ;; (:keymaps '(minibuffer-local-map minibuffer-local-ns-map)
  ;;          "C-c w" 'whisper-api-record-dwim
  ;;          "C-c c" 'whisper-api-cancel)
  )
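If you do not use general.el, a similar setup can be sketched with use-package's built-in :bind keyword. This variant is an assumption rather than something taken from the package itself; it binds the keys globally instead of per evil state.

```elisp
;; Sketch: same package and commands as above, without general.el.
;; The keys are bound in the global map, not per evil state.
(use-package whisper-api
  :straight (:host github :repo "ileixe/whisper-api")
  :bind (("C-c w" . whisper-api-record-dwim)
         ("C-c c" . whisper-api-cancel)))
```

Note that "C-c c" may already be taken in your configuration (org-capture is a common choice), so pick other keys if needed.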

Features

  • Non-blocking recording and transcription.
  • Graceful cancellation of recording or pending API requests.
  • Reads the API key from the whisper-api-openai-token variable or from .authinfo.
  • Customizable ffmpeg command (default uses PulseAudio, but can be adjusted to ALSA or other sources).

Usage

  1. Set your API key, for example in your init file:

    (setq whisper-api-openai-token "sk-...")

    If it is not set, the package falls back to the .authinfo file:

    machine api.openai.com login apikey password YOUR_API_KEY

  2. To start recording, run:

    M-x whisper-api-record-dwim

    The temporary file path is displayed in the minibuffer.

  3. To stop recording (which sends the audio file to the API asynchronously), run:

    M-x whisper-api-record-dwim (again)

    The transcription is then inserted into your current buffer.

  4. To cancel an active recording or pending API call, run:

    M-x whisper-api-cancel
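
To check that the .authinfo entry from step 1 is readable, you can query auth-source directly. This is a minimal sketch: my/whisper-api-authinfo-key is a hypothetical helper name, and the package's own lookup may work differently.

```elisp
(require 'auth-source)

(defun my/whisper-api-authinfo-key ()
  "Return the secret stored for host api.openai.com, login apikey, or nil.
Hypothetical helper for checking the .authinfo entry; not part of whisper-api."
  (let* ((entry (car (auth-source-search :host "api.openai.com"
                                         :user "apikey"
                                         :max 1)))
         (secret (plist-get entry :secret)))
    ;; auth-source usually wraps the secret in a function; call it if so.
    (if (functionp secret) (funcall secret) secret)))
```

Evaluating (my/whisper-api-authinfo-key) with M-: should return your key as a string; nil means auth-source found no matching entry.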

Customization

  • whisper-api-ffmpeg-command

    The format string used to invoke ffmpeg. By default it is:

    ffmpeg -y -t %d -f pulse -i default -ar 16000 %s

    You can change this if you use another input device. For instance, to record through ALSA:

    (setq whisper-api-ffmpeg-command "ffmpeg -y -t %d -f alsa -i default -ar 16000 %s")

  • whisper-api-base-url

    The base URL for the Whisper API. By default it is set to OpenAI’s endpoint:

    https://api.openai.com/v1/audio/transcriptions

    To use a local inference endpoint or alternative service, set this value accordingly. For example:

    (setq whisper-api-base-url "http://localhost:8000/my-endpoint")

  • The package prints debug messages to the *Messages* buffer, which can help diagnose problems with recording or the API call.
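
To see how the whisper-api-ffmpeg-command format string expands, you can fill it in by hand with format. This sketch assumes that %d receives a maximum recording duration in seconds (the value of ffmpeg's -t option) and that %s receives the output file path; the values 60 and /tmp/recording.wav are made up for illustration.

```elisp
;; Sketch: expand the default template with illustrative values.
;; Assumption: %d = duration limit in seconds, %s = output file path.
(let ((template "ffmpeg -y -t %d -f pulse -i default -ar 16000 %s"))
  (format template 60 "/tmp/recording.wav"))
;; => "ffmpeg -y -t 60 -f pulse -i default -ar 16000 /tmp/recording.wav"
```

Running the resulting command in a shell is a quick way to confirm that your input device works before changing the variable.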

Enjoy!