zsh-llm-mode

Quick LLM scratchpad inside your Zsh — ask, get an answer, move on.
Toggle into a prompt where you can draft multi-line queries, send them to an LLM, and stream replies directly in your shell.


Features

  • Toggle between shell and LLM mode with Ctrl-Q
  • Multi-line editing in LLM mode (Ctrl-J inserts a newline)
  • Carries any shell command you have already typed into the LLM prompt, and returns it when you toggle back
  • Keeps your LLM prompt buffered until you submit it or toggle back with new command-line input
  • Cancel with Esc or Ctrl-C
  • Spinner while waiting for the first response
  • Streamed output from your LLM tool
  • Works with any CLI LLM that supports stdin/stdout (Ollama, sgpt, AWS Q Chat, MLX LM, Google Gemini CLI, OpenAI Codex, etc.)

Quick Demo

Interactive LLM scratchpad for Zsh — toggle into query mode and stream answers inline. [Demo recording]


Installation

Clone and source manually:

git clone https://github.com/oleks-dev/zsh-llm-mode.git ~/.zsh-llm-mode
echo "source ~/.zsh-llm-mode/zsh-llm-mode.plugin.zsh" >> ~/.zshrc

Or with a plugin manager:

# zinit
zinit light oleks-dev/zsh-llm-mode

# antidote
antidote bundle oleks-dev/zsh-llm-mode

Setup

Set the backend command before loading the plugin:

export ZSH_LLM_MODE_CMD='ollama run phi4-mini:3.8b'

You can change it any time.
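
Since the variable can be changed at any time, you can point a running session at a different backend; for example (the model tag here is just an illustration):

# Subsequent queries go to the new backend
export ZSH_LLM_MODE_CMD='ollama run llama3.2:3b'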

Ollama

export ZSH_LLM_MODE_CMD='ollama run qwen3:8b --think=false --hidethinking'

SGPT

export ZSH_LLM_MODE_CMD='sgpt --model gpt-4'

AWS Q Chat

export ZSH_LLM_MODE_CMD='q chat --no-interactive'

MLX LM

export ZSH_LLM_MODE_CMD='mlx_lm.generate --model /Users/guest/.cache/huggingface/hub/models--mlx-community--Mistral-7B-Instruct-v0.3-4bit/snapshots/a4b8f870474b0eb527f466a03fbc187830d271f5 --prompt -'

Google Gemini

export ZSH_LLM_MODE_CMD='gemini'

OpenAI Codex

export ZSH_LLM_MODE_CMD='codex exec --model gpt-5'

Make sure your backend command reads the prompt from stdin and writes its reply to stdout; that's all the plugin needs. If your LLM CLI tool works like echo "hello" | gemini, it can be used without problems.
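
As a quick sanity check, you can pipe a prompt through your backend command yourself before wiring it into the plugin (shown with the Ollama example from above; substitute your own command):

# If this prints a reply, the command will work as a backend
echo "Reply with the word ok." | ollama run phi4-mini:3.8b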

Streaming setup

By default, the plugin prints the LLM output in full once it is received. For line-by-line streaming you need stdbuf or gstdbuf, which is detected automatically.

On macOS you can install GNU coreutils and use gstdbuf:

brew install coreutils
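
To confirm that a buffering tool is on your PATH (stdbuf ships with GNU coreutils on Linux, gstdbuf with Homebrew's coreutils on macOS):

# Prints the path of whichever buffering tool the plugin can pick up
command -v stdbuf || command -v gstdbuf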

Usage

  • Ctrl-Q → toggle LLM mode
  • Ctrl-J → insert newline in LLM prompt
  • Enter → send query
  • Esc / Ctrl-C → cancel

Example:

LLM> (Enter=send, ^Q=switch, ^J=new line, Esc=cancel)
➜ ls -al what does this command do?
[LLM query]: ls -al what does this command do?
...

Keybinding

By default, toggle LLM mode with Ctrl-Q (mnemonic: Q for Question).

If Ctrl-Q is taken on your system (e.g. reserved for terminal flow control), you can either:

  1. Disable flow control:
stty -ixon
  2. Or rebind the toggle to another key (example: Ctrl-G):
bindkey '^G' llm-toggle
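
Putting it together, a minimal ~/.zshrc sketch (assuming the manual install path from the Installation section; the model tag is just an example):

# Backend first, then the plugin, then optional key tweaks
export ZSH_LLM_MODE_CMD='ollama run phi4-mini:3.8b'
source ~/.zsh-llm-mode/zsh-llm-mode.plugin.zsh
stty -ixon                 # optional: free Ctrl-Q from terminal flow control
# bindkey '^G' llm-toggle  # optional: use Ctrl-G instead of Ctrl-Q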

License

MIT
