zsh-llm-mode

Quick LLM scratchpad inside your Zsh — ask, get an answer, move on.
Toggle into a prompt where you can draft multi-line queries, send them to an LLM, and stream replies directly in your shell.


Features

  • Toggle between shell and LLM mode with Ctrl-Q
  • Multi-line editing in LLM mode (Ctrl-J inserts a newline)
  • Carries any shell command you have already typed into the LLM prompt, and returns it when you toggle back
  • Keeps your LLM prompt buffered until you submit it or toggle back with new command-line input
  • Cancel with Esc or Ctrl-C
  • Spinner while waiting for the first response
  • Streamed output from your LLM tool
  • Works with any CLI LLM that supports stdin/stdout (Ollama, sgpt, AWS Q Chat, MLX LM, Google Gemini CLI, OpenAI Codex, etc.)

Quick Demo

Interactive LLM scratchpad for Zsh — toggle into query mode and stream answers inline. [Demo recording]


Installation

Clone and source manually:

git clone https://github.com/oleks-dev/zsh-llm-mode.git ~/.zsh-llm-mode
echo "source ~/.zsh-llm-mode/zsh-llm-mode.plugin.zsh" >> ~/.zshrc

Or with a plugin manager:

# zinit
zinit light oleks-dev/zsh-llm-mode

# antidote
antidote bundle oleks-dev/zsh-llm-mode

Setup

Set the backend command before loading the plugin:

export ZSH_LLM_MODE_CMD='ollama run phi4-mini:3.8b'

You can change it any time.
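
Since the variable can be changed at any time, you can point a running session at a different backend; for example (the model tag here is just an illustration):

# Subsequent queries go to the new backend
export ZSH_LLM_MODE_CMD='ollama run llama3.2:3b'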

Ollama

export ZSH_LLM_MODE_CMD='ollama run qwen3:8b --think=false --hidethinking'

SGPT

export ZSH_LLM_MODE_CMD='sgpt --model gpt-4'

AWS Q Chat

export ZSH_LLM_MODE_CMD='q chat --no-interactive'

MLX LM

export ZSH_LLM_MODE_CMD='mlx_lm.generate --model /Users/guest/.cache/huggingface/hub/models--mlx-community--Mistral-7B-Instruct-v0.3-4bit/snapshots/a4b8f870474b0eb527f466a03fbc187830d271f5 --prompt -'

Google Gemini

export ZSH_LLM_MODE_CMD='gemini'

OpenAI Codex

export ZSH_LLM_MODE_CMD='codex exec --model gpt-5'

Make sure your backend command reads the prompt from stdin and writes its reply to stdout; that's all the plugin needs. If your LLM CLI tool works like echo "hello" | gemini, it can be used without problems.
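
As a quick sanity check, you can pipe a prompt through your backend command yourself before wiring it into the plugin (shown with the Ollama example from above; substitute your own command):

# If this prints a reply, the command will work as a backend
echo "Reply with the word ok." | ollama run phi4-mini:3.8b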

Streaming setup

By default, the plugin prints the LLM output in full once it is received. For line-by-line streaming you need stdbuf or gstdbuf, which is detected automatically.

On macOS you can install GNU coreutils and use gstdbuf:

brew install coreutils
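
To confirm that a buffering tool is on your PATH (stdbuf ships with GNU coreutils on Linux, gstdbuf with Homebrew's coreutils on macOS):

# Prints the path of whichever buffering tool the plugin can pick up
command -v stdbuf || command -v gstdbuf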

Usage

  • Ctrl-Q → toggle LLM mode
  • Ctrl-J → insert newline in LLM prompt
  • Enter → send query
  • Esc / Ctrl-C → cancel

Example:

LLM> (Enter=send, ^Q=switch, ^J=new line, Esc=cancel)
➜ ls -al what does this command do?
[LLM query]: ls -al what does this command do?
...

Keybinding

By default, toggle LLM mode with Ctrl-Q (mnemonic: Q for Question).

If Ctrl-Q is taken on your system (e.g. reserved for terminal flow control), you can either:

  1. Disable flow control:
stty -ixon
  2. Or rebind the toggle to another key (example: Ctrl-G):
bindkey '^G' llm-toggle
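
Putting it together, a minimal ~/.zshrc sketch (assuming the manual install path from the Installation section; the model tag is just an example):

# Backend first, then the plugin, then optional key tweaks
export ZSH_LLM_MODE_CMD='ollama run phi4-mini:3.8b'
source ~/.zsh-llm-mode/zsh-llm-mode.plugin.zsh
stty -ixon                 # optional: free Ctrl-Q from terminal flow control
# bindkey '^G' llm-toggle  # optional: use Ctrl-G instead of Ctrl-Q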

License

MIT
