llmnop

Installation | Usage

llmnop is a command-line tool for benchmarking the performance of large language model (LLM) inference endpoints that are compatible with the OpenAI API. It measures key performance metrics such as time to first token (TTFT), inter-token latency, and overall throughput under concurrent load.

Features

  • Realistic Workload Simulation: Generates prompts with input and output token lengths drawn from configurable normal distributions (see the example after this list).
  • Concurrent Benchmarking: Send multiple requests in parallel to simulate real-world load.
  • Detailed Performance Metrics:
    • Time To First Token (TTFT)
    • Inter-Token Latency (average time between consecutive tokens)
    • Throughput (tokens/second)
    • End-to-end Request Latency
  • Detailed JSON Output: Saves detailed per-request data and a final summary report.
  • Tokenizer-Aware: Uses Hugging Face tokenizers to count tokens for prompt generation and metric calculation.
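
For instance, the mean/stddev flags control those token-length distributions directly. A quick sketch (the model name and values below are illustrative):

llmnop \
    --model "Qwen/Qwen3-4B" \
    --mean-input-tokens 1024 \
    --stddev-input-tokens 32 \
    --mean-output-tokens 256 \
    --stddev-output-tokens 8

With a standard deviation of 32 around a mean of 1024, nearly all sampled prompt lengths fall between roughly 930 and 1120 input tokens.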

Installation

  • Quickstart (recommended)

    curl -sSfL https://github.com/jpreagan/llmnop/releases/latest/download/llmnop-installer.sh | sh

    The installer places llmnop in $XDG_BIN_HOME or ~/.local/bin. Ensure that directory is on your PATH before running llmnop.
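
    If it isn't, you can add it in your shell profile, for example:

        export PATH="$HOME/.local/bin:$PATH"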

  • Homebrew

    brew install jpreagan/tap/llmnop
  • Build from source

    git clone https://github.com/jpreagan/llmnop.git
    cd llmnop
    cargo build --release
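
    The release binary is written to target/release/llmnop; run it in place or copy it somewhere on your PATH, for example:

        ./target/release/llmnop --version
        install -m 755 target/release/llmnop ~/.local/bin/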

Usage

llmnop [OPTIONS] --model <MODEL>
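
Only --model is required; every other option has a default (and the tokenizer defaults to the model name), so a minimal run looks like:

llmnop --model "Qwen/Qwen3-4B"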

Options

-m, --model <MODEL>
    --tokenizer <TOKENIZER>
    --max-num-completed-requests <MAX_NUM_COMPLETED_REQUESTS>  [default: 2]
    --num-concurrent-requests <NUM_CONCURRENT_REQUESTS>        [default: 1]
    --mean-input-tokens <MEAN_INPUT_TOKENS>                    [default: 550]
    --stddev-input-tokens <STDDEV_INPUT_TOKENS>                [default: 150]
    --mean-output-tokens <MEAN_OUTPUT_TOKENS>                  [default: 150]
    --stddev-output-tokens <STDDEV_OUTPUT_TOKENS>              [default: 10]
    --results-dir <RESULTS_DIR>                                [default: result_outputs]
    --timeout <TIMEOUT>                                        [default: 600]
    --no-progress                                              Disable the progress bar (useful for non-interactive environments)
-h, --help                                                     Print help
-V, --version                                                  Print version
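
As a sketch, you can sweep the concurrency level to see how TTFT and throughput change under load (the model name and request counts below are illustrative):

for c in 1 2 4 8; do
    llmnop \
        --model "Qwen/Qwen3-4B" \
        --num-concurrent-requests "$c" \
        --max-num-completed-requests $((c * 8)) \
        --results-dir "results_c$c" \
        --no-progress
done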

Tokenizer

By default, llmnop uses the model name as the Hugging Face tokenizer name for token counting.

Use --tokenizer when the served model name doesn't match a Hugging Face tokenizer name, or when you want a different tokenizer for counting.

Examples:

# Served name differs from HF tokenizer name
llmnop --model gpt-oss:20b --tokenizer openai/gpt-oss-20b

# Force a common tokenizer for consistent counting
llmnop --model openai/gpt-oss-20b --tokenizer hf-internal-testing/llama-tokenizer

Example

export OPENAI_API_BASE=http://localhost:8000/v1
export OPENAI_API_KEY=token-abc123

llmnop \
    --model "Qwen/Qwen3-4B" \
    --mean-input-tokens 550 \
    --stddev-input-tokens 150 \
    --mean-output-tokens 150 \
    --stddev-output-tokens 10 \
    --max-num-completed-requests 2 \
    --timeout 600 \
    --num-concurrent-requests 1 \
    --results-dir "result_outputs"
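
Per-request records and the summary report are written to the results directory (result_outputs above). The exact file names and JSON schema are not shown here, but the files are plain JSON, so standard tools work for a first look, for example:

ls result_outputs/
jq . result_outputs/*.json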

License

This project is licensed under the Apache License 2.0.
