A command-line tool to locally calculate the perplexity (PPL) of a given text using a specified language model.
> ...perplexity is a measure of uncertainty in the value of a sample from a discrete probability distribution. The larger the perplexity, the less likely it is that an observer can guess the value which will be drawn from the distribution.
This is not to be confused with Perplexity, the search engine product.
This repo largely follows the code provided in the excellent HuggingFace documentation on perplexity.
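To make the definition above concrete, here is a minimal, dependency-free sketch (illustrative, not code from this repo) of perplexity as the exponential of a distribution's Shannon entropy:

```python
import math

def perplexity(probs):
    """Perplexity of a discrete distribution: exp of its Shannon entropy."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)

# A uniform distribution over 4 outcomes is maximally hard to guess:
print(perplexity([0.25] * 4))               # -> 4.0
# A peaked distribution is easier to guess, so perplexity is lower:
print(perplexity([0.97, 0.01, 0.01, 0.01])) # -> ~1.18
```

For a language model, the same idea is applied per token: perplexity is the exponential of the average negative log-likelihood the model assigns to each token in the text.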
Supports CUDA, MLX (Mac M-series), and CPU inference on recurrent LLMs (Llama, Mistral, etc.) and encoder-decoder LLMs (BERT).

Not yet supported:

- Non-HF-hosted models (OpenAI, Anthropic, Gemini-series)
- Masked LLMs
- Clone this repository:

  ```shell
  git clone https://github.com/cezarc1/perplexity
  cd perplexity
  ```

- (Optional) If you plan to use the shell script, ensure you have `uv` installed; if not, the script will prompt you to install it. See here for more info on `uv`.
- Make the script executable:

  ```shell
  chmod +x calculate_perplexity.sh
  ```
- Run the script with text:

  ```shell
  ./calculate_perplexity.sh --model_id "google/gemma-2-2b-it" \
    --text "It's simple: Overspecialize, and you breed in weakness. It's slow death."
  ```

  Or with a text file:

  ```shell
  ./calculate_perplexity.sh --model_id "google/gemma-2-2b-it" \
    --text_file "path/to/your/text_file.txt"
  ```
Run the Python script directly with uv:

```shell
uv run --with-requirements requirements.txt calculate_perplexity.py \
  --model_id "google/gemma-2-2b-it" \
  --text "It's simple: Overspecialize, and you breed in weakness. It's slow death."
```

Or with a text file:

```shell
uv run --with-requirements requirements.txt calculate_perplexity.py \
  --model_id "google/gemma-2-2b-it" \
  --text_file "path/to/your/text_file.txt"
```

Alternatively, install the dependencies in a virtual environment and run the script with plain Python:

```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python calculate_perplexity.py --model_id "google/gemma-2-2b-it" \
  --text "It's simple: Overspecialize, and you breed in weakness. It's slow death."
```

Or with a text file:

```shell
python calculate_perplexity.py --model_id "google/gemma-2-2b-it" \
  --text_file "path/to/your/text_file.txt"
```

Arguments:

- `--model_id`: The ID of the model to use (e.g., "meta-llama/Meta-Llama-3-8B")
- `--model_type`: The type of model to use (choices: "recurrent", "encoder_decoder", "masked")
- `--text`: The text to calculate perplexity on
- `--text_file`: Path to a text file to calculate perplexity on
- `--stride` (optional): The stride length to use for calculating perplexity (default: 512)
Note: You must provide either --text or --text_file, but not both.
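The `--stride` option corresponds to the sliding-window evaluation strategy described in the HuggingFace perplexity guide: the text is scored in overlapping windows so that every token keeps some left context, and each window scores only the tokens the previous window has not. A rough, self-contained sketch of the windowing (the function name and `max_length` parameter are illustrative, not this repo's actual code):

```python
def stride_windows(n_tokens, max_length=1024, stride=512):
    """Yield (begin, end, target_len) windows over a token sequence.

    Each window spans up to `max_length` tokens, advances by `stride`,
    and scores only the `target_len` tokens not covered by the previous
    window, so every token is scored exactly once.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_length, n_tokens)
        target_len = end - prev_end  # tokens newly scored in this window
        yield begin, end, target_len
        prev_end = end
        if end == n_tokens:
            break

# For 1200 tokens: window (0, 1024) scores 1024 tokens,
# then window (512, 1200) scores the remaining 176.
print(list(stride_windows(1200)))
```

A smaller stride gives each scored token more preceding context (closer to fully-conditional perplexity) at the cost of more forward passes.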
- The shell script version uses `uv` to manage dependencies and run the Python script.
- The Python script version requires you to manually install the dependencies listed in `requirements.txt`.
- Make sure you have sufficient permissions to download and use the specified model on HuggingFace.