Token Count

Token Count is a command-line utility that counts the tokens in a text string or in files, much as the Unix wc utility counts words and lines. It tokenizes with the OpenAI tiktoken library, so its counts follow the tokenizer of gpt-3.5-turbo or of any other OpenAI model you select.

Installation

To install Token Count, run the following command in your terminal:

pip install 'git+https://github.com/kkew3/token-count.git'

You can also install it as a standalone executable using pipx or uv. With uv:

uv tool install 'git+https://github.com/kkew3/token-count.git'

Usage - Python Library

from token_count import TokenCount

# Create a counter bound to one model's tokenizer.
tc = TokenCount(model_name="gpt-3.5-turbo")

# Count the tokens in a string.
text = "Your text here"
tokens = tc.num_tokens_from_string(text)
print(f"Tokens in the string: {tokens}")

# Count the tokens in a file.
file_path = "path/to/your/file.txt"
tokens = tc.num_tokens_from_file(file_path)
print(f"Tokens in the file: {tokens}")
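For readers curious what a token count amounts to, the following is a minimal sketch that calls tiktoken directly. It is not taken from this project's source: the function name `count_tokens` and its `encoding` parameter are hypothetical, and the project's own implementation may differ. Only `tiktoken.encoding_for_model` and the encoding's `encode` method are real tiktoken calls.

```python
def count_tokens(text, model_name="gpt-3.5-turbo", encoding=None):
    """Return the number of tokens in `text`.

    `encoding` (hypothetical parameter) is any object with an
    `encode(str) -> list` method. When omitted, the tiktoken encoding
    registered for `model_name` is looked up.
    """
    if encoding is None:
        # Imported lazily so the function can also be used with a
        # stand-in encoder when tiktoken is not installed.
        import tiktoken

        encoding = tiktoken.encoding_for_model(model_name)
    # A token count is simply the length of the encoded sequence.
    return len(encoding.encode(text))
```

Calling `count_tokens("Your text here")` requires tiktoken to be installed, and tiktoken may need to download the encoding data on first use. Passing an already-built encoding avoids repeating the lookup when counting many strings.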

Usage - Command Line

Examples:

Count tokens in a text string:

echo -n "Your text here" | token-count

Count tokens in a file:

token-count path/to/your/file.txt

Count tokens in all Python files, skipping anything under ./venv:

find . -type f -name '*.py' -not -path './venv/*' -print0 \
    | token-count --files-from=- --null
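The pipeline above uses `-print0` together with `--null` so that file names are separated by NUL bytes rather than newlines, which keeps names containing spaces or newlines intact. The sketch below illustrates the idea in Python. The helper `parse_null_separated` is a hypothetical name used for illustration and is not necessarily how token-count reads its input.

```python
import os


def parse_null_separated(data: bytes) -> list[str]:
    """Split NUL-delimited bytes (as written by `find -print0`) into paths."""
    # `find -print0` terminates every name with NUL, so splitting leaves an
    # empty final chunk; empty chunks are dropped.
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


# The second name contains a newline. Splitting on newlines would cut it
# in two, while splitting on NUL keeps it whole.
raw = b"./a.py\0./odd\nname.py\0"
print(parse_null_separated(raw))
```

NUL is a safe separator because it is the one byte that cannot appear inside a path on Unix-like systems.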

You can also pass any OpenAI model name (for example, gpt-4) to count tokens with that model's tokenizer. The default is "gpt-3.5-turbo".

echo -n "Your text here" | token-count --model_name "gpt-4"

License

This project is licensed under the MIT License.
