ToCount is a lightweight and extensible Python library for estimating token counts from text inputs using both rule-based and machine learning methods. Designed for flexibility, speed, and accuracy, ToCount provides a unified interface for different estimation strategies, making it ideal for tasks like prompt analysis, token budgeting, and optimizing interactions with token-based systems.
- Check Python Packaging User Guide
- Run `pip install tocount==0.3`
- Download Version 0.3 or Latest Source
- Run `pip install .`
| Model Name | MAE | MSE | R² |
|---|---|---|---|
| RULE_BASED.UNIVERSAL | 106.70 | 381,647.81 | 0.8175 |
| RULE_BASED.GPT_4 | 152.34 | 571,795.89 | 0.7266 |
| RULE_BASED.GPT_3_5 | 161.93 | 652,923.59 | 0.6878 |

| Model Name | MAE | MSE | R² |
|---|---|---|---|
| TIKTOKEN_R50K.LINEAR_ALL | 71.38 | 183897.01 | 0.8941 |
| TIKTOKEN_R50K.LINEAR_ENGLISH | 23.35 | 14127.92 | 0.9887 |

| Model Name | MAE | MSE | R² |
|---|---|---|---|
| TIKTOKEN_CL100K.LINEAR_ALL | 41.85 | 47949.48 | 0.9545 |
| TIKTOKEN_CL100K.LINEAR_ENGLISH | 21.12 | 17597.20 | 0.9839 |

| Model Name | MAE | MSE | R² |
|---|---|---|---|
| TIKTOKEN_O200K.LINEAR_ALL | 25.53 | 20195.32 | 0.9777 |
| TIKTOKEN_O200K.LINEAR_ENGLISH | 20.24 | 15887.99 | 0.9859 |

ℹ️ The training and testing datasets are taken from Lmsys-chat-1m [1] and Wildchat [2].
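
For readers less familiar with the three metrics reported in the tables above, the following is a minimal sketch of how MAE, MSE, and R² are conventionally computed from paired actual and predicted token counts. It is not the project's own evaluation script: the function name `regression_metrics` and the sample numbers are made up for illustration, and the formulas are the standard textbook definitions.

```python
from statistics import mean


def regression_metrics(actual, predicted):
    """Return (MAE, MSE, R²) for paired actual and predicted token counts.

    Hypothetical helper for illustration; not part of ToCount.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [a - p for a, p in zip(actual, predicted)]

    # Mean absolute error: average size of the miss, in tokens.
    mae = mean(abs(e) for e in errors)
    # Mean squared error: penalizes large misses more heavily.
    mse = mean(e * e for e in errors)

    # R²: share of the variance in the actual counts explained by the predictions.
    actual_mean = mean(actual)
    ss_tot = sum((a - actual_mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot

    return mae, mse, r2


# Made-up numbers, for illustration only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
mae, mse, r2 = regression_metrics(actual, predicted)
print(f"MAE={mae:.2f} MSE={mse:.2f} R2={r2:.4f}")
```

With the sample data this prints `MAE=2.00 MSE=4.50 R2=0.9640`. Lower MAE and MSE and an R² closer to 1 indicate estimates that track the true token counts more closely.
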
>>> from tocount import estimate_text_tokens, TextEstimator
>>> estimate_text_tokens("How are you?", estimator=TextEstimator.RULE_BASED.UNIVERSAL)
4

Just file an issue and describe it. We'll check it ASAP! You can also send an email to tocount@openscilab.com.
- Please complete the issue template
You can also join our Discord server.
1- Zheng, Lianmin, et al. "Lmsys-chat-1m: A large-scale real-world llm conversation dataset." International Conference on Learning Representations (ICLR) 2024 Spotlights.
2- Zhao, Wenting, et al. "Wildchat: 1m chatgpt interaction logs in the wild." International Conference on Learning Representations (ICLR) 2024 Spotlights.
Give a ⭐️ if this project helped you!
If you like our project, and we hope that you do, please consider supporting us. Our project is not, and never will be, run for profit. We need the money just so we can continue doing what we do ;-)
