
fix: estimated prompt tokens are not equal to api response #405

Open
wants to merge 1 commit into base: main
Conversation

@aiperon commented Aug 18, 2023

I found an issue with the prompt token count calculation for GPT-3.5.
On every call to the `__count_tokens()` function, the returned value differs from the value returned by the API, and the difference is always equal to the number of messages in the conversation.
The most likely reason is that the condition `if key == "name":` never evaluates to `True`.
It looks like the condition should be replaced with `key == "role"`. I know it comes from the openai-cookbook article, but my assumption is that things have changed since it was written.

In addition, for the GPT-4 model the calculated count already equals the value returned by the API, so we should not add `tokens_per_name` for this model.

This PR fixes prompt count calculation for GPT-3.5.

Detailed examples (with the `STREAM=false` param) for the 3.5 and 4 models before the fix are below. After this fix, the calculated total and the API response are equal.

gpt-3.5-turbo-0613

| Chat msg | Conversation msg | After 1st msg | After 2nd msg |
|---|---|---|---|
| First question | `{ role: "system",` | 1 | 1 |
| | `content: "You are a helpful assistant." }` | 6 | 6 |
| | `{ role: "user",` | 1 | 1 |
| | `content: "What's your name" }` | 4 | 4 |
| Response | `{ role: "assistant",` | | 1 |
| | `content: "I am a helpful digital assistant and don't have a personal name. You can just call me "Assistant". How can I assist you today?" }` | | 29 |
| Second question | `{ role: "user",` | | 1 |
| | `content: "Is it OK?" }` | | 4 |
| Temporary total | | 12 | 47 |
| Msg count | | 2 | 4 |
| Per msg tokens | | 4 | 4 |
| Per msg * msg count | | 8 | 16 |
| Static add-on | | 3 | 3 |
| Calculated total | | 23 | 66 |
| `prompt_tokens` from API response | | 21 | 62 |

gpt-4-0613

| Chat msg | Conversation msg | After 1st msg | After 2nd msg |
|---|---|---|---|
| First question | `{ role: "system",` | 1 | 1 |
| | `content: "You are a helpful assistant." }` | 6 | 6 |
| | `{ role: "user",` | 1 | 1 |
| | `content: "What's your name" }` | 4 | 4 |
| Response | `{ role: "assistant",` | | 1 |
| | `content: "I'm OpenAI, a virtual assistant here to help answer your questions and provide information." }` | | 18 |
| Second question | `{ role: "user",` | | 1 |
| | `content: "Is it OK?" }` | | 4 |
| Temporary total | | 12 | 36 |
| Msg count | | 2 | 4 |
| Per msg tokens | | 3 | 3 |
| Per msg * msg count | | 6 | 12 |
| Static add-on | | 3 | 3 |
| Calculated total | | 21 | 51 |
| `prompt_tokens` from API response | | 21 | 51 |
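The counting scheme in the tables above can be sketched roughly as follows. This is a minimal sketch, not the repo's actual `__count_tokens()`: the name `count_prompt_tokens` and the `encode` parameter (standing in for a `tiktoken` encoder) are assumptions for illustration, and the per-model constants come from the tables.

```python
def count_prompt_tokens(messages, encode, tokens_per_message=4, tokens_per_name=-1):
    """Estimate prompt tokens cookbook-style (sketch).

    The proposed fix keys the tokens_per_name adjustment on "role",
    which every message has, instead of the optional "name" key, which
    never matched and left the gpt-3.5 estimate high by one token per
    message. For gpt-4, pass tokens_per_message=3 and tokens_per_name=0,
    since no adjustment is needed there.
    """
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encode(value))
            if key == "role":  # was: key == "name", which never evaluated to True
                num_tokens += tokens_per_name
    return num_tokens + 3  # static add-on: every reply is primed with 3 tokens


# Per-string token counts taken from the "After 1st msg" column above,
# so we can check the arithmetic without a real tokenizer:
counts = {
    "system": 1, "You are a helpful assistant.": 6,
    "user": 1, "What's your name": 4,
}
encode = lambda s: [0] * counts[s]
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's your name"},
]
count_prompt_tokens(messages, encode)  # 12 + 2*4 - 2*1 + 3 = 21, matching the API
```

With the gpt-4 constants (`tokens_per_message=3`, `tokens_per_name=0`) the same call yields 12 + 2*3 + 3 = 21, again matching the API response in the second table.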

@n3d1117 (Owner) commented Sep 11, 2023

Thanks @aiperon. I wonder if it would be worth opening an issue in the openai-cookbook repo?

@aiperon (Author) commented Sep 11, 2023

> Thanks @aiperon. I wonder if it would be worth opening an issue in the openai-cookbook repo?

I believe so.
If you approve my changes, I can create a pull request in their repo.

@n3d1117 (Owner) commented Sep 13, 2023

@aiperon Sorry, I'm a bit short on free time at the moment, so I won't be able to test your changes quickly. In the meantime, please go ahead and open the PR in their repo! I'll keep this PR open for further updates.
