fix: estimated prompt tokens are not equal to api response #405

aiperon · 2023-08-18T08:39:23Z

I found an issue with the prompt token count calculation for GPT-3.5.
On every call to the __count_tokens() function, the returned value differs from the prompt_tokens value returned by the API.
The difference is always equal to the number of messages in the conversation.
The most likely reason is that the condition if key == "name": never evaluates to True, since the conversation messages only have role and content keys.
It looks like the condition should be key == "role" instead. I know it comes from the openai-cookbook article, but my assumption is that things have changed since then.

In addition, for the GPT-4 model the calculated count is always equal to the value returned by the API.
So we should not add tokens_per_name for this model.

This PR fixes the prompt token count calculation for GPT-3.5.
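For illustration, here is a minimal Python sketch of the counting logic as described above; it is not the actual diff of this PR. The function name count_prompt_tokens and the injected encode callable are hypothetical (in practice encode would typically be tiktoken.encoding_for_model(model).encode). It assumes the cookbook's tokens_per_name = -1 for GPT-3.5 together with 4 tokens per message, and 3 tokens per message with no tokens_per_name for GPT-4.

```python
from typing import Callable, Dict, List


def count_prompt_tokens(
    messages: List[Dict[str, str]],
    model: str,
    encode: Callable[[str], List[int]],
) -> int:
    """Estimate prompt tokens as proposed in this PR (a sketch, not the real diff)."""
    if model.startswith("gpt-3.5"):
        tokens_per_message = 4
        tokens_per_name = -1  # assumption: cookbook value for gpt-3.5
    elif model.startswith("gpt-4"):
        tokens_per_message = 3
        tokens_per_name = 0  # per this PR: do not add tokens_per_name for GPT-4
    else:
        raise NotImplementedError(f"token counting not sketched for {model!r}")

    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encode(value))
            # The proposed fix: match "role" instead of "name".
            # Every message has a role, so this applies once per message.
            if key == "role":
                num_tokens += tokens_per_name
    num_tokens += 3  # static add-on, counted once per request
    return num_tokens
```

Fed with the role and content token counts from the tables below, this sketch returns 21 and 62 for gpt-3.5-turbo-0613 and 21 and 51 for gpt-4-0613, the same values as prompt_tokens from the API.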

Detailed examples (with STREAM=false) for the 3.5 and 4 models before the fix are below. After this fix, the calculated totals and the API responses are equal.

gpt-3.5-turbo-0613

| Chat msg | Conversation msg | After 1st msg | After 2nd msg |
| --- | --- | --- | --- |
| First question | { role: "system", | 1 | 1 |
| | content: "You are a helpful assistant." } | 6 | 6 |
| | { role: "user", | 1 | 1 |
| | content: "What's your name" } | 4 | 4 |
| Response | { role: "assistant", | | 1 |
| | content: "I am a helpful digital assistant and don't have a personal name. You can just call me "Assistant". How can I assist you today?" } | | 29 |
| Second question | { role: "user", | | 1 |
| | content: "Is it OK?" } | | 4 |

| | After 1st msg | After 2nd msg |
| --- | --- | --- |
| Subtotal (role + content tokens) | 12 | 47 |
| Msg count | 2 | 4 |
| Tokens per message | 4 | 4 |
| Tokens per message × msg count | 8 | 16 |
| Static add-on | 3 | 3 |
| Calculated total | 23 | 66 |
| prompt_tokens from API response | 21 | 62 |
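The "Calculated total" row for gpt-3.5-turbo-0613 is the role/content subtotal plus msg count × per-message tokens plus the static add-on. A quick Python check of those numbers (before the fix) shows the estimate overshoots the API's prompt_tokens by exactly the message count. The helper name calculated_total is hypothetical.

```python
STATIC_ADD_ON = 3
PER_MSG_TOKENS = 4  # gpt-3.5 per-message value used before the fix


def calculated_total(subtotal: int, msg_count: int) -> int:
    """Role/content subtotal + per-message overhead + static add-on."""
    return subtotal + msg_count * PER_MSG_TOKENS + STATIC_ADD_ON


# (subtotal, msg_count, prompt_tokens from API), copied from the table
rows = [(12, 2, 21), (47, 4, 62)]

for subtotal, msg_count, api_tokens in rows:
    estimate = calculated_total(subtotal, msg_count)
    # The gap between estimate and API value equals the number of messages
    print(f"estimate={estimate} api={api_tokens} "
          f"gap={estimate - api_tokens} msg_count={msg_count}")
```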

gpt-4-0613

| Chat msg | Conversation msg | After 1st msg | After 2nd msg |
| --- | --- | --- | --- |
| First question | { role: "system", | 1 | 1 |
| | content: "You are a helpful assistant." } | 6 | 6 |
| | { role: "user", | 1 | 1 |
| | content: "What's your name" } | 4 | 4 |
| Response | { role: "assistant", | | 1 |
| | content: "I'm OpenAI, a virtual assistant here to help answer your questions and provide information. " } | | 18 |
| Second question | { role: "user", | | 1 |
| | content: "Is it OK?" } | | 4 |

| | After 1st msg | After 2nd msg |
| --- | --- | --- |
| Subtotal (role + content tokens) | 12 | 36 |
| Msg count | 2 | 4 |
| Tokens per message | 3 | 3 |
| Tokens per message × msg count | 6 | 12 |
| Static add-on | 3 | 3 |
| Calculated total | 21 | 51 |
| prompt_tokens from API response | 21 | 51 |

n3d1117 · 2023-09-11T20:07:42Z

Thanks @aiperon. I wonder if it would be worth opening an issue in the openai-cookbook repo?

aiperon · 2023-09-11T20:36:43Z

> Thanks @aiperon. I wonder if it would be worth opening an issue in the openai-cookbook repo?

I believe it would.
If you approve my changes, I can create a pull request in their repo.

n3d1117 · 2023-09-13T21:41:35Z

@aiperon Sorry, I'm a bit short on free time at the moment, so I won't be able to test your changes quickly. In the meantime, please go ahead and open the PR in their repo! I'll keep this PR open for further updates.

fix: estimated prompt tokens are not equal to api response

56cecc0
