Input/output token count returned by Langchain-Google seem excessively high #491

Closed
boriswang01 opened this issue Sep 14, 2024 · 7 comments · Fixed by #545

boriswang01 commented Sep 14, 2024

Below is an output message returned by Gemini via LangChain:

"""
'Here's a reformatted version of the previous response, focusing on clarity and readability:\n\n## AI Model Performance & Financial Data\n\nLet's break down your questions one by one:\n\n1. MATH Score of Llama 400B\n\nLooking at the image, the MATH score for Llama-400b (early snapshot) is 57.8% using a 4-shot Chain of Thought (CoT) approach.\n\n2. Table 14.2: Betas for Financial Service Businesses\n\nWhile the exact table you requested wasn't found, the "finfirm09.pdf" document contains relevant data on betas for financial service businesses. This information is crucial for understanding the relationship between a company's stock price and the overall market.\n\nHere's the table from the document:\n\n| Category | US | Europe | Emerging Markets |\n|---------------------------|------|--------|-------------------|\n| Large Money Center Banks | 0.71 | 0.80 | 0.9 |\n| Small/Regional Banks | 0.91 | 0.98 | 1.05 |\n| Thrifts | 0.66 | 0.75 | 0.85 |\n| Brokerage Houses | 1.37 | 1.25 | 1.5 |\n| Investment Banks | 1.50 | 1.55 | 1.9 |\n| Life Insurance | 1.17 | 1.20 | 1.1 |\n| Property and Casualty Insurance Companies | 0.91 | 0.95 | 0.9 |\n\n3. Table 3: Default Spreads by Sovereign Ratings Class – September 2008\n\nThe "riskfreerate.pdf" document contains the requested Table 3, which details default spreads based on sovereign ratings. 
This information is essential for assessing credit risk and determining appropriate interest rates.\n\nHere's the table:\n\n| Sovereign Rating | Bonds/ CDS | Corporate Bonds |\n|------------------|------------|-----------------|\n| Aaa | 0.15% | 0.50% |\n| Aa1 | 0.30% | 0.80% |\n| Aa2 | 0.60% | 1.10% |\n| Aa3 | 0.80% | 1.20% |\n| A1 | 1.00% | 1.35% |\n| A2 | 1.30% | 1.45% |\n| A3 | 1.40% | 1.50% |\n| Baa1 | 1.70% | 1.70% |\n| Baa2 | 2.00% | 2.00% |\n| Baa3 | 2.25% | 2.60% |\n| Ba1 | 2.50% | 3.20% |\n| Ba2 | 3.00% | 3.50% |\n| Ba3 | 3.25% | 4.00% |\n| B1 | 3.50% | 4.50% |\n| B2 | 4.25% | 5.50% |\n| B3 | 5.00% | 6.50% |\n| Caa1 | 6.00% | 7.00% |\n| Caa2 | 6.75% | 9.00% |\n| Caa3 | 7.50% | 11.00% |\n\nThis table highlights the relationship between credit ratings and default spreads as of September 2008. As you can see, higher credit ratings generally correlate with lower default spreads, reflecting lower perceived risk. \n' response_metadata={'finish_reason': 'STOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, 
{'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 
'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, 
{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}]} id='run-f9bce28d-e44b-4445-af09-605412426ff2' usage_metadata={'input_tokens': 181773, 'output_tokens': 7814, 'total_tokens': 189587}
"""

At the end, usage_metadata reports 7814 output tokens, which seems far too high for a relatively small output (3004 characters, 310 words). The input token count looks similarly inflated. Both counters appear to be off by a factor of about 10. Is this a bug?

Also, does the safety_ratings output count towards the input/output token count and cost?
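
The arithmetic behind the "about 10x" estimate can be sketched as follows. This is only a rough sanity check: it assumes the common rule of thumb of about 4 characters per token for English text, which is an approximation and not an exact property of the Gemini tokenizer. The function names are made up for illustration.

```python
# Rough sanity check of a reported token count against the text length.
# ASSUMPTION: ~4 characters per token for English text (a rule of thumb only).
CHARS_PER_TOKEN = 4


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count."""
    return round(num_chars / chars_per_token)


def inflation_factor(reported_tokens: int, num_chars: int) -> float:
    """Ratio of the reported token count to the rough estimate."""
    return reported_tokens / estimate_tokens(num_chars)


# Numbers taken from the report above.
output_chars = 3004
reported_output_tokens = 7814

expected = estimate_tokens(output_chars)
factor = inflation_factor(reported_output_tokens, output_chars)
print(f"estimated tokens: {expected}, reported: {reported_output_tokens}, ratio: {factor:.1f}x")
```

A ratio far above 1 does not prove a bug by itself, since tables and markup can tokenize less efficiently than plain prose, but an order-of-magnitude gap is worth investigating.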

@langcarl langcarl bot added the investigate label Sep 14, 2024
@boriswang01 boriswang01 changed the title Output token count returned by Langchain-Google overly high Input/output token count returned by Langchain-Google seem excessively high Sep 16, 2024
@boriswang01
Author

@lkuligin Hey, could this bug be investigated soon? We want to start using Gemini ASAP, but this problem is blocking us because it throws off our token counters.

@efriis
Member

efriis commented Sep 19, 2024

could you add some code that reproduces this? Things that would be helpful:

  • which package are you on (langchain-google-genai or langchain-google-vertexai)
  • which version of that package are you on
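
One way to collect the requested package information is sketched below, using only the standard library. The two Google package names come from the question above; including `langchain-core` in the list is an added assumption, on the grounds that its version may also be relevant.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names from the question above; "langchain-core" is an added assumption.
PACKAGES = ("langchain-google-genai", "langchain-google-vertexai", "langchain-core")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions()
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting the printed lines into the issue answers both bullet points at once.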

@boriswang01
Author

Hey, just tested this issue again for Gemini-1.5-Pro-002.

Problem update:

  • The output token count is now returned correctly; however, the input token count is still inflated (by about 20x).
  • We're using the newest version of langchain-google-genai (2.0.0).

It would be hard for me to share the prompt that reproduces this because it contains confidential material, but please check on your end that the input token count is correct.

The newest Gemini production model is really good at language translation, and we're looking forward to integrating it ASAP.

@efriis
Member

efriis commented Sep 27, 2024

hey, you don't have to share your exact prompt, but e.g. this repro on latest seems to be working. A repro would be helpful for figuring out which feature you're using that might have a bug!

without a reproducible example, I'll close it as "unable to reproduce" next week

note that tools add to input tokens as well, in case that's the source of confusion: https://ai.google.dev/gemini-api/docs/tokens?lang=python#system-instructions-and-tools

[Screenshot, 2024-09-27 11:13:16 AM]
[Screenshot, 2024-09-27 11:17:26 AM]
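
Since the screenshots are not reproduced here, the following is a minimal sketch of what a non-confidential repro could look like, not the code from the screenshots. It assumes `langchain-google-genai` is installed and a `GOOGLE_API_KEY` is set in the environment. The prompt, the `run_repro` and `usage_ratio` names, and the idea of comparing a plain call against a streamed call are illustrative assumptions; the thread does not say whether the original report involved streaming.

```python
def usage_ratio(invoke_usage: dict, stream_usage: dict, key: str = "input_tokens") -> float:
    """How many times larger the streamed count is than the non-streamed one."""
    return stream_usage[key] / invoke_usage[key]


def run_repro(prompt: str = "Translate 'good morning' into French.",
              model: str = "gemini-1.5-pro-002") -> dict:
    """Call the model once with invoke() and once with stream(), returning both usages.

    Hypothetical helper: requires network access and a GOOGLE_API_KEY.
    """
    # Imported lazily so usage_ratio() works without the package installed.
    from langchain_google_genai import ChatGoogleGenerativeAI

    llm = ChatGoogleGenerativeAI(model=model)

    single = llm.invoke(prompt)

    # Merge streamed chunks with "+", which also merges their usage_metadata.
    merged = None
    for chunk in llm.stream(prompt):
        merged = chunk if merged is None else merged + chunk

    return {"invoke": single.usage_metadata, "stream": merged.usage_metadata}
```

Calling `usages = run_repro()` and then `usage_ratio(usages["invoke"], usages["stream"])` gives a single number to report: a value near 1 means both paths agree, while a large value would point at one specific code path.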

@boriswang01
Author

Hey, I will DM you a sample input/output (it contains proprietary company content). Please feel free to close this ticket if needed.

@lkuligin
Collaborator

+1, we need a reproducible example (ideally, try to reproduce the issue with a non-confidential prompt).
Please check that you're not submitting any multimodal input with your prompt, since it counts towards input tokens too.
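
A quick way to check a prompt for multimodal input is sketched below. It assumes the prompt is built from LangChain-style message content, which is either a plain string or a list of parts where each dict part carries a `"type"` key such as `"text"` or `"image_url"`. The helper name and the example content are hypothetical.

```python
def non_text_part_types(content) -> list:
    """Return the types of all non-text parts in a message's content.

    ASSUMPTION: content is a str, or a list whose items are str or dicts
    with a "type" key (e.g. "text", "image_url").
    """
    if isinstance(content, str):
        return []
    types = []
    for part in content:
        if isinstance(part, str):
            continue  # bare strings inside a list are plain text
        part_type = part.get("type", "unknown")
        if part_type != "text":
            types.append(part_type)
    return types


# Made-up example content with one text part and one image part.
example_content = [
    {"type": "text", "text": "Describe this chart."},
    {"type": "image_url", "image_url": "https://example.com/chart.png"},
]
print(non_text_part_types(example_content))
```

An empty result for every message in the prompt would rule out images or other media as the source of the extra input tokens.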

@efriis efriis reopened this Oct 8, 2024
@efriis
Member

efriis commented Oct 8, 2024

wrote a standard test for this in langchain-ai/langchain#27177
