Input/output token counts returned by Langchain-Google seem excessively high #491
Comments
@lkuligin Hey, is it okay to get this bug investigated soon? We want to start using Gemini ASAP, but this problem is preventing us from doing so because it's throwing off our token counters.
Could you add some code that reproduces this? Things that would be helpful:
Hey, just tested this issue again with Gemini-1.5-Pro-002. Update on the problem:
It would be hard for me to share the prompt that reproduces this due to confidential material, but please check on your end that the input token count is correct. The newest Gemini production model is really good at language translation, and we're looking forward to integrating it ASAP.
Hey, you don't have to share your exact prompt, but e.g. this repro on latest seems to be working. A repro would be helpful for figuring out which feature you might be using that has a bug! Without a reproducible example, I'll close this as "unable to reproduce" next week. Note that tools add to input tokens as well, in case that is the source of confusion: https://ai.google.dev/gemini-api/docs/tokens?lang=python#system-instructions-and-tools
Hey, I will DM you a sample input/output (due to company-proprietary content). Please feel free to close this ticket if needed.
+1, we need a reproducible example (ideally, try to reproduce the issue with a non-confidential prompt).
Wrote a standard test for this in langchain-ai/langchain#27177
Below is an output message returned by Gemini via LangChain:
"""
'Here's a reformatted version of the previous response, focusing on clarity and readability:\n\n## AI Model Performance & Financial Data\n\nLet's break down your questions one by one:\n\n1. MATH Score of Llama 400B\n\nLooking at the image, the MATH score for Llama-400b (early snapshot) is 57.8% using a 4-shot Chain of Thought (CoT) approach.\n\n2. Table 14.2: Betas for Financial Service Businesses\n\nWhile the exact table you requested wasn't found, the "finfirm09.pdf" document contains relevant data on betas for financial service businesses. This information is crucial for understanding the relationship between a company's stock price and the overall market.\n\nHere's the table from the document:\n\n| Category | US | Europe | Emerging Markets |\n|---------------------------|------|--------|-------------------|\n| Large Money Center Banks | 0.71 | 0.80 | 0.9 |\n| Small/Regional Banks | 0.91 | 0.98 | 1.05 |\n| Thrifts | 0.66 | 0.75 | 0.85 |\n| Brokerage Houses | 1.37 | 1.25 | 1.5 |\n| Investment Banks | 1.50 | 1.55 | 1.9 |\n| Life Insurance | 1.17 | 1.20 | 1.1 |\n| Property and Casualty Insurance Companies | 0.91 | 0.95 | 0.9 |\n\n3. Table 3: Default Spreads by Sovereign Ratings Class – September 2008\n\nThe "riskfreerate.pdf" document contains the requested Table 3, which details default spreads based on sovereign ratings. 
This information is essential for assessing credit risk and determining appropriate interest rates.\n\nHere's the table:\n\n| Sovereign Rating | Bonds/ CDS | Corporate Bonds |\n|------------------|------------|-----------------|\n| Aaa | 0.15% | 0.50% |\n| Aa1 | 0.30% | 0.80% |\n| Aa2 | 0.60% | 1.10% |\n| Aa3 | 0.80% | 1.20% |\n| A1 | 1.00% | 1.35% |\n| A2 | 1.30% | 1.45% |\n| A3 | 1.40% | 1.50% |\n| Baa1 | 1.70% | 1.70% |\n| Baa2 | 2.00% | 2.00% |\n| Baa3 | 2.25% | 2.60% |\n| Ba1 | 2.50% | 3.20% |\n| Ba2 | 3.00% | 3.50% |\n| Ba3 | 3.25% | 4.00% |\n| B1 | 3.50% | 4.50% |\n| B2 | 4.25% | 5.50% |\n| B3 | 5.00% | 6.50% |\n| Caa1 | 6.00% | 7.00% |\n| Caa2 | 6.75% | 9.00% |\n| Caa3 | 7.50% | 11.00% |\n\nThis table highlights the relationship between credit ratings and default spreads as of September 2008. As you can see, higher credit ratings generally correlate with lower default spreads, reflecting lower perceived risk. \n' response_metadata={'finish_reason': 'STOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOPSTOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, (this block of four safety-rating entries repeats verbatim many more times)]} id='run-f9bce28d-e44b-4445-af09-605412426ff2' usage_metadata={'input_tokens': 181773, 'output_tokens': 7814, 'total_tokens': 189587}
"""
At the end, usage_metadata reports an output token count of 7814, which seems far too high for a relatively small output (3004 characters, 310 words), and likewise for the input tokens. It looks like the input and output token counts are both inflated by a factor of roughly 10x. Is this a bug?
Also, do the safety_ratings entries count toward the input/output token counts and cost?
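As a rough sanity check on the ~10x claim, here is a back-of-envelope calculation. It assumes the common rule of thumb of roughly 4 characters per token for English text; the exact ratio varies by tokenizer, so this is only an order-of-magnitude estimate, not a definitive measurement:

```python
# Back-of-envelope check of the reported output token count,
# assuming ~4 characters per token (a rule of thumb; varies by tokenizer).
reported_output_tokens = 7814   # from usage_metadata above
output_chars = 3004             # character count of the reply text

estimated_tokens = output_chars / 4              # ~751 tokens expected
inflation = reported_output_tokens / estimated_tokens

print(f"estimated ~{estimated_tokens:.0f} tokens, "
      f"reported {reported_output_tokens} (~{inflation:.1f}x)")
# → estimated ~751 tokens, reported 7814 (~10.4x)
```

The ~10x factor lines up suspiciously well with the 19 concatenated 'STOP' values in finish_reason, which suggests per-chunk usage metadata may be getting summed when streamed chunks are merged.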