Fix: Handle criteria returned as JSON string in JudgeAgent #196
Open
rajdeepmahal24 wants to merge 2 commits into langwatch:main
Fixes langwatch#161

## Problem

JudgeAgent intermittently fails with `AttributeError: 'str' object has no attribute 'values'` when the LLM returns the `criteria` field as a JSON string instead of a dictionary object. This occurs at lines 439 and 444 when the code calls `criteria.values()` without verifying that `criteria` is actually a dict.

## Root Cause

When the LLM is uncertain about the schema format (particularly with complex dynamic schemas using sanitized criterion text as property names), it sometimes serializes the nested `criteria` object as a JSON string rather than a proper dict.

## Solution

Add defensive parsing after extracting criteria from tool call arguments:

1. Check if `criteria` is a string
2. If yes, attempt to parse it with `json.loads()`
3. If parsing fails, log a warning and use empty dict as fallback
4. Additionally verify `criteria` is a dict before calling `.values()`

This ensures the code gracefully handles both formats:

- Direct dict: `{"criterion_1": "true", "criterion_2": "false"}`
- JSON string: `"{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}"`

## Testing

- Verified Python syntax with `python -m py_compile`
- Fix includes detailed logging for debugging
- Graceful fallback prevents test failures

## Impact

- Low risk: Only adds defensive parsing with fallback
- Fixes intermittent failures reported in issue langwatch#161
- No changes to normal execution path when criteria is already a dict
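To make the failure mode concrete, here is a minimal reproduction sketch. It does not use the actual `judge_agent.py` code, which is not shown here; the `raw_arguments` payload is a hypothetical stand-in modeled on the problematic response described above, where the nested `criteria` object arrives JSON-encoded a second time.

```python
import json

# Hypothetical tool-call arguments: the nested "criteria" object has been
# serialized to a JSON string before the outer object was serialized.
raw_arguments = json.dumps({
    "verdict": "success",
    "reasoning": "...",
    "criteria": json.dumps({"criterion_1": "true", "criterion_2": "false"}),
})

# Decoding the outer object does not decode the nested string.
args = json.loads(raw_arguments)
criteria = args.get("criteria", {})
print(type(criteria).__name__)

# Calling a dict method on the string raises the error from the report.
error_message = None
try:
    criteria.values()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The sketch shows why the failure is intermittent: the same code path works whenever the model emits `criteria` as an object, and breaks only when the value is double-encoded.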
77a92af to 9fdb87c
Summary
Fixes #161
This PR addresses an intermittent bug where `JudgeAgent` fails with `AttributeError: 'str' object has no attribute 'values'` when the LLM returns the `criteria` field as a JSON string instead of a dictionary object.
Problem
The error occurs at lines 439 and 444 in `judge_agent.py` when the code calls `criteria.values()` without verifying that `criteria` is actually a dict.
Root Cause
When the LLM is uncertain about the schema format (particularly with complex dynamic schemas using sanitized criterion text as property names), it sometimes serializes the nested `criteria` object as a JSON string rather than a proper dict.
Example of problematic LLM response:
    {
      "verdict": "success",
      "reasoning": "...",
      "criteria": "{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}"  // ❌ String instead of object
    }
Expected format:
    {
      "verdict": "success",
      "reasoning": "...",
      "criteria": {"criterion_1": "true", "criterion_2": "false"}  // ✅ Object
    }
Solution
This PR adds defensive parsing after extracting `criteria` from tool call arguments:
1. Check if `criteria` is a string
2. If yes, attempt to parse it with `json.loads()`
3. If parsing fails, log a warning and use an empty dict as fallback
4. Verify `criteria` is a dict before calling `.values()`
Benefits
✅ Graceful handling: Both dict and JSON string formats are now supported
✅ Detailed logging: Debug and warning messages help diagnose issues
✅ Safe fallback: Empty dict prevents test failures
✅ Low risk: Only adds defensive parsing, no changes to normal execution path
✅ Fixes intermittent failures: Addresses the root cause reported in #161
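The defensive parsing steps listed under Solution could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the helper name `normalize_criteria`, the logger setup, and the log messages are all hypothetical, and the real change is described as being written inline in `judge_agent.py`.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def normalize_criteria(criteria: Any) -> Dict[str, Any]:
    """Return criteria as a dict, decoding a JSON string if necessary.

    Hypothetical helper: falls back to an empty dict when the value
    cannot be interpreted as a JSON object.
    """
    if isinstance(criteria, str):
        try:
            criteria = json.loads(criteria)
        except json.JSONDecodeError:
            logger.warning("Could not parse criteria as JSON: %r", criteria)
            return {}
    # The decoded value may still be a list, number, or None, so check again.
    if not isinstance(criteria, dict):
        logger.warning(
            "Expected criteria to be a dict, got %s", type(criteria).__name__
        )
        return {}
    return criteria


as_dict = normalize_criteria({"criterion_1": "true", "criterion_2": "false"})
as_string = normalize_criteria('{"criterion_1": "true", "criterion_2": "false"}')
fallback = normalize_criteria("not valid json")
```

With this shape, a dict input passes through unchanged, so the normal execution path is unaffected. One trade-off worth weighing in review: the empty-dict fallback keeps `.values()` from raising, but it also means a malformed response yields no criteria results, so the warning log is the only signal that something went wrong.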
Testing
Verified Python syntax with `python -m py_compile`
Direct dict: `{"criterion_1": "true", "criterion_2": "false"}`
JSON string: `"{\"criterion_1\": \"true\", \"criterion_2\": \"false\"}"`
Changes
File modified: `python/scenario/judge_agent.py`
Lines added: 24 lines of defensive parsing (after line 435)
Impact: Low risk, backward compatible
Related Issues
Closes #161
Ready for review! This fix prevents the intermittent `AttributeError` while maintaining backward compatibility with existing tests.