fix: strip markdown fences before Pydantic JSON validation in Converter #4641
Aftabbs wants to merge 1 commit into crewAIInc:main
LLMs commonly return JSON wrapped in markdown code blocks such as:
```json
{"key": "value"}
```
Pydantic's model_validate_json() expects raw JSON and raises
"Invalid JSON: expected value at line 1 column 1" when it encounters
the leading triple-backtick, causing long-term memory saves and task
evaluations backed by such LLMs to fail silently.
Changes:
- Add _strip_markdown_fences() helper that removes the surrounding ```json ... ``` or ``` ... ``` wrapper from a string, leaving raw JSON. Plain JSON strings are returned unchanged.
- Apply the helper before every model_validate_json() call in:
  - Converter.to_pydantic() (both the function-calling and plain-text paths, including the nested fallback branch)
  - validate_model()
- No changes to handle_partial_json(), which already extracts the JSON object via regex before validation.
Fixes crewAIInc#4509
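
For illustration, the pre-processing step described above could look roughly like the sketch below. It is not the code from this PR: the function name strip_markdown_fences is a hypothetical stand-in for _strip_markdown_fences(), the regex is copied from the diff under review, and json.loads() stands in for Pydantic's model_validate_json() so the snippet runs with the standard library only.

```python
import json
import re

# Pattern copied from the diff under review; everything else here is assumed.
_MARKDOWN_FENCE_PATTERN = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def strip_markdown_fences(text: str) -> str:
    """Hypothetical stand-in for the PR's _strip_markdown_fences() helper.

    Returns the body of a fenced block, or the input unchanged when the
    string is not wrapped in a markdown fence.
    """
    match = _MARKDOWN_FENCE_PATTERN.match(text.strip())
    return match.group(1) if match else text


fenced = '```json\n{"key": "value"}\n```'
plain = '{"key": "value"}'

# json.loads() stands in for model_validate_json(): both reject a leading fence.
parsed_fenced = json.loads(strip_markdown_fences(fenced))
parsed_plain = json.loads(strip_markdown_fences(plain))
print(parsed_fenced, parsed_plain)
```

Returning the input untouched when nothing matches keeps the helper safe to call on every path, which is what allows it to sit in front of every validation call.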
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    _JSON_PATTERN: Final[re.Pattern[str]] = re.compile(r"({.*})", re.DOTALL)
    _MARKDOWN_FENCE_PATTERN: Final[re.Pattern[str]] = re.compile(
        r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL
    )
Case-sensitive regex corrupts output for uppercase language identifiers
Medium Severity
_MARKDOWN_FENCE_PATTERN uses case-sensitive (?:json)?, which only matches lowercase "json". When an LLM returns ```JSON (uppercase), the optional group matches empty, and (.*?) captures the language identifier along with the JSON body — e.g. JSON\n{"key": "value"}. This corrupted string is then passed to model_validate_json, which fails. The existing extract_json_from_llm_response in json_parser.py already uses re.IGNORECASE for the same kind of pattern. Adding re.IGNORECASE to the flags (or matching any word-character language tag like \w*) would fix this.
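
The failure mode and both suggested fixes can be reproduced with a small standalone sketch. The first pattern is the one from the diff; the helper strip_with and the constant names are hypothetical and exist only for this demonstration.

```python
import re

# Pattern as written in the diff (case-sensitive).
CASE_SENSITIVE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)
# Same pattern with the suggested re.IGNORECASE flag added.
CASE_INSENSITIVE = re.compile(
    r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL | re.IGNORECASE
)
# Alternative from the review: accept any word-character language tag.
ANY_TAG = re.compile(r"^```\w*\s*(.*?)\s*```$", re.DOTALL)


def strip_with(pattern: re.Pattern, text: str) -> str:
    """Hypothetical helper: return the captured body, or the input unchanged."""
    match = pattern.match(text.strip())
    return match.group(1) if match else text


upper = '```JSON\n{"key": "value"}\n```'

# The case-sensitive pattern leaks the language tag into the captured body.
print(repr(strip_with(CASE_SENSITIVE, upper)))    # 'JSON\n{"key": "value"}'
# Either fix yields the raw JSON.
print(repr(strip_with(CASE_INSENSITIVE, upper)))  # '{"key": "value"}'
print(repr(strip_with(ANY_TAG, upper)))           # '{"key": "value"}'
```

One trade-off to weigh: the \w* variant also consumes a bare scalar written on the same line as the opening fence (for example ```true```), so re.IGNORECASE is the narrower change.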


Note
Low Risk
Low risk: adds a small pre-processing step before model_validate_json and backs it with regression tests; behavior only changes for markdown-fenced outputs that previously failed parsing.
Overview
Fixes JSON-to-Pydantic conversion failures when LLMs return JSON wrapped in markdown fences (e.g. ```json ... ```).
Adds _strip_markdown_fences() and applies it before every model_validate_json() call in Converter.to_pydantic() (including the partial-JSON fallback) and validate_model(), plus regression tests covering fenced and unfenced JSON parsing.
Written by Cursor Bugbot for commit 63d86c2.