Skip to content

fix: strip markdown fences before Pydantic JSON validation in Converter#4641

Open
Aftabbs wants to merge 1 commit intocrewAIInc:mainfrom
Aftabbs:fix/markdown-fences-pydantic-validation
Open

fix: strip markdown fences before Pydantic JSON validation in Converter#4641
Aftabbs wants to merge 1 commit intocrewAIInc:mainfrom
Aftabbs:fix/markdown-fences-pydantic-validation

Conversation

@Aftabbs
Copy link

@Aftabbs Aftabbs commented Feb 28, 2026

LLMs commonly return JSON wrapped in markdown code blocks such as:

```json
{"key": "value"}
```

Pydantic's model_validate_json() expects raw JSON and raises "Invalid JSON: expected value at line 1 column 1" when it encounters the leading triple-backtick, causing long-term memory saves and task evaluations backed by such LLMs to fail silently.

Changes:

  • Add _strip_markdown_fences() helper that removes the surrounding json ... or ... wrapper from a string, leaving raw JSON. Plain JSON strings are returned unchanged.
  • Apply the helper before every model_validate_json() call in:
    • Converter.to_pydantic() (both the function-calling and plain-text paths, including the nested fallback branch)
    • validate_model()
  • No changes to handle_partial_json(), which already extracts the JSON object via regex before validation.

Fixes #4509


Note

Low Risk
Low risk: adds a small pre-processing step before model_validate_json and backs it with regression tests; behavior only changes for markdown-fenced outputs that previously failed parsing.

Overview
Fixes JSON-to-Pydantic conversion failures when LLMs return JSON wrapped in markdown fences (e.g. json ... ).

Adds _strip_markdown_fences() and applies it before every model_validate_json() call in Converter.to_pydantic() (including the partial-JSON fallback) and validate_model(), plus regression tests covering fenced and unfenced JSON parsing.

Written by Cursor Bugbot for commit 63d86c2. This will update automatically on new commits. Configure here.

LLMs commonly return JSON wrapped in markdown code blocks such as:

    ```json
    {"key": "value"}
    ```

Pydantic's model_validate_json() expects raw JSON and raises
"Invalid JSON: expected value at line 1 column 1" when it encounters
the leading triple-backtick, causing long-term memory saves and task
evaluations backed by such LLMs to fail silently.

Changes:
- Add _strip_markdown_fences() helper that removes the surrounding
  ```json ... ``` or ``` ... ``` wrapper from a string, leaving raw
  JSON. Plain JSON strings are returned unchanged.
- Apply the helper before every model_validate_json() call in:
  - Converter.to_pydantic() (both the function-calling and plain-text
    paths, including the nested fallback branch)
  - validate_model()
- No changes to handle_partial_json(), which already extracts the
  JSON object via regex before validation.

Fixes crewAIInc#4509
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Free Tier Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

_JSON_PATTERN: Final[re.Pattern[str]] = re.compile(r"({.*})", re.DOTALL)
_MARKDOWN_FENCE_PATTERN: Final[re.Pattern[str]] = re.compile(
r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case-sensitive regex corrupts output for uppercase language identifiers

Medium Severity

_MARKDOWN_FENCE_PATTERN uses case-sensitive (?:json)?, which only matches lowercase "json". When an LLM returns ```JSON (uppercase), the optional group matches empty, and (.*?) captures the language identifier along with the JSON body — e.g. JSON\n{"key": "value"}. This corrupted string is then passed to model_validate_json, which fails. The existing extract_json_from_llm_response in json_parser.py already uses re.IGNORECASE for the same kind of pattern. Adding re.IGNORECASE to the flags (or matching any word-character language tag like \w*) would fix this.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Pydantic Validation error while saving in Memory

1 participant