Implement YAML mode and benchmark efficiency #1839

jxnl · 2025-10-07T00:57:52Z

Add YAML mode to the instructor library to enable LLMs to return structured data in YAML, improving token efficiency and readability.

This PR introduces instructor.Mode.YAML, allowing users to prompt LLMs for YAML output. Benchmarks show YAML is approximately 28.1% more token-efficient than pretty-printed JSON, while maintaining full Pydantic model compatibility and robust error handling.

Slack Thread

cursor · 2025-10-07T00:57:53Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
_{Learn more about Cursor Agents}

cloudflare-workers-and-pages · 2025-10-07T00:58:07Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful! View logs	instructor	`d95ca11`	Oct 08 2025, 12:30 AM

claude · 2025-10-07T00:58:07Z

Claude finished @jxnl's task —— View job

PR Review: YAML Mode Implementation

I've completed a comprehensive code review of this YAML mode implementation. Overall, this is a solid feature addition that follows instructor's architecture well, but there are several important issues to address.

✅ Strengths

Consistent Architecture: The implementation follows the established pattern perfectly:
- Added Mode.YAML to the enum (instructor/mode.py:26)
- Properly categorized as a JSON mode (instructor/mode.py:109)
- Integrated into all the right handler mappings across response.py and openai/utils.py
Complete Integration: YAML mode is properly integrated across the entire request/response lifecycle:
- Request handling (handle_yaml_mode in openai/utils.py:422-465)
- Response parsing (parse_yaml in function_calls.py:647-700)
- Error recovery (reask_yaml in openai/utils.py:81-103)
Robust YAML Parsing: The parse_yaml method includes thoughtful logic:
- Code block extraction with regex patterns for both yaml and yml
- Fallback to raw content detection with heuristics
- Proper error handling with descriptive messages
- Safe YAML loading with security considerations
Benchmarking Evidence: Concrete performance data showing 28.1% token efficiency improvement over pretty JSON.

⚠️ Critical Issues

1. Missing Dependency Management

Issue: The code imports yaml but PyYAML is not listed in pyproject.toml dependencies
Impact: This will cause runtime ImportError for users who don't have PyYAML installed
Fix needed: Add pyyaml as either a core dependency or optional dependency

2. No Test Coverage

Issue: No actual tests found for YAML mode functionality
Impact: No verification that the implementation works correctly with real LLM responses
Repository guideline: "Include tests for all changes" (CLAUDE.md:155)
Expected: Tests in tests/llm/test_openai/ following existing patterns

3. Import Error Handling

Issue: Missing graceful handling if PyYAML is not available
Impact: Hard crashes instead of informative error messages
Location: function_calls.py:654 - naked import yaml

🔧 Code Quality Issues

4. Code Duplication in Reask Handler

Issue: reask_yaml is nearly identical to reask_md_json
Location: openai/utils.py:81-103 vs 106-128
Improvement: Could be refactored to share common logic

5. Hard-coded Prompts

Issue: System message in handle_yaml_mode is hard-coded without flexibility
Location: openai/utils.py:436-445
Concern: No way to customize the prompt style or add provider-specific optimizations

6. Limited Error Context in Reask

Issue: YAML reask message could be more specific about YAML formatting requirements
Location: Line 99: "Correct your YAML ONLY RESPONSE, based on the following errors:\n{exception}"
Improvement: Could include YAML-specific guidance

📋 Missing Features

7. Provider Support

Issue: Only implemented for OpenAI provider
Gap: Other providers (Anthropic, Gemini, etc.) don't support YAML mode
Expected: Based on the architecture, should support major providers

8. Documentation Gaps

Missing: No integration with main docs in docs/ directory
Missing: No examples in examples/ directory following the pattern
Present: Only standalone README file

🔒 Security & Performance

9. YAML Security

Positive: Uses yaml.safe_load() which is secure
Good: Proper exception handling for malformed YAML

10. Regex Performance

Issue: Multiple regex operations in sequence could be optimized
Location: function_calls.py:670-680
Minor: Not critical but could be more efficient

📊 Benchmarking Script Issues

11. Benchmark Accuracy

Issue: Token estimation (len(text) // 4) is very rough
Improvement: Could use actual tokenizer for more accurate results
Note: Results are directionally correct but may not reflect real-world usage

🛠️ Recommended Actions

High Priority:

Add PyYAML dependency to pyproject.toml (core or optional extra)
Add comprehensive tests following tests/llm/test_openai/ patterns
Add graceful import error handling with helpful error messages

Medium Priority:
4. Extend to other providers (Anthropic, Gemini) for feature parity
5. Add proper documentation integration and examples
6. Refactor reask handlers to reduce duplication

Low Priority:
7. Optimize regex patterns and improve error messages
8. Add prompt customization options

💭 Architecture Feedback

The implementation demonstrates excellent understanding of instructor's architecture. The mode-based dispatch pattern is used correctly, and the integration points are all properly covered. The choice to treat YAML as a JSON mode is architecturally sound since both produce structured data that maps to Pydantic models.

🎯 Overall Recommendation

Approve with required changes. This is a valuable feature that addresses a real need (token efficiency), but the missing dependency and test coverage issues must be resolved before merging. The implementation quality is good and follows established patterns well.

Implements YAML mode as an alternative to JSON for LLM responses. Changes: - Added Mode.YAML enum value - Implemented YAML request handler and response parser - Added YAML mode to allowed modes for OpenAI provider - Improved system prompt for YAML responses - Added accurate token benchmark using tiktoken Token efficiency (vs JSON Pretty): - YAML Standard: +32.1% more efficient - JSON Compact: +40.8% more efficient (most efficient) Co-authored-by: jason <jason@jxnl.co>

claude · 2025-10-08T00:15:54Z

Claude finished @jxnl's task —— View job

PR Review: YAML Mode Implementation

I've completed a comprehensive code review of this YAML mode implementation. Overall, this is a well-architected feature that follows instructor's patterns correctly, but there are several critical issues that must be addressed before merging.

✅ Strengths

Excellent Architecture Adherence: The implementation perfectly follows instructor's established patterns:
- Added Mode.YAML to the enum (instructor/mode.py:26)
- Properly categorized as a JSON mode (instructor/mode.py:109)
- Integrated into all correct handler mappings (response.py:436,616 and openai/utils.py)
Complete Integration: YAML mode is properly integrated across the entire request/response lifecycle:
- Request handling (handle_yaml_mode in openai/utils.py:421-463)
- Response parsing (parse_yaml in function_calls.py:646-706)
- Error recovery (reask_yaml in openai/utils.py:80-102)
Robust YAML Parsing: The parse_yaml method includes thoughtful logic:
- Code block extraction with regex patterns for both yaml and yml
- Fallback to raw content detection with heuristics
- Proper error handling with descriptive messages
- Safe YAML loading (yaml.safe_load()) for security
Comprehensive Benchmarking: The accurate_token_benchmark.py provides solid evidence of token efficiency benefits with real tiktoken measurements.

⚠️ Critical Issues

1. Missing PyYAML Dependency 🚨

Issue: Code imports yaml but PyYAML is not in pyproject.toml dependencies
Impact: Runtime ImportError for users who don't have PyYAML installed
Location: function_calls.py:653, accurate_token_benchmark.py:8
Fix needed: Add pyyaml as either core or optional dependency

2. No Test Coverage 🚨

Issue: Zero tests found for YAML mode functionality
Impact: No verification that implementation works with real LLM responses
Repository guideline violation: "Include tests for all changes" (CLAUDE.md:155)
Expected: Tests in tests/llm/test_openai/ following existing patterns

3. Import Error Handling ⚠️

Issue: Missing graceful handling if PyYAML unavailable
Location: function_calls.py:653 - naked import yaml
Better approach: Conditional import with informative error message

🔧 Code Quality Issues

4. Code Duplication

Issue: reask_yaml (lines 80-102) nearly identical to reask_md_json (lines 105-127)
Improvement: Could extract common reask logic to reduce duplication

5. Hard-coded System Messages

Issue: YAML prompt in handle_yaml_mode is inflexible
Location: openai/utils.py:435-443
Limitation: No customization options for different use cases

6. Limited Provider Support

Issue: Only implemented for OpenAI provider
Gap: Other major providers (Anthropic, Gemini) don't support YAML mode
Expected: Based on architecture, should support multiple providers

📋 Missing Documentation & Examples

7. No Integration Documentation

Missing: Documentation in docs/ directory
Missing: Examples in examples/ directory following repo patterns
Present: Only standalone README (not integrated with main docs)

🔒 Security & Performance Assessment

✅ Security: Good use of yaml.safe_load() prevents code injection
✅ Performance: Regex patterns are reasonable, no obvious bottlenecks
⚠️ Token Estimation: Benchmark uses rough estimation fallback when tiktoken unavailable

📊 Benchmarking Analysis

The accurate_token_benchmark.py is well-designed:

Uses actual tiktoken for precise measurements
Tests diverse data structures (simple, nested, arrays)
Shows meaningful efficiency gains (varies by data type)
Graceful fallback when tiktoken unavailable

🛠️ Required Actions for Merge

High Priority (Blocking):

Add PyYAML dependency to pyproject.toml
Add comprehensive tests in tests/llm/test_openai/
Add graceful import handling with helpful error messages

Medium Priority:
4. Extend to other providers (Anthropic, Gemini) for feature parity
5. Add proper documentation and examples integration
6. Refactor duplicate reask logic

Low Priority:
7. Add prompt customization options
8. Optimize regex patterns for better performance

💭 Architecture Feedback

The implementation demonstrates excellent understanding of instructor's architecture. The mode-based dispatch pattern is used correctly, and treating YAML as a JSON mode is architecturally sound since both produce structured data for Pydantic validation.

🎯 Recommendation

Conditional Approve - This is a valuable feature addressing real token efficiency needs, but the missing dependency and test coverage are blocking issues. The code quality is good and follows established patterns well.

The 28.1% token efficiency claim appears credible based on the benchmark implementation, making this a worthwhile addition once the critical issues are resolved.

claude · 2025-10-08T00:20:34Z

Claude finished @jxnl's task —— View job

PR Code Review: YAML Mode Implementation

I've completed a comprehensive code review of this YAML mode implementation. Here's my detailed analysis:

Review Tasks:

Read repository guidelines and setup instructions
Analyze all 5 changed files for implementation quality
Check dependency management and imports
Evaluate architecture and integration patterns
Assess security considerations
Review error handling and edge cases
Check test coverage and documentation
Provide comprehensive feedback and recommendations

✅ Strengths

1. Excellent Architecture Adherence
The implementation perfectly follows instructor's established patterns:

Added Mode.YAML to the enum (instructor/mode.py:26)
Properly categorized as a JSON mode (instructor/mode.py:109)
Integrated into all correct handler mappings (response.py:436,616 and openai/utils.py)

2. Complete Integration
YAML mode is properly integrated across the entire request/response lifecycle:

Request handling (handle_yaml_mode in openai/utils.py:421-463)
Response parsing (parse_yaml in function_calls.py:646-706)
Error recovery (reask_yaml in openai/utils.py:80-102)

3. Robust YAML Parsing
The parse_yaml method includes thoughtful logic:

Code block extraction with regex patterns for both yaml and yml
Fallback to raw content detection with heuristics
Proper error handling with descriptive messages
Safe YAML loading (yaml.safe_load()) for security

4. Clean System Message Design
The YAML prompt in handle_yaml_mode provides clear instructions and proper schema integration.

🚨 Critical Issues

1. Missing PyYAML Dependency (BLOCKING)

Issue: Code imports yaml but PyYAML is not in pyproject.toml dependencies
Impact: Runtime ImportError for users who don't have PyYAML installed
Location: function_calls.py:653
Fix needed: Add pyyaml as either core or optional dependency

2. No Test Coverage (BLOCKING)

Issue: Zero tests found for YAML mode functionality
Impact: No verification that implementation works with real LLM responses
Repository guideline violation: "Include tests for all changes" (CLAUDE.md:155)
Expected: Tests in tests/llm/test_openai/ following existing patterns

3. Import Error Handling (HIGH)

Issue: Missing graceful handling if PyYAML unavailable
Location: function_calls.py:653 - naked import yaml
Better approach: Conditional import with informative error message

⚠️ Code Quality Issues

4. Code Duplication

Issue: reask_yaml (lines 80-102) nearly identical to reask_md_json (lines 105-127)
Improvement: Could extract common reask logic to reduce duplication

5. Hard-coded System Messages

Issue: YAML prompt in handle_yaml_mode is inflexible
Location: openai/utils.py:435-443
Limitation: No customization options for different use cases

6. Limited Provider Support

Issue: Only implemented for OpenAI provider
Gap: Other major providers (Anthropic, Gemini) don't support YAML mode
Expected: Based on architecture, should support multiple providers

📋 Missing Documentation & Examples

7. No Integration Documentation

Missing: Documentation in docs/ directory
Missing: Examples in examples/ directory following repo patterns
Present: Only standalone implementation (not integrated with main docs)

🔒 Security & Performance Assessment

✅ Security: Good use of yaml.safe_load() prevents code injection
✅ Performance: Regex patterns are reasonable, no obvious bottlenecks
⚠️ Token Estimation: Implementation looks sound based on code analysis

🛠️ Required Actions for Merge

High Priority (Blocking):

Add PyYAML dependency to pyproject.toml
Add comprehensive tests in tests/llm/test_openai/
Add graceful import handling with helpful error messages

Medium Priority:
4. Extend to other providers (Anthropic, Gemini) for feature parity
5. Add proper documentation and examples integration
6. Refactor duplicate reask logic

Low Priority:
7. Add prompt customization options
8. Optimize regex patterns for better performance

💭 Architecture Feedback

The implementation demonstrates excellent understanding of instructor's architecture. The mode-based dispatch pattern is used correctly, and treating YAML as a JSON mode is architecturally sound since both produce structured data for Pydantic validation.

The choice to integrate at the same level as other JSON modes (JSON, MD_JSON, JSON_SCHEMA) is appropriate and maintains consistency with the existing codebase.

🎯 Recommendation

Conditional Approve - This is a valuable feature addressing real token efficiency needs, but the missing dependency and test coverage are blocking issues. The code quality is good and follows established patterns well.

Once the critical issues are resolved, this will be a solid addition to the instructor library that provides meaningful performance benefits while maintaining the library's high standards for reliability and usability.

github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request python Pull requests that update python code size:M This PR changes 30-99 lines, ignoring generated files. labels Oct 7, 2025

jxnl force-pushed the cursor/implement-yaml-mode-and-benchmark-efficiency-5b08 branch from 735c5f1 to 0578dd8 Compare October 8, 2025 00:15

Delete accurate_token_benchmark.py

d95ca11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Implement YAML mode and benchmark efficiency #1839

Implement YAML mode and benchmark efficiency #1839

Uh oh!

jxnl commented Oct 7, 2025

Uh oh!

cursor bot commented Oct 7, 2025

Uh oh!

cloudflare-workers-and-pages bot commented Oct 7, 2025 •

edited

Loading

Uh oh!

claude bot commented Oct 7, 2025 •

edited

Loading

Uh oh!

claude bot commented Oct 8, 2025 •

edited

Loading

Uh oh!

claude bot commented Oct 8, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Implement YAML mode and benchmark efficiency #1839

Are you sure you want to change the base?

Implement YAML mode and benchmark efficiency #1839

Uh oh!

Conversation

jxnl commented Oct 7, 2025

Uh oh!

cursor bot commented Oct 7, 2025

Uh oh!

cloudflare-workers-and-pages bot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying with Cloudflare Workers

Uh oh!

claude bot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: YAML Mode Implementation

✅ Strengths

⚠️ Critical Issues

🔧 Code Quality Issues

📋 Missing Features

🔒 Security & Performance

📊 Benchmarking Script Issues

🛠️ Recommended Actions

💭 Architecture Feedback

🎯 Overall Recommendation

Uh oh!

claude bot commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: YAML Mode Implementation

✅ Strengths

⚠️ Critical Issues

🔧 Code Quality Issues

📋 Missing Documentation & Examples

🔒 Security & Performance Assessment

📊 Benchmarking Analysis

🛠️ Required Actions for Merge

💭 Architecture Feedback

🎯 Recommendation

Uh oh!

claude bot commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Review: YAML Mode Implementation

✅ Strengths

🚨 Critical Issues

⚠️ Code Quality Issues

📋 Missing Documentation & Examples

🔒 Security & Performance Assessment

🛠️ Required Actions for Merge

💭 Architecture Feedback

🎯 Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cloudflare-workers-and-pages bot commented Oct 7, 2025 •

edited

Loading

claude bot commented Oct 7, 2025 •

edited

Loading

claude bot commented Oct 8, 2025 •

edited

Loading

claude bot commented Oct 8, 2025 •

edited

Loading