Open
Labels
enhancement: New feature or request
Description
Problem
The current schema validation flags 1,573 responses (98.5% of files) as "invalid_schema", but these are valid JSON responses with useful data. They simply use different structures than the one requested.
Current Schema Expectation
The prompt requests a nested dict structure:
{
  "human": {
    "person": {
      "count": 1,
      "location": "center",
      "color": ["blue"],
      "size": "medium",
      "description": "..."
    }
  }
}
What Models Actually Return
Format 1: Array structure (most common)
{
  "human": [
    {"item": "person", "count": 12, "location": "background", ...}
  ]
}
Format 2: Grouped attributes
{
  "human": {
    "item names": ["person"],
    "attributes": {"count": 1, ...}
  }
}
Both formats contain all required information and are valid JSON; they just don't match the exact nesting we requested.
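To make the three shapes concrete, here is a minimal sketch of how a category value (for example, the value under "human") could be classified. The function name `detect_format` and the returned labels are hypothetical, and the key names "item", "item names", and "attributes" are taken only from the examples above; real responses may use other variants.

```python
from typing import Any


def detect_format(category_value: Any) -> str:
    """Classify the shape of one top-level category value.

    Returns "nested", "array", "grouped", or "unknown".
    """
    if isinstance(category_value, list):
        # Format 1: a non-empty list of dicts, each naming its item under "item".
        if category_value and all(
            isinstance(entry, dict) and "item" in entry for entry in category_value
        ):
            return "array"
        return "unknown"

    if isinstance(category_value, dict):
        # Format 2: a list of names plus one shared attributes dict.
        if "item names" in category_value and "attributes" in category_value:
            return "grouped"
        # Requested format: item name -> attribute dict.
        if category_value and all(
            isinstance(attrs, dict) for attrs in category_value.values()
        ):
            return "nested"

    return "unknown"
```

Counting the labels this returns over the flagged responses would show how many fall into each format before deciding which ones to accept.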
Impact
- Schema validation is disabled in the quality check (PR #7, "Add quality control tools for annotation validation") to avoid false positives
- Real issues: 143 problematic responses (0.5%), mostly in structured_inventory
  - 66 too_long (corruption)
  - 141 json_parse_error
  - 66 repetitive_pattern (corruption)
  - 2 empty_response
Proposed Solutions
- Relax schema to accept both nested dict and array formats
- Update prompt to be more specific about exact structure needed
- Add post-processing to normalize different valid formats to canonical structure
- Accept multiple formats and document them as valid alternatives
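The post-processing option could look roughly like the sketch below, which converts the array and grouped formats into the requested nested dict. The function names are hypothetical. It also assumes that, in the grouped format, the shared "attributes" apply to every name in "item names"; the examples in this issue do not confirm that.

```python
import copy
from typing import Any, Dict


def normalize_category(value: Any) -> Dict[str, Any]:
    """Convert one category value to the {item_name: attributes} shape."""
    if isinstance(value, list):
        # Array format: move each entry's "item" value up to become the key.
        # Note: entries with the same item name overwrite earlier ones.
        result: Dict[str, Any] = {}
        for entry in value:
            if not isinstance(entry, dict) or "item" not in entry:
                raise ValueError(f"unrecognized array entry: {entry!r}")
            result[entry["item"]] = {k: v for k, v in entry.items() if k != "item"}
        return result

    if isinstance(value, dict):
        if "item names" in value and "attributes" in value:
            # Grouped format. Assumption: attributes are shared by all names.
            return {
                name: copy.deepcopy(value["attributes"])
                for name in value["item names"]
            }
        if all(isinstance(attrs, dict) for attrs in value.values()):
            # Already in the requested nested format.
            return value

    raise ValueError(f"unrecognized category structure: {value!r}")


def normalize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize every top-level category of a parsed JSON response."""
    return {category: normalize_category(v) for category, v in response.items()}
```

For example, `normalize_response({"human": [{"item": "person", "count": 12}]})` yields `{"human": {"person": {"count": 12}}}`. Raising on unrecognized shapes keeps genuinely malformed responses visible instead of silently passing them through.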
Related
- Issue #4, "Should response be limited or sanitized somehow?" (response sanitization)
- PR #7, "Add quality control tools for annotation validation" (quality control tools)
Schema validation is useful for catching real structural errors, but the current implementation is too strict and treats valid alternatives as errors.
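As a rough illustration of a check that targets real structural errors, the sketch below validates a response that is already in the requested nested form and reports only missing attributes or wrong container types. The name `find_structural_errors` is hypothetical, and treating all five attributes from the prompt example as required is an assumption.

```python
from typing import Any, Dict, List

# Assumption: every attribute shown in the prompt example is required.
REQUIRED_KEYS = {"count", "location", "color", "size", "description"}


def find_structural_errors(canonical: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    for category, items in canonical.items():
        if not isinstance(items, dict):
            errors.append(f"{category}: expected a dict of items")
            continue
        for name, attrs in items.items():
            if not isinstance(attrs, dict):
                errors.append(f"{category}.{name}: expected a dict of attributes")
                continue
            # Report missing attribute names in a stable (sorted) order.
            missing = sorted(REQUIRED_KEYS - set(attrs))
            if missing:
                errors.append(f"{category}.{name}: missing {', '.join(missing)}")
    return errors
```

Returning a list of messages rather than a single pass/fail flag makes it easier to separate incomplete responses from ones that only differ in nesting.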