DuckDB JSON Schema Extension by Query.Farm
The JSON Schema extension, developed by Query.Farm, brings powerful JSON schema validation and data enrichment capabilities directly to your SQL queries in DuckDB. Validate JSON data against schemas, ensure data integrity, and automatically apply default values using the industry-standard JSON Schema specification without leaving your database environment.
The JSON Schema extension is perfect for:
- Data validation: Ensure JSON data conforms to predefined schemas and business rules
- API data validation: Validate incoming API payloads and responses against contracts
- Data enrichment: Automatically apply default values to incomplete JSON documents
- Schema compliance: Verify that JSON schemas themselves are valid and well-formed
- Data quality assurance: Implement data quality checks in ETL pipelines
- Configuration validation: Ensure configuration files meet required specifications
- Document standardization: Apply consistent structure and defaults to JSON documents
- Data governance: Enforce data contracts and schemas across your organization
json_schema is a DuckDB Community Extension.
You can now use this by using this SQL:
INSTALL json_schema FROM community;
LOAD json_schema;Validates that a JSON schema is well-formed and follows the JSON Schema specification.
Usage:
-- Validate a well-formed schema
SELECT json_schema_validate_schema('{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "User Profile",
"type": "object",
"properties": {
"id": {
"type": "integer",
"description": "A unique identifier for the user"
}
}
}') as valid;
┌─────────┐
│ valid │
│ boolean │
├─────────┤
│ true │
└─────────┘Validates a JSON value against a JSON schema, returning true if the data conforms to the schema.
Usage:
-- Validate JSON data against a schema
SELECT json_schema_validate('{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "User Profile",
"type": "object",
"properties": {
"id": {
"type": "integer",
"description": "A unique identifier for the user"
}
}
}', {'id': 5}) as valid;
┌─────────┐
│ valid │
│ boolean │
├─────────┤
│ true │
└─────────┘Calculates the JSON patch operations needed to make the JSON data conform to the schema by adding missing default values.
Usage:
-- Generate patch to add missing default values
SELECT json_schema_patch('{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "User Profile",
"type": "object",
"properties": {
"id": {
"type": "integer",
"description": "A unique identifier for the user",
"default": 5
}
}
}', {'name': 'George'}) as patch;
┌───────────────────────────────────────┐
│ patch │
│ json │
├───────────────────────────────────────┤
│ [{"op":"add","path":"/id","value":5}] │
└───────────────────────────────────────┘Updates the JSON data to conform to the schema by applying default values for missing properties.
Usage:
-- Update JSON data with schema defaults
SELECT json_schema_update('{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "User Profile",
"type": "object",
"properties": {
"id": {
"type": "integer",
"description": "A unique identifier for the user",
"default": 5
}
}
}', {'name': 'George'}) as updated;
┌──────────────────────────┐
│ updated │
│ json │
├──────────────────────────┤
│ {"id":5,"name":"George"} │
└──────────────────────────┘Parameters:
schema: A JSON Schema string following the JSON Schema specificationjson_data: JSON data to validate or update (for validation and update functions)
JSON Schema Support: The JSON Schema extension supports the JSON Schema specification, including:
- Data types:
string,number,integer,boolean,array,object,null - Validation keywords:
required,properties,type,format,minimum,maximum - Default values:
defaultkeyword for automatic value assignment - Schema metadata:
title,description,$schema - Draft versions: Supports modern JSON Schema drafts
The JSON Schema extension supports a comprehensive set of JSON Schema features:
- Primitive types:
string,number,integer,boolean,null - Complex types:
object,array - Type validation: Ensures data matches expected types
- Required fields:
requiredarray to specify mandatory properties - Property definitions:
propertiesto define object structure - Numeric constraints:
minimum,maximum,multipleOf - String constraints:
minLength,maxLength,pattern,format - Array constraints:
minItems,maxItems,uniqueItems
- Automatic enrichment:
defaultkeyword automatically populates missing values - Nested defaults: Support for default values in nested objects and arrays
- Type-aware defaults: Ensures default values match property types
- Design schemas first: Define your JSON schemas before implementing data validation
- Use meaningful descriptions: Include
descriptionfields to document your schema intent - Leverage default values: Use
defaultproperties to ensure data completeness - Validate schemas: Always use
json_schema_validate_schema()to verify schema correctness - Handle validation failures: Plan for cases where data doesn't conform to schemas
- Version your schemas: Use
$schemato specify which JSON Schema draft you're using - Test with real data: Validate schemas against actual data samples during development
- Performance considerations: Complex schemas with deeply nested structures may impact performance
The JSON Schema extension provides clear error messages for common issues:
- Invalid schemas: Detailed messages about schema syntax or structure problems
- Validation failures: Specific information about which data doesn't conform to the schema
- Type mismatches: Clear indication when data types don't match schema expectations
- Missing required fields: Identification of which required properties are absent
The JSON Schema extension is open source and developed by Query.Farm. Contributions are welcome!