Token-efficient schema definition for getting structured output from LLMs.
- Compact schema definition: StructLM uses a custom object notation that is more compact and more token-efficient than JSON Schema.
- Clear and readable: StructLM's schema definition is human-readable and closer to natural TypeScript syntax. See SPECIFICATION.md for the full specification.
- More expressive validation: Validations are defined as functions and are serialized to be sent to LLMs.
- No accuracy loss: Despite being more compact, StructLM does not lose accuracy when generating structured output compared to JSON Schema. See BENCHMARKS.md for more details on our benchmarks.
- Lightweight: Zero dependencies, focused solely on runtime schema definition and output validation.
- Type-safety: StructLM provides full Zod-like TypeScript type inference at compile time and assertions at run time.
This is a benchmark of StructLM vs JSON Schema, using Claude 3.5 Haiku. For the full benchmark, see BENCHMARKS.md.
- JSON-Schema: 414 tokens (average)
- StructLM: 222 tokens (average)
- Reduction: 46.4% (average)
- Accuracy: Equal
- JSON-Schema: 1460 tokens (average)
- StructLM: 610 tokens (average)
- Reduction: 58.2% (average)
- Accuracy: StructLM is slightly better (+0.4% on average)
- JSON-Schema: 852 tokens (average)
- StructLM: 480 tokens (average)
- Reduction: 43.7% (average)
- Accuracy: Equal
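As a quick sanity check, the reduction figures above can be reproduced from the reported average token counts. This is only a sketch: `tokenReduction` is an illustrative helper, not part of StructLM, and it assumes each reduction is computed from the two averages listed with it.

```typescript
// Percentage of tokens saved when moving from a JSON Schema prompt to a StructLM prompt.
// Illustrative helper only; it is not exported by StructLM.
function tokenReduction(jsonSchemaTokens: number, structlmTokens: number): number {
  if (jsonSchemaTokens <= 0) {
    throw new RangeError("jsonSchemaTokens must be positive");
  }
  return ((jsonSchemaTokens - structlmTokens) / jsonSchemaTokens) * 100;
}

// Averages reported above, as [JSON Schema tokens, StructLM tokens].
const reported: Array<[number, number]> = [
  [414, 222],
  [1460, 610],
  [852, 480],
];

for (const [json, compact] of reported) {
  console.log(`${json} -> ${compact}: ${tokenReduction(json, compact).toFixed(1)}% fewer tokens`);
}
// prints 46.4%, 58.2%, and 43.7% for the three rows
```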
npm install structlm
import { s } from 'structlm';
// Define a user schema
const userSchema = s.object({
name: s.object({
first: s.string(),
last: s.string()
}),
age: s.number(),
active: s.boolean(),
tags: s.array(s.string())
});
// Generate schema description for LLM
console.log(userSchema.stringify());
// Output: "{ name: { first: string, last: string }, age: number, active: boolean, tags: [string] }"
// Parse and validate JSON data
const userData = userSchema.parse('{"name":{"first":"John","last":"Doe"},"age":30,"active":true,"tags":["developer","typescript"]}');
// Returns: { name: { first: "John", last: "Doe" }, age: 30, active: true, tags: ["developer", "typescript"] }
Here's a complete example showing how to use StructLM with an LLM to extract structured data:
import { s } from 'structlm';
// 1. Define your schema
const contactSchema = s.object({
name: s.string(),
email: s.string().validate(email => email.includes('@')),
phone: s.string().optional(),
company: s.string()
});
// 2. Create your prompt with the schema
const text = "Contact John Doe at john@example.com or call (555) 123-4567. He works at Tech Corp.";
const prompt = `
Extract contact information from the following text and return it as JSON matching this structure:
${contactSchema.stringify()}
Text: "${text}"
Return only the JSON object, no additional text.`;
// The schema.stringify() outputs:
// { name: string, email: string /* email=>email.includes('@') */, phone: string /* optional */, company: string }
// 3. Send prompt to LLM (the LLM returns this JSON string)
const llmResponse = `{
"name": "John Doe",
"email": "john@example.com",
"phone": "(555) 123-4567",
"company": "Tech Corp"
}`;
// 4. Parse and validate the LLM response
const contact = contactSchema.parse(llmResponse);
// Returns: { name: "John Doe", email: "john@example.com", phone: "(555) 123-4567", company: "Tech Corp" }
// The parse() method validates the email format and ensures all required fields are present
For the specification of the custom object notation, see SPECIFICATION.md.
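In practice, steps 3 and 4 of the complete example are often wrapped in a retry loop, because a model may occasionally return malformed JSON or a value that fails validation. The sketch below is one possible way to do that. It assumes only that a schema exposes `stringify()` and a `parse()` that throws on invalid input, as in the example; `PromptSchema`, `CallModel`, and `extractWithRetry` are hypothetical names, and `callModel` stands in for whatever LLM client you use.

```typescript
// Minimal structural view of a schema: only the two methods used in the example above.
interface PromptSchema<T> {
  stringify(): string;
  parse(json: string): T;
}

// Hypothetical LLM client: takes a prompt and resolves to the raw text reply.
type CallModel = (prompt: string) => Promise<string>;

async function extractWithRetry<T>(
  schema: PromptSchema<T>,
  text: string,
  callModel: CallModel,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  let feedback = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Rebuild the prompt each time; after a failure, include why the last reply was rejected.
    const prompt = [
      "Extract the data from the following text and return it as JSON matching this structure:",
      schema.stringify(),
      `Text: ${JSON.stringify(text)}`,
      "Return only the JSON object, no additional text.",
      feedback,
    ].filter(Boolean).join("\n");

    const reply = await callModel(prompt);
    try {
      return schema.parse(reply); // throws on malformed JSON or failed validation
    } catch (error) {
      lastError = error;
      const message = error instanceof Error ? error.message : String(error);
      feedback = `The previous reply was rejected: ${message}`;
    }
  }
  throw new Error(`No valid reply after ${maxAttempts} attempts: ${String(lastError)}`);
}
```

Feeding the error message back into the next prompt is a design choice, not something StructLM does for you; a plain retry with the unchanged prompt works as well.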
Creates a string schema.
const nameSchema = s.string();
console.log(nameSchema.stringify()); // "string"
// Parse and validate a string
const name = nameSchema.parse('"John"'); // "John"
Creates a number schema.
const ageSchema = s.number();
console.log(ageSchema.stringify()); // "number"
// Parse and validate a number
const age = ageSchema.parse('25'); // 25
Creates a boolean schema.
const activeSchema = s.boolean();
console.log(activeSchema.stringify()); // "boolean"
// Parse and validate a boolean
const isActive = activeSchema.parse('true'); // true
Creates an array schema with specified item type.
const numbersSchema = s.array(s.number());
console.log(numbersSchema.stringify()); // "[number]"
// Parse and validate an array
const numbers = numbersSchema.parse('[1, 2, 3, 4]'); // [1, 2, 3, 4]
const usersSchema = s.array(s.object({
name: s.string(),
age: s.number()
}));
console.log(usersSchema.stringify());
// "[ { name: string, age: number } ]"
// Parse complex array
const users = usersSchema.parse('[{"name":"John","age":30},{"name":"Jane","age":25}]');
// Returns: [{ name: "John", age: 30 }, { name: "Jane", age: 25 }]
Creates an object schema with specified properties.
const personSchema = s.object({
name: s.string(),
age: s.number(),
address: s.object({
street: s.string(),
city: s.string(),
zipCode: s.string()
})
});
console.log(personSchema.stringify());
// "{ name: string, age: number, address: { street: string, city: string, zipCode: string } }"
// Parse and validate an object
const person = personSchema.parse(`{
"name": "John Doe",
"age": 30,
"address": {
"street": "123 Main St",
"city": "Anytown",
"zipCode": "12345"
}
}`);
// Returns typed object with validation
Adds custom validation using a JavaScript function.
IMPORTANT: Validation functions must be pure functions and must not reference any external variables.
const emailSchema = s.string().validate(email => email.includes('@'));
const positiveNumberSchema = s.number().validate(n => n > 0);
const adultAgeSchema = s.number().validate(age => age >= 18);
// Chaining validation with schema definition
const userSchema = s.object({
email: s.string().validate(email => email.includes('@')),
age: s.number().validate(age => age >= 0),
username: s.string().validate(name => name.length >= 3)
});
Makes a field optional in object schemas.
const userSchema = s.object({
name: s.string(),
age: s.number().optional(),
bio: s.string().validate(bio => bio.length <= 500).optional(),
tags: s.array(s.string()).optional()
});
console.log(userSchema.stringify());
// Output: "{ name: string, age: number /* optional */, bio: string /* bio=>bio.length<=500, optional */, tags: [string] /* optional */ }"
// All these are valid:
userSchema.parse('{"name":"John"}');
userSchema.parse('{"name":"John","age":30}');
userSchema.parse('{"name":"John","age":30,"bio":"Developer","tags":["js","ts"]}');
StructLM provides full TypeScript type inference:
import { s, Infer } from 'structlm';
const userSchema = s.object({
name: s.string(),
age: s.number(),
active: s.boolean()
});
type User = Infer<typeof userSchema>;
// User = { name: string; age: number; active: boolean; }
const apiResponseSchema = s.object({
status: s.string().validate(status => ['success', 'error'].includes(status)),
data: s.object({
users: s.array(s.object({
id: s.number(),
profile: s.object({
name: s.object({
first: s.string(),
last: s.string()
}),
contact: s.object({
email: s.string().validate(email => email.includes('@')),
phone: s.string()
})
}),
permissions: s.array(s.string()),
metadata: s.object({
createdAt: s.string(),
lastLogin: s.string(),
loginCount: s.number().validate(n => n >= 0)
})
}))
}),
pagination: s.object({
page: s.number().validate(n => n > 0),
limit: s.number().validate(n => n > 0),
total: s.number().validate(n => n >= 0)
})
});
console.log(apiResponseSchema.stringify());
// Outputs clean, readable schema description
// Email validation
const emailSchema = s.string().validate(email => {
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
return emailRegex.test(email);
});
// Age validation
const ageSchema = s.number().validate(age => age >= 0 && age <= 120);
// Username validation
const usernameSchema = s.string().validate(username => {
return username.length >= 3 &&
username.length <= 20 &&
/^[a-zA-Z0-9_]+$/.test(username);
});
// Complex object with multiple validations
const registrationSchema = s.object({
username: usernameSchema,
email: emailSchema,
age: ageSchema,
password: s.string().validate(pwd => pwd.length >= 8),
confirmPassword: s.string(),
acceptTerms: s.boolean().validate(accepted => accepted === true)
});
StructLM provides a more compact alternative to JSON Schema for LLM applications. Here's how they compare:
JSON Schema:
{
"type": "object",
"properties": {
"name": {
"type": "string",
"minLength": 2
},
"email": {
"type": "string",
"format": "email"
},
"age": {
"type": "number",
"minimum": 18,
"maximum": 120
},
"roles": {
"type": "array",
"items": { "type": "string" },
"minItems": 1
}
},
"required": ["name", "email", "age", "roles"]
}
StructLM Schema:
{
name: string /* name=>name.length>=2 */,
email: string /* email=>email.includes("@") */,
age: number /* age=>age>=18&&age<=120 */,
roles: [string] /* arr=>arr.length>=1 */
}
The following expression produces this schema:
const userSchema = s.object({
name: s.string().validate(name => name.length >= 2),
email: s.string().validate(email => email.includes('@')),
age: s.number().validate(age => age >= 18 && age <= 120),
roles: s.array(s.string()).validate(arr => arr.length >= 1)
});
A: While StructLM is inspired by Zod's API, it is designed specifically for LLM integration. StructLM generates compact schema descriptions optimized for AI prompts (roughly 44% to 58% fewer tokens than JSON Schema in the benchmark above), while Zod focuses on general TypeScript validation. StructLM's .stringify() method produces LLM-friendly output, whereas Zod relies on zod-to-json-schema or equivalent tools.
A: Yes! StructLM mostly works for standard TypeScript data validation. Use .parse() for validation and type inference just like Zod. However, StructLM's main focus is token-efficient LLM integration, so some of the more advanced TypeScript features, such as discriminated unions and recursive types, may not work as expected right now.
A: StructLM itself is model agnostic, and works as a schema definition and data validation library. Reliability may vary by model, but our benchmarks show consistent results across major providers.
A: Yes! StructLM is a lightweight TypeScript library with zero dependencies that works in browsers, Node.js, Deno, and Bun.
A: Use the .optional() method on any field:
const schema = s.object({
name: s.string(),
age: s.number().optional(),
email: s.string().validate(e => e.includes('@')).optional()
});
// Outputs: { name: string, age: number /* optional */, email: string /* e=>e.includes("@"), optional */ }
Optional fields are excluded from validation when missing from the input data.
A: Union types are not currently supported but are on the roadmap. For now, use string validation with enums:
const statusSchema = s.string().validate(status =>
['pending', 'approved', 'rejected'].includes(status)
);
A: Use nested s.array() calls:
const matrixSchema = s.array(s.array(s.number()));
// Outputs: [[number]]
const complexSchema = s.array(
s.object({
items: s.array(s.string()).validate(arr => arr.length > 0)
})
);
A: No, validation functions only receive the current field's value. Cross-field validation isn't currently supported.
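Until cross-field validation is supported, one workaround is to run those checks on the object that .parse() returns. The sketch below assumes the shape of the registration example from the validation section; the `Registration` interface and `assertPasswordsMatch` are hypothetical names written out by hand here, not StructLM APIs.

```typescript
// Shape that the registration example's schema is expected to produce after parsing.
interface Registration {
  username: string;
  email: string;
  age: number;
  password: string;
  confirmPassword: string;
  acceptTerms: boolean;
}

// Hypothetical helper: cross-field rules run on the already-parsed object.
function assertPasswordsMatch(reg: Registration): Registration {
  if (reg.password !== reg.confirmPassword) {
    throw new Error("password and confirmPassword must match");
  }
  return reg;
}

// Possible usage, assuming registrationSchema from the validation examples:
// const reg = assertPasswordsMatch(registrationSchema.parse(llmResponse));
```

Throwing from the helper keeps the error path the same as for .parse(), so a single try/catch can handle both kinds of failure.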
A: Our benchmarks show equal or better accuracy compared to JSON Schema. The compact format is:
- Less verbose and less confusing
- More similar to natural TypeScript syntax
- Includes validation hints inline
- Reduces prompt complexity
A: Yes! Use .stringify() on multiple schemas:
const userSchema = s.object({...});
const orderSchema = s.object({...});
const prompt = `
Process this data and return:
- User: ${userSchema.stringify()}
- Order: ${orderSchema.stringify()}
`;
A: StructLM's .parse() method throws descriptive errors for invalid data:
try {
const result = schema.parse(llmResponse);
} catch (error) {
console.log('LLM returned invalid data:', error.message);
// Handle error: retry, use fallback, etc.
}
A: StructLM is lightweight:
- Schema creation: Minimal overhead
- .stringify(): Fast string concatenation
- .parse(): JSON.parse + validation functions
- No runtime dependencies
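As a rough mental model of the ".parse(): JSON.parse + validation functions" cost, the sketch below shows the same three steps for a single number value. It is an illustrative stand-in under that assumption, not StructLM's source code, and `parseNumber` is a hypothetical name.

```typescript
// Illustrative stand-in, not StructLM's implementation: parse a JSON number and
// run one optional validation function on it.
function parseNumber(json: string, check?: (n: number) => boolean): number {
  const value: unknown = JSON.parse(json); // throws SyntaxError on malformed JSON
  if (typeof value !== "number") {
    throw new TypeError(`expected number, got ${typeof value}`);
  }
  if (check && !check(value)) {
    throw new Error(`validation failed: ${check.toString()}`);
  }
  return value;
}

const parsedAge = parseNumber("25", (n) => n >= 0); // 25
```

Each step is a single pass over the input, which is why the overhead on top of JSON.parse stays small for typical LLM responses.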
A: Schema stringification is already very fast, but you can cache results:
const userSchemaString = userSchema.stringify();
// Reuse userSchemaString in multiple prompts
A: Validation functions are serialized as text hints for LLMs but only enforced during .parse(). Make sure your function:
- Uses simple, clear logic
- Doesn't reference external variables
- Is readable when converted to string
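The reason for avoiding external variables is easiest to see with plain JavaScript behavior: converting a function to a string yields its source text, not the values it closes over. The sketch below uses only the standard `Function.prototype.toString()`; it assumes StructLM's serialization works along similar lines, and the variable names are illustrative.

```typescript
// Function.prototype.toString() returns the function's source text,
// not the values of variables the function refers to.
const minAge = 18;

const closesOverVariable = (age: number) => age >= minAge; // depends on an outer variable
const selfContained = (age: number) => age >= 18; // everything needed is in the body

const hintA = closesOverVariable.toString();
const hintB = selfContained.toString();

console.log(hintA); // mentions the name minAge, but not its value
console.log(hintB); // mentions the literal 18
```

An LLM reading the first hint has no way to know what `minAge` is, so the second form is the one that carries usable information in a prompt.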
A: Yes! Use .stringify() to see exactly what gets sent to the LLM:
console.log(schema.stringify());
// Shows the compact format with validation hints
We welcome contributions! Please open an issue or submit a pull request on GitHub.
Apache 2.0 License
- Report Issues
- Feature Requests
- Documentation
