This project fine-tunes OpenAI's GPT-4o mini model to generate accurate SurrealQL queries from natural-language prompts. The training data is organized into three syntax categories:
- Data Models
- Functions
- Statements
Each category contains more than 100 examples per version in JSONL format, including system prompts, user messages, and corresponding SurrealQL queries.
The following workflow explains how to contribute additional training data to this repository. Each contribution expands the collection of high-quality examples and improves the model's ability to generate accurate SurrealQL queries.
Create a base.jsonl file with the following structure (shown pretty-printed here; in the file itself, each example occupies a single line):
{
  "messages": [
    {
      "role": "system",
      "content": "system_message"
    },
    {
      "role": "user",
      "content": "Write a query to get all persons email"
    },
    {
      "role": "assistant",
      "content": "SELECT email FROM person;"
    }
  ]
}
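As a minimal sketch of how such a record could be produced programmatically, the snippet below builds the structure above and serializes it to one line. The helper names `make_example` and `to_jsonl_line` are hypothetical and are not part of this repository.

```python
import json


def make_example(system_message: str, prompt: str, query: str) -> dict:
    """Build one chat-format training example (system, user, assistant)."""
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": query},
        ]
    }


def to_jsonl_line(example: dict) -> str:
    """Serialize an example to a single line, as JSONL expects."""
    # json.dumps without an indent argument never emits raw newlines;
    # newlines inside strings are escaped as \n.
    return json.dumps(example, ensure_ascii=False)


# Hypothetical usage: reproduce the example record shown above.
example = make_example(
    "system_message",
    "Write a query to get all persons email",
    "SELECT email FROM person;",
)
print(to_jsonl_line(example))
```

Appending the returned line plus a trailing newline to base.jsonl keeps the file at one record per line.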
Use OpenAI's o1 model to expand the dataset, following the syntax instructions.
Extract the queries from the expanded dataset:
python3 extract_queries.py ./statements/select/01-base.jsonl
Test the extracted queries in Surrealist.app using the demo dataset.
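The extraction step can be pictured roughly as follows. This is only a sketch of the idea, assuming the script collects the assistant message from every record; the actual extract_queries.py in this repository may behave differently, and the function name `extract_queries` below is an assumption.

```python
import json


def extract_queries(lines):
    """Yield the assistant message content from each JSONL record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for message in record.get("messages", []):
            if message.get("role") == "assistant":
                yield message["content"]


# In-memory sample standing in for the contents of a .jsonl file.
sample = [
    '{"messages": [{"role": "system", "content": "system_message"}, '
    '{"role": "user", "content": "Write a query to get all persons email"}, '
    '{"role": "assistant", "content": "SELECT email FROM person;"}]}'
]
for query in extract_queries(sample):
    print(query)
```

Because an open file is iterable line by line, the same function can be passed a file handle instead of the in-memory list.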
python3 inject_schema.py ./statements/surreal-deal-store-min.surql ./statements/select/01-base.jsonl
This merges the schema into the system prompt, which helps the model generate schema-compliant queries.
Schema (surreal-deal-store-min.surql):
DEFINE TABLE product SCHEMAFULL;
DEFINE FIELD name ON product TYPE string;
Output (01-patch.jsonl):
{
  "role": "system",
  "content": "You are a helpful AI assistant that generates SurrealQL queries based on the following schema:\n\nDEFINE TABLE product SCHEMAFULL;\nDEFINE FIELD name ON product TYPE string;\n\nGenerate SurrealQL queries based on this schema."
}
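The merge shown above could be implemented along these lines. This is a sketch under assumptions: the prompt template is copied from the output example, while the function name `inject_schema` and the sample user and assistant messages are hypothetical and may not match what inject_schema.py actually does.

```python
import json

# Template text taken from the output example; {schema} marks the insertion point.
PROMPT_TEMPLATE = (
    "You are a helpful AI assistant that generates SurrealQL queries "
    "based on the following schema:\n\n{schema}\n\n"
    "Generate SurrealQL queries based on this schema."
)


def inject_schema(record: dict, schema: str) -> dict:
    """Return a copy of the record whose system message embeds the schema."""
    system_prompt = PROMPT_TEMPLATE.format(schema=schema.strip())
    messages = [
        {**m, "content": system_prompt} if m.get("role") == "system" else dict(m)
        for m in record["messages"]
    ]
    return {**record, "messages": messages}


# Hypothetical input: the two-line schema from above and an illustrative record.
schema = (
    "DEFINE TABLE product SCHEMAFULL;\n"
    "DEFINE FIELD name ON product TYPE string;\n"
)
record = {
    "messages": [
        {"role": "system", "content": "system_message"},
        {"role": "user", "content": "Write a query to get all product names"},
        {"role": "assistant", "content": "SELECT name FROM product;"},
    ]
}
print(json.dumps(inject_schema(record, schema)))
```

Returning a new record instead of mutating the input keeps the base file's data intact, which fits a workflow where the base and patched datasets live side by side.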
Upload the prepared dataset to OpenAI's platform and fine-tune the GPT-4o mini model, following the fine-tuning guide.
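Before uploading, it can be useful to check the dataset locally. The sketch below is one possible pre-upload check, not part of this repository: the name `validate_jsonl` is hypothetical, and the requirement of exactly one system, one user, and one assistant message per record reflects the example structure in this document, not a rule imposed by OpenAI.

```python
import json

# Role order used by the examples in this repository (an assumption of this sketch).
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_jsonl(lines):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not all(
            isinstance(m, dict) for m in messages
        ):
            problems.append((number, "'messages' must be a list of objects"))
            continue
        roles = [m.get("role") for m in messages]
        if roles != EXPECTED_ROLES:
            problems.append((number, f"unexpected roles: {roles}"))
        elif not all(
            isinstance(m.get("content"), str) and m["content"].strip()
            for m in messages
        ):
            problems.append((number, "empty or non-string content"))
    return problems


# Hypothetical usage with one well-formed and one malformed line.
sample = [
    '{"messages": [{"role": "system", "content": "system_message"}, '
    '{"role": "user", "content": "Write a query to get all persons email"}, '
    '{"role": "assistant", "content": "SELECT email FROM person;"}]}',
    "not json",
]
for number, problem in validate_jsonl(sample):
    print(f"line {number}: {problem}")
```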
The result is a fine-tuned model that generates accurate SurrealQL queries from natural language and can be adapted to specific schemas.
The fine-tuned model can serve as the foundation for a powerful AI agent specialized in SurrealQL query generation:
- Schema-Aware Query Assistant
  - Automatically adapts to any provided database schema
  - Generates contextually relevant and syntactically correct queries
  - Understands relationships and constraints defined in the schema
- Natural Language to SurrealQL
  - Translates plain English requests into optimized SurrealQL queries
  - Handles complex query requirements while maintaining schema compliance
  - Provides explanations and alternatives for generated queries
- Query Optimization Agent
  - Suggests performance improvements for existing queries
  - Recommends appropriate indexes based on query patterns
  - Identifies potential bottlenecks in query execution
- Interactive Query Builder
  - Guides users through query construction
  - Offers real-time suggestions and corrections
  - Validates queries against schema constraints
This agent could be integrated into various tools and platforms, making SurrealDB more accessible to developers of all skill levels.