Skip to content

moawiah/synthetic_data_generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🧠 Synthetic Data Generator

A Python-based tool to generate structured, synthetic job postings using open-source LLMs from Hugging Face.
This project supports both script-based execution and an interactive Colab notebook, making it ideal for rapid prototyping, dataset bootstrapping, or demonstrating prompt engineering techniques.

Demo Screenshot

This tool helps:

  • Researchers create labeled training data for NLP classification or QA
  • HR tech startups prototype recommendation models
  • AI instructors demonstrate few-shot prompting in class

✨ Features

  • 🔗 Integrates Hugging Face Transformer models
  • 📄 Generates realistic job postings in structured JSON format
  • 🧪 Supports prompt engineering with control over output length and variability
  • 🧠 Minimal Gradio UI for non-technical users
  • 📓 Jupyter/Colab support for experimentation and reproducibility

📂 Project Structure

 ```
. ├── app/ 
    │ 
    ├── app.py # Main script entry point 
    │ 
    ├── consts.py # Configuration and constants 
    │ 
    └── requirements.txt # Python dependencies 
  ├── data/ 
    │ 
    └── software_engineer_jobs.json # Sample input data (JSON format) 
  ├── notebooks/ 
    │ 
    └── synthetic_data_generator.ipynb # Interactive Colab notebook 
  ├── .env.example # Sample environment variable config 
  ├── .gitignore # Git ignored files list 
  └── README.md
  ``` 

🚀 Getting Started

1. Clone the repository

git clone https://github.com/moawiah/synthetic_data_generator.git
cd synthetic_data_generator

Install Dependencies

pip install -r app/requirements.txt

Hugging Face Token

You need to create a .env file with your HuggingFace token like HF_TOKEN=your-token-here

Run

run the app using python app/app.py

Example Output - 1 Job

{
"title": "Software Engineer"
,
"description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, coding, and testing software systems, and will be able to work collaboratively with cross-functional teams. Responsibilities include writing clean, maintainable, and efficient code, as well as actively participating in code reviews and continuous integration processes. This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career."
,
"requirements":[
"0":"Bachelor's degree in Computer Science or related field",
"1":"Minimum of 2 years experience in software development",
"2":"Strong proficiency in Java or C++",
"3":"Experience with agile development methodologies",
"4":"Good understanding of data structures and algorithms",
"5":"Excellent problem-solving and analytical skills"
],
"location":"New York, NY",
"company_name":"ABC Technologies"
}

Future Improvements

🔁 Add support for more job roles and industries

🧠 Model selector from UI

💾 Export dataset as CSV

☁️ Optional integration with LangChain or RAG workflows

About

AI Agent to generate job descriptions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published