DataSynthBench

A web-based framework for synthetic data generation and automated model benchmarking. Upload a tabular dataset, generate synthetic variants using multiple methods, and evaluate how model performance drifts between the original and synthetic data.

🚀 Features

Core Functionality

Multi-Method Synthetic Data Generation
- SMOTE (Synthetic Minority Oversampling Technique)
- GAN-based generation simulation
- Gaussian noise injection
- Bootstrap resampling

Automated Model Benchmarking
- Random Forest, Logistic Regression, SVM, XGBoost
- Cross-validation with configurable folds
- Performance drift detection
- Comprehensive metric evaluation

Interactive Dashboard
- Real-time progress tracking
- Visual performance comparisons
- Drift analysis and insights
- Export capabilities for CI/CD
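
One way to read "performance drift detection" is as a comparison between a model's score on the original data and its score on each synthetic variant. The sketch below illustrates that idea only; the `BenchmarkScore` and `DriftReport` shapes, the `"original"` method label, and the 0.05 threshold are all assumptions made for the example, not the application's actual types or defaults.

```typescript
// Hypothetical result shape; field names are illustrative only.
interface BenchmarkScore {
  model: string;    // e.g. "random_forest"
  method: string;   // e.g. "smote"; "original" marks the baseline run
  accuracy: number; // mean cross-validated accuracy in [0, 1]
}

interface DriftReport {
  model: string;
  method: string;
  drift: number;    // synthetic accuracy minus baseline accuracy
  flagged: boolean; // true when |drift| exceeds the threshold
}

// Compare every synthetic-data score against the same model's baseline score.
function computeDrift(scores: BenchmarkScore[], threshold = 0.05): DriftReport[] {
  const baseline = new Map<string, number>();
  for (const s of scores) {
    if (s.method === "original") baseline.set(s.model, s.accuracy);
  }
  const reports: DriftReport[] = [];
  for (const s of scores) {
    const base = baseline.get(s.model);
    if (s.method === "original" || base === undefined) continue;
    const drift = s.accuracy - base;
    reports.push({
      model: s.model,
      method: s.method,
      drift,
      flagged: Math.abs(drift) > threshold,
    });
  }
  return reports;
}
```

A signed drift value keeps the direction visible: a negative number means the model scored lower on the synthetic variant than on the original data.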

Technical Features

Modern Web Interface: React 18 + TypeScript + Tailwind CSS
Responsive Design: Optimized for desktop and tablet workflows
File Processing: CSV upload with automatic column type detection
Export Formats: JSON, YAML, CSV for different use cases
CI/CD Ready: Structured output for automated pipelines

📋 Prerequisites

Before running DataSynthBench locally, ensure you have:

Node.js (version 16.0.0 or higher)
npm (version 7.0.0 or higher) or yarn
Git for cloning the repository

Check your versions:

node --version
npm --version

🛠️ Local Development Setup

1. Clone the Repository

git clone https://github.com/leandrenash/datasynthbench.git
cd datasynthbench

2. Install Dependencies

Using npm:

npm install

Using yarn:

yarn install

3. Start Development Server

npm run dev

The application will be available at http://localhost:5173

4. Build for Production

npm run build

Built files will be in the dist/ directory.

📁 Project Structure

datasynthbench/
├── public/                 # Static assets
├── src/
│   ├── components/         # React components
│   │   ├── Dashboard.tsx
│   │   ├── DatasetUpload.tsx
│   │   ├── ConfigurationPanel.tsx
│   │   ├── ResultsViewer.tsx
│   │   └── ExportPanel.tsx
│   ├── App.tsx            # Main application component
│   ├── main.tsx           # Application entry point
│   └── index.css          # Global styles
├── docs/                  # Documentation and images
├── package.json
├── README.md
└── ...config files

🎯 Quick Start Guide

1. Upload Your Dataset

Navigate to the "Upload Data" tab
Drag and drop a CSV file or click to browse
Review the automatic column analysis and data preview

2. Configure Generation

Go to the "Configure" tab
Select synthetic data generation methods (SMOTE, GAN, Noise, Resample)
Choose models for benchmarking
Adjust parameters as needed

3. Run Benchmark

Click "Run Benchmark" to start the process
Monitor real-time progress
View detailed results in the "Results" tab

4. Export Results

Navigate to the "Export" tab
Choose export format (JSON, YAML, CSV)
Download complete results or summary reports
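
For readers wiring the exported files into a pipeline, the following sketch shows how a result table could be serialized to JSON and CSV. The `ResultRow` shape and the function names are hypothetical and chosen for illustration; the actual export schema is defined by the application. YAML is left out because it would normally rely on a third-party library.

```typescript
// Hypothetical row shape for illustration; not the app's actual export schema.
interface ResultRow {
  model: string;
  method: string;
  accuracy: number;
}

// Pretty-printed JSON, convenient for diffing in CI logs.
function toJson(rows: ResultRow[]): string {
  return JSON.stringify(rows, null, 2);
}

// Quote a CSV field when it contains a comma, quote, or line break.
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Header row followed by one line per result.
function toCsv(rows: ResultRow[]): string {
  const columns: (keyof ResultRow)[] = ["model", "method", "accuracy"];
  const lines = [columns.join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c])).join(","));
  }
  return lines.join("\n");
}
```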

📊 Supported Data Formats

Input Requirements

File Format: CSV files only
Size Limit: Up to 100MB
Column Types: Automatic detection of numeric and categorical columns
Missing Values: Handled automatically during processing
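
The format and size requirements above can be checked before any parsing starts. This is a minimal sketch, assuming that "100MB" means 100 × 1024 × 1024 bytes and that the check is based on the file extension; `validateUpload` and `UploadCandidate` are hypothetical names, not part of the application's code.

```typescript
// Assumption: "100MB" is interpreted as 100 * 2^20 bytes.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024;

// Structural type, so a browser File object can be passed in directly.
interface UploadCandidate {
  name: string;
  size: number; // bytes
}

// Returns null when the file is acceptable, otherwise a human-readable reason.
function validateUpload(file: UploadCandidate): string | null {
  if (!file.name.toLowerCase().endsWith(".csv")) {
    return "Only CSV files are supported.";
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return "File exceeds the 100MB limit.";
  }
  return null;
}
```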

Example Dataset Structure

feature1,feature2,feature3,target
1.2,category_a,0.5,class_1
2.1,category_b,0.8,class_2
1.8,category_a,0.3,class_1
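
To make "automatic detection of numeric and categorical columns" concrete, here is a small sketch applied to the example dataset above. It assumes a simple rule (a column is numeric when every non-empty cell parses as a finite number) and a naive comma split with no quoted fields; the application's own detection logic may differ, and `parseCsv` and `inferColumnTypes` are hypothetical names.

```typescript
type ColumnType = "numeric" | "categorical";

// Naive CSV split: assumes no quoted fields or embedded commas.
function parseCsv(text: string): { header: string[]; rows: string[][] } {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const [headerLine = "", ...body] = lines;
  return {
    header: headerLine.split(",").map((h) => h.trim()),
    rows: body.map((line) => line.split(",").map((cell) => cell.trim())),
  };
}

// A column is numeric when every non-empty cell parses as a finite number.
function inferColumnTypes(header: string[], rows: string[][]): Record<string, ColumnType> {
  const types: Record<string, ColumnType> = {};
  header.forEach((column, i) => {
    const cells = rows.map((row) => row[i] ?? "").filter((cell) => cell !== "");
    const numeric = cells.length > 0 && cells.every((cell) => Number.isFinite(Number(cell)));
    types[column] = numeric ? "numeric" : "categorical";
  });
  return types;
}

const sampleCsv = `feature1,feature2,feature3,target
1.2,category_a,0.5,class_1
2.1,category_b,0.8,class_2
1.8,category_a,0.3,class_1`;

const parsedSample = parseCsv(sampleCsv);
const columnTypes = inferColumnTypes(parsedSample.header, parsedSample.rows);
// feature1 and feature3 come out numeric; feature2 and target come out categorical.
```

Empty cells are skipped rather than counted as non-numeric, so a numeric column with missing values is still classified as numeric under this rule.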

⚙️ Configuration Options

Synthetic Data Generators

SMOTE

smote:
  enabled: true
  k_neighbors: 5
  sampling_strategy: "auto"
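
For context on what `k_neighbors` controls, the sketch below shows the core SMOTE step: pick a minority-class sample, pick one of its k nearest minority neighbours, and create a new point somewhere on the segment between them. It handles numeric features only and is a generic illustration of the technique, not the application's generator (which this README describes as a simulation). The `smote` function and its `rng` parameter are hypothetical; `rng` is expected to return values in [0, 1) like `Math.random`.

```typescript
type Point = number[];

function euclidean(a: Point, b: Point): number {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - (b[i] ?? 0)) ** 2, 0));
}

// Generate `count` synthetic minority samples by interpolating between a
// randomly chosen sample and one of its k nearest minority neighbours.
function smote(
  minority: Point[],
  count: number,
  kNeighbors = 5,
  rng: () => number = Math.random,
): Point[] {
  if (minority.length < 2) {
    throw new Error("SMOTE needs at least two minority samples");
  }
  const k = Math.min(kNeighbors, minority.length - 1);
  const synthetic: Point[] = [];
  for (let n = 0; n < count; n++) {
    const base = minority[Math.floor(rng() * minority.length)]!;
    const neighbours = minority
      .filter((p) => p !== base)
      .sort((p, q) => euclidean(base, p) - euclidean(base, q))
      .slice(0, k);
    const neighbour = neighbours[Math.floor(rng() * neighbours.length)]!;
    const gap = rng(); // position along the segment from base to neighbour
    synthetic.push(base.map((v, i) => v + gap * ((neighbour[i] ?? v) - v)));
  }
  return synthetic;
}
```

Because every synthetic point is an interpolation between two existing samples, the output stays inside the range already covered by the minority class.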

GAN Simulation

gan:
  enabled: true
  epochs: 100
  batch_size: 32

Noise Injection

noise:
  enabled: true
  noise_level: 0.1
  noise_type: "gaussian"
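
As an illustration of Gaussian noise injection, the sketch below perturbs a numeric column with zero-mean normal noise drawn via the Box-Muller transform. Interpreting `noise_level` as a fraction of the column's standard deviation is an assumption made for this example; the application may define it differently. `gaussian` and `injectNoise` are hypothetical helper names.

```typescript
// Standard normal sample via the Box-Muller transform.
function gaussian(rng: () => number = Math.random): number {
  const u = 1 - rng(); // shift to (0, 1] so Math.log never receives 0
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Add zero-mean Gaussian noise to a numeric column.
// Assumption: noiseLevel scales the column's (population) standard deviation.
function injectNoise(
  column: number[],
  noiseLevel = 0.1,
  rng: () => number = Math.random,
): number[] {
  if (column.length === 0) return [];
  const mean = column.reduce((s, v) => s + v, 0) / column.length;
  const variance = column.reduce((s, v) => s + (v - mean) ** 2, 0) / column.length;
  const sigma = noiseLevel * Math.sqrt(variance);
  return column.map((v) => v + sigma * gaussian(rng));
}
```

Scaling by the column's own spread keeps the perturbation proportionate across features with very different units.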

Resampling

resample:
  enabled: true
  strategy: "random"
  ratio: 1.0
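
Bootstrap resampling with `strategy: "random"` can be pictured as drawing rows uniformly at random with replacement. In the sketch below, `ratio` is assumed to mean the size of the resampled dataset relative to the original (so 1.0 yields the same number of rows); that interpretation and the `bootstrapResample` name are assumptions for illustration.

```typescript
// Bootstrap resampling: draw rows uniformly at random, with replacement.
// Assumption: `ratio` is output size relative to input size.
function bootstrapResample<T>(
  rows: T[],
  ratio = 1.0,
  rng: () => number = Math.random, // expected to return values in [0, 1)
): T[] {
  if (rows.length === 0) return [];
  const size = Math.round(rows.length * ratio);
  const out: T[] = [];
  for (let i = 0; i < size; i++) {
    out.push(rows[Math.floor(rng() * rows.length)]!);
  }
  return out;
}
```

Since sampling is with replacement, some rows typically appear more than once while others are left out entirely.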

Model Configuration

models:
  random_forest:
    enabled: true
    n_estimators: 100
  logistic_regression:
    enabled: true
    C: 1.0
  svm:
    enabled: true
    kernel: "rbf"
  xgboost:
    enabled: true
    n_estimators: 100
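
In TypeScript, the model settings above could be mirrored by a typed configuration object. The keys and default values below are copied from the YAML; the `ModelsConfig` type and the `enabledModels` helper are hypothetical and only sketch how such a configuration might be consumed.

```typescript
// Mirrors the YAML keys above; the type itself is an illustrative assumption.
type ModelsConfig = {
  random_forest: { enabled: boolean; n_estimators: number };
  logistic_regression: { enabled: boolean; C: number };
  svm: { enabled: boolean; kernel: string };
  xgboost: { enabled: boolean; n_estimators: number };
};

const defaultModels: ModelsConfig = {
  random_forest: { enabled: true, n_estimators: 100 },
  logistic_regression: { enabled: true, C: 1.0 },
  svm: { enabled: true, kernel: "rbf" },
  xgboost: { enabled: true, n_estimators: 100 },
};

// List the models that should take part in a benchmark run.
function enabledModels(config: ModelsConfig): (keyof ModelsConfig)[] {
  return (Object.keys(config) as (keyof ModelsConfig)[]).filter(
    (name) => config[name].enabled,
  );
}
```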

🔧 Development

Available Scripts

npm run dev - Start development server
npm run build - Build for production
npm run preview - Preview production build
npm run lint - Run ESLint

Code Style

TypeScript: Strict mode enabled
ESLint: Configured with React and TypeScript rules
Prettier: Code formatting (recommended)

Adding New Features

New Synthetic Data Method:
- Add configuration options in ConfigurationPanel.tsx
- Implement generation logic simulation
- Update results processing

New Model Type:
- Extend model configuration interface
- Add to benchmarking simulation
- Update results visualization

New Export Format:
- Add format option in ExportPanel.tsx
- Implement conversion function
- Test with sample data

🚀 Deployment

Netlify (Recommended)

Connect your GitHub repository to Netlify
Set build command: npm run build
Set publish directory: dist
Deploy automatically on push

Vercel

Import project from GitHub
Framework preset: Vite
Build command: npm run build
Output directory: dist

Docker

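FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step needs devDependencies (build tooling)
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
# Serve the built app on all interfaces, on the port exposed above
CMD ["npm", "run", "preview", "--", "--host", "0.0.0.0", "--port", "3000"]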

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines (CONTRIBUTING.md) for details.

Development Workflow

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes
Run the linter: npm run lint
Commit changes: git commit -m 'Add amazing feature'
Push to branch: git push origin feature/amazing-feature
Open a Pull Request

Reporting Issues

Use the GitHub Issues page
Include detailed reproduction steps
Provide sample data if possible (anonymized)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🗺️ Roadmap

Python backend integration for real synthetic data generation
Advanced visualization with D3.js
Model explainability features
Automated hyperparameter tuning
Integration with MLflow and other ML platforms
Support for time series data
Advanced drift detection algorithms

Made with ❤️ for the data science community
