DataSynthBench is a web-based framework for synthetic data generation and automated model performance benchmarking. Upload a tabular dataset, generate synthetic variants using multiple methods, and automatically evaluate model performance drift.
- Multi-Method Synthetic Data Generation
  - SMOTE (Synthetic Minority Oversampling Technique)
  - GAN-based generation simulation
  - Gaussian noise injection
  - Bootstrap resampling
- Automated Model Benchmarking
  - Random Forest, Logistic Regression, SVM, XGBoost
  - Cross-validation with configurable folds
  - Performance drift detection
  - Comprehensive metric evaluation
- Interactive Dashboard
  - Real-time progress tracking
  - Visual performance comparisons
  - Drift analysis and insights
  - Export capabilities for CI/CD
- Modern Web Interface: React 18 + TypeScript + Tailwind CSS
- Responsive Design: Optimized for desktop and tablet workflows
- File Processing: CSV upload with automatic column type detection
- Export Formats: JSON, YAML, CSV for different use cases
- CI/CD Ready: Structured output for automated pipelines
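To make two of the listed generation methods concrete, here is a minimal sketch of Gaussian noise injection and bootstrap resampling. It is not taken from the DataSynthBench source: the names `mulberry32`, `injectNoise`, and `bootstrap` are hypothetical, and scaling the noise by the column's standard deviation is an assumption about how a noise level might be applied.

```typescript
// Hypothetical helpers; none of these names come from the DataSynthBench source.
type Rng = () => number; // returns a float in [0, 1)

// Small seeded generator (mulberry32) so runs are reproducible.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rng: Rng): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Adds zero-mean Gaussian noise scaled by noiseLevel * column standard deviation
// (the scaling rule is an assumption, not documented behavior).
function injectNoise(column: number[], noiseLevel: number, rng: Rng): number[] {
  const mean = column.reduce((s, v) => s + v, 0) / column.length;
  const variance =
    column.reduce((s, v) => s + (v - mean) * (v - mean), 0) / column.length;
  const std = Math.sqrt(variance);
  return column.map((v) => v + gaussian(rng) * noiseLevel * std);
}

// Bootstrap resampling: draws round(rows.length * ratio) rows with replacement.
function bootstrap<T>(rows: T[], ratio: number, rng: Rng): T[] {
  const n = Math.round(rows.length * ratio);
  return Array.from({ length: n }, () => rows[Math.floor(rng() * rows.length)]);
}
```

Passing the random source as a parameter keeps both functions deterministic under a fixed seed, which makes repeated benchmark runs comparable.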
Before running DataSynthBench locally, ensure you have:
- Node.js (version 16.0.0 or higher)
- npm (version 7.0.0 or higher) or yarn
- Git for cloning the repository
Check your versions:
node --version
npm --version

git clone https://github.com/leandrenash/datasynthbench.git
cd datasynthbench

Using npm:
npm install

Using yarn:
yarn install

npm run dev

The application will be available at http://localhost:5173

npm run build

Built files will be in the dist/ directory.
datasynthbench/
├── public/                 # Static assets
├── src/
│   ├── components/         # React components
│   │   ├── Dashboard.tsx
│   │   ├── DatasetUpload.tsx
│   │   ├── ConfigurationPanel.tsx
│   │   ├── ResultsViewer.tsx
│   │   └── ExportPanel.tsx
│   ├── App.tsx             # Main application component
│   ├── main.tsx            # Application entry point
│   └── index.css           # Global styles
├── docs/                   # Documentation and images
├── package.json
├── README.md
└── ...config files
- Navigate to the "Upload Data" tab
- Drag and drop a CSV file or click to browse
- Review the automatic column analysis and data preview
- Go to the "Configure" tab
- Select synthetic data generation methods (SMOTE, GAN, Noise, Resample)
- Choose models for benchmarking
- Adjust parameters as needed
- Click "Run Benchmark" to start the process
- Monitor real-time progress
- View detailed results in the "Results" tab
- Navigate to the "Export" tab
- Choose export format (JSON, YAML, CSV)
- Download complete results or summary reports
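This README does not define how performance drift is computed. One plausible reading is the change in a metric between a model evaluated on the original data and the same model evaluated with a synthetic variant. The sketch below illustrates that reading only: the `MetricResult` shape, the `computeDrift` function, and the 0.05 threshold are all assumptions, not the app's actual output schema.

```typescript
// Hypothetical result shape; the real export schema may differ.
interface MetricResult {
  model: string;     // e.g. "random_forest"
  method: string;    // e.g. "smote"
  baseline: number;  // metric on the original data
  synthetic: number; // metric with the synthetic variant
}

interface DriftReport extends MetricResult {
  drift: number;    // synthetic minus baseline
  flagged: boolean; // true when |drift| exceeds the threshold
}

// Computes signed drift per result and flags entries beyond the threshold.
function computeDrift(results: MetricResult[], threshold = 0.05): DriftReport[] {
  return results.map((r) => {
    const drift = r.synthetic - r.baseline;
    return { ...r, drift, flagged: Math.abs(drift) > threshold };
  });
}

const reports = computeDrift([
  { model: "random_forest", method: "smote", baseline: 0.9, synthetic: 0.82 },
  { model: "svm", method: "noise", baseline: 0.9, synthetic: 0.91 },
]);
const anyFlagged = reports.some((r) => r.flagged);
```

In a CI pipeline, a script could load the exported JSON, run a check like this, and exit with a non-zero status when `anyFlagged` is true.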
- File Format: CSV files only
- Size Limit: Up to 100MB
- Column Types: Automatic detection of numeric and categorical columns
- Missing Values: Handled automatically during processing
feature1,feature2,feature3,target
1.2,category_a,0.5,class_1
2.1,category_b,0.8,class_2
1.8,category_a,0.3,class_1
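Automatic column type detection can be approximated with a simple rule: a column is numeric if every non-empty value parses as a finite number, and categorical otherwise. The sketch below applies that rule to the sample above. It is an illustration under stated assumptions, not the app's implementation: `parseCsv` and `detectColumnTypes` are hypothetical names, and the parser splits on commas without handling quoted fields.

```typescript
type ColumnType = "numeric" | "categorical";

// Naive CSV parsing: splits on commas and does not handle quoted fields.
function parseCsv(text: string): { header: string[]; rows: string[][] } {
  const lines = text.trim().split(/\r?\n/).filter((l) => l.length > 0);
  if (lines.length === 0) throw new Error("empty CSV");
  return {
    header: lines[0].split(",").map((h) => h.trim()),
    rows: lines.slice(1).map((l) => l.split(",").map((c) => c.trim())),
  };
}

// A column is numeric when all of its non-empty values are finite numbers.
// Empty cells are treated as missing and skipped.
function detectColumnTypes(
  header: string[],
  rows: string[][]
): Record<string, ColumnType> {
  const types: Record<string, ColumnType> = {};
  header.forEach((name, i) => {
    const values = rows
      .map((r) => r[i])
      .filter((v) => v !== undefined && v !== "");
    const allNumeric =
      values.length > 0 && values.every((v) => Number.isFinite(Number(v)));
    types[name] = allNumeric ? "numeric" : "categorical";
  });
  return types;
}

const sampleCsv = [
  "feature1,feature2,feature3,target",
  "1.2,category_a,0.5,class_1",
  "2.1,category_b,0.8,class_2",
  "1.8,category_a,0.3,class_1",
].join("\n");

const parsed = parseCsv(sampleCsv);
const columnTypes = detectColumnTypes(parsed.header, parsed.rows);
```

A production parser would also need to handle quoted fields, embedded commas, and numeric-looking identifiers that should stay categorical.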
smote:
  enabled: true
  k_neighbors: 5
  sampling_strategy: "auto"

gan:
  enabled: true
  epochs: 100
  batch_size: 32

noise:
  enabled: true
  noise_level: 0.1
  noise_type: "gaussian"

resample:
  enabled: true
  strategy: "random"
  ratio: 1.0

models:
  random_forest:
    enabled: true
    n_estimators: 100
  logistic_regression:
    enabled: true
    C: 1.0
  svm:
    enabled: true
    kernel: "rbf"
  xgboost:
    enabled: true
    n_estimators: 100

- npm run dev: Start development server
- npm run build: Build for production
- npm run preview: Preview production build
- npm run lint: Run ESLint
- TypeScript: Strict mode enabled
- ESLint: Configured with React and TypeScript rules
- Prettier: Code formatting (recommended)
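With strict mode enabled, the generation and model settings documented earlier in this README lend themselves to a typed configuration object. The sketch below mirrors those keys and default values. The `BenchmarkConfig` interface and `validateConfig` function are hypothetical names introduced for illustration, and the validation rules are assumptions rather than constraints stated by the project.

```typescript
// Hypothetical type mirroring the documented configuration keys.
interface BenchmarkConfig {
  smote: { enabled: boolean; k_neighbors: number; sampling_strategy: string };
  gan: { enabled: boolean; epochs: number; batch_size: number };
  noise: { enabled: boolean; noise_level: number; noise_type: string };
  resample: { enabled: boolean; strategy: string; ratio: number };
  models: {
    random_forest: { enabled: boolean; n_estimators: number };
    logistic_regression: { enabled: boolean; C: number };
    svm: { enabled: boolean; kernel: string };
    xgboost: { enabled: boolean; n_estimators: number };
  };
}

// Default values copied from the configuration listed in this README.
const defaultConfig: BenchmarkConfig = {
  smote: { enabled: true, k_neighbors: 5, sampling_strategy: "auto" },
  gan: { enabled: true, epochs: 100, batch_size: 32 },
  noise: { enabled: true, noise_level: 0.1, noise_type: "gaussian" },
  resample: { enabled: true, strategy: "random", ratio: 1.0 },
  models: {
    random_forest: { enabled: true, n_estimators: 100 },
    logistic_regression: { enabled: true, C: 1.0 },
    svm: { enabled: true, kernel: "rbf" },
    xgboost: { enabled: true, n_estimators: 100 },
  },
};

// Returns a list of problems; an empty list means the config passed.
// The rules below are illustrative assumptions.
function validateConfig(cfg: BenchmarkConfig): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(cfg.smote.k_neighbors) || cfg.smote.k_neighbors < 1) {
    errors.push("smote.k_neighbors must be a positive integer");
  }
  if (cfg.gan.epochs < 1 || cfg.gan.batch_size < 1) {
    errors.push("gan.epochs and gan.batch_size must be at least 1");
  }
  if (cfg.noise.noise_level < 0) {
    errors.push("noise.noise_level must not be negative");
  }
  if (cfg.resample.ratio <= 0) {
    errors.push("resample.ratio must be greater than 0");
  }
  return errors;
}
```

Returning a list of messages instead of throwing lets a settings form show every problem at once.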
- New Synthetic Data Method:
  - Add configuration options in ConfigurationPanel.tsx
  - Implement generation logic simulation
  - Update results processing
- New Model Type:
  - Extend model configuration interface
  - Add to benchmarking simulation
  - Update results visualization
- New Export Format:
  - Add format option in ExportPanel.tsx
  - Implement conversion function
  - Test with sample data
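As an illustration of the conversion-function step for a new export format, the sketch below renders results as a Markdown table and registers it next to a JSON exporter. Everything here is hypothetical: the `ResultRow` shape, the `toMarkdown` function, and the `exporters` map are stand-ins, since the actual interfaces in ExportPanel.tsx are not shown in this README.

```typescript
// Hypothetical result row; the real results structure may differ.
interface ResultRow {
  model: string;
  method: string;
  accuracy: number;
}

type Exporter = (rows: ResultRow[]) => string;

// Converts result rows to a Markdown table with accuracy to three decimals.
function toMarkdown(rows: ResultRow[]): string {
  const header = "| model | method | accuracy |";
  const divider = "| --- | --- | --- |";
  const body = rows.map(
    (r) => `| ${r.model} | ${r.method} | ${r.accuracy.toFixed(3)} |`
  );
  return [header, divider, ...body].join("\n");
}

// A format registry keeps the export UI free of per-format branching.
const exporters: Record<string, Exporter> = {
  json: (rows) => JSON.stringify(rows, null, 2),
  markdown: toMarkdown,
};

const sampleRows: ResultRow[] = [
  { model: "svm", method: "smote", accuracy: 0.9 },
];
const markdownOutput = exporters["markdown"](sampleRows);
```

Testing with sample data then amounts to comparing the returned string against a known-good table.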
- Connect your GitHub repository to Netlify
- Set build command: npm run build
- Set publish directory: dist
- Deploy automatically on push

- Import project from GitHub
- Framework preset: Vite
- Build command: npm run build
- Output directory: dist
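FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build tooling typically lives in devDependencies
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
# Assumes the preview script runs `vite preview`; bind to all interfaces on the exposed port
CMD ["npm", "run", "preview", "--", "--host", "0.0.0.0", "--port", "3000"]

We welcome contributions! Please see our Contributing Guidelines for details.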
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes
- Run the linter: npm run lint
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
- Use the GitHub Issues page
- Include detailed reproduction steps
- Provide sample data if possible (anonymized)
This project is licensed under the MIT License - see the LICENSE file for details.
- Python backend integration for real synthetic data generation
- Advanced visualization with D3.js
- Model explainability features
- Automated hyperparameter tuning
- Integration with MLflow and other ML platforms
- Support for time series data
- Advanced drift detection algorithms
Made with ❤️ for the data science community
