Visual sklearn Pipeline Builder - A drag-and-drop interface for creating scikit-learn machine learning pipelines.
PipeLearn is a visual tool that helps ML engineers and data scientists build scikit-learn pipelines through an intuitive drag-and-drop interface. No more writing boilerplate code - just drag components, connect them, configure parameters, and generate production-ready Python code.
- Visual Pipeline Builder: Drag and drop sklearn components onto a canvas
- 50+ Components: Comprehensive catalog including:
- Preprocessing (StandardScaler, MinMaxScaler, OneHotEncoder, etc.)
- Feature Selection (SelectKBest, VarianceThreshold, RFE)
- Decomposition (PCA, TruncatedSVD)
- Estimators (LogisticRegression, RandomForest, SVC, etc.)
- Imputation (SimpleImputer, KNNImputer)
- Parameter Configuration: Easy-to-use UI for configuring component parameters
- Pipeline Validation: Real-time validation with helpful error messages
- Code Generation: Generates clean, production-ready Python code
- Export Options: Copy to clipboard or download as .py file
PipeLearn/
├── backend/ # FastAPI backend
│ ├── app.py # Main API application
│ ├── components.py # sklearn component catalog
│ ├── pipeline_generator.py # Code generation logic
│ └── requirements.txt
│
└── frontend/ # React frontend
├── public/
├── src/
│ ├── components/ # React components
│ │ ├── ComponentPanel.js
│ │ ├── PipelineEditor.js
│ │ ├── ComponentNode.js
│ │ ├── ParameterConfig.js
│ │ └── CodeViewer.js
│ ├── App.js
│ └── index.js
└── package.json
- Python 3.8+
- Node.js 16+
- npm or yarn
git clone https://github.com/yourusername/PipeLearn.git
cd PipeLearn
Install the backend dependencies:
cd backend
pip install -r requirements.txt
Install the frontend dependencies (from the project root):
cd frontend
npm install
Start the backend (from the project root):
cd backend
python app.py
The API will be available at http://localhost:8000
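Optionally, from another terminal you can confirm the API is up by requesting the component catalog (assuming curl is installed; the endpoint is the same one the frontend uses):

```
curl http://localhost:8000/api/components
```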
In a new terminal:
cd frontend
npm start
The application will open in your browser at http://localhost:3000
- Add Components: Drag components from the left panel onto the canvas
- Connect Components: Click and drag from the bottom handle of one component to the top handle of another
- Configure Parameters: Click the gear icon (⚙️) on any component to configure its parameters
- Build Pipeline: Click the "Build Pipeline" button to generate Python code
- Export Code: Copy to clipboard or download the generated code
Here's a simple example of creating a classification pipeline:
- Drag StandardScaler (Preprocessing)
- Drag PCA (Decomposition)
- Drag RandomForestClassifier (Estimators)
- Connect them in order: StandardScaler → PCA → RandomForestClassifier
- Configure parameters:
- PCA: n_components = 5
- RandomForestClassifier: n_estimators = 200, random_state = 42
- Click "Build Pipeline"
Generated code:
# Generated by PipeLearn
# sklearn Pipeline Builder
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Create the pipeline
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    RandomForestClassifier(n_estimators=200, random_state=42)
)
# Usage example:
# pipeline.fit(X_train, y_train)
# predictions = pipeline.predict(X_test)
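The generated code drops straight into a normal scikit-learn workflow. As an end-to-end check, here is a minimal sketch that trains the example pipeline on scikit-learn's built-in breast cancer dataset (the dataset choice and the train/test split are illustrative additions, not part of the generated code):

```
# Minimal usage sketch for the generated pipeline.
# The dataset and train/test split are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    RandomForestClassifier(n_estimators=200, random_state=42),
)

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print("Test accuracy:", pipeline.score(X_test, y_test))
```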
GET /api/components: Returns the catalog of available sklearn components.
Response:
{
"preprocessing": {
"StandardScaler": {
"class": "sklearn.preprocessing.StandardScaler",
"description": "Standardize features...",
"parameters": {...}
}
}
}
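For programmatic access, the catalog can also be fetched from Python. The snippet below is a small sketch using the third-party requests library (not a PipeLearn dependency); it simply walks the documented response structure above:

```
# Sketch: fetch the component catalog and list the components per category.
# Uses the `requests` package, which is an extra dependency.
import requests

response = requests.get("http://localhost:8000/api/components")
response.raise_for_status()
catalog = response.json()

for category, components in catalog.items():
    print(category, "->", ", ".join(components.keys()))
```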
Generates Python code from the visual pipeline.
Request Body:
{
"nodes": [...],
"edges": [...]
}
Response:
{
"success": true,
"code": "# Generated pipeline code...",
"pipeline_structure": [...]
}
Validates the pipeline structure.
Response:
{
"valid": true,
"errors": [],
"warnings": []
}
Supported components include:
Preprocessing:
- StandardScaler, MinMaxScaler, RobustScaler
- Normalizer, OneHotEncoder, LabelEncoder
- PolynomialFeatures
Feature Selection:
- SelectKBest, VarianceThreshold, RFE
Decomposition:
- PCA, TruncatedSVD
Classifiers:
- LogisticRegression
- RandomForestClassifier
- SVC (Support Vector Classification)
- GradientBoostingClassifier
Regressors:
- LinearRegression
- RandomForestRegressor
Imputation:
- SimpleImputer, KNNImputer
To add a new sklearn component:
- Edit backend/components.py
- Add the component to the appropriate category:
"YourComponent": {
"class": "sklearn.module.YourComponent",
"description": "Component description",
"parameters": {
"param_name": {
"type": "number|boolean|select|string",
"default": default_value,
"description": "Parameter description"
}
},
"input": "numeric|categorical",
"output": "numeric|prediction"
}
- Restart the backend server
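To illustrate the schema above, here is what a hypothetical entry for scikit-learn's Binarizer could look like. Binarizer is not part of the current catalog, and the exact category key and accepted type/input/output values should be checked against backend/components.py:

```
# Hypothetical catalog entry for sklearn.preprocessing.Binarizer,
# following the schema above. Not currently shipped with PipeLearn.
"Binarizer": {
    "class": "sklearn.preprocessing.Binarizer",
    "description": "Binarize features (0/1) according to a threshold",
    "parameters": {
        "threshold": {
            "type": "number",
            "default": 0.0,
            "description": "Values at or below the threshold map to 0, values above it map to 1"
        }
    },
    "input": "numeric",
    "output": "numeric"
}
```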
Create a .env file in the frontend directory:
REACT_APP_API_URL=http://localhost:8000
Run the backend tests:
cd backend
pytest
Run the frontend tests:
cd frontend
npm test
For production, run the backend with gunicorn (FastAPI is an ASGI app, so gunicorn needs uvicorn's worker class):
cd backend
pip install gunicorn uvicorn
gunicorn app:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
Build the frontend for production:
cd frontend
npm run build
Deploy the build/ directory to your hosting service.
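As one simple option for previewing or hosting the static build (this uses the third-party serve package and is not a project requirement):

```
npx serve -s build
```

Note that REACT_APP_API_URL is read at build time, so set it to your deployed backend URL before running npm run build.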
If the frontend cannot connect to the backend:
- Ensure the backend server is running on port 8000
- Check the CORS settings in app.py (a typical configuration is sketched after this list)
- Verify the backend API is accessible at http://localhost:8000/api/components
- Check the browser console for errors
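If CORS is the culprit, a typical FastAPI setup allows the dev frontend origin explicitly. The snippet below is a sketch of what such a configuration generally looks like; it is not copied from backend/app.py, and the allowed origin assumes the default React dev server on port 3000:

```
# Sketch of a typical FastAPI CORS configuration (not taken from app.py).
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # React dev server
    allow_methods=["*"],
    allow_headers=["*"],
)
```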
If the pipeline fails to validate:
- Ensure all components are connected properly
- Only one estimator should be present, and it must be the last step in the pipeline
- Check for circular connections
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Add more sklearn components (clustering, ensemble methods)
- Pipeline templates for common use cases
- Export to Jupyter notebook format
- Pipeline performance visualization
- Integration with MLflow for experiment tracking
- Support for custom transformers
- Collaborative editing features
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with React Flow for the visual editor
- Powered by FastAPI and scikit-learn
- Inspired by the need for faster ML pipeline prototyping
For issues, questions, or contributions, please open an issue on GitHub.
Made with ❤️ for the ML community