HishamElamir/PipeLearn

PipeLearn

Visual sklearn Pipeline Builder - A drag-and-drop interface for creating scikit-learn machine learning pipelines.

Overview

PipeLearn is a visual tool that helps ML engineers and data scientists build scikit-learn pipelines through an intuitive drag-and-drop interface. No more writing boilerplate code - just drag components, connect them, configure parameters, and generate production-ready Python code.

Features

  • Visual Pipeline Builder: Drag and drop sklearn components onto a canvas
  • 50+ Components: Comprehensive catalog including:
    • Preprocessing (StandardScaler, MinMaxScaler, OneHotEncoder, etc.)
    • Feature Selection (SelectKBest, VarianceThreshold, RFE)
    • Decomposition (PCA, TruncatedSVD)
    • Estimators (LogisticRegression, RandomForest, SVC, etc.)
    • Imputation (SimpleImputer, KNNImputer)
  • Parameter Configuration: Easy-to-use UI for configuring component parameters
  • Pipeline Validation: Real-time validation with helpful error messages
  • Code Generation: Generates clean, production-ready Python code
  • Export Options: Copy to clipboard or download as .py file

Architecture

PipeLearn/
├── backend/              # FastAPI backend
│   ├── app.py           # Main API application
│   ├── components.py    # sklearn component catalog
│   ├── pipeline_generator.py  # Code generation logic
│   └── requirements.txt
│
└── frontend/            # React frontend
    ├── public/
    ├── src/
    │   ├── components/  # React components
    │   │   ├── ComponentPanel.js
    │   │   ├── PipelineEditor.js
    │   │   ├── ComponentNode.js
    │   │   ├── ParameterConfig.js
    │   │   └── CodeViewer.js
    │   ├── App.js
    │   └── index.js
    └── package.json

Getting Started

Prerequisites

  • Python 3.8+
  • Node.js 16+
  • npm or yarn

Installation

1. Clone the repository

git clone https://github.com/HishamElamir/PipeLearn.git
cd PipeLearn

2. Set up the backend

cd backend
pip install -r requirements.txt

3. Set up the frontend (run from the repository root)

cd frontend
npm install

Running the Application

1. Start the backend server

cd backend
python app.py

The API will be available at http://localhost:8000

2. Start the frontend development server

In a new terminal:

cd frontend
npm start

The application will open in your browser at http://localhost:3000

Usage Guide

Building a Pipeline

  1. Add Components: Drag components from the left panel onto the canvas
  2. Connect Components: Click and drag from the bottom handle of one component to the top handle of another
  3. Configure Parameters: Click the gear icon (⚙️) on any component to configure its parameters
  4. Build Pipeline: Click the "Build Pipeline" button to generate Python code
  5. Export Code: Copy to clipboard or download the generated code

Example: Classification Pipeline

Here's a simple example of creating a classification pipeline:

  1. Drag StandardScaler (Preprocessing)
  2. Drag PCA (Decomposition)
  3. Drag RandomForestClassifier (Estimators)
  4. Connect them in order: StandardScaler → PCA → RandomForestClassifier
  5. Configure parameters:
    • PCA: n_components = 5
    • RandomForestClassifier: n_estimators = 200, random_state = 42
  6. Click "Build Pipeline"

Generated code:

# Generated by PipeLearn
# sklearn Pipeline Builder

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Create the pipeline
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    RandomForestClassifier(n_estimators=200, random_state=42)
)

# Usage example:
# pipeline.fit(X_train, y_train)
# predictions = pipeline.predict(X_test)
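
The mapping from configured components to the generated code above can be sketched in a few lines. This is only an illustration of the idea, not the logic in pipeline_generator.py: the helper names `render_step` and `render_pipeline` and the `(class_name, params)` step format are assumptions made for this example.

```python
# Hypothetical helpers: a minimal sketch of turning ordered steps into code text.
def render_step(class_name, params):
    """Render one constructor call, e.g. PCA(n_components=5)."""
    args = ", ".join(f"{key}={value!r}" for key, value in params.items())
    return f"{class_name}({args})"


def render_pipeline(steps):
    """Render a make_pipeline(...) call from ordered (class_name, params) pairs."""
    body = ",\n".join(f"    {render_step(name, params)}" for name, params in steps)
    return f"pipeline = make_pipeline(\n{body}\n)"


# The three components from the example, in connection order.
steps = [
    ("StandardScaler", {}),
    ("PCA", {"n_components": 5}),
    ("RandomForestClassifier", {"n_estimators": 200, "random_state": 42}),
]
code = render_pipeline(steps)
print(code)
```

Using `repr` for parameter values keeps strings quoted and numbers bare, which is why the rendered text is valid Python.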

API Documentation

Endpoints

GET /api/components

Returns the catalog of available sklearn components.

Response:

{
  "preprocessing": {
    "StandardScaler": {
      "class": "sklearn.preprocessing.StandardScaler",
      "description": "Standardize features...",
      "parameters": {...}
    }
  }
}
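
A client might flatten this response into a list of components per category. The sketch below works on a hard-coded sample rather than a live request; the `parameters` object is left empty because its contents are elided above, and `list_components` is a name invented for this example.

```python
import json

# Sample shaped like the response above; "parameters" is left empty on purpose.
sample_response = """
{
  "preprocessing": {
    "StandardScaler": {
      "class": "sklearn.preprocessing.StandardScaler",
      "description": "Standardize features...",
      "parameters": {}
    }
  }
}
"""


def list_components(catalog):
    """Return sorted (category, name, class_path) triples from a catalog dict."""
    rows = []
    for category, components in catalog.items():
        for name, spec in components.items():
            rows.append((category, name, spec["class"]))
    return sorted(rows)


catalog = json.loads(sample_response)
for category, name, class_path in list_components(catalog):
    print(f"{category}: {name} -> {class_path}")
```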

POST /api/generate-pipeline

Generates Python code from the visual pipeline.

Request Body:

{
  "nodes": [...],
  "edges": [...]
}

Response:

{
  "success": true,
  "code": "# Generated pipeline code...",
  "pipeline_structure": [...]
}
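
Calling this endpoint from a script could look like the sketch below, which uses only the Python standard library. The shape of individual nodes and edges is not documented here, so the helper simply forwards whatever lists the caller supplies; `build_generate_request` is a hypothetical name, and the base URL assumes the local development server described earlier.

```python
import json
import urllib.request


def build_generate_request(nodes, edges, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /api/generate-pipeline."""
    payload = json.dumps({"nodes": nodes, "edges": edges}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/generate-pipeline",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Empty lists stand in for a real pipeline exported from the editor.
request = build_generate_request(nodes=[], edges=[])
print(request.get_method(), request.full_url)

# To send it against a running backend:
# with urllib.request.urlopen(request, timeout=10) as response:
#     result = json.load(response)
#     if result["success"]:
#         print(result["code"])
```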

POST /api/validate-pipeline

Validates the pipeline structure.

Response:

{
  "valid": true,
  "errors": [],
  "warnings": []
}
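
A caller would typically check `valid` first and then report any messages. The sketch below assumes that entries in `errors` and `warnings` are plain strings, which the response above does not confirm; the failing example and its message are invented for illustration.

```python
def summarize_validation(result):
    """Turn a validation response into printable lines (assumes string messages)."""
    lines = ["valid" if result["valid"] else "invalid"]
    lines += [f"error: {message}" for message in result["errors"]]
    lines += [f"warning: {message}" for message in result["warnings"]]
    return lines


# The response shown above.
ok = {"valid": True, "errors": [], "warnings": []}

# Hypothetical failing response; the message text is made up.
failed = {"valid": False, "errors": ["pipeline has no estimator"], "warnings": []}

for line in summarize_validation(ok) + summarize_validation(failed):
    print(line)
```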

Component Categories

Preprocessing

  • StandardScaler, MinMaxScaler, RobustScaler
  • Normalizer, OneHotEncoder, LabelEncoder
  • PolynomialFeatures

Feature Selection

  • SelectKBest, VarianceThreshold, RFE

Decomposition

  • PCA, TruncatedSVD

Estimators

Classifiers:

  • LogisticRegression
  • RandomForestClassifier
  • SVC (Support Vector Classification)
  • GradientBoostingClassifier

Regressors:

  • LinearRegression
  • RandomForestRegressor

Imputation

  • SimpleImputer, KNNImputer

Development

Adding New Components

To add a new sklearn component:

  1. Edit backend/components.py
  2. Add the component to the appropriate category:
"YourComponent": {
    "class": "sklearn.module.YourComponent",
    "description": "Component description",
    "parameters": {
        "param_name": {
            "type": "number|boolean|select|string",
            "default": default_value,
            "description": "Parameter description"
        }
    },
    "input": "numeric|categorical",
    "output": "numeric|prediction"
}
  3. Restart the backend server
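
As a concrete illustration of the template above, here is what an entry for scikit-learn's MaxAbsScaler might look like, together with a small check that the keys from the template are present. The entry's wording and the `missing_keys` helper are assumptions for this example; they are not taken from components.py.

```python
# Keys taken from the entry template above.
REQUIRED_ENTRY_KEYS = {"class", "description", "parameters", "input", "output"}
REQUIRED_PARAM_KEYS = {"type", "default", "description"}

# Illustrative entry; MaxAbsScaler and its `copy` parameter exist in scikit-learn.
max_abs_scaler_entry = {
    "class": "sklearn.preprocessing.MaxAbsScaler",
    "description": "Scale each feature by its maximum absolute value",
    "parameters": {
        "copy": {
            "type": "boolean",
            "default": True,
            "description": "Copy the input instead of scaling in place",
        }
    },
    "input": "numeric",
    "output": "numeric",
}


def missing_keys(entry):
    """Return the template keys that an entry (or its parameters) lacks."""
    problems = sorted(REQUIRED_ENTRY_KEYS - set(entry))
    for name, spec in entry.get("parameters", {}).items():
        for key in sorted(REQUIRED_PARAM_KEYS - set(spec)):
            problems.append(f"parameters.{name}.{key}")
    return problems


print(missing_keys(max_abs_scaler_entry))
```

Running a check like this before restarting the backend catches a forgotten key earlier than a failed request from the frontend would.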

Environment Variables

Create a .env file in the frontend directory:

REACT_APP_API_URL=http://localhost:8000

Testing

Backend Tests

cd backend
pytest

Frontend Tests

cd frontend
npm test

Deployment

Backend (Production)

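cd backend
pip install gunicorn uvicorn
# FastAPI is an ASGI app, so gunicorn needs an ASGI worker class
gunicorn app:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000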

Frontend (Production Build)

cd frontend
npm run build

Deploy the build/ directory to your hosting service.

Troubleshooting

Backend not connecting

  • Ensure the backend server is running on port 8000
  • Check CORS settings in app.py

Components not loading

  • Verify the backend API is accessible at http://localhost:8000/api/components
  • Check browser console for errors
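
The accessibility check can also be run outside the browser, which separates backend problems from CORS or frontend problems. The sketch below uses only the standard library; `fetch_catalog` is a hypothetical helper, and the default URL assumes the local development setup.

```python
import json
import urllib.error
import urllib.request


def fetch_catalog(base_url="http://localhost:8000", timeout=5):
    """Return the parsed component catalog, or None if the backend is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/components", timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError) as error:
        # URLError: connection refused or HTTP error; ValueError: body is not JSON.
        print(f"Backend check failed: {error}")
        return None


if __name__ == "__main__":
    catalog = fetch_catalog()
    if catalog is not None:
        print("Categories:", ", ".join(sorted(catalog)))
```

If this script succeeds while the browser still shows no components, the CORS settings in app.py are the more likely cause.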

Pipeline validation errors

  • Ensure all components are connected properly
  • A pipeline should contain exactly one estimator, and it must be the final step
  • Check for circular connections
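
To see why a circular connection cannot be turned into a pipeline, it helps to look at how a step order is derived from the connections. The sketch below uses Kahn's algorithm for topological sorting; it is not PipeLearn's validator, and it assumes edges are given as `(source, target)` pairs of node ids.

```python
from collections import deque


def topological_order(node_ids, edges):
    """Return node ids in execution order, or raise ValueError on a cycle."""
    successors = {node: [] for node in node_ids}
    incoming = {node: 0 for node in node_ids}
    for source, target in edges:
        successors[source].append(target)
        incoming[target] += 1

    # Start from nodes that nothing points to.
    ready = deque(node for node in node_ids if incoming[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for following in successors[node]:
            incoming[following] -= 1
            if incoming[following] == 0:
                ready.append(following)

    # Nodes on a cycle never reach zero incoming edges, so they are never emitted.
    if len(order) != len(node_ids):
        raise ValueError("circular connection detected")
    return order


nodes = ["scaler", "pca", "forest"]
edges = [("scaler", "pca"), ("pca", "forest")]
print(topological_order(nodes, edges))
```

Adding an edge from `forest` back to `scaler` leaves no node without an incoming connection, so the function raises instead of returning an order.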

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Roadmap

  • Add more sklearn components (clustering, ensemble methods)
  • Pipeline templates for common use cases
  • Export to Jupyter notebook format
  • Pipeline performance visualization
  • Integration with MLflow for experiment tracking
  • Support for custom transformers
  • Collaborative editing features

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Support

For issues, questions, or contributions, please open an issue on GitHub.


Made with ❤️ for the ML community

About

Enhancement of SKLearn Pipeline
