Welcome to the GPU Optimizer for ML Models! This project optimizes GPU performance for machine learning models through advanced scheduling, resource management, and big data processing.
The GPU Optimizer for ML Models is a comprehensive platform designed to improve GPU performance for training and deploying machine learning models. This platform features a web-based interface for managing and monitoring models, an API for integration, and robust backend services for data processing and model optimization.
- GPU Performance Optimization: Improve GPU utilization for training ML models using advanced scheduling and resource management.
- Model Management: Upload, manage, and monitor ML models through a web-based interface.
- Data Processing: Use Spark, Hadoop, and other big data technologies for data transformation and analysis.
- Real-time Monitoring: Monitor GPU utilization and performance in real-time.
- Secure API: Securely manage models and GPU resources via a robust API.
The GPU Optimizer for ML Models project is organized into several directories and files, each serving a specific purpose. Below is a detailed breakdown of the project structure:
MLGpuOptimizer/
├── backend_api/
│ ├── config/
│ │ └── config.py
│ ├── controllers/
│ │ ├── model_controller.py
│ │ └── gpu_controller.py
│ ├── models/
│ │ └── model.py
│ ├── routes/
│ │ └── api_routes.py
│ ├── services/
│ │ ├── gpu_service.py
│ │ └── model_service.py
│ ├── utils/
│ │ └── optimization.py
│ ├── app.py
│ └── Dockerfile
├── data_processing/
│ ├── spark_jobs/
│ │ ├── data_transformation.py
│ │ ├── data_aggregation.py
│ │ └── data_analysis.py
│ ├── hadoop_jobs/
│ │ └── hadoop_config.py
│ ├── utils/
│ │ └── spark_utils.py
│ └── Dockerfile
├── web_interface/
│ ├── public/
│ │ └── index.html
│ ├── src/
│ │ ├── components/
│ │ │ ├── Header.js
│ │ │ ├── Footer.js
│ │ │ ├── ModelUpload.js
│ │ │ ├── ModelMonitor.js
│ │ │ └── GpuStats.js
│ │ ├── pages/
│ │ │ ├── HomePage.js
│ │ │ ├── UploadPage.js
│ │ │ └── MonitorPage.js
│ │ ├── services/
│ │ │ └── api.js
│ │ ├── App.js
│ │ ├── index.js
│ │ └── App.css
│ └── Dockerfile
├── db_init/
│ └── init.sql
├── scripts/
│ └── deploy.sh
├── README.md
- Docker
- Node.js and npm
- Python and pip
- MySQL
- Clone the Repository:
git clone https://github.com/yourusername/MLGpuOptimizer.git
cd MLGpuOptimizer
- Build and Run Backend:
cd backend_api
docker build -t gpu_optimizer_backend .
docker run -d -p 5000:5000 --name gpu_optimizer_backend gpu_optimizer_backend
- Build and Run Web Interface:
cd ../web_interface
docker build -t gpu_optimizer_frontend .
docker run -d -p 3000:3000 --name gpu_optimizer_frontend gpu_optimizer_frontend
- Initialize Database:
docker exec -i mysql_container mysql -u root -p < db_init/init.sql
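Before running the steps above, it can help to confirm the prerequisites are installed. This is a minimal sketch (not part of the project) that checks whether each required tool is on your PATH:

```python
import shutil

def check_prereqs(tools=("docker", "node", "npm", "python3", "mysql")):
    """Return a dict mapping each required tool to whether it is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in check_prereqs().items():
        print(f"{tool}: {'OK' if found else 'MISSING'}")
```

Any tool reported as MISSING should be installed before continuing (or see the Verified Quickstart below for a path without Docker).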
This guide provides instructions on how to use the GPU Optimizer for ML Models.
- Access the Web Interface:
  - Open your browser and navigate to http://localhost:3000.
- Navigate the Dashboard:
  - Use the navigation menu to access different sections of the application.
- Upload Model:
  - Navigate to the "Upload Model" section.
  - Click the "Choose File" button and select the model file to upload.
  - Click the "Upload" button to upload the model.
- Monitor Models:
  - Navigate to the "Monitor Models" section.
  - View the list of uploaded models and their status.
  - Each model entry shows the model name and its current status (e.g., uploaded, optimized).
- GPU Stats:
  - Navigate to the "GPU Stats" section.
  - View real-time GPU utilization and memory statistics.
  - The stats include GPU utilization percentage, total memory, free memory, and used memory.
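The same statistics are exposed through the API (see the endpoints listed below). As a minimal sketch, assuming a JSON response whose field names are hypothetical (the real schema may differ), the stats could be summarized like this:

```python
import json

def summarize_gpu_stats(payload: str) -> str:
    """Summarize a GPU stats response.

    The field names below are assumptions for illustration; check the
    actual /api/gpu/stats response for the real schema.
    """
    stats = json.loads(payload)
    used_pct = 100.0 * stats["memory_used"] / stats["memory_total"]
    return (f"util {stats['gpu_utilization']}% | "
            f"mem {stats['memory_used']}/{stats['memory_total']} MiB "
            f"({used_pct:.1f}% used)")

sample = ('{"gpu_utilization": 42, "memory_total": 16384, '
          '"memory_free": 12288, "memory_used": 4096}')
print(summarize_gpu_stats(sample))
```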
- Upload Your Model:
  - Go to the "Upload Model" section.
  - Select your model file (e.g., a PyTorch model file) and upload it.
  - Wait for the upload to complete and check the model status in the "Monitor Models" section.
- Monitor Optimization:
  - Once the model is uploaded, the system automatically starts optimizing it.
  - You can monitor the optimization process in the "Monitor Models" section.
  - The status will change from "uploaded" to "optimized" once the optimization is complete.
- Access GPU Stats:
  - Go to the "GPU Stats" section.
  - View real-time statistics of GPU performance, including utilization and memory usage.
  - Use this information to ensure that your GPUs are being utilized efficiently and identify any potential bottlenecks.
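The "uploaded" to "optimized" transition can also be watched programmatically. A minimal polling sketch, with the status-fetching function injected so it can wrap any HTTP client (in practice it would GET /api/model/monitor and extract the status for one model):

```python
import time

def wait_for_optimized(fetch_status, model_id, poll_interval=2.0, max_polls=30):
    """Poll fetch_status(model_id) until it reports 'optimized'.

    Returns True on success, False if max_polls is exhausted first.
    fetch_status is any callable returning the model's current status string.
    """
    for _ in range(max_polls):
        if fetch_status(model_id) == "optimized":
            return True
        time.sleep(poll_interval)
    return False
```

Injecting the fetcher keeps the retry logic independent of the transport, so the same loop works against the Docker deployment, the local quickstart, or the demo server.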
For more detailed instructions and troubleshooting, refer to the FAQ section below.
- To reset the database, you can re-run the database initialization script:
docker exec -i mysql_container mysql -u root -p<password> < db_init/init.sql
- Currently, the platform supports PyTorch model files. Support for other model types can be added by extending the backend services.
- Contributions are welcome!
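On extending the backend to new model types: one common pattern is a registry keyed by file extension, so a new format only needs one registered handler. This is a hypothetical sketch, not the project's actual extension API:

```python
# Hypothetical optimizer registry; the real backend's extension points may differ.
OPTIMIZERS = {}

def register_optimizer(extension):
    """Decorator that registers an optimizer for a model file extension."""
    def wrap(fn):
        OPTIMIZERS[extension] = fn
        return fn
    return wrap

@register_optimizer(".pt")
def optimize_pytorch(path):
    return f"optimized PyTorch model at {path}"

@register_optimizer(".onnx")
def optimize_onnx(path):
    return f"optimized ONNX model at {path}"

def optimize(path):
    """Dispatch to the optimizer registered for the file's extension."""
    for ext, fn in OPTIMIZERS.items():
        if path.endswith(ext):
            return fn(path)
    raise ValueError(f"unsupported model type: {path}")
```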
If you have any further questions or need assistance, feel free to reach out to the project maintainers.
Happy coding!
These steps were verified in a clean environment without Docker by running the backend locally and using SQLite for persistence.
python -m pip install -r backend_api/requirements.txt
./scripts/run_backend.sh
Then visit:
- API base: http://localhost:5000/api
- GPU stats: http://localhost:5000/api/gpu/stats
To run the React web interface locally (requires Node.js + npm with registry access):
cd web_interface
npm install
npm start
Run a basic API verification:
./scripts/smoke_test.sh
If you need a fully runnable, dependency-free demo (no Docker, no pip, no npm), use the built-in demo server:
./scripts/run_demo.sh
Then open http://localhost:8080 to access the demo dashboard with upload, monitor, and GPU stats panels.
To verify the demo server automatically:
./scripts/smoke_test_demo.sh
The demo server uses the same API paths (/api/model/upload, /api/model/monitor, /api/gpu/stats) and stores data in backend_api/demo.db.
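Since the demo stores its data in a SQLite file, you can inspect it directly with the standard library. The schema isn't documented here, so this sketch just lists the tables rather than assuming column names:

```python
import sqlite3

def list_tables(db_path):
    """Return the table names in a SQLite database, e.g. backend_api/demo.db."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```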
- Docker not available: If docker --version fails, use the Verified Quickstart above to run locally.
- No GPU / nvidia-smi missing: The /api/gpu/stats endpoint returns a zeroed placeholder with a note.
- Torch/TensorRT not installed: Model optimization falls back to a safe copy and still marks the model as optimized.
- Database connection errors: By default, the backend uses SQLite at backend_api/gpu_optimizer.db. You can override with DATABASE_URL or SQLALCHEMY_DATABASE_URI.
- No dependency environment: Use ./scripts/run_demo.sh to run a full demo without Docker, pip, or npm.
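The zeroed-placeholder behavior for missing GPUs can be sketched as follows. This is an illustration of the fallback pattern, not the project's actual implementation, and the returned field names are assumptions:

```python
import shutil
import subprocess

def gpu_stats():
    """Query nvidia-smi for GPU stats, or return a zeroed placeholder.

    Mirrors the fallback behavior described above: when nvidia-smi is not
    on PATH, zeros are returned with an explanatory note. Field names are
    illustrative and may not match the real /api/gpu/stats schema.
    """
    if shutil.which("nvidia-smi") is None:
        return {"gpu_utilization": 0, "memory_total": 0,
                "memory_used": 0, "note": "nvidia-smi not found"}
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        text=True)
    # First GPU only; output looks like "42, 16384, 4096"
    util, total, used = (int(x) for x in out.strip().split("\n")[0].split(", "))
    return {"gpu_utilization": util, "memory_total": total,
            "memory_used": used, "note": "ok"}
```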