A Python implementation of a framework for scheduling deep learning applications on heterogeneous embedded devices (CPU, GPU, NPU) using genetic algorithms.
Based on the paper by Duseok Kang, Jinwoo Oh, Jongwoo Choi, Youngmin Yi, and Soonhoi Ha: "Scheduling of Deep Learning Applications Onto Heterogeneous Processors in an Embedded Device"
Paper: https://ieeexplore.ieee.org/ielx7/6287639/8948470/09019698.pdf
uv sync
python test_installation.py
# Quick example (10 seconds)
cd examples && python basic_usage.py
# Interactive menu with 6 demos
cd src && python main.py
That's it! See results in the src/results/ directory.
This framework simulates both the hardware and the deep learning workloads, and schedules deep learning layers across heterogeneous processors (CPU, GPU, NPU) to optimize:
- ⚡ Throughput - Applications processed per second
- 🔋 Energy - Energy consumption in mJ
- ⏱️ Latency - Response time in ms
It does so with a genetic algorithm that evolves better schedules over multiple generations.
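As a rough illustration of the three objectives, the sketch below derives them from a finished schedule. The `Slot` record, the `summarize` function, and the formulas (throughput as applications per makespan, energy as power times busy time, latency measured from time 0) are assumptions made for this example, not the framework's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One layer execution on one processor (hypothetical record)."""
    app: str
    processor: str
    start_ms: float
    end_ms: float
    power_mw: float  # average power drawn while the layer runs


def summarize(slots):
    """Return (throughput in apps/s, energy in mJ, latency in ms)."""
    makespan_ms = max(s.end_ms for s in slots) - min(s.start_ms for s in slots)
    apps = {s.app for s in slots}
    throughput = len(apps) / (makespan_ms / 1000.0)
    # mW * ms gives microjoules, so divide by 1000 to get millijoules
    energy_mj = sum(s.power_mw * (s.end_ms - s.start_ms) for s in slots) / 1000.0
    # Latency: mean time at which each application's last layer finishes
    finish = {a: max(s.end_ms for s in slots if s.app == a) for a in apps}
    latency_ms = sum(finish.values()) / len(finish)
    return throughput, energy_mj, latency_ms


slots = [
    Slot("App1", "npu", 0.0, 6.0, 300.0),
    Slot("App1", "gpu", 6.0, 10.0, 900.0),
    Slot("App2", "cpu0", 0.0, 8.0, 500.0),
]
throughput, energy_mj, latency_ms = summarize(slots)
print(f"{throughput:.1f} apps/s, {energy_mj:.1f} mJ, {latency_ms:.1f} ms")
```

The unit conversion is the part most worth checking in a real implementation: power in mW multiplied by time in ms yields microjoules, not millijoules.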
DL Application → Genetic Algorithm → Optimized Schedule → Visualizations
      ↓                  ↓                   ↓                  ↓
   (Layers)      (Evolve 50+ gens)     (CPU/GPU/NPU)      (Gantt Charts)
- ✅ Heterogeneous Scheduling - CPU, GPU, NPU processors
- ✅ Multi-Core Support - 4 individual CPU cores for fine-grained scheduling
- ✅ DVFS - Dynamic Voltage/Frequency Scaling
- ✅ Multi-Objective Optimization - Balance throughput, energy, latency
- ✅ Genetic Algorithm - Configurable operators and parameters
- ✅ Pareto Front Analysis - Trade-off exploration
- ✅ Rich Visualizations - Gantt charts, utilization plots
- ✅ Animation Support - Watch GA evolution in real-time! 🎬
- ✅ 🆕 Population Diversity Heatmaps - Track genetic diversity evolution
- ✅ 🆕 Mutation Tracking - Visualize chromosome mutation patterns
- ✅ 🆕 Interactive HTML Animations - Explore results in your browser
- ✅ Pre-built Networks - CNN, ResNet, MobileNet
- ✅ Customizable - Hardware platforms, DL apps, GA parameters
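To make the DVFS feature listed above more concrete, here is a minimal sketch of the speed versus energy trade-off across frequency levels. The function `energy_per_level` is hypothetical, and the assumption that execution time scales inversely with frequency is a simplification chosen for the example; the frequency and power values are the ones used in the custom processor example later in this README.

```python
def energy_per_level(base_time_ms, freq_list_mhz, power_list_mw):
    """Time and energy of one layer at each DVFS level.

    Assumes execution time scales inversely with frequency, with
    base_time_ms measured at the highest frequency (a simplification).
    Returns a list of (freq_mhz, time_ms, energy_mj) tuples.
    """
    f_max = max(freq_list_mhz)
    results = []
    for freq, power in zip(freq_list_mhz, power_list_mw):
        time_ms = base_time_ms * f_max / freq
        # mW * ms gives microjoules, so divide by 1000 for millijoules
        results.append((freq, time_ms, power * time_ms / 1000.0))
    return results


levels = energy_per_level(20.0, [1000, 1500, 2000], [500, 800, 1200])
for freq, time_ms, energy_mj in levels:
    print(f"{freq} MHz: {time_ms:.1f} ms, {energy_mj:.1f} mJ")
```

Under these assumptions the lowest frequency is the slowest but uses the least energy, which is the kind of trade-off the CPU frequency gene lets the optimizer explore.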
- Quick Start Guide - Get running in 3 steps
- Project Overview - Structure and features
- NPU Configuration - Enable/disable NPU processor
- Viewing Plots - How to view visualizations in WSL2/Linux
- Animation Quick Start - Create animations in 30 seconds
- Animation Guide - Complete animation documentation
- Bug Fixes - Known issues and solutions
- Changelog - Complete history of changes
- Full Documentation Index - Complete documentation hub
from libs import (
create_simple_platform,
create_cnn_application,
GeneticAlgorithm,
GAConfig,
plot_gantt_chart
)
# 1. Create hardware platform
hardware = create_simple_platform() # 4 CPU cores + GPU + NPU
# 2. Create DL application
app = create_cnn_application("CNN", start_time=0.0)
# 3. Configure genetic algorithm
config = GAConfig(
population_size=30,
max_generations=50,
verbose=True
)
# 4. Run optimization
ga = GeneticAlgorithm([app], hardware, config)
best_chromosome, best_schedule = ga.evolve()
# 5. Visualize results
plot_gantt_chart(best_schedule, save_path="schedule.png", show=True)
cd src && python main.py
Choose from 6 demo scenarios:
- Simple Scenario (2 apps) - 15 seconds
- Multi-Application (3 apps) - 30 seconds
- Optimization Goals Comparison - 45 seconds
- Pareto Front Analysis - 20 seconds
- Galaxy S9 Platform - 60 seconds
- Run All Examples - 3 minutes
All results are saved to the src/results/ directory.
Create animated GIFs showing how the genetic algorithm evolves:
from libs import GAAnimator, create_animation_callback
# Create animator
animator = GAAnimator(save_dir="./animations")
# Add to GA
ga.add_generation_callback(create_animation_callback(animator, track_diversity=True))
ga.evolve()
# Generate animations
animator.create_combined_animation("evolution.gif", fps=2)
animator.create_diversity_heatmap_animation("diversity.gif", fps=2) # NEW!
animator.create_interactive_html_animation("dashboard.html") # NEW!
See Animation Quick Start for a 30-second tutorial.
Three powerful new visualization features:
- Population Diversity Heatmaps - See gene distribution and entropy over time
- Mutation Tracking - Visualize which genes mutate and identify hotspots
- Interactive HTML Dashboard - Explore all metrics in your browser with zoom/pan
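One common way to quantify the gene distribution behind a diversity heatmap is per-gene Shannon entropy. The sketch below is an illustration under that assumption; `gene_entropy` is a hypothetical helper, not part of the framework's API, and the framework may compute diversity differently.

```python
import math
from collections import Counter


def gene_entropy(population):
    """Shannon entropy (in bits) of the values at each gene position."""
    entropies = []
    for column in zip(*population):
        total = len(column)
        counts = Counter(column)
        # sum of p * log2(1/p) over the distinct values in this column
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


population = [
    [4, 2, 0, 1, 2],
    [4, 1, 0, 2, 2],
    [4, 2, 1, 0, 2],
    [4, 1, 1, 1, 2],
]
print(gene_entropy(population))  # [0.0, 1.0, 1.0, 1.5, 0.0]
```

An entropy of 0 means every individual carries the same value at that position, so a heatmap row that fades to 0 over the generations indicates that the population has converged on that gene.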
# Try the advanced demo
cd src
python3 example_advanced_animation.py
This generates 6 visualizations, including interactive HTML you can explore in Chrome/Firefox!
Generation 0 | Best: 111.62 | Throughput: 134.08 apps/s | Energy: 22308.3 mJ
Generation 10 | Best: 0.41 | Throughput: 132.60 apps/s | Energy: 17791.7 mJ
Generation 20 | Best: 0.25 | Throughput: 141.18 apps/s | Energy: 18033.3 mJ
Generation 50 | Best: 0.20 | Throughput: 145.32 apps/s | Energy: 16245.1 mJ
================================================================================
GENETIC ALGORITHM RESULTS
================================================================================
Best Solution:
Throughput: 145.32 applications/second
Energy: 16245.1 mJ
Latency: 13.8 ms
Makespan: 13.8 ms
Processor Utilization:
cpu0: 5.2%
cpu1: 12.8%
cpu2: 0.0%
cpu3: 8.5%
gpu: 38.4%
npu: 89.7%
================================================================================
config = GAConfig(
population_size=50, # Number of candidate schedules
max_generations=100, # Evolution iterations
selection_method='tournament', # tournament, roulette, rank
crossover_method='two_point', # single_point, two_point, uniform
mutation_method='adaptive', # bit_flip, swap, gaussian, adaptive
crossover_rate=0.8,
mutation_rate=0.1,
elite_size=2, # Best schedules preserved
verbose=True
)
ga.set_optimization_weights(
throughput=2.0, # Prioritize speed
energy=0.5, # Less focus on energy
latency=1.0 # Normal latency
)
# Without NPU (CPU + GPU only)
hardware = create_simple_platform(enable_npu=False)
# With NPU (CPU + GPU + NPU) - Default
hardware = create_simple_platform(enable_npu=True)
See NPU Configuration Guide for details.
dl-schedular-framework-using-ga/
│
├── 📄 README.md # This file
│
├── 📚 docs/ # Documentation
│ ├── README.md # Documentation index
│ ├── quickstart.md # Quick start guide
│ ├── animation-quickstart.md # Animation quick start
│ ├── animations-guide.md # Complete animation guide
│ ├── npu-configuration.md # NPU configuration
│ ├── viewing-plots.md # Plot viewing guide
│ ├── project-info.md # Project overview
│ ├── changelog.md # Version history
│ └── bugfixes.md # Bug fix history
│
├── 🔬 src/ # Source code
│ ├── libs/ # Core framework modules
│ │ ├── hardware.py # Processor models
│ │ ├── dl_application.py # DL network definitions
│ │ ├── schedule.py # Schedule simulation
│ │ ├── chromosome.py # GA encoding/decoding
│ │ ├── fitness.py # Fitness evaluation
│ │ ├── genetic_operators.py # Selection, crossover, mutation
│ │ ├── genetic_algorithm.py # Main GA loop
│ │ ├── visualization.py # Charts and plots
│ │ ├── animation.py # Animation system
│ │ └── __init__.py # Package exports
│ ├── main.py # Interactive demo menu
│ └── example_animation.py # Animation example
│
├── 📝 examples/ # Usage examples
│ ├── basic_usage.py # Simple example
│ └── custom_application.py # Custom networks
│
├── 🧪 test_installation.py # Installation verification
├── ⚙️ requirements.txt # Python dependencies
├── ⚙️ pyproject.toml # Project configuration
└── 📦 uv.lock # Dependency lock file
- Gantt chart - Shows when each DL layer executes on each processor.
- Utilization plot - Shows how busy each processor is during execution.
- Fitness history - Tracks fitness improvement over generations.
- Pareto front - Visualizes trade-offs between competing objectives.
- Animations - Watch the GA evolve in real time with animated GIFs!
See Animation Guide for details.
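The trade-off view between competing objectives rests on the idea of Pareto dominance. The following is a minimal sketch of non-dominated filtering; `dominates` and `pareto_front` are hypothetical names, and the candidate values are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    Each point is (throughput, energy, latency): higher throughput is
    better, lower energy and lower latency are better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [
    (145.0, 16000.0, 13.8),
    (130.0, 15000.0, 15.0),
    (120.0, 17000.0, 16.0),  # dominated by the first candidate
    (150.0, 19000.0, 13.0),
]
front = pareto_front(candidates)
print(front)
```

Every schedule left on the front is a defensible choice; which one to pick depends on how throughput, energy, and latency are weighted for the target device.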
- Initialize - Create 30-50 random schedules
- Evaluate - Calculate fitness (throughput, energy, latency)
- Select - Choose best schedules as parents
- Crossover - Combine two parent schedules
- Mutate - Randomly modify schedules
- Elitism - Keep best solutions unchanged
- Repeat - For 50-100 generations
- Result - Best schedule found
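The steps above can be sketched as a small, self-contained loop. This is not the framework's `GeneticAlgorithm` class: the toy fitness, the `EXEC_MS` table, and all function names are assumptions for illustration, and the crossover rate is left out (crossover is always applied) to keep the example short.

```python
import random

# Hypothetical per-layer execution times (ms) on processors 0=CPU, 1=GPU, 2=NPU
EXEC_MS = [(20, 8, 6), (15, 5, 4), (10, 6, 7), (30, 12, 9), (5, 4, 6), (12, 7, 5)]


def fitness(chrom):
    """Toy fitness: negative load of the busiest processor (higher is better)."""
    load = [0.0, 0.0, 0.0]
    for layer, proc in enumerate(chrom):
        load[proc] += EXEC_MS[layer][proc]
    return -max(load)


def tournament(pop, rng, k=3):
    """Select: best of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def two_point_crossover(a, b, rng):
    """Crossover: take the segment between two cut points from parent b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(chrom, rng, rate=0.1):
    """Mutate: reassign each layer to a random processor with probability rate."""
    return [rng.randrange(3) if rng.random() < rate else g for g in chrom]


def evolve(pop_size=30, generations=50, elite=2, seed=0):
    rng = random.Random(seed)
    # Initialize: random processor assignment for every layer
    pop = [[rng.randrange(3) for _ in EXEC_MS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # Evaluate
        history.append(fitness(pop[0]))
        next_pop = [c[:] for c in pop[:elite]]   # Elitism
        while len(next_pop) < pop_size:
            child = two_point_crossover(tournament(pop, rng), tournament(pop, rng), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop                           # Repeat
    best = max(pop, key=fitness)                 # Result
    return best, fitness(best), history


best, best_fitness, history = evolve()
print(best, best_fitness)
```

Because the elite individuals are copied unchanged, the best fitness recorded in `history` can never get worse from one generation to the next.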
Each schedule is encoded as:
[cpu_cores, cpu_freq, layer0_proc, layer1_proc, ...]
Example: [4, 2, 0, 1, 2, 1, 0]
- 4 - CPU core count
- 2 - CPU frequency index
- 0, 1, 2, 1, 0 - Processor assignments for each layer
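A decoder for this encoding might look like the sketch below. The `decode` function, the index-to-processor mapping (0 = CPU, 1 = GPU, 2 = NPU), and the layer names are assumptions for illustration; the framework's chromosome.py may organize this differently.

```python
PROCESSORS = ["cpu", "gpu", "npu"]  # assumed index-to-processor mapping


def decode(chromosome, layer_names, freq_list_mhz):
    """Split a flat chromosome into core count, frequency, and assignments."""
    cpu_cores, cpu_freq_idx, *assignments = chromosome
    if len(assignments) != len(layer_names):
        raise ValueError("expected one processor gene per layer")
    return {
        "cpu_cores": cpu_cores,
        "cpu_freq_mhz": freq_list_mhz[cpu_freq_idx],
        "assignment": {
            name: PROCESSORS[p] for name, p in zip(layer_names, assignments)
        },
    }


decoded = decode(
    [4, 2, 0, 1, 2, 1, 0],
    ["conv1", "conv2", "pool1", "fc1", "fc2"],  # hypothetical layer names
    [1000, 1500, 2000],                         # MHz, one entry per frequency index
)
print(decoded)
```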
fitness = w1 × throughput - w2 × energy - w3 × latency
Weights can be adjusted to prioritize different objectives.
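The effect of the weights can be seen in a small sketch. The `weighted_fitness` function and the reference values used for normalization are assumptions: raw throughput, energy, and latency sit on very different scales, so the sketch divides each by a reference value before applying the formula. The framework's fitness.py may normalize differently or not at all.

```python
def weighted_fitness(throughput, energy_mj, latency_ms,
                     w_throughput=1.0, w_energy=1.0, w_latency=1.0,
                     ref=(100.0, 10000.0, 10.0)):
    """fitness = w1 * throughput - w2 * energy - w3 * latency.

    Each metric is divided by a reference value first so that the three
    terms are comparable; this normalization is an assumption of the sketch.
    """
    t = throughput / ref[0]
    e = energy_mj / ref[1]
    lat = latency_ms / ref[2]
    return w_throughput * t - w_energy * e - w_latency * lat


fast = (150.0, 19000.0, 13.0)    # quick but power-hungry schedule
frugal = (130.0, 15000.0, 15.0)  # slower but cheaper schedule

speed_first = dict(w_throughput=2.0, w_energy=0.5, w_latency=1.0)
energy_first = dict(w_throughput=0.5, w_energy=2.0, w_latency=1.0)

for label, weights in [("speed first", speed_first), ("energy first", energy_first)]:
    winner = max([fast, frugal], key=lambda s: weighted_fitness(*s, **weights))
    print(label, "->", "fast" if winner is fast else "frugal")
```

With throughput weighted heavily the fast schedule scores higher; with energy weighted heavily the frugal schedule does, even though neither schedule changed.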
This framework implements key concepts from the paper:
✅ Heterogeneous Scheduling - Different processor types
✅ DVFS - Dynamic frequency scaling
✅ Multi-Objective - Balance multiple goals
✅ Genetic Algorithm - Evolutionary optimization
✅ Dependency Handling - Respects layer order
✅ Pareto Front - Trade-off analysis
✅ Real Hardware Models - Based on Galaxy S9, HiKey970
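Dependency handling in particular is easy to show in isolation. The sketch below places layers on processors one at a time so that a layer starts only after its dependencies finish and its processor is free. The `simulate` function and the tuple-based layer format are assumptions for this example and do not mirror the framework's schedule.py; the execution times are illustrative.

```python
def simulate(layers, assignment):
    """List-schedule layers in the given order.

    layers: list of (name, exec_times_ms_by_processor, dependencies),
            already in a valid topological order.
    assignment: dict mapping layer name to processor name.
    Returns (timeline, makespan_ms, utilization_by_processor).
    """
    proc_free = {}  # time at which each processor becomes idle
    finish = {}     # finish time of each layer
    timeline = []   # (layer, processor, start, end): the data behind a Gantt chart
    for name, exec_times, deps in layers:
        proc = assignment[name]
        ready = max((finish[d] for d in deps), default=0.0)
        start = max(ready, proc_free.get(proc, 0.0))
        end = start + exec_times[proc]
        finish[name] = end
        proc_free[proc] = end
        timeline.append((name, proc, start, end))
    makespan = max(finish.values())
    busy = {}
    for _, proc, start, end in timeline:
        busy[proc] = busy.get(proc, 0.0) + (end - start)
    utilization = {p: b / makespan for p, b in busy.items()}
    return timeline, makespan, utilization


layers = [
    ("conv1", {"cpu": 20.0, "gpu": 8.0, "npu": 6.0}, []),
    ("conv2", {"cpu": 15.0, "gpu": 5.0, "npu": 4.0}, ["conv1"]),
    ("pool1", {"cpu": 3.0, "gpu": 2.0, "npu": 2.0}, ["conv1"]),
    ("fc1", {"cpu": 10.0, "gpu": 6.0, "npu": 7.0}, ["conv2", "pool1"]),
]
assignment = {"conv1": "npu", "conv2": "npu", "pool1": "cpu", "fc1": "gpu"}
timeline, makespan, utilization = simulate(layers, assignment)
for row in timeline:
    print(row)
print(makespan, utilization)
```

Here `fc1` cannot start until both `conv2` (on the NPU) and `pool1` (on the CPU) have finished, so the GPU sits idle for the first 10 ms.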
from libs import Processor, HardwarePlatform
# Define custom processor
my_cpu = Processor(
name='my_cpu',
core_count=8,
freq_list=[1000, 1500, 2000], # MHz
power_list=[500, 800, 1200], # mW
max_util=0.9,
base_performance=1.0
)
# my_gpu and my_npu are assumed to be defined the same way as my_cpu
platform = HardwarePlatform([my_cpu, my_gpu, my_npu])
apps = [
create_cnn_application("App1", start_time=0.0),
create_resnet_application("App2", start_time=0.0),
create_mobilenet_application("App3", start_time=5.0)
]
ga = GeneticAlgorithm(apps, hardware, config)
from libs import Layer, DLApplication
layers = [
Layer(
name="conv1",
layer_type="conv",
base_exec_times={'cpu': 20.0, 'gpu': 8.0, 'npu': 6.0},
dependencies=[],
priority=2
),
# Add more layers...
]
app = DLApplication(name="MyNet", layers=layers, start_time=0.0)
# Make sure you're in the correct directory
cd src
python main.py
UserWarning: FigureCanvasAgg is non-interactive
This is normal on headless systems. Use the save_path parameter and disable the interactive window:
plot_gantt_chart(schedule, save_path="chart.png", show=False)
See Viewing Plots Guide for WSL2/Linux-specific instructions.
- Reduce population_size (try 20)
- Reduce max_generations (try 30)
- Use simpler applications
See Bug Fixes for known issues and solutions.
Based on the research paper:
"Scheduling of Deep Learning Applications Onto Heterogeneous
Processors in an Embedded Device"
https://ieeexplore.ieee.org/ielx7/6287639/8948470/09019698.pdf
If you use this framework in your research, please cite the original paper.