Issue:
i) Long training and slow iteration: image-based RL projects on low-end GPUs (e.g. NVIDIA MX450, Intel Iris integrated graphics) and other low-VRAM setups can take thousands of epochs and long wall-clock time. This makes experimentation, development and benchmark validation slow and costly.
ii) Slow feedback loop: long wall-clock training prevents rapid prototyping and tuning of architectures and hyper-parameters.
iii) Limited reproducibility: many developers and contributors with modest hardware cannot reproduce results or run experiments at all, and those who can face long training times.
iv) Poor sample efficiency: heavy models can need more environment steps to reach good policies, which further increases time and compute requirements.
Goal: To add a reproducible toolkit for image-based RL that runs well on low-end GPUs.
Deliverables: a lightweight backbone, a teacher -> student distillation recipe, preprocessing wrappers for low resolution and/or frame stacking, and benchmark notebooks showing training/inference cost and performance.
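As an illustration of the preprocessing-wrapper deliverable, here is a minimal sketch of a greyscale -> downsample -> frame-stack pipeline. It assumes raw observations arrive as HxWx3 uint8 RGB arrays; `LowResFrameStack`, `to_greyscale` and `downsample` are hypothetical names, not existing OpenEnv APIs, and a real wrapper would hook into OpenEnv's env interface instead of being called by hand.

```python
from collections import deque

import numpy as np


def to_greyscale(frame: np.ndarray) -> np.ndarray:
    """Hypothetical helper: HxWx3 uint8 RGB -> HxW float32 luminance in [0, 1]."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # ITU-R BT.601 luma weights
    return (frame.astype(np.float32) @ weights) / 255.0


def downsample(frame: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical helper: reduce resolution by averaging factor x factor blocks."""
    h, w = frame.shape
    h, w = h - h % factor, w - w % factor  # crop so both dimensions divide evenly
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


class LowResFrameStack:
    """Hypothetical wrapper (not an OpenEnv API): greyscale, downsample, stack last k frames."""

    def __init__(self, k: int = 4, factor: int = 2):
        self.k = k
        self.factor = factor
        self.frames = deque(maxlen=k)  # deque drops the oldest frame automatically

    def _process(self, frame: np.ndarray) -> np.ndarray:
        return downsample(to_greyscale(frame), self.factor)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        processed = self._process(first_frame)
        self.frames.clear()
        for _ in range(self.k):  # pad the stack with copies of the first frame
            self.frames.append(processed)
        return self.observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(self._process(frame))
        return self.observation()

    def observation(self) -> np.ndarray:
        # Shape (k, H // factor, W // factor), channels-first as most PyTorch CNNs expect
        return np.stack(list(self.frames), axis=0).astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wrapper = LowResFrameStack(k=4, factor=2)
    obs = wrapper.reset(rng.integers(0, 256, size=(84, 84, 3), dtype=np.uint8))
    print(obs.shape)  # (4, 42, 42)
```

Greyscale plus 2x downsampling cuts each frame from 84*84*3 values to 42*42, so even a 4-frame stack holds fewer values than a single RGB frame, which is what makes this useful on low-VRAM devices.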
My proposed contribution:
OpenEnv already provides modular envs, trainer hooks, examples and clear contribution guidance, which makes it easy to plug in new model architectures, recipes and evaluation suites. My contribution would add: a lightweight model (such as MobileViT/MobileNetV3 or a minimal variant); a distillation pipeline in which an existing larger policy (teacher) guides a student training loop, with RL-aware distillation losses and replay support; tooling for low-end GPUs, including data preprocessors (low resolution, frame stacking, greyscale options) and training flags for low VRAM; and quantization-aware training / post-training quantization to INT8.
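To make the teacher -> student idea concrete, here is a minimal sketch of one common policy-distillation loss: a temperature-softened KL divergence from the teacher's action distribution to the student's, scaled by T^2, plus an optional value-regression term. This is an assumed formulation for illustration, not the final RL-aware loss of the proposal; the batch of states is assumed to come from a replay buffer, and all function names are hypothetical. NumPy is used so the arithmetic is easy to check.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Temperature-scaled log-softmax over the last (action) axis."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def policy_distillation_loss(teacher_logits, student_logits, temperature: float = 2.0) -> float:
    """Hypothetical helper: mean KL(teacher || student) over a batch of states, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target) distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)  # T^2 keeps gradient scale comparable across T


def distillation_loss(teacher_logits, student_logits, teacher_values=None,
                      student_values=None, temperature: float = 2.0,
                      value_coef: float = 0.5) -> float:
    """Hypothetical combined loss: policy KL plus optional MSE between value estimates."""
    loss = policy_distillation_loss(teacher_logits, student_logits, temperature)
    if teacher_values is not None and student_values is not None:
        diff = np.asarray(student_values, dtype=np.float64) - np.asarray(teacher_values, dtype=np.float64)
        loss += value_coef * float(np.mean(diff ** 2))
    return loss


if __name__ == "__main__":
    # Two states from a (hypothetical) replay batch, three discrete actions each
    teacher = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
    student = np.zeros_like(teacher)
    print(distillation_loss(teacher, student))
```

In the PyTorch implementation, the policy term corresponds to `torch.nn.functional.kl_div(student_log_probs, teacher_log_probs, reduction="batchmean", log_target=True) * T**2`, where both arguments are temperature-scaled log-softmax outputs.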
Benchmarks:
- The proposed pipeline should be measured on:
a) GPU Memory
b) Wall time
c) FPS
d) Sample Efficiency
e) Final policy performance on target devices.
- Environments: two representative image-based envs from OpenEnv (one simple, one of medium complexity), comparing three configurations: the existing default model, the lightweight model, and the distilled + quantized student.
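The wall-time and FPS metrics could be collected with a small harness like the sketch below, run once per configuration (default model, lightweight model, distilled + quantized student). `measure_throughput` is a hypothetical helper, and `step_fn` is assumed to run one inference or environment step. For GPU runs, `step_fn` should call `torch.cuda.synchronize()` so queued kernels are included in the timing, and peak GPU memory can be read with `torch.cuda.reset_peak_memory_stats()` before and `torch.cuda.max_memory_allocated()` after the run.

```python
import statistics
import time
from typing import Callable, Dict


def measure_throughput(step_fn: Callable[[], None], n_steps: int = 1000,
                       warmup: int = 50, repeats: int = 3) -> Dict[str, float]:
    """Hypothetical benchmark helper: median wall time and steps/second over several runs."""
    for _ in range(warmup):  # warm-up excludes one-off costs (allocation, caching, JIT)
        step_fn()
    wall_runs, fps_runs = [], []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n_steps):
            step_fn()
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
        wall_runs.append(elapsed)
        fps_runs.append(n_steps / elapsed)
    return {
        "wall_time_s": statistics.median(wall_runs),
        "fps": statistics.median(fps_runs),
        "fps_stdev": statistics.stdev(fps_runs) if repeats > 1 else 0.0,
    }


if __name__ == "__main__":
    # Placeholder workload standing in for a model's forward pass
    result = measure_throughput(lambda: sum(range(1000)), n_steps=200)
    print(result)
```

Reporting the median over repeats makes the numbers less sensitive to background load on a shared laptop. Sample efficiency and final policy performance need separate measurements, e.g. environment steps to reach a reward threshold and mean evaluation return.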
Language Integration:
1) Python-first: contributions stay in Python/PyTorch for compatibility with OpenEnv, Hugging Face tooling and developer accessibility. ONNX export provides optimized inference on GPU/CPU as an optional path, without making new dependencies mandatory.
2) Optional: C/C++ or Julia could be considered later if a measured performance gap requires a specialised kernel.
Requirements:
1) Permission and guidance to start an initial proposal, then validate it with the first component (lightweight model + example usage).
I already have a testing laptop at hand to develop and run the benchmark scripts, and will attach the logs and results.