This repository contains the implementation of a reinforcement learning agent for active cell balancing. The agent utilizes the Proximal Policy Optimization (PPO) algorithm with a custom feature extractor that combines a 1D Convolutional Neural Network (CNN) for local feature extraction and a Transformer for modeling global dependencies among battery cells. This architecture aims to capture both short-range correlations and long-range interactions within the battery pack to achieve effective balancing.
- Environment:
https://github.com/messlem99/Battery_Cell_Balancing
- CNN-Transformer Feature Extractor:
Combines a 1D Convolutional Neural Network (1D-CNN) for local feature extraction with a Transformer encoder for modeling global dependencies.
- Custom PPO Policy:
A tailored PPO policy integrates the CNN-Transformer extractor for enhanced decision-making.
- Logging and Checkpointing:
Uses TensorBoard for monitoring training metrics and callback-based checkpoint saving.
- End-to-End Training Pipeline:
Supports vectorized environments and tuned hyperparameters for efficient PPO training.
- Clone the Repository:
git clone https://github.com/messlem99/CNN-Transformer.git
cd CNN-Transformer
- Training the Model:
To train the PPO model with the CNN-Transformer feature extractor, make sure the environment from the Battery_Cell_Balancing repository (linked above) is installed and imported before launching training.
The architecture integrates the following components:
- 1D-CNN for Local Feature Extraction:
- Input: Historical per-cell features (voltage and SOC) arranged as a 1D sequence.
- Operation: Two convolutional layers with LeakyReLU activations, layer normalization, and dropout capture local patterns among adjacent cells.
- Transformer Encoder for Global Dependency Modeling:
- Reshaping: The CNN output is reshaped so that each cell acts as a token.
- Attention Mechanism: A multi-head self-attention Transformer encoder captures long-range dependencies across the battery pack.
- Global Pooling: Averages features across cells to obtain a pack-level representation.
- Feature Aggregation and Final Layers:
- Concatenation: Combines the global features with derived features (including the current load).
- Fully Connected Layers: Processes the combined vector to produce the final feature representation for the PPO policy.
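The pipeline above can be sketched in PyTorch. The class name, layer sizes, and input layout below are illustrative assumptions, not the repository's exact implementation: per-cell voltage/SOC channels go through the 1D-CNN, each cell then becomes a Transformer token, and the pooled pack representation is concatenated with the load before the final layers.

```python
import torch
import torch.nn as nn

class CNNTransformerExtractor(nn.Module):
    """Illustrative sketch of the CNN-Transformer feature extractor."""

    def __init__(self, n_cells=8, in_channels=2, cnn_channels=32,
                 d_model=32, n_heads=4, n_layers=2, features_dim=64,
                 dropout=0.1):
        super().__init__()
        # 1D-CNN over the cell dimension: adjacent cells are adjacent positions,
        # so the convolutions capture local patterns among neighboring cells.
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, cnn_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(),
            nn.Conv1d(cnn_channels, d_model, kernel_size=3, padding=1),
            nn.LeakyReLU(),
        )
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        # Transformer encoder: each cell acts as one token of size d_model,
        # so self-attention models long-range dependencies across the pack.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Final layers after concatenating pooled pack features with the load.
        self.head = nn.Sequential(
            nn.Linear(d_model + 1, features_dim),
            nn.LeakyReLU(),
        )

    def forward(self, cell_feats, load):
        # cell_feats: (batch, in_channels, n_cells); load: (batch, 1)
        x = self.cnn(cell_feats)           # (batch, d_model, n_cells)
        x = x.transpose(1, 2)              # reshape: one token per cell
        x = self.dropout(self.norm(x))
        x = self.encoder(x)                # global self-attention over cells
        pooled = x.mean(dim=1)             # average pooling -> pack-level vector
        return self.head(torch.cat([pooled, load], dim=1))
```

Wrapping this module in stable-baselines3's `BaseFeaturesExtractor` interface (which receives a single flat observation and exposes a `features_dim` attribute) would let PPO consume it directly.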
- This project is licensed under the MIT License. See the LICENSE file for details.
Contributions and enhancements are welcome. To get started:
- Fork the repository and create your feature branch.
- Submit pull requests for review.
To cite this project in publications:
@misc{CNN-Transformer2025,
author = {Abdelkader Messlem and Youcef Messlem and Ahmed Safa},
title = {Hybrid Convolutional Neural Network with Transformer architecture for feature extraction within a Proximal Policy Optimization (PPO) RL framework},
year = {2025},
howpublished = {\url{https://github.com/messlem99/CNN-Transformer}},
}
- J. Li, Q. Xu, X. He, Z. Liu, D. Zhang, R. Wang, R. Qu, and G. Qiu, “Cfformer: Cross cnn-transformer channel attention and spatial feature fusion for improved segmentation of low quality medical images,” 2025. [Online]. Available: https://arxiv.org/abs/2501.03629
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017. [Online]. Available: https://arxiv.org/abs/1707.06347
- J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” 2018. [Online]. Available: https://arxiv.org/abs/1506.02438
- A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, “Stable-baselines3: Reliable reinforcement learning implementations,” Journal of Machine Learning Research, vol. 22, no. 268, pp. 1–8, 2021. [Online]. Available: http://jmlr.org/papers/v22/20-1364.html
- “CS231N Convolutional Neural Networks for Visual Recognition.” https://cs231n.github.io/convolutional-networks/
- “Tutorial 6: Transformers and Multi-Head Attention — UvA DL Notebooks v1.2 documentation.” https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial6/Transformers_and_MHAttention.html