The inverted double pendulum is a classic benchmark in control theory, known for its instability and nonlinear dynamics. This project tackles the challenge of stabilizing the system with the Soft Actor-Critic (SAC) algorithm, a state-of-the-art reinforcement learning method, simulated in the MuJoCo physics engine. Through empirical experimentation, we use SAC to learn a robust control strategy that balances the double pendulum upright with minimal torque despite its complex dynamics. Simulation results show that SAC adaptively learns effective policies for this demanding task and offer practical insight into its use for continuous control problems. The project demonstrates the strength of SAC on intricate dynamical systems and contributes to the growing body of work on reinforcement learning for control.
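As a minimal sketch of the task setup, the snippet below creates the MuJoCo inverted double pendulum environment in Gymnasium and trains SAC on it. The actual training code in this project lives in the notebook (built on PyTorch Lightning); the Stable-Baselines3 `SAC` class, the environment version, and the timestep budget here are assumptions used only for illustration.

```python
import gymnasium as gym
from stable_baselines3 import SAC  # stand-in SAC implementation, not the notebook's own

# InvertedDoublePendulum: keep a two-link pendulum upright on a cart
# by applying a continuous force to the cart (environment version is an assumption).
env = gym.make("InvertedDoublePendulum-v5")

# Entropy-regularized, off-policy actor-critic; "MlpPolicy" uses small
# fully connected networks for the actor and the Q critics.
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)  # training budget chosen for illustration

# Evaluation rollout with the learned policy acting deterministically.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```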
Experiment with this notebook to learn how the inverted double pendulum is balanced with SAC.
The green line denotes episode rewards and the blue line indicates the best moving average. The best moving average is updated by taking the arithmetic mean of recent episode rewards and adopting it whenever it exceeds the current best; this criterion is used to decide whether an episode is worth keeping.
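As a rough sketch of this bookkeeping (the variable names and the window size are assumptions, not taken from the notebook):

```python
from collections import deque

window = deque(maxlen=100)          # recent episode rewards; window size is an assumption
best_moving_average = float("-inf")

def record_episode(episode_reward: float) -> bool:
    """Track the reward, update the best moving average, and report
    whether this episode improved it (i.e. whether it is worth keeping)."""
    global best_moving_average
    window.append(episode_reward)
    moving_average = sum(window) / len(window)   # arithmetic mean of recent rewards
    if moving_average > best_moving_average:
        best_moving_average = moving_average     # new best -> keep this episode/checkpoint
        return True
    return False
```

A checkpoint of the agent would then be saved whenever `record_episode` returns `True`.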
The images below show the evolution of the control of the inverted double pendulum; the control becomes progressively better over training.
*(Rollout images: the controller at successive stages of training.)*
- Neuronlike adaptive elements that can solve difficult learning control problems
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- Soft Actor-Critic
- Workaround for the FatalError: gladLoadGL error when trying to run "HalfCheetah-v5" in Colaboratory [gymnasium / mujoco]
- Inverted Double Pendulum
- PyTorch Lightning