Mastering MountainCar-v0: A Comprehensive Exploration of Reinforcement Learning Algorithms

MountainCar-v0

The MountainCar-v0 environment is a classic reinforcement learning challenge. The goal is to drive an underpowered car up a steep hill; because the engine is too weak to climb directly, the agent must build momentum by rocking the car back and forth.

[MountainCar-v0 demo animation]
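
For reference, here is a minimal sketch of interacting with the environment through the Gymnasium API using random actions. It only illustrates the environment interface (observations, actions, rewards) and is not one of the trained agents from this project.

```python
# Minimal random-action rollout in MountainCar-v0 using the Gymnasium API.
import gymnasium as gym

env = gym.make("MountainCar-v0")
obs, info = env.reset(seed=0)            # obs = [position, velocity]
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()   # 0 = push left, 1 = no push, 2 = push right
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # reward is -1 per step until the goal is reached
    done = terminated or truncated
print(f"Episode return: {total_reward}")
env.close()
```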

This project applies several reinforcement learning (RL) algorithms to the MountainCar-v0 problem. Detailed analyses and findings are available in the report. In short, the algorithms explored are Deep Q-Network (DQN), Deep Recurrent Q-Network (DRQN), Proximal Policy Optimization (PPO), and Advantage Actor-Critic (A2C):

Deep Q-Network (DQN)

  • Overview:
    DQN uses a deep neural network to estimate Q-values and relies on experience replay and a target network for stability (a minimal sketch of the core update is shown after this list). We also explored a noisy DQN variant, which injects randomness into the network parameters to encourage better exploration.
  • Results:
    The standard DQN agent improves gradually, but its performance is sensitive to hyperparameters such as the learning rate and batch size. The noisy DQN variant showed notable improvement under certain conditions.
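The following is a minimal sketch of the DQN update in PyTorch (a bootstrapped TD target from a periodically synced target network, trained on minibatches drawn from a replay buffer). The network sizes, hyperparameters, and names such as `QNetwork` and `dqn_update` are illustrative assumptions, not the exact setup used in the report.

```python
# DQN update sketch (PyTorch). Hyperparameters and architecture are illustrative only.
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, obs_dim=2, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

q_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(q_net.state_dict())       # target network starts as a copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay_buffer = deque(maxlen=50_000)                  # stores (obs, action, reward, next_obs, done)
gamma = 0.99

def dqn_update(batch_size=64):
    """One gradient step on a random minibatch sampled from the replay buffer."""
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    obs, actions, rewards, next_obs, dones = (
        torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch)
    )
    q_pred = q_net(obs).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                             # bootstrap target from the frozen network
        next_q = target_net(next_obs).max(dim=1).values
        q_target = rewards + gamma * (1.0 - dones) * next_q
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Every few hundred environment steps, sync the target network:
    # target_net.load_state_dict(q_net.state_dict())
```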

Deep Recurrent Q-Network (DRQN)

  • Overview:
    DRQN adds LSTM layers so the agent can carry memory across time steps. This is useful when the full state is not observable, for example when the car's velocity is hidden (see the sketch after this list).
  • Tests Conducted:
    • Partial Observability: only the car's position is given.
    • Noisy Observations: Gaussian noise is added to the position.
  • Results:
    DRQN outperforms DQN when information is missing or noisy, thanks to its ability to remember past observations.
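Below is a minimal sketch of the two ingredients involved: an observation wrapper that hides the velocity and optionally adds Gaussian noise to the position, and an LSTM-based Q-network that processes sequences of observations. The wrapper name `PositionOnly`, the class `DRQNetwork`, and all sizes are illustrative assumptions, not the report's exact configuration.

```python
# Partial-observability wrapper and DRQN-style Q-network sketch (illustrative only).
import numpy as np
import gymnasium as gym
import torch
import torch.nn as nn

class PositionOnly(gym.ObservationWrapper):
    """Hide the velocity component and optionally add Gaussian noise to the position."""
    def __init__(self, env, noise_std=0.0):
        super().__init__(env)
        self.noise_std = noise_std
        low = env.observation_space.low[:1]
        high = env.observation_space.high[:1]
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float32)

    def observation(self, obs):
        pos = obs[:1] + np.random.normal(0.0, self.noise_std, size=1)
        return pos.astype(np.float32)

class DRQNetwork(nn.Module):
    """Q-network with an LSTM so the agent can infer velocity from recent positions."""
    def __init__(self, obs_dim=1, n_actions=3, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, seq_len, obs_dim) -- a window of recent observations
        x = torch.relu(self.encoder(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        return self.head(x), hidden_state             # Q-values for every step in the sequence

# Example: position-only env plus Q-values for a batch of 4 sequences of 8 observations.
env = PositionOnly(gym.make("MountainCar-v0"), noise_std=0.05)
obs, info = env.reset(seed=0)                          # obs is now just the (noisy) position
q_values, _ = DRQNetwork()(torch.randn(4, 8, 1))
print(obs.shape, q_values.shape)                       # (1,) torch.Size([4, 8, 3])
```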

Proximal Policy Optimization (PPO)

  • Overview:
    PPO is a policy gradient method that uses a clipping mechanism to prevent large, unstable policy updates (see the sketch after this list).
  • Results:
    The agent learns steadily and reliably, with the clipping strategy keeping the training process stable.
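The sketch below shows the clipped surrogate objective that gives PPO its stability: the probability ratio between the new and old policy is clamped to a small interval so a single update cannot move the policy too far. The function name and the dummy numbers are illustrative assumptions, not code from this repository.

```python
# Sketch of PPO's clipped surrogate objective (PyTorch, illustrative only).
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)              # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                  # maximize => minimize the negative

# Example with dummy values for a batch of 3 transitions.
loss = ppo_policy_loss(
    new_log_probs=torch.tensor([-1.0, -0.7, -2.0]),
    old_log_probs=torch.tensor([-1.1, -0.9, -1.5]),
    advantages=torch.tensor([0.5, -0.2, 1.0]),
)
print(loss)
```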

Advantage Actor-Critic (A2C)

  • Overview:
    A2C combines an actor (which chooses actions) with a critic (which evaluates them) to improve learning efficiency (see the sketch after this list).
  • Results:
    A2C offers a stable, straightforward alternative with performance comparable to PPO in many scenarios.
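As a minimal sketch of how the actor and critic interact, the loss below uses the critic's value estimate as a baseline to form the advantage that weights the actor's policy-gradient term, plus a value-regression term and an entropy bonus. The function name, coefficients, and dummy numbers are illustrative assumptions rather than the report's exact implementation.

```python
# Sketch of the A2C loss terms (PyTorch, illustrative only).
import torch

def a2c_losses(log_probs, values, returns, entropy, value_coef=0.5, entropy_coef=0.01):
    advantages = returns - values.detach()              # critic baseline reduces variance
    policy_loss = -(log_probs * advantages).mean()      # actor: policy-gradient term
    value_loss = (returns - values).pow(2).mean()       # critic: regression toward returns
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()

# Example with dummy values for a batch of 3 steps.
loss = a2c_losses(
    log_probs=torch.tensor([-1.2, -0.8, -1.5]),
    values=torch.tensor([0.4, 0.1, 0.9], requires_grad=True),
    returns=torch.tensor([1.0, -0.5, 0.7]),
    entropy=torch.tensor([1.05, 0.98, 1.10]),
)
print(loss)
```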