This project is presented as a Jupyter Notebook, giving full visibility into the class definitions and the training of the algorithm. You can watch the program run in the pygame environment by calling rl.play(True, True). This command runs the agent with epsilon-driven exploration disabled, so it follows its learned policy directly rather than taking random exploratory actions.
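As an illustration of that behavior, a greedy rollout might look like the minimal sketch below. The `play` signature, the `env` object, and the `q_table` structure here are assumptions for illustration, not the notebook's actual implementation.

```python
# Hypothetical sketch of a greedy rollout; env and q_table are assumed
# to exist with the interfaces used below.
def play(env, q_table, render=True, greedy=True):
    """Roll out one episode; with greedy=True, epsilon is ignored and
    the agent always takes the highest-valued action."""
    state = env.reset()
    done = False
    while not done:
        # Pick the action with the largest Q-value for the current state.
        action = max(range(env.n_actions), key=lambda a: q_table[state][a])
        state, _reward, done = env.step(action)
        if render:
            env.render()  # draw the grid in the pygame window
```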
The number of states in the model depends on the size of the environment. To reduce this number, we treat certain positions as equivalent, which keeps the model's complexity manageable.
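One way to implement such position equivalence is to map each position to a canonical state before indexing the Q-table, as in the sketch below. The grid coordinates and the equivalence mapping are made-up examples, not the notebook's code.

```python
# Illustrative state reduction: positions declared equivalent share one state.
def state_id(position, equivalent_positions):
    """Map a (row, col) position to its canonical state; positions not
    in the mapping are their own state."""
    return equivalent_positions.get(position, position)

# Example: treat two symmetric dead-end cells as the same state.
equivalent = {(3, 0): (0, 3)}
print(state_id((3, 0), equivalent))  # (0, 3) -- shares one Q-table row
print(state_id((1, 1), equivalent))  # (1, 1) -- unchanged
```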
States: the agent's positions within the environment.
Actions: the agent's possible movements: "up," "down," "left," and "right."
Rewards: the penalties and incentives that govern the agent's behavior.
Goal state: the cell marked "T," the endpoint the agent must reach.
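A minimal sketch of these four components for a small grid world might look like the following; all names, grid contents, and reward values are illustrative assumptions, not the notebook's definitions.

```python
# Four possible moves, expressed as (row, col) offsets.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# A tiny grid: "S" start, "T" goal, "#" wall, "." free cell.
GRID = [
    ["S", ".", "."],
    [".", "#", "."],
    [".", ".", "T"],
]

# Rewards per cell type; a small step penalty encourages short paths.
REWARDS = {"T": 10.0, "#": -5.0, ".": -0.1, "S": -0.1}
```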
The learning rate (α) significantly influences the algorithm's performance:
- It affects the speed of convergence; too high a value can cause oscillation.
- It balances how much weight new experience carries against previously accumulated estimates.
- It plays a pivotal role in the stability and accuracy of the final solution.
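For concreteness, the tabular Q-learning update below shows where α enters; the variable names, data layout, and values are illustrative assumptions.

```python
# Sketch of the tabular Q-learning update rule.
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(s, a) a fraction alpha toward the bootstrapped target.
    A large alpha learns fast but can oscillate; a small alpha is
    stable but converges slowly."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])

# Usage: q_table maps each state to a list of per-action values.
q = {"s0": [0.0, 0.0], "s1": [1.0, 2.0]}
q_update(q, "s0", 0, reward=1.0, next_state="s1")
print(q["s0"][0])  # 0.1 * (1.0 + 0.9 * 2.0) = 0.28
```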
The discount factor (γ) plays a crucial role in reinforcement learning:
- It determines the relative importance of long-term versus short-term rewards.
- It guides the search for an optimal policy and underscores the value of reaching the goal.
- It influences the convergence rate and the temporal consistency of the learning process.
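The small computation below illustrates how γ weights delayed against immediate rewards; the reward sequences are made-up examples.

```python
# Discounted return: sum of gamma^t * r_t over a reward sequence.
def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

delayed = [0, 0, 0, 10]   # a large goal reward that arrives late
immediate = [3, 0, 0, 0]  # a small reward that arrives at once

for gamma in (0.5, 0.99):
    print(gamma, discounted_return(delayed, gamma), discounted_return(immediate, gamma))
# With gamma=0.5 the immediate reward wins (3.0 vs 1.25);
# with gamma=0.99 the delayed goal dominates (~9.70 vs 3.0).
```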