My grid world project (Swen 711)
This project implements the grid-world domain we learned in class. It is done in three parts:
- Have the agent select actions uniformly at random. Run 10,000 episodes and report the mean, standard deviation, maximum, and minimum of the observed discounted returns.
- Implement the value iteration algorithm to find the optimal policy. In this case, the agent selects the actions that maximize the future discounted reward. Report the optimal policy.
- Run the optimal policy found in Part 2 for 10,000 episodes. Compare the mean, standard deviation, maximum, and minimum of the observed discounted returns with the results from Part 1.
The environment dynamics are stochastic. When the agent attempts an action:
- p = 0.8: the correct action is attempted
- p = 0.05: the agent is confused and moves +90°
- p = 0.05: the agent is confused and moves -90°
- p = 0.1: the agent is confused and does not move

The agent cannot move out of the world; an attempt to do so results in no movement.
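To make the dynamics concrete, here is a minimal sketch of the transition rule. It assumes a 5x5 grid indexed by (row, column) with the two obstacle cells from the value table below; the coordinate convention and helper names are illustrative, and initial.py contains the actual implementation.

```python
import random

ROWS, COLS = 5, 5                    # assumed world size (matches the value table below)
OBSTACLES = {(2, 2), (3, 2)}         # assumed coordinates of the blocked cells

def rotate(direction, clockwise=True):
    """Rotate a (row, col) movement vector by 90 degrees."""
    dr, dc = direction
    return (dc, -dr) if clockwise else (-dc, dr)

def move(state, direction):
    """Deterministic move; leaving the grid or entering an obstacle means no movement."""
    nr, nc = state[0] + direction[0], state[1] + direction[1]
    if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) in OBSTACLES:
        return state
    return (nr, nc)

def step(state, action):
    """One noisy transition: 0.8 intended, 0.05 rotated +90°, 0.05 rotated -90°, 0.1 no move."""
    roll = random.random()
    if roll < 0.80:
        return move(state, action)
    if roll < 0.85:
        return move(state, rotate(action, clockwise=True))
    if roll < 0.90:
        return move(state, rotate(action, clockwise=False))
    return state
```

For example, `step((0, 0), (0, 1))` attempts to move the agent one cell to the right from the top-left corner, and will usually succeed but may rotate or stay put.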
See initial.py for the Python code related to Part 1. The environment dynamics lead to some variability between runs; a sample of the observed statistics is shown below:
Mean: -26.196
Standard Deviation: 50.85970491459816
Maximum: 10
Minimum: -480
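For reference, a minimal sketch of the Part 1 experiment is below. It reuses step() from the dynamics sketch above; the start state, discount factor, episode cap, and reward placement (+10 at the goal cell, -10 at the penalty cell) are placeholder assumptions, and the actual settings live in initial.py.

```python
import random
import statistics

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GOAL, PENALTY = (4, 4), (4, 2)                 # assumed terminal cells (+10 / -10 reward)
GAMMA = 0.9                                    # hypothetical discount factor

def run_random_episode(start=(0, 0), max_steps=100):
    """Follow uniformly random actions; return the discounted return of the episode."""
    state, discount = start, 1.0
    for _ in range(max_steps):
        state = step(state, random.choice(ACTIONS))   # step() from the dynamics sketch
        if state == GOAL:
            return discount * 10
        if state == PENALTY:
            return discount * -10
        discount *= GAMMA
    return 0.0

returns = [run_random_episode() for _ in range(10_000)]
print("Mean:", statistics.mean(returns))
print("Standard Deviation:", statistics.pstdev(returns))
print("Maximum:", max(returns))
print("Minimum:", min(returns))
```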
See optimal.py for the Python code related to Part 2. Below is the world showing the maximum future discounted reward for each state. These values were found using the value iteration method, iterating until every value change was less than 0.05, which took 14 iterations. Note that the xxx.xxx entries represent the obstacles (states that cannot be entered).
[+003.74] [+004.24] [+004.79] [+005.40] [+005.96]
[+004.06] [+004.67] [+005.37] [+006.13] [+006.79]
[+003.59] [+004.08] [xxx.xxx] [+006.95] [+007.73]
[+003.16] [+003.56] [xxx.xxx] [+007.82] [+008.79]
[+002.72] [+002.43] [-010.00] [+008.79] [+010.00]
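The sketch below shows a value-iteration update of the kind that produces a table like the one above, reusing move() and rotate() from the dynamics sketch and the constants from the Part 1 sketch. The discount factor and the terminal reward placement are assumptions; optimal.py holds the actual settings.

```python
THRESHOLD = 0.05                                   # convergence threshold mentioned above
STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in OBSTACLES]

def reward(state):
    """Placeholder rewards: +10 at the goal, -10 at the penalty cell, 0 elsewhere."""
    return 10 if state == GOAL else -10 if state == PENALTY else 0

def outcomes(state, action):
    """(probability, next_state) pairs for one action under the noisy dynamics."""
    return [(0.80, move(state, action)),
            (0.05, move(state, rotate(action, clockwise=True))),
            (0.05, move(state, rotate(action, clockwise=False))),
            (0.10, state)]

def value_iteration():
    """Bellman backups until the largest value change falls below THRESHOLD."""
    values = {s: 0.0 for s in STATES}
    iterations = 0
    while True:
        iterations += 1
        new_values = {}
        for s in STATES:
            if s in (GOAL, PENALTY):
                new_values[s] = reward(s)          # terminal cells hold their own reward
            else:
                new_values[s] = max(
                    sum(p * GAMMA * values[s2] for p, s2 in outcomes(s, a))
                    for a in ACTIONS)
        if max(abs(new_values[s] - values[s]) for s in STATES) < THRESHOLD:
            return new_values, iterations
        values = new_values
```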
See gridworld.py for the Python code related to Part 3. The code applies the optimal policy derived from the values above to the setup from Part 1. It was run 10,000 times and the statistics are below:
Mean: 10.0
Standard Deviation: 0.0
Maximum: 10.0
Minimum: 10.0
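For comparison, here is a minimal sketch of Part 3 under the same assumptions as the earlier sketches: extract the greedy policy from the converged values, then roll it out 10,000 times under the noisy dynamics. gridworld.py contains the actual code and settings.

```python
import statistics

def greedy_policy(values):
    """Pick, for each non-terminal state, the action with the best expected value."""
    return {s: max(ACTIONS, key=lambda a: sum(p * values[s2] for p, s2 in outcomes(s, a)))
            for s in STATES if s not in (GOAL, PENALTY)}

def evaluate(policy, episodes=10_000, start=(0, 0), max_steps=100):
    """Roll out a policy under the noisy dynamics and collect discounted returns."""
    returns = []
    for _ in range(episodes):
        state, discount, ret = start, 1.0, 0.0
        for _ in range(max_steps):
            state = step(state, policy[state])
            if state in (GOAL, PENALTY):
                ret = discount * reward(state)
                break
            discount *= GAMMA
        returns.append(ret)
    return returns

values, _ = value_iteration()
returns = evaluate(greedy_policy(values))
print("Mean:", statistics.mean(returns))
print("Standard Deviation:", statistics.pstdev(returns))
print("Maximum:", max(returns))
print("Minimum:", min(returns))
```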