This is a fork of https://github.com/MatthewJA/Inverse-Reinforcement-Learning that includes the following MDP domains:
- Gridworld (Sutton & Barto, 1998)
- Objectworld (Levine et al., 2011)
Requirements:
- NumPy

Implements the gridworld MDP.

Classes, instance attributes, and methods:
Gridworld(grid_size, wind, discount): Gridworld MDP. (A usage sketch follows this list.)
- actions: Tuple of (dx, dy) actions.
- n_actions: Number of actions. int.
- n_states: Number of states. int.
- grid_size: Size of grid. int.
- wind: Chance of moving randomly. float.
- discount: MDP discount factor. float.
- transition_probability: NumPy array with shape (n_states, n_actions, n_states), where transition_probability[si, a, sk] is the probability of transitioning from state si to state sk under action a.
- feature_vector(i, feature_map="ident"): Get the feature vector associated with a state integer.
- feature_matrix(feature_map="ident"): Get the feature matrix for this gridworld.
- int_to_point(i): Convert a state int into the corresponding coordinate.
- point_to_int(p): Convert a coordinate into the corresponding state int.
- neighbouring(i, k): Get whether two points neighbour each other. Also returns True if they are the same point.
- reward(state_int): Reward for being in state state_int.
- average_reward(n_trajectories, trajectory_length, policy): Calculate the average total reward obtained by following a given policy over n_trajectories trajectories of length trajectory_length.
- optimal_policy(state_int): The optimal policy for this gridworld.
- optimal_policy_deterministic(state_int): Deterministic version of the optimal policy for this gridworld.
- generate_trajectories(n_trajectories, trajectory_length, policy, random_start=False): Generate n_trajectories trajectories of length trajectory_length, following the given policy.
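
A minimal usage sketch. It assumes the class is importable from a gridworld module, as in the upstream repository; the import path in this fork may differ.

```python
import numpy as np

import gridworld  # assumed import path; adjust to where this fork keeps the module

# A 5x5 grid, 30% chance of a random move (wind), discount factor 0.9.
gw = gridworld.Gridworld(5, 0.3, 0.9)

# transition_probability[si, a, sk] is a distribution over successor states sk,
# so each (state, action) slice should sum to 1.
assert np.allclose(gw.transition_probability.sum(axis=2), 1)

# "ident" features are one-hot state indicators, so this feature matrix is
# the n_states x n_states identity.
features = gw.feature_matrix(feature_map="ident")

# Roll out 20 trajectories of 15 steps under the built-in optimal policy.
trajectories = gw.generate_trajectories(20, 15, gw.optimal_policy)
```

Any function mapping a state int to an action index can be passed as the policy; the bound optimal_policy method is just the convenient default.
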
Implements the objectworld MDP described in Levine et al. (2011).

Classes, instance attributes, and methods:
OWObject(inner_colour, outer_colour): Object in objectworld.
- inner_colour: Inner colour of object. int.
- outer_colour: Outer colour of object. int.

Objectworld(grid_size, n_objects, n_colours, wind, discount): Objectworld MDP. (A usage sketch follows this list.)
- actions: Tuple of (dx, dy) actions.
- n_actions: Number of actions. int.
- n_states: Number of states. int.
- grid_size: Size of grid. int.
- n_objects: Number of objects in the world. int.
- n_colours: Number of colours to colour objects with. int.
- wind: Chance of moving randomly. float.
- discount: MDP discount factor. float.
- objects: Set of objects in the world.
- transition_probability: NumPy array with shape (n_states, n_actions, n_states), where transition_probability[si, a, sk] is the probability of transitioning from state si to state sk under action a.
- feature_vector(i, discrete=True): Get the feature vector associated with a state integer.
- feature_matrix(discrete=True): Get the feature matrix for this objectworld.
- int_to_point(i): Convert a state int into the corresponding coordinate.
- point_to_int(p): Convert a coordinate into the corresponding state int.
- neighbouring(i, k): Get whether two points neighbour each other. Also returns True if they are the same point.
- reward(state_int): Reward for being in state state_int.
- average_reward(n_trajectories, trajectory_length, policy): Calculate the average total reward obtained by following a given policy over n_trajectories trajectories of length trajectory_length.
- generate_trajectories(n_trajectories, trajectory_length, policy): Generate n_trajectories trajectories of length trajectory_length, following the given policy.
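
A minimal usage sketch, under the same caveat as above: it assumes an objectworld module matching the upstream layout, and the comments on the two feature maps reflect the construction in Levine et al. (2011) rather than anything stated in this README.

```python
import numpy as np

import objectworld  # assumed import path; adjust to where this fork keeps the module

# A 10x10 objectworld: 15 objects drawn from 2 colours, 30% wind, discount 0.9.
ow = objectworld.Objectworld(10, 15, 2, 0.3, 0.9)

# Per Levine et al. (2011), discrete features bin each state's distance to the
# nearest object of each colour; continuous features keep the raw distances.
phi_discrete = ow.feature_matrix(discrete=True)
phi_continuous = ow.feature_matrix(discrete=False)

# Objectworld exposes no optimal policy, so roll out a uniformly random one.
# As with Gridworld, a policy is any function from state int to action index.
def random_policy(state_int):
    return np.random.randint(ow.n_actions)

trajectories = ow.generate_trajectories(10, 8, random_policy)
```
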