Reinforcement learning algorithms implemented in Erlang. This repository mainly demonstrates the reinforcement learning methods introduced in the book "Reinforcement Learning: An Introduction" by Sutton and Barto. Most of the code is for instructive purposes and is not guaranteed to be suitable for practical applications.
barcoyou/RL
Reinforcement Learning
--------------------------------------------------------------------------------
This repository mainly demonstrates the reinforcement learning algorithms
introduced in the book "Reinforcement Learning: An Introduction" by Sutton and
Barto. For more information about this invaluable book, please visit
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=7548.

All the code here is implemented in Erlang and is mainly for instructive
purposes; it is not guaranteed to be suitable for practical applications.
--------------------------------------------------------------------------------
Usage:

cd RL
erl -make
erl -pa ebin

>greedy_bandit:run(10,2000,0.1,"egreedy_0.1.dat").
%This will generate a data file "egreedy_0.1.dat" recording the play number,
%the average reward obtained by the agent and the percentage of plays on which
%the optimal action was selected.
%To generate figures like Fig. 2.1 in the book, run the gnuplot commands:
%set xlabel "Plays"
%set ylabel "Average Rewards"
%plot "egreedy_0.1.dat" u 1:2 t "Epsilon Greedy Method with Epsilon=0.1" w l
%set xlabel "Plays"
%set ylabel "Percent of Optimal Action"
%plot "egreedy_0.1.dat" u 1:3 t "Epsilon Greedy Method with Epsilon=0.1" w l
%
%For the meaning of the arguments, please see the source file "greedy_bandit.erl".

>softmax_bandit:run(10, 2000, 2, "softmax_2.dat").
%This will simulate an agent that uses softmax action selection to solve the
%n-Armed Bandit problem.
....
--------------------------------------------------------------------------------
Module gridworld.erl is an example from Chapter 3 of the book, illustrating the
Bellman equations for the state-value function and the optimal state-value
function. To use it:

>gridworld:run(). %To get the state-values
>gridworld:run_optimal(). %To get the optimal state-values
--------------------------------------------------------------------------------
Chapter 4

Module policy_eva.erl is an example of iterative policy evaluation.

>policy_eva:start().
>policy_eva:run(equi_prob). %To evaluate the policy of selecting actions equiprobably
>policy_eva:pause(). %To pause the ongoing evaluation
>policy_eva:run(optimal). %To evaluate the optimal policy
>policy_eva:stop(). %To stop the process

Module car_rental.erl is an example of Policy Iteration.

>car_rental:start().
>car_rental:run(). %To start the policy iteration process
>car_rental:print(). %To see the final policy and value functions
>car_rental:stop(). %To stop the state machine

Module gambler.erl is an example of Value Iteration.

>gambler:start(). %To start the Value Iteration process
>gambler:print(). %To print out the final policy and value function
>gambler:stop(). %To stop the state machine
--------------------------------------------------------------------------------
Chapter 5

Module mc_blackjack.erl simulates the Monte Carlo method solving the blackjack
problem.

>mc_blackjack:run(5000). %Sample 5000 episodes

Module es_blackjack.erl simulates the Monte Carlo ES (Exploring Starts) method
solving the blackjack problem.

>es_blackjack:run(5000). %Sample 5000 episodes with Exploring Starts
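
For readers who have not yet opened the sources, the epsilon-greedy action selection that the bandit example above refers to can be sketched in a few lines of Erlang. This is only a minimal illustration under stated assumptions: the module name egreedy_sketch and all of its functions are hypothetical and are not taken from greedy_bandit.erl, whose actual structure and arguments may differ. Action values are assumed to be kept as a list of {Estimate, Count} pairs and updated with the incremental sample average.

```erlang
%% Hypothetical sketch; NOT the repository's greedy_bandit.erl.
-module(egreedy_sketch).
-export([new/1, greedy_action/1, select_action/2, update/3]).

%% Action-value table for K arms: one {Estimate, Count} pair per arm,
%% addressed by a 1-based arm index.
new(K) ->
    lists:duplicate(K, {0.0, 0}).

%% Index of the arm with the highest estimate (the first one wins ties).
greedy_action(Q) ->
    Estimates = [E || {E, _Count} <- Q],
    index_of(lists:max(Estimates), Estimates, 1).

index_of(X, [X | _], I) -> I;
index_of(X, [_ | T], I) -> index_of(X, T, I + 1).

%% With probability Epsilon pick a uniformly random arm (explore),
%% otherwise pick the current greedy arm (exploit).
select_action(Q, Epsilon) ->
    case rand:uniform() < Epsilon of
        true  -> rand:uniform(length(Q));
        false -> greedy_action(Q)
    end.

%% Incremental sample-average update for the chosen arm:
%% NewEstimate = OldEstimate + (Reward - OldEstimate) / NewCount.
update(Q, Action, Reward) ->
    {E, N} = lists:nth(Action, Q),
    N1 = N + 1,
    replace_nth(Action, {E + (Reward - E) / N1, N1}, Q).

replace_nth(1, V, [_ | T]) -> [V | T];
replace_nth(I, V, [H | T]) when I > 1 -> [H | replace_nth(I - 1, V, T)].
```

With Epsilon set to 0.0 the sketch always exploits, and with 1.0 it always explores, which makes the two branches easy to check separately in the shell. A list is used for the table only to keep the example short; a map or tuple would be a reasonable alternative for a larger number of arms.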