-
Notifications
You must be signed in to change notification settings - Fork 2
/
README
68 lines (55 loc) · 3.02 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
Reinforcement Learning
--------------------------------------------------------------------------------
This repository mainly demonstrates the Reinforcement Learning algorithms
introduced in the book: "Reinforcement Learning: An Introduction" by
Sutton and Barto. For more information about this invaluable book, please
visit http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=7548.
All the codes here are implemented in Erlang, and mainly for instructive
purposes, but not guaranteed to be good for practical applications.
---------------------------------------------------------------------------------
Usage:
cd RL
erl -make
erl -pa ebin
>greedy_bandit:run(10,2000,0.1,"egreedy_0.1.dat").
%this will generate a data file "egreedy_0.1.dat" recording the play number,
%average reward got by the agent and the percentage of the optimal action selected.
%To generate a figure like Fig. 2.1 in the book, run the gnuplot command:
%set xlabel "Plays", set ylabel "Average Rewards",
%plot "egreedy_0.1.dat" u 1:2 t "Epsilon Greedy Method with Epsilon=0.1" w l
%set xlabel "Plays", set ylabel "Percent of Optimal Action",
%plot "egreedy_0.1.dat" u 1:3 t "Epsilon Greedy Method with Epsilon=0.1" w l
%
%For the meaning of the arguments please see the source file "greedy_bandit.erl"
>softmax_bandit:run(10, 2000, 2, "softmax_2.dat").
%This will simulate the agent which uses softmax to solve the n-Armed Bandit problem
....
------------------------------------------------------------------------------------
moduel gridworld.erl is an example in the chapter 3 of the book, illuminating the
Bellman equations for state-value function and optimal state-value function.
To use it:
>gridworld:run(). %To get the state-values
>gridworld:run_optimal() % To get the optimal state-values
------------------------------------------------------------------------------------
chapter 4
module policy_eva.erl is an example of iterative policy evaluation.
>policy_eva:start().
>policy_eva:run(equi_prob). %To evaluate the policy of equiprobably selecting actions
>policy_eva:pause(). %To pause the ongoing evaluation
>policy_eva:run(optimal). %To evaluate the optimal policy
>policy_eva:stop(). %To stop the process.
module car_rental.erl is an example of the Policy Iteration
>car_rental:start().
>car_rental:run() %To start the policy iteration process
>car_rental:print() % To see the final policy and value fuctions
>car_rental:stop(). %To stop the state machine
module gambler.erl is an example of Value Iteration
>gambler:start() % to start the Value Iteration process
>gambler:print(). % to print out the final policy and value function
>gambler:stop(). %to stop the state machine
--------------------------------------------------------------------------------------
Chapter 5
moduel mc_blackjack.erl simulates the Monte Carlo method solving the balckjack problem
>mc_blackjack:run(5000). %Sample 5000 episodes
moduel es_blackjak.erl simulates the ES Monte Carlo method solving the blackjack problem
>es_blackjack:run(5000). %Sample 5000 episodes with Exploring Starting