# Tic-tac-toe - Reinforcement learning exercise in golang
The program builds a tournament of tic-tac-toe games (https://en.wikipedia.org/wiki/Tic-tac-toe). Any number of robot and/or human players attend the tournament. In each session, two of the players are chosen, and they play any number of episodes. A robot has fixed intelligence but gains experience over episodes and sessions. Each robot exports its experience to a data file, which is then analyzed and visualized.
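The overall flow (pick two players per session, let them play a number of episodes, have robots export their experience) can be pictured with the minimal Go sketch below. The types, names, and loop structure are illustrative assumptions, not the actual GoTick code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Player stands for a human or robot participant. This skeleton of the
// tournament flow is an illustrative assumption, not the GoTick structure.
type Player struct {
	name  string
	robot bool
}

// playEpisode would run one tic-tac-toe game between a and b; here it
// only reports the pairing.
func playEpisode(a, b Player) {
	fmt.Printf("%s vs %s\n", a.name, b.name)
}

func main() {
	players := []Player{{"robot-1", true}, {"robot-2", true}, {"alice", false}}
	const sessions, episodes = 2, 3
	for s := 0; s < sessions; s++ {
		// pick two distinct players for this session
		i := rand.Intn(len(players))
		j := (i + 1 + rand.Intn(len(players)-1)) % len(players)
		for e := 0; e < episodes; e++ {
			playEpisode(players[i], players[j])
		}
		// a robot would export its accumulated experience to a file here
	}
}
```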
To run the program, build the executable with `go get github.com/wcchu/GoTick`, then run `GoTick`.
We use the Monte Carlo method for learning:
    sum = 0
    for t = T-1 to 0:
        sum = R[t+1] + gamma * sum
        V(x[t]) = update_func(V(x[t]), sum)
    end for
    return V
where `R[t]` and `x[t]` are the reward and the state at time `t` respectively, `V(x)` is the value of entering state `x`, and `gamma` is the discount rate of reward. The `update_func` can be chosen between the following two definitions:
(1) Move `v` toward the value learned in the newest episode (the return `s`), with learning rate `alpha`:

    update_func(v, s) = v + alpha * (s - v)
(2) Set `v` to the average over all values learned from previous episodes, including the newest one (`s`), where `n` is the number of previous episodes:

    update_func(v, s) = (n * v + s) / (n + 1)
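For concreteness, here is a minimal Go sketch of the backward pass together with both update rules. The `State` type, the map-based value table, and the constants `gamma` and `alpha` are illustrative assumptions rather than the actual GoTick implementation.

```go
package main

import "fmt"

// State is a hashable board representation seen from the player's point
// of view (illustrative; the actual GoTick encoding may differ).
type State string

const (
	gamma = 0.9 // discount rate of reward (assumed value)
	alpha = 0.1 // learning rate for update rule (1) (assumed value)
)

// updateTowardReturn is update rule (1): move v toward the return s
// observed in the newest episode, with learning rate alpha.
func updateTowardReturn(v, s float64) float64 {
	return v + alpha*(s-v)
}

// updateRunningAverage is update rule (2): v becomes the average over the
// n returns seen in previous episodes plus the newest return s.
func updateRunningAverage(v, s float64, n int) float64 {
	return (float64(n)*v + s) / float64(n+1)
}

// learn performs the backward Monte Carlo pass over one episode.
// states[t] is the state at time t, and rewards[t+1] is the reward
// received after leaving states[t], so len(rewards) == len(states)+1
// and rewards[0] is unused, mirroring the pseudocode above.
func learn(V map[State]float64, states []State, rewards []float64) {
	sum := 0.0
	for t := len(states) - 1; t >= 0; t-- {
		sum = rewards[t+1] + gamma*sum
		V[states[t]] = updateTowardReturn(V[states[t]], sum)
	}
}

func main() {
	V := map[State]float64{}
	// A toy two-step episode that ends in a win (reward 1 at the end).
	states := []State{"empty board", "me in center"}
	rewards := []float64{0, 0, 1}
	learn(V, states, rewards)
	fmt.Println(V)
}
```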
The reward `R` is defined at the end of an episode, with one value for each of the three possible outcomes: winning, losing, and draw. Thus `R[t] = 0` except at the final time step.
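As a small sketch of such a terminal reward, assuming the common +1 / -1 / 0 scheme for win / loss / draw (the actual values used by GoTick may differ):

```go
package main

import "fmt"

// terminalReward returns the reward R defined at the end of an episode;
// R[t] is zero at every earlier step. The values +1 / -1 / 0 are an
// illustrative assumption, not necessarily what GoTick uses.
func terminalReward(outcome string) float64 {
	switch outcome {
	case "win":
		return 1
	case "lose":
		return -1
	default: // draw
		return 0
	}
}

func main() {
	fmt.Println(terminalReward("win"), terminalReward("lose"), terminalReward("draw"))
}
```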
At each step in an episode, the state of the game for a player is defined by the game board as seen through that player's eyes: to be meaningful, a board composed of `X`s and `O`s has to be converted into `me`s and `you`s, together with the information of who plays the next step.
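One possible Go encoding of this player-relative state is sketched below; the string encoding, the function name, and the fixed 3x3 board representation are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// relativeState converts a board of 'X'/'O'/' ' cells into a
// player-relative string of 'm' (me), 'y' (you), and '-' (empty), and
// appends whether it is my turn next, so that two players looking at the
// same board learn values for different states. The encoding itself is
// an illustrative assumption.
func relativeState(board [9]byte, myMark byte, myTurnNext bool) string {
	var b strings.Builder
	for _, c := range board {
		switch c {
		case myMark:
			b.WriteByte('m')
		case ' ':
			b.WriteByte('-')
		default:
			b.WriteByte('y')
		}
	}
	if myTurnNext {
		b.WriteString("|next:me")
	} else {
		b.WriteString("|next:you")
	}
	return b.String()
}

func main() {
	board := [9]byte{'X', 'O', ' ', ' ', 'X', ' ', ' ', ' ', 'O'}
	fmt.Println(relativeState(board, 'X', false)) // the X player's view
	fmt.Println(relativeState(board, 'O', true))  // the O player's view
}
```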
Reference: https://github.com/lazyprogrammer/machine_learning_examples/blob/master/rl/tic_tac_toe.py

Note: I recommend using the meta-linter https://github.com/alecthomas/gometalinter.