A neural network implementation of a Tic-Tac-Toe AI, written in Go.
t-cubed uses a feedforward neural network (FFNN) to choose optimal Tic-Tac-Toe moves through an efficient bitboard representation. The project demonstrates fundamental machine learning concepts in a simple, understandable game environment.
Visit the Quick Start guide to set up the project locally.
- Bitboard Representation: Efficient 9-bit encoding for each player's moves
- Feedforward Neural Network: 18-input, 9-output architecture for move prediction
- Training Pipeline: Generate and train on optimal game positions using the minimax algorithm with alpha-beta pruning
Each player's moves are stored as a 9-bit segment of a 16-bit integer, where each bit represents a board position:
Position mapping:
0 | 1 | 2
---------
3 | 4 | 5
---------
6 | 7 | 8
Example:
- Player 1 (X): 0x00A1 = 0b0000_0000_1010_0001
- Player 2 (O): 0x0006 = 0b0000_0000_0000_0110
X | O | O
---------
_ | _ | X
---------
_ | X | _
Why bitboards? In just 32 bits (two 16-bit integers), we can store the full game state and find open squares with a handful of bitwise operations; by comparison, a naive array of 9 bytes would take 72 bits. The same bit layout also doubles as a convenient input encoding for the neural network.
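These bitwise operations can be sketched in a few lines of Go (the function names here are illustrative, not the project's actual API):

```go
package main

import "fmt"

// Each board is a uint16 whose low 9 bits mark occupied cells (bit i = cell i).
const fullBoard = 0x1FF

// legalMoves returns a mask of the open cells given both players' bitboards.
func legalMoves(p1, p2 uint16) uint16 {
	return ^(p1 | p2) & fullBoard
}

// makeMove sets the bit for cell pos (0-8) on a player's board.
func makeMove(board uint16, pos int) uint16 {
	return board | (1 << pos)
}

// occupied reports whether cell pos is taken on either board.
func occupied(p1, p2 uint16, pos int) bool {
	return (p1|p2)&(1<<pos) != 0
}

func main() {
	p1, p2 := uint16(0x00A1), uint16(0x0006) // the example position above
	fmt.Printf("open cells: %09b\n", legalMoves(p1, p2))
}
```

For the example position this prints `open cells: 101011000`, i.e. cells 3, 4, 6, and 8 are free (reading bit 0 from the right).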
- Input Layer: 18 neurons (9 for Player 1, 9 for Player 2)
- Hidden Layer(s): 3 layers x 32 neurons
- Output Layer: 9 neurons (confidence score for each possible move)
Weights are saved in data/weights.json and loaded at runtime.
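A minimal sketch of the input encoding and a dense-layer forward step for this 18→32→32→32→9 shape (the helper names and the ReLU hidden activation are assumptions for illustration; the actual implementation may differ):

```go
package main

import "fmt"

// boardToInputs unpacks two 9-bit bitboards into the 18 network inputs.
func boardToInputs(p1, p2 uint16) []float64 {
	in := make([]float64, 18)
	for i := 0; i < 9; i++ {
		if p1&(1<<i) != 0 {
			in[i] = 1
		}
		if p2&(1<<i) != 0 {
			in[9+i] = 1
		}
	}
	return in
}

// layer computes act(W·in + b) for a dense layer with an out×in weight matrix.
func layer(in []float64, w [][]float64, b []float64, act func(float64) float64) []float64 {
	out := make([]float64, len(b))
	for i := range b {
		sum := b[i]
		for j, x := range in {
			sum += w[i][j] * x
		}
		out[i] = act(sum)
	}
	return out
}

func relu(x float64) float64 {
	if x > 0 {
		return x
	}
	return 0
}

func identity(x float64) float64 { return x }

func main() {
	in := boardToInputs(0x00A1, 0x0006)
	fmt.Println(len(in)) // 18 inputs
	// A full forward pass would chain layer() through the three 32-neuron
	// hidden layers (relu) and finish with the 9-neuron identity output.
}
```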
The network outputs confidence scores for each position. The AI selects the highest-scoring legal move.
Note: The output layer uses the identity activation function; the raw outputs are later normalized with softmax to produce a probability distribution over moves. The identity activations (not the softmax values) are recorded in the trace to preserve the original outputs.
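That selection step might look like the following sketch, which softmax-normalizes the raw outputs and masks out occupied cells before taking the argmax (`bestLegalMove` is a hypothetical name):

```go
package main

import (
	"fmt"
	"math"
)

// bestLegalMove softmax-normalizes the raw (identity) outputs and returns the
// highest-probability cell among the open ones.
func bestLegalMove(scores [9]float64, p1, p2 uint16) int {
	open := ^(p1 | p2) & 0x1FF
	// Softmax, shifted by the max score for numerical stability.
	max := scores[0]
	for _, s := range scores {
		if s > max {
			max = s
		}
	}
	var sum float64
	var probs [9]float64
	for i, s := range scores {
		probs[i] = math.Exp(s - max)
		sum += probs[i]
	}
	best, bestP := -1, -1.0
	for i := 0; i < 9; i++ {
		if open&(1<<i) == 0 {
			continue // cell already taken
		}
		if p := probs[i] / sum; p > bestP {
			best, bestP = i, p
		}
	}
	return best
}

func main() {
	scores := [9]float64{0.1, 9, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8}
	// Cell 1 has the top raw score but is occupied in the example position,
	// so the AI falls back to the best open cell.
	fmt.Println(bestLegalMove(scores, 0x00A1, 0x0006)) // 8
}
```

Because softmax is monotonic, masking before or after normalization picks the same move; normalizing first simply keeps the traced probabilities consistent.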
Training examples are generated by running the minimax algorithm (with alpha-beta pruning) on possible board states. For each state, the optimal move is determined and used as the target output for the neural network. This allows the network to learn from perfect play and generalize to unseen positions.
- Input: 18 floats (9 for Player 1’s (Human Player) bitboard, 9 for Player 2’s (AI Player) bitboard)
- Output: 9 floats (1.0 for the optimal move, 0.0 for others)
After much experimentation, the best parameters for the training algorithm were the following:
- Learning rate: 0.0001
- Cost threshold: 0.1
- Epochs: 10,000
- Batch size: 10
The learning rate determines how quickly the weights are updated. A lower learning rate allowed the network to converge more gradually, but it also took longer to train.
The cost threshold determines the error cost at which the training algorithm stops. A lower threshold forced the network to keep training until its predictions were more accurate, but it also took longer to train.
The epochs and batch size determine how many passes are made over the training data and how many examples contribute to each weight update. More epochs and a smaller batch size gave the network more weight updates, but they also lengthened training.
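As a rough sketch, these hyperparameters might be collected in a config struct; the relationship between dataset size, batch size, and updates per epoch then follows directly (the struct name and dataset size are hypothetical):

```go
package main

import "fmt"

// TrainConfig holds the hyperparameters from the experiments above.
type TrainConfig struct {
	LearningRate  float64
	CostThreshold float64
	Epochs        int
	BatchSize     int
}

func main() {
	cfg := TrainConfig{
		LearningRate:  0.0001,
		CostThreshold: 0.1,
		Epochs:        10000,
		BatchSize:     10,
	}
	// With n training examples, each epoch performs ceil(n/BatchSize)
	// weight updates; a smaller batch means more (noisier) updates.
	n := 2500 // hypothetical training-set size
	updatesPerEpoch := (n + cfg.BatchSize - 1) / cfg.BatchSize
	fmt.Println(updatesPerEpoch) // 250
}
```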
t-cubed: Where Tic-Tac-Toe meets neural networks ✨