
Commit a06b1da

Author: Chris Elion

[MLA-1768] retrain Match3 scene (#4943)

* improved settings and move to default_settings
* update models

1 parent: f52f19b

File tree

8 files changed: +49 −69 lines

Project/Assets/ML-Agents/Examples/Match3/Prefabs/Match3VisualObs.prefab

Lines changed: 1 addition & 1 deletion
```diff
@@ -89,7 +89,7 @@ MonoBehaviour:
   VectorActionDescriptions: []
   VectorActionSpaceType: 0
   hasUpgradedBrainParametersWithActionSpec: 1
-  m_Model: {fileID: 11400000, guid: 48d14da88fea74d0693c691c6e3f2e34, type: 3}
+  m_Model: {fileID: 11400000, guid: 28ccdfd7cb3d941ce8af0ab89e06130a, type: 3}
   m_InferenceDevice: 2
   m_BehaviorType: 0
   m_BehaviorName: Match3VisualObs
```
Binary file not shown.
Binary file not shown.

Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn.meta

Lines changed: 0 additions & 11 deletions
This file was deleted.
Binary file not shown.

Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx.meta

Lines changed: 15 additions & 0 deletions
Some generated files are not rendered by default.

config/ppo/Match3.yaml

Lines changed: 31 additions & 55 deletions
```diff
@@ -1,72 +1,48 @@
+default_settings:
+  trainer_type: ppo
+  hyperparameters:
+    batch_size: 16
+    buffer_size: 120
+    learning_rate: 0.0003
+    beta: 0.005
+    epsilon: 0.2
+    lambd: 0.99
+    num_epoch: 3
+    learning_rate_schedule: constant
+  network_settings:
+    normalize: true
+    hidden_units: 256
+    num_layers: 4
+    vis_encode_type: match3
+  reward_signals:
+    extrinsic:
+      gamma: 0.99
+      strength: 1.0
+  keep_checkpoints: 5
+  max_steps: 5000000
+  time_horizon: 128
+  summary_freq: 10000
+  threaded: true
+
 behaviors:
-  Match3VectorObs:
-    trainer_type: ppo
-    hyperparameters:
-      batch_size: 64
-      buffer_size: 12000
-      learning_rate: 0.0003
-      beta: 0.001
-      epsilon: 0.2
-      lambd: 0.99
-      num_epoch: 3
-      learning_rate_schedule: constant
-    network_settings:
-      normalize: true
-      hidden_units: 128
-      num_layers: 2
-      vis_encode_type: match3
-    reward_signals:
-      extrinsic:
-        gamma: 0.99
-        strength: 1.0
-    keep_checkpoints: 5
-    max_steps: 5000000
-    time_horizon: 1000
-    summary_freq: 10000
-    threaded: true
-  Match3VisualObs:
-    trainer_type: ppo
-    hyperparameters:
-      batch_size: 64
-      buffer_size: 12000
-      learning_rate: 0.0003
-      beta: 0.001
-      epsilon: 0.2
-      lambd: 0.99
-      num_epoch: 3
-      learning_rate_schedule: constant
-    network_settings:
-      normalize: true
-      hidden_units: 128
-      num_layers: 2
-      vis_encode_type: match3
-    reward_signals:
-      extrinsic:
-        gamma: 0.99
-        strength: 1.0
-    keep_checkpoints: 5
-    max_steps: 5000000
-    time_horizon: 1000
-    summary_freq: 10000
-    threaded: true
   Match3SimpleHeuristic:
     # Settings can be very simple since we don't care about actually training the model
     trainer_type: ppo
     hyperparameters:
-      batch_size: 64
-      buffer_size: 128
+      batch_size: 16
+      buffer_size: 120
     network_settings:
       hidden_units: 4
       num_layers: 1
     max_steps: 5000000
     summary_freq: 10000
     threaded: true
-  Match3GreedyHeuristic:
+  Match3SmartHeuristic:
     # Settings can be very simple since we don't care about actually training the model
     trainer_type: ppo
     hyperparameters:
-      batch_size: 64
-      buffer_size: 128
+      batch_size: 16
+      buffer_size: 120
     network_settings:
       hidden_units: 4
       num_layers: 1
```
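The config change above replaces two nearly identical per-behavior blocks with a single shared `default_settings` block; per-behavior entries then only state what differs from the defaults. As a rough illustration of that idea (a hypothetical `deep_merge` helper, not the actual ml-agents implementation), the effective settings for a behavior can be thought of as the defaults with that behavior's overrides layered on top:

```python
# Hypothetical sketch of overlaying per-behavior overrides onto a shared
# default_settings block. The real merge logic in ml-agents differs in detail.

def deep_merge(base, override):
    """Recursively overlay `override` onto `base`, returning a new dict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# A subset of the default_settings block from the new config.
default_settings = {
    "trainer_type": "ppo",
    "hyperparameters": {"batch_size": 16, "buffer_size": 120},
    "network_settings": {"hidden_units": 256, "num_layers": 4},
}

# Match3SimpleHeuristic overrides only its network size; everything
# else is inherited from the defaults.
override = {
    "network_settings": {"hidden_units": 4, "num_layers": 1},
}

effective = deep_merge(default_settings, override)
```

With this view, the trained behaviors (which no longer appear under `behaviors:` at all) simply use `default_settings` unchanged, while the heuristic behaviors shrink their network.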

docs/Learning-Environment-Examples.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -551,7 +551,7 @@ drop down. New pieces are spawned randomly at the top, with a chance of being
 - Observations and actions are defined with a sensor and actuator respectively.
 - Float Properties: None
 - Benchmark Mean Reward:
-  - 37.2 for visual observations
-  - 37.6 for vector observations
+  - 39.5 for visual observations
+  - 38.5 for vector observations
   - 34.2 for simple heuristic (pick a random valid move)
   - 37.0 for greedy heuristic (pick the highest-scoring valid move)
```
