[ModelZoo] Update modelzoo README. (DeepRec-AI#916)
Models: bst/dbmtl/dcn/deepfm/dien/din/dlrm/dssm/esmm/mmoe/ple/simple_multitask/wide_and_deep

Signed-off-by: candy.dc <candy.dc@alibaba-inc.com>
candyzone authored Jul 14, 2023
1 parent 56cc51e commit 616e9e4
Showing 13 changed files with 267 additions and 268 deletions.
42 changes: 21 additions & 21 deletions modelzoo/bst/README.md
@@ -127,7 +127,7 @@ input:
- `--data_location`: Full path of train & eval data, default to `./data`.
- `--steps`: Set the number of steps on train dataset. Default will be set to 100 epoch.
- `--no_eval`: Do not evaluate trained model by eval dataset.
- `--batch_size`: Batch size to train. Default to 512.
- `--batch_size`: Batch size to train. Default to 2048.
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
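As a reading aid for the flag list in this hunk, here is a minimal `argparse` sketch that mirrors the documented options and their post-commit defaults. It is an illustration only: `build_parser` is a hypothetical name, the real modelzoo training scripts may define these flags differently, and the `None` defaults for `--steps` and `--checkpoint` are placeholders for values the README says are derived at run time.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags described in the README hunk."""
    parser = argparse.ArgumentParser(description="Illustrative training flags")
    parser.add_argument("--data_location", default="./data",
                        help="Full path of train & eval data")
    # The README derives the default step count from an epoch count;
    # None stands in for "compute it later" (an assumption of this sketch).
    parser.add_argument("--steps", type=int, default=None,
                        help="Number of steps on the train dataset")
    parser.add_argument("--no_eval", action="store_true",
                        help="Do not evaluate the trained model")
    parser.add_argument("--batch_size", type=int, default=2048,
                        help="Batch size to train (documented as 512 before this commit)")
    parser.add_argument("--output_dir", default="./result",
                        help="Output directory for logs and saved model")
    # Placeholder: the README builds this path from the output directory,
    # model name and a timestamp.
    parser.add_argument("--checkpoint", default=None,
                        help="Checkpoints input/output directory")
    parser.add_argument("--save_steps", type=int, default=0,
                        help="Steps between checkpoints; zero disables saving")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--batch_size", "1024", "--no_eval"])
print(args.batch_size, args.no_eval, args.output_dir)
```

Passing an empty list to `parse_args` yields the documented defaults, which makes it easy to confirm that the batch size now defaults to 2048.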
@@ -157,21 +157,20 @@ input:
## Benchmark
### Stand-alone Training
#### Test Environment
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
- Hardware
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
- CPU(s): 8
- Model name: Intel(R) Xeon(R) Platinum 8475B
- CPU(s): 16
- Socket(s): 1
- Core(s) per socket: 4
- Core(s) per socket: 8
- Thread(s) per core: 2
- Memory: 32G
- Memory: 64G
- Software
- kernel: 4.18.0-348.2.1.el8_5.x86_64
- OS: CentOS Linux release 8.5.2111
- GCC: 8.5.0
- Docker: 20.10.12
- Python: 3.6.8
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
- OS: Ubuntu 22.04.2 LTS
- GCC: 11.3.0
- Docker: 20.10.21
#### Performance Result
@@ -182,33 +181,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
<td>DType</td>
<td>Accuracy</td>
<td>AUC</td>
<td>Globalsetp/Sec</td>
<td>Throughput</td>
</tr>
<tr>
<td rowspan="3">BST</td>
<td>Community TensorFlow</td>
<td>FP32</td>
<td></td>
<td></td>
<td></td>
<td>0.912500</td>
<td>0.499316</td>
<td>16924.47(baseline)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32</td>
<td></td>
<td></td>
<td></td>
<td>0.894900</td>
<td>0.499316</td>
<td>22143.04(1.30x)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32+BF16</td>
<td></td>
<td></td>
<td></td>
<td>0.909099</td>
<td>0.499316</td>
<td>28686.70(1.69x)</td>
</tr>
</table>
- Community TensorFlow version is v1.15.5.
- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.
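The multipliers in the Throughput column are ratios against the Community TensorFlow FP32 row. The sketch below recomputes them for the BST figures above; the dictionary layout and the `speedup` helper are illustrative names, not part of the repository.

```python
def speedup(throughput: float, baseline: float) -> float:
    """Ratio of a configuration's throughput to the baseline throughput."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return throughput / baseline


# Throughput values copied from the BST table in this diff.
bst_throughput = {
    "Community TensorFlow FP32": 16924.47,
    "DeepRec w/ oneDNN FP32": 22143.04,
    "DeepRec w/ oneDNN FP32+BF16": 28686.70,
}

baseline = bst_throughput["Community TensorFlow FP32"]
ratios = {name: speedup(value, baseline) for name, value in bst_throughput.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f}x")
```

The FP32 ratio comes out slightly above 1.308, so the `1.30x` in the table looks truncated rather than rounded to two decimals; the other tables in this commit are not fully consistent on this point.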
### Distributed Training
#### Test Environment
43 changes: 22 additions & 21 deletions modelzoo/dbmtl/README.md
@@ -121,7 +121,7 @@ Context │ │────►│ │
- `--data_location`: Full path of train & eval data. Default is `./data`.
- `--steps`: Set the number of steps on train dataset. When default(`0`) is used, the number of steps is computed based on dataset size and number of epochs equals 1000.
- `--no_eval`: Do not evaluate trained model by eval dataset.
- `--batch_size`: Batch size to train. Default is `512`.
- `--batch_size`: Batch size to train. Default is `2048`.
- `--output_dir`: Full path to output directory for logs and saved model. Default is `./result`.
- `--checkpoint`: Full path to checkpoints output directory. Default is `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMP)`
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to `None`.
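The `--steps` description says the default step count is computed from the dataset size and the epoch count. One plausible reading is sketched below; the ceiling formula and the dataset size of 10,000 examples are assumptions for illustration, and the actual script may compute this differently. It also shows how the batch size change from 512 to 2048 would affect the derived step count under that reading.

```python
def default_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Steps needed to cover num_examples for `epochs` passes (assumed formula)."""
    if num_examples < 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("num_examples must be >= 0; batch_size and epochs must be > 0")
    total_examples = num_examples * epochs
    # Integer ceiling division, so a final partial batch still counts as a step.
    return (total_examples + batch_size - 1) // batch_size


NUM_EXAMPLES = 10_000  # hypothetical dataset size, not taken from the README
EPOCHS = 1000          # epoch count quoted in the --steps description

steps_old = default_steps(NUM_EXAMPLES, batch_size=512, epochs=EPOCHS)
steps_new = default_steps(NUM_EXAMPLES, batch_size=2048, epochs=EPOCHS)
print(steps_old, steps_new)  # prints "19532 4883"
```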
@@ -151,20 +151,20 @@ Context │ │────►│ │
### Stand-alone Training
#### Test Environment
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
- Hardware
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
- CPU(s): 8
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
- Hardware
- Model name: Intel(R) Xeon(R) Platinum 8475B
- CPU(s): 16
- Socket(s): 1
- Core(s) per socket: 4
- Core(s) per socket: 8
- Thread(s) per core: 2
- Memory: 32G
- Memory: 64G
- Software
- kernel: 4.18.0-305.12.1.el8_4.x86_64
- OS: CentOS Linux release 8.4.2105
- Docker: 20.10.12
- Python: 3.6.12
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
- OS: Ubuntu 22.04.2 LTS
- GCC: 11.3.0
- Docker: 20.10.21
#### Performance Result
@@ -175,33 +175,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
<td>DType</td>
<td>Accuracy</td>
<td>AUC</td>
<td>Globalsetp/Sec</td>
<td>Throughput</td>
</tr>
<tr>
<td rowspan="3">DBMTL</td>
<td>Community TensorFlow</td>
<td>FP32</td>
<td></td>
<td></td>
<td></td>
<td>0.973150</td>
<td>0.753008</td>
<td>63220.87(baseline)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32</td>
<td></td>
<td></td>
<td></td>
<td>0.973150</td>
<td>0.753070</td>
<td>77383.57(1.22x)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32+BF16</td>
<td></td>
<td></td>
<td></td>
<td>0.973150</td>
<td>0.753070</td>
<td>137581.54(2.17x)</td>
</tr>
</table>
- Community TensorFlow version is v1.15.5.
- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.
## Dataset
Train & eval dataset using ***Taobao dataset***.
39 changes: 19 additions & 20 deletions modelzoo/dcn/README.md
@@ -98,7 +98,7 @@ The following is a brief directory structure and description for this example:
- `--data_location`: Full path of train & eval data, default to `./data`.
- `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
- `--no_eval`: Do not evaluate trained model by eval dataset.
- `--batch_size`: Batch size to train. Default to 512.
- `--batch_size`: Batch size to train. Default to 2048.
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -128,21 +128,20 @@ The following is a brief directory structure and description for this example:
## Benchmark
### Stand-alone Training
#### Test Environment
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
- Hardware
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
- CPU(s): 8
- Model name: Intel(R) Xeon(R) Platinum 8475B
- CPU(s): 16
- Socket(s): 1
- Core(s) per socket: 4
- Core(s) per socket: 8
- Thread(s) per core: 2
- Memory: 32G
- Memory: 64G
- Software
- kernel: 4.18.0-348.2.1.el8_5.x86_64
- OS: CentOS Linux release 8.5.2111
- GCC: 8.5.0
- Docker: 20.10.12
- Python: 3.6.8
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
- OS: Ubuntu 22.04.2 LTS
- GCC: 11.3.0
- Docker: 20.10.21
#### Performance Result
@@ -159,23 +158,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
<td rowspan="3">DCN</td>
<td>Community TensorFlow</td>
<td>FP32</td>
<td>0.775859</td>
<td>0.768275</td>
<td></td>
<td>0.776260</td>
<td>0.769636</td>
<td>24524.91(baseline)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32</td>
<td></td>
<td></td>
<td></td>
<td>0.775738</td>
<td>0.769095</td>
<td>31917.35(1.30x)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32+BF16</td>
<td></td>
<td></td>
<td></td>
<td>0.775738</td>
<td>0.768651</td>
<td>55753.15(2.27x)</td>
</tr>
</table>
39 changes: 19 additions & 20 deletions modelzoo/deepfm/README.md
@@ -123,7 +123,7 @@ input: | |
- `--data_location`: Full path of train & eval data, default to `./data`.
- `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
- `--no_eval`: Do not evaluate trained model by eval dataset.
- `--batch_size`: Batch size to train. Default to 512.
- `--batch_size`: Batch size to train. Default to 2048.
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -153,21 +153,20 @@ input: | |
## Benchmark
### Stand-alone Training
#### Test Environment
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
- Hardware
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
- CPU(s): 8
- Model name: Intel(R) Xeon(R) Platinum 8475B
- CPU(s): 16
- Socket(s): 1
- Core(s) per socket: 4
- Core(s) per socket: 8
- Thread(s) per core: 2
- Memory: 32G
- Memory: 64G
- Software
- kernel: 4.18.0-348.2.1.el8_5.x86_64
- OS: CentOS Linux release 8.5.2111
- GCC: 8.5.0
- Docker: 20.10.12
- Python: 3.6.8
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
- OS: Ubuntu 22.04.2 LTS
- GCC: 11.3.0
- Docker: 20.10.21
#### Performance Result
@@ -184,23 +183,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
<td rowspan="3">DeepFM</td>
<td>Community TensorFlow</td>
<td>FP32</td>
<td>0.784695</td>
<td>0.781548</td>
<td>18848.64(baseline)</td>
<td>0.782777</td>
<td>0.776113</td>
<td>61230.80(baseline)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32</td>
<td>0.782755</td>
<td>0.777158</td>
<td>31260.00(1.65x)</td>
<td>0.780460</td>
<td>0.773281</td>
<td>74380.35(1.22x)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32+BF16</td>
<td>0.782659</td>
<td>0.776537</td>
<td>34627.46(1.84x)</td>
<td>0.780460</td>
<td>0.775249</td>
<td>95107.32(1.55x)</td>
</tr>
</table>
39 changes: 19 additions & 20 deletions modelzoo/dien/README.md
@@ -108,7 +108,7 @@ The following is a brief directory structure and description for this example:
- `--data_location`: Full path of train & eval data, default to `./data`.
- `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
- `--no_eval`: Do not evaluate trained model by eval dataset.
- `--batch_size`: Batch size to train. Default to 512.
- `--batch_size`: Batch size to train. Default to 2048.
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -138,21 +138,20 @@ The following is a brief directory structure and description for this example:
## Benchmark
### Stand-alone Training
#### Test Environment
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
- Hardware
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
- CPU(s): 8
- Model name: Intel(R) Xeon(R) Platinum 8475B
- CPU(s): 16
- Socket(s): 1
- Core(s) per socket: 4
- Core(s) per socket: 8
- Thread(s) per core: 2
- Memory: 32G
- Memory: 64G
- Software
- kernel: 4.18.0-348.2.1.el8_5.x86_64
- OS: CentOS Linux release 8.5.2111
- GCC: 8.5.0
- Docker: 20.10.12
- Python: 3.6.8
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
- OS: Ubuntu 22.04.2 LTS
- GCC: 11.3.0
- Docker: 20.10.21
#### Performance Result
@@ -169,23 +168,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
<td rowspan="3">DIEN</td>
<td>Community TensorFlow</td>
<td>FP32</td>
<td>0.681824</td>
<td>0.757496</td>
<td>2822.78(baseline)</td>
<td>0.575529</td>
<td>0.597272</td>
<td>6327.50(baseline)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32</td>
<td>0.692499</td>
<td>0.767193</td>
<td>3834.05(1.36x)</td>
<td>0.543935</td>
<td>0.5972728</td>
<td>10094.21(1.60x)</td>
</tr>
<tr>
<td>DeepRec w/ oneDNN</td>
<td>FP32+BF16</td>
<td>0.693011</td>
<td>0.768412</td>
<td>3862.06(1.37x)</td>
<td>0.551233</td>
<td>0.597272</td>
<td>11565.63(1.83x)</td>
</tr>
</table>