The following is a brief directory structure and description for this example:
├── data # Data set directory
│ └── README.md # Documentation describing how to prepare dataset
├── distribute_k8s # Distributed training related files
│ ├── distribute_k8s_BF16.yaml # k8s yaml to create a training job with the BF16 feature
│ ├── distribute_k8s_FP32.yaml # k8s yaml to create a training job with FP32
│ └── launch.py # Script to set the environment for distributed training (see the sketch below)
├── README.md # Documentation
├── result # Output directory
│ └── README.md # Documentation describing output directory
└── train.py # Training script
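`launch.py` is described above as setting up the environment for distributed training. One common pattern for Kubernetes-launched TensorFlow jobs is to assemble a `TF_CONFIG` cluster spec from environment variables injected by the job spec and then start `train.py`. The sketch below only illustrates that pattern; it is not the actual contents of `launch.py`, and the `PS_HOSTS`, `WORKER_HOSTS`, `JOB_NAME`, and `TASK_INDEX` names are assumptions.

```python
# Hypothetical sketch of a launch helper that builds TF_CONFIG for distributed
# training. The real launch.py may differ; the variable names are assumptions.
import json
import os
import subprocess
import sys

def main():
    # Host lists and task identity are assumed to be injected by the k8s job spec.
    ps_hosts = os.environ.get("PS_HOSTS", "").split(",")
    worker_hosts = os.environ.get("WORKER_HOSTS", "").split(",")
    job_name = os.environ.get("JOB_NAME", "worker")      # "ps" or "worker"
    task_index = int(os.environ.get("TASK_INDEX", "0"))

    # TF_CONFIG is the standard way a TensorFlow job discovers the cluster layout.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {"ps": ps_hosts, "worker": worker_hosts},
        "task": {"type": job_name, "index": task_index},
    })

    # Hand off to the training script with any extra arguments.
    subprocess.check_call([sys.executable, "train.py"] + sys.argv[1:])

if __name__ == "__main__":
    main()
```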
## Model
Describe the model in this folder here, such as its parameters and model size.

### Model Structure
Describe the structure of the model here.
## How to Use the MODEL Example

The first step should be data preparation. Link to the [Dataset](#dataset) section here, for example as a "Data Preparation" step.
### Stand-alone Training
Describe how to train the model on a single node here, typically by running `python train.py` (a minimal sketch of such a script follows).
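As a point of reference, a stand-alone DeepRec/TensorFlow 1.15 training script usually builds a graph and drives it with a monitored session. The sketch below is a minimal, hypothetical example of that pattern; the real `train.py` defines the actual model, input pipeline, and flags, and every layer size and hyperparameter shown here is an assumption.

```python
# Hypothetical minimal stand-alone training loop in the TensorFlow 1.15 style.
# The real train.py defines the actual model and input pipeline.
import numpy as np
import tensorflow as tf

def build_graph():
    features = tf.placeholder(tf.float32, [None, 13], name="features")
    labels = tf.placeholder(tf.float32, [None, 1], name="labels")
    hidden = tf.layers.dense(features, 64, activation=tf.nn.relu)
    logits = tf.layers.dense(hidden, 1)
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.AdagradOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return features, labels, loss, train_op

def main():
    features, labels, loss, train_op = build_graph()
    with tf.train.MonitoredTrainingSession() as sess:
        for _ in range(100):
            # Random data stands in for the real input pipeline.
            batch_x = np.random.rand(32, 13).astype(np.float32)
            batch_y = np.random.randint(0, 2, (32, 1)).astype(np.float32)
            _, loss_val = sess.run([train_op, loss],
                                   feed_dict={features: batch_x, labels: batch_y})
    print("final loss:", loss_val)

if __name__ == "__main__":
    main()
```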
### Distributed Training
Describe how to train the model across parameter servers and workers here, for example by submitting `distribute_k8s/distribute_k8s_FP32.yaml` (or the BF16 variant) to a Kubernetes cluster with `kubectl apply -f`. A sketch of how the training script can consume the resulting cluster configuration follows.
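On the worker and parameter-server side, the training script needs the cluster layout that the launcher prepares (for example via `TF_CONFIG`, as sketched earlier). The following is a minimal, hypothetical sketch of the classic TensorFlow 1.15 parameter-server setup; the real `train.py` may use a different distribution mechanism, and the toy loss below is purely illustrative.

```python
# Hypothetical sketch of TF_CONFIG-driven parameter-server training in TF 1.15.
import json
import os
import tensorflow as tf

def main():
    tf_config = json.loads(os.environ["TF_CONFIG"])
    cluster = tf.train.ClusterSpec(tf_config["cluster"])
    task = tf_config["task"]

    server = tf.train.Server(cluster, job_name=task["type"], task_index=task["index"])
    if task["type"] == "ps":
        server.join()  # parameter servers only host variables
        return

    # Workers place variables on PS devices and run compute locally.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task["index"], cluster=cluster)):
        global_step = tf.train.get_or_create_global_step()
        loss = tf.reduce_mean(tf.square(tf.get_variable("w", [1]) - 1.0))  # toy loss
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)

    hooks = [tf.train.StopAtStepHook(last_step=1000)]
    with tf.train.MonitoredTrainingSession(master=server.target,
                                           is_chief=(task["index"] == 0),
                                           hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)

if __name__ == "__main__":
    main()
```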
## Benchmark

### Stand-alone Training
Describe the hardware and software of the stand-alone test environment here.
| Model      | Framework            | DType     | Accuracy | AUC | Globalstep/Sec |
| ---------- | -------------------- | --------- | -------- | --- | -------------- |
| Model Type | Community TensorFlow | FP32      |          |     | (baseline)     |
| Model Type | DeepRec w/ oneDNN    | FP32      |          |     | (+1.00x)       |
| Model Type | DeepRec w/ oneDNN    | FP32+BF16 |          |     | (+1.00x)       |
- Community TensorFlow version is v1.15.5.
### Distributed Training
Describe the hardware and software of the distributed test environment here.
| Model      | Framework            | Protocol | DType     | Globalstep/Sec |
| ---------- | -------------------- | -------- | --------- | -------------- |
| Model Type | Community TensorFlow | GRPC     | FP32      |                |
| Model Type | DeepRec w/ oneDNN    | GRPC     | FP32      |                |
| Model Type | DeepRec w/ oneDNN    | GRPC     | FP32+BF16 |                |
- Community TensorFlow version is v1.15.5.
## Dataset
State which dataset is used, where to download it, and where to put it; link to [data/README.md](data/README.md) here.

Give a detailed description of the dataset and explain how the data are processed in this example (a sketch of a typical input pipeline follows).
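As an illustration of the processing step, the sketch below shows a typical `tf.data` pipeline for a CSV-formatted dataset in the TensorFlow 1.15 style. The file path, the assumed layout of one label column followed by 13 numeric features, and the batch size are all hypothetical; the real preparation steps belong in `data/README.md`.

```python
# Hypothetical sketch of a tf.data input pipeline for a CSV dataset (TF 1.15 style).
# The column layout and defaults below are assumptions, not the example's real schema.
import tensorflow as tf

def build_input_fn(csv_path, batch_size=512):
    # One label column followed by 13 numeric feature columns (assumed layout).
    record_defaults = [[0.0]] * 14

    def parse_csv(line):
        columns = tf.io.decode_csv(line, record_defaults=record_defaults)
        label = columns[0]
        features = tf.stack(columns[1:])
        return features, label

    def input_fn():
        dataset = tf.data.TextLineDataset(csv_path)
        dataset = dataset.shuffle(buffer_size=10000)
        dataset = dataset.map(parse_csv,
                              num_parallel_calls=tf.data.experimental.AUTOTUNE)
        dataset = dataset.batch(batch_size)
        dataset = dataset.prefetch(1)
        return dataset

    return input_fn
```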
## TODO
- List pending work here.