[Embedding] Add a list of fused embedding ops. (DeepRec-AI#2)
1. Add the Python API of the fused embedding.
2. Add fill_empty_row and prune_invalid_id to fused embedding lookup
3. Add fused embedding modelzoo perf test benchmark
Co-authored-by: Randy Wang <ruotongw@nvidia.com>
nvzhou authored Dec 27, 2021
1 parent e7fc3d0 commit 441c972
Showing 46 changed files with 4,084 additions and 1,099 deletions.
1 change: 1 addition & 0 deletions cibuild/gpu-ut.sh
@@ -165,5 +165,6 @@ export TF_BUILD_BAZEL_TARGET="$TF_BUILD_BAZEL_TARGET "\
"-//tensorflow/python/tools/api/generator:output_init_files_test "\
"-//tensorflow/python/tpu:datasets_test "\
"-//tensorflow/python/training/tracking:util_xla_test_gpu "\
"-//tensorflow/core/kernels:fused_embedding_ops_test_gpu"

bazel test -c opt --config=cuda --verbose_failures --run_under=//tensorflow/tools/ci_build/gpu_build:parallel_gpu_execute --test_timeout="300,450,1200,3600" --local_test_jobs=2 -- $TF_BUILD_BAZEL_TARGET
3 changes: 3 additions & 0 deletions modelzoo/features/GPUFusedEmbedding/.gitignore
@@ -0,0 +1,3 @@
*.csv
*/result/model_*
record.py
21 changes: 21 additions & 0 deletions modelzoo/features/GPUFusedEmbedding/DLRM/README.md
@@ -0,0 +1,21 @@
# DLRM with GPU Fused Embedding

The model structure, hyperparameters, dataset, and so on are the same as in [DLRM](../../../DLRM/README.md). Please follow the instructions there to prepare, set up, and run the model.

The only difference is that this model uses GPU Fused Embedding to accelerate the embedding lookup. The only code change is:

```python
categorical_embedding_column = tf.feature_column.embedding_column(
    categorical_column, dimension=16, combiner='mean',
    do_fusion=True)
```

## Benchmark

Measured on an A100-80GB-PCIE GPU with an 8-core AMD EPYC 7232P CPU @ 3.20GHz, averaged over 5000 iterations:

| Configuration | Avg Time per Iteration |
| ------------- | ---------------------- |
| Unfused | 37.15 ms |
| Fused | 31.43 ms |
| Speedup | 1.18x |
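
The speedup figure is the ratio of the unfused to the fused average iteration time. A minimal sketch of that arithmetic, using the two timings from the table (the function and variable names are illustrative and not part of the benchmark scripts):

```python
# Average time per iteration in milliseconds, copied from the benchmark table.
unfused_ms = 37.15
fused_ms = 31.43


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


ratio = speedup(unfused_ms, fused_ms)
# Fraction of the baseline iteration time that the fused run saves, in percent.
saved_pct = (1 - fused_ms / unfused_ms) * 100
print(f"speedup: {ratio:.2f}x, time saved per iteration: {saved_pct:.1f}%")
```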
10 changes: 10 additions & 0 deletions modelzoo/features/GPUFusedEmbedding/DLRM/data/README.md
@@ -0,0 +1,10 @@
# Dataset
## Prepare dataset
Put the data files **train.csv** and **eval.csv** into ./data/

Download the Kaggle Display Advertising Challenge Dataset from http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/

The evaluation dataset for accuracy measurement is not available at the above link; it can be downloaded from https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/eval.csv

Download the training dataset (in CSV format) from https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/train.csv
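
The download steps above could be scripted roughly as follows. This is only a sketch using the Python standard library: the two URLs are the ones listed in this README, while the helper names `local_path` and `download_datasets` and the default `data` directory are assumptions made for illustration, not part of the repository.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs copied from the README text above.
DATASET_URLS = [
    "https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/train.csv",
    "https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/eval.csv",
]


def local_path(url: str, data_dir: Path) -> Path:
    """Map a dataset URL to its destination file inside data_dir."""
    return data_dir / Path(urlparse(url).path).name


def download_datasets(data_dir: Path = Path("data")) -> None:
    """Download each CSV into data_dir, skipping files that already exist."""
    data_dir.mkdir(parents=True, exist_ok=True)
    for url in DATASET_URLS:
        target = local_path(url, data_dir)
        if target.exists():
            print(f"{target} already present, skipping")
            continue
        print(f"downloading {url} -> {target}")
        urlretrieve(url, str(target))
```

Calling `download_datasets()` from the model directory would place both files under `./data/`. The files are large, so the existence check avoids fetching them again on a rerun.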

2 changes: 2 additions & 0 deletions modelzoo/features/GPUFusedEmbedding/DLRM/result/README.md
@@ -0,0 +1,2 @@
# Result
Checkpoint and timeline files are saved in this folder by default.