DeepRec-AI · liutongxuan · Jul 17, 2022 · Jul 15, 2022
diff --git a/modelzoo/MMoE/data/README.md b/modelzoo/MMoE/data/README.md
@@ -1,10 +1,41 @@
 # Dataset
+Taobao dataset from [EasyRec](https://github.com/AlibabaPAI/EasyRec) is used.
 ## Prepare dataset
-Put data file **train.csv & eval.csv** into ./data/
+Put data file **taobao_train_data & taobao_test_data** into ./data/
+For details of Data download, see [EasyRec](https://github.com/AlibabaPAI/EasyRec/#GetStarted)
 
-Download Kaggle Display Advertising Challenge Dataset from http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/
+### Fields
+The dataset contains 20 columns, details as follow:
+| Name | clk      | buy      | pid       | adgroup_id | cate_id   | campaign_id | customer  | brand     | user_id   | cms_segid | cms_group_id | final_gender_code | age_level | pvalue_level | shopping_level | occupation | new_user_class_level | tag_category_list | tag_brand_list | price    |
+| ---- | -------- | -------- | --------- | ---------- | --------- | ----------- | --------- | --------- | --------- | --------- | ------------ | ----------------- | --------- | ------------ | -------------- | ---------- | -------------------- | ----------------- | -------------- | -------- |
+| Type | tf.int32 | tf.int32 | tf.string | tf.string  | tf.string | tf.string   | tf.string | tf.string | tf.string | tf.string | tf.string    | tf.string         | tf.string | tf.string    | tf.string      | tf.string  | tf.string            | tf.string         | tf.string      | tf.int32 |
 
-The evaluation dataset for accuracy measurement is not available in the above link can be downloaded from https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/eval.csv
 
-Download the train dataset(in csv format) from https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/train.csv
+The data in `tag_category_list` and `tag_brand_list` column are separated by `'|'`
 
+### Processing
+The `clk` and `buy` columns are used as labels.
+User's feature columns is as follow:
+| Column name          | Hash bucket size | Embedding dimension |
+| -------------------- | ---------------- | ------------------- |
+| user_id              | 100000           | 16                  |
+| cms_segid            | 100              | 16                  |
+| cms_group_id         | 100              | 16                  |
+| age_level            | 10               | 16                  |
+| pvalue_level         | 10               | 16                  |
+| shopping_level       | 10               | 16                  |
+| occupation           | 10               | 16                  |
+| new_user_class_level | 10               | 16                  |
+| tag_category_list    | 100000           | 16                  |
+| tag_brand_list       | 100000           | 16                  |
+
+Item's feature columns is as follow:
+| Column name | Hash bucket size | Number of buckets | Embedding dimension |
+| ----------- | ---------------- | ------------------- | ------------------- |
+| pid         | 10               | N/A                 | 16                  |
+| adgroup_id  | 100000           | N/A                 | 16                  |
+| cate_id     | 10000            | N/A                 | 16                  |
+| campaign_id | 100000           | N/A                 | 16                  |
+| customer    | 100000           | N/A                 | 16                  |
+| brand       | 100000           | N/A                 | 16                  |
+| price       | N/A              | 50                  | 16                  |