refactor ctr model #138
ctr/README.md
Outdated
├── train.py # training script
└── utils.py # helper functions
```

## Background

CTR (Click-Through Rate prediction)\[[1](https://en.wikipedia.org/wiki/Click-through_rate)\] denotes the probability that a user clicks a particular link,
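- First sentence: "denotes the probability that a user clicks a particular link, and is commonly used to measure the effectiveness of an online advertising system." --> "predicts the probability that a user clicks a particular link, and is an important step in ad serving. Accurate click-through rate prediction is essential for maximizing the revenue of an online advertising system."
- Line 11, from @llxxxll: the term "召回" (recall) is unfamiliar to beginner users; explain it or describe it another way.
- This piece is split into too many small paragraphs; merge lines 23 and 24 into the first paragraph.
- Line 24: "Broadly, the system performs the following steps to display ads" --> "Roughly speaking, the system performs the following steps to display ads:"
- Line 31: drop "很" (very): "很重要" (very important) --> "重要" (important).
- Merge lines 53 to 62 into one paragraph.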
ctr/README.md
Outdated
For the feature processing details, see [data process](./dataset.md).

The input format of the model demonstrated in this tutorial is as follows:
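- Line 69: does `click` mean the click-through rate? The data has not been introduced by line 69, so the reader cannot yet connect this to the `click` field in the data; please explain it more clearly.
- Merge lines 79 and 81 into one paragraph.
- After line 81, could you add some text that further describes and explains the format?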
ctr/README.md
Outdated
@@ -61,8 +76,40 @@ The advantage of LR over a DNN model is its capacity for large-scale sparse features, including

We use the dataset of the `Click-through rate prediction` task on Kaggle\[[2](https://www.kaggle.com/c/avazu-ctr-prediction/data)\] to demonstrate the model.

For the feature processing details, see [data process](./dataset.md)
For the feature processing details, see [data process](./dataset.md).
Line 77: "We use the dataset of the `Click-through rate prediction` task on Kaggle to demonstrate the model." --> "We use the dataset of the `Click-through rate prediction` task on Kaggle to run the model in this example."
ctr/README.md
Outdated
23 231 \t 1230:0.12 13421:0.9 \t 1
```

The demo dataset\[[2](#参考文档)\] can be processed with the `avazu_data_processor.py` script; see the instructions below for usage:
The `avazu_data_processor.py` script in this example's directory can process the downloaded raw data; see the instructions below for usage:
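To make the sample record in the diff above concrete, here is a minimal parsing sketch. It assumes each record has three tab-separated fields: a space-separated list of integer ids, a space-separated list of `index:value` sparse features, and an integer click label. That reading of the fields, and the helper name `parse_record`, are assumptions made for illustration; they are not taken from `reader.py`.

```python
def parse_record(line):
    """Split one record into (ids, sparse_features, click).

    Assumed layout (hypothetical, inferred from the sample line only):
    "<id id ...> TAB <index:value index:value ...> TAB <click>"
    """
    ids_part, sparse_part, click_part = line.rstrip("\n").split("\t")

    # First field: integer ids separated by whitespace.
    ids = [int(token) for token in ids_part.split()]

    # Second field: sparse features written as index:value pairs.
    sparse_features = []
    for token in sparse_part.split():
        index, value = token.split(":")
        sparse_features.append((int(index), float(value)))

    # Third field: the click label.
    return ids, sparse_features, int(click_part)


# The sample record from the README, with real tab characters.
sample = "23 231\t1230:0.12 13421:0.9\t1"
ids, sparse_features, click = parse_record(sample)
```

A README paragraph that names the three fields in the same order would answer the reviewer's request for a textual explanation of the format.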
ctr/README.md
Outdated
├── network_conf.py # model network configuration
├── reader.py # data provider
├── train.py # training script
└── utils.py # helper functions
- `avazu_data_processor.py` is not listed here. Is that intentional? This file is quite important.
- The images directory can simply be omitted.
ctr/network_conf.py
Outdated
self.output = layer.fc(
    input=merge_layer,
    size=1,
    name='output',
Same as above: remove `name`.
@@ -0,0 +1,112 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Some scripts have a shebang and others do not. Keep it consistent: either remove it from all of them or add it to all of them.
done
ctr/utils.py
Outdated
import logging

logging.basicConfig()
logger = logging.getLogger("logger")
- You can call `logging.getLogger("paddle")` directly to get the logger defined in `config_parser.py`.
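This suggestion relies on standard `logging` behaviour: `logging.getLogger(name)` returns the same logger object every time it is called with the same name, so a module can reuse a logger configured elsewhere instead of creating its own. A minimal sketch follows. That `config_parser.py` registers its logger under the name "paddle" is taken from the comment above and is not verified here.

```python
import logging

logging.basicConfig()

# Assumption: "paddle" is the logger name used by config_parser.py,
# as stated in the review comment; it is not verified here.
logger = logging.getLogger("paddle")
logger.setLevel(logging.INFO)

# A second lookup by the same name returns the very same object,
# so the level set above is visible wherever the name is looked up.
same_logger = logging.getLogger("paddle")
is_shared = same_logger is logger
effective_level = same_logger.getEffectiveLevel()

logger.info("reusing the shared logger")
```

With this approach `utils.py` would not need a separately named `"logger"` instance, and log output from the example would follow whatever configuration the shared logger already has.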
ctr/train.py
Outdated
# n_records_as_test=args.test_set_size,
# fields=reader.fields,
# feature_dims=reader.feature_dims)
If these are comments that are no longer needed, delete them.
ctr/avazu_data_processer.py
Outdated
for key in id_features) + 1
# logger.warning("dump dataset's meta info to %s" % meta_out_path)
# cPickle.dump([feature_dims, fields], open(meta_out_path, 'wb'))
Delete comments that are not needed.
Running training and testing has problems; this needs to be checked.
LGTM
ctr/README.md
Outdated

1. Recall the set of ads that satisfy the query
1. Fetch the set of ads that satisfy the query
I do not quite understand what "satisfy the query" means in this sentence.
Could you add a few more descriptive words?
Revised as follows:
infer.py