
Commit

update
649453932 committed Jul 28, 2019
1 parent d97f21e commit f624976
Showing 3 changed files with 5 additions and 4 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -38,8 +38,8 @@ tensorboardX

Model|acc|Notes
--|--|--
-bert|94.04%|bert + fc
-ERNIE|92.75%|whatever happened to "crushes bert on Chinese"?
+bert|94.83%|bert + fc
+ERNIE|94.61%|whatever happened to "crushes bert on Chinese"?

For the results of CNN, RNN, DPCNN, RCNN, RNN+Attention, FastText, and other models, see my other [repository](https://github.com/649453932/Chinese-Text-Classification-Pytorch).

2 changes: 1 addition & 1 deletion models/bert.py
@@ -20,7 +20,7 @@ def __init__(self, dataset):

        self.require_improvement = 1000  # stop training early if there is no improvement after 1000 batches
        self.num_classes = len(self.class_list)  # number of classes
-       self.num_epochs = 2  # number of epochs
+       self.num_epochs = 3  # number of epochs
        self.batch_size = 128  # mini-batch size
        self.pad_size = 32  # length every sentence is padded/truncated to (pad the short, cut the long)
        self.learning_rate = 5e-5  # learning rate
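For context, these hyperparameters live in the model's `Config` class. Below is a minimal sketch of that class, keeping only the field names and values visible in this hunk; the `bert_path`, device line, `class_list` loading, and the tokenizer import are assumptions added for self-containment, not code from this commit:

```python
# Hedged sketch of models/bert.py's Config class. Fields shown in the diff
# are from the source; bert_path, device, class_list, and the tokenizer
# import are assumptions so the snippet runs on its own.
import torch
from pytorch_pretrained_bert import BertTokenizer  # import path is an assumption


class Config(object):
    """Training configuration for the BERT classifier."""

    def __init__(self, dataset):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.class_list = [x.strip() for x in open(dataset + '/data/class.txt').readlines()]

        self.require_improvement = 1000           # stop early after 1000 batches without gain
        self.num_classes = len(self.class_list)   # number of classes
        self.num_epochs = 3                       # number of epochs (raised from 2 by this commit)
        self.batch_size = 128                     # mini-batch size
        self.pad_size = 32                        # pad/truncate every sentence to this length
        self.learning_rate = 5e-5                 # learning rate
        self.bert_path = './bert_pretrain'        # assumed path to pretrained weights + vocab
        self.tokenizer = BertTokenizer.from_pretrained(self.bert_path)
```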
3 changes: 2 additions & 1 deletion utils.py
@@ -4,7 +4,7 @@
import time
from datetime import timedelta

-PAD = '[PAD]'  # padding token
+PAD, CLS = '[PAD]', '[CLS]'  # padding token; [CLS] is BERT's aggregate sequence token


def build_dataset(config):
@@ -18,6 +18,7 @@ def load_dataset(path, pad_size=32):
                    continue
                content, label = lin.split('\t')
                token = config.tokenizer.tokenize(content)
+               token = [CLS] + token
                seq_len = len(token)
                mask = []
                token_ids = config.tokenizer.convert_tokens_to_ids(token)
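Why the added line matters: for classification, BERT pools the final hidden state at the [CLS] position, so every example must begin with that token. Here is a self-contained sketch of the per-line encoding this hunk belongs to; the lines through `convert_tokens_to_ids` follow the diff, while the pad/truncate-and-mask tail is a reconstruction under the assumption that it matches the repo's usual pattern, not code shown in this commit:

```python
# Hedged sketch of load_dataset's per-line encoding in utils.py. The part up
# to convert_tokens_to_ids mirrors the diff; the pad/mask tail is assumed.
PAD, CLS = '[PAD]', '[CLS]'

def encode_line(config, line, pad_size=32):
    content, label = line.strip().split('\t')
    token = config.tokenizer.tokenize(content)
    token = [CLS] + token                      # the line added by this commit
    seq_len = len(token)
    token_ids = config.tokenizer.convert_tokens_to_ids(token)

    if len(token) < pad_size:                  # pad short sequences with 0 ([PAD]) ids
        mask = [1] * len(token_ids) + [0] * (pad_size - len(token))
        token_ids += [0] * (pad_size - len(token))
    else:                                      # truncate long sequences
        mask = [1] * pad_size
        token_ids = token_ids[:pad_size]
        seq_len = pad_size
    return token_ids, int(label), seq_len, mask
```

The attention mask marks which positions are real tokens (1) versus padding (0), so the model ignores [PAD] positions while still attending to the prepended [CLS].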
