fastNLP V0.3.1 #132

FengZiYjun · 2019-02-04T02:29:17Z

New features:

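Add a series of callbacks: EarlyStopCallback, LRFinder, LRScheduler, etc.
Add padders, allowing custom padding methods (EngChar2dPadder handles two-dimensional padding)
Extend the types accepted by DataSet initialization
Multiprocess batch
Upgrade the Chinese word segmentation / POS tagging / syntactic parsing APIs
Add BERT and an interface for loading pre-trained models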

Bugs fixed:

validation step counts
remove GPU id when saving
refactor type system in FieldArray
...

Code structure refined:

reduced dependency on reproduction/
renaming folders
optimized Trainer methods

Testing:

add tests for callbacks
more tests for processors

Tutorials:

Add a padding tutorial
Add a testing guide

2. FieldArray uses AutoPadder by default; AutoPadder behaves the same as the earlier code did without a padder.
3. To solve the two-dimensional padding problem, EngChar2dPadder is introduced for padding characters.
4. Add a padding tutorial.
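
The two-dimensional padding problem from item 3 can be illustrated with a minimal pure-Python sketch. This is not the EngChar2dPadder implementation: the function name `pad_chars_2d` and its `pad_val` parameter are hypothetical, and the input layout (each sentence is a list of words, each word a list of character ids) is an assumption made for illustration.

```python
from typing import List


def pad_chars_2d(batch: List[List[List[int]]], pad_val: int = 0) -> List[List[List[int]]]:
    """Pad a batch of sentences to [batch_size, max_sent_len, max_word_len].

    Hypothetical helper for illustration only, not the fastNLP API.
    """
    max_sent_len = max((len(sent) for sent in batch), default=0)
    max_word_len = max((len(word) for sent in batch for word in sent), default=0)

    padded = []
    for sent in batch:
        rows = []
        for word in sent:
            # Pad each word on the right up to the longest word in the batch.
            rows.append(list(word) + [pad_val] * (max_word_len - len(word)))
        # Pad each sentence with all-padding words up to the longest sentence.
        rows.extend([pad_val] * max_word_len for _ in range(max_sent_len - len(sent)))
        padded.append(rows)
    return padded


batch = [
    [[1, 2, 3], [4, 5]],  # sentence with two words
    [[6]],                # sentence with one word
]
padded = pad_chars_2d(batch)
```

Padding along a single axis (sentence length only) would leave the word lengths ragged, which is why character-level fields need both dimensions padded.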

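* Refactor the dtype detection code used in FieldArray initialization and append, for better code reuse
* Type checking is now entirely the responsibility of FieldArray; DataSet cooperates with it

Tests:
* Tidy up the dtype-related test code
* Add tests for all tutorials

Other:
* Complete a full CoNLL dataset loader
* Upgrade the POS tag model training script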
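
The idea of sharing one dtype check between initialization and append can be sketched as follows. `TypedField` is a hypothetical toy class, not fastNLP's FieldArray, and the rule that mixed int/float content is promoted to float is an assumption made for this example.

```python
from typing import Any, List, Optional


class TypedField:
    """Toy typed column (hypothetical; not fastNLP's FieldArray)."""

    def __init__(self, name: str, content: List[Any]):
        self.name = name
        self.dtype: Optional[type] = None
        self.content: List[Any] = []
        for value in content:
            self.append(value)  # initialization reuses the append-time check

    def _check(self, value: Any) -> type:
        """Return the field dtype after accepting `value`, or raise TypeError."""
        value_type = type(value)
        if value_type not in (int, float, str):
            raise TypeError(f"field {self.name!r}: unsupported type {value_type.__name__}")
        if self.dtype is None or self.dtype is value_type:
            return value_type
        if {self.dtype, value_type} == {int, float}:
            return float  # assumption: mixed int/float is treated as float
        raise TypeError(
            f"field {self.name!r}: cannot mix {self.dtype.__name__} and {value_type.__name__}"
        )

    def append(self, value: Any) -> None:
        self.dtype = self._check(value)
        self.content.append(value)


field = TypedField("seq_len", [3, 5])
field.append(2.5)  # the field dtype becomes float
```

Because the constructor goes through `append`, there is a single place where type errors can be raised.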

codecov-io · 2019-02-04T02:36:04Z

Codecov Report

Merging #132 into master will increase coverage by 6.49%.
The diff coverage is 77.38%.

@@            Coverage Diff            @@
##           master    #132      +/-   ##
=========================================
+ Coverage      68%   74.5%   +6.49%     
=========================================
  Files          90      88       -2     
  Lines        6286    7245     +959     
=========================================
+ Hits         4275    5398    +1123     
+ Misses       2011    1847     -164
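
The percentages in the diff above can be checked from the hit and miss counts. The sketch below assumes line coverage is simply hits divided by hits plus misses; the function name `coverage` is illustrative.

```python
def coverage(hits: int, misses: int) -> float:
    """Line coverage in percent, assuming coverage = hits / (hits + misses)."""
    return 100.0 * hits / (hits + misses)


master = coverage(4275, 2011)  # 4275 + 2011 = 6286 lines
merged = coverage(5398, 1847)  # 5398 + 1847 = 7245 lines
delta = merged - master
print(f"{master:.2f}% -> {merged:.2f}% ({delta:+.2f} points)")
```

The result is close to 68% and 74.5%, a gain of roughly 6.5 points; the reported +6.49% presumably reflects how Codecov rounds.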

Impacted Files	Coverage Δ
fastNLP/io/config_io.py	`83.22% <ø> (+0.64%)`	⬆️
fastNLP/core/instance.py	`92.85% <ø> (ø)`	⬆️
fastNLP/io/base_loader.py	`57.57% <ø> (+3.03%)`	⬆️
fastNLP/api/examples.py	`0% <0%> (ø)`	⬆️
fastNLP/core/utils.py	`61.51% <100%> (+1.37%)`	⬆️
test/models/test_bert.py	`100% <100%> (ø)`
test/io/test_dataset_loader.py	`100% <100%> (ø)`	⬆️
test/api/test_processor.py	`100% <100%> (ø)`	⬆️
fastNLP/io/embed_loader.py	`57.81% <100%> (+2.07%)`	⬆️
test/core/test_callbacks.py	`100% <100%> (ø)`	⬆️
... and 41 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3fa95b6...b66d7b8.

yunfan and others added 30 commits January 14, 2019 19:13

- fix trainer with validate_every > 0

2e9e6c6

- refine & fix Transformer Encoder
- refine & speed up biaffine parser

remove the gpu_id info when saving

a6dbbe9

code optimization

c4ba75d

* move used readers from reproduction to io/dataset_loader.py (API shall not call anything from reproduction/)

Updates:

1fdaf23

* Rename: chinese_word_segment ---> Chinese_word_segmentation
* Rename: pos_tag_model ---> POS_tagging
* Add 4 tests for Batch
* Delete the unused chinese_word_segment/run.py

Add comments to train; add comments to attention; add Transformer-based word segmentation

6a0a1ed

conflict solved

1f50b01

* Add callbacks: EarlyStopCallback

d80d944

* Replace the asserts in dataset.py with raised errors
* Add try-except to trainer to catch EarlyStopError
* Optimize the trainer code
* Add tests for callbacks
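
The pattern of a callback raising EarlyStopError and the trainer catching it can be sketched as below. Only the exception name comes from the commit message; the `EarlyStop` class, the `on_valid_end` hook, the `patience` parameter, and the toy `train` loop are assumptions for illustration and not the fastNLP Trainer.

```python
from typing import Iterable


class EarlyStopError(Exception):
    """Raised by a callback to ask the training loop to stop."""


class EarlyStop:
    """Stop when the validation metric fails to improve `patience` times in a row."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("-inf")
        self.wait = 0

    def on_valid_end(self, metric: float) -> None:  # hypothetical hook name
        if metric > self.best:
            self.best = metric
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                raise EarlyStopError(f"no improvement for {self.patience} validations")


def train(metrics_per_epoch: Iterable[float], callback: EarlyStop) -> int:
    """Toy loop: `metrics_per_epoch` stands in for real validation results."""
    epochs_run = 0
    try:
        for metric in metrics_per_epoch:
            epochs_run += 1
            callback.on_valid_end(metric)
    except EarlyStopError:
        pass  # stopping early is a normal outcome, not a failure
    return epochs_run


epochs_run = train([0.70, 0.75, 0.74, 0.73, 0.80], EarlyStop(patience=2))
```

Using an exception lets a callback end training from any depth of the loop without the trainer having to poll a flag after every hook.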

Update the Padder test cases

3e33a23

Merge branch 'dev' of github.com:choosewhatulike/fastNLP-private into dev

73dd35d

* Add support for list of np.array to FieldArray

b93ca9b

* Add tests: FieldArray initialization

Add FieldArray support for list of np.array

864c223

Upgrade batch to a multiprocess batch

2e3ef52

Reduce the overhead of repeatedly creating processes in batch

d9ac334

* Refactor the POS API to accept words as input

ab953b4

* Add two kinds of Callback
* Improve the Trainer's error catching

- fix parser train

eb55856

update reproduction

de856fb

- revert batch

a7f3701

Add LR finder: use the first epoch to find the best lr, then start training from the second epoch

62ea4f7
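
A learning-rate search of this kind can be sketched as a sweep over exponentially spaced rates. This is not the code from the commit: `lr_schedule`, `find_lr`, the stopping rule (loss exceeding four times the best seen), and choosing the rate with the lowest loss are all assumptions, and `toy_loss` replaces a real training step with the loss of one gradient step on f(w) = w^2 starting from w = 1.

```python
import math
from typing import Callable, List, Tuple


def lr_schedule(start_lr: float, end_lr: float, num_steps: int) -> List[float]:
    """Exponentially spaced learning rates from start_lr to end_lr."""
    if num_steps < 2:
        return [start_lr]
    ratio = (end_lr / start_lr) ** (1.0 / (num_steps - 1))
    return [start_lr * ratio ** i for i in range(num_steps)]


def find_lr(
    loss_at: Callable[[float], float],
    start_lr: float = 1e-4,
    end_lr: float = 10.0,
    num_steps: int = 51,
) -> Tuple[float, float]:
    """Return (lr, loss) with the lowest loss seen before the loss blows up."""
    best_lr, best_loss = start_lr, math.inf
    for lr in lr_schedule(start_lr, end_lr, num_steps):
        loss = loss_at(lr)
        if math.isnan(loss) or loss > 4 * best_loss:
            break  # assumed divergence rule: stop once the loss explodes
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss


def toy_loss(lr: float) -> float:
    """Loss after one gradient step on f(w) = w**2 from w = 1."""
    w = 1.0 - lr * 2.0  # gradient of w**2 at w = 1 is 2
    return w * w


best_lr, best_loss = find_lr(toy_loss)
```

In a real trainer the sweep would run over the mini-batches of the first epoch, and the chosen rate would be applied from the second epoch onward, as the commit message describes.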

Update POS API

b14dd58

- batch with multiprocessing

03f49c8

Turn tensorboardX into a callback; remove the tensorboardX-related code from trainer

f3cb812

Adapt trainer to syf's multiprocess batch

47ec69e

Merge branch 'dev' of https://github.com/choosewhatulike/fastNLP-private into dev

e93c6f0

add batch device

a37de43

Merge branch 'yyff' into dev

c02980e

remove device in batch

9474ab4

add testing tutorial

d4b4ffa

skip training when n_epoch in trainer is not greater than 0

e0d6a25

FengZiYjun added 6 commits January 25, 2019 21:43

update callbacks:

887fc92

* rename callback methods. Use fastai's notation.
* add a new callback method - on_valid_begin

add BERT model

bfaf09d

* load pre-trained BERT weights from local binary
* add tests

Tidy up all dataset loaders and add unit tests

9865411

Ready for V0.3.1

0c5630b

* upgrade parser API and model
* update docs: add new pages for tutorials
* upgrade CWS api download source
* add a new method for dataset field access
* add introduction for bert
* add more unit tests for api/processor
* remove unused test data. Add new test data.

add codecov fix

d1b5ada

update API introduction

b66d7b8

FengZiYjun requested review from xpqiu, xuyige, choosewhatulike and yhcc February 4, 2019 02:29

xuyige approved these changes Feb 5, 2019

FengZiYjun merged commit 13faa2b into fastnlp:master Feb 6, 2019
