fastNLP V0.3.1 #132

Merged
merged 36 commits into from
Feb 6, 2019
FengZiYjun

New features:

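  • Add a set of callbacks: EarlyStopCallback, LRFinder, LRScheduler, etc.
  • Add padders, allowing custom padding methods (EngChar2dPadder handles 2D padding)
  • Extend the types accepted by DataSet initialization
  • Multi-process batching
  • Upgrade the Chinese word segmentation / POS tagging / syntactic parsing APIs
  • Add BERT and an interface for loading pre-trained models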

Bugs fixed:

  • validation step counts
  • remove GPU id when saving
  • refactor type system in FieldArray
  • ...

Code structure refined:

  • reduced dependency on reproduction/
  • renaming folders
  • optimized Trainer methods

Testing:

  • add tests for callbacks
  • more tests for processors

Tutorials:

  • Add a padding tutorial
  • Add a testing guide

yunfan and others added 30 commits January 14, 2019 19:13
- refine & fix Transformer Encoder
- refine & speed up biaffine parser
* move used readers from reproduction to io/dataset_loader.py
(API shall not call anything from reproduction/)
* Rename: chinese_word_segment ---> Chinese_word_segmentation
* Rename: pos_tag_model ---> POS_tagging
* Add 4 tests for Batch
* Remove the unused chinese_word_segment/run.py
* Replace the asserts in dataset.py with raised errors
* Add a try-except to trainer to catch EarlyStopError
* Optimize the trainer code
* Add tests for callbacks
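The try-except around the training loop can be illustrated with a minimal sketch. Only the name EarlyStopError comes from the commit message above; MiniTrainer, PatienceCallback, the on_valid_end hook and the patience logic are hypothetical names made up for illustration and are not fastNLP's actual implementation.

```python
class EarlyStopError(Exception):
    """Raised by a callback to ask the trainer to stop training."""


class PatienceCallback:
    """Hypothetical callback: stop after `patience` validations without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.wait = 0

    def on_valid_end(self, metric):
        if metric > self.best:
            self.best = metric
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                raise EarlyStopError("no improvement for %d validations" % self.wait)


class MiniTrainer:
    """Hypothetical trainer that replays a fixed list of validation scores."""

    def __init__(self, scores, callback):
        self.scores = scores
        self.callback = callback
        self.epochs_run = 0
        self.stopped_early = False

    def train(self):
        try:
            for score in self.scores:
                self.epochs_run += 1
                self.callback.on_valid_end(score)
        except EarlyStopError:
            # The trainer catches the signal and finishes cleanly
            # instead of letting the exception reach the caller.
            self.stopped_early = True
        return self.epochs_run


trainer = MiniTrainer([0.5, 0.6, 0.58, 0.59, 0.7], PatienceCallback(patience=2))
epochs = trainer.train()
```

Using an exception as the stop signal lets a callback end training from any hook without the loop having to poll a flag after every call.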
2. FieldArray uses AutoPadder by default; AutoPadder behaves the same as the earlier code did without a padder
3. To solve the 2D padding problem, introduce EngChar2dPadder for padding characters
4. Add a padding tutorial.
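The 2D padding problem mentioned above arises when each sentence is a list of words and each word is a list of character ids, so both dimensions vary in length. The following is a minimal pure-Python sketch of the idea. The function name pad_chars_2d and its pad_val parameter are assumptions for illustration, and it does not reproduce the interface of EngChar2dPadder.

```python
def pad_chars_2d(batch, pad_val=0):
    """Pad a batch of sentences, each a list of words, each word a list of char ids.

    Returns a nested list of shape [batch_size, max_words, max_chars].
    """
    max_words = max(len(sent) for sent in batch)
    max_chars = max((len(word) for sent in batch for word in sent), default=0)
    padded = []
    for sent in batch:
        # Pad every word to max_chars along the character dimension.
        rows = [word + [pad_val] * (max_chars - len(word)) for word in sent]
        # Pad the sentence to max_words with all-padding rows.
        rows += [[pad_val] * max_chars for _ in range(max_words - len(sent))]
        padded.append(rows)
    return padded


batch = [
    [[1, 2, 3], [4]],  # sentence with two words
    [[5, 6]],          # sentence with one word
]
padded = pad_chars_2d(batch)
```

Here padded has shape 2 x 2 x 3: short words are filled on the right, and the shorter sentence gains one all-padding word.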
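* Refactor the dtype detection code so that FieldArray initialization and append share it, for better code reuse
* Responsibility for type checking now rests entirely with FieldArray, with DataSet cooperating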
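One way to read the refactoring above is that a single type-inference helper is called from both the constructor and append, so the column itself rejects inconsistent data. The sketch below illustrates that reading only. MiniFieldArray and infer_dtype are hypothetical names, and the rules shown (flat lists, exact type match) are assumptions rather than FieldArray's real type system.

```python
def infer_dtype(value):
    """Return the element type of a scalar or a flat list; raise on mixed types."""
    if isinstance(value, list):
        types = {type(v) for v in value}
        if len(types) > 1:
            names = sorted(t.__name__ for t in types)
            raise TypeError("mixed element types: %s" % names)
        return types.pop() if types else None
    return type(value)


class MiniFieldArray:
    """Hypothetical stand-in for a field column that validates its own contents."""

    def __init__(self, name, content):
        self.name = name
        self.content = []
        self.dtype = None
        for item in content:
            self.append(item)  # initialization reuses the same check as append

    def append(self, item):
        dtype = infer_dtype(item)
        if self.dtype is None:
            self.dtype = dtype
        elif dtype is not None and dtype is not self.dtype:
            raise TypeError(
                "field %r expects %s, got %s"
                % (self.name, self.dtype.__name__, dtype.__name__)
            )
        self.content.append(item)


field = MiniFieldArray("words", [[1, 2], [3]])
try:
    field.append(["a"])
    rejected = False
except TypeError:
    rejected = True
```

Because the check lives in one place, a container that owns several such columns only needs to forward data to them and let errors propagate.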
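Tests:
* Tidy up the dtype-related test code
* Add tests for all tutorials
Other:
* Complete a full Conll dataset loader
* Upgrade the POS tag model training script
* Add test: FieldArray initialization
* Add two kinds of Callback
* Improve Trainer's error catching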
* rename callback methods. Use fastai's notation.
* add a new callback method - on_valid_begin
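To show where a hook such as on_valid_begin would fire relative to the others, here is a small sketch of a callback base class and a toy driver loop. Only on_valid_begin is named in the commit message. The other hook names, the Recorder class and the run function are assumptions in the same on_<event>_begin / on_<event>_end style, not a listing of fastNLP's callback API.

```python
class Callback:
    """Base class with no-op hooks; subclasses override only what they need."""

    def on_train_begin(self):
        pass

    def on_epoch_begin(self, epoch):
        pass

    def on_valid_begin(self):
        pass

    def on_valid_end(self, metric):
        pass

    def on_train_end(self):
        pass


class Recorder(Callback):
    """Records validation-related events in the order they fire."""

    def __init__(self):
        self.events = []

    def on_valid_begin(self):
        self.events.append("valid_begin")

    def on_valid_end(self, metric):
        self.events.append("valid_end:%.1f" % metric)


def run(callbacks, metrics):
    """Toy loop: one epoch per entry in `metrics`, validating after each epoch."""
    for cb in callbacks:
        cb.on_train_begin()
    for epoch, metric in enumerate(metrics, start=1):
        for cb in callbacks:
            cb.on_epoch_begin(epoch)
        for cb in callbacks:
            cb.on_valid_begin()  # fires before evaluation starts
        for cb in callbacks:
            cb.on_valid_end(metric)
    for cb in callbacks:
        cb.on_train_end()


recorder = Recorder()
run([recorder], [0.5, 0.7])
```

A begin hook of this kind gives a callback a place to reset counters or switch state before evaluation, which an end-only hook cannot offer.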
* load pre-trained BERT weights from local binary
* add tests
* Upgrade the parser API and model
* update docs: add new pages for tutorials
* upgrade CWS api download source
* add a new method for dataset field access
* add introduction for bert
* add more unit tests for api/processor
* remove unused test data. Add new test data.
@codecov-io

Codecov Report

Merging #132 into master will increase coverage by 6.49%.
The diff coverage is 77.38%.

@@            Coverage Diff            @@
##           master    #132      +/-   ##
=========================================
+ Coverage      68%   74.5%   +6.49%     
=========================================
  Files          90      88       -2     
  Lines        6286    7245     +959     
=========================================
+ Hits         4275    5398    +1123     
+ Misses       2011    1847     -164
| Impacted Files | Coverage Δ |
|---|---|
| fastNLP/io/config_io.py | 83.22% <ø> (+0.64%) ⬆️ |
| fastNLP/core/instance.py | 92.85% <ø> (ø) ⬆️ |
| fastNLP/io/base_loader.py | 57.57% <ø> (+3.03%) ⬆️ |
| fastNLP/api/examples.py | 0% <0%> (ø) ⬆️ |
| fastNLP/core/utils.py | 61.51% <100%> (+1.37%) ⬆️ |
| test/models/test_bert.py | 100% <100%> (ø) |
| test/io/test_dataset_loader.py | 100% <100%> (ø) ⬆️ |
| test/api/test_processor.py | 100% <100%> (ø) ⬆️ |
| fastNLP/io/embed_loader.py | 57.81% <100%> (+2.07%) ⬆️ |
| test/core/test_callbacks.py | 100% <100%> (ø) ⬆️ |

... and 41 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3fa95b6...b66d7b8.

@FengZiYjun FengZiYjun merged commit 13faa2b into fastnlp:master Feb 6, 2019
5 participants