Merge Preprocessor into DataSet. #91
- add character vocab in preprocessor
- add dataset loader for language model dataset
- other minor adjustments
- preserve only a little example data for language model
- DataSet's __init__ takes a function as an argument, rather than a class object
- Preprocessor is about to be removed. Don't use it anymore.
- Remove cross_validate in trainer, because it is rarely used and weird
- Loader.load is expected to be a static method
- Delete some code in other_modules.py
- Add more tests
- Delete extra sample data
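The first and fourth points above can be illustrated with a minimal sketch. This is not the code from this PR: `LineLoader`, the `load_func` parameter name, and the toy `DataSet` body are hypothetical stand-ins, and the loader reads from a string instead of a file path so the example stays self-contained.

```python
class LineLoader:
    """Hypothetical loader. `load` is static, so no instance is needed."""

    @staticmethod
    def load(text):
        # One example per non-empty line, tokenized on whitespace.
        return [line.split() for line in text.splitlines() if line.strip()]


class DataSet:
    """Toy stand-in for DataSet (assumed shape, not the library's class)."""

    def __init__(self, source, load_func=None):
        # load_func is a plain callable such as LineLoader.load,
        # passed in directly instead of a loader class object.
        if load_func is not None:
            self.examples = load_func(source)
        else:
            self.examples = list(source)

    def __len__(self):
        return len(self.examples)


data = DataSet("a b c\n\nd e", load_func=LineLoader.load)
```

Passing the function itself means any callable with the same shape can be used, including a lambda or a module-level function, without subclassing a loader.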
Codecov Report
@@ Coverage Diff @@
## master #91 +/- ##
==========================================
+ Coverage 73.05% 79.47% +6.41%
==========================================
Files 67 68 +1
Lines 3437 3620 +183
==========================================
+ Hits 2511 2877 +366
+ Misses 926 743 -183
Continue to review full report at Codecov.
good
The framework has been updated. I think we should merge it as soon as possible.
- improve metrics code
- fix validator bugs in trainer; remove early saving
- run CWS code
- improve README.md
Ready to merge now. This is the latest release V0.1.0.
Before
After
DataSet takes care of data loading, indexing, and transformation into tensors.
How to add a new task?
- DataSet: the convert method handles how your multi-level lists are transformed into Fields.
- Evaluator, to compute metrics.
Model, DataSet, and Evaluator are the only three things you need to program.
There is also a lot of reusable sample code.
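As a rough illustration of the three pieces, here is a toy sketch. Every name in it (`Field`, `ToyDataSet`, `ToyModel`, `AccuracyEvaluator`) and every signature is an assumption made for this example; it only mirrors the division of labor described above and is not the framework's actual API.

```python
class Field:
    """Hypothetical container for one piece of an example."""

    def __init__(self, values, is_target=False):
        self.values = values
        self.is_target = is_target


class ToyDataSet:
    """Turns multi-level lists into Fields through `convert`."""

    def __init__(self, raw):
        self.instances = [self.convert(example) for example in raw]

    def convert(self, example):
        # Each raw example is assumed to be a (words, label) pair.
        words, label = example
        return {"words": Field(words), "label": Field(label, is_target=True)}


class ToyModel:
    """Trivial rule-based model: predicts 1 if the word 'good' appears."""

    def predict(self, words):
        return 1 if "good" in words else 0


class AccuracyEvaluator:
    """Computes the fraction of predictions that match the targets."""

    def __call__(self, predictions, targets):
        correct = sum(p == t for p, t in zip(predictions, targets))
        return correct / len(targets)


raw = [(["good", "movie"], 1), (["bad", "plot"], 0), (["not", "good"], 0)]
dataset = ToyDataSet(raw)
model = ToyModel()
predictions = [model.predict(inst["words"].values) for inst in dataset.instances]
targets = [inst["label"].values for inst in dataset.instances]
accuracy = AccuracyEvaluator()(predictions, targets)
```

In this sketch a new task would only swap out `convert`, the model's prediction rule, and the metric; the loop that ties them together stays the same.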