Add the benchmark of text dataset #349
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   93.18%   93.67%   +0.48%
==========================================
  Files         117      119       +2
  Lines        6135     6243     +108
==========================================
+ Hits         5717     5848     +131
+ Misses        418      395      -23
LGTM! I am just curious whether we have more datasets from HuggingFace (this can be a follow-up, not a blocker).
Currently there are some bugs in dataset materialization.
The current text datasets include:
Mercari
AmazonFineFoodReviews
MultimodalTextBenchmark
HuggingFaceDatasetDict(path='maharshipandya/spotify-tracks-dataset', target_col='track_genre')
(Another dataset from HuggingFace may be added.)