Add the benchmark of text dataset #349

Merged
merged 7 commits into from
Feb 15, 2024
@zechengz zechengz (Member) commented Feb 11, 2024

There are currently some bugs in dataset materialization.
The text datasets currently included are:

  • Mercari
  • AmazonFineFoodReviews
  • 17 datasets from MultimodalTextBenchmark
  • HuggingFaceDatasetDict(path='maharshipandya/spotify-tracks-dataset', target_col='track_genre')
    (another dataset from HuggingFace may be added)

codecov bot commented Feb 11, 2024

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (f085dce) 93.18% compared to head (ec75543) 93.67%.

❗ Current head ec75543 differs from pull request most recent head 9cae7e7. Consider uploading reports for the commit 9cae7e7 to get more accurate results

Files                                                Patch %   Lines
torch_frame/datasets/data_frame_text_benchmark.py    77.58%    13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   93.18%   93.67%   +0.48%     
==========================================
  Files         117      119       +2     
  Lines        6135     6243     +108     
==========================================
+ Hits         5717     5848     +131     
+ Misses        418      395      -23     
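
The percentages in the diff above can be reproduced from the hit and line counts. The following is a minimal sketch, assuming the report truncates to two decimals rather than rounding; that assumption is inferred from these numbers only and is not confirmed Codecov behavior. The helper name `truncated_percent` is hypothetical.

```python
import math


def truncated_percent(ratio: float) -> float:
    """Express a ratio as a percentage truncated (not rounded) to 2 decimals.

    Truncation is an assumption inferred from the figures in this report.
    """
    return math.floor(ratio * 10_000) / 100


# Hit and line counts copied from the coverage diff above.
base_hits, base_lines = 5717, 6135
head_hits, head_lines = 5848, 6243

base_ratio = base_hits / base_lines
head_ratio = head_hits / head_lines

base_cov = truncated_percent(base_ratio)
head_cov = truncated_percent(head_ratio)
# The delta is computed on the unrounded ratios and truncated afterwards.
delta = truncated_percent(head_ratio - base_ratio)

# Misses are the lines that were not hit.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

print(base_cov, head_cov, delta, base_misses, head_misses)
```

Under that assumption, the delta comes out as 0.48 rather than the 0.49 obtained by subtracting the two displayed percentages, which matches the `+0.48%` shown in the table.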


@zechengz zechengz changed the title from "[Draft] Benchmark of text dataset" to "Add the benchmark of text dataset" Feb 14, 2024
@zechengz zechengz requested a review from weihua916 February 14, 2024 09:20
@zechengz zechengz marked this pull request as ready for review February 14, 2024 09:23
@weihua916 weihua916 (Contributor) left a comment

LGTM! I am just curious if we have more datasets from HuggingFace (can be a follow-up, not a blocker)

test/datasets/test_data_frame_text_benchmark.py (outdated, resolved)
@github-actions github-actions bot added the data label Feb 15, 2024
@zechengz zechengz enabled auto-merge (squash) February 15, 2024 03:22
@zechengz zechengz merged commit 3cb8ba3 into master Feb 15, 2024
12 checks passed
@zechengz zechengz deleted the zecheng_text_bench branch February 15, 2024 03:23
2 participants