Add the benchmark of text dataset #349

zechengz · 2024-02-11T06:19:02Z

Dataset materialization currently has some bugs.
The text datasets currently included are:

Mercari
AmazonFineFoodReviews
17 datasets from MultimodalTextBenchmark
HuggingFaceDatasetDict(path='maharshipandya/spotify-tracks-dataset', target_col='track_genre')
(Another dataset from HuggingFace may be added.)

codecov · 2024-02-11T06:21:20Z

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (f085dce) 93.18% compared to head (ec75543) 93.67%.

❗ Current head ec75543 differs from the pull request's most recent head 9cae7e7. Consider uploading reports for commit 9cae7e7 to get more accurate results.

Files	Patch %	Lines
torch_frame/datasets/data_frame_text_benchmark.py	77.58%	13 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   93.18%   93.67%   +0.48%     
==========================================
  Files         117      119       +2     
  Lines        6135     6243     +108     
==========================================
+ Hits         5717     5848     +131     
+ Misses        418      395      -23
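The percentages in the diff above can be re-derived from the hit and miss counts. The following is a minimal sketch, not Codecov's own code: the helper names `coverage_pct` and `truncate_pct` are hypothetical, and the choice to truncate rather than round to two decimals is an assumption inferred only from the fact that it reproduces the displayed figures.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Return line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


def truncate_pct(value: float, places: int = 2) -> float:
    """Truncate (not round) a percentage to `places` decimals.

    Assumption: truncation is used only because it matches the
    numbers shown in the report above.
    """
    factor = 10 ** places
    return math.floor(value * factor) / factor


# Counts copied from the coverage diff (base = master, head = #349).
base_hits, base_misses = 5717, 418
head_hits, head_misses = 5848, 395

# Tracked lines are hits plus misses.
base_lines = base_hits + base_misses
head_lines = head_hits + head_misses

base_cov = coverage_pct(base_hits, base_lines)
head_cov = coverage_pct(head_hits, head_lines)

print(base_lines, head_lines)             # 6135 6243
print(truncate_pct(base_cov))             # 93.18
print(truncate_pct(head_cov))             # 93.67
print(truncate_pct(head_cov - base_cov))  # 0.48
```

Note that the change is computed from the untruncated percentages; subtracting the two displayed values (93.67 and 93.18) would give 0.49 instead of the reported 0.48.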


torch_frame/datasets/data_frame_text_benchmark.py

weihua916

LGTM! I am just curious if we have more datasets from HuggingFace (can be a follow-up, not a blocker)

test/datasets/test_data_frame_text_benchmark.py

Update

f0d5be7

zechengz self-assigned this Feb 11, 2024

github-actions bot added the dataset and example labels Feb 11, 2024

Update

4007f90

zechengz added the skip-changelog label Feb 11, 2024

Update

15f0444

zechengz commented Feb 13, 2024

torch_frame/datasets/data_frame_text_benchmark.py (resolved)

Update

af36c33

zechengz changed the title from "[Draft] Benchmark of text dataset" to "Add the benchmark of text dataset" Feb 14, 2024

zechengz requested a review from weihua916 February 14, 2024 09:20

zechengz commented Feb 14, 2024

torch_frame/datasets/data_frame_text_benchmark.py (resolved)

zechengz marked this pull request as ready for review February 14, 2024 09:23

weihua916 approved these changes Feb 14, 2024

test/datasets/test_data_frame_text_benchmark.py (outdated, resolved)

test/datasets/test_data_frame_text_benchmark.py (resolved)

zechengz added 2 commits February 14, 2024 19:09

Update

c0b5385

Update

ec75543

github-actions bot added the data label Feb 15, 2024

Update

9cae7e7

zechengz enabled auto-merge (squash) February 15, 2024 03:22

zechengz merged commit 3cb8ba3 into master Feb 15, 2024
12 checks passed

zechengz deleted the zecheng_text_bench branch February 15, 2024 03:23
