chore: Add Benchmark Scripts and Performance Comparison of LitData vs FFCV for Streaming ImageNet #572
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
Coverage Diff | main | #572 |
---|---|---|
Coverage | 79% | 79% |
Files | 41 | 41 |
Lines | 6135 | 6135 |
Hits | 4835 | 4835 |
Misses | 1300 | 1300 |
Can you add a folder with the files needed to run the benchmarks?
Sure, @tchaton.
…PEG and PIL formats
for more information, see https://pre-commit.ci
Pull Request Overview
This PR introduces benchmark scripts for streaming and optimizing the ImageNet dataset using both LitData and FFCV, and updates documentation with usage instructions and a performance comparison.
- Added LitData scripts: dataset optimization and streaming benchmarks.
- Added FFCV scripts: dataset conversion, writing to FFCV format, streaming benchmarks, and installer.
- Updated READMEs and main documentation with usage examples and a local performance comparison table.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
benchmarks/litdata/stream_imagenet.py | New LitData streaming benchmark script |
benchmarks/litdata/optimize_imagenet.py | New LitData dataset optimization script |
benchmarks/litdata/README.md | Usage docs for LitData scripts |
benchmarks/ffcv/write_imagenet.py | Script to write datasets in FFCV format |
benchmarks/ffcv/stream_imagenet.py | Script to stream and benchmark FFCV datasets |
benchmarks/ffcv/convert_imagenet.py | Script to convert raw ImageNet synset folders |
benchmarks/ffcv/install_ffcv.sh | Installer for FFCV dependencies |
benchmarks/ffcv/README.md | Usage docs for FFCV scripts |
benchmarks/README.md | Top-level benchmarks overview |
README.md | Added local disk performance comparison table |
Comments suppressed due to low confidence (1)
benchmarks/litdata/optimize_imagenet.py:103
- When --resize is enabled but --resize_size is not provided, resize_size defaults to None and no resizing occurs silently. Consider validating that resize_size is provided when --resize is set and erroring otherwise.
parser.add_argument("--resize_size", type=int, nargs="+", default=None, help="Resize size: int for max dimension (aspect ratio preserved), or two ints for (width height)")
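The validation suggested in this comment could look like the minimal sketch below. It assumes `--resize` is a boolean `store_true` flag (only the `--resize_size` definition is visible here), and the helper names `build_parser` and `parse_args` are hypothetical; the actual script may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the two resize-related options (illustrative)."""
    parser = argparse.ArgumentParser(description="Sketch of resize argument validation")
    # Assumption: --resize is a plain boolean flag; its real definition is not shown in the PR page.
    parser.add_argument("--resize", action="store_true", help="Enable resizing")
    parser.add_argument(
        "--resize_size",
        type=int,
        nargs="+",
        default=None,
        help="Resize size: int for max dimension (aspect ratio preserved), "
        "or two ints for (width height)",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and fail loudly on inconsistent resize options."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject --resize without a size instead of silently skipping the resize.
    if args.resize and args.resize_size is None:
        parser.error("--resize requires --resize_size")
    # The help text allows either one int or two ints, so reject anything else.
    if args.resize_size is not None and len(args.resize_size) not in (1, 2):
        parser.error("--resize_size takes one or two integers")
    return args
```

`parser.error` prints the usage message and exits with status 2, so a misconfigured run stops before any data is written.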
Really nice!
What does this PR do?
Benchmarks for LitData vs FFCV
Speed to stream ImageNet 1.2M from local disk with FFCV vs LitData:
Benchmark Logs
LitData | JPEG 90% | 12 GB
LitData | RAW | 168 GB
FFCV | JPEG 90% | 20 GB
FFCV | RAW | 170 GB
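For readers who want to reproduce this kind of comparison, a streaming speed can be measured roughly as in the sketch below. This is not the code from the PR's benchmark scripts: the function name `measure_throughput` and its parameters are hypothetical, and it assumes each batch supports `len()` to report its number of samples.

```python
import itertools
import time


def measure_throughput(batches, warmup_batches=0):
    """Iterate over `batches` and return (samples, seconds, samples_per_second).

    Assumption: each batch supports len() and that length is the batch size.
    The first `warmup_batches` batches are consumed but excluded from timing.
    """
    iterator = iter(batches)
    # Consume warmup batches so one-time startup cost does not skew the rate.
    for _ in itertools.islice(iterator, warmup_batches):
        pass

    num_samples = 0
    start = time.perf_counter()
    for batch in iterator:
        num_samples += len(batch)
    elapsed = time.perf_counter() - start

    # Guard against a zero elapsed time on very small inputs.
    rate = num_samples / elapsed if elapsed > 0 else float("inf")
    return num_samples, elapsed, rate
```

In a real run, `batches` would be a data loader built on the framework under test. If a loader yields tuples or dictionaries rather than a single tensor, the `len(batch)` call would need to be adapted to read the batch size from the image tensor instead.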
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃