Releases: Lightning-AI/litData
v0.2.48
What's Changed
- readme: update Maintainers by @Borda in #594
- chore: Add Benchmark Scripts and Performance Comparison of LitData vs FFCV for Streaming ImageNet by @bhimrazy in #572
- fix: Move cache warning under debug by @bhimrazy in #598
- Add support for torch.uint16 data type by @bhimrazy in #597
- fix: Add error handling for empty Parquet files while indexing and corresponding tests by @bhimrazy in #601
- fix: boto3 session options by @deependujha in #604
- bump version 0.2.48 by @deependujha in #605
Full Changelog: v0.2.47...v0.2.48
v0.2.47
What's Changed
- feat: Add support for path in map fn by @deependujha in #582
- ci: Add Python 3.11 to CI testing matrix by @bhimrazy in #585
- fix: docs failing in ci by @deependujha in #586
- fix: multi-node parquet indexing by @deependujha in #583
- bump version 0.2.47 by @deependujha in #587
Full Changelog: v0.2.46...v0.2.47
Release v0.2.46
What's Changed
- Feat: Add `per_stream` batching method to CombinedStreamingDataset by @schopra8 in #438
- Fix parquet cache by @philgzl in #560
- refactor: StreamingDataset variable names for better readability by @deependujha in #557
- feat: Add GitHub Actions workflow for `@benchmark` bot by @deependujha in #561
- fix: `@benchmark` bot fixes by @deependujha in #565
- Fix `IndexError` when resuming after some workers are done by @philgzl in #567
- ref: simplify cache dir creation and remove repeated parts by @bhimrazy in #568
- fix: suppress FileNotFoundError when acquiring file lock for count file by @bhimrazy in #570
- fix: Consolidate Cache Handling + Fix DDP Multi-Indexing for huggingface datasets by @bhimrazy in #569
- update readme to include best practices for image data optimization by @bhimrazy in #577
Full Changelog: v0.2.45...v0.2.46
v0.2.45
What's Changed
- Fixes the logic for `is_last_index`. by @bhimrazy in #531
- Fix: redundant chunk index download request in BinaryReader, when dataset in iter mode by @bhimrazy in #535
- Update `JPEGSerializer` (deserialize) to return as a tensor and also make `torchvision` required dependency by @bhimrazy in #541
- nitpick: readme incorrect `transfer` spelling by @deependujha in #543
- Update macos version to 14 in CI by @bhimrazy in #545
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #542
- Fix/last chunk deletion by @bhimrazy in #536
- Add file filtering support to `StreamingDataset` for Parquet datasets by @philgzl in #546
- Add papers with litdata and citation by @tchaton in #547
- Feat/add jpeg array serializer by @bhimrazy in #537
- feat: better debug & profile with logs & Litracer by @deependujha in #528
- add Github reticular repo by @tchaton in #548
- feat: add Litracer docs in readme by @deependujha in #549
- docs: add benchmark speed for r2 by @bhimrazy in #551
- bump: version 0.2.45 by @deependujha in #555
Full Changelog: v0.2.44...v0.2.45
Release v0.2.44
What's Changed
- Remove `.lock` download skipping, skip locks on force download by @JackUrb in #519
- pre-release bump 0.2.44 by @tchaton in #530
Full Changelog: v0.2.43...v0.2.44
v0.2.43
What's Changed
- Fix: resume issues with resuming in combined streaming dataset in dataloader by @bhimrazy in #507
- fix: s3 error by @deependujha in #510
- Fix: unsigned s5cmd requests and also add option to disable s5cmd by @bhimrazy in #513
- Turn on DEBUG logging based on DEBUG_LITDATA environment variable by @ouj in #518
- Feat: Update indexing of parquet dataset and also add streaming support to huggingface datasets by @bhimrazy in #505
- feat: correctly propagate storage_options by @deependujha in #514
- fix: remove warnings for Streaming Dataset with hf dataset and shuffle enabled by @bhimrazy in #520
- Revert '#506 Add s5cmd' – as boto3 Outperforms s5cmd in Latest Benchmarks by @bhimrazy in #521
- Upd/hf-dataset-get-format by @bhimrazy in #522
- Update documentation on Streaming Parquet Datasets from Huggingface and other cloud providers by @bhimrazy in #523
- Bump version to 0.2.43 by @bhimrazy in #525
- fix package config by @Borda in #526
- example: sine function model prediction with litdata & pytorch-lightning by @deependujha in #517
- fixing package & releasing by @Borda in #529
Full Changelog: v0.2.42...v0.2.43
Release v0.2.42
What's Changed
- Add register function for downloader by @ouj in #496
- Allow for more lenient state resume. by @JackUrb in #497
- Slightly faster speed by @tchaton in #503
- Add s5cmd by @tchaton in #506
- Feat: add support for gcp by @deependujha in #504
- Bump version 0.2.42 by @tchaton in #508
Full Changelog: v0.2.41...v0.2.42
v0.2.41
What's Changed
- doc: improve dev doc by @deependujha in #488
- Expose optimize dns by @tchaton in #498
- Update `_get_folder_size`: Reduce Logs noise and switch to `os.scandir` by @bhimrazy in #499
- Bump: version 0.2.41 by @deependujha in #500
Full Changelog: v0.2.40...v0.2.41
v0.2.40
What's Changed
- fix: `clean parquet dir cache` fixture by @deependujha in #474
- Fix: Allow using `Machine` types in `map` by @ethanwharris in #473
- 🛠️ Fix: Ensure `chunk_bytes` in `index.json` matches actual chunk file size by @bhimrazy in #478
- fix: _get_folder_size fn by @deependujha in #471
- Added boolean serialiser called by litdata.optimise() by @DominiquePaul in #481
- Doc: improve dev doc & add ToDos by @deependujha in #479
- upd: Add hf file download progress and update local file path by @bhimrazy in #484
- fix: segmentation fault error in streaming tokens by @bhimrazy in #485
- Warn user if `max_cache_size` is less than 25GB in StreamingDataset by @bhimrazy in #489
- fix: Properly assign the chunks to the right worker by @tchaton in #449
- Bump version to 0.2.40 by @bhimrazy in #491
New Contributors
- @ethanwharris made their first contribution in #473
- @DominiquePaul made their first contribution in #481
Full Changelog: v0.2.39...v0.2.40
Release 0.2.39
What's Changed
- Feat: add support for HuggingFace datasets by @deependujha in #462
- Using count-locks for multi-node-single-cache support by @JackUrb in #468
- Bump version to 0.2.39 by @tchaton in #470
Full Changelog: v0.2.38...v0.2.39