Set default in-memory value depending on the dataset size (#2182)
* Create config variable to set in_memory default
* Use config in_memory in load
* Revert "Create config variable to set in_memory default" (this reverts commit cf552f8)
* Create config variable to set max in-memory dataset size
* Create function to assess if a dataset is small
* Use dataset-size in_memory in load
* Use dataset-size in_memory in Dataset(Dict).load_from_disk
* Fix is_small_dataset for None dataset_size
* Fix tests by passing keep_in_memory=False
* Fix is_small_dataset for None config max dataset size
* Explain new behavior of keep_in_memory in docstrings
* Test is_small_dataset
* Update docstring of Dataset.load_from_disk
* Set default MAX_IN_MEMORY_DATASET_SIZE to 250 MiB
* Rename MAX_IN_MEMORY_DATASET_SIZE to MAX_IN_MEMORY_DATASET_SIZE_IN_BYTES
* Add docstring to is_small_dataset
* Monkeypatch MAX_IN_MEMORY_DATASET_SIZE_IN_BYTE for test call only
* Add a note in the docs about this behavior
* Force rerun checks
* tmp
* Fix style
* Fix style
* Revert "tmp" (this reverts commit 8a1af5a)
* Fix docs
* Add test for load_dataset
* Add test for load_from_disk
* Implement estimate_dataset_size
* Use estimate_dataset_size
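The behavior described by the commit message can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from the commit: the names `is_small_dataset`, `keep_in_memory` and `MAX_IN_MEMORY_DATASET_SIZE_IN_BYTES` come from the message, while the `max_size` parameter, the helper `resolve_keep_in_memory`, and the strict `<` comparison are assumptions made for the example.

```python
from typing import Optional

# Default named in the commit message: 250 MiB, expressed in bytes.
# Setting it to None is assumed to disable the automatic in-memory behavior.
MAX_IN_MEMORY_DATASET_SIZE_IN_BYTES: Optional[int] = 250 * 2**20


def is_small_dataset(
    dataset_size: Optional[int],
    max_size: Optional[int] = MAX_IN_MEMORY_DATASET_SIZE_IN_BYTES,
) -> bool:
    """Return True only if both sizes are known and the dataset is under the limit.

    `max_size` is a hypothetical parameter added so the sketch can be tested
    without monkeypatching a module-level config value.
    """
    if dataset_size and max_size:
        # Assumption: the comparison is strict; the message does not say.
        return dataset_size < max_size
    # Unknown dataset size or no configured limit: do not treat as small.
    return False


def resolve_keep_in_memory(
    keep_in_memory: Optional[bool], dataset_size: Optional[int]
) -> bool:
    """Hypothetical helper: an explicit True/False wins, None falls back to size."""
    if keep_in_memory is not None:
        return keep_in_memory
    return is_small_dataset(dataset_size)
```

Under these assumptions, a caller that passes `keep_in_memory=False` always gets a memory-mapped dataset regardless of its size, which matches the "Fix tests by passing keep_in_memory=False" step in the message.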
1 parent 2a05294, commit d5cfc5a
Showing 10 changed files with 200 additions and 38 deletions.
Show benchmarks
PyArrow==1.0.0
Show updated benchmarks!
Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json