Hi! You can check #2252 for more information on this behavior. We plan to add an option to shard Arrow files on save to address this.
I have a very large dataset with 32M examples stored as an `.arrow` table using `save_to_disk`. When I use `load_from_disk` to load this dataset for the first time (for example, the first time after a reboot), it's really slow and takes more than 10 minutes to complete. Every subsequent call to `load_from_disk` is very fast and completes in a fraction of a second. Why does this happen? Is it due to some caching in memory? Can the cache be stored on disk instead?