You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm loading a dataset constituted of 44 GB of compressed JSON files.
When loading the dataset with the JSON script, extracting the files create about 200 GB of uncompressed files before creating the 180GB of arrow cache tables
Having a simple way to delete the extracted files after usage (or even better, to stream extraction/delete) would be nice to avoid disk cluter.
I can maybe tackle this one in the JSON script unless you want a more general solution.