Unify datasets cache path from references with regular PyTorch cache? #6727
Thanks for reporting @pmeier. Ideally we would like to move away from needing to pre-read the dataset and cache it. This is currently necessary due to the way the video clipping class works, but it causes issues with streamed datasets. @YosuaMichael is looking into fixing this.
@YosuaMichael if we end up not supporting caching in the future, feel free to close this issue.
@datumbox In the case of VideoClipping, we do cache the dataset, because we pre-compute the start and end of all clips before sampling. However, it seems this cache concept applies not just to video datasets but to datasets in general (classification too). Also, I am not sure yet whether we will get rid of the cache (for performance reasons) even if we change the clip sampler design, so I think this issue should stay open for now.
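To illustrate what "pre-compute the start and end of all clips" means, here is a minimal sketch of the kind of per-video metadata that gets cached. The function name and signature are illustrative, not the actual `VideoClips` implementation:

```python
def compute_clip_ranges(num_frames, clip_len, step):
    # Every possible (start, end) frame range of fixed-length clips in one
    # video, enumerated with a fixed stride. Doing this for a whole dataset
    # requires reading every video's frame count up front, which is why the
    # result is worth caching.
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, step)]
```

For a 10-frame video with 4-frame clips and a stride of 2, this yields the ranges `(0, 4)`, `(2, 6)`, `(4, 8)`, `(6, 10)`.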
This will more likely be:

```python
def set_home(root, asset="all"):
    # asset can be "all", "datasets", "models", "tutorials", etc.
    # this is placed in the main namespace, e.g. torchvision.set_home() or torchtext.set_home()
    # Note: using set_home(root=...) doesn't persist across Python executions
    pass


def get_home(asset):
    # Priority (highest = 0)
    # 0. whatever was set earlier in the program through `set_home(root=root, asset=asset)`
    # 1. asset-specific env variable, e.g. $TORCHTEXT_DATASETS_HOME
    # 2. domain-wide env variable + asset name, e.g. $TORCHTEXT_HOME / datasets
    # 3. default, which corresponds to torch.hub._get_torch_home() / DOMAIN_NAME / ASSET_NAME
    #    typically ~/.cache/torch/vision/datasets
    #              ^^^^^^^^^^^^^^
    #    This part is returned by _get_torch_home()
    #    and can be overridden with the $TORCH_HOME variable as well.
    pass
```

So perhaps we'll want to go with
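A runnable sketch of that priority scheme, assuming `vision` as the domain. The module-level registry `_HOMES` and the helper `_default_torch_home` are hypothetical names for illustration, not an actual torchvision API:

```python
import os
from pathlib import Path

_HOMES = {}          # hypothetical per-process registry filled by set_home()
_DOMAIN = "vision"   # e.g. torchvision


def set_home(root, asset="all"):
    # Register a cache root for the current process only (not persisted).
    _HOMES[asset] = Path(root)


def _default_torch_home():
    # Mirrors what torch.hub._get_torch_home() does, without importing torch:
    # $TORCH_HOME if set, otherwise $XDG_CACHE_HOME/torch or ~/.cache/torch.
    cache = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    return Path(os.environ.get("TORCH_HOME", os.path.join(cache, "torch")))


def get_home(asset):
    # 0. whatever was set earlier through set_home()
    if asset in _HOMES:
        return _HOMES[asset]
    if "all" in _HOMES:
        return _HOMES["all"] / asset
    # 1. asset-specific env variable, e.g. $TORCHVISION_DATASETS_HOME
    specific = os.environ.get(f"TORCH{_DOMAIN.upper()}_{asset.upper()}_HOME")
    if specific:
        return Path(specific)
    # 2. domain-wide env variable + asset name, e.g. $TORCHVISION_HOME/datasets
    domain_wide = os.environ.get(f"TORCH{_DOMAIN.upper()}_HOME")
    if domain_wide:
        return Path(domain_wide) / asset
    # 3. default: <torch home>/<domain>/<asset>,
    #    typically ~/.cache/torch/vision/datasets
    return _default_torch_home() / _DOMAIN / asset
```

With this scheme, `set_home("/data/cache")` makes `get_home("datasets")` resolve to `/data/cache/datasets` for the rest of the process, while the env variables keep working for anything not set programmatically.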
In the classification and video_classification references, we cache here:

vision/references/classification/train.py, line 108 in 6e203b4

vision/references/video_classification/train.py, line 124 in 6e203b4

However, this directory is not used by PyTorch core. Instead, ~/.cache/torch is used. For example, torch.hub caches in ~/.cache/torch/hub. The datasets v2 use the same root folder and will store the datasets by default in

vision/torchvision/_internally_replaced_utils.py, line 7 in 6e203b4

which expands to ~/.cache/torch/datasets/vision.

Maybe we can use ~/.cache/torch/cached_datasets or something similar as the cache path in the references?

cc @datumbox @vfdev-5
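For reference, here is a small sketch of how that default expansion works. The helper name is hypothetical; it mirrors the behavior of `torch.hub._get_torch_home()` (which honors `$TORCH_HOME` and falls back to `$XDG_CACHE_HOME/torch` or `~/.cache/torch`) without importing torch:

```python
import os


def default_datasets_home():
    # Hypothetical helper showing how ~/.cache/torch/datasets/vision is
    # derived: $TORCH_HOME wins if set, otherwise the XDG cache dir is used.
    cache = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    torch_home = os.environ.get("TORCH_HOME", os.path.join(cache, "torch"))
    return os.path.join(torch_home, "datasets", "vision")
```

So a references cache placed under the same root would automatically follow a user's `$TORCH_HOME` override, just like torch.hub and the datasets v2 do.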