Is your feature request related to a problem? Please describe.
I have a dataset that requires a more complicated download method than usual (for example, adding custom headers to the request).
Describe the solution you'd like
I would like a method, dl_manager.download_custom, that is given:
- a URL or a list of URLs
- a custom download function that receives:
a. a single URL
b. a local file destination path
This would let me implement custom downloads.
Full code I want to write:
    import urllib.request

    def my_custom_download(url: str, local_path: str):
        # Install an opener that sends the custom headers with each request.
        opener = urllib.request.build_opener()
        opener.addheaders = [...my headers...]  # list of (name, value) tuples
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(url, local_path)

    URLs = ['url1', 'url2', 'url3']
    dl_manager.download_custom(URLs, my_custom_download)
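To make the request concrete, here is a rough sketch of what such a method might do internally. It is not the actual implementation in this library or in huggingface/datasets; the standalone function form, the `cache_dir` parameter, and the hash-based file naming are all assumptions made for illustration.

```python
import hashlib
import os
from typing import Callable, List, Union


def download_custom(
    url_or_urls: Union[str, List[str]],
    custom_download: Callable[[str, str], None],
    cache_dir: str,  # hypothetical parameter: where downloaded files are stored
) -> Union[str, List[str]]:
    """Hypothetical sketch: fetch each URL with a user-supplied function."""

    def _download_one(url: str) -> str:
        # Derive a stable local file name from the URL (assumed naming scheme).
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        local_path = os.path.join(cache_dir, name)
        # Only call the user's function if the file is not already cached.
        if not os.path.exists(local_path):
            os.makedirs(cache_dir, exist_ok=True)
            custom_download(url, local_path)
        return local_path

    # Mirror the input shape: one path for one URL, a list for a list.
    if isinstance(url_or_urls, str):
        return _download_one(url_or_urls)
    return [_download_one(url) for url in url_or_urls]
```

The point of the design is that the manager stays in charge of where files are saved and whether they are already cached, while the callback only has to know how to fetch one URL to one path.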
Describe alternatives you've considered
Doing the download without the download manager, but then I would have to hack around where to save the files. The dl_manager seems like the correct place to do this.
Additional context
This method exists in huggingface/datasets, and I think it is well motivated.
This is not just for headers, but also for other download methods (for example, downloading over scp).