Custom download method #2960

Open
@AmitMY

Is your feature request related to a problem? Please describe.
I have a dataset that requires a more complicated download method than usual (for example, one that adds custom request headers).

Describe the solution you'd like
I would like a method, dl_manager.download_custom, that takes:

  1. a URL or list of URLs
  2. a custom download method that receives:
    a. a single URL
    b. local file destination path

So I could implement custom downloads.

Full code I want to write:

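import urllib.request

def my_custom_download(url: str, local_path: str):
  opener = urllib.request.build_opener()
  # addheaders must be a list of (name, value) tuples, not a dict
  opener.addheaders = [...my headers...]
  # install_opener makes this opener the global default used by urlretrieve
  urllib.request.install_opener(opener)
  urllib.request.urlretrieve(url, local_path)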

URLs = ['url1', 'url2', 'url3']
dl_manager.download_custom(URLs, my_custom_download)
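
To make the requested behavior concrete, here is a minimal sketch of what such a method might do. It is a standalone function rather than the real download manager: the `download_dir` parameter, the hash-based file naming, and the skip-if-present caching are all assumptions of this sketch, not existing behavior of the library.

```python
import hashlib
import os
from typing import Callable, List, Union


def download_custom(
    url_or_urls: Union[str, List[str]],
    custom_download: Callable[[str, str], None],
    download_dir: str,
) -> Union[str, List[str]]:
    """Hypothetical stand-in for the requested dl_manager.download_custom."""
    os.makedirs(download_dir, exist_ok=True)

    def _download_one(url: str) -> str:
        # Derive a stable local file name from the URL (assumption of this sketch).
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        local_path = os.path.join(download_dir, name)
        # Only call the user's function if the file is not already present.
        if not os.path.exists(local_path):
            custom_download(url, local_path)
        return local_path

    # Mirror the input shape: a single URL returns a single path,
    # a list of URLs returns a list of paths in the same order.
    if isinstance(url_or_urls, str):
        return _download_one(url_or_urls)
    return [_download_one(url) for url in url_or_urls]
```

With this shape, the download manager decides where files live and the dataset author only supplies the transfer logic, which is the split the request is after.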

Describe alternatives you've considered
Doing the download without the download manager, but then I would have to hack around where to save the files. The dl_manager seems like the correct place to do this.

Additional context
This method exists in huggingface/datasets, and I think it is well motivated.
This is not just for headers, but also for other download methods (for example, downloading over scp).
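
As an illustration of the scp case, a custom download function with the same `(url, local_path)` signature could shell out to the `scp` command. This is only a sketch: the `scp://user@host:port/path` URL convention and the names `build_scp_command` and `scp_download` are assumptions made here, and it presumes an `scp` binary on the PATH with key-based authentication already set up.

```python
import subprocess
from typing import List
from urllib.parse import urlparse


def build_scp_command(url: str, local_path: str) -> List[str]:
    """Translate a hypothetical scp:// URL into an scp argument list."""
    parsed = urlparse(url)
    if parsed.scheme != "scp" or not parsed.hostname:
        raise ValueError(f"Expected an scp:// URL with a host, got: {url}")
    host = parsed.hostname
    remote = f"{parsed.username}@{host}" if parsed.username else host
    command = ["scp"]
    if parsed.port:
        # scp takes the port via an uppercase -P flag.
        command += ["-P", str(parsed.port)]
    command += [f"{remote}:{parsed.path}", local_path]
    return command


def scp_download(url: str, local_path: str) -> None:
    # check=True raises CalledProcessError if scp exits with a non-zero status.
    subprocess.run(build_scp_command(url, local_path), check=True)
```

Keeping the command construction separate from the `subprocess` call makes the argument handling easy to check without a remote host.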
