Open
Description
Is your feature request related to a problem? Please describe.
I have a large list of blobs I want to download using the Azure SDK. I can loop over the list and call the client's `download_blob` for each blob sequentially, but this is very slow.
I implemented a class derived from `azure.storage.blob.ContainerClient` that uses `ThreadPoolExecutor` to do the downloads in parallel, with a new method with this interface:
    def download_blobs_to_files(
        self,
        blob_filename_pairs: Iterable[Tuple[str, str]],
        concurrency_limit: int = 1000,
        verbose: bool = False,
    ) -> int:
        """Downloads a list of files from an Azure blob container.

        Args:
            blob_filename_pairs: Pairs of blob name and local file path.
            concurrency_limit: Maximum number of threads.
            verbose: Controls the verbosity of the function.
        """
It works, but it is a bit brittle, and it is not clear how to automatically choose the right number of threads (`concurrency_limit`). Ideally this would be a feature supported by the Azure SDK. It seems to me to be a frequent user need.
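
For reference, here is a minimal sketch of the thread-pool approach described above. It is not my actual implementation and not an SDK API. It is written as a standalone function rather than a `ContainerClient` subclass, and it assumes the returned `int` is the number of blobs downloaded successfully. The default of 16 threads is an arbitrary placeholder, and `_download_one` is a hypothetical helper name.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterable, Tuple


def download_blobs_to_files(
    container_client,
    blob_filename_pairs: Iterable[Tuple[str, str]],
    concurrency_limit: int = 16,  # arbitrary placeholder, not a tuned value
    verbose: bool = False,
) -> int:
    """Download (blob name, local path) pairs in parallel.

    Returns the number of successful downloads (an assumption made for
    this sketch). `container_client` is expected to behave like
    azure.storage.blob.ContainerClient: download_blob(name) returns an
    object with a readinto(stream) method.
    """

    def _download_one(blob_name: str, local_path: str) -> None:
        # Create the target directory if the path has one.
        parent = os.path.dirname(local_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        downloader = container_client.download_blob(blob_name)
        with open(local_path, "wb") as handle:
            # Stream into the file instead of holding the blob in memory.
            downloader.readinto(handle)

    downloaded = 0
    with ThreadPoolExecutor(max_workers=concurrency_limit) as pool:
        # Map each future back to its blob name for error reporting.
        futures = {
            pool.submit(_download_one, blob, path): blob
            for blob, path in blob_filename_pairs
        }
        for future in as_completed(futures):
            try:
                future.result()
                downloaded += 1
            except Exception as exc:  # one failure should not stop the rest
                if verbose:
                    print(f"Failed to download {futures[future]}: {exc}")
    return downloaded
```

Even in this small form, the caller still has to pick `concurrency_limit` by hand and decide what to do about failed downloads, which is the part I would like the SDK to handle.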