Caching and Parallelism Improvements to Data Streaming #7

PJEstrada · 2021-09-03T20:18:09Z

Main features:

Added a caching mechanism that stores 1 GB of fetched data to avoid duplicate requests.
Added a threaded approach to fetch the next file.
Proactively fetch the next N files to reduce time spent waiting on the network.
Stopped using a session for requests so that PyTorch DataLoaders can run with multiple workers.
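
A rough sketch of how these features could fit together, not the code in this PR: a size-bounded cache that evicts the least recently used file, plus a thread pool that fetches the next N files in the background. The class name `PrefetchingCache`, the `fetch_fn` callable, and the `max_bytes` and `lookahead` parameters are all hypothetical names chosen for illustration.

```python
import threading
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class PrefetchingCache:
    """Hypothetical helper: size-bounded LRU cache with look-ahead fetching."""

    def __init__(self, fetch_fn, urls, max_bytes=1024 ** 3, lookahead=2):
        # fetch_fn is an assumed callable(url) -> bytes. To stay usable from
        # several worker processes it should not rely on a shared session,
        # e.g. it could simply return requests.get(url).content.
        self.fetch_fn = fetch_fn
        self.urls = list(urls)
        self.max_bytes = max_bytes
        self.lookahead = lookahead
        self._cache = OrderedDict()  # url -> bytes, least recently used first
        self._size = 0               # total bytes currently cached
        self._pending = {}           # url -> Future for in-flight fetches
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max(1, lookahead + 1))

    def _fetch_and_store(self, url):
        try:
            data = self.fetch_fn(url)
        except Exception:
            with self._lock:
                self._pending.pop(url, None)  # allow a later retry
            raise
        with self._lock:
            self._pending.pop(url, None)
            self._cache[url] = data
            self._size += len(data)
            # Evict least recently used entries until we fit the budget.
            while self._size > self.max_bytes and len(self._cache) > 1:
                _, evicted = self._cache.popitem(last=False)
                self._size -= len(evicted)
        return data

    def _schedule(self, url):
        """Return (bytes, None) on a cache hit, else (None, future)."""
        with self._lock:
            if url in self._cache:
                self._cache.move_to_end(url)
                return self._cache[url], None
            future = self._pending.get(url)
            if future is None:
                future = self._pool.submit(self._fetch_and_store, url)
                self._pending[url] = future
            return None, future

    def get(self, index):
        # Request the current file first, then queue the next N files.
        data, future = self._schedule(self.urls[index])
        for nxt in self.urls[index + 1: index + 1 + self.lookahead]:
            self._schedule(nxt)
        return data if future is None else future.result()
```

With sequential access, `get(i)` usually finds file `i` already cached or in flight, so each file is requested only once while it remains in the cache. If an object like this lived inside a dataset used by a multi-worker DataLoader, it would probably need to be created lazily inside each worker process, since locks and thread pools cannot be pickled.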

PJEstrada added 4 commits September 3, 2021 09:49

feat: add caching and parallel fetching

9a699a5

cleanup

ea6192f

fix: support multiprocessing context

6bf6b00

cleanup

1011f43

PJEstrada self-assigned this Sep 3, 2021

PJEstrada requested a review from anthony-chaudhary September 3, 2021 20:18

PJEstrada added the ready_to_merge label Sep 3, 2021

PJEstrada merged commit 1a4a37f into main Sep 3, 2021

PJEstrada deleted the caching-and-parallel-fetch branch September 3, 2021 20:19
