On this - do you know of large-scale xbatcher examples in this space? E.g. will xarray overhead on top of other things for deep learning slow things down?
Large scale - I guess things with billions of pixels upwards, not simple examples that just look at a satellite scene or two. Cloud rate limiting/bandwidth can be 'slow network' in that sense.
Background
To keep GPU utilization high whilst training neural networks, the data loading pipeline needs to keep pace with the GPU. Assuming that data preprocessing mostly happens on the CPU, this usually involves some form of concurrency and/or parallelism.
However, Python 3.x (specifically CPython) has the Global Interpreter Lock (GIL), which keeps single-threaded execution simple and fast, but prevents CPU-bound work from running in parallel across threads, see https://realpython.com/python-gil and other sources for more information.
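As a concrete illustration (not from the original post), here is a minimal sketch of the usual workaround in a deep learning context, assuming a PyTorch training loop: the `DataLoader` moves preprocessing into separate worker processes, each with its own GIL, so batches can be prepared while the GPU is busy.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomChips(Dataset):
    """Stand-in dataset: pretend each item needs CPU-side preprocessing."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # CPU-bound work (decoding, normalisation, augmentation, ...) would live here
        x = torch.randn(3, 256, 256)
        y = torch.randint(0, 2, (1,))
        return x, y


if __name__ == "__main__":  # guard needed where worker processes are spawned
    loader = DataLoader(
        RandomChips(),
        batch_size=32,
        num_workers=4,           # separate worker processes, each with its own GIL
        pin_memory=True,         # page-locked host memory speeds up CPU -> GPU copies
        prefetch_factor=2,       # each worker keeps 2 batches ready ahead of the GPU
        persistent_workers=True,
    )

    for x, y in loader:
        pass  # the forward/backward pass on the GPU would go here
```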
CPU/RAM/IO-bound limitations
So to make parallelization possible, several methods are used, and which one to use depends on what your processing is bounded by (see the sketch after the Dask-ML note below):
- CPU-bound: use multiple processes (e.g. multiprocessing / concurrent.futures.ProcessPoolExecutor), since each process gets its own GIL
- RAM-bound: use chunked/out-of-core processing (e.g. dask), so only a piece of the data is in memory at any time
- I/O-bound: use multi-threading or async/await, since the GIL is released while waiting on disk or network
Or as this diagram from Dask-ML (https://ml.dask.org/#dimensions-of-scale) shows, scaling problems are usually framed along compute-bound (CPU) and memory-bound (RAM) dimensions.
Note that the I/O dimension isn't mentioned, though dask does have an advanced mode to support async/await operations (https://docs.dask.org/en/stable/deploying-python-advanced.html#start-many-in-one-event-loop).
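To make that mapping concrete, here is a minimal sketch using only the standard library (the fetch/decode helpers and URLs are placeholders, not from the original post): threads for the I/O-bound stage, processes for the CPU-bound stage. The RAM-bound case is what dask's chunking addresses.

```python
import concurrent.futures as cf
import urllib.request


def fetch(url: str) -> bytes:
    # I/O-bound: fine in threads, because CPython releases the GIL while
    # waiting on the network.
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def decode(blob: bytes) -> int:
    # CPU-bound stand-in: needs separate processes to run in parallel, since
    # the GIL lets only one thread execute Python bytecode at a time.
    return sum(blob)


if __name__ == "__main__":
    urls = ["https://example.com/"] * 8  # placeholder URLs

    # I/O-bound stage -> ThreadPoolExecutor
    with cf.ThreadPoolExecutor(max_workers=8) as pool:
        blobs = list(pool.map(fetch, urls))

    # CPU-bound stage -> ProcessPoolExecutor (one GIL per process)
    with cf.ProcessPoolExecutor(max_workers=4) as pool:
        checksums = list(pool.map(decode, blobs))

    print(checksums)
```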
Breaking speeds in cloud-native workflows
One could argue that 'cloud' processing virtually eliminates compute (CPU) and memory (RAM) limitations, given enough $$ resources. The main bottleneck that remains is thus in communication overhead with I/O operations. There are several aspects to this:
- network locality: compute should run in the same cloud region as the data it reads
- CPU-GPU transfer: data shuttled between host and device memory adds another hop
- file format: non cloud-optimized formats (e.g. plain HDF5) need many small, scattered reads, whereas cloud-optimized formats allow efficient ranged requests
Missing any one of the above (e.g. not working in the same cloud region, doing mixed CPU-GPU processing, reading from non cloud-optimized HDF5 or other files) can result in latency. Ideally, you would tackle these at the root, but not everyone has the privilege of being able to re-architect the cloud infrastructure or of working with the latest file formats.
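For the file-format aspect, here is a hedged sketch of what a cloud-optimized read pattern can look like with xarray and Zarr (the bucket path and variable name are placeholders; assumes xarray, zarr, dask and s3fs are installed):

```python
import xarray as xr

# Hypothetical cloud-optimised (Zarr) store; the bucket path is a placeholder.
# open_zarr is lazy: each chunk maps to a ranged read against object storage,
# instead of the many scattered reads a non cloud-optimised HDF5 file may need.
ds = xr.open_zarr(
    "s3://example-bucket/sentinel2.zarr",
    storage_options={"anon": True},  # passed through to fsspec/s3fs
)

# Slice lazily and only pull the subset you actually need over the network;
# running this in the same region as the bucket keeps that transfer fast.
subset = ds["reflectance"].isel(time=0).sel(x=slice(0, 10_240), y=slice(0, 10_240))
patch = subset.compute()  # data is transferred only at this point
```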
So what, then, is the solution to overcome latency?
Redesign for async?
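As one possible interpretation (a sketch, not the post's prescribed solution), 'async' at the data-fetching layer could look like the following, using asyncio with aiohttp and placeholder URLs: many requests are kept in flight on a single thread, so per-request latency is overlapped rather than paid serially.

```python
import asyncio

import aiohttp


async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
    # Many requests can be "in flight" at once on a single thread; the event
    # loop switches tasks whenever one is waiting on the network.
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.read()


async def fetch_all(urls: list[str]) -> list[bytes]:
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, url) for url in urls))


if __name__ == "__main__":
    # Placeholder URLs; in practice these might be chunk/tile URLs in object storage.
    urls = ["https://example.com/"] * 8
    blobs = asyncio.run(fetch_all(urls))
    print(sum(len(b) for b in blobs), "bytes downloaded")
```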
References: