Closed
Description
We are implementing asynchronous computing in DeepLearning.scala 2.0.
However, to maximize throughput, we need an on-device computing graph instead of CPU-driven asynchronous computing.
In DeepLearning.scala 3.0, we will implement an applicative-based computing graph, avoiding `flatMap` or `map`. We will keep a proper number of kernels in flight per on-device command queue, e.g. 3 kernels. Most on-CPU `Future`s will await command-queue availability, instead of awaiting results.
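As a rough illustration of why applicative composition helps here (all names below are hypothetical, not DeepLearning.scala's actual API): an applicative combinator such as `map2` exposes both operands before execution, so an on-device scheduler could enqueue every kernel up front, whereas a monadic `flatMap` hides the next node until the previous result reaches the CPU.

```scala
object GraphSketch {
  // Hypothetical applicative computing graph. With Map2, both sub-graphs
  // are known statically, so a device scheduler could enqueue their kernels
  // without waiting for intermediate results on the CPU.
  sealed trait Graph[A] { def run(): A }

  final case class Pure[A](a: A) extends Graph[A] {
    def run(): A = a
  }

  final case class Map2[A, B, C](ga: Graph[A], gb: Graph[B], f: (A, B) => C)
      extends Graph[C] {
    def run(): C = f(ga.run(), gb.run())
  }

  // The whole structure exists before anything executes; a flatMap-based
  // graph could not be inspected like this, because its shape would depend
  // on runtime values.
  val sum: Graph[Int] = Map2(Pure(1), Pure(2), (a: Int, b: Int) => a + b)
}
```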
- Count the load of the command queue
- Make tuples of `Buffer` and `Event`
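The throttling idea above can be sketched with a counting semaphore standing in for the command-queue load counter (a minimal sketch under that assumption; `CommandQueueThrottle` and `enqueue` are hypothetical names, not DeepLearning.scala's API):

```scala
import java.util.concurrent.Semaphore
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sketch: at most `maxInFlight` kernels (e.g. 3) may be
// enqueued on a device command queue at once. CPU-side Futures wait for a
// free slot on the queue (the semaphore), not for kernel results.
final class CommandQueueThrottle(maxInFlight: Int)(implicit ec: ExecutionContext) {
  private val permits = new Semaphore(maxInFlight)

  def enqueue[A](kernel: => A): Future[A] = Future {
    permits.acquire()          // await command-queue availability
    try kernel                 // stand-in for launching a device kernel
    finally permits.release()  // free the slot once the kernel completes
  }
}
```

In the real design, releasing a slot would presumably be driven by the device `Event` paired with each `Buffer`, rather than by CPU-side completion as in this sketch.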
Metadata
Assignees
Labels
No labels