content/blog/2025-10-27-1761560082.md (1 addition, 1 deletion)
For example, reading constantly from the Global Memory is like driving between the f…
Therefore, the job of running a computation graph (like ONNX) efficiently on GPU(s) is like planning the logistics of a manufacturing company. You've got raw materials in the main warehouse that need to be transferred between cities, and artifacts to store, process, and transfer across different factories and machines. You need to make sure that:
- the production process follows the chart laid out in the computation graph
- every machine in each factory is being utilized optimally
- the time it takes to move things between cities/factories/machines is accounted for (see the toy sketch after this list)
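
As a toy illustration of that last point (not from the original post; the graph, compute times, and transfer times below are made up), the sketch estimates the earliest possible finish time of a small dependency graph when every intermediate artifact also has to be shipped between machines:

```python
# Hypothetical numbers: per-node compute time and per-edge transfer time.
from graphlib import TopologicalSorter

compute_time = {"load": 1.0, "conv": 4.0, "pool": 1.5, "matmul": 3.0, "output": 0.5}
deps = {"conv": {"load"}, "pool": {"load"}, "matmul": {"conv", "pool"}, "output": {"matmul"}}
transfer_time = {("load", "conv"): 2.0, ("load", "pool"): 2.0,
                 ("conv", "matmul"): 0.5, ("pool", "matmul"): 0.5,
                 ("matmul", "output"): 2.0}

finish = {}
for node in TopologicalSorter(deps).static_order():
    # A step can only start once every input has been produced *and* shipped over;
    # independent branches are assumed to run in parallel (critical-path estimate).
    ready = max((finish[d] + transfer_time[(d, node)] for d in deps.get(node, ())),
                default=0.0)
    finish[node] = ready + compute_time[node]

print(f"estimated end-to-end latency: {finish['output']:.1f} time units")
```

With these made-up numbers, doubling every transfer time raises the estimate noticeably, while doubling the pool time changes nothing because pool sits off the critical path: that is the "driving between cities" cost the analogy is warning about.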
And most importantly, you need to keep your overall goal in focus: either minimising the time it takes to produce the finished product (latency) or maximising the utilisation of all your machines (throughput).
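
To make those two goals concrete, here is a minimal measurement sketch, assuming the onnxruntime-gpu package is installed and a hypothetical model.onnx that takes a single 1×3×224×224 float input (both are placeholders, not from the post):

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical model file; falls back to CPU if no CUDA-capable GPU is available.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape

# Warm-up so one-time setup (kernel selection, first transfers) doesn't skew timings.
session.run(None, {input_name: x})

# Latency: how long one finished "product" takes end to end.
start = time.perf_counter()
session.run(None, {input_name: x})
latency_ms = (time.perf_counter() - start) * 1e3

# Throughput: how many products the factory completes per second.
n = 100
start = time.perf_counter()
for _ in range(n):
    session.run(None, {input_name: x})
throughput = n / (time.perf_counter() - start)

print(f"latency: {latency_ms:.2f} ms, throughput: {throughput:.1f} inferences/s")
```

In practice the two goals pull in different directions: batching requests or overlapping transfers with compute raises throughput, but each individual request may then wait longer, which is the latency cost.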