As a note to myself, a possible intuition for understanding GPU memory hierarchy (and the performance penalty for data transfer between various layers) is to think of it like a manufacturing logistics problem:
1. CPU (host) to GPU (device) is like travelling overnight between two cities. The CPU city is the "headquarters", and contains a mega-sized warehouse of parts (think football-field scale), also known as 'Host memory'.
2. Each GPU is like a different city, containing its own warehouse outside the city, also known as 'Global Memory'. This warehouse stockpiles whatever it needs from the headquarters city (CPU).
3. Each SM/Core/Tile is a factory located in different areas of the city. Each factory contains a small warehouse (shed) for stockpiling whatever inventory it needs, also known as 'Shared Memory'.
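The three levels above map directly onto the memory spaces in a CUDA program. A minimal sketch (the kernel, names like `scale_parts` and `TILE`, and sizes are my own illustrative choices, not from any particular codebase):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

#define TILE 256

// Each block (factory) stages its inventory in shared memory (the shed)
// before working on it, instead of re-fetching from global memory (the
// city warehouse) for every access.
__global__ void scale_parts(const float* in, float* out, int n) {
    __shared__ float shed[TILE];           // per-SM "shed" (shared memory)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        shed[threadIdx.x] = in[i];         // warehouse -> shed (global -> shared)
        __syncthreads();
        out[i] = shed[threadIdx.x] * 2.0f; // work happens next to the factory
    }
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h = (float*)malloc(bytes), *d_in, *d_out;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    // The expensive "overnight trip": headquarters (host memory) to the
    // GPU city's warehouse (global memory), over PCIe/NVLink.
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);
    scale_parts<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);
    cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);

    printf("h[0] = %f\n", h[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    free(h);
    return 0;
}
```

The analogy's cost intuition shows up in the API: each `cudaMemcpy` is an inter-city haul you want to do rarely and in bulk, while the `__shared__` staging inside the kernel is a short walk from the factory floor.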