Commit 7df337c

Update blog
1 parent 7cff0c7 commit 7df337c

1 file changed

+1
-1
lines changed

content/blog/2025-10-27-1761560082.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ tags:
 - sdkit
 ---
 
-As a note to myself, a possible intuition for understanding GPU memory hierarchy (and the performance penalty for data transfer between various layers):
+As a note to myself, a possible intuition for understanding GPU memory hierarchy (and the performance penalty for data transfer between various layers) is to think of it like a manufacturing logistics problem:
 1. CPU (host) to GPU (device) is like travelling overnight between two cities. The CPU city is like the "headquarters", and contains a mega-sized warehouse of parts (think football field sizes), also known as 'Host memory'.
 2. Each GPU is like a different city, containing its own warehouse outside the city, also known as 'Global Memory'. This warehouse stockpiles whatever it needs from the headquarters city (CPU).
 3. Each SM/Core/Tile is a factory located in different areas of the city. Each factory contains a small warehouse (shed) for stockpiling whatever inventory it needs, also known as 'Shared Memory'.
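
The three tiers in the list above (host memory, global memory, shared memory) can be seen in one small CUDA program. This is only a sketch to make the analogy concrete, not something taken from the post: the kernel name `tileSum`, the array name `shed`, and the tile size `kTile` are made up for illustration, and the block-sum workload is an arbitrary choice. It assumes an NVIDIA GPU and the CUDA toolkit (`nvcc`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical tile size: threads per block and elements staged per block.
// Must be a power of two for the halving reduction below.
constexpr int kTile = 256;

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Each block (a "factory") copies one tile from global memory (the city
// warehouse) into shared memory (its shed), then sums the tile locally.
__global__ void tileSum(const float* in, float* out, int n) {
    __shared__ float shed[kTile];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    shed[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // global -> shared
    __syncthreads();

    // Tree reduction that touches only shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            shed[threadIdx.x] += shed[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        out[blockIdx.x] = shed[0];  // one result per block, back to global
    }
}

int main() {
    const int n = 1 << 20;
    const int blocks = (n + kTile - 1) / kTile;

    // Host memory: the "headquarters" warehouse.
    std::vector<float> host(n, 1.0f);
    std::vector<float> partial(blocks);

    // Global memory: the GPU city's own warehouse.
    float* dIn = nullptr;
    float* dOut = nullptr;
    CHECK(cudaMalloc(&dIn, n * sizeof(float)));
    CHECK(cudaMalloc(&dOut, blocks * sizeof(float)));

    // The "overnight trip" between cities: host -> device.
    CHECK(cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    tileSum<<<blocks, kTile>>>(dIn, dOut, n);
    CHECK(cudaGetLastError());

    // Return trip: device -> host (this copy also waits for the kernel).
    CHECK(cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost));

    double total = 0.0;
    for (float p : partial) total += p;
    printf("sum = %.0f (n = %d)\n", total, n);

    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    return total == static_cast<double>(n) ? 0 : 1;
}
```

The shape follows the analogy: one bulk shipment from host to global memory, one read per element from global into shared memory, and then all the repeated reads and writes of the reduction stay inside the shed.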
