html/data-locality.html
+7 -7 (7 additions & 7 deletions)
Original file line number
Diff line number
Diff line change
@@ -116,11 +116,11 @@ <h3><a href="#a-data-warehouse" name="a-data-warehouse">A data warehouse</a></h3
116
116
some more boxes that are next to it. He doesn’t know if you want those (and, given his work ethic, clearly doesn’t care); he just takes as many as he can fit on the pallet.</p>
117
117
<p>He loads the whole pallet and brings it to you. Disregarding concerns for workplace safety, he drives the forklift right in and drops the pallet in the corner of your office.</p>
118
118
<p>When you need a new box, now, the first thing you do is see if it’s already on the pallet in your office. If it is, great! It just takes you a second to grab it and you’re back to crunching numbers. If a pallet holds fifty boxes and you got lucky and <em>all</em> of the boxes you need happen to be on it, you can churn through fifty times more work than you could before.</p>
119
-
<p>But, if you need a box that’s <em>not</em> on the pallet, you’re back to square one. Since you can only fit one pallet in your office, your warehouse friend will have to take that one back and then bring you entirely new one.</p>
119
+
<p>But, if you need a box that’s <em>not</em> on the pallet, you’re back to square one. Since you can only fit one pallet in your office, your warehouse friend will have to take that one back and then bring you an entirely new one.</p>
120
120
<h3><a href="#a-pallet-for-your-cpu" name="a-pallet-for-your-cpu">A pallet for your CPU</a></h3>
121
-
<p>Strangely enough, this is similiar to how CPUs in modern computers work. In case it isn’t obvious, you play the role of the CPU. Your desk is the CPU’s registers, and the box of papers is the data you can fit in them. The warehouse is your machine’s RAM, and that annoying warehouse guy is the bus that pulls data from main memory into registers.</p>
121
+
<p>Strangely enough, this is similar to how CPUs in modern computers work. In case it isn’t obvious, you play the role of the CPU. Your desk is the CPU’s registers, and the box of papers is the data you can fit in them. The warehouse is your machine’s RAM, and that annoying warehouse guy is the bus that pulls data from main memory into registers.</p>
122
122
<p>If I were writing this chapter thirty years ago, the analogy would stop there. But as chips got faster and RAM, well, <em>didn’t</em>, hardware engineers started looking for solutions. What they came up with was <em>CPU caching</em>.</p>
123
-
<p>Modern computers have a <span name="caches">little chunk</span> of memory right inside the chip. It’s small because it has to fit in the chip. The CPU can pull data from this much faster than it can main memory in large part because it’s physically closer to the registers. The electrons have a shorter distance to travel.</p>
123
+
<p>Modern computers have a <span name="caches">little chunk</span> of memory right inside the chip. It’s small because it has to fit in the chip. The CPU can pull data from this much faster than it can from main memory, in large part because it’s physically closer to the registers. The electrons have a shorter distance to travel.</p>
124
124
<aside name="caches">
125
125
126
126
<p>Modern hardware actually has multiple levels of caching, which is what they mean when you hear "L1", "L2", "L3", etc. Each level is larger but slower than the previous. For this chapter, we won’t worry about the fact that memory is actually a <a href="http://en.wikipedia.org/wiki/Memory_hierarchy">hierarchy</a>, but it’s important to know.</p>
@@ -134,7 +134,7 @@ <h3><a href="#a-pallet-for-your-cpu" name="a-pallet-for-your-cpu">A pallet for y
134
134
<p>I glossed over (at least) one detail in the analogy. In your office, there was only room for one pallet, or one cache line. A real cache contains a number of cache lines. The details about how those work is out of scope here, but search for "cache associativity" to feed your brain.</p>
135
135
</aside>
136
136
137
-
<p>When a cache miss occurs, the CPU <em>stalls</em>: it can’t process the next instruction because needs data. It sits there, bored out of its mind for a few hundred cycles until the fetch completes. Our mission is to avoid that. Imagine you’re trying to optimize some performance critical piece of game code and it looks like this:</p>
137
+
<p>When a cache miss occurs, the CPU <em>stalls</em>: it can’t process the next instruction because it needs data. It sits there, bored out of its mind for a few hundred cycles until the fetch completes. Our mission is to avoid that. Imagine you’re trying to optimize some performance critical piece of game code and it looks like this:</p>
<p>It all boils down to something pretty simple: whenever the chip reads some memory, it gets a whole cache line. The more you can use stuff in that <span name="line">cache line, the faster you go</span>. So the goal then is to <em>organize your data structures so that the things you’re processing are next to each other in memory</em>.</p>
165
165
<aside name="line">
166
166
167
-
<p>There’s a key assumption here, though: one thread. If you are accessing nearby data on multiple threads, it’s faster to have it on <em>different</em> cache lines. If two threads try to use data on the same cache line, both cores have to do some costly sychronization of their caches.</p>
167
+
<p>There’s a key assumption here, though: one thread. If you are accessing nearby data on multiple threads, it’s faster to have it on <em>different</em> cache lines. If two threads try to use data on the same cache line, both cores have to do some costly synchronization of their caches.</p>
168
168
</aside>
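The multi-thread caveat in the aside above can be made concrete with a small sketch. It assumes a 64-byte cache line (common, but not guaranteed on every CPU), and the names `kCacheLineSize`, `PaddedCounter`, and `WorkerCounters` are hypothetical, not from the chapter. Aligning each per-thread value to the line size is one way to keep two threads from writing to the same cache line:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines. Check the target hardware before relying on it.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical per-thread counter. alignas rounds both the alignment and the
// size of the struct up to a full cache line, so each counter owns its line.
struct alignas(kCacheLineSize) PaddedCounter {
  long value = 0;
};

// Two counters that would otherwise sit side by side in a single line.
struct WorkerCounters {
  PaddedCounter a;  // written only by one thread
  PaddedCounter b;  // written only by another thread
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each counter should occupy exactly one assumed cache line");
```

The cost is wasted space: each `long` now takes 64 bytes, which works against locality in single-threaded code, so this only makes sense for data that different threads really do write concurrently.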
169
169
170
170
<p>In other words, if your code is crunching on <code>Thing</code> then <code>Another</code> then <code>Also</code>, you want them laid out in memory like this:</p>
@@ -176,7 +176,7 @@ <h2><a href="#when-to-use-it" name="when-to-use-it">When to Use It</a></h2>
176
176
<p>Like most optimizations, the first guideline for using it is <em>when you have a performance problem.</em> Don’t waste time applying this to some infrequently executed corner of your codebase. Optimizing code that doesn’t need it just makes your life harder since the result is almost always more complex and less flexible.</p>
177
177
<p>With this pattern specifically, you’ll also want to be sure your performance problems <em>are caused by cache misses</em>. If your code is slow for other reasons, this won’t help.</p>
178
178
<p>The cheap way to profile is to manually add a bit of instrumentation that checks how much time has elapsed between two points in the code, hopefully using a precise timer. To catch cache misses, you’ll want something a little more sophisticated. You really want to see how many cache misses are occurring and where.</p>
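As a rough sketch of that cheap manual instrumentation, the snippet below wraps `std::chrono::steady_clock` in a tiny timer. `ScopedTimer` and `sumTo` are hypothetical names used only for illustration; `sumTo` stands in for whatever code is being measured. Note that this only reports elapsed time and says nothing about cache misses:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: records a start time on construction and reports
// the wall-clock time elapsed since then.
class ScopedTimer {
public:
  ScopedTimer() : start_(std::chrono::steady_clock::now()) {}

  // Milliseconds elapsed since the timer was constructed.
  double elapsedMs() const {
    auto now = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(now - start_).count();
  }

private:
  std::chrono::steady_clock::time_point start_;
};

// Placeholder workload standing in for the code between the two points.
long sumTo(long n) {
  long total = 0;
  for (long i = 0; i < n; ++i) total += i;
  return total;
}
```

A typical use would be to construct a `ScopedTimer` just before the suspect code, call `elapsedMs()` just after it, and log the result over many frames, since a single measurement is usually too noisy to trust.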
179
-
<p>Fortunately, there are <span name="cachegrind">profilers</span> out that there report it. It’s worth spending the time to get one of these working and make sure you understand the (surprisingly complex) numbers it throws at you before you do major surgery on your data structures.</p>
179
+
<p>Fortunately, there are <span name="cachegrind">profilers</span> out there that report it. It’s worth spending the time to get one of these working and make sure you understand the (surprisingly complex) numbers it throws at you before you do major surgery on your data structures.</p>
180
180
<aside name="cachegrind">
181
181
182
182
<p>Unfortunately, most of those tools aren’t cheap. If you’re on a console dev team, you probably already have licenses for them.</p>
<p>Let’s do something better. Our first observation is that the only reason we follow a pointer to get to the game entity is so we can immediately follow <em>another</em> pointer to get to a component. <code>GameEntity</code> itself has no interesting state and no useful methods. The <em>components</em> are what the game loop cares about.</p>
318
-
<p>Instead of a giant constellation of game entities and components scattered across the inky darkess of address space, we’re going to get back down to Earth. We’ll have a big array for each type of component: a flat array of AI components, another for physics, and another for rendering.</p>
318
+
<p>Instead of a giant constellation of game entities and components scattered across the inky darkness of address space, we’re going to get back down to Earth. We’ll have a big array for each type of component: a flat array of AI components, another for physics, and another for rendering.</p>
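A minimal sketch of the "one flat array per component type" layout described in the paragraph above might look like this. The component structs and the `World` container are placeholders invented for illustration, and `std::vector` is an assumption; the chapter's own code may use different types and fixed-size arrays:

```cpp
#include <cassert>
#include <vector>

// Placeholder components; real ones would carry actual game state.
struct AIComponent      { int updates = 0; void update() { ++updates; } };
struct PhysicsComponent { int updates = 0; void update() { ++updates; } };
struct RenderComponent  { int frames  = 0; void render() { ++frames; } };

// Hypothetical container: each component type lives in its own contiguous
// array, so a pass over one type walks memory in order with no pointer chasing.
struct World {
  std::vector<AIComponent> ai;
  std::vector<PhysicsComponent> physics;
  std::vector<RenderComponent> render;

  // One tick of the game loop: process each array front to back.
  void update() {
    for (auto& c : ai) c.update();
    for (auto& c : physics) c.update();
    for (auto& c : render) c.render();
  }
};
```

Because each loop touches only one array, every cache line fetched is filled with components of the type currently being processed.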