src/blog/tanstack-start-ssr-performance-600-percent.md
13 additions & 3 deletions
@@ -10,10 +10,12 @@ title: 'From 3000ms to 14ms: CPU profiling of TanStack Start SSR under heavy loa
# title: '343x Faster Latency p95: Profiling SSR Hot Paths in TanStack Start'
---

-## Executive summary
+## TL;DR

We improved TanStack Start's SSR performance dramatically. Under sustained load (100 concurrent connections, 30 seconds):

+<!-- these are matteo's numbers, they don't look amazing (low throughput), maybe we should use our own numbers? we'll cite his in the conclusion anyway. -->
@@ -33,6 +35,9 @@ We did it with a repeatable process, not a single clever trick:

The changes span over 20 PRs; we highlight the highest-impact patterns below.

+
+<!-- the "What we optimized" section and "Methodology" feel a little redundant because "what we optimized" doesn't actually say what we optimized, just *how* we did it, which is part of the methodology. -->
+
## What we optimized (and what we did not)

This work started after `v1.154.4` and targets server-side rendering performance. The goal was to increase throughput and reduce server CPU time per request while keeping correctness guarantees.
@@ -82,6 +87,9 @@ The resulting flamegraph can be read with a tool like [Speedscope](https://www.s
- Fix one hotspot, re-run, and re-profile.
- Prefer changes that remove work in the steady state, not just shift it.

+
+<!-- what do we want to say with these flamegraphs? How to understand them? We're already showing flamegraphs for every *finding* below. I'm not really sure what to say here. -->
+
Placeholders you should replace with real screenshots:

- `<!-- FLAMEGRAPH: links-100 before -->`
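
As a concrete starting point for that measure-fix-remeasure loop, here is a minimal sketch of capturing a CPU profile from inside a Node process and saving it as a `.cpuprofile` file that Speedscope can open. The busy-work function is a stand-in for serving SSR traffic, not the blog's actual benchmark; `node --cpu-prof <entry>` achieves the same from the command line.

```js
// Minimal sketch: record a V8 CPU profile in-process and write it to disk.
// Open the resulting .cpuprofile file in Speedscope (https://www.speedscope.app).
const inspector = require('node:inspector')
const fs = require('node:fs')

const session = new inspector.Session()
session.connect()

// Stand-in for "handle SSR requests under load".
function busyWork() {
  let total = 0
  for (let i = 0; i < 5e7; i++) total += Math.sqrt(i)
  return total
}

session.post('Profiler.enable', () => {
  session.post('Profiler.start', () => {
    busyWork()
    session.post('Profiler.stop', (err, { profile }) => {
      if (!err) fs.writeFileSync('ssr-hotpath.cpuprofile', JSON.stringify(profile))
      session.disconnect()
    })
  })
})
```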
@@ -189,7 +197,7 @@ Taking the example of the `useRouterState` hook, we can see that most of the cli

### The mechanism

-Client code cares about bundle size. Server code cares about CPU time per request. Those constraints are different.
+Client code cares about bundle size. Server code cares about CPU time per request. Those constraints are different (this is a *general* rule, not a *universal* one).

If you can guard a branch with a **build-time constant** like `isServer`, you can:
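
To make the `isServer` guard concrete, here is a minimal sketch of the pattern. The names are illustrative, not TanStack Start internals, and the esbuild invocations are one possible way to inject the constant (see the [^dce] footnote for how bundlers remove the dead branch).

```js
// Illustrative sketch, not TanStack Start's actual code. In a real build,
// `isServer` is a free identifier replaced with a literal per bundle, e.g.:
//   esbuild entry.js --bundle --define:isServer=true    # server bundle
//   esbuild entry.js --bundle --define:isServer=false   # client bundle
// The runtime check below is only a fallback so this file runs standalone.
const isServer = typeof document === 'undefined'

function readSnapshotOnce() {
  // Cheap, single-pass work: enough for one server render.
  return { status: 'snapshot' }
}

function subscribeToStore() {
  // Stand-in for the client-only path (subscriptions, event listeners, ...).
  // Once `isServer` is the literal `true`, this branch is unreachable, and
  // dead code elimination can drop it (and anything only it imports) from
  // the server bundle.
  return { status: 'live-subscription' }
}

function getRouterStateSketch() {
  return isServer ? readSnapshotOnce() : subscribeToStore()
}

console.log(getRouterStateSketch())
```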
@@ -292,7 +300,7 @@ To be clear: TanStack Start was not broken before these changes. Under normal tr

The following graphs show event-loop utilization[^elu] against throughput for each feature-focused endpoint, before and after the optimizations. Lower utilization at the same req/s means more headroom; higher req/s at the same utilization means more capacity.

-For reference, the machine on which these were measured reaches 100% event-loop utilization at 100k req/s on an empty node http server.
+For reference, the machine on which these were measured reaches 100% event-loop utilization at 100k req/s on an empty node http server[^empty-node-http-server].

#### 100 links per page
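
For context on the [^elu] axis of these graphs: event-loop utilization can be sampled directly from Node's `perf_hooks` API. A minimal sketch follows; the one-second sampling interval is an assumption, not the blog's setup.

```js
// Minimal sketch: log event-loop utilization once per second while the
// process serves traffic. 0 means the loop is idle, 1 means it is saturated.
const { performance } = require('node:perf_hooks')

let last = performance.eventLoopUtilization()

setInterval(() => {
  // Delta utilization since the previous sample.
  const delta = performance.eventLoopUtilization(last)
  console.log(`ELU: ${(delta.utilization * 100).toFixed(1)}%`)
  last = performance.eventLoopUtilization()
}, 1000)
```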
@@ -329,3 +337,5 @@ There were many other improvements (client and server) not covered here. SSR per
[^dce]: Dead code elimination is a standard compiler optimization. See esbuild's documentation on [tree shaking](https://esbuild.github.io/api/#tree-shaking), Rollup's [tree-shaking guide](https://rollupjs.org/introduction/#tree-shaking) and Rich Harris's article on [dead code elimination](https://medium.com/@Rich_Harris/tree-shaking-versus-dead-code-elimination-d3765df85c80).

[^elu]: Event-loop utilization is the percentage of time the event loop is busy utilizing the CPU. See this [nodesource blog post](https://nodesource.com/blog/event-loop-utilization-nodejs) for more details.
+
+[^empty-node-http-server]: To get a reference for the values we were measuring, we ran a similar `autocannon` benchmark on the smallest possible node http server: `require('http').createServer((q,s)=>s.end()).listen(3000)`. This tells us the *theoretical* maximum throughput of the machine and test setup.
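
A sketch of how that reference run could be reproduced programmatically; the port and the 100-connection / 30-second values mirror the benchmark described above, and `autocannon`'s CLI works just as well.

```js
// Minimal sketch: drive the smallest possible Node HTTP server with autocannon
// (npm install autocannon) using the same load shape as the benchmarks above:
// 100 concurrent connections for 30 seconds.
const http = require('node:http')
const autocannon = require('autocannon')

const server = http.createServer((q, s) => s.end()).listen(3000, () => {
  autocannon(
    { url: 'http://localhost:3000', connections: 100, duration: 30 },
    (err, result) => {
      if (err) throw err
      console.log(`average req/s: ${result.requests.average}`)
      console.log(`average latency (ms): ${result.latency.average}`)
      server.close()
    },
  )
})
```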