
Commit f2a0e33

todos + theoretical maximum explanation
1 parent 4e67995 commit f2a0e33

File tree

1 file changed (+13, -3 lines)


src/blog/tanstack-start-ssr-performance-600-percent.md

Lines changed: 13 additions & 3 deletions
@@ -10,10 +10,12 @@ title: 'From 3000ms to 14ms: CPU profiling of TanStack Start SSR under heavy loa
# title: '343x Faster Latency p95: Profiling SSR Hot Paths in TanStack Start'
---

-## Executive summary
+## TL;DR

We improved TanStack Start's SSR performance dramatically. Under sustained load (100 concurrent connections, 30 seconds):

+<!-- these are Matteo's numbers, they don't look amazing (low throughput), maybe we should use our own numbers? we'll cite his in the conclusion anyway. -->
+
- **Throughput**: 477 req/s → 1,041 req/s (**2.2x**)
- **Average latency**: 3,171ms → 14ms (**231x faster**)
- **p95 latency**: 10,001ms (timeout) → 29ms (**343x faster**)
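
For context on the load profile quoted in this hunk (100 concurrent connections, 30 seconds), here is a minimal sketch of how such a run can be driven. The exact invocation isn't part of this diff, so the target URL and the choice of autocannon's programmatic API (rather than its CLI) are assumptions.

```js
// Assumed benchmark driver matching "100 concurrent connections, 30 seconds".
const autocannon = require('autocannon')

autocannon(
  { url: 'http://localhost:3000/', connections: 100, duration: 30 },
  (err, result) => {
    if (err) throw err
    // printResult renders the same throughput/latency table the CLI prints.
    console.log(autocannon.printResult(result))
  }
)
```
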
@@ -33,6 +35,9 @@ We did it with a repeatable process, not a single clever trick:

The changes span over 20 PRs; we highlight the highest-impact patterns below.

+
+<!-- the "What we optimized" section and "Methodology" feel a little redundant because "what we optimized" doesn't actually say what we optimized, just *how* we did it, which is part of the methodology. -->
+
## What we optimized (and what we did not)

This work started after `v1.154.4` and targets server-side rendering performance. The goal was to increase throughput and reduce server CPU time per request while keeping correctness guarantees.
@@ -82,6 +87,9 @@ The resulting flamegraph can be read with a tool like [Speedscope](https://www.s
- Fix one hotspot, re-run, and re-profile.
- Prefer changes that remove work in the steady state, not just shift it.

+
+<!-- what do we want to say with these flamegraphs? How to understand them? We're already showing flamegraphs for every *finding* below. I'm not really sure what to say here. -->
+
Placeholders you should replace with real screenshots:

- `<!-- FLAMEGRAPH: links-100 before -->`
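
On capturing the profiles that this hunk's context reads in Speedscope: the post's exact tooling isn't shown here, so the snippet below is a sketch of one standard option, Node's built-in inspector, which records a V8 CPU profile that Speedscope opens directly (`node --cpu-prof` is a lower-effort alternative that writes a `.cpuprofile` when the process exits). The 30-second window and output path are illustrative.

```js
// Record a CPU profile of the running server while load is applied, then write
// a .cpuprofile file that https://www.speedscope.app can open as a flamegraph.
const fs = require('node:fs')
const inspector = require('node:inspector')

const session = new inspector.Session()
session.connect()

session.post('Profiler.enable', () => {
  session.post('Profiler.start', () => {
    // Drive load against the server here (e.g. the autocannon sketch above).
    setTimeout(() => {
      session.post('Profiler.stop', (err, result) => {
        if (err) throw err
        fs.writeFileSync('./ssr-under-load.cpuprofile', JSON.stringify(result.profile))
      })
    }, 30_000)
  })
})
```
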
@@ -189,7 +197,7 @@ Taking the example of the `useRouterState` hook, we can see that most of the cli

### The mechanism

-Client code cares about bundle size. Server code cares about CPU time per request. Those constraints are different.
+Client code cares about bundle size. Server code cares about CPU time per request. Those constraints are different (this is a *general* rule, not a *universal* one).

If you can guard a branch with a **build-time constant** like `isServer`, you can:

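To make the changed paragraph concrete, here is a minimal sketch of the build-time-constant guard it describes. `IS_SERVER` stands in for the post's `isServer`; the subscription helper and the bundler `define` wiring are illustrative, not TanStack Start internals.

```js
// `IS_SERVER` is replaced with a literal at build time, e.g. via esbuild/Vite:
//   define: { IS_SERVER: 'true' }   // server bundle
//   define: { IS_SERVER: 'false' }  // client bundle
export function createStateSubscription(store, onChange) {
  if (IS_SERVER) {
    // SSR renders in a single pass, so per-request subscription bookkeeping is
    // pure overhead; returning a no-op removes that work from the server hot path.
    return () => {}
  }
  // In the client bundle the branch above becomes `if (false) { ... }`, so dead
  // code elimination drops it and only this path ships to the browser.
  store.listeners.add(onChange)
  return () => store.listeners.delete(onChange)
}
```

Each side then pays only for what it needs: the server skips client-only bookkeeping on every request, and the client bundle never carries the server branch.
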
@@ -292,7 +300,7 @@ To be clear: TanStack Start was not broken before these changes. Under normal tr

The following graphs show event-loop utilization[^elu] against throughput for each feature-focused endpoint, before and after the optimizations. Lower utilization at the same req/s means more headroom; higher req/s at the same utilization means more capacity.

-For reference, the machine on which these were measured reaches 100% event-loop utilization at 100k req/s on an empty node http server.
+For reference, the machine on which these were measured reaches 100% event-loop utilization at 100k req/s on an empty node http server[^empty-node-http-server].

#### 100 links per page

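For readers reproducing the event-loop-utilization axis of these graphs: how the post sampled ELU isn't shown in this diff, but Node exposes it through `perf_hooks`. A sketch with an assumed five-second sampling interval:

```js
// Log the fraction of each interval the event loop spent busy (0..1, shown as %).
const { performance } = require('node:perf_hooks')

let last = performance.eventLoopUtilization()
setInterval(() => {
  const now = performance.eventLoopUtilization()
  const delta = performance.eventLoopUtilization(now, last) // delta between samples
  last = now
  console.log(`ELU (last 5s): ${(delta.utilization * 100).toFixed(1)}%`)
}, 5000).unref()
```
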
@@ -329,3 +337,5 @@ There were many other improvements (client and server) not covered here. SSR per
[^dce]: Dead code elimination is a standard compiler optimization. See esbuild's documentation on [tree shaking](https://esbuild.github.io/api/#tree-shaking), Rollup's [tree-shaking guide](https://rollupjs.org/introduction/#tree-shaking) and Rich Harris's article on [dead code elimination](https://medium.com/@Rich_Harris/tree-shaking-versus-dead-code-elimination-d3765df85c80).

[^elu]: Event-loop utilization is the percentage of time the event loop is busy utilizing the CPU. See this [nodesource blog post](https://nodesource.com/blog/event-loop-utilization-nodejs) for more details.
+
+[^empty-node-http-server]: To get a reference for the values we were measuring, we ran a similar `autocannon` benchmark on the smallest possible node http server: `require('http').createServer((q,s)=>s.end()).listen(3000)`. This tells us the *theoretical* maximum throughput of the machine and test setup.
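
Spelled out, the baseline in the new footnote is just that one-liner; pairing it with the same load profile as the other benchmarks is an assumption on our part, but is presumably how a ceiling like 100k req/s is obtained on that machine.

```js
// The footnote's baseline server, expanded for readability: it ends every
// response immediately, so the measurement is dominated by Node's HTTP stack
// rather than any application work.
const http = require('http')

http.createServer((req, res) => res.end()).listen(3000)

// Then, against it (same assumed profile as above):
//   npx autocannon -c 100 -d 30 http://localhost:3000
```
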
