
Commit 0527dd4

Clickin and claude committed
docs: Add Writer optimization benchmark results and analysis

Phase 1: Buffering strategy optimization (failed)
- String Array approach: -7.7% (unnecessary due to V8's existing optimizations)
- Uint8Array approach: -42.5% (encoding overhead)
- Conclusion: V8 string concatenation is already O(n); no optimization needed

Phase 2: Algorithmic optimizations (succeeded)
- Regex caching: +9-26%
- Attribute string batching: +36.5%
- Early entity check: +25.6%
- Average improvement: +31.7%

Documents added:
- analysis.md: theoretical analysis
- phase1-quick-benchmark.ts: initial validation
- phase1-extended-test.ts: scale test
- phase2-real-optimizations-benchmark.ts: benchmark of the actual optimizations
- final-report.md: detailed report on the Phase 1 failure
- real-optimizations-report.md: Phase 2 success report
- FINAL-ANALYSIS.md: final analysis and recommendations

Benchmark results:
- 12 test scenarios
- Performance improved in 91.7% of tests
- Best case: +63.8% (deeply nested documents)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 83a6d80 commit 0527dd4

File tree: 10 files changed, +2737 -0 lines changed

Lines changed: 385 additions & 0 deletions

@@ -0,0 +1,385 @@
# Writer Optimization Analysis

## Executive Summary

This document analyzes the current string buffering strategy in `StaxXmlWriterSync` and `StaxXmlWriter`, and proposes two optimization approaches: **String Array** and **Uint8Array** buffering.

**Current Problem**: Repeated string concatenation (`this.xmlString += chunk`) creates intermediate string objects, causing:

- O(n²) time complexity in the worst case
- High GC pressure (n temporary strings)
- Memory allocation overhead

**Proposed Solutions**:

1. **String Array**: Accumulate chunks in an array, join once at the end
2. **Uint8Array**: Use a binary buffer with in-place mutations

---

## 1. Current Implementation Analysis

### StaxXmlWriterSync (lines 494-497)

```typescript
private _write(chunk: string): void {
  if (this.state === WriterState.CLOSED || this.state === WriterState.ERROR) return;
  this.xmlString += chunk; // 🔴 Problem: creates a new string every time
}
```

**Issues**:

- **Immutability**: JavaScript strings are immutable, so `a + b` creates a new string
- **Frequent allocations**: `_write()` is called for every tag, attribute, text node, etc.
- **GC pressure**: Each intermediate string becomes garbage
- **Unpredictable performance**: V8's optimizations may not always apply

### Call Frequency Analysis

For a typical XML document with 1,000 elements:

- `writeStartElement()`: ~1,000 calls → triggers `_write("<element")`
- `writeAttribute()`: ~3,000 calls → triggers `_write(" attr=\"value\"")`
- `writeCharacters()`: ~1,000 calls → triggers `_write(text)`
- `writeEndElement()`: ~1,000 calls → triggers `_write("</element>")`

**Total `_write()` calls**: ~6,000 per 1,000 elements
**Total string concatenations**: ~6,000 intermediate strings created

For `medium-nested.xml` (27MB):

- Estimated elements: ~100,000
- Estimated `_write()` calls: ~600,000
- **600,000 intermediate string allocations** 🚨
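
To make the arithmetic concrete, the sketch below shows the kind of generation loop behind these numbers (roughly six `_write()` calls per element). Only the method names come from the writer API discussed in this document; the import path and exact signatures are assumptions.

```typescript
// Hypothetical import path; adjust to the package's actual entry point.
import { StaxXmlWriterSync } from '../src/index';

function buildDocument(writer: StaxXmlWriterSync, elementCount: number): void {
  for (let i = 0; i < elementCount; i++) {
    writer.writeStartElement('element');    // _write("<element")
    writer.writeAttribute('id', String(i)); // _write(' id="..."')
    writer.writeAttribute('type', 'demo');  // _write(' type="demo"')
    writer.writeAttribute('lang', 'en');    // _write(' lang="en"')
    writer.writeCharacters(`item ${i}`);    // _write(text)
    writer.writeEndElement();               // _write("</element>")
  }
}
```
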
---

## 2. Theoretical Complexity Analysis

### 2.1 Time Complexity

| Implementation | Best Case | Average Case | Worst Case | Final Assembly |
|----------------|-----------|--------------|------------|----------------|
| **String Concat** | O(n) | O(n log n) | O(n²) | - |
| **String Array** | O(n) | O(n) | O(n) | O(total_length) |
| **Uint8Array** | O(n) | O(n) | O(n) | O(total_length) |

**String Concat Details**:

- V8 may optimize short strings using rope data structures (delayed concatenation)
- For long strings (>256 chars), V8 performs an actual memory copy
- Repeated concatenation defeats this optimization and approaches O(n²)

**Why O(n²)?**

```
Iteration 1: "a" (1 byte)
Iteration 2: "a" + "b" = copy 1 + write 1 = 2 ops
Iteration 3: "ab" + "c" = copy 2 + write 1 = 3 ops
...
Iteration n: copy (n-1) + write 1 = n ops

Total: 1 + 2 + 3 + ... + n = n(n+1)/2 = O(n²)
```

### 2.2 Space Complexity

| Implementation | During Construction | Peak Memory | Final Memory |
|----------------|---------------------|-------------|--------------|
| **String Concat** | O(n × m) | O(n × m) | O(total_length) |
| **String Array** | O(n × m) | O(n × m) + O(total_length) | O(total_length) |
| **Uint8Array** | O(buffer_size) | O(buffer_size) | O(total_length) |

**Notes**:

- n = number of chunks
- m = average chunk size
- String Array needs extra array overhead (~64 bytes + 8 bytes per pointer)
- Uint8Array can pre-allocate its buffer, avoiding incremental growth

### 2.3 GC Complexity

| Implementation | Temporary Objects | GC Events (estimated) |
|----------------|-------------------|-----------------------|
| **String Concat** | O(n) | O(n / 1000)\* |
| **String Array** | O(1) | O(1) |
| **Uint8Array** | O(1) | O(1) |

\*V8 typically triggers a minor GC after roughly every ~1MB of allocations

**GC Impact for 600,000 writes** (medium-nested.xml):

- String Concat: ~600 minor GC events (assuming 1KB average chunks)
- String Array: ~1-2 minor GC events (only for the final join)
- Uint8Array: ~0-1 minor GC events (buffer allocated on the C++ heap)

---

## 3. V8 Internal Behavior Predictions

### 3.1 String Concatenation in V8

V8 has several string representations:

1. **SeqString** (Sequential String)
   - Contiguous memory block
   - Used for literal strings and short concatenations
   - Fast access: O(1)

2. **ConsString** (Concatenated String)
   - Lazy concatenation: stores two pointers instead of copying
   - Used for `string1 + string2` when beneficial
   - Flattened on access: the first read triggers the actual concatenation
   - Tree depth is limited to avoid deep nesting

3. **SlicedString**
   - Pointer to a parent string + offset + length
   - Used for `substring()` operations

**Our Case**: Repeated `xmlString += chunk`

- Initial concatenations: ConsString (efficient)
- As the tree deepens: V8 flattens to SeqString (expensive)
- Beyond ~256 chars: always flattens (costly)
- **Result**: Degrades to O(n²) for large documents

### 3.2 Array + Join Optimization

```typescript
chunks.join('')
```

V8's `join()` optimization:

1. First pass: calculate the total length (O(n))
2. Allocate a single string buffer (O(1))
3. Second pass: copy each chunk (O(total_length))
4. **Total**: O(n + total_length) - much better!
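
These steps are easy to sanity-check with a standalone micro-benchmark. The sketch below is illustrative only (chunk size and iteration count are arbitrary), and as the commit's own Phase 1 results later showed, V8 often keeps plain `+=` surprisingly competitive.

```typescript
// Compare repeated concatenation against push + join over identical chunks.
const CHUNK = '<element attr="value">text</element>';
const N = 200_000;

function viaConcat(): string {
  let out = '';
  for (let i = 0; i < N; i++) out += CHUNK;
  return out;
}

function viaArrayJoin(): string {
  const chunks: string[] = [];
  for (let i = 0; i < N; i++) chunks.push(CHUNK);
  return chunks.join('');
}

const variants: Array<[string, () => string]> = [
  ['concat', viaConcat],
  ['array+join', viaArrayJoin],
];

for (const [label, fn] of variants) {
  const start = performance.now();
  const result = fn();
  console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms, ${result.length} chars`);
}
```
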
### 3.3 Uint8Array External Memory

```typescript
new Uint8Array(256 * 1024)
```

- Allocated on the C++ heap (not the V8 heap)
- Not tracked by the GC scavenger (minor GC)
- Only tracked by major GC via a weak reference
- **Result**: Minimal GC pressure

Encoding overhead:

```typescript
encoder.encode(str)
```

- UTF-8 encoding: O(str.length)
- Native C++ implementation: very fast
- Typically 2-3x faster than equivalent string manipulation
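
If the per-chunk `encode()` allocation turns out to dominate, `TextEncoder.encodeInto()` writes directly into an existing buffer and is worth benchmarking as a variant. A minimal sketch, assuming the same pre-allocated buffer and position counter as the mechanism in §4.2 (buffer growth omitted):

```typescript
const encoder = new TextEncoder();
const buffer = new Uint8Array(256 * 1024);
let pos = 0;

function writeChunk(chunk: string): void {
  // encodeInto stops early if the destination is full, so check how much was read.
  const { read = 0, written = 0 } = encoder.encodeInto(chunk, buffer.subarray(pos));
  if (read < chunk.length) {
    throw new RangeError('Buffer full - grow the buffer before writing the rest');
  }
  pos += written;
}
```
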
---

## 4. Hypothesis and Expected Results

### 4.1 String Array Approach

**Hypothesis**:

- **Performance**: +30-50% faster
- **GC Pressure**: -70% fewer GC events
- **Memory**: +10-20% (array overhead)
- **Complexity**: Very low (simple change)

**Mechanism**:

```typescript
// Before (O(n²) worst case)
this.xmlString += chunk;

// After (O(n) always)
this.chunks.push(chunk); // O(1) amortized
// ... at end ...
return this.chunks.join(''); // O(total_length) once
```
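
The fragment above is not runnable on its own; here is a minimal self-contained sketch of the same idea (the class and method names are illustrative, not the writer's actual API):

```typescript
class StringArrayBuffer {
  private chunks: string[] = [];

  write(chunk: string): void {
    this.chunks.push(chunk); // O(1) amortized, no intermediate strings
  }

  toString(): string {
    return this.chunks.join(''); // single O(total_length) pass at the end
  }
}

const buf = new StringArrayBuffer();
buf.write('<root>');
buf.write('<item id="1"/>');
buf.write('</root>');
console.log(buf.toString()); // <root><item id="1"/></root>
```
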
**Trade-offs**:

- ✅ Simple to implement
- ✅ Minimal memory overhead
- ✅ Works with the existing string-based API
- ⚠️ The final `join()` still allocates one large string
- ⚠️ The array grows dynamically (some reallocation)

### 4.2 Uint8Array Approach

**Hypothesis**:

- **Performance**: +50-80% faster
- **GC Pressure**: -90% fewer GC events
- **Memory**: Predictable, controlled by buffer size
- **Complexity**: Medium (encoding/decoding overhead)

**Mechanism**:

```typescript
// Initialize
private buffer = new Uint8Array(256 * 1024); // 256KB
private encoder = new TextEncoder();
private decoder = new TextDecoder();
private pos = 0;

// Write
private _write(chunk: string): void {
  const bytes = this.encoder.encode(chunk);

  // Expand if needed
  if (this.pos + bytes.length > this.buffer.length) {
    const newSize = Math.max(this.buffer.length * 2, this.pos + bytes.length);
    const newBuffer = new Uint8Array(newSize);
    newBuffer.set(this.buffer.subarray(0, this.pos));
    this.buffer = newBuffer;
  }

  this.buffer.set(bytes, this.pos);
  this.pos += bytes.length;
}

// Get result
public getXmlString(): string {
  return this.decoder.decode(this.buffer.subarray(0, this.pos));
}
```

**Trade-offs**:

- ✅ Minimal GC pressure (external memory)
- ✅ Predictable memory usage
- ✅ In-place mutations (no copying)
- ⚠️ Encoding/decoding overhead
- ⚠️ More complex implementation
- ⚠️ Needs buffer expansion logic

### 4.3 Decision Matrix (Predicted)

| Criterion | Weight | String Concat | String Array | Uint8Array |
|-----------|--------|---------------|--------------|------------|
| Performance | 30% | 100 | 140 (+40%) | 170 (+70%) |
| GC Pressure | 30% | 100 | 170 (-70% GC) | 190 (-90% GC) |
| Memory Efficiency | 20% | 100 | 90 (+10% mem) | 110 (predictable) |
| Code Complexity | 10% | 100 | 95 (trivial) | 70 (moderate) |
| API Compatibility | 10% | 100 | 100 (same) | 100 (same) |
| **TOTAL** | 100% | **100** | **131** | **149** |

**Predicted Winner**: Uint8Array (if the encoding overhead is acceptable)
**Fallback**: String Array (if Uint8Array has unexpected issues)

---

## 5. Benchmark Strategy

### 5.1 Phase 2: Quick Validation

**Goal**: Fast go/no-go decision

**Test** (a timing harness sketch follows the snippet):

```typescript
for (let i = 0; i < 10000; i++) {
  writer._write('<element attr="value">text</element>');
}
```

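
A possible harness for this check; the `WriterFactory` type and the commented-out constructor calls are hypothetical stand-ins for the actual writer variants:

```typescript
type WriterFactory = () => { _write(chunk: string): void; getXmlString(): string };

function quickBench(label: string, makeWriter: WriterFactory, iterations = 10_000): number {
  const writer = makeWriter();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    writer._write('<element attr="value">text</element>');
  }
  writer.getXmlString(); // force any deferred work (join, decode)
  const elapsed = performance.now() - start;
  console.log(`${label}: ${elapsed.toFixed(2)} ms`);
  return elapsed;
}

// const baseline = quickBench('string concat', () => new StaxXmlWriterSync(/* ... */));
// const candidate = quickBench('string array', () => new StringArrayWriter(/* ... */));
// console.log(`improvement: ${(100 * (baseline - candidate) / baseline).toFixed(1)}%`);
```
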
**Decision**:

- ✅ Proceed: any approach shows a ≥ +10% improvement
- ❌ Stop: all approaches show < +10%

### 5.2 Phase 3: GC Analysis

**Goal**: Verify the GC-pressure-reduction hypothesis

**Metrics** (see the measurement sketch below):

- Minor GC count
- Major GC count
- Total GC time
- Heap usage delta

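
One way to collect these metrics in Node is a `PerformanceObserver` subscribed to `'gc'` entries. A sketch, assuming a Node version where GC entries expose `detail.kind` (older versions put `kind` on the entry itself):

```typescript
import { PerformanceObserver, constants } from 'node:perf_hooks';

async function measureGc(fn: () => void): Promise<{ minor: number; major: number; totalMs: number }> {
  const result = { minor: 0, major: 0, totalMs: 0 };
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      const kind = (entry as any).detail?.kind ?? (entry as any).kind;
      if (kind === constants.NODE_PERFORMANCE_GC_MINOR) result.minor++;
      if (kind === constants.NODE_PERFORMANCE_GC_MAJOR) result.major++;
      result.totalMs += entry.duration;
    }
  });
  observer.observe({ entryTypes: ['gc'] });

  const heapBefore = process.memoryUsage().heapUsed;
  fn();
  // Give the observer a tick to deliver buffered entries before disconnecting.
  await new Promise((resolve) => setImmediate(resolve));
  observer.disconnect();

  const heapDeltaMb = (process.memoryUsage().heapUsed - heapBefore) / 1024 / 1024;
  console.log(`heap delta: ${heapDeltaMb.toFixed(1)} MB`);
  return result;
}
```
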
**Decision**:

- ✅ Proceed: GC reduction ≥ 20% AND memory increase ≤ 30%
- ❌ Stop: GC reduction < 20% OR memory increase > 30%

### 5.3 Phase 4: Real-world Patterns

**Goal**: Ensure consistent improvement across XML patterns

**Patterns** (priority order):

1. medium-nested.xml (27MB, nested structure)
2. small-simple.xml (typical use case)
3. attribute-heavy.xml (many attributes)
4. text-heavy.xml (large text content)
5. mixed-content.xml (mixed patterns)

**Decision**:

- ✅ Accept: average improvement ≥ +15%, no pattern worse than -5%
- ⚠️ Conditional: average improvement of +10-15%
- ❌ Reject: average improvement < +10% OR any pattern regresses by more than 10%

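
A possible shape for the pattern runner: read each fixture and time a round-trip through the writer variant under test. The fixture directory and the `rewriteXml` placeholder are assumptions; the real benchmark would plug in the actual parse-and-rewrite pipeline.

```typescript
import { readFileSync } from 'node:fs';

// Placeholder for "parse the fixture and re-serialize it with the variant under test".
const rewriteXml: (xml: string) => string = (xml) => xml;

const patterns = [
  'medium-nested.xml',
  'small-simple.xml',
  'attribute-heavy.xml',
  'text-heavy.xml',
  'mixed-content.xml',
];

for (const file of patterns) {
  const xml = readFileSync(`fixtures/${file}`, 'utf8'); // fixture path is assumed
  const start = performance.now();
  rewriteXml(xml);
  console.log(`${file}: ${(performance.now() - start).toFixed(1)} ms`);
}
```
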
### 5.4 Phase 5: Statistical Validation

**Goal**: Prove statistical significance

**Method**: Welch's t-test (50 samples); see the sketch after the decision criteria

**Decision**:

- ✅ Accept: p < 0.05 AND Cohen's d > 0.5
- ⚠️ Conditional: p < 0.05 AND Cohen's d > 0.2
- ❌ Reject: p ≥ 0.05

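
The statistics themselves are straightforward to compute. The sketch below derives Welch's t, the Welch-Satterthwaite degrees of freedom, and Cohen's d from two timing samples; turning t and df into a p-value is left to a t-distribution CDF from a stats library.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((a, b) => a + (b - m) ** 2, 0) / (xs.length - 1);
}

function welchAndCohen(a: number[], b: number[]): { t: number; df: number; d: number } {
  const [ma, mb] = [mean(a), mean(b)];
  const [va, vb] = [sampleVariance(a), sampleVariance(b)];
  const [na, nb] = [a.length, b.length];

  const se2 = va / na + vb / nb;
  const t = (ma - mb) / Math.sqrt(se2);

  // Welch-Satterthwaite approximation of the degrees of freedom.
  const df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1));

  // Cohen's d using the pooled standard deviation.
  const pooledSd = Math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2));
  const d = (ma - mb) / pooledSd;

  return { t, df, d };
}
```
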
---

## 6. Risk Analysis

### 6.1 Potential Issues

**String Array**:

- Risk: `join()` may be slow for huge documents
- Mitigation: Benchmark with 100MB+ files
- Likelihood: Low (V8 optimizes `join` well)

**Uint8Array**:

- Risk: TextEncoder/TextDecoder overhead
- Mitigation: Measure encoding time separately
- Likelihood: Medium (encoding is CPU-intensive)

- Risk: Buffer reallocation cost
- Mitigation: Pre-allocate a larger buffer (256KB)
- Likelihood: Low (exponential growth minimizes reallocations)

### 6.2 Fallback Plan

If Uint8Array fails:

1. Fall back to String Array (simpler, and still better than the baseline)
2. Document the finding: encoding overhead too high
3. Future optimization: investigate SIMD-based encoding

If both fail:

1. Document that the current implementation is already optimal
2. V8's ConsString optimization may be sufficient
3. Focus optimization efforts elsewhere (parsing, not writing)

---

## 7. Success Criteria Summary

**Minimum Viable Optimization** (String Array):

- Performance: +20% on medium-nested.xml
- GC: -30% fewer GC events
- Memory: No more than +20% peak memory
- Statistically significant: p < 0.05

**Ideal Optimization** (Uint8Array):

- Performance: +50% on medium-nested.xml
- GC: -70% fewer GC events
- Memory: Predictable, controlled growth
- Statistically significant: p < 0.01, Cohen's d > 0.8

**Final Decision** will be based on:

1. Empirical benchmark results (not predictions)
2. Statistical validation
3. Real-world pattern consistency
4. Implementation complexity vs. benefit ratio

---

## Next Steps

1. ✅ Phase 0: Analysis complete
2. ⏭️ Phase 1: Implement variants
3. ⏭️ Phases 2-6: Execute the validation pipeline

**Estimated total time**: 8-10 hours over 2 days

---

*Document created: 2025-10-18*
*Analysis by: TypeScript Pro + Performance Engineer*
