
Commit ae77ebb

docs: Add ARCHITECTURE.md and improve README.md with better content separation (#12858)
1 parent 9b9249d commit ae77ebb

2 files changed

+406
-41
lines changed

ARCHITECTURE.md

Lines changed: 394 additions & 0 deletions
@@ -0,0 +1,394 @@
1+
# Architecture
2+
3+
## System Overview
4+
5+
Oxc (The Oxidation Compiler) is a collection of high-performance JavaScript and TypeScript tools written in Rust. The system is designed as a modular, composable set of compiler components that can be used independently or together to build complete toolchains for JavaScript/TypeScript development.
6+
7+
### Core Mission
8+
9+
- **Performance**: Deliver faster performance than existing JavaScript tools
10+
- **Correctness**: Maintain compatibility with JavaScript/TypeScript standards
11+
- **Modularity**: Enable users to compose tools according to their specific needs
12+
- **Developer Experience**: Provide excellent error messages and tooling integration
13+
14+
## High-Level Architecture
15+
16+
```
17+
┌─────────────────────────────────────────────────────────────────┐
18+
│ Applications │
19+
├─────────────────────────────────────────────────────────────────┤
20+
│ oxlint │ Language Server │ NAPI Bindings │ Future Tools │
21+
├─────────────────────────────────────────────────────────────────┤
22+
│ Core Libraries │
23+
├─────────────────────────────────────────────────────────────────┤
24+
│ Parser │ Semantic │ Linter │ Transformer │ Minifier │ Codegen │
25+
├─────────────────────────────────────────────────────────────────┤
26+
│ Foundation Libraries │
27+
├─────────────────────────────────────────────────────────────────┤
28+
│ AST │ Allocator │ Diagnostics │ Span │ Syntax │
29+
└─────────────────────────────────────────────────────────────────┘
30+
```
31+
32+
## Architecture Principles
33+
34+
### 1. Zero-Copy Architecture
35+
36+
The system is built around an arena allocator (`oxc_allocator`) that enables zero-copy operations throughout the compilation pipeline. All AST nodes are allocated in a single arena, eliminating the need for reference counting or garbage collection.
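
To make the arena idea concrete, here is a minimal sketch using only the standard library. It is an index-based stand-in, not how `oxc_allocator` is implemented: the names `Arena`, `NodeId`, and `Node` are hypothetical, and a bump allocator would hand out references rather than indices. What it does show is every node living in one buffer and being freed together, with no `Rc`/`Arc`.

```rust
/// Index of a node inside the arena; cheap to copy, no reference counting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(usize);

/// A toy AST node: children are referred to by id rather than by owning pointer.
#[derive(Debug)]
pub enum Node {
    Number(f64),
    Add(NodeId, NodeId),
}

/// All nodes live in one contiguous buffer and are freed together on drop.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Stores `node` in the arena and returns its id.
    pub fn alloc(&mut self, node: Node) -> NodeId {
        self.nodes.push(node);
        NodeId(self.nodes.len() - 1)
    }

    /// Looks up a node by id.
    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }

    /// Number of nodes allocated so far.
    pub fn len(&self) -> usize {
        self.nodes.len()
    }

    /// Evaluates the expression rooted at `id`.
    pub fn eval(&self, id: NodeId) -> f64 {
        match self.get(id) {
            Node::Number(n) => *n,
            Node::Add(left, right) => self.eval(*left) + self.eval(*right),
        }
    }
}

/// Builds `1 + 2` in a fresh arena and returns (node count, value).
pub fn arena_demo() -> (usize, f64) {
    let mut arena = Arena::default();
    let one = arena.alloc(Node::Number(1.0));
    let two = arena.alloc(Node::Number(2.0));
    let sum = arena.alloc(Node::Add(one, two));
    (arena.len(), arena.eval(sum))
}
```

Dropping the `Arena` releases all three nodes in one step, which is the property the paragraph above relies on.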
37+
38+
### 2. Visitor Pattern
39+
40+
AST traversal is implemented using the visitor pattern (`oxc_ast_visit`) with automatic visitor generation through procedural macros. This ensures type safety and performance while maintaining code clarity.
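
As an illustration of the pattern only, the sketch below hand-writes a tiny visitor over a toy `Expr` type. The trait and function names (`Visit`, `walk_expr`, `IdentCollector`) are assumptions for this example and are not taken from `oxc_ast_visit`, whose traits are generated rather than written by hand.

```rust
/// A toy expression tree standing in for the real AST.
pub enum Expr {
    Ident(String),
    Number(f64),
    Binary(Box<Expr>, Box<Expr>),
}

/// Visitor with default "walk" behaviour; implementors override only what they need.
pub trait Visit {
    fn visit_expr(&mut self, expr: &Expr) {
        walk_expr(self, expr);
    }
    fn visit_ident(&mut self, _name: &str) {}
    fn visit_number(&mut self, _value: f64) {}
}

/// Default traversal: dispatch on the node kind and recurse into children.
pub fn walk_expr<V: Visit + ?Sized>(visitor: &mut V, expr: &Expr) {
    match expr {
        Expr::Ident(name) => visitor.visit_ident(name),
        Expr::Number(value) => visitor.visit_number(*value),
        Expr::Binary(left, right) => {
            visitor.visit_expr(left);
            visitor.visit_expr(right);
        }
    }
}

/// Example visitor: collects every identifier name in traversal order.
#[derive(Default)]
pub struct IdentCollector {
    pub names: Vec<String>,
}

impl Visit for IdentCollector {
    fn visit_ident(&mut self, name: &str) {
        self.names.push(name.to_string());
    }
}

/// Collects the identifiers of `a + (1 + b)`.
pub fn collect_demo() -> Vec<String> {
    let expr = Expr::Binary(
        Box::new(Expr::Ident("a".to_string())),
        Box::new(Expr::Binary(
            Box::new(Expr::Number(1.0)),
            Box::new(Expr::Ident("b".to_string())),
        )),
    );
    let mut collector = IdentCollector::default();
    collector.visit_expr(&expr);
    collector.names
}
```

Because every `visit_*` method has a default body, a tool such as a lint rule only overrides the node kinds it cares about and inherits the traversal for the rest.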
41+
42+
### 3. Shared Infrastructure
43+
44+
Common functionality such as error reporting (`oxc_diagnostics`), source positions (`oxc_span`), and syntax definitions (`oxc_syntax`) is shared across all components to ensure consistency.
45+
46+
## Core Components
47+
48+
### Foundation Layer
49+
50+
#### oxc_allocator
51+
52+
- **Purpose**: Arena-based memory allocator for zero-copy operations
53+
- **Key Features**:
54+
  - Single allocation arena for entire compilation unit
55+
  - Eliminates need for Rc/Arc in hot paths
56+
  - Enables structural sharing of AST nodes
57+
- **Dependencies**: None (foundational)
58+
59+
#### oxc_span
60+
61+
- **Purpose**: Source position tracking and text manipulation
62+
- **Key Features**:
63+
  - Byte-based indexing for UTF-8 correctness
64+
  - Efficient span operations for source maps
65+
  - Integration with diagnostic reporting
66+
- **Dependencies**: None (foundational)
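
A minimal sketch of byte-based spans follows. The `Span` type here is modelled loosely on the description above; its fields and methods are assumptions for illustration and may not match the real `oxc_span` API.

```rust
/// Half-open byte range `[start, end)` into the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

impl Span {
    pub fn new(start: u32, end: u32) -> Self {
        Self { start, end }
    }

    /// Length of the span in bytes (not characters).
    pub fn size(self) -> u32 {
        self.end - self.start
    }

    /// Borrows the covered text from `source` without copying it.
    /// Panics if the offsets do not fall on UTF-8 character boundaries.
    pub fn source_text(self, source: &str) -> &str {
        &source[self.start as usize..self.end as usize]
    }

    /// Smallest span that covers both `self` and `other`.
    pub fn merge(self, other: Span) -> Span {
        Span::new(self.start.min(other.start), self.end.max(other.end))
    }
}
```

For the source `let café = 1;`, the identifier occupies bytes 4 to 9: four characters but five bytes, because `é` takes two bytes in UTF-8. Storing byte offsets means slicing never has to count characters.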
67+
68+
#### oxc_syntax
69+
70+
- **Purpose**: JavaScript/TypeScript language definitions
71+
- **Key Features**:
72+
  - Token definitions and keyword mappings
73+
  - Language feature flags and compatibility
74+
  - Shared syntax validation logic
75+
- **Dependencies**: oxc_span
76+
77+
#### oxc_diagnostics
78+
79+
- **Purpose**: Error reporting and diagnostic infrastructure
80+
- **Key Features**:
81+
  - Rich error messages with source context
82+
  - Multiple output formats (JSON, pretty-printed)
83+
  - Integration with language server protocol
84+
- **Dependencies**: oxc_span
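
The idea of an error message with source context can be sketched as follows. `render_diagnostic` is a hypothetical helper, not part of `oxc_diagnostics`, and the output layout is invented for this example; it assumes a single line of ASCII source so that byte offsets equal display columns.

```rust
/// Renders a one-line diagnostic with the offending byte range underlined.
/// Assumes `source` is a single ASCII line and `start < end <= source.len()`.
pub fn render_diagnostic(source: &str, start: usize, end: usize, message: &str) -> String {
    // Pad up to the start offset, then draw one caret per byte in the range.
    let underline = format!("{}{}", " ".repeat(start), "^".repeat(end - start));
    format!("error: {message}\n  | {source}\n  | {underline}")
}
```

Calling `render_diagnostic("let x = ;", 8, 9, "Expected expression")` produces three lines: the message, the source line, and a caret under the stray `;`.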
85+
86+
#### oxc_ast
87+
88+
- **Purpose**: Abstract Syntax Tree definitions and utilities
89+
- **Key Features**:
90+
  - Complete JavaScript/TypeScript AST coverage
91+
  - Generated visitor traits for type safety
92+
  - Serialization support for caching
93+
- **Dependencies**: oxc_allocator, oxc_span, oxc_syntax
94+
95+
##### AST Design Principles
96+
97+
The Oxc AST differs significantly from the [estree](https://github.com/estree/estree) AST specification by removing ambiguous nodes and introducing distinct types. While many existing JavaScript tools rely on estree as their AST specification, a notable drawback is its abundance of ambiguous nodes that often leads to confusion during development.
98+
99+
For example, instead of using a generic estree `Identifier`, the Oxc AST provides specific types such as:
100+
101+
- `BindingIdentifier` - for variable declarations and bindings
102+
- `IdentifierReference` - for variable references
103+
- `IdentifierName` - for property names and labels
104+
105+
This clear distinction greatly enhances the development experience by aligning more closely with the ECMAScript specification and providing better type safety.
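
A simplified sketch of why distinct identifier types help: the structs below reuse the three type names from the list above, but their single `name` field and the `resolves_to` helper are assumptions for illustration (the real Oxc types carry more information).

```rust
/// Introduces a name, e.g. `foo` in `const foo = 1`.
#[derive(Debug)]
pub struct BindingIdentifier {
    pub name: String,
}

/// Refers to a binding, e.g. `foo` in `foo.bar`.
#[derive(Debug)]
pub struct IdentifierReference {
    pub name: String,
}

/// A plain name that never refers to a binding, e.g. `bar` in `foo.bar`.
#[derive(Debug)]
pub struct IdentifierName {
    pub name: String,
}

/// Only references can be resolved against bindings. Passing an
/// `IdentifierName` here is rejected at compile time instead of at runtime.
pub fn resolves_to(reference: &IdentifierReference, bindings: &[BindingIdentifier]) -> bool {
    bindings.iter().any(|binding| binding.name == reference.name)
}

/// Models `const foo = 1; foo.bar;` and resolves the reference to `foo`.
pub fn identifier_demo() -> bool {
    let bindings = vec![BindingIdentifier { name: "foo".to_string() }];
    let reference = IdentifierReference { name: "foo".to_string() };
    // `bar` is a property name, so it is never looked up in any scope.
    let _property = IdentifierName { name: "bar".to_string() };
    resolves_to(&reference, &bindings)
}
```

With a single generic `Identifier` type, the same check would need a runtime flag or surrounding context to tell a property name from a variable reference.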
106+
107+
### Core Processing Layer
108+
109+
#### oxc_parser
110+
111+
- **Purpose**: JavaScript/TypeScript parsing
112+
- **Key Features**:
113+
  - Hand-written recursive descent parser
114+
  - Full ES2024+ and TypeScript support
115+
  - Preservation of comments and trivia
116+
- **Dependencies**: oxc_allocator, oxc_ast, oxc_diagnostics, oxc_span, oxc_syntax
117+
118+
#### oxc_semantic
119+
120+
- **Purpose**: Semantic analysis and symbol resolution
121+
- **Key Features**:
122+
  - Scope chain construction
123+
  - Symbol table generation
124+
  - Dead code detection
125+
- **Dependencies**: oxc_ast, oxc_cfg, oxc_diagnostics, oxc_span, oxc_syntax
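
Scope chains and symbol tables can be sketched with a few vectors and a map. The `Scoping`, `ScopeId`, and `SymbolId` names below are chosen for this example and are not a description of the data structures `oxc_semantic` actually uses.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ScopeId(usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SymbolId(usize);

/// One lexical scope: a link to its parent plus the names declared in it.
struct Scope {
    parent: Option<ScopeId>,
    bindings: HashMap<String, SymbolId>,
}

/// Scope tree and symbol table stored in flat vectors.
#[derive(Default)]
pub struct Scoping {
    scopes: Vec<Scope>,
    symbol_names: Vec<String>,
}

impl Scoping {
    /// Creates a scope nested inside `parent` (or a root scope for `None`).
    pub fn add_scope(&mut self, parent: Option<ScopeId>) -> ScopeId {
        self.scopes.push(Scope { parent, bindings: HashMap::new() });
        ScopeId(self.scopes.len() - 1)
    }

    /// Declares `name` in `scope` and records it in the symbol table.
    pub fn declare(&mut self, scope: ScopeId, name: &str) -> SymbolId {
        let id = SymbolId(self.symbol_names.len());
        self.symbol_names.push(name.to_string());
        self.scopes[scope.0].bindings.insert(name.to_string(), id);
        id
    }

    /// Walks the scope chain outwards until `name` is found.
    pub fn resolve(&self, mut scope: ScopeId, name: &str) -> Option<SymbolId> {
        loop {
            let current = &self.scopes[scope.0];
            if let Some(id) = current.bindings.get(name) {
                return Some(*id);
            }
            // Reaching the root without a match means the name is unresolved.
            scope = current.parent?;
        }
    }

    /// Name under which a symbol was declared.
    pub fn symbol_name(&self, id: SymbolId) -> &str {
        &self.symbol_names[id.0]
    }
}
```

An inner declaration shadows an outer one simply because `resolve` checks the innermost scope first.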
126+
127+
#### oxc_linter
128+
129+
- **Purpose**: ESLint-compatible linting engine
130+
- **Key Features**:
131+
  - 200+ built-in rules
132+
  - Plugin architecture for custom rules
133+
  - Automatic fixing for many rules
134+
  - Configuration compatibility with ESLint
135+
- **Dependencies**: oxc_ast, oxc_semantic, oxc_diagnostics, oxc_cfg
136+
137+
#### oxc_transformer
138+
139+
- **Purpose**: Code transformation and transpilation
140+
- **Key Features**:
141+
  - TypeScript to JavaScript transformation
142+
  - Modern JavaScript feature transpilation
143+
  - React JSX transformation
144+
  - Babel plugin compatibility layer
145+
- **Dependencies**: oxc_ast, oxc_semantic, oxc_allocator
146+
147+
#### oxc_minifier
148+
149+
- **Purpose**: Code size optimization
150+
- **Key Features**:
151+
  - Dead code elimination
152+
  - Constant folding and propagation
153+
  - Identifier mangling integration
154+
  - Statement and expression optimization
155+
- **Dependencies**: oxc_ast, oxc_semantic, oxc_mangler
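
Constant folding, one of the optimizations listed above, can be illustrated on a toy expression type. This is a sketch of the general technique, not the `oxc_minifier` implementation; `Expr`, `fold`, and the small constructor helpers are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Small constructors to keep the examples readable.
pub fn num(value: f64) -> Expr {
    Expr::Number(value)
}
pub fn ident(name: &str) -> Expr {
    Expr::Ident(name.to_string())
}
pub fn add(left: Expr, right: Expr) -> Expr {
    Expr::Add(Box::new(left), Box::new(right))
}
pub fn mul(left: Expr, right: Expr) -> Expr {
    Expr::Mul(Box::new(left), Box::new(right))
}

/// Folds bottom-up: children first, then the node itself if both sides are constants.
pub fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(left, right) => match (fold(*left), fold(*right)) {
            (Expr::Number(a), Expr::Number(b)) => Expr::Number(a + b),
            (left, right) => add(left, right),
        },
        Expr::Mul(left, right) => match (fold(*left), fold(*right)) {
            (Expr::Number(a), Expr::Number(b)) => Expr::Number(a * b),
            (left, right) => mul(left, right),
        },
        // Numbers and identifiers are already as simple as they can be.
        other => other,
    }
}
```

Here `fold(add(mul(num(2.0), num(3.0)), ident("x")))` yields the tree for `6 + x`: the constant subtree is collapsed while the part that depends on `x` is left alone.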
156+
157+
#### oxc_codegen
158+
159+
- **Purpose**: AST to source code generation
160+
- **Key Features**:
161+
  - Configurable output formatting
162+
  - Source map generation
163+
  - Comment preservation options
164+
  - Minified and pretty-printed output modes
165+
- **Dependencies**: oxc_ast, oxc_span
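
The two output modes can be sketched with a printer that walks a toy AST and appends to a string buffer. The `Codegen` struct and its `minify` flag below are assumptions for this example and do not mirror the `oxc_codegen` API; the sketch also ignores operator precedence and parentheses.

```rust
pub enum Expr {
    Ident(String),
    Number(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Prints an expression either compactly or with spaces around operators.
pub struct Codegen {
    minify: bool,
    output: String,
}

impl Codegen {
    pub fn new(minify: bool) -> Self {
        Self { minify, output: String::new() }
    }

    /// Consumes the printer and returns the generated source text.
    pub fn build(mut self, expr: &Expr) -> String {
        self.print_expr(expr);
        self.output
    }

    fn print_expr(&mut self, expr: &Expr) {
        match expr {
            Expr::Ident(name) => self.output.push_str(name),
            Expr::Number(value) => self.output.push_str(&value.to_string()),
            Expr::Add(left, right) => {
                // The only difference between the modes is optional whitespace.
                let operator = if self.minify { "+" } else { " + " };
                self.print_expr(left);
                self.output.push_str(operator);
                self.print_expr(right);
            }
        }
    }
}

/// Builds the AST for `a + 1` and prints it in the requested mode.
pub fn codegen_demo(minify: bool) -> String {
    let expr = Expr::Add(Box::new(Expr::Ident("a".to_string())), Box::new(Expr::Number(1)));
    Codegen::new(minify).build(&expr)
}
```

`codegen_demo(true)` returns `a+1` and `codegen_demo(false)` returns `a + 1`, both from the same tree.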
166+
167+
### Application Layer
168+
169+
#### oxlint (apps/oxlint)
170+
171+
- **Purpose**: Command-line linter application
172+
- **Key Features**:
173+
  - File discovery and parallel processing
174+
  - Configuration file support
175+
  - Multiple output formats
176+
  - Integration with CI/CD systems
177+
- **Dependencies**: oxc_linter, oxc_parser, oxc_semantic
178+
179+
#### Language Server (oxc_language_server)
180+
181+
- **Purpose**: LSP implementation for editor integration
182+
- **Key Features**:
183+
  - Real-time diagnostics
184+
  - Go-to-definition and references
185+
  - Symbol search and completion
186+
- **Dependencies**: All core components
187+
188+
#### NAPI Bindings (napi/*)
189+
190+
- **Purpose**: Node.js integration layer
191+
- **Key Features**:
192+
  - Parser bindings for JavaScript tooling
193+
  - Linter integration for build tools
194+
  - Transform pipeline for bundlers
195+
  - Async processing support
196+
- **Dependencies**: Core components + Node.js FFI
197+
198+
## Data Flow
199+
200+
### Compilation Pipeline
201+
202+
1. **Input**: Source text + configuration
203+
2. **Lexing/Parsing**: `oxc_parser` → AST + comments
204+
3. **Semantic Analysis**: `oxc_semantic` → Symbol table + scope info
205+
4. **Processing**: Tool-specific analysis (linting, transformation, etc.)
206+
5. **Output**: Results (diagnostics, transformed code, etc.)
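
The stages above can be sketched as a chain of plain functions, each consuming the previous stage's output. Everything here is a toy stand-in: `parse`, `analyze`, `lint`, and the `Program`/`Semantic` types are hypothetical, the "parser" only splits on `;`, and the "rule" uses naive substring matching.

```rust
/// Stage 2 output: a stand-in for the AST, borrowing from the source text.
pub struct Program<'a> {
    pub statements: Vec<&'a str>,
}

/// Stage 3 output: a stand-in for the symbol table.
pub struct Semantic<'a> {
    pub declared: Vec<&'a str>,
}

/// "Parsing": split the source into trimmed, non-empty statements.
pub fn parse(source: &str) -> Program<'_> {
    let statements: Vec<&str> =
        source.split(';').map(str::trim).filter(|s| !s.is_empty()).collect();
    Program { statements }
}

/// "Semantic analysis": collect the names introduced by `let` statements.
pub fn analyze<'a>(program: &Program<'a>) -> Semantic<'a> {
    let declared: Vec<&'a str> = program
        .statements
        .iter()
        .copied()
        .filter_map(|stmt| stmt.strip_prefix("let "))
        .filter_map(|rest| rest.split('=').next())
        .map(str::trim)
        .collect();
    Semantic { declared }
}

/// "Processing": a toy rule reporting names mentioned only by their own declaration.
pub fn lint(program: &Program<'_>, semantic: &Semantic<'_>) -> Vec<String> {
    semantic
        .declared
        .iter()
        .copied()
        .filter(|name| {
            let mentions = program.statements.iter().filter(|stmt| stmt.contains(*name)).count();
            mentions < 2
        })
        .map(|name| format!("'{name}' is declared but never used"))
        .collect()
}

/// Runs the stages in order: source text in, diagnostics out.
pub fn run(source: &str) -> Vec<String> {
    let program = parse(source);
    let semantic = analyze(&program);
    lint(&program, &semantic)
}
```

Each stage only reads what the earlier stages produced, which is what lets different tools (linter, transformer, minifier) share the first three steps and differ only in step 4.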
207+
208+
### Memory Management Flow
209+
210+
```
211+
Source Text → Arena Allocator → AST Nodes → Visitors → Results
212+
     ↓               ↓              ↓           ↓         ↓
213+
   UTF-8           Arena        Borrowed    Zero-copy   Owned
214+
  String          Memory       References  Processing  Output
215+
```
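
The "borrowed references" step in the flow above can be illustrated with tokens that point into the source buffer instead of owning copies of it. This is a sketch of the general zero-copy idea in plain Rust; `Token`, `tokenize`, and `borrows_from` are hypothetical names, and splitting on single spaces is a deliberate simplification.

```rust
/// A token that borrows its text from the source instead of copying it.
#[derive(Debug, PartialEq)]
pub struct Token<'a> {
    pub text: &'a str,
    /// Byte offset of the token within the source.
    pub start: usize,
}

/// Splits on single spaces, recording each token's byte offset.
pub fn tokenize(source: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut offset = 0;
    for piece in source.split(' ') {
        if !piece.is_empty() {
            tokens.push(Token { text: piece, start: offset });
        }
        // Advance past this piece and the space that followed it.
        offset += piece.len() + 1;
    }
    tokens
}

/// True when `token.text` points into `source`'s own buffer, i.e. no copy was made.
pub fn borrows_from(source: &str, token: &Token<'_>) -> bool {
    let source_start = source.as_ptr() as usize;
    let token_start = token.text.as_ptr() as usize;
    token_start >= source_start && token_start + token.text.len() <= source_start + source.len()
}
```

The lifetime `'a` on `Token` is what ties each token to the source text: the compiler rejects any attempt to keep a token alive after the source has been dropped.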
216+
217+
## Quality Attributes
218+
219+
### Performance
220+
221+
- **Target**: 10-100x faster than comparable tools
222+
- **Strategies**:
223+
  - Arena allocation for memory efficiency
224+
  - Zero-copy data structures
225+
  - Parallel processing where possible
226+
  - Minimal allocations in hot paths
227+
228+
#### Parser Performance Implementation
229+
230+
- AST is allocated in a memory arena ([bumpalo](https://crates.io/crates/bumpalo)) for fast AST memory allocation and deallocation
231+
- Short strings are inlined by [CompactString](https://crates.io/crates/compact_str)
232+
- Apart from the two above, the parser performs no other heap allocations
233+
- Scope binding, symbol resolution, and some syntax errors are not handled in the parser; they are delegated to the semantic analyzer
234+
235+
#### Linter Performance Implementation
236+
237+
- Oxc parser is used for optimal performance
238+
- Visiting the AST is fast because nodes sit close together in the memory arena, so traversal approximates a linear memory scan
239+
- Files are linted in parallel across multiple threads, so throughput scales with the number of available CPU cores
240+
- Every single lint rule is tuned for performance
241+
242+
### Correctness
243+
244+
- **Target**: 100% compatibility with language standards
245+
- **Strategies**:
246+
  - Comprehensive test suites
247+
  - Real-world codebase testing
248+
  - Conformance testing against official specs
249+
  - Conservative error handling
250+
251+
### Maintainability
252+
253+
- **Target**: Clear, reviewable, extensible codebase
254+
- **Strategies**:
255+
  - Strong type system usage
256+
  - Procedural macro code generation
257+
  - Clear separation of concerns
258+
  - Comprehensive documentation
259+
260+
### Usability
261+
262+
- **Target**: Drop-in replacement for existing tools
263+
- **Strategies**:
264+
  - Configuration compatibility
265+
  - Familiar CLI interfaces
266+
  - Rich error messages
267+
  - Editor integration
268+
269+
## Technical Constraints
270+
271+
### Language Choice
272+
273+
- **Rust**: Chosen for memory safety, performance, and zero-cost abstractions
274+
- **MSRV**: N-2 policy (the minimum supported Rust version trails the latest stable release by two versions) for stability
275+
276+
### Memory Model
277+
278+
- **Arena Allocation**: Single arena per compilation unit
279+
- **Lifetime Management**: Explicit lifetimes tied to arena
280+
- **No Garbage Collection**: Memory is released deterministically when the arena is dropped, giving predictable performance
281+
282+
### Threading Model
283+
284+
- **File-level Parallelism**: Multiple files processed in parallel
285+
- **Single-threaded Pipeline**: Each file processed by single thread
286+
- **Shared State**: Minimal shared state to avoid synchronization overhead
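
The threading model above (parallel across files, single-threaded within a file) can be sketched with scoped threads from the standard library. This is an assumption-laden illustration: `lint_file` and `lint_all` are hypothetical, the "rule" just counts `debugger` lines, and a real implementation would more likely use a thread pool than one thread per file.

```rust
use std::thread;

/// Toy per-file "lint": counts lines containing `debugger`.
/// Each file is processed start to finish by a single thread.
fn lint_file(source: &str) -> usize {
    source.lines().filter(|line| line.contains("debugger")).count()
}

/// Lints every `(path, source)` pair on its own scoped thread.
/// Results are returned in input order; no state is shared between threads.
pub fn lint_all(files: &[(&str, &str)]) -> Vec<(String, usize)> {
    thread::scope(|scope| {
        // Spawn all workers first so the files are processed concurrently.
        let handles: Vec<_> = files
            .iter()
            .map(|&(path, source)| scope.spawn(move || (path.to_string(), lint_file(source))))
            .collect();
        // Then collect the per-file results in the original order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("lint thread panicked"))
            .collect()
    })
}
```

Because each worker only reads its own file and returns an owned result, the threads need no locks or other synchronization beyond the final `join`.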
287+
288+
### Compatibility Requirements
289+
290+
- **JavaScript**: ES2024+ compatibility
291+
- **TypeScript**: Latest TypeScript syntax support
292+
- **Node.js**: LTS versions through NAPI bindings
293+
- **Editors**: LSP compatibility for all major editors
294+
295+
## Design Decisions
296+
297+
### Arena Allocator Choice
298+
299+
**Decision**: Use custom arena allocator instead of Rc/Arc
300+
**Rationale**:
301+
302+
- Eliminates reference counting overhead
303+
- Enables zero-copy string operations
304+
- Simplifies memory management
305+
- Improves cache locality
306+
307+
**Trade-offs**:
308+
309+
- ✅ 10-50% performance improvement
310+
- ✅ Simplified ownership model
311+
- ❌ Requires lifetime management
312+
- ❌ Less flexible memory patterns
313+
314+
### Hand-written Parser
315+
316+
**Decision**: Implement recursive descent parser instead of parser generator
317+
**Rationale**:
318+
319+
- Easier debugging and maintenance
320+
- More efficient code than a parser generator typically emits
321+
- Faster compilation times
322+
323+
**Trade-offs**:
324+
325+
- ✅ Better performance and error messages
326+
- ✅ More maintainable code
327+
- ❌ More manual implementation work
328+
- ❌ Higher risk of parser bugs
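
For readers unfamiliar with the technique, here is a minimal recursive descent parser for a tiny arithmetic grammar, with one function per grammar rule. It evaluates while parsing instead of building an AST, and it has nothing to do with the structure of `oxc_parser` itself; the `Parser` type and `evaluate` function are hypothetical.

```rust
/// Recursive descent parser for `+`, `*`, parentheses, and unsigned integers.
pub struct Parser<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(source: &'a str) -> Self {
        Self { bytes: source.as_bytes(), pos: 0 }
    }

    /// Skips spaces and returns the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.bytes.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
        self.bytes.get(self.pos).copied()
    }

    /// expr := term ('+' term)*
    pub fn parse_expr(&mut self) -> Result<i64, String> {
        let mut value = self.parse_term()?;
        while self.peek() == Some(b'+') {
            self.pos += 1;
            value += self.parse_term()?;
        }
        Ok(value)
    }

    /// term := factor ('*' factor)*
    fn parse_term(&mut self) -> Result<i64, String> {
        let mut value = self.parse_factor()?;
        while self.peek() == Some(b'*') {
            self.pos += 1;
            value *= self.parse_factor()?;
        }
        Ok(value)
    }

    /// factor := NUMBER | '(' expr ')'
    fn parse_factor(&mut self) -> Result<i64, String> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let value = self.parse_expr()?;
                if self.peek() != Some(b')') {
                    return Err(format!("expected ')' at byte {}", self.pos));
                }
                self.pos += 1;
                Ok(value)
            }
            Some(c) if c.is_ascii_digit() => {
                let mut value = 0i64;
                while let Some(d) = self.bytes.get(self.pos).copied().filter(u8::is_ascii_digit) {
                    value = value * 10 + i64::from(d - b'0');
                    self.pos += 1;
                }
                Ok(value)
            }
            _ => Err(format!("expected expression at byte {}", self.pos)),
        }
    }
}

/// Parses and evaluates `source`, rejecting trailing input.
pub fn evaluate(source: &str) -> Result<i64, String> {
    let mut parser = Parser::new(source);
    let value = parser.parse_expr()?;
    match parser.peek() {
        None => Ok(value),
        Some(_) => Err(format!("unexpected input at byte {}", parser.pos)),
    }
}
```

Operator precedence falls out of the call structure (`parse_expr` calls `parse_term`, which calls `parse_factor`), and each error site knows exactly what it expected, which is where the "better error messages" trade-off listed above comes from.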
329+
330+
### Visitor Pattern
331+
332+
**Decision**: Use visitor pattern with procedural macros
333+
**Rationale**:
334+
335+
- Type-safe AST traversal
336+
- Automatic visitor generation
337+
- Consistent patterns across tools
338+
- Efficient dispatch
339+
340+
**Trade-offs**:
341+
342+
- ✅ Type safety and performance
343+
- ✅ Reduced boilerplate code
344+
- ❌ Compile-time complexity
345+
- ❌ Learning curve for contributors
346+
347+
## Future Considerations
348+
349+
### Planned Extensions
350+
351+
- **Formatter**: Complete code formatting tool
352+
- **Bundler**: Integration with bundling workflows
353+
- **Type Checker**: Full TypeScript type checking
354+
- **Plugin System**: User-defined transformations
355+
356+
### Scalability Concerns
357+
358+
- **Large Codebases**: Processing optimization improvements
359+
- **Memory Usage**: Streaming processing for huge files
360+
- **Parallel Processing**: Fine-grained parallelization
361+
362+
### Technology Evolution
363+
364+
- **Rust Evolution**: Leveraging new language features
365+
- **JavaScript Standards**: Keeping pace with TC39 proposals
366+
- **Editor Integration**: Advanced IDE features
367+
368+
## Development Infrastructure
369+
370+
### Test Infrastructure
371+
372+
Correctness and reliability are taken extremely seriously in Oxc. We spend significant effort on strengthening the test infrastructure to prevent problems from propagating to downstream tools:
373+
374+
- **Conformance Testing**: Test262, Babel, and TypeScript conformance suites
375+
- **Fuzzing**: Extensive fuzzing to discover edge cases
376+
- **Snapshot Testing**: Linter diagnostic snapshots for regression prevention
377+
- **Ecosystem CI**: Testing against real-world codebases
378+
- **Idempotency Testing**: Ensuring transformations are stable
379+
- **Code Coverage**: Comprehensive coverage tracking
380+
- **End-to-End Testing**: Testing against top 3000 npm packages
381+
382+
### Build and Development Tools
383+
384+
- **Rust**: MSRV 1.86.0+ with clippy and rustfmt integration
385+
- **Just**: Command runner for development tasks (`just --list` for available commands)
386+
- **Performance Monitoring**: Continuous benchmarking and performance regression detection
387+
- **Cross-platform**: Support for Linux, macOS, and Windows
388+
- **CI/CD**: Automated testing, building, and publishing pipelines
389+
390+
For detailed development guidelines, see [CONTRIBUTING.md](./CONTRIBUTING.md) and [AGENTS.md](./AGENTS.md).
391+
392+
---
393+
394+
This architecture document follows the [architecture.md](https://architecture.md/) format for documenting software architecture decisions and system design.
