Welcome to the first phase of your compiler construction journey! This phase focuses on building a lexical analyzer, commonly known as a lexer: the component that transforms raw source code into tokens. Your lexer will scan the input character by character, recognize patterns, and emit tokens for language constructs such as keywords, identifiers, and operators.
While we provide a suggested project structure below, you can organize your code as you see fit. The key components remain the same:
my-mini-compiler/
├── CMakeLists.txt          # Build configuration
└── phase1-w25/             # Phase 1 implementation
    ├── include/
    │   └── tokens.h        # Token definitions and types
    └── src/
        └── lexer/
            └── lexer.c     # Core lexer implementation
The tokens.h header file contains the definitions for different token types your lexer will recognize, while lexer.c houses the main implementation of your lexical analyzer. As you progress, you can extend this foundation to support all the features your target programming language requires.
The header file contains basic definitions you'll need to extend:
- Token type definitions (currently handles numbers and operators)
- Error type definitions
- Token structure definition
Think about what additional token types you'll need for your language features.
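As a starting point for that thinking, here is a minimal sketch of what an extended header might look like. Every name in it (`TokenType`, `ErrorType`, `Token`, `MAX_LEXEME`, `token_type_name`, and each enumerator) is an assumption made for illustration; the provided tokens.h may use different names and fields, so adapt rather than copy.

```c
/* Hypothetical sketch of an extended tokens.h. All names are assumptions. */
#ifndef TOKENS_H
#define TOKENS_H

#define MAX_LEXEME 100          /* assumed maximum lexeme length, including NUL */

typedef enum {
    TOKEN_EOF,
    TOKEN_NUMBER,
    TOKEN_OPERATOR,
    TOKEN_KEYWORD,              /* if, repeat, until */
    TOKEN_IDENTIFIER,
    TOKEN_STRING,
    TOKEN_DELIMITER,
    TOKEN_ERROR
} TokenType;

typedef enum {
    ERROR_NONE,
    ERROR_INVALID_CHAR,
    ERROR_INVALID_NUMBER,
    ERROR_CONSECUTIVE_OPERATORS,
    ERROR_UNTERMINATED_STRING,
    ERROR_INVALID_ESCAPE,
    ERROR_STRING_TOO_LONG
} ErrorType;

typedef struct {
    TokenType type;             /* what kind of token this is */
    char lexeme[MAX_LEXEME];    /* the matched source text */
    int line;                   /* line on which the token starts */
    ErrorType error;            /* ERROR_NONE unless type is TOKEN_ERROR */
} Token;

/* Maps a token type to a printable name, which is handy for test output. */
static const char *token_type_name(TokenType type) {
    switch (type) {
        case TOKEN_EOF:        return "EOF";
        case TOKEN_NUMBER:     return "NUMBER";
        case TOKEN_OPERATOR:   return "OPERATOR";
        case TOKEN_KEYWORD:    return "KEYWORD";
        case TOKEN_IDENTIFIER: return "IDENTIFIER";
        case TOKEN_STRING:     return "STRING";
        case TOKEN_DELIMITER:  return "DELIMITER";
        case TOKEN_ERROR:      return "ERROR";
    }
    return "UNKNOWN";
}

#endif /* TOKENS_H */
```

Keeping the error code inside the token is one possible design; returning errors through a separate channel would work as well.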
The source file provides:
- Basic number recognition
- Simple operator handling
- Error reporting structure
- Line number tracking
Study how the existing token recognition works before adding new features.
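For orientation, number recognition and line tracking in a hand-written lexer often look roughly like the sketch below. This is not the code from the provided lexer.c; the helper names `skip_whitespace` and `scan_number` and their signatures are hypothetical, and the real implementation may be organized differently.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: advances *pos past spaces, tabs, and newlines,
   incrementing *line once for every newline that is skipped. */
static void skip_whitespace(const char *input, size_t *pos, int *line) {
    while (input[*pos] == ' ' || input[*pos] == '\t' ||
           input[*pos] == '\r' || input[*pos] == '\n') {
        if (input[*pos] == '\n') {
            (*line)++;
        }
        (*pos)++;
    }
}

/* Hypothetical helper: consumes the run of digits starting at input[*pos]
   and copies as many as fit into out (cap must be at least 1).
   Returns 1 if at least one digit was consumed, 0 otherwise. */
static int scan_number(const char *input, size_t *pos, char *out, size_t cap) {
    size_t start = *pos;
    size_t n = 0;
    while (isdigit((unsigned char)input[*pos])) {
        if (n + 1 < cap) {          /* leave room for the terminating NUL */
            out[n++] = input[*pos];
        }
        (*pos)++;
    }
    out[n] = '\0';
    return *pos > start;
}
```

Note that this sketch silently truncates numbers longer than the buffer; a real lexer would more likely report that as an error.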
Consider how you'll handle each feature:
- Keywords (if, repeat, until)
- Identifiers (variable names)
- String literals
- Additional operators
- Delimiters
- Comments
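One common way to handle the first two items is to scan a whole word first and only then check it against a keyword table. The sketch below assumes identifiers match `[A-Za-z_][A-Za-z0-9_]*`, which is an assumption about your language rather than a requirement. The helper names are hypothetical, and `int` is included in the table only because it appears in the sample inputs.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed keyword set: the three keywords listed above, plus "int". */
static const char *const KEYWORDS[] = { "if", "repeat", "until", "int" };

/* Returns 1 if word is exactly one of the keywords, 0 otherwise. */
static int is_keyword(const char *word) {
    size_t count = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(word, KEYWORDS[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Scans a word starting at input[*pos] and copies it into out
   (cap must be at least 1). Returns the number of characters consumed,
   or 0 if input[*pos] cannot start a word. */
static size_t scan_word(const char *input, size_t *pos, char *out, size_t cap) {
    size_t start = *pos;
    size_t n = 0;
    unsigned char first = (unsigned char)input[*pos];
    if (!(isalpha(first) || first == '_')) {
        out[0] = '\0';
        return 0;
    }
    while (isalnum((unsigned char)input[*pos]) || input[*pos] == '_') {
        if (n + 1 < cap) {          /* leave room for the terminating NUL */
            out[n++] = input[*pos];
        }
        (*pos)++;
    }
    out[n] = '\0';
    return *pos - start;
}
```

Because the whole word is consumed before the lookup, an input such as `ifx` is classified as an identifier rather than the keyword `if` followed by `x`.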
- Study the existing number recognition implementation
- Plan each new feature before coding
- Test thoroughly after each addition
- Consider error cases for each feature
Your lexer should handle various input scenarios. Here are some examples to get you started:
123 + 456 - 789
int x = 42;
y = x + 10;
123 ++ 456
x@ = 10
- Start with simple cases
- Test each feature individually
- Include error cases
- Try multi-line inputs
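To make the error cases concrete, the sketch below checks a single line for two problems, assuming that `123 ++ 456` and `x@ = 10` from the sample inputs are meant to be rejected. It is a simplified validity check, not a full lexer; `LexStatus` and `check_line` are hypothetical names, and the accepted operator and delimiter sets are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { LEX_OK, LEX_INVALID_CHAR, LEX_CONSECUTIVE_OPERATORS } LexStatus;

/* Assumed operator set for this sketch. */
static int is_operator_char(char c) {
    return c != '\0' && strchr("+-*/=", c) != NULL;
}

/* Returns the first problem found in s, or LEX_OK if there is none. */
static LexStatus check_line(const char *s) {
    int prev_was_operator = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (isspace(c)) {
            continue;               /* whitespace does not reset the flag */
        }
        if (isalnum(c) || c == '_' || c == ';') {
            prev_was_operator = 0;  /* operand or delimiter */
            continue;
        }
        if (is_operator_char((char)c)) {
            if (prev_was_operator) {
                return LEX_CONSECUTIVE_OPERATORS;
            }
            prev_was_operator = 1;
            continue;
        }
        return LEX_INVALID_CHAR;    /* anything else, such as '@' */
    }
    return LEX_OK;
}
```

In this sketch `1 + + 2` is also reported as consecutive operators, because whitespace does not reset the flag. If your language supports operators such as `++`, this rule would need to change.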
For each new feature:
- Plan the addition
- Update token types if needed
- Implement recognition logic
- Add error handling
- Test thoroughly
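As an illustration of these steps applied to one feature, here is a sketch of comment skipping. It assumes C-style `//` line comments and `/* ... */` block comments; your language may define comments differently. `CommentResult` and `skip_comment` are hypothetical names.

```c
#include <stddef.h>

typedef enum { COMMENT_NONE, COMMENT_SKIPPED, COMMENT_UNTERMINATED } CommentResult;

/* If a comment starts at input[*pos], advances *pos past it and updates *line.
   A line comment stops before its newline, so normal whitespace handling
   can count that line break. */
static CommentResult skip_comment(const char *input, size_t *pos, int *line) {
    if (input[*pos] != '/') {
        return COMMENT_NONE;
    }
    if (input[*pos + 1] == '/') {
        while (input[*pos] != '\0' && input[*pos] != '\n') {
            (*pos)++;
        }
        return COMMENT_SKIPPED;
    }
    if (input[*pos + 1] == '*') {
        *pos += 2;                              /* skip the opening marker */
        while (input[*pos] != '\0') {
            if (input[*pos] == '*' && input[*pos + 1] == '/') {
                *pos += 2;                      /* skip the closing marker */
                return COMMENT_SKIPPED;
            }
            if (input[*pos] == '\n') {
                (*line)++;
            }
            (*pos)++;
        }
        return COMMENT_UNTERMINATED;            /* error case: no closing marker */
    }
    return COMMENT_NONE;                        /* a lone '/', likely an operator */
}
```

The unterminated block comment is the error case for this feature, so it would also need an error code in tokens.h.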
When implementing string literal handling, consider these key aspects:
- Track opening and closing quotes (")
- Handle escape sequences (\", \n, \t)
- Process character by character until the closing quote
- Check for unterminated strings
- Validate escape sequences
- Handle buffer overflow for long strings
"Basic string"
"String with \"quotes\""
"Unterminated string // Error case
Remember to update your token types and error codes in tokens.h to support string literals and their associated errors.
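The aspects above can be sketched as a single scanning function. This is one possible shape, not a required design: `StringStatus` and `scan_string` are hypothetical names, the `\\` escape is an extra assumption beyond the three listed above, and treating a newline inside a string as unterminated is also an assumption.

```c
#include <stddef.h>

typedef enum { STR_OK, STR_UNTERMINATED, STR_INVALID_ESCAPE, STR_TOO_LONG } StringStatus;

/* Scans a string literal whose opening quote is at input[*pos].
   The decoded contents are written to out (cap bytes, cap must be at least 1).
   On STR_OK, *pos is left just past the closing quote. */
static StringStatus scan_string(const char *input, size_t *pos, char *out, size_t cap) {
    size_t n = 0;
    (*pos)++;                                   /* skip the opening quote */
    for (;;) {
        char c = input[*pos];
        if (c == '\0' || c == '\n') {           /* input ended before the closing quote */
            out[n] = '\0';
            return STR_UNTERMINATED;
        }
        if (c == '"') {                         /* closing quote */
            (*pos)++;
            out[n] = '\0';
            return STR_OK;
        }
        if (c == '\\') {                        /* escape sequence */
            (*pos)++;
            switch (input[*pos]) {
                case '"':  c = '"';  break;
                case 'n':  c = '\n'; break;
                case 't':  c = '\t'; break;
                case '\\': c = '\\'; break;
                case '\0': out[n] = '\0'; return STR_UNTERMINATED;
                default:   out[n] = '\0'; return STR_INVALID_ESCAPE;
            }
        }
        if (n + 1 >= cap) {                     /* no room left for c plus the NUL */
            out[n] = '\0';
            return STR_TOO_LONG;
        }
        out[n++] = c;
        (*pos)++;
    }
}
```

Checking the buffer before every write is what prevents the overflow on long strings; on any error, out still holds a valid, NUL-terminated prefix that can be shown in the error message.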
Consider:
- What can go wrong?
- How to detect errors?
- What error messages to display?
- How to recover from errors?
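One possible answer to the last two questions is sketched below: an error message that carries the line number and the offending text, and a simple recovery rule that skips to the next whitespace character so scanning can resume. The message format and both function names are assumptions for illustration; other recovery strategies, such as skipping only the single bad character, are equally reasonable.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Writes a message such as "Lexical error at line 3: invalid character '@'"
   into out. Returns the value of snprintf, i.e. the untruncated length. */
static int format_error(char *out, size_t cap, int line,
                        const char *message, const char *lexeme) {
    return snprintf(out, cap, "Lexical error at line %d: %s '%s'",
                    line, message, lexeme);
}

/* Recovery sketch: returns the position of the next whitespace character
   at or after pos, or the end of the input if there is none. */
static size_t skip_to_whitespace(const char *input, size_t pos) {
    while (input[pos] != '\0' && !isspace((unsigned char)input[pos])) {
        pos++;
    }
    return pos;
}
```

Skipping to whitespace tends to avoid a cascade of follow-up errors from the same bad token, at the cost of possibly hiding a second error in the skipped text.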
Think about:
- How will you distinguish between identifiers and keywords?
- What makes a valid identifier in your language?
- How will you handle multi-character operators?
- What error messages will be most helpful to users?
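For the multi-character operator question, a common approach is to try the longest candidate first (often called maximal munch) and fall back to a single character. The sketch below assumes a small operator set (`==`, `!=`, `<=`, `>=` and the single characters `+ - * / = < >`); both the set and the name `match_operator` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Tries to match an operator at input[pos], preferring two-character
   operators over one-character ones. Copies the operator into out
   (which must hold at least 3 bytes) and returns its length, or 0 if
   no operator starts at pos. */
static size_t match_operator(const char *input, size_t pos, char *out) {
    static const char *const TWO_CHAR[] = { "==", "!=", "<=", ">=" };
    size_t count = sizeof TWO_CHAR / sizeof TWO_CHAR[0];

    for (size_t i = 0; i < count; i++) {
        if (strncmp(input + pos, TWO_CHAR[i], 2) == 0) {
            memcpy(out, TWO_CHAR[i], 3);        /* two characters plus the NUL */
            return 2;
        }
    }
    if (input[pos] != '\0' && strchr("+-*/=<>", input[pos]) != NULL) {
        out[0] = input[pos];
        out[1] = '\0';
        return 1;
    }
    out[0] = '\0';
    return 0;
}
```

Trying the two-character forms first is what keeps `==` from being reported as two separate `=` tokens.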
Watch out for:
- Position tracking errors
- Incorrect error handling
- Missing edge cases
- Incomplete token recognition
- Extended token types in tokens.h
- Implemented token recognition in lexer.c
- Added comment handling
- Added string literal handling
- Comprehensive error handling
- Test cases with expected outputs
- Documentation of grammar, features, error codes, and design decisions
- Load CMake project
- Build (Ctrl+F9)
- Run (Shift+F10)
- Test with various inputs
- Verify output matches expectations
- Install VSCode
- Add C/C++ extension
- Install MinGW or GCC
- Compile using terminal:
gcc lexer.c -o lexer
./lexer
Or do it however you like, as long as we get a lexer that works!
- Start small and build incrementally
- Test each feature thoroughly
- Document your changes
- Consider all possible error cases
- Keep your code organized and well-commented
- Add your documentation/report under the documentation directory
Remember:
- A good lexer is fundamental to your compiler
- Take time to plan before implementing and work as a team. DO NOT PROCRASTINATE!
- Test thoroughly
- Ask your TAs for help if needed
- Document your decisions
Good luck with your implementation!