A simple, extensible C++ lexer for tokenizing source code. This project is open source and aims to provide a minimal, easy-to-use lexer for educational and practical use cases.
- Tokenizes identifiers, keywords, strings, integers, operators, and punctuation
- Easily extensible for new token types
- Simple API: just create a `Lexer` and call `lex()`
- Written in modern C++

Supported token types:

- Identifiers
- Keywords (customizable)
- Strings (double-quoted)
- Integers
- Parentheses: `(` and `)`
- Semicolons: `;`
- Commas: `,`
- Colons: `:`
- Operators: `+`, `-`, `*`, `/`, `=`
- Double quotes: `"`
- End of file (EOF)

Include `src/lexer.hpp`, then create a `Lexer` and call `lex()`:

```cpp
#include <iostream>
#include <string>
#include <vector>

#include "src/lexer.hpp"

int main() {
    std::string code = "let x = 42 + 5;";
    Lexer lexer(code);                        // construct the lexer over the source text
    std::vector<Token> tokens = lexer.lex();  // tokenize the whole input

    // Print each token's text and type.
    for (const auto& token : tokens) {
        std::cout << "Token: " << token.value << " Type: " << token.type << std::endl;
    }
    return 0;
}
```

This project is standard C++ and requires no external dependencies. To build:

```sh
g++ -std=c++11 -o lexer src/lexer.cc src/main.cc
```
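
To illustrate how token categories like those listed above could be recognized, here is a minimal standalone sketch. It is not the code in `src/lexer.cc`: the names `SketchTokenType`, `SketchToken`, `classify_symbol`, and `sketch_lex` are hypothetical, and the default keyword set containing only `let` is an assumption based on the usage example. The keyword set is passed as a parameter to show one way keywords might be made customizable. Under these assumptions, `sketch_lex("let x = 42 + 5;")` yields eight tokens: the keyword `let`, the identifier `x`, the operator `=`, the integer `42`, the operator `+`, the integer `5`, a semicolon, and the end-of-file marker.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical token categories mirroring the README's list; the real
// definitions in src/lexer.hpp may use different names.
enum class SketchTokenType {
    Identifier, Keyword, String, Integer, LParen, RParen,
    Semicolon, Comma, Colon, Operator, Quote, Unknown, EndOfFile
};

struct SketchToken {
    SketchTokenType type;
    std::string value;
};

// Maps a single punctuation or operator character to its category.
inline SketchTokenType classify_symbol(char c) {
    switch (c) {
        case '(': return SketchTokenType::LParen;
        case ')': return SketchTokenType::RParen;
        case ';': return SketchTokenType::Semicolon;
        case ',': return SketchTokenType::Comma;
        case ':': return SketchTokenType::Colon;
        case '+': case '-': case '*': case '/': case '=':
            return SketchTokenType::Operator;
        default:  return SketchTokenType::Unknown;
    }
}

// Tokenizes `src`. `keywords` is the customizable keyword set
// (the default of {"let"} is an assumption for this sketch).
inline std::vector<SketchToken> sketch_lex(
        const std::string& src,
        const std::set<std::string>& keywords = std::set<std::string>{"let"}) {
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    std::vector<SketchToken> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // whitespace separates tokens but is not emitted
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            // Identifier or keyword: letters, digits, and underscores.
            const std::size_t start = i;
            while (i < src.size() && is_word(src[i])) ++i;
            const std::string word = src.substr(start, i - start);
            out.push_back({keywords.count(word) ? SketchTokenType::Keyword
                                                : SketchTokenType::Identifier,
                           word});
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            // Integer: a run of decimal digits.
            const std::size_t start = i;
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({SketchTokenType::Integer, src.substr(start, i - start)});
        } else if (c == '"') {
            // Double-quoted string; an unterminated quote becomes a lone Quote token.
            const std::size_t close = src.find('"', i + 1);
            if (close == std::string::npos) {
                out.push_back({SketchTokenType::Quote, "\""});
                ++i;
            } else {
                out.push_back({SketchTokenType::String,
                               src.substr(i + 1, close - i - 1)});
                i = close + 1;
            }
        } else {
            // Single-character punctuation or operator.
            out.push_back({classify_symbol(c), std::string(1, c)});
            ++i;
        }
    }
    out.push_back({SketchTokenType::EndOfFile, ""});
    return out;
}
```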
This project is licensed under the MIT License. See LICENSE for details.
Contributions are welcome! Please open issues or pull requests for improvements, bug fixes, or new features.