ozdogrumerve/lexical-analyzer


DFA-based token scanner — Formal Languages & Automata

JavaScript · HTML · CSS

Live Demo

What it does

Takes a snippet of source code, runs it through a hand-written DFA (Deterministic Finite Automaton), and breaks it down into tokens — keywords, identifiers, numbers, operators, and so on. Every transition the automaton makes is recorded, so you can replay the whole process character by character, forward and back.

Three tabs, three views of the same analysis:

  • Editor — write or load source code, run the analysis, watch tokens appear in real time with color-coded chips. A live state panel shows the current DFA state, the character being read, the lexeme accumulating, and the token just produced. A step log at the bottom records every automaton transition.
  • DFA Diagram — an SVG diagram of the automaton with the active state highlighted as the analysis plays. Includes a full transition table next to the diagram.
  • Statistics — token type distribution as a bar chart, summary metrics (total tokens, source lines, unique lexemes, error count), and a full token detail table with line/column info for every token.

Token types

Type              Examples
KEYWORD           if else while for return int float bool true false print
IDENTIFIER        x oran sonuc myVar
NUMBER            5 42 100
FLOAT             3.14 0.5
ASSIGN            =
OPERATOR          + - * / < > == != >=
LPAREN / RPAREN   ( )
LBRACE / RBRACE   { }
SEMICOLON         ;
COMMA             ,
UNKNOWN           anything the DFA can't recognize (e.g. @)
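Keywords and identifiers follow the same DFA path, so a lexeme that scans as an identifier is typically promoted to KEYWORD by a final lookup against the keyword list above. A hedged sketch of that check — the function name is an assumption, not the repo's actual API:

```javascript
// Keyword/identifier disambiguation after the DFA accepts a word.
// The keyword set matches the README's table; classifyWord is illustrative.
const KEYWORDS = new Set([
  "if", "else", "while", "for", "return",
  "int", "float", "bool", "true", "false", "print",
]);

function classifyWord(lexeme) {
  return KEYWORDS.has(lexeme) ? "KEYWORD" : "IDENTIFIER";
}

console.log(classifyWord("while")); // KEYWORD
console.log(classifyWord("oran"));  // IDENTIFIER
```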

DFA states

START ──letter──► IN_ID ────────────────────────► DONE
      ──digit───► IN_NUM ──dot──► IN_FLOAT ──────► DONE
      ──op──────► IN_OP ──────────────────────────► DONE

Whitespace is skipped before entering the DFA — no epsilon transitions, pure DFA behaviour. Single-character tokens ((, ), {, }, ;, ,) bypass the DFA entirely and produce a token directly.
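Reduced to the identifier and number paths, the scanning loop sketched above might look like this. State names match the diagram; the function signature and step format are illustrative assumptions, not the repo's actual lexer.js:

```javascript
// Simplified DFA scan of one token starting at `pos`, recording every
// transition so playback can replay it step by step (as the README describes).
// Only the IN_ID / IN_NUM / IN_FLOAT paths are sketched here.
function scanToken(src, pos) {
  const steps = []; // one entry per DFA transition
  let state = "START";
  let lexeme = "";
  let i = pos;

  while (i < src.length) {
    const ch = src[i];
    let next = null;
    if (state === "START" && /[A-Za-z_]/.test(ch)) next = "IN_ID";
    else if (state === "START" && /[0-9]/.test(ch)) next = "IN_NUM";
    else if (state === "IN_ID" && /[A-Za-z0-9_]/.test(ch)) next = "IN_ID";
    else if (state === "IN_NUM" && /[0-9]/.test(ch)) next = "IN_NUM";
    else if (state === "IN_NUM" && ch === ".") next = "IN_FLOAT";
    else if (state === "IN_FLOAT" && /[0-9]/.test(ch)) next = "IN_FLOAT";

    if (next === null) break; // no transition: the token is done
    steps.push({ from: state, to: next, char: ch });
    state = next;
    lexeme += ch;
    i++;
  }

  const type =
    state === "IN_ID" ? "IDENTIFIER"
    : state === "IN_FLOAT" ? "FLOAT"
    : state === "IN_NUM" ? "NUMBER"
    : "UNKNOWN";
  return { type, lexeme, end: i, steps };
}

console.log(scanToken("3.14", 0).type);   // FLOAT
console.log(scanToken("oran =", 0).lexeme); // oran
```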

Getting started

Clone the repo and open index.html. That's it — no install, no build, no node_modules folder haunting your drive.

git clone https://github.com/ozdogrumerve/lexical-analyzer.git
cd lexical-analyzer

Then just open index.html in your browser. If you want a proper local server instead of a file:// URL:

# Python (usually already on your machine)
python3 -m http.server 5500

# or Node
npx serve .

Navigate to http://localhost:5500 and you're good to go. A default source snippet is pre-loaded so you can hit Analyze right away.

Project structure

├── index.html
├── css/
│   └── style.css
└── js/
    ├── lexer.js        DFA engine — tokenizer, transition table, step recorder
    ├── dfa-diagram.js  SVG diagram renderer and animation controller
    ├── stats.js        statistics computation and chart/table rendering
    └── ui.js           ties everything together, handles playback controls

lexer.js has zero DOM dependencies — it just takes a string and returns { tokens, steps }. Everything visual lives in the other three files.
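Based on that description, the return value presumably has a shape like the one below — the field names inside each token and step are assumptions inferred from the README, not the actual lexer.js output:

```javascript
// Hypothetical shape of the lexer's { tokens, steps } result for `x = 5`.
// Field names (type, lexeme, line, col, state, char, next) are illustrative.
const result = {
  tokens: [
    { type: "IDENTIFIER", lexeme: "x", line: 1, col: 1 },
    { type: "ASSIGN",     lexeme: "=", line: 1, col: 3 },
    { type: "NUMBER",     lexeme: "5", line: 1, col: 5 },
  ],
  steps: [
    { state: "START", char: "x", next: "IN_ID" },
    // ...one entry per DFA transition, replayable forward and back
  ],
};

// Because lexer.js has no DOM dependencies, the same structure can be
// consumed in Node as easily as in the browser:
const errorCount = result.tokens.filter((t) => t.type === "UNKNOWN").length;
console.log(errorCount); // 0
```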

Controls

Control                        Action
Analiz Et (Analyze)            Start / pause animated playback
İleri / Geri (Forward / Back)  Step forward or backward one DFA transition at a time
Sıfırla (Reset)                Reset everything to the initial state
Speed slider                   Control playback speed (100 ms – 1500 ms per step)
Dosya Aç (Open File)           Load a .txt or .json file as source input
JSON Kaydet (Save JSON)        Export the full token list as JSON

Both the Editor and DFA Diagram tabs share the same playback state — switching tabs mid-animation keeps everything in sync.


⭐ Star this repo if you find it helpful!

Made with ❤️ by Merve Özdoğru

Turing would not be impressed, but it works

About

This project implements a DFA (Deterministic Finite Automaton)-based lexical analyzer. It analyzes source code and breaks it down into tokens while visualizing DFA state transitions. The tool also provides statistical insights, including token counts, unique lexemes, and error rates.
