An HTML5 tokenizer, fully compliant with the WHATWG HTML specification, written in Rust.
- 100% WHATWG spec-compliant HTML tokenization
- Passes all html5lib tokenizer test cases
- Tokenizes HTML input into tag, comment, doctype and character tokens
- Fast: runs the full html5lib tokenizer test suite in 0.15 seconds
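The four token kinds listed above are easiest to see with a small example. The following is a deliberately simplified sketch, not this crate's code: `Token`, `tokenize`, `read_tag` and `starts_with_letter` are hypothetical names chosen for illustration. Unlike the real tokenizer it is NOT spec compliant. It ignores attributes, self-closing flags, character references, the EOF token, and the error-recovery states that the WHATWG tokenizer state machine defines.

```rust
/// Hypothetical token type covering the four kinds of token mentioned above.
/// Illustrative only; this is not the crate's actual API.
#[derive(Debug, PartialEq)]
enum Token {
    Doctype(String),
    StartTag { name: String },
    EndTag { name: String },
    Comment(String),
    Character(char),
}

/// True if `s` begins with an ASCII letter (a tag name must start with one).
fn starts_with_letter(s: &str) -> bool {
    s.starts_with(|c: char| c.is_ascii_alphabetic())
}

/// Reads an ASCII alphanumeric tag name from the start of `s`, lowercases it,
/// then skips everything (attributes included) up to and past the next '>'.
fn read_tag(s: &str) -> (String, &str) {
    let name_len = s.find(|c: char| !c.is_ascii_alphanumeric()).unwrap_or(s.len());
    let name = s[..name_len].to_ascii_lowercase();
    let after_name = &s[name_len..];
    let rest = match after_name.find('>') {
        Some(i) => &after_name[i + 1..],
        None => "",
    };
    (name, rest)
}

/// Simplified (non-compliant) tokenizer: splits input into doctype, tag,
/// comment and character tokens.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while let Some(c) = rest.chars().next() {
        if let Some(after) = rest.strip_prefix("<!--") {
            // Comment: everything up to "-->" (or to end of input).
            let end = after.find("-->").unwrap_or(after.len());
            tokens.push(Token::Comment(after[..end].to_string()));
            rest = after.get(end + 3..).unwrap_or("");
        } else if rest.get(..9).map_or(false, |s| s.eq_ignore_ascii_case("<!doctype")) {
            // DOCTYPE: keep the trimmed, lowercased text before '>'.
            let after = &rest[9..];
            let end = after.find('>').unwrap_or(after.len());
            tokens.push(Token::Doctype(after[..end].trim().to_ascii_lowercase()));
            rest = after.get(end + 1..).unwrap_or("");
        } else if let Some(after) = rest.strip_prefix("</").filter(|a| starts_with_letter(a)) {
            let (name, next) = read_tag(after);
            tokens.push(Token::EndTag { name });
            rest = next;
        } else if let Some(after) = rest.strip_prefix('<').filter(|a| starts_with_letter(a)) {
            let (name, next) = read_tag(after);
            tokens.push(Token::StartTag { name });
            rest = next;
        } else {
            // Anything else, including a stray '<', is a character token.
            tokens.push(Token::Character(c));
            rest = &rest[c.len_utf8()..];
        }
    }
    tokens
}

fn main() {
    for token in tokenize("<!DOCTYPE html><p>Hi<!-- note --></p>") {
        println!("{:?}", token);
    }
}
```

For the input in `main`, the sketch emits a `Doctype("html")`, a `p` start tag, the characters `H` and `i`, a comment containing ` note `, and a `p` end tag. A spec-compliant tokenizer also has to handle many edge cases this sketch skips, such as attributes and malformed markup.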
- Build the project: `cargo build`
- Run all test cases: `cargo test`
The tokenizer is finished. The next step is to add tree construction (parsing); work on this has started in the `tree-construction` branch.
- Cargo
- Rust