caelwarner/HTMLTokenizer

HTML Tokenizer

An HTML5 tokenizer written in Rust, fully compliant with the WHATWG HTML specification.

Features

  • 100% WHATWG spec-compliant HTML tokenization
  • Passes all html5lib tokenizer test cases
  • Tokenizes HTML input into tag, comment, doctype and character tokens
  • Fast: completes all html5lib tokenizer test cases in 0.15 seconds
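To make the token kinds listed above concrete, here is a minimal sketch of what such a token stream can look like. The `Token` enum and `tokenize` function are hypothetical names chosen for illustration and are not this crate's actual API. The toy tokenizer handles only a small subset of HTML (no attributes, character references, or error recovery) and is not spec-compliant; it only shows the shape of the output.

```rust
/// Token kinds an HTML tokenizer emits (the WHATWG spec also defines an
/// end-of-file token). Hypothetical type, not this crate's API.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Doctype { name: String },
    StartTag { name: String, self_closing: bool },
    EndTag { name: String },
    Comment(String),
    Character(char),
    Eof,
}

/// Toy tokenizer for illustration only: attributes are dropped and none of
/// the spec's state machine or error handling is implemented.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(body) = rest.strip_prefix("<!--") {
            // Comment: data runs up to "-->" or to the end of the input.
            let (data, tail) = match body.find("-->") {
                Some(i) => (&body[..i], &body[i + 3..]),
                None => (body, ""),
            };
            tokens.push(Token::Comment(data.to_string()));
            rest = tail;
        } else if let Some((inner, tail)) =
            rest.strip_prefix('<').and_then(|r| r.split_once('>'))
        {
            // Markup between '<' and '>'; names are ASCII-lowercased.
            let lower = inner.to_ascii_lowercase();
            if let Some(name) = lower.strip_prefix("!doctype") {
                tokens.push(Token::Doctype { name: name.trim().to_string() });
            } else if let Some(name) = lower.strip_prefix('/') {
                tokens.push(Token::EndTag { name: name.trim().to_string() });
            } else {
                let self_closing = lower.ends_with('/');
                // Keep only the first word, so attributes are ignored.
                let name = lower
                    .trim_end_matches('/')
                    .split_whitespace()
                    .next()
                    .unwrap_or("")
                    .to_string();
                tokens.push(Token::StartTag { name, self_closing });
            }
            rest = tail;
        } else {
            // Anything else is emitted one character token at a time.
            let c = rest.chars().next().unwrap();
            tokens.push(Token::Character(c));
            rest = &rest[c.len_utf8()..];
        }
    }
    tokens.push(Token::Eof);
    tokens
}

fn main() {
    for token in tokenize("<!DOCTYPE html><p>Hi</p><!-- note --><br/>") {
        println!("{:?}", token);
    }
}
```

Text is emitted as one character token per character, as in the spec's model. A real implementation would also carry attributes on tags and public/system identifiers on the doctype token.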

Usage

  1. Build the project
cargo build
  2. Run all test cases
cargo test

Future Improvements

The tokenizer is finished. The next step is to add tree construction (parsing). This effort has been started in the tree-construction branch.

Requirements

  • Cargo
  • Rust
