Test the encoding sniffing algorithm (aka meta prescan) #130
Open
sideshowbarker wants to merge 2 commits into master from sideshowbarker/preparsed-encoding-tests-add
Test the (meta) prescan algorithm
This change adds a `preparsed` subdirectory in the `encoding` directory, with tests whose expected result is the output of running *only* the *encoding sniffing algorithm* at https://html.spec.whatwg.org/#encoding-sniffing-algorithm (of which the main sub-algorithm is the so-called “meta prescan”), without also running the tokenization state machine and tree-construction stage.

This change also adds a README file that explicitly documents what the expected results for the encoding tests are, based on whether or not they’re in the `preparsed` subdirectory.

Without those changes, it’s unclear whether the expected results shown in the existing tests are for the output of fully parsing the test data (through the tokenization state machine and tree-construction stage) or just the output of the encoding sniffing algorithm. And without those changes, we also don’t have any tests a system can use for testing only the output from the encoding sniffing algorithm.

Fixes #28
commit 1e10bdb64b6fc9bc43005a6d07f0b2d1b98a27af
@@ -0,0 +1,39 @@
Encoding Tests
==============

Each file containing encoding tests has any number of tests separated by
two newlines (LF) and a single newline before the end of the file:

[TEST]LF
LF
[TEST]LF
LF
[TEST]LF

...where [TEST] is the format documented below.

Encoding test format
====================

Each test must begin with a string "\#data", followed by a newline (LF).
All subsequent lines until a line that says "\#encoding" are the test data
and must be passed to the system being tested unchanged, except with the
final newline (on the last line) removed.

Then there must be a line that says "\#encoding", followed by a newline
(LF), followed by a string indicating an encoding name, followed by a newline
(LF). The encoding name indicated is the expected character encoding for
the output with the given test data as input.
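
The format described above can be read with a small line-based parser. The following is only a sketch and not part of this change: the function name `parse_encoding_tests` is hypothetical, and it assumes that the backslash in "\#data" and "\#encoding" is a Markdown escape, so the test files themselves contain the literal lines `#data` and `#encoding`. It works on bytes, because the test data must reach the system under test unchanged, and it walks the file line by line rather than splitting on blank lines, because test data may itself contain blank lines.

```python
from typing import Iterator, Tuple


def parse_encoding_tests(raw: bytes) -> Iterator[Tuple[bytes, str]]:
    """Yield (test_data, expected_encoding) pairs from one test file's bytes."""
    lines = raw.split(b"\n")
    i = 0
    while i < len(lines):
        if lines[i] != b"#data":
            i += 1  # skip the blank separator lines between tests
            continue
        i += 1
        data_lines = []
        while i < len(lines) and lines[i] != b"#encoding":
            data_lines.append(lines[i])
            i += 1
        if i + 1 >= len(lines):
            raise ValueError("test is missing its #encoding section")
        name = lines[i + 1].decode("ascii").strip()
        if not name:
            raise ValueError("test has an empty encoding name")
        # Joining with LF restores every newline except the final one,
        # which the format says must be removed from the test data.
        yield b"\n".join(data_lines), name
        i += 2


if __name__ == "__main__":
    sample = b"#data\n<meta charset=utf-8>\n#encoding\nutf-8\n"
    for data, encoding in parse_encoding_tests(sample):
        print(data, encoding)
```

A harness would pass each `test_data` value to the system under test and compare the encoding it reports against `expected_encoding`.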

For the tests in the `preparsed` subdirectory, the encoding name indicated
is the expected result of running the *encoding sniffing algorithm* at
https://html.spec.whatwg.org/#encoding-sniffing-algorithm with the given
test data as input; that is, it's the expected result of running *only* the
*encoding sniffing algorithm*, without also running the tokenization state
machine and tree-construction stage defined in the spec.

For all tests outside the subdirectory named `preparsed`, the encoding name
indicated is instead the expected character encoding for the output after
fully parsing the given test data; that is, it's the expected character
encoding for the output after running the tokenization state machine and
tree-construction stage.
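
A test harness could use the directory rule above to decide which expectation applies to a given test file. This is a minimal sketch under stated assumptions: the function name `expected_result_kind`, the two labels it returns, and the example file paths are all hypothetical and are not defined by the README.

```python
from pathlib import PurePosixPath


def expected_result_kind(test_path: str) -> str:
    """Classify a test file by whether it sits under a `preparsed` directory."""
    # Look only at directory components, not at the file name itself.
    directories = PurePosixPath(test_path).parts[:-1]
    if "preparsed" in directories:
        return "sniffing-only"  # compare against the encoding sniffing algorithm alone
    return "full-parse"  # compare against the encoding after full parsing


if __name__ == "__main__":
    for path in ("encoding/preparsed/example.dat", "encoding/example.dat"):
        print(path, expected_result_kind(path))
```

A system that implements only the encoding sniffing algorithm would then run just the files classified as `sniffing-only`.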