Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 14 changed files with 762 additions and 13 deletions.
@@ -0,0 +1,14 @@
[package]
name = "boa_unicode"
version = "0.10.0"
authors = ["boa-dev"]
description = "Boa is a JavaScript lexer, parser and Just-in-Time compiler written in Rust. Currently, it has support for some of the language."
repository = "https://github.com/boa-dev/boa"
keywords = ["javascript", "compiler", "lexer", "parser", "unicode"]
categories = ["parsing"]
license = "Unlicense/MIT"
exclude = ["../.vscode/*", "../Dockerfile", "../Makefile", "../.editorConfig"]
edition = "2018"

[dependencies]
unicode-general-category = "0.3.0"
@@ -0,0 +1,26 @@
# boa-unicode

`boa-unicode` defines a trait that provides methods for querying the Unicode properties and classes relevant to identifiers. The lexer and parser use these properties to determine whether a code point (`char`) can start or continue an identifier.

Current version: Unicode 13.0.0

## Development

The Unicode character tables used to query properties are generated by `build_tables.js`. This script depends on [Node.js](https://nodejs.org/en/) (version 14 or later, since it uses the `??` operator) and [rustfmt](https://github.com/rust-lang/rustfmt). You can run the script with:

```
$ node build_tables.js
```

or with [npm](https://www.npmjs.com/):

```
$ npm run build-tables
```

The configuration is defined as constants at the top of the script. See the comments in `build_tables.js` for more information.

## More Info

- [Unicode® Standard Annex #31 - UNICODE IDENTIFIER AND PATTERN SYNTAX](https://unicode.org/reports/tr31/)
- [Unicode® Standard Annex #44 - UNICODE CHARACTER DATABASE](https://unicode.org/reports/tr44/)
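The README says the crate exposes a trait for querying identifier properties. The following is a minimal Rust sketch of how such a trait could sit on top of the generated tables. It is not the crate's actual API: the trait name `IdentifierProperties`, the `Category` enum and the `is_id_start` helper are hypothetical. Because each generated table is a sorted `&[char]`, membership can be tested with `binary_search`. The tables are then combined with general categories according to the UAX #31 definition, ID_Start = L + Nl + Other_ID_Start minus Pattern_Syntax minus Pattern_White_Space. `Category` is a simplified stand-in for what the crate obtains from `unicode-general-category`. The `PATTERN_SYNTAX` table below is an excerpt covering only U+0021..U+002F.

```rust
/// Simplified general-category classes (hypothetical stand-in for the
/// richer enum provided by the `unicode-general-category` crate).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Category {
    /// Lu, Ll, Lt, Lm or Lo.
    Letter,
    /// Nl.
    LetterNumber,
    /// Any other general category.
    Other,
}

/// Pattern_White_Space (Unicode 13.0.0), sorted by code point.
pub static PATTERN_WHITE_SPACE: &[char] = &[
    '\u{0009}', '\u{000A}', '\u{000B}', '\u{000C}', '\u{000D}', '\u{0020}',
    '\u{0085}', '\u{200E}', '\u{200F}', '\u{2028}', '\u{2029}',
];

/// Other_ID_Start (Unicode 13.0.0), sorted by code point.
pub static OTHER_ID_START: &[char] = &[
    '\u{1885}', '\u{1886}', '\u{2118}', '\u{212E}', '\u{309B}', '\u{309C}',
];

/// EXCERPT ONLY: the Pattern_Syntax range U+0021..U+002F, not the full table.
pub static PATTERN_SYNTAX: &[char] = &[
    '!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/',
];

/// Hypothetical trait in the spirit of the one the README describes.
pub trait IdentifierProperties: Copy {
    fn is_pattern_whitespace(self) -> bool;
    fn is_pattern_syntax(self) -> bool;
    fn is_other_id_start(self) -> bool;
}

impl IdentifierProperties for char {
    // Every table is sorted, so membership is a binary search.
    fn is_pattern_whitespace(self) -> bool {
        PATTERN_WHITE_SPACE.binary_search(&self).is_ok()
    }

    fn is_pattern_syntax(self) -> bool {
        PATTERN_SYNTAX.binary_search(&self).is_ok()
    }

    fn is_other_id_start(self) -> bool {
        OTHER_ID_START.binary_search(&self).is_ok()
    }
}

/// UAX #31: ID_Start = L + Nl + Other_ID_Start - Pattern_Syntax - Pattern_White_Space.
/// The caller supplies the general category (hypothetical signature).
pub fn is_id_start(c: char, category: Category) -> bool {
    let included = matches!(category, Category::Letter | Category::LetterNumber)
        || c.is_other_id_start();
    included && !c.is_pattern_syntax() && !c.is_pattern_whitespace()
}
```

For example, `is_id_start('\u{2118}', Category::Other)` returns `true`. U+2118 SCRIPT CAPITAL P is a math symbol, but it is listed in Other_ID_Start. Keeping such characters in a separate table keeps identifiers stable across Unicode versions.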
@@ -0,0 +1,132 @@
#!/usr/bin/env node
/**
 * This script generates the Rust source file containing the tables of Unicode properties and classes.
 *
 * It downloads the content of `PropList.txt` from the remote server, parses the file, extracts
 * the target properties, prepares the char tables, and then writes them to the output Rust file. It also
 * formats the output file with `rustfmt`. Please make sure `rustfmt` is available in the environment.
 *
 * Update and rerun this script when a new version of the Unicode Character Database
 * ({@link https://unicode.org/reports/tr44/|Unicode® Standard Annex #44}) is released, and always check that
 * the new version still meets the {@link https://tc39.es/ecma262/#sec-names-and-keywords|ECMAScript specification}.
 *
 * Run this script with the command `node ./build_tables.js` or `npm run build-tables`.
 *
 * Version: Unicode 13.0.0
 */

const fs = require("fs");
const path = require("path");
const https = require("https");
const child_process = require("child_process");

/**
 * The URL from which `PropList.txt` is downloaded with an HTTP GET request.
 *
 * Please make sure the content follows the UCD file format defined in
 * {@link http://unicode.org/reports/tr44/#UCD_Files|UAX#44}.
 *
 * @constant {string}
 */
const PROPLIST_TXT_URL =
  "https://www.unicode.org/Public/13.0.0/ucd/PropList.txt";

/**
 * The target properties to process, given as `[property, tableName]` pairs. The first element is the
 * property name as it appears in `PropList.txt`; the second is the name of the table in the output Rust file.
 *
 * @constant {[string, string][]}
 */
const TARGET_PROPERTIES = [
  ["Pattern_Syntax", "PATTERN_SYNTAX"],
  ["Other_ID_Continue", "OTHER_ID_CONTINUE"],
  ["Other_ID_Start", "OTHER_ID_START"],
  ["Pattern_White_Space", "PATTERN_WHITE_SPACE"],
];

/**
 * The path of the output Rust file.
 *
 * @constant {string}
 */
const OUTPUT_FILE = path.join(__dirname, "./src/tables.rs");

/**
 * The doc comment to add to the beginning of the output Rust file.
 *
 * @constant {string}
 */
const OUTPUT_FILE_DOC_COMMENT = `
//! This module implements the Unicode lookup tables for identifier and pattern syntax.
//! Version: Unicode 13.0.0
//!
//! This file is generated by \`boa_unicode/build_tables.js\`. Please do not modify it directly.
//!
//! More information:
//! - [Unicode® Standard Annex #44][uax44]
//!
//! [uax44]: http://unicode.org/reports/tr44
`.trim();

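https
  .get(PROPLIST_TXT_URL, (res) => {
    if (res.statusCode !== 200) {
      console.error(`Failed to get 'PropList.txt': HTTP ${res.statusCode}`);
      process.exitCode = 1;
      res.resume(); // Discard the body so the socket is released.
      return;
    }

    // Decode as UTF-8 so multi-byte characters split across chunks are not garbled.
    res.setEncoding("utf8");
    let text = "";

    res.on("data", (chunk) => {
      text += chunk;
    });

    res.on("end", () => {
      buildRustFile(text);
    });
  })
  .on("error", (err) => {
    console.error(`Failed to get 'PropList.txt': ${err.message}`);
    process.exitCode = 1;
  });
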
/**
 * Parses the content of `PropList.txt`, builds a sorted `&[char]` table for each entry of
 * `TARGET_PROPERTIES`, writes the tables to `OUTPUT_FILE`, and formats the file with `rustfmt`.
 *
 * @param {string} propListText The raw content of `PropList.txt`.
 */
function buildRustFile(propListText) {
  // Matches data lines such as `0009..000D    ; Pattern_White_Space # ...`.
  const dataRegex = /(^|\n)(?<codePointStart>[0-9A-F]+)(\.\.(?<codePointEnd>[0-9A-F]+))?\s*;\s*(?<property>[^\s]+)/gi;
  const data = [...propListText.matchAll(dataRegex)].map(
    (match) => match.groups
  );

  const rustVariables = TARGET_PROPERTIES.map(
    ([propertyName, rustTableName]) => {
      const codePoints = data
        .filter(({ property }) => property === propertyName)
        // A single code point is treated as a one-element range.
        .map(({ codePointStart, codePointEnd }) => [
          codePointStart,
          codePointEnd ?? codePointStart,
        ])
        .map(([codePointStart, codePointEnd]) => [
          parseInt(codePointStart, 16),
          parseInt(codePointEnd, 16),
        ])
        // Expand every range into its individual code points.
        .reduce((codePoints, [codePointStart, codePointEnd]) => {
          for (let cp = codePointStart; cp <= codePointEnd; cp++) {
            codePoints.push(cp);
          }
          return codePoints;
        }, []);

      // Emit the table in ascending code point order.
      codePoints.sort((a, b) => a - b);
      const rustTable = `&[${codePoints
        .map((cp) => `'\\u{${cp.toString(16).padStart(4, "0").toUpperCase()}}'`)
        .join(",")}]`;
      const rustVariable = `pub static ${rustTableName}: &[char] = ${rustTable};`;

      console.log(`${propertyName}: ${codePoints.length} code points`);
      return rustVariable;
    }
  );

  const rustFile = `${OUTPUT_FILE_DOC_COMMENT}\n\n${rustVariables.join(
    "\n\n"
  )}`;

  console.log("Writing output file...");
  fs.writeFileSync(OUTPUT_FILE, rustFile);

  console.log("Running rustfmt...");
  // Pass the path as an argument (no shell), so paths containing spaces work.
  child_process.execFileSync("rustfmt", [OUTPUT_FILE], { stdio: "inherit" });
}
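`buildRustFile` does five things. It parses each data line of `PropList.txt`, keeps the lines for one property, expands `start..end` ranges into single code points, sorts them, and renders a `&[char]` table. The Rust sketch below mirrors those steps for illustration only. It is not part of the crate, and the function names `parse_line`, `collect_code_points` and `render_table` are hypothetical. It also swaps the script's regular expression for plain splitting on `#`, `;` and `..`.

```rust
/// Parses one UCD data line such as
/// `0009..000D    ; Pattern_White_Space # Cc   [5] <control-0009>..<control-000D>`
/// into `(start, end, property)`. Blank and comment-only lines yield `None`.
fn parse_line(line: &str) -> Option<(u32, u32, &str)> {
    // Drop the trailing comment, if any.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let (range, property) = data.split_once(';')?;
    let range = range.trim();
    // A single code point is treated as a one-element range.
    let (start, end) = range.split_once("..").unwrap_or((range, range));
    Some((
        u32::from_str_radix(start, 16).ok()?,
        u32::from_str_radix(end, 16).ok()?,
        property.trim(),
    ))
}

/// Collects every code point listed for `property`, expanded and sorted.
fn collect_code_points(text: &str, property: &str) -> Vec<u32> {
    let mut code_points: Vec<u32> = text
        .lines()
        .filter_map(parse_line)
        .filter(|&(_, _, p)| p == property)
        .flat_map(|(start, end, _)| start..=end)
        .collect();
    code_points.sort_unstable();
    code_points
}

/// Renders a table in the same shape the script emits before `rustfmt` runs.
fn render_table(rust_name: &str, code_points: &[u32]) -> String {
    let items: Vec<String> = code_points
        .iter()
        // `{{` and `}}` are escaped braces, so this yields e.g. '\u{0009}'.
        .map(|cp| format!("'\\u{{{:04X}}}'", cp))
        .collect();
    format!("pub static {}: &[char] = &[{}];", rust_name, items.join(","))
}
```

Splitting on delimiters avoids a regex dependency, but it relies on the data-line layout `range ; property # comment` that UAX #44 defines for UCD files.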
@@ -0,0 +1,5 @@
{
  "scripts": {
    "build-tables": "node ./build_tables.js"
  }
}