enh(python) Add support for unicode identifiers #3280

Merged
10 changes: 10 additions & 0 deletions CHANGES.md
@@ -1,16 +1,25 @@
## Version 11.3.0 (most likely)

Parser:

- add first rough performance testing script (#3280) [Austin Schick][]

Grammars:

- fix(python) added support for unicode identifiers (#3280) [Austin Schick][]
- enh(css/less/stylus/scss) improve consistency of function dispatch (#3301) [Josh Goebel][]
- enh(css/less/stylus/scss) detect block comments more fully (#3301) [Josh Goebel][]
- fix(cpp) switch is a keyword (#3312) [Josh Goebel][]
- fix(cpp) fix `xor_eq` keyword highlighting. [Denis Kovalchuk][]
- enh(c,cpp) highlight type modifiers as type (#3316) [Josh Goebel][]
- enh(css/less/stylus/scss) add support for CSS Grid properties [monochromer][]

[Austin Schick]: https://github.com/austin-schick
[Josh Goebel]: https://github.com/joshgoebel
[Denis Kovalchuk]: https://github.com/deniskovalchuk
[monochromer]: https://github.com/monochromer


## Version 11.2.0

Build:
@@ -41,6 +50,7 @@ New Languages:
[Bradley Mackey]: https://github.com/bradleymackey
[Dereavy]: https://github.com/dereavy


## Version 11.1.0

Grammars:
11 changes: 11 additions & 0 deletions docs/mode-reference.rst
@@ -42,12 +42,22 @@ name
The canonical name of this language, ie "JavaScript", etc.


unicodeRegex
^^^^^^^^^^^^

- **type**: boolean

Expresses whether the grammar in question uses Unicode (``u`` flag) regular expressions.
(defaults to false)


case_insensitive
^^^^^^^^^^^^^^^^

- **type**: boolean

Case insensitivity of language keywords and regexps. Used only on the top-level mode.
(defaults to false)


aliases
@@ -92,6 +102,7 @@ disableAutodetect
- **type**: boolean

Disables autodetection for this language.
(defaults to false, meaning auto-detect is enabled)


compilerExtensions
11 changes: 6 additions & 5 deletions src/languages/python.js
@@ -5,10 +5,10 @@ Website: https://www.python.org
 Category: common
 */

-import { UNDERSCORE_IDENT_RE } from '../lib/modes.js';
 import * as regex from '../lib/regex.js';

 export default function(hljs) {
+  const IDENT_RE = /[\p{XID_Start}_]\p{XID_Continue}*/u;
   const RESERVED_WORDS = [
     'and',
     'as',
@@ -358,6 +358,7 @@ export default function(hljs) {
       'gyp',
       'ipython'
     ],
+    unicodeRegex: true,
     keywords: KEYWORDS,
     illegal: /(<\/|->|\?)|=>/,
     contains: [
@@ -379,7 +380,7 @@ export default function(hljs) {
       {
         match: [
           /def/, /\s+/,
-          UNDERSCORE_IDENT_RE
+          IDENT_RE,
         ],
         scope: {
           1: "keyword",
@@ -392,14 +393,14 @@ export default function(hljs) {
           {
             match: [
               /class/, /\s+/,
-              UNDERSCORE_IDENT_RE, /\s*/,
-              /\(\s*/, UNDERSCORE_IDENT_RE,/\s*\)/
+              IDENT_RE, /\s*/,
+              /\(\s*/, IDENT_RE,/\s*\)/
             ],
           },
           {
             match: [
               /class/, /\s+/,
-              UNDERSCORE_IDENT_RE
+              IDENT_RE
             ],
           }
         ],
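
A minimal sketch of why the new `IDENT_RE` only works together with `unicodeRegex: true`. It compiles the same pattern source with and without the `u` flag and checks it against the identifiers from the new markup test. The helper `matchesWhole` and the variable names are hypothetical and are not part of highlight.js.

```javascript
// Same pattern source as IDENT_RE in the diff, compiled with and without `u`.
const source = String.raw`[\p{XID_Start}_]\p{XID_Continue}*`;
const withUnicode = new RegExp(source, 'u');
const withoutUnicode = new RegExp(source);

// Hypothetical helper: true only when the pattern matches the whole name.
const matchesWhole = (re, name) => {
  const m = re.exec(name);
  return m !== null && m.index === 0 && m[0] === name;
};

// Identifiers taken from test/markup/python/diacritic_identifiers.txt,
// plus one underscore-prefixed name.
const names = ['fóö', 'bär', 'FOÖ', 'ÿay', '_private'];

const unicodeResults = names.map((n) => matchesWhole(withUnicode, n));
const legacyResults = names.map((n) => matchesWhole(withoutUnicode, n));

console.log(unicodeResults); // every entry is true
console.log(legacyResults);  // every entry is false
```

Without the `u` flag, `\p{...}` is not treated as a Unicode property escape, so the pattern silently stops matching identifiers instead of throwing. That is presumably why the flag is set per grammar rather than left to each regex literal.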
2 changes: 1 addition & 1 deletion src/lib/mode_compiler.js
@@ -33,7 +33,7 @@ export function compileLanguage(language) {
   function langRe(value, global) {
     return new RegExp(
       regex.source(value),
-      'm' + (language.case_insensitive ? 'i' : '') + (global ? 'g' : '')
+      'm' + (language.case_insensitive ? 'i' : '') + (language.unicodeRegex ? 'u' : '') + (global ? 'g' : '')
     );
   }

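The flag assembly in `langRe` can be sketched in isolation as follows. `buildFlags` is a hypothetical standalone function, and the two language objects are plain stand-ins that carry only the properties read by the diff. This is not the library's compiled language structure.

```javascript
// Hypothetical standalone version of the flag string built inside langRe.
function buildFlags(language, global) {
  return 'm'
    + (language.case_insensitive ? 'i' : '')
    + (language.unicodeRegex ? 'u' : '')
    + (global ? 'g' : '');
}

// Stand-in language objects (assumed shape, only the relevant properties).
const python = { unicodeRegex: true };
const legacy = { case_insensitive: true };

console.log(buildFlags(python, true));  // "mug"
console.log(buildFlags(legacy, false)); // "mi"

// The resulting flags make \p{...} usable in the compiled expression.
const pythonRe = new RegExp('[\\p{XID_Start}_]\\p{XID_Continue}*', buildFlags(python, true));
const matches = 'def fóö():'.match(pythonRe);
console.log(matches); // [ 'def', 'fóö' ]
```

Grammars that do not set `unicodeRegex` get the same flags as before the change, so existing non-Unicode patterns are compiled exactly as they were.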
21 changes: 14 additions & 7 deletions test/markup/index.js
@@ -46,16 +46,23 @@ describe('highlight() markup', async() => {
     const markupPath = utility.buildPath('markup');

     if (!process.env.ONLY_EXTRA) {
-      const languages = await fs.readdir(markupPath);
+      let languages = null;
+      if (process.env.ONLY_LANGUAGES) {
+        languages = process.env.ONLY_LANGUAGES.split(" ");
+      } else {
+        languages = await fs.readdir(markupPath);
+      }
       languages.forEach(testLanguage);
     }

-    const thirdPartyPackages = await getThirdPartyPackages();
-    thirdPartyPackages.forEach(
-      (pkg) => pkg.names.forEach(
-        (name, idx) => testLanguage(name, { testDir: pkg.markupTestPaths[idx] })
-      )
-    );
+    if (!process.env.ONLY_LANGUAGES) {
+      const thirdPartyPackages = await getThirdPartyPackages();
+      thirdPartyPackages.forEach(
+        (pkg) => pkg.names.forEach(
+          (name, idx) => testLanguage(name, { testDir: pkg.markupTestPaths[idx] })
+        )
+      );
+    }
   });

   it("adding dynamic tests...", async function() {}); // this is required to work
23 changes: 23 additions & 0 deletions test/markup/python/diacritic_identifiers.expect.txt
@@ -0,0 +1,23 @@
<span class="hljs-keyword">def</span> <span class="hljs-title function_">fóö</span>():
    <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">def</span> <span class="hljs-title function_">bär</span>():
    <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">def</span> <span class="hljs-title function_">FOÖ</span>():
    <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">def</span> <span class="hljs-title function_">ÿay</span>():
    <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">class</span> <span class="hljs-title class_">fóö</span>():
    <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">class</span> <span class="hljs-title class_">bär</span>():
    <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">class</span> <span class="hljs-title class_">FOÖ</span>():
    <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">class</span> <span class="hljs-title class_">ÿay</span>():
    <span class="hljs-keyword">pass</span>
23 changes: 23 additions & 0 deletions test/markup/python/diacritic_identifiers.txt
@@ -0,0 +1,23 @@
def fóö():
    pass

def bär():
    pass

def FOÖ():
    pass

def ÿay():
    pass

class fóö():
    pass

class bär():
    pass

class FOÖ():
    pass

class ÿay():
    pass
4 changes: 3 additions & 1 deletion test/regex/lib/util.js
@@ -8,7 +8,9 @@ const { RegExpParser } = require('regexpp');
  * @typedef {{ pattern: Pattern, flags: Flags }} LiteralAST
  */

-const parser = new RegExpParser({ strict: false, ecmaVersion: 6 });
+const parser = new RegExpParser({ strict: false, ecmaVersion: 2018 });
+// ecmaVersion 2018 is ECMAScript 9
+
 /** @type {Map<string, LiteralAST>} */
 const astCache = new Map();

10 changes: 7 additions & 3 deletions tools/checkAutoDetect.js
@@ -58,12 +58,16 @@ function testAutoDetection(language, index, languages) {
   });
 }

-const languages = hljs.listLanguages()
-  .filter(hljs.autoDetection);
+let languages = null;
+if (process.env.ONLY_LANGUAGES) {
+  languages = process.env.ONLY_LANGUAGES.split(" ");
+} else {
+  languages = hljs.listLanguages().filter(hljs.autoDetection);
+}

 console.log('Checking auto-highlighting for ' + colors.grey(languages.length) + ' languages...');
 languages.forEach((lang, index) => {
-  if (index%60===0) { console.log("") }
+  if (index % 60 === 0) { console.log(""); }
   testAutoDetection(lang)
   process.stdout.write(".");
 });
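
The `ONLY_LANGUAGES` branch added here and in `test/markup/index.js` can be sketched as one small function. `selectLanguages` is a hypothetical helper written for illustration. The diff inlines this logic rather than sharing it.

```javascript
// Hypothetical helper mirroring the ONLY_LANGUAGES branch in the diff:
// a space-separated list in the environment overrides the full language list.
function selectLanguages(env, allLanguages) {
  if (env.ONLY_LANGUAGES) {
    return env.ONLY_LANGUAGES.split(' ');
  }
  return allLanguages;
}

// Illustrative stand-in for hljs.listLanguages() or the markup directory listing.
const all = ['cpp', 'css', 'python'];

const onlyPython = selectLanguages({ ONLY_LANGUAGES: 'python' }, all);
const pythonAndCpp = selectLanguages({ ONLY_LANGUAGES: 'python cpp' }, all);
const everything = selectLanguages({}, all);

console.log(onlyPython);   // [ 'python' ]
console.log(pythonAndCpp); // [ 'python', 'cpp' ]
console.log(everything);   // [ 'cpp', 'css', 'python' ]
```

From a shell, this corresponds to an invocation such as `ONLY_LANGUAGES="python cpp" npx mocha ./test/markup`. As in the diff, the names are not validated against the known languages, and the `hljs.autoDetection` filter is skipped when the variable is set.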
70 changes: 70 additions & 0 deletions tools/perf.js
@@ -0,0 +1,70 @@
#!/usr/bin/env node
const execSync = require('child_process').execSync;
const fs = require('fs');
const { performance } = require('perf_hooks');

const build = () => {
  console.log(`Starting perf tests, building hljs ... `);
  // build node.js version of library with CJS and ESM libraries
  execSync('npm run build', {
    cwd: '.',
    env: Object.assign(
      process.env
    )
  });
};

const timeTest = (name, func) => {
  process.stdout.write(` running ${name}...`);
  const t0 = performance.now();
  func();
  const t1 = performance.now();
  console.log(` done! [${((t1 - t0) / 1000).toFixed(2)}s elapsed]`);
};

const oneLanguageMarkupTests = (lang) => {
  for (let i = 0; i < 50; i++) {
    execSync('npx mocha ./test/markup', {
      cwd: '.',
      env: Object.assign(
        {}, process.env, // copy into a fresh object so process.env is not mutated
        { ONLY_LANGUAGES: lang }
      )
    });
  }
};

const oneLanguageCheckAutoDetect = (lang) => {
  for (let i = 0; i < 50; i++) {
    execSync('node ./tools/checkAutoDetect.js', {
      env: Object.assign(
        {}, process.env, // copy into a fresh object so process.env is not mutated
        { ONLY_LANGUAGES: lang }
      )
    });
  }
};

const globalCheckAutoDetect = () => {
  for (let i = 0; i < 5; i++) {
    execSync('node ./tools/checkAutoDetect.js');
  }
};

const highlightFile = (lang) => {
  const source = fs.readFileSync(`./tools/sample_files/${lang}.txt`, { encoding:'utf8' });
  const hljs = require('../build');
  for (let i = 0; i < 2000; i++) {
    hljs.highlight(source, {language: lang});
  }
};

const main = (lang) => {
  build();
  timeTest(`global checkAutoDetect`, globalCheckAutoDetect);
  timeTest(`${lang}-only markup tests`, () => oneLanguageMarkupTests(lang));
  timeTest(`${lang}-only checkAutoDetect`, () => oneLanguageCheckAutoDetect(lang));
  timeTest(`highlight large file`, () => highlightFile(lang));
};

main('python');