Skip to content

fix(html): regexp match script tag ignore comment #20341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

btea
Copy link
Collaborator

@btea btea commented Jul 3, 2025

Description

fix #20340
refs #18386

Copy link
Member

@sapphi-red sapphi-red left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think stripLiteral works with HTML files.
Maybe we can use parse5.Tokenizer to do something similar to stripLiteral but I'm not sure if that is performant than parsing + traversing the HTML directly.

@btea
Copy link
Collaborator Author

btea commented Jul 3, 2025

You are right. The local test result was correct by accident. The type value in the script tag was removed, resulting in the regular expression not matching.

Can we add another regular expression to match the comment content in HTML? 🤔

@sapphi-red
Copy link
Member

I guess we need to at least tokenize it to properly handle comments. In other words, I don't think regex is sufficient.

@btea btea requested a review from sapphi-red July 6, 2025 01:30
@sapphi-red
Copy link
Member

Would you check the performance difference? I'd like to know how much additional overhead this would have. Also we can check the HTML with regex before parsing so that we can skip the parse if that's faster.

@sapphi-red sapphi-red added feat: html p2-edge-case Bug, but has workaround or limited in scope (priority) feat: build labels Jul 7, 2025
@btea
Copy link
Collaborator Author

btea commented Jul 7, 2025

I will check it.

@btea
Copy link
Collaborator Author

btea commented Jul 8, 2025

I created a template project containing only js for testing through pnpm create vite.

I tested it locally and the results of 10 runs are as follows. It seems that when importmap does not exist, regular matching can be used to avoid a huge waste of traversal performance.

preImportMapHook(ms) postImportMapHook(ms)
no-importmap 14.69 ± 1.65 1.05 ± 0.10
no-importmap-with-regexp 0.48 ± 0.17 0.28 ± 0.17
have-importmap 15.63 ± 3.27 1.37 ± 0.18
have-importmap-with-regexp 15.48 ± 3.65 1.61 ± 0.26

I retested on a different device, and the updated results look reasonable.

The following table shows the original logical performance of Vite through regular matching.

preImportMapHook(ms) postImportMapHook(ms)
no-importmap 0.42 ± 0.11 0.28 ± 0.08
have-importmap 0.60 ± 0.09 0.30 ± 0.04

@sapphi-red
Copy link
Member

Thanks for checking the perf 💚

@sapphi-red sapphi-red changed the title fix: regexp match script tag ignore comment fix(html): regexp match script tag ignore comment Jul 9, 2025
Copy link
Member

@sapphi-red sapphi-red left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@btea btea requested a review from bluwy July 11, 2025 09:17
@bluwy
Copy link
Member

bluwy commented Jul 16, 2025

Trying to understand this a bit better, based on the perf result, isn't the regex one better? In most apps so far I think "no importmap" setups are more common.

To fix this with regex, couldn't we do html.replace(/<!--.*-->/gs, (s) => ' '.repeat(s.length)) since html comment stripping should be simpler? Maybe this would incur additional work but maybe not as much as doing a parse.

@sapphi-red
Copy link
Member

The original regex implementation is faster than the parsing approach. But with the "regex filter early return" (if (!importMapRE.test(html)) return), we can keep the perf for most apps (the setups without <script type="importmap"> as a string), as the parsing won't happen.
While we can strip html comments with regex to speed up the non-common apps, I think we can parse it as that would be more robost and this is not the common case.

@bluwy
Copy link
Member

bluwy commented Jul 16, 2025

Ah I missed that the PR still has an early regex check, so in practice this would only slow down setups using importmaps.

Still, I find 15ms a bit much for a step during html processing that only re-arranges some tags, it'll compound with other html processing we do. I would lean on keeping regex unless people are hitting problems often with it. I think thus far there weren't many issues.

@sapphi-red
Copy link
Member

@btea I wonder if the 15ms is coming from the actual parsing or this lazy load of parse5. Because I don't think preImportMapHook would take 15x time than postImportMapHook.

const { parse } = await import('parse5')

Also is this the total of 10 runs? or is it the average of it?

@btea
Copy link
Collaborator Author

btea commented Jul 16, 2025

Yes, I added const startTime = performance.now() and const endTime = performance.now() at the beginning and end of this function respectively, and calculated the results. The above values are the average and variance.

https://github.com/vitejs/vite/pull/20341/files#diff-89bae1df62862bb7f4a03d82a1e9cbf4ac6d0c042f21fbbacb0a2238bd050042L1158

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat: build feat: html p2-edge-case Bug, but has workaround or limited in scope (priority)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Commented-out importmap is uncommented during build
3 participants