SONARHTML-384 Skip non-HTML ERB files#675
SONARHTML-384 Skip non-HTML ERB files#675erwan-leforestier-sonarsource wants to merge 3 commits into
Conversation
Ruling ReportThe following ruling changes are in this PR: Rule:
|
|
| } | ||
|
|
||
| @Test | ||
| void erb_with_recognized_non_html_intermediate_extension_should_be_analyzed() { |
There was a problem hiding this comment.
💡 Quality: Test name is misleading: says 'non_html' but tests recognized ext
The test erb_with_recognized_non_html_intermediate_extension_should_be_analyzed is confusingly named. It tests that script.php.erb IS analyzed because .php is a recognized extension. The word 'non_html' makes it sound like the file should NOT be analyzed, but the assertion confirms it should. A clearer name would reflect that this tests a recognized (non-strictly-HTML) intermediate extension being properly kept.
Remove 'non_html' from the name since PHP is recognized by sonar-html and the file should be analyzed:
@Test
void erb_with_recognized_intermediate_extension_should_be_analyzed() {
- Apply fix
Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎
🤖 Generated with GitHub Actions
Code Review 👍 Approved with suggestions 0 resolved / 1 findingsNarrows .erb file selection to exclude non-HTML Ruby templates, resolving spurious analysis errors. Rename the misleading 'non_html' test case to accurately reflect the validated file extension logic. 💡 Quality: Test name is misleading: says 'non_html' but tests recognized ext📄 sonar-html-plugin/src/test/java/org/sonar/plugins/html/core/HtmlSensorTest.java:355 The test Remove 'non_html' from the name since PHP is recognized by sonar-html and the file should be analyzed🤖 Prompt for agentsOptionsAuto-apply is off → Gitar will not commit updates to this branch. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
I checked Initially the target was to stop analyzing files like So I think the limitation is that we are still purely guessing from the filename, whether that logic sits in Concretely, I mean something in this spirit: String clean = firstChunk.replaceAll("<%[\\s\\S]*?%>", " ");
boolean strongHtml = Pattern.compile("<!DOCTYPE\\s+html\\b|<(?:html|head|body)\\b", Pattern.CASE_INSENSITIVE)
.matcher(clean)
.find();
long weakHtmlTags = Pattern.compile("<(?:div|span|p|a|ul|ol|li|table|tr|td|th|form|input|textarea|select|button|script|style|section|article|header|footer|nav|main)\\b", Pattern.CASE_INSENSITIVE)
.matcher(clean)
.results()
.count();
boolean likelyHtml = strongHtml || weakHtmlTags >= 2;The exact regex is not the important part; the point is that this is still computationally cheap: no parser, no full-file read required, just a couple of linear scans on the first |




Summary
Narrow
.erbfile selection in sonar-html: an ERB template is only analyzed when its intermediate extension (before.erb) is one sonar-html already recognizes — i.e. it belongs to one of the existing layers (HTML language suffixes, JSP language suffixes, or the hard-codedOTHER_FILE_SUFFIXESinHtmlSensor). This dropsDockerfile.erb,config.yml.erb,app.js.erb, etc. without introducing a per-filename blocklist.Changes
ErbFileFilter.shouldSkip(filename, recognizedExtensions)— decides whether a.erbfile lacks a recognized double extension and should be skipped.HtmlSensorbuilds the recognized set fromsonar.html.file.suffixes+sonar.jsp.file.suffixes+OTHER_FILE_SUFFIXES(with the constants' defaults as a floor) and skips files flagged by the filter at debug level.HtmlSensor(skipDockerfile.erb,config.yml.erb,index.erb; analyzepage.html.erb,script.php.erb).Functional Validation
Attached: SONARHTML-384-fv.zip
Unzip and run:
./run.sh
Expected output is in
expected-output.txt. The README shows thebefore/after comparison so you can reproduce the difference directly.