-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search results for "UTF-8 with BOM" files shifted on first line by a character #66189
Conversation
@@ -40,7 +40,7 @@ export class Match { | |||
private _fullPreviewRange: ISearchRange; | |||
|
|||
constructor(private _parent: FileMatch, private _fullPreviewLines: string[], _fullPreviewRange: ISearchRange, _documentRange: ISearchRange) { | |||
this._oneLinePreviewText = _fullPreviewLines[_fullPreviewRange.startLineNumber]; | |||
this._oneLinePreviewText = stripUTF8BOM(_fullPreviewLines[_fullPreviewRange.startLineNumber]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a UTF-8 with BOM file and _fullPreviewRange.startLineNumber
is 0, _fullPreviewLines[0]
will starts with BOM, then _fullPreviewLines
does not match _fullPreviewRange
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the BOM should be stripped already by ripgrepTextSearchEngine right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move it to ripgrepTextSearchEngine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's already there, right? That file should be stripping the BOM correctly in all cases with your change.
matchText = stripUTF8BOM(matchText); | ||
startCol -= 3; | ||
endCol -= 3; | ||
startCol -= 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change this? The BOM is 3 bytes long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there is toString()
here, Buffer.from([0xEF, 0xBB, 0xBF]).toString().length
is 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's correct thanks.
I found another solution: Remove then ripgrep will not output BOM character in JSON. Or I can manually strip the BOM character in ripgrepTextSearchEngine. Which do you prefer? |
I want to keep that because I found that search was faster in some cases without it. Also, you could probably search with a different encoding but still end up finding results in UTF8 files with BOMs. Also I don't trust ripgrep to keep that behavior forever. So, let's keep the check to strip the BOM on our end. |
I stripped |
Looks great, thanks! |
Fix #66188
This is my first pull request in this project. Please let me know if my solution is not good enough, I am willing to improve it.