This repository has been archived by the owner on Jul 30, 2020. It is now read-only.

False Positive: Source file is not valid UTF-8 #57

Closed
DaanDeMeyer opened this issue Nov 19, 2017 · 14 comments

@DaanDeMeyer
Contributor

DaanDeMeyer commented Nov 19, 2017

I sometimes get a 'source file is not valid UTF-8' error when editing C++ code with cquery. Haven't been able to find out how to reproduce it.

Commenting/uncommenting the line doesn't remove the error. Removing all the code in the file and pasting it back does remove the error.

@topisani
Contributor

I get this too, but only in VS Code, never in Emacs.

@DaanDeMeyer
Contributor Author

I think I've reproduced it.

It happens whenever I type in a character such as 'à' or 'é' by accident.

@jacobdufault
Owner

What happens if you compile the file using clang after having typed the character? Does it complain?

@DaanDeMeyer
Contributor Author

Clang compiles without warnings or errors

@agauniyal
Contributor

Depends on which version is being used to compile; it could be clang 5, whereas this plugin is using clang 4.

@topisani
Contributor

It's definitely a bug: it compiles fine and, as mentioned, only happens in the VS Code client.

@MaskRay
Contributor

MaskRay commented Dec 8, 2017

Fixed?

@jhasse
Contributor

jhasse commented Dec 11, 2017

> Fixed?

Still happens for me when using umlauts in literals for example.

@MaskRay
Contributor

MaskRay commented Jan 1, 2018

Example source file?

@Riatre
Contributor

Riatre commented Jan 11, 2018

I can confirm that this still happens. How to reproduce:

  1. Open a source file with VSCode client.
  2. Type in some non-ASCII character (in my case, "（", though anything that is not a single byte should work).
  3. cquery reports "source file is not valid UTF-8", which is unexpected, but reasonable.
  4. Delete that character. cquery still reports "source file is not valid UTF-8", which is unexpected. Reloading VSCode entirely fixes this.

My guess is it might be a bug in vscode client.

@topisani
Contributor

> My guess is it might be a bug in vscode client.

Nope, it happens in Emacs too. Often it works for me to just delete the line I entered the bad character on and paste it back (saving in between to trigger a reindex).

@Riatre
Contributor

Riatre commented Jan 11, 2018

Seems like this is caused by the fact that all the offsets and lengths in the Language Server Protocol are given as counts of UTF-16 code units instead of bytes, and we treat them as bytes when updating WorkingFile.buffer_content (src/working_files.cc:333).

Sounds difficult to fix without introducing UTF-16-aware std::string indexing code.

Edit: Nope, it's not UTF-8, it's UTF-16 as per specification. But TextDocumentContentChangeEvent.text is sent in UTF-8. (╯‵□′)╯︵┻━┻
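
To make the mismatch concrete, here is a minimal sketch of the conversion that would be needed before indexing into a UTF-8 buffer. It is not cquery's actual code: the function name `ByteOffsetFromUtf16Column` is hypothetical, and it assumes the input is one line of valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from cquery): map an LSP column, counted in
// UTF-16 code units, to a byte offset within one line of UTF-8 text.
// Assumes `line` is valid UTF-8. Columns past the end clamp to line.size().
std::size_t ByteOffsetFromUtf16Column(const std::string& line,
                                      std::size_t utf16_column) {
  std::size_t byte = 0;   // current position in bytes
  std::size_t units = 0;  // UTF-16 code units consumed so far
  while (byte < line.size() && units < utf16_column) {
    unsigned char lead = static_cast<unsigned char>(line[byte]);
    std::size_t len;
    if (lead < 0x80) len = 1;              // 0xxxxxxx: ASCII
    else if ((lead >> 5) == 0x6) len = 2;  // 110xxxxx
    else if ((lead >> 4) == 0xE) len = 3;  // 1110xxxx
    else len = 4;                          // 11110xxx
    // A 4-byte sequence encodes a code point above U+FFFF, which UTF-16
    // represents as a surrogate pair, i.e. two code units.
    units += (len == 4) ? 2 : 1;
    byte += len;
  }
  return byte < line.size() ? byte : line.size();
}
```

For the line "aé（b", column 3 (just before "b") maps to byte offset 6, because "é" takes 2 bytes and "（" takes 3. Treating the column as a byte offset would instead land in the middle of "（", which is how a buffer can end up as invalid UTF-8.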

@Riatre
Contributor

Riatre commented Jan 11, 2018

And what's worse, lsp-mode has no idea about these UTF-16 things, so positions coming from Emacs would be in UTF-8 characters.

Maybe we could use a UTF-8 iterator over std::string when working on buffer_content. This still breaks whenever there are 4-byte UTF-8 characters (at that point Visual Studio Code disagrees with lsp-mode on how many "characters" there are), but being unable to insert emoji should be less annoying than having to restart cquery after accidentally typing à or （.
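
The disagreement on 4-byte characters can be illustrated with a short sketch. Both helper names below are hypothetical, and both assume valid UTF-8 input; the first counts what a code-point-based client would report, the second what a UTF-16-based client would report.

```cpp
#include <cstddef>
#include <string>

// Illustrative helpers (names are hypothetical). Both assume valid UTF-8.

// Number of Unicode code points: count every byte that is not a
// continuation byte (10xxxxxx).
std::size_t CountCodePoints(const std::string& s) {
  std::size_t n = 0;
  for (unsigned char c : s)
    if ((c & 0xC0) != 0x80) ++n;
  return n;
}

// Number of UTF-16 code units: one per code point, plus one extra for each
// 4-byte sequence (lead byte 11110xxx), which needs a surrogate pair.
std::size_t CountUtf16Units(const std::string& s) {
  std::size_t n = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++n;
    if ((c & 0xF8) == 0xF0) ++n;
  }
  return n;
}
```

For "à" (2 bytes) both functions return 1, so a code-point iterator gives the same positions either way. For an emoji such as "😿" (4 bytes) the first returns 1 and the second returns 2, which is exactly the case where the two clients' positions diverge.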

MaskRay added a commit that referenced this issue Jan 13, 2018
@MaskRay MaskRay added the vscode label Jan 13, 2018
@MaskRay
Contributor

MaskRay commented Jan 13, 2018

Thanks @Riatre for troubleshooting. Emacs lsp-mode is good now. Don't use emojis 😿 in VSCode.
