invalid byte sequence in US-ASCII #33

Closed
JeffLuckett opened this issue Apr 6, 2018 · 36 comments

@JeffLuckett

Environment Info:
MacOS High Sierra
VS Code: 1.22.1 (1.22.1)
Solargraph Gem: 0.18.1
Ruby Solargraph Plugin: 0.14.1
Ruby: 2.4.1p111 (Large-ish Rails app)
[Screenshot taken 2018-04-06 at 9:58 AM]

Let me know if you need more info, or if you have any suggestions.

@castwide
Owner

castwide commented Apr 6, 2018

Is there a particular file that gives you this error? The most likely cause is a literal string with escaped characters, e.g., "\xcf".
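
To illustrate that likely cause, here is a minimal Ruby sketch (not Solargraph's code; `match_error` is a hypothetical helper). Matching a regular expression against a string whose bytes are invalid in its encoding raises `ArgumentError`. Re-tagging non-ASCII bytes as US-ASCII, as can happen when Ruby's default external encoding is US-ASCII, produces the message in this issue's title.

```ruby
# A literal with an escaped byte that is not valid UTF-8 on its own.
utf8 = "\xcf"
puts utf8.encoding        # UTF-8
puts utf8.valid_encoding? # false

# Hypothetical helper: return the ArgumentError message from a regexp
# match, or nil if the match runs without raising.
def match_error(str)
  str =~ /foo/
  nil
rescue ArgumentError => e
  e.message
end

puts match_error(utf8) # invalid byte sequence in UTF-8

# "é" is two bytes in UTF-8. Re-tagging those bytes as US-ASCII breaks them.
ascii = "\u00e9".b.force_encoding(Encoding::US_ASCII)
puts match_error(ascii) # invalid byte sequence in US-ASCII
```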

I'm not able to reproduce this error in Windows. I'll try on MacOS later.

@JeffLuckett
Author

JeffLuckett commented Apr 7, 2018

@castwide - I'm guessing that we've got something like that somewhere in the code base, but unfortunately the error message just isn't specific about where the error occurred.

The project is ~250K lines of code in ~2K .rb files, not including gems, etc...

@castwide
Owner

castwide commented Apr 7, 2018

No problem. Even in the unlikely event that I can't reproduce the error, at least I can add the name of the offending file to the message.

@TheTharin

TheTharin commented Apr 10, 2018

But is there a way to ignore it in the future? I can't just go through the project adding # encoding comments to every file Solargraph finds offending.

@castwide
Owner

Yes, I think I can make the parser ignore it. There are already similar character encoding issues that it's able to resolve. I'll still make it report the file name in the event of an edge case it can't handle.
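
One common way to make analysis tolerate such files, sketched here under the assumption that only an in-memory copy of the source is analyzed, is Ruby's `String#scrub`. It replaces invalid byte sequences before any regexp work happens. The `sanitize` helper is hypothetical, not Solargraph's API.

```ruby
# Hypothetical helper: return a copy of the source with every invalid byte
# sequence replaced, so regexp-based analysis cannot raise ArgumentError.
def sanitize(source, replacement = "\uFFFD")
  return source if source.valid_encoding?
  source.scrub(replacement)
end

raw = "puts 'caf\xE9' # comment\n" # 0xE9 is "é" in ISO-8859-1, invalid alone in UTF-8
clean = sanitize(raw)
puts clean.valid_encoding?       # true
puts clean.match?(/#\s*comment/) # true, and no ArgumentError
```

Only the in-memory string changes; the file on disk is left untouched.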

@castwide
Owner

Reproduced on MacOS. I'm looking into solutions.

@castwide
Owner

A fix has been pushed to the master branch. The parser should never fail. Even if the file isn't valid Ruby code, Solargraph should add it to the workspace and report the problems in diagnostics. Character encoding should be a non-issue. It's possible for Ruby code to contain string literals with invalid characters, but Solargraph shouldn't care as long as it doesn't try to write back to the file, which it never does.
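
A rough sketch of the "never fail, report diagnostics" behavior, using Ruby's standard-library `Ripper` as a stand-in parser (an assumption; Solargraph may parse differently). `diagnostics_for` is a hypothetical helper that collects messages instead of raising.

```ruby
require "ripper"

# Hypothetical helper: check a file's source without raising and return a
# list of diagnostic messages, so the file can still join the workspace.
def diagnostics_for(path, source)
  problems = []
  unless source.valid_encoding?
    problems << "#{path}: invalid byte sequence in #{source.encoding}"
    source = source.scrub # keep going with a repaired in-memory copy
  end
  # Ripper.sexp returns nil when the source has a syntax error.
  problems << "#{path}: syntax error" if Ripper.sexp(source).nil?
  problems
end

p diagnostics_for("ok.rb", "x = 1\n")        # []
p diagnostics_for("broken.rb", "def foo(\n") # ["broken.rb: syntax error"]
```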

I'll update the gem in the next day or so.

@castwide
Owner

Version 0.18.3 is published.

@JeffLuckett
Author

JeffLuckett commented Apr 12, 2018

Thanks @castwide, instead of the original error, I now receive [NoMethodError] undefined method 'node' for nil:NilClass

[Screenshot taken 2018-04-12 at 10:24 AM]

Would you prefer I open a new issue on this?

@castwide
Owner

@JeffLuckett Are you still getting that error in the current version (0.19.1)?

@JeffLuckett
Author

@castwide - no, now I get:
[Screenshot taken 2018-04-18 at 1:07 PM]

However, I've only got about 2K files in the repo that are .rb files.

=> git ls-files | grep .rb | wc -l
    1981

@castwide
Owner

Do you have a .solargraph.yml file? If so, does the include section select files that do not have the .rb extension?

@JeffLuckett
Author

There is no .solargraph.yml file in my project root, or in my home dir.

@castwide
Owner

Is the project root your open folder in VSCode? Solargraph treats whatever folder you open as the workspace.

Does the project have cache or vendor/cache directories, or something similar that contains bundled gems? If so, you might need to create a .solargraph.yml and add them to the exclude section. You can generate a default config by running solargraph config from the command line or selecting Create a Solargraph config file from the VSCode command palette.

Also, this command should give you a more accurate idea of what gets added to the workspace:

find . -name "*.rb" -not -path "./spec/*" -not -path "./test/*" | wc -l

The default is to include all ruby files except the spec and test folders.
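
For comparison, a Ruby sketch of the same count, assuming Ruby 2.5+ for the `base:` keyword of `Dir.glob`. `count_workspace_files` is a hypothetical name, not part of Solargraph.

```ruby
# Hypothetical helper: count .rb files under root, skipping the top-level
# spec/ and test/ directories, similar to the find command above.
def count_workspace_files(root = ".", excluded = %w[spec test])
  Dir.glob("**/*.rb", base: root).count do |path|
    excluded.none? { |dir| path.start_with?("#{dir}/") }
  end
end
```

Called from the project root, `count_workspace_files` should roughly match the `find` count. One difference: `Dir.glob` skips hidden directories such as `.bundle` unless `File::FNM_DOTMATCH` is passed, whereas `find` descends into them, so the two numbers can differ.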

@JeffLuckett
Author

I ran solargraph config and then added - vendor/**/* to the exclude section. That seems to have made it happy.

Thanks for sticking with me on this :D

@castwide
Owner

Glad you got it working. This made me think of two more items for the to-do list.

  • Add vendor to the list of folders excluded by default.
  • Implement a solargraph subcommand called something like ls-workspace that lists all the files that would be mapped based on the current configuration.
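
Both to-do items can be sketched together in Ruby: a config whose exclude list includes `vendor/**/*`, and a listing function in the spirit of the proposed `ls-workspace` subcommand. The config keys mirror the include/exclude sections discussed above; `ls_workspace` and its matching rules (`File.fnmatch?` with `FNM_PATHNAME`) are assumptions, not Solargraph's implementation.

```ruby
require "yaml"

# Hypothetical config mirroring the include/exclude sections, with vendor
# added to the default excludes.
CONFIG = YAML.safe_load(<<~YML)
  include:
    - "**/*.rb"
  exclude:
    - spec/**/*
    - test/**/*
    - vendor/**/*
YML

# FNM_PATHNAME makes "**/" match zero or more directories.
FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# List the files that would be mapped: matched by some include pattern and
# by no exclude pattern. Paths are returned relative to root, sorted.
def ls_workspace(root, config = CONFIG)
  Dir.glob("**/*", base: root).select do |path|
    next false unless File.file?(File.join(root, path))
    config["include"].any? { |pat| File.fnmatch?(pat, path, FLAGS) } &&
      config["exclude"].none? { |pat| File.fnmatch?(pat, path, FLAGS) }
  end.sort
end
```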

@JeffLuckett
Copy link
Author

This issue is resolved, thanks again ... closing.

@AeroCross
Copy link

AeroCross commented May 14, 2018

Hi! I think I may have found an edge case.

I'm using Solargraph 0.21 and am still getting this error. This is a huge codebase, though (about 9K Ruby files). I've reduced the file count to fewer than 5,000 so the server starts, but I'm still struggling to get it working, since I'm finding a lot of non-ASCII characters in comments, tests, etc.

To check which non-ASCII characters are present in my codebase, I ran the following:

$ ag --ruby "[\x80-\xFF]" . --stats-only
3382 matches
149 files contained matches
9451 files searched
19659019 bytes searched
0.363241 seconds
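
The `ag` search flags every byte above 0x7F, including well-formed UTF-8 such as "ö". Below is a Ruby sketch that narrows the list to files whose bytes fail UTF-8 validation, which are the ones a reader expecting UTF-8 would reject. It uses a hypothetical helper and assumes Ruby 2.5+.

```ruby
# Hypothetical helper: list .rb files (relative to root) whose raw bytes
# are not valid UTF-8.
def invalid_utf8_files(root = ".")
  Dir.glob("**/*.rb", base: root).sort.reject do |path|
    bytes = File.binread(File.join(root, path)) # raw bytes, ASCII-8BIT
    bytes.force_encoding(Encoding::UTF_8).valid_encoding?
  end
end
```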

The characters I found include ö, among others. In another codebase I work on (which doesn't have non-ASCII characters), this doesn't occur.

These are contained in Ruby files, so I can't just get rid of them (we're specifically testing for issues with those characters, so I can't change the code to remove that).

I also notice that autocompletion still works, but only in a very barebones way: no inline documentation, no differentiation between methods, constants, variables, etc., and no symbol information for Ruby files.

I am using VS Code, if this helps, on Mac OS X 10.12.6.

$ code --version
1.23.1
d0182c3417d225529c6d5ad24b7572815d0de9ac
x64

$ solargraph --version
0.21.0

@747

747 commented May 19, 2018

Similar error here. I started getting "invalid byte sequence in Windows-31J" after a recent update.

[Error - 14:29:36] Server initialization failed.
  Message: [ArgumentError] invalid byte sequence in Windows-31J
  Code: -32603 
> solargraph -v
0.21.1
> code --version
1.23.1
d0182c3417d225529c6d5ad24b7572815d0de9ac
x64
> [System.Environment]::OSVersion

Platform ServicePack Version      VersionString
-------- ----------- -------      -------------
 Win32NT             10.0.16299.0 Microsoft Windows NT 10.0.16299.0
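
Windows-31J is Ruby's name for Microsoft's Shift_JIS variant (code page 932). Files read without an explicit encoding are tagged with `Encoding.default_external`, which depends on the platform and locale, so a UTF-8 file can end up labeled Windows-31J. One possible remedy, sketched under the assumption that the workspace files are UTF-8, is to read them with an explicit encoding (`read_source` is a hypothetical helper):

```ruby
# Platform- and locale-dependent, e.g. Windows-31J on a Japanese-locale
# Windows system, or US-ASCII when no UTF-8 locale is set.
puts Encoding.default_external

# Hypothetical helper: read a source file as UTF-8 explicitly instead of
# relying on the default external encoding.
def read_source(path)
  File.read(path, encoding: "UTF-8")
end
```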

@castwide
Owner

@747 I started #63 to track character encoding issues.

@tekknovator

Hi!
#63 has been closed, but the issues persist. I work with scripts that generate some bash with osascript calls (among other stuff), which require "¬" characters at some line endings. Unfortunately, there is no way around that.
So solargraph keeps failing with
"Failed to start Solargraph: Error: [ArgumentError] invalid byte sequence in UTF-8"
I might just add encode or switch back to rubyLocate. Solargraph is (feels) faster though, would be nice to have.
Any ideas anyone?

@castwide
Owner

@tekknovator Can you provide a minimal example that triggers the error? I've found at least one bug in how certain UTF-8 characters get handled (including "¬"), but it doesn't make the server fail to start.

@dwarfi09

The problem seems to have reappeared in version 0.31.0 of the solargraph gem.

I downgraded to version 0.30.2, and solargraph-server starts inside Atom.

@AlanWarren

Thanks, I experienced this with a rails project today using solargraph-0.31.3. Downgrading to 0.30.2 fixed the issue.

Is there any way to see which files Solargraph is parsing when it hits the invalid byte sequence and crashes?

@castwide
Owner

@AlanWarren I'm working on an update that will report which file triggered the error. Hopefully it will also be able to skip offending files and parse the rest of the workspace.
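
A sketch of the skip-and-report approach described here. A hypothetical `map_workspace` loop stands in for Solargraph's mapper: each file is processed in its own `begin`/`rescue`, so a failure is reported with the file's path and the remaining files are still mapped.

```ruby
# Hypothetical mapping loop (not Solargraph's mapper): each file is handled
# on its own, so one failure is reported with its path and then skipped.
def map_workspace(paths)
  mapped = {}
  paths.each do |path|
    begin
      mapped[path] = yield(path)
    rescue StandardError => e
      warn "Skipping #{path}: [#{e.class}] #{e.message}"
    end
  end
  mapped
end

# In-memory stand-ins for workspace files; bad.rb holds an ISO-8859-1 byte.
sources = {
  "good.rb" => "class Foo; end\n",
  "bad.rb" => "class Bar; end # D\xCCaz\n"
}
result = map_workspace(sources.keys) do |path|
  sources[path].scan(/\bclass\s+(\w+)/).flatten # raises on bad.rb
end
p result
```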

Can anyone who's experiencing this error let me know if there's a backtrace or any additional information in the console log? (In VS Code, Help -> Toggle Developer Tools; in Atom, View -> Developer -> Toggle Developer Tools.)

@dwarfi09

@castwide I just sent the console-output via email to you.

@castwide
Owner

castwide commented Apr 2, 2019

Thanks, @kkneutgen. That was exactly what I needed. I was able to trace the problem to a regular expression match that choked on invalid character encoding.

To reproduce the error, I created a file that contained UTF-8 (e.g., '¬'), converted its contents to ANSI with an external tool, and ran the file through the Solargraph source mapper.
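
The same reproduction can be sketched in pure Ruby, assuming "ANSI" means Windows-1252 (the usual code page on Western-locale Windows). In that encoding "¬" is the single byte 0xAC, which is not valid on its own in UTF-8. A regexp match on those bytes re-read as UTF-8 therefore fails.

```ruby
# Start with UTF-8 text containing "¬" (U+00AC), as used for AppleScript
# line continuations, and convert it to Windows-1252 ("ANSI").
original = "set x to 1 \u00AC\n"
ansi_bytes = original.encode(Encoding::Windows_1252).b
puts ansi_bytes.bytes.include?(0xAC) # true: "¬" is the single byte 0xAC

# Treat those bytes as UTF-8, as a reader expecting UTF-8 would.
as_utf8 = ansi_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # false: 0xAC is a stray continuation byte

# A regexp match now raises ArgumentError.
error = begin
  as_utf8 =~ /\u00AC/
  nil
rescue ArgumentError => e
  e.message
end
puts error # invalid byte sequence in UTF-8
```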

I think I'll be able to have a fix for this in the next minor release.

@castwide
Owner

castwide commented Apr 3, 2019

The master branch has an update that might resolve this issue. It fixes the problem in the variation of the error I reproduced yesterday.

@castwide
Owner

castwide commented Apr 3, 2019

One more change: exceptions during the mapping process will emit warnings that identify the file where the problem occurred.

@castwide
Owner

castwide commented Apr 4, 2019

Gem v0.32.0 includes the latest fix for invalid byte sequences and the new mapper exception handling.

@dwarfi09

dwarfi09 commented Apr 4, 2019

Great, thank you. No more problems with strange encodings here.

@castwide castwide closed this as completed May 9, 2019
@tekknovator

Works here as well now. Sorry for not responding earlier. I used the characters inside of a heredoc.

@sent-hil

sent-hil commented Nov 8, 2019

I'm getting the same error in version 0.37.2. Only version I can get to work is 0.31.3.

~/w/f/bevel:develop$ code --version
1.40.0
86405ea23e3937316009fc27c9361deee66ffbf5
x64
ag --ruby "[\x80-\xFF]" . --stats-only
265 matches
11 files contained matches
1519 files searched
2217378 bytes searched
0.041453 seconds
ruby --version
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin18]

@castwide
Owner

castwide commented Nov 8, 2019

@sent-hil Can you give me a simple example of a file that causes the error, the character set it uses, and a sequence of steps to cause it if it doesn't happen on startup? I haven't been able to reproduce it.

@apsoto

apsoto commented Nov 1, 2020

I see invalid byte sequence in UTF-8 (ArgumentError) with the following text

# DÌaz

Example:

$ solargraph --version
0.39.17
[/tmp/foo git:() ]
$ ls
fails.rb*
[/tmp/foo git:() ]
$ cat fails.rb
# D�az

[/tmp/foo git:() ]
$ solargraph scan
Traceback (most recent call last):
	15: from /Users/apsoto/.rvm/gems/ruby-2.7.0/bin/solargraph:5:in `<main>'
	14: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
	13: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
	12: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
	11: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/thor-1.0.1/lib/thor/command.rb:27:in `run'
	10: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/shell.rb:172:in `scan'
	 9: from /Users/apsoto/.rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/benchmark.rb:293:in `measure'
	 8: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/shell.rb:173:in `block in scan'
	 7: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/api_map.rb:170:in `load'
	 6: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/api_map.rb:67:in `catalog'
	 5: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/api_map.rb:67:in `each'
	 4: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/api_map.rb:85:in `block in catalog'
	 3: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/source_map.rb:156:in `map'
	 2: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/source_map/mapper.rb:50:in `map'
	 1: from /Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/source_map/mapper.rb:27:in `map'
/Users/apsoto/.rvm/gems/ruby-2.7.0/gems/solargraph-0.39.17/lib/solargraph/source_map/mapper.rb:201:in `process_comment_directives': invalid byte sequence in UTF-8 (ArgumentError)
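
The `DÌaz` in the comment and the `�` in the `cat` output suggest the file was saved in a single-byte encoding such as ISO-8859-1, where "Ì" is byte 0xCC. Below is a sketch of a hypothetical fallback policy (not something Solargraph is known to do): use the bytes as UTF-8 when they are valid, and otherwise transcode from ISO-8859-1.

```ruby
# Hypothetical policy: keep the bytes as UTF-8 when they are valid UTF-8,
# otherwise assume ISO-8859-1, where every byte maps to a character.
def decode_source(bytes)
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "# D\xCCaz\n".b   # "Ì" saved as ISO-8859-1 is the single byte 0xCC
puts decode_source(raw) # # DÌaz
```

The fallback is a guess. Windows-1252 and ISO-8859-1 differ in the 0x80 to 0x9F range, so a file in another encoding can decode to the wrong characters without raising any error.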

@bubbavox

bubbavox commented Feb 3, 2021

For what it's worth to other googlers, I was getting the following error in VS Code with Solargraph 0.40.2:
Failed to start Solargraph: Error: [ArgumentError] invalid byte sequence in UTF-8

I traced the problem to a Ruby file containing a non-ASCII character in a comment - �
This Ruby file wasn't even open in VS Code; it was just in a folder within the group of folders open in my VS Code workspace.
