Ruby chokes on Windows/Russian #348

Closed
the-Arioch opened this issue Jul 18, 2023 · 2 comments

the-Arioch commented Jul 18, 2023

I wanted to get my feet wet with AsciiDoc. I am not sure i will need Ruby at all; maybe the VSCode extension would be enough. But i thought better safe than sorry, ran winget install "ruby 3.2", and tried gem install asciidoc.

...actually, i just tried gem from powershell prompt.

Some background: being a "classic" desktop dev, i know zilch about Ruby, but i can speak Win32 API at the "flat C API" level.

So, here we go:

PS C:\> gem
C:/Ruby32-x64/lib/ruby/3.2.0/rubygems.rb:1342:in `rescue in <top (required)>': U+2014 to IBM866 in conversion from UTF-16LE to UTF-8 to IBM866 (Encoding::UndefinedConversionError)
Loading the C:/Ruby32-x64/lib/ruby/3.2.0/rubygems/defaults/operating_system.rb file caused an error. This file is owned by your OS, not by rubygems upstream. Please find out which OS package this file belongs to and follow the guidelines from your OS to report the problem and ask for help.
        from C:/Ruby32-x64/lib/ruby/3.2.0/rubygems.rb:1328:in `<top (required)>'
        from <internal:gem_prelude>:2:in `require'
        from <internal:gem_prelude>:2:in `<internal:gem_prelude>'
C:/Ruby32-x64/lib/ruby/3.2.0/win32/registry.rb:910:in `encode': U+2014 to IBM866 in conversion from UTF-16LE to UTF-8 to IBM866 (Encoding::UndefinedConversionError)
        from C:/Ruby32-x64/lib/ruby/3.2.0/win32/registry.rb:910:in `export_string'
        from C:/Ruby32-x64/lib/ruby/3.2.0/win32/registry.rb:611:in `each_key'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/msys2_installation.rb:71:in `block (2 levels) in iterate_msys_paths'
        from C:/Ruby32-x64/lib/ruby/3.2.0/win32/registry.rb:435:in `open'
        from C:/Ruby32-x64/lib/ruby/3.2.0/win32/registry.rb:542:in `open'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/msys2_installation.rb:70:in `block in iterate_msys_paths'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/msys2_installation.rb:68:in `each'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/msys2_installation.rb:68:in `iterate_msys_paths'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/msys2_installation.rb:102:in `msys_path'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/msys2_installation.rb:115:in `mingw_bin_path'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/msys2_installation.rb:125:in `enable_dll_search_paths'
        from C:/Ruby32-x64/lib/ruby/site_ruby/3.2.0/ruby_installer/runtime/singleton.rb:27:in `enable_dll_search_paths'
        from C:/Ruby32-x64/lib/ruby/3.2.0/rubygems/defaults/operating_system.rb:24:in `<top (required)>'
        from C:/Ruby32-x64/lib/ruby/3.2.0/rubygems.rb:1332:in `require'
        from C:/Ruby32-x64/lib/ruby/3.2.0/rubygems.rb:1332:in `<top (required)>'
        from <internal:gem_prelude>:2:in `require'
        from <internal:gem_prelude>:2:in `<internal:gem_prelude>'

I have Git on my pc, which works like a charm being built with the said MSYS2 runtime, so the problem is not there.

U+2014 is EM DASH, and for the DOS codepage it could of course be reduced to a plain 7-bit ASCII "minus" (U+002D), as it always was in pre-IBM-PC times. That said, i am not sure the conversion is needed at all.
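To illustrate the point about U+2014, here is a minimal sketch (not code from RubyInstaller) of what the strict conversion does and how a fallback table could fold the dash into U+002D. The method names and the sample string are made up for illustration; only String#encode and its fallback: option are standard Ruby.

```ruby
# Strict conversion: returns the error message, or nil if the conversion succeeds.
def strict_oem_error(str)
  str.encode(Encoding::IBM866)
  nil
rescue Encoding::UndefinedConversionError => e
  e.message
end

# Lenient conversion: fold EM DASH into ASCII hyphen-minus (U+002D).
# Any other unmappable character would still raise.
def fold_dash_to_oem(str)
  str.encode(Encoding::IBM866, fallback: { "\u2014" => "-" })
end

sample = "Foo \u2014 Bar"   # hypothetical key name containing an em dash
puts strict_oem_error(sample)
puts fold_dash_to_oem(sample)
```

Whether folding is the right policy is a separate question; the sketch only shows that the failure comes from the strict default of String#encode.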

Well, i tried to read the code...

msys2_installation.rb

      ].each do |reg_root, base_key|
        begin
          reg_root.open(backslachs(base_key)) do |reg|

If i read the diagnostic correctly, this is where it chokes.

There is an if subreg['DisplayName'] =~ /^MSYS2 / check later, but it feels like it never gets there.

  • issue 1: the diagnostic should include the specific registry path it choked on; there are four starting points for the loop and i do not know which one barfed.

For example i have VSCode installed (HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{771FD6B0-FA20-440A-A002-3B3BAC16DC50}_is1) and i have Python (HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{3d45edf4-44bb-483f-9e08-43c38c81e118}) with DisplayName set as Python 3.11.4 (64-bit) and even Ruby itself has dashes in the name HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\RubyInstaller-3.2-x64-mingw-ucrt_is1
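A rough sketch of what issue 1 asks for: rescue the conversion error where the key names are converted and re-raise it with the registry path attached. Everything here is hypothetical (the error class, the method, and the hash standing in for the registry); it is not the RubyInstaller code and it does not touch the real Win32::Registry API.

```ruby
# Hypothetical error type that carries the registry path in its message.
class RegistryWalkError < StandardError; end

# registry: Hash of root path => Array of subkey names (UTF-16LE strings),
# a stand-in for what the real enumeration would return.
def export_all(registry, enc)
  registry.flat_map do |root, subkeys|
    subkeys.map do |name|
      begin
        name.encode(enc)
      rescue Encoding::UndefinedConversionError => e
        shown = name.encode(Encoding::UTF_8).inspect
        raise RegistryWalkError, "#{root}: cannot convert subkey #{shown}: #{e.message}"
      end
    end
  end
end

fake_registry = {
  'HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall' =>
    ["Plain name".encode(Encoding::UTF_16LE),
     "Name \u2014 with dash".encode(Encoding::UTF_16LE)]
}

begin
  export_all(fake_registry, Encoding::IBM866)
rescue RegistryWalkError => e
  puts e.message
end
```

With the path in the message, it would be clear which of the starting points failed and on which subkey.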

Here is the access log, but nothing looks wrong there. Probably the Ruby RTL first caches the dataset from the registry, then iterates over it and converts it to strings.

screenshot

Now...

IBM866 is what GetOEMCP() returns here (CP_OEMCP in Windows terms): the TUI (text user interface) charset intended for non-graphic Windows apps. So converting to it is generally a wise idea, but in this specific place it feels misplaced.

Most of Windows API is UTF-16 based.

Like i said, i know zilch about Ruby but quick googling suggests Ruby string variables can have any charset at will: https://ruby-doc.org/core-2.5.3/String.html

Then WHY would anyone convert it there rather than keeping them UTF-16LE ???

First, you sorta-kinda can switch the user interface to UTF-8, albeit with caveats:

However, further reading the code suggests you care not about user interaction there at all, you only need il=subreg['InstallLocation']. Now, indeed, folder paths CAN be full unicode and be thus inaccessible from classic, pre-unicode applications. It is bad style, but it IS possible, technically.

So the proper question, i guess, would be: WHY leave the UTF-16 realm and reduce the strings to IBM866 at all? The next step would most probably be converting them back to UTF-16 so you can call file I/O APIs, like opening files, enumerating folders, etc.

So...

  • Issue 2: this trip into 866 and back is both redundant and fragile. Can't you do it lazy-eval style, converting only at the last possible moment, when a string consumer actually requires it? Let strings be just immutable blobs passed around by reference for as long as possible. This would prevent the whole issue from happening. And then you won't have to think about how to handle the now-impossible error, or how to log/display enough details of it.
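A small sketch of the idea in issue 2, under the assumption that the registry API hands over UTF-16LE: stay within Unicode (UTF-16LE or UTF-8) while the string travels through the program, and narrow to the OEM codepage only where text is actually printed. The sample path and the for_console helper are invented for illustration.

```ruby
# A value as a wide-character API might return it (made-up sample path).
raw = "C:\\Tools\\Docs \u2014 draft".encode(Encoding::UTF_16LE)

# Eager narrowing to the OEM codepage fails on the first unmappable character.
eager_ok = begin
  raw.encode(Encoding::IBM866)
  true
rescue Encoding::UndefinedConversionError
  false
end

# Conversions between Unicode encodings are lossless in both directions.
utf8 = raw.encode(Encoding::UTF_8)
round_trip = utf8.encode(Encoding::UTF_16LE)

# Narrow only at the console boundary, replacing what the codepage lacks.
def for_console(str, console_enc = Encoding::IBM866)
  str.encode(console_enc, undef: :replace, replace: "?")
end

puts "eager conversion ok: #{eager_ok}"
puts "round trip intact:   #{round_trip == raw}"
puts for_console(utf8)
```

The path itself never passes through IBM866, so a later file I/O call would still see the original characters.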

P.S. i did a RegEdit search and it appears i do not have "MSYS2" anywhere in my registry. I guess that is only different for the MSYS2 developers themselves. So, basically, Ruby fails fatally while attempting a search that is guaranteed to return an empty set on 99% of computers... :-/

P.P.S. i tried to guesstimate what on Earth coerces Ruby into doing the unneeded string conversions there, and my eye stumbled on an obvious typo (the sword swing is "slaSHing", not "slaCHing"):

# ridk_use.rb

def backslachs(path)
  path.gsub("/", "\\")
end

# msys2_installation.rb

    private def backslachs(path)
      path.gsub("/", "\\")
    end

If i understand it correctly, that is https://ruby-doc.org/core-2.5.3/String.html#method-i-gsub

Well, again, nothing there hints at any pre-configured and fixed string charset, so i still fail to grasp why that fragile and redundant conversion ever kicks in in the first place...
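A quick check of that reading of gsub, as a sketch: the helper below has the same body as the quoted one (with the spelling corrected, so the name is not the one in the source), and the result keeps whatever encoding the receiver had. The sample path is made up.

```ruby
# Same body as the quoted helper; only the name is spelled differently.
def backslashes(path)
  path.gsub("/", "\\")
end

utf8_path = "C:/msys64/usr/bin"                 # made-up sample path
oem_path  = utf8_path.encode(Encoding::IBM866)  # same text, OEM-encoded

puts backslashes(utf8_path).encoding  # encoding of the UTF-8 receiver is kept
puts backslashes(oem_path).encoding   # encoding of the IBM866 receiver is kept
```

So the helper does not pick a charset by itself; the conversion has to come from elsewhere.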

@the-Arioch
Author

or this, in registry.rb

    def export_string(str, enc = Encoding.default_internal || LOCALE) # :nodoc:
      str.encode(enc)
    end

hence

    def each_key
      index = 0
      while true
        begin
          subkey, wtime = API.EnumKey(@hkey, index)
        rescue Error
          break
        end
        subkey = export_string(subkey)
        yield subkey, wtime

and

    def each_value
      index = 0
      while true
        begin
          subkey = API.EnumValue(@hkey, index)
        rescue Error
          break
        end
        subkey = export_string(subkey)

Now, the key thing is probably the enc parameter, which no caller here ever passes, so it always takes its default: enc = Encoding.default_internal || LOCALE

The Encoding documentation says a few interesting things that i cannot quite comprehend.

::default_internal is initialized by the source file's internal_encoding or -E option.

and

The locale encoding (__ENCODING__), not ::default_internal, is used as the encoding of created strings.

I wonder if it can be made to "just work" by switching it to UTF-8 or UTF-16.
However, Google says

I can not know what this "theory" would mean in practice given all the legacy code...
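To make the role of that default argument concrete, here is a sketch that mimics the shape of the quoted export_string. LOCALE_SKETCH is an assumed stand-in for Win32::Registry's LOCALE on a Russian console, and with_default_internal is a hypothetical helper that changes the global setting temporarily; neither is part of registry.rb.

```ruby
# Assumed stand-in for the LOCALE constant on a system whose OEM codepage is 866.
LOCALE_SKETCH = Encoding::IBM866

# Same shape as the quoted method: the default is evaluated on every call,
# so it follows whatever Encoding.default_internal is at that moment.
def export_string_sketch(str, enc = Encoding.default_internal || LOCALE_SKETCH)
  str.encode(enc)
end

# Hypothetical helper: set Encoding.default_internal for the block, then restore.
def with_default_internal(enc)
  saved_enc = Encoding.default_internal
  saved_verbose = $VERBOSE
  $VERBOSE = nil                      # silence the "setting default_internal" warning
  Encoding.default_internal = enc
  yield
ensure
  Encoding.default_internal = saved_enc
  $VERBOSE = saved_verbose
end

wide = "Foo \u2014 Bar".encode(Encoding::UTF_16LE)  # made-up key name

with_default_internal(nil) do
  begin
    export_string_sketch(wide)        # falls back to LOCALE_SKETCH and raises
  rescue Encoding::UndefinedConversionError => e
    puts "default_internal unset: #{e.message}"
  end
end

with_default_internal(Encoding::UTF_8) do
  puts "default_internal UTF-8: #{export_string_sketch(wide).encoding}"
end
```

In this sketch the OEM fallback is only reached while Encoding.default_internal is nil; whether changing that global is safe for the rest of a real program is a different matter.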

@the-Arioch
Author

Well, the "-E" option is as good as non-existent.

I was thinking about just modifying "gem.cmd" as a Hail Mary and calling it a day, but no luck.

Feels like a dead end on my part (short of removing that loop altogether).

The docs seem to suggest that overriding the global setting is possible in the sources, but WHERE to do it safely, if that is possible at all, is above my level.

From abstract common sense it should be OK for Ruby internals to just run full Unicode inside the "OS API" perimeter, but who knows.

larskanis added a commit to larskanis/ruby that referenced this issue Dec 25, 2023
Since ruby-3.0 usually all strings from the Windows-API are returned as UTF-8 strings.
This is a leftover from 2.x times.

Fixes: oneclick/rubyinstaller2#348
larskanis added a commit to oneclick/rubyinstaller2-packages that referenced this issue Dec 25, 2023
larskanis added a commit to larskanis/ruby that referenced this issue Dec 25, 2023
Since ruby-3.0 usually all strings from the Windows-API are returned as UTF-8 strings.
Win32::Registry so far returned OEM encoding.
This was a leftover from 2.x times.
This commit changes it to UTF-8.

Fixes: oneclick/rubyinstaller2#348
larskanis added a commit to ruby/win32-registry that referenced this issue Jan 5, 2024
Since ruby-3.0 usually all strings from the Windows-API are returned as UTF-8 strings.
Win32::Registry so far returned OEM encoding.
This was a leftover from 2.x times.
This commit changes it to UTF-8.

Fixes: oneclick/rubyinstaller2#348
larskanis added a commit to oneclick/rubyinstaller2-packages that referenced this issue Jan 19, 2024
larskanis added a commit to oneclick/rubyinstaller2-packages that referenced this issue Jan 20, 2024
larskanis added a commit to ruby/win32-registry that referenced this issue Oct 1, 2024
Since ruby-3.0 usually all strings from the Windows-API are returned as UTF-8 strings.
Win32::Registry partly returned OEM encoding.
This was a leftover from 2.x times.
This commit changes it to UTF-8.

Fixes: oneclick/rubyinstaller2#348