Windows, Vagrant and Git Bash: process_builder.rb encoding error

16.07.2022 Programming

Under certain conditions, Vagrant raises an Encoding::UndefinedConversionError in process_builder.rb when it is run from Git Bash on Windows.

Symptoms

A simple vagrant status in Git Bash on Windows is enough to trigger an error similar to this one:


$ vagrant status
C:/HashiCorp/Vagrant/embedded/gems/2.2.19/gems/childprocess-4.1.0/
lib/childprocess/windows/process_builder.rb:44:in `encode!': 
"\\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to UTF-16LE 
(Encoding::UndefinedConversionError)

The problem can also occur in the context of other commands such as vagrant up.

Problem

Inspecting the environment shows that the value of the environment variable MYVAR contains an umlaut.


$ env | grep --color='auto' -P -n "[\x80-\xFF]"
5:MYVAR=John Döe

In your case, the variable might be named differently and might contain other non-ASCII characters. More than one environment variable may be affected.
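The same check can be done from Ruby, which additionally shows the encoding label Ruby attaches to each value. The following is only a minimal sketch: the helper name non_ascii_vars is made up for this example, and to see what Vagrant sees it would have to be run with the Ruby interpreter embedded in Vagrant.

```ruby
# Hypothetical helper (not part of Vagrant or childprocess): returns the
# entries of an environment-like hash whose values contain at least one
# byte outside the 7-bit ASCII range.
def non_ascii_vars(env)
  env.reject { |_name, value| value.ascii_only? }
end

non_ascii_vars(ENV.to_h).each do |name, value|
  # value.encoding is the label Ruby attached to the string,
  # value.bytes shows the raw bytes regardless of that label.
  hex = value.bytes.map { |b| format('%02X', b) }.join(' ')
  puts "#{name}: #{value.encoding} #{hex}"
end
```

If a value is listed with the encoding ASCII-8BIT, it is a candidate for the error described above.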

Solution

Change the value of the environment variable

For example, run setx MYVAR "John Doe". After restarting Git Bash, the umlaut should be gone and vagrant status should run without an error. This quick fix only treats the symptom and leaves the root cause untouched. Permanently changing the value of an environment variable could also affect the behavior of the system elsewhere.

Apply a fix to the vagrant code

The location where the error occurs is exactly specified in the error message: C:/HashiCorp/Vagrant/embedded/gems/2.2.19/gems/childprocess-4.1.0/lib/childprocess/windows/process_builder.rb:44

A quick look at file process_builder.rb reveals the following code at line 44:
process_builder.rb

def to_wide_string(str)
  newstr = str + "\0".encode(str.encoding)
  newstr.encode!('UTF-16LE')
end

The error message tells us what is going wrong: "\\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to UTF-16LE (Encoding::UndefinedConversionError). Assuming that the environment is read in ASCII-8BIT encoding, the error is easily reproducible with this little test script:

test.rb

def to_wide_string(str)
  newstr = str + "\0".encode(str.encoding)
  # ! after encode -> apply method "in place", don't return a new string object
  newstr.encode!('UTF-16LE')
end
str = "John Döe"
astr = str.force_encoding('ASCII-8BIT')
puts to_wide_string(astr)

$ /C/HashiCorp/Vagrant/embedded/mingw64/bin/ruby.exe /p/ath/to/test.rb
test.rb:4:in `encode!': "\\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8
        to UTF-16LE (Encoding::UndefinedConversionError)
        from test.rb:4:in `to_wide_string'
        from test.rb:8:in `<main>'
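The byte \xC3 in the error message is the first byte of the UTF-8 encoding of ö (C3 B6). This suggests that the value already holds UTF-8 bytes and only carries the ASCII-8BIT label. The following sketch illustrates that observation. It is not the patch proposed in this post, the helper name widen_assuming_utf8 is made up, and the assumption that the bytes are UTF-8 may not hold on every system.

```ruby
# Hypothetical helper: relabels a copy of the bytes as UTF-8 (no bytes
# change) and then converts to UTF-16LE.
# ASSUMPTION: the bytes really are UTF-8; otherwise an error is raised.
def widen_assuming_utf8(raw)
  relabeled = raw.dup.force_encoding('UTF-8')
  raise ArgumentError, 'bytes are not valid UTF-8' unless relabeled.valid_encoding?
  relabeled.encode('UTF-16LE')
end

# UTF-8 bytes of "John Döe", labeled as binary (ASCII-8BIT)
raw = "John D\u00F6e".b

begin
  raw.encode('UTF-16LE')
rescue Encoding::UndefinedConversionError => e
  puts "with the binary label: #{e.message}"
end

wide = widen_assuming_utf8(raw)
puts "with the UTF-8 label: #{wide.bytesize} bytes in #{wide.encoding}"
```

With the binary label, Ruby has no mapping for bytes above 0x7F. With the UTF-8 label, the same bytes convert without loss.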

The proposed solution combines two approaches:

  1. Selective replacement of known problematic characters when converting between ASCII-8BIT and UTF-8, using the fallback option of the Ruby method encode [1]
  2. Global replacement of invalid or undefined characters with '?' [1, 2]

process_builder.rb

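def to_wide_string(str)
  # Approach 1: read binary-labeled input as UTF-8 and map known characters
  # to ASCII substitutes. Characters missing from the fallback hash still raise.
  if str.encoding.to_s == 'ASCII-8BIT'
    str.encode!("ASCII-8BIT", "UTF-8", fallback: {"ö" => "o"})
  end
  newstr = str + "\0".encode(str.encoding)
  # Approach 2: replace whatever is still invalid or undefined with '?'
  newstr.encode!('UTF-16LE', invalid: :replace, undef: :replace, replace: "?")
end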

The disadvantage of this solution is that the patch may be overwritten by any Vagrant update. In addition, the effects of selective or global replacement on the values passed to child processes are unclear. If in doubt, do not apply the proposed changes. Otherwise, apply them at your own risk.
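To make the effects of the two replacements more tangible, the following sketch runs each of them separately on sample strings. The helper names selective_replace and global_replace are made up for this example, and the inputs assume UTF-8 bytes labeled as ASCII-8BIT, as in the test script.

```ruby
# Approach 1 in isolation: read the bytes as UTF-8 and map known
# characters via the fallback hash. Unmapped characters still raise.
def selective_replace(str)
  str.encode('ASCII-8BIT', 'UTF-8', fallback: { "\u00F6" => 'o' })
end

# Approach 2 in isolation: replace every invalid or undefined byte with '?'.
def global_replace(str)
  str.encode('UTF-16LE', invalid: :replace, undef: :replace, replace: '?')
end

doe = "John D\u00F6e".b # ö is listed in the fallback hash
due = "John D\u00FCe".b # ü is not

puts selective_replace(doe)

begin
  selective_replace(due)
rescue Encoding::UndefinedConversionError => e
  puts "not in the fallback hash: #{e.message}"
end

# In a binary-labeled string every byte counts as one character,
# so the two bytes of ö are replaced one by one.
puts global_replace(doe).encode('UTF-8')
```

The selective replacement only helps for characters that are listed explicitly, while the global replacement turns every multi-byte character into several question marks. Either way, the child process receives a value that differs from the original one.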

Wait for someone to fix the problem

This does not help in the short term, but it is mentioned here for the sake of completeness. Who could be 'someone'? Vagrant 2.2.16 depends on Ruby 2.6.7, but the encoding problems were apparently only fixed with Ruby version 3 [3]. Therefore, I do not expect a fix from Vagrant in the near future. This issue and others might be the reason for migrating Vagrant to Go: "A long-standing issue with some plugins and third-party distributions packaging Vagrant revolves around Vagrant shipping its own Ruby. This causes friction when you want to run Vagrant on the latest versions of Ruby or when Ruby plugins depend on packages that require specific libraries" [4].

References

  1. Jesus Castello, Understanding Ruby: String Encoding, ASCII & Unicode, RubyGuides, May 27, 2019. Accessed on Jul 16, 2022. [Online]. Available: https://www.rubyguides.com/2019/05/ruby-ascii-unicode/
  2. José M. Gilgado, Troubleshooting encoding errors in Ruby, Honeybadger, Jun 24, 2020. Accessed on Jul 16, 2022. [Online]. Available: https://www.honeybadger.io/blog/troubleshooting-encoding-errors-in-ruby/
  3. Thomas Thomassen, ENV data yield ASCII-8BIT encoded strings under Windows with unicode username, Ruby Issue Tracking System, Feb 26, 2021. Accessed on Jul 20, 2022. [Online]. Available: https://bugs.ruby-lang.org/issues/9715
  4. Chris Roberts, Sophia Castellarin, Toward Vagrant 3.0, HashiCorp, Jun 11, 2021. Accessed on Jul 20, 2022. [Online]. Available: https://www.hashicorp.com/blog/toward-vagrant-3-0
