module Puppet::Util::CharacterEncoding
A module to centralize heuristics/practices for managing character encoding in Puppet
Public Class Methods
Given a string, attempts to convert a copy of the string to UTF-8. Conversion uses String#encode, so the copy's internal byte representation is modified to UTF-8; the original string is left untouched.
This method is intended for situations where we generally trust that the string's bytes are a faithful representation of the current encoding associated with it, and can use it as a starting point for transcoding (conversion) to UTF-8.
@api public
@param [String] string a string to transcode
@return [String] copy of the original string, in UTF-8 if transcodable

    # File lib/puppet/util/character_encoding.rb
    def convert_to_utf_8(string)
      original_encoding = string.encoding
      string_copy = string.dup
      begin
        if original_encoding == Encoding::UTF_8
          if !string_copy.valid_encoding?
            Puppet.debug {
              _("%{value} is already labeled as UTF-8 but this encoding is invalid. It cannot be transcoded by Puppet.") % { value: string.dump }
            }
          end
          # String is already valid UTF-8 - noop
          return string_copy
        else
          # If the string comes to us as BINARY encoded, we don't know what it
          # started as. However, to encode! we need a starting place, and our
          # best guess is whatever the system currently is (default_external).
          # So set external_encoding to default_external before we try to
          # transcode to UTF-8.
          string_copy.force_encoding(Encoding.default_external) if original_encoding == Encoding::BINARY
          return string_copy.encode(Encoding::UTF_8)
        end
      rescue EncodingError => detail
        # Set the encoding on our copy back to its original if we modified it
        string_copy.force_encoding(original_encoding) if original_encoding == Encoding::BINARY

        # Catch both our own self-determined failure to transcode as well as any
        # error on ruby's part, ie Encoding::UndefinedConversionError on a
        # failure to encode!.
        Puppet.debug {
          _("%{error}: %{value} cannot be transcoded by Puppet.") % { error: detail.inspect, value: string.dump }
        }
        return string_copy
      end
    end
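The transcoding behavior described for this method can be tried without loading Puppet. The following is a minimal sketch, not the library's implementation: `sketch_convert_to_utf_8` and its `binary_fallback` keyword are hypothetical names introduced here, the debug logging is omitted, and the fallback encoding is passed in explicitly so the result does not depend on the local `Encoding.default_external`.

```ruby
# Hypothetical stand-in for convert_to_utf_8; not part of Puppet.
# binary_fallback is an illustrative parameter replacing the implicit
# use of Encoding.default_external for BINARY input.
def sketch_convert_to_utf_8(string, binary_fallback: Encoding.default_external)
  copy = string.dup
  original = string.encoding
  return copy if original == Encoding::UTF_8

  # BINARY carries no character semantics, so guess a source encoding.
  copy.force_encoding(binary_fallback) if original == Encoding::BINARY
  copy.encode(Encoding::UTF_8)
rescue EncodingError
  # Undo the guess so the caller gets back what it passed in.
  copy.force_encoding(original) if original == Encoding::BINARY
  copy
end

# "caf\xE9" is "café" in ISO-8859-1: one byte (0xE9) for the accented letter.
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
utf8   = sketch_convert_to_utf_8(latin1)

latin1.bytes   # => [99, 97, 102, 233]
utf8.bytes     # => [99, 97, 102, 195, 169]
utf8.encoding  # => #<Encoding:UTF-8>

# Bytes that are not valid in the labeled encoding cannot be transcoded;
# the copy comes back with its original label and bytes.
bad = "caf\xE9".b.force_encoding(Encoding::US_ASCII)
sketch_convert_to_utf_8(bad).encoding  # => #<Encoding:US-ASCII>
```

Note that the bytes change (0xE9 becomes 0xC3 0xA9) while the text stays the same, which is the difference between transcoding and merely relabeling.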
Given a string, tests whether that string's bytes represent valid UTF-8 and, if so, returns a copy of the string with its external encoding set to UTF-8. Does not modify the byte representation of the string. If the bytes do not represent valid UTF-8, the external encoding is left as it was.
This method is intended for situations where we do not believe that the encoding associated with a string is an accurate reflection of its actual bytes, i.e., effectively when we believe Ruby is incorrect in its assertion of the encoding of the string.
@api public
@param [String] string to set external encoding (re-label) to utf-8
@return [String] a copy of string with external encoding set to utf-8, or a copy of the original string if override would result in invalid encoding.

    # File lib/puppet/util/character_encoding.rb
    def override_encoding_to_utf_8(string)
      string_copy = string.dup
      original_encoding = string_copy.encoding
      return string_copy if original_encoding == Encoding::UTF_8
      if string_copy.force_encoding(Encoding::UTF_8).valid_encoding?
        return string_copy
      else
        Puppet.debug {
          _("%{value} is not valid UTF-8 and result of overriding encoding would be invalid.") % { value: string.dump }
        }
        # Set copy back to its original encoding before returning
        return string_copy.force_encoding(original_encoding)
      end
    end
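Relabeling can be illustrated the same way. The sketch below is an assumption-laden stand-in rather than Puppet's code: `sketch_override_to_utf_8` is a hypothetical name, and the debug logging is left out. It shows that a successful override changes only the encoding label, never the bytes, and that bytes which are not valid UTF-8 keep their original label.

```ruby
# Hypothetical stand-in for override_encoding_to_utf_8; not part of Puppet.
def sketch_override_to_utf_8(string)
  copy = string.dup
  original = copy.encoding
  return copy if original == Encoding::UTF_8
  # Relabel, then keep the new label only if the bytes are valid UTF-8.
  return copy if copy.force_encoding(Encoding::UTF_8).valid_encoding?

  copy.force_encoding(original)
end

# Valid UTF-8 bytes that arrived labeled as BINARY.
mislabeled = "caf\u00E9".b
relabeled  = sketch_override_to_utf_8(mislabeled)

relabeled.encoding                 # => #<Encoding:UTF-8>
relabeled.bytes == mislabeled.bytes  # => true

# ISO-8859-1 bytes are not valid UTF-8, so the label is left alone.
latin1_bytes = "caf\xE9".b
kept = sketch_override_to_utf_8(latin1_bytes)

kept.encoding == Encoding::BINARY  # => true
```

Relabeling suits input whose label is known to be wrong; when the label is trusted and the text merely needs to be in UTF-8, transcoding is the appropriate operation.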