module Puppet::Util::CharacterEncoding

A module that centralizes the heuristics and practices Puppet uses to manage character encoding.

Public Class Methods

convert_to_utf_8(string)

Given a string, attempts to convert a copy of the string to UTF-8. Conversion uses encode, so the byte representation of the copy is modified to UTF-8.

This method is intended for situations where we generally trust that the string's bytes are a faithful representation of the current encoding associated with it, and can use it as a starting point for transcoding (conversion) to UTF-8.

@api public
@param [String] string a string to transcode
@return [String] copy of the original string, in UTF-8 if transcodable

# File lib/puppet/util/character_encoding.rb
def convert_to_utf_8(string)
  original_encoding = string.encoding
  string_copy = string.dup
  begin
    if original_encoding == Encoding::UTF_8
      if !string_copy.valid_encoding?
        Puppet.debug {
          _("%{value} is already labeled as UTF-8 but this encoding is invalid. It cannot be transcoded by Puppet.") % { value: string.dump }
        }
      end
      # String is already valid UTF-8 - noop
      return string_copy
    else
      # If the string comes to us as BINARY encoded, we don't know what it
      # started as. However, to encode! we need a starting place, and our
      # best guess is whatever the system currently is (default_external).
      # So set external_encoding to default_external before we try to
      # transcode to UTF-8.
      string_copy.force_encoding(Encoding.default_external) if original_encoding == Encoding::BINARY
      return string_copy.encode(Encoding::UTF_8)
    end
  rescue EncodingError => detail
    # Set the encoding on our copy back to its original if we modified it
    string_copy.force_encoding(original_encoding) if original_encoding == Encoding::BINARY

    # Catch both our own self-determined failure to transcode as well as any
    # error on ruby's part, ie Encoding::UndefinedConversionError on a
    # failure to encode!.
    Puppet.debug {
      _("%{error}: %{value} cannot be transcoded by Puppet.") % { error: detail.inspect, value: string.dump }
    }
    return string_copy
  end
end
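
The transcode-or-return-a-copy behavior described above can be sketched with core Ruby alone. This is an illustration under stated assumptions, not Puppet's implementation: `transcode_or_copy` is a hypothetical helper, it does not load Puppet, and it omits the BINARY/default_external handling and debug logging shown in the source.

```ruby
# Hypothetical stand-in (not part of Puppet): try to transcode to UTF-8,
# and fall back to an unmodified copy if Ruby raises an EncodingError.
def transcode_or_copy(string)
  string.encode(Encoding::UTF_8)
rescue EncodingError
  # Transcoding failed, so return a copy that keeps the original label.
  string.dup
end

# "caf" followed by byte 0xE9, which is "e with acute" in ISO-8859-1.
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
converted = transcode_or_copy(latin1)
p converted.encoding == Encoding::UTF_8 # true
p latin1.bytes                          # [99, 97, 102, 233]
p converted.bytes                       # [99, 97, 102, 195, 169]

# A BINARY string with a high byte has no defined mapping to UTF-8,
# so encode raises Encoding::UndefinedConversionError and we get a copy back.
binary = "\xE9".b
unchanged = transcode_or_copy(binary)
p unchanged.encoding == Encoding::BINARY # true
```

Note that the bytes change during transcoding (one byte 0xE9 becomes the two bytes 0xC3 0xA9), which is the key difference from merely re-labeling a string.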

override_encoding_to_utf_8(string)

Given a string, tests whether that string's bytes represent valid UTF-8 and, if so, returns a copy of the string with its external encoding set to UTF-8. Does not modify the byte representation of the string. If the bytes do not represent valid UTF-8, the external encoding is left unchanged.

This method is intended for situations where we do not believe that the encoding associated with a string is an accurate reflection of its actual bytes, i.e., effectively when we believe Ruby is incorrect in its assertion of the encoding of the string.

@api public
@param [String] string a string whose external encoding should be set (re-labeled) to UTF-8
@return [String] a copy of string with external encoding set to UTF-8, or a copy of the original string if the override would result in an invalid encoding

# File lib/puppet/util/character_encoding.rb
def override_encoding_to_utf_8(string)
  string_copy = string.dup
  original_encoding = string_copy.encoding
  return string_copy if original_encoding == Encoding::UTF_8
  if string_copy.force_encoding(Encoding::UTF_8).valid_encoding?
    return string_copy
  else
    Puppet.debug {
      _("%{value} is not valid UTF-8 and result of overriding encoding would be invalid.") % { value: string.dump }
    }
    # Set copy back to its original encoding before returning
    return string_copy.force_encoding(original_encoding)
  end
end
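
The re-labeling approach described above relies on two core Ruby calls, `force_encoding` and `valid_encoding?`. The sketch below shows them directly without loading Puppet. The variable names are illustrative only, and the sample strings are assumptions chosen to show one valid and one invalid case.

```ruby
# UTF-8 bytes for "cafe with acute e" that arrived labeled as BINARY.
mislabeled = "caf\xC3\xA9".b

# Re-label a copy as UTF-8. The bytes are untouched; only the label changes.
relabeled = mislabeled.dup.force_encoding(Encoding::UTF_8)
p relabeled.valid_encoding?           # true
p relabeled.bytes == mislabeled.bytes # true
p relabeled == "caf\u00E9"            # true

# Bytes that are not valid UTF-8 (a lone 0xE9) fail the validity check,
# so a caller would keep the original label instead of overriding it.
not_utf8 = "caf\xE9".b
candidate = not_utf8.dup.force_encoding(Encoding::UTF_8)
p candidate.valid_encoding?           # false
```

Because `force_encoding` never rewrites bytes, it is only appropriate when the existing label is believed to be wrong; when the label is trusted, transcoding with `encode` is the right tool.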