module Puppet::Pops::Parser::SlurpSupport

This module is an integral part of the Lexer. It defines the string-slurping behavior: finding the string and non-string parts of interpolated strings, and translating escape sequences in strings to their single-character equivalents.

PERFORMANCE NOTE: The various kinds of slurping could be made even more generic, but that would require additional parameter passing and evaluation of conditional logic. TODO: More detailed performance analysis of excessive character escaping and interpolation.

Constants

DQ_ESCAPES
SLURP_ALL_PATTERN
SLURP_DQ_PATTERN
SLURP_SQ_PATTERN
SLURP_UQNE_PATTERN

unquoted, no escapes

SLURP_UQ_PATTERN
SQ_ESCAPES
UQ_ESCAPES

Public Instance Methods

slurp(scanner, pattern, escapes, ignore_invalid_escapes)

Slurps a string from the given scanner up to and including the given pattern, then replaces any escaped characters listed in escapes with their control-character equivalents or, in the case of escaped line breaks, replaces the pattern \r?\n with an empty string. The returned string contains the terminating character. Returns nil if the scanner cannot scan until the given pattern.
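
The behavior described above can be sketched in isolation. This is a simplified illustration, not the module's implementation: DEMO_ESCAPES, DEMO_PATTERN and demo_slurp are hypothetical names, the escape table and pattern are assumptions, and warnings, unicode escapes and error handling are left out.

```ruby
require 'strscan'

# Hypothetical escape table and terminator pattern, assumed for illustration.
DEMO_ESCAPES = ['\\', '"', 'n', 't', 's', "\n", "\r\n"].freeze
# Stops at a double quote that is not preceded by an odd number of backslashes.
DEMO_PATTERN = /(?:[^\\]|^)(?:\\{2})*"/

def demo_slurp(scanner, pattern, escapes)
  str = scanner.scan_until(pattern)
  return nil unless str # pattern never found: nothing could be slurped

  str.gsub(/\\([^\r\n]|(?:\r?\n))/m) do
    ch = Regexp.last_match(1)
    # Unrecognized escapes are kept verbatim, backslash included.
    next "\\#{ch}" unless escapes.include?(ch)

    case ch
    when 'n' then "\n"
    when 't' then "\t"
    when 's' then ' '
    when "\n", "\r\n" then '' # escaped line break disappears
    else ch                   # e.g. \\ or \" yield the character itself
    end
  end
end

translated = demo_slurp(StringScanner.new('a\tb\q" rest'), DEMO_PATTERN, DEMO_ESCAPES)
joined     = demo_slurp(StringScanner.new("one\\\ntwo\""), DEMO_PATTERN, DEMO_ESCAPES)
missing    = demo_slurp(StringScanner.new('no terminator'), DEMO_PATTERN, DEMO_ESCAPES)
```

In this sketch, translated still ends with the closing quote, which mirrors the note that the returned string contains the terminating character; callers are expected to strip it.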

    # File lib/puppet/pops/parser/slurp_support.rb
    def slurp(scanner, pattern, escapes, ignore_invalid_escapes)
      str = scanner.scan_until(pattern) || return

      return str unless str.include?('\\')

      return str.gsub!(/\\(\\|')/m, '\1') || str if escapes.equal?(SQ_ESCAPES)

      # Process unicode escapes first as they require getting 4 hex digits
      # If later a \u is found it is warned not to be a unicode escape
      if escapes.include?('u')
        # gsub must be repeated to cater for adjacent escapes
        while(str.gsub!(/((?:[^\\]|^)(?:[\\]{2})*)\\u(?:([\da-fA-F]{4})|\{([\da-fA-F]{1,6})\})/m) { $1 + [($2 || $3).hex].pack("U") })
          # empty block. Everything happens in the gsub block
        end
      end

      begin
      str.gsub!(/\\([^\r\n]|(?:\r?\n))/m) {
        ch = $1
        if escapes.include? ch
          case ch
          when 'r'   ; "\r"
          when 'n'   ; "\n"
          when 't'   ; "\t"
          when 's'   ; ' '
          when 'u'
            lex_warning(Issues::ILLEGAL_UNICODE_ESCAPE)
            "\\u"
          when "\n"  ; ''
          when "\r\n"; ''
          else      ch
          end
        else
          lex_warning(Issues::UNRECOGNIZED_ESCAPE, :ch => ch) unless ignore_invalid_escapes
          "\\#{ch}"
        end
      }
      rescue ArgumentError => e
        # An invalid byte sequence may be the result of faulty input as well, but that could not possibly
        # have reached this far... Unfortunately there is no more specific error and a match on message is
        # required to differentiate from other internal problems.
        if e.message =~ /invalid byte sequence/
          lex_error(Issues::ILLEGAL_UNICODE_ESCAPE)
        else
          raise e
        end
      end
      str
    end
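
The comment "gsub must be repeated to cater for adjacent escapes" is easiest to see with a small experiment. The sketch below reuses the unicode regular expression from the listing; the names UNICODE_ESCAPE and demo_unicode are hypothetical and exist only for this illustration. The prefix group needs a non-backslash character (or a line start) before each escape, so when two escapes are adjacent the second one has no usable prefix until the first has been replaced.

```ruby
# Regular expression taken from the slurp listing; the constant name is made up here.
UNICODE_ESCAPE = /((?:[^\\]|^)(?:[\\]{2})*)\\u(?:([\da-fA-F]{4})|\{([\da-fA-F]{1,6})\})/m

def demo_unicode(str)
  str = str.dup
  # gsub! returns nil once a pass replaces nothing, which ends the loop.
  while str.gsub!(UNICODE_ESCAPE) { $1 + [($2 || $3).hex].pack('U') }
    # nothing to do here; the work happens in the gsub! block
  end
  str
end

input       = '\u00e9\u00e8' # two adjacent escapes, written as literal text
single_pass = input.gsub(UNICODE_ESCAPE) { $1 + [($2 || $3).hex].pack('U') }
all_passes  = demo_unicode(input)
braces      = demo_unicode('\u{41}')
escaped     = demo_unicode('\\\\u0041') # an escaped backslash followed by u0041
```

After one pass only the first escape has been translated; the loop picks up the second one on the next pass. The last case is left untouched because the backslash before the u is itself escaped.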

slurp_dqstring()

    # File lib/puppet/pops/parser/slurp_support.rb
    def slurp_dqstring
      scn = @scanner
      last = scn.matched
      str = slurp(scn, SLURP_DQ_PATTERN, DQ_ESCAPES, false)
      unless str
        lex_error(Issues::UNCLOSED_QUOTE, :after => format_quote(last), :followed_by => followed_by)
      end

      # Terminator may be a single char '"', '$', or two characters '${'.
      # Group match 1 (scn[1]) from the last slurp holds this.
      terminator = scn[1]
      [str[0..(-1 - terminator.length)], terminator]
    end
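
A minimal sketch of the terminator handling, assuming a pattern whose first capture group holds the terminator. DEMO_DQ_PATTERN and demo_split are hypothetical; the actual value of SLURP_DQ_PATTERN is not shown on this page, and escape translation is omitted here.

```ruby
require 'strscan'

# Assumed pattern: an unescaped '"', '$' or '${', captured in group 1.
DEMO_DQ_PATTERN = /(?:[^\\]|^)(?:\\{2})*("|\$\{?)/

def demo_split(scanner)
  str = scanner.scan_until(DEMO_DQ_PATTERN)
  return nil unless str

  # Group 1 of the most recent match is the terminator that ended the slurp.
  terminator = scanner[1]
  # Strip as many trailing characters as the terminator is long.
  [str[0..(-1 - terminator.length)], terminator]
end

interpolated = demo_split(StringScanner.new('hello ${name}"'))
plain        = demo_split(StringScanner.new('plain" tail'))
```

Returning the terminator alongside the text lets the caller decide what comes next: a closing quote ends the string, while '$' or '${' signals that an interpolation follows.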

slurp_sqstring()

    # File lib/puppet/pops/parser/slurp_support.rb
    def slurp_sqstring
      # skip the leading '
      @scanner.pos += 1
      str = slurp(@scanner, SLURP_SQ_PATTERN, SQ_ESCAPES, :ignore_invalid_escapes)
      lex_error(Issues::UNCLOSED_QUOTE, :after => "\"'\"", :followed_by => followed_by) unless str
      str[0..-2] # strip closing "'" from result
    end
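
For single-quoted strings, slurp takes the short path that only translates \\ and \'. The sketch below isolates that substitution, including the `|| str` fallback that is needed because gsub! returns nil when nothing was replaced. The method name demo_sq_unescape is hypothetical.

```ruby
# Simplified stand-in for the single-quoted branch of slurp.
def demo_sq_unescape(str)
  copy = str.dup
  # gsub! returns nil when no substitution happened, hence the fallback.
  copy.gsub!(/\\(\\|')/, '\1') || copy
end

escaped   = demo_sq_unescape('it\\\'s a \\n') # literal text: it\'s a \n
untouched = demo_sq_unescape('plain')
no_match  = 'plain'.dup.gsub!(/\\(\\|')/, '\1')
```

Here the \' becomes a plain quote, while \n stays as a backslash followed by n, since it is not one of the two escapes recognized in single-quoted strings.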

slurp_uqstring()

Copied from the old lexer; this can be done much better.

    # File lib/puppet/pops/parser/slurp_support.rb
    def slurp_uqstring
      scn = @scanner
      str = slurp(scn, @lexing_context[:uq_slurp_pattern], @lexing_context[:escapes], :ignore_invalid_escapes)

      # Terminator may be a single char '$', two characters '${', or empty string '' at the end of input.
      # Group match 1 holds this.
      # The exceptional case is found by looking at the subgroup 1 of the most recent match made by the scanner (i.e. @scanner[1]).
      # This is the last match made by the slurp method (having called scan_until on the scanner).
      # If there is a terminating character it must be stripped and returned separately.
      #
      terminator = scn[1]
      [str[0..(-1 - terminator.length)], terminator]
    end
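
The empty-terminator case is the reason for the `-1 - terminator.length` expression: with a zero-length terminator the range becomes 0..-1 and the whole string is kept. A small sketch, assuming a pattern that matches '$', '${' or the end of input; DEMO_UQ_PATTERN and demo_uq_split are hypothetical, and the real pattern comes from @lexing_context.

```ruby
require 'strscan'

# Assumed pattern: '$' or '${', or an empty match at the end of input.
DEMO_UQ_PATTERN = /(\$\{?|\z)/m

def demo_uq_split(scanner)
  str = scanner.scan_until(DEMO_UQ_PATTERN)
  terminator = scanner[1] # '' when the match was the end of input
  [str[0..(-1 - terminator.length)], terminator]
end

at_end   = demo_uq_split(StringScanner.new('text only'))
variable = demo_uq_split(StringScanner.new('pre $var'))
```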