module Puppet::Pops::Parser::SlurpSupport
This module is an integral part of the Lexer. It defines the string slurping behavior: finding the string and non-string parts in interpolated strings, and translating escape sequences in strings to their single-character equivalents.
PERFORMANCE NOTE: The various kinds of slurping could be made even more generic, but that would require additional parameter passing and evaluation of conditional logic. TODO: More detailed performance analysis of excessive character escaping and interpolation.
Constants
- DQ_ESCAPES
- SLURP_ALL_PATTERN
- SLURP_DQ_PATTERN
- SLURP_SQ_PATTERN
- SLURP_UQNE_PATTERN (unquoted, no escapes)
- SLURP_UQ_PATTERN
- SQ_ESCAPES
- UQ_ESCAPES
Public Instance Methods
Slurps a string from the given scanner up to and including the given pattern, then replaces each escape sequence listed in escapes with its control-character equivalent. An escaped line break (a backslash followed by \r?\n) is replaced with an empty string. The returned string contains the terminating character. Returns nil if the scanner cannot scan until the given pattern.
# File lib/puppet/pops/parser/slurp_support.rb
def slurp(scanner, pattern, escapes, ignore_invalid_escapes)
  str = scanner.scan_until(pattern) || return

  return str unless str.include?('\\')

  return str.gsub!(/\\(\\|')/m, '\1') || str if escapes.equal?(SQ_ESCAPES)

  # Process unicode escapes first as they require getting 4 hex digits
  # If later a \u is found it is warned not to be a unicode escape
  if escapes.include?('u')
    # gsub must be repeated to cater for adjacent escapes
    while(str.gsub!(/((?:[^\\]|^)(?:[\\]{2})*)\\u(?:([\da-fA-F]{4})|\{([\da-fA-F]{1,6})\})/m) { $1 + [($2 || $3).hex].pack("U") })
      # empty block. Everything happens in the gsub block
    end
  end

  begin
    str.gsub!(/\\([^\r\n]|(?:\r?\n))/m) {
      ch = $1
      if escapes.include? ch
        case ch
        when 'r' ; "\r"
        when 'n' ; "\n"
        when 't' ; "\t"
        when 's' ; ' '
        when 'u'
          lex_warning(Issues::ILLEGAL_UNICODE_ESCAPE)
          "\\u"
        when "\n" ; ''
        when "\r\n"; ''
        else ch
        end
      else
        lex_warning(Issues::UNRECOGNIZED_ESCAPE, :ch => ch) unless ignore_invalid_escapes
        "\\#{ch}"
      end
    }
  rescue ArgumentError => e
    # An invalid byte sequence may be the result of faulty input as well, but that could not possibly
    # have reached this far... Unfortunately there is no more specific error and a match on message is
    # required to differentiate from other internal problems.
    if e.message =~ /invalid byte sequence/
      lex_error(Issues::ILLEGAL_UNICODE_ESCAPE)
    else
      raise e
    end
  end
  str
end
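The scan-then-translate flow described above can be tried outside the lexer with a small standalone sketch. It is a simplified stand-in, not the module's implementation: DEMO_ESCAPES and demo_slurp are hypothetical names, the escape set is chosen only for illustration, and unicode handling and lexer warnings are left out.

```ruby
require 'strscan'

# Hypothetical escape set for illustration; not the module's DQ_ESCAPES.
DEMO_ESCAPES = ['\\', '"', 'r', 'n', 't', 's', "\n", "\r\n"].freeze

# Simplified stand-in for slurp: scans up to and including the pattern,
# then translates the recognized escapes. No unicode escapes, no warnings.
def demo_slurp(scanner, pattern, escapes)
  str = scanner.scan_until(pattern)
  return nil unless str                 # pattern not found
  return str unless str.include?('\\')  # nothing to translate

  str.gsub(/\\([^\r\n]|(?:\r?\n))/m) do
    ch = Regexp.last_match(1)
    next "\\#{ch}" unless escapes.include?(ch) # unknown escape: keep as written

    case ch
    when 'r' then "\r"
    when 'n' then "\n"
    when 't' then "\t"
    when 's' then ' '
    when "\n", "\r\n" then '' # escaped line break is dropped
    else ch
    end
  end
end

# The input holds a literal backslash-t and a literal backslash-q.
p demo_slurp(StringScanner.new('a\tb\qc" rest'), /"/, DEMO_ESCAPES)
p demo_slurp(StringScanner.new('no terminator'), /"/, DEMO_ESCAPES)
```

The first call translates the recognized \t into a tab, leaves the unrecognized \q untouched, and keeps the closing quote at the end of the result. The second call returns nil because the scanner never reaches the pattern.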
# File lib/puppet/pops/parser/slurp_support.rb
def slurp_dqstring
  scn = @scanner
  last = scn.matched
  str = slurp(scn, SLURP_DQ_PATTERN, DQ_ESCAPES, false)
  unless str
    lex_error(Issues::UNCLOSED_QUOTE, :after => format_quote(last), :followed_by => followed_by)
  end

  # Terminator may be a single char '"', '$', or two characters '${' group match 1 (scn[1]) from the last slurp holds this
  terminator = scn[1]
  [str[0..(-1 - terminator.length)], terminator]
end
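The terminator handling relies on capture group 1 of the scanner's most recent match. A minimal sketch of that idea follows, assuming a deliberately simplified pattern: DEMO_TERMINATOR and demo_split are hypothetical names, and unlike the real SLURP_DQ_PATTERN this pattern does not skip escaped terminators.

```ruby
require 'strscan'

# Hypothetical, simplified pattern: stops at '"', '$', or '${'.
# It does not account for backslash-escaped terminators.
DEMO_TERMINATOR = /("|\$\{?)/

# Returns [text_before_terminator, terminator], or nil if no terminator is found.
def demo_split(text)
  scn = StringScanner.new(text)
  str = scn.scan_until(DEMO_TERMINATOR)
  return nil unless str

  terminator = scn[1] # group 1 of the match made by scan_until
  [str[0..(-1 - terminator.length)], terminator]
end

p demo_split('hello ${name}"')
p demo_split('done"')
```

Because the terminator can be one or two characters long, the slice end is computed from its length rather than hard-coded.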
# File lib/puppet/pops/parser/slurp_support.rb
def slurp_sqstring
  # skip the leading '
  @scanner.pos += 1
  str = slurp(@scanner, SLURP_SQ_PATTERN, SQ_ESCAPES, :ignore_invalid_escapes)
  lex_error(Issues::UNCLOSED_QUOTE, :after => "\"'\"", :followed_by => followed_by) unless str
  str[0..-2] # strip closing "'" from result
end
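For single-quoted strings, slurp takes the early return that rewrites only an escaped backslash and an escaped single quote. The substitution can be tried in isolation; demo_sq_unescape is a hypothetical helper name, and it uses the non-destructive gsub rather than the gsub!(...) || str form in the source.

```ruby
# Translates only \\ and \' and leaves every other backslash sequence as written.
def demo_sq_unescape(str)
  str.gsub(/\\(\\|')/m, '\1')
end

p demo_sq_unescape("it\\'s") # input holds: it\'s
p demo_sq_unescape("a\\nb")  # input holds: a\nb (backslash + n, not a newline)
```

The source uses gsub! followed by || str because gsub! returns nil when no substitution was made.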
Copy from old lexer - can do much better
# File lib/puppet/pops/parser/slurp_support.rb
def slurp_uqstring
  scn = @scanner
  str = slurp(scn, @lexing_context[:uq_slurp_pattern], @lexing_context[:escapes], :ignore_invalid_escapes)

  # Terminator may be a single char '$', two characters '${', or empty string '' at the end of input.
  # Group match 1 holds this.
  # The exceptional case is found by looking at the subgroup 1 of the most recent match made by the scanner (i.e. @scanner[1]).
  # This is the last match made by the slurp method (having called scan_until on the scanner).
  # If there is a terminating character it must be stripped and returned separately.
  #
  terminator = scn[1]
  [str[0..(-1 - terminator.length)], terminator]
end