module Escape

The Escape module provides several escaping functions for HTML, URIs, and shell command lines.

Public Instance Methods

html_attr(str)

Escape.html_attr encodes a string as a double-quoted HTML attribute value using character references.

Escape.html_attr("abc") #=> "\"abc\""
Escape.html_attr("a&b") #=> "\"a&amp;b\""
Escape.html_attr("ab&<>\"c") #=> "\"ab&amp;&lt;&gt;&quot;c\""
Escape.html_attr("a'c") #=> "\"a'c\""

It escapes 4 characters:

  • ‘&’ to ‘&amp;’

  • ‘<’ to ‘&lt;’

  • ‘>’ to ‘&gt;’

  • ‘"’ to ‘&quot;’

    # File lib/escape.rb
    def html_attr(str)
      # Replace each of & < > " via the module's lookup table, then add the surrounding quotes.
      '"' + str.gsub(/[&<>"]/) {|ch| HTML_ATTR_ESCAPE_HASH[ch] } + '"'
    end
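
A minimal sketch of the same technique in core Ruby, for illustration only: `String#gsub` also accepts a hash as its replacement argument, so the lookup table can be passed directly instead of a block. `ATTR_ESCAPES` and the `<input>` tag are assumptions of this example, not part of the library. Because html_attr supplies the surrounding quotes itself, the interpolations below are written without quotes.

```ruby
# Hypothetical stand-in for the module's escape table (not the library's constant).
ATTR_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
}.freeze

# Same substitutions as Escape.html_attr, including the surrounding double quotes.
def html_attr(str)
  '"' + str.gsub(/[&<>"]/, ATTR_ESCAPES) + '"'
end

value = %q{Tom & "Jerry" <3}
# html_attr adds the quotes, so none are written around the interpolations.
tag = "<input name=#{html_attr('q')} value=#{html_attr(value)}>"
puts tag # prints <input name="q" value="Tom &amp; &quot;Jerry&quot; &lt;3">
```

A single quote passes through unchanged, which is safe here only because the value is always wrapped in double quotes.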
html_form(pairs, sep='&')

Escape.html_form composes HTML form key-value pairs into an application/x-www-form-urlencoded string.

Escape.html_form takes either an array of string pairs or a hash from strings to strings.

Escape.html_form([["a","b"], ["c","d"]]) #=> "a=b&c=d"
Escape.html_form({"a"=>"b", "c"=>"d"}) #=> "a=b&c=d"

In the array form, the same key can be used more than once. This is required for HTML forms that contain checkboxes or a select element with the multiple attribute.

Escape.html_form([["k","1"], ["k","2"]]) #=> "k=1&k=2"

If the strings contain characters that must be escaped in x-www-form-urlencoded data, they are escaped using percent-encoding.

Escape.html_form([["k=","&;="]]) #=> "k%3D=%26%3B%3D"

The separator can be specified by the optional second argument.

Escape.html_form([["a","b"], ["c","d"]], ";") #=> "a=b;c=d"

See the HTML 4.01 specification for details.

    # File lib/escape.rb
    def html_form(pairs, sep='&')
      r = ''
      first = true
      pairs.each {|k, v|
        # query-chars - pct-encoded - x-www-form-urlencoded-delimiters =
        #   unreserved / "!" / "$" / "'" / "(" / ")" / "*" / "," / ":" / "@" / "/" / "?"
        # query-char - pct-encoded = unreserved / sub-delims / ":" / "@" / "/" / "?"
        # query-char = pchar / "/" / "?" = unreserved / pct-encoded / sub-delims / ":" / "@" / "/" / "?"
        # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
        # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
        # x-www-form-urlencoded-delimiters = "&" / "+" / ";" / "="
        r << sep if !first
        first = false
        k.each_byte {|byte|
          ch = byte.chr
          if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
            r << "%" << ch.unpack("H2")[0].upcase
          else
            r << ch
          end
        }
        r << '='
        v.each_byte {|byte|
          ch = byte.chr
          if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
            r << "%" << ch.unpack("H2")[0].upcase
          else
            r << ch
          end
        }
      }
      r
    end
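
The listing runs the same byte loop over keys and values. The sketch below factors that loop into one helper to make the rule easier to see; it is a rewrite under the assumption that it should reproduce the documented examples, not the library's own code, and `FORM_UNSAFE` and `form_escape` are hypothetical names.

```ruby
# Bytes left unescaped: unreserved characters plus the query-safe
# sub-delimiters that are not x-www-form-urlencoded delimiters.
FORM_UNSAFE = %r{[^0-9A-Za-z\-._~:/?@!\$'()*,]}

# Hypothetical helper: percent-encodes every byte outside the safe set.
def form_escape(str)
  # String#b makes a binary copy, so each matched character is a single byte.
  str.b.gsub(FORM_UNSAFE) { |ch| format('%%%02X', ch.ord) }
end

# Accepts an array of [key, value] pairs or a Hash; both yield key/value pairs.
def html_form(pairs, sep = '&')
  pairs.map { |k, v| "#{form_escape(k)}=#{form_escape(v)}" }.join(sep)
end

# A checkbox group submits the same key once per checked box, so use the array form.
puts html_form([['topping', 'cheese'], ['topping', 'ham & egg']])
# prints topping=cheese&topping=ham%20%26%20egg
```

Note that spaces come out as `%20`, not `+`, because `+` is treated as a delimiter and is itself escaped.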
html_text(str)

Escape.html_text escapes a string for use as HTML text, using character references.

It escapes 3 characters:

  • ‘&’ to ‘&amp;’

  • ‘<’ to ‘&lt;’

  • ‘>’ to ‘&gt;’

Escape.html_text("abc") #=> "abc"
Escape.html_text("a & b < c > d") #=> "a &amp; b &lt; c &gt; d"

This function is not appropriate for escaping HTML attribute values because quotes are not escaped.

    # File lib/escape.rb
    def html_text(str)
      # Replace each of & < > via the module's lookup table; quotes are left alone.
      str.gsub(/[&<>]/) {|ch| HTML_TEXT_ESCAPE_HASH[ch] }
    end
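
The attribute pitfall can be shown with a small sketch in core Ruby that re-creates html_text and html_attr (not the library itself; `TEXT_ESCAPES` and `ATTR_ESCAPES` are assumed names) and embeds a value containing a double quote in a `title` attribute both ways.

```ruby
TEXT_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze
ATTR_ESCAPES = TEXT_ESCAPES.merge('"' => '&quot;').freeze

# Same three substitutions as Escape.html_text.
def html_text(str)
  str.gsub(/[&<>]/, TEXT_ESCAPES)
end

# Same four substitutions as Escape.html_attr, plus the surrounding quotes.
def html_attr(str)
  '"' + str.gsub(/[&<>"]/, ATTR_ESCAPES) + '"'
end

title = 'say "hi" & go'

# Unsafe: the quote inside the value is left as-is and closes the attribute early.
unsafe = "<p title=\"#{html_text(title)}\">...</p>"
# Safe: the quote becomes &quot; and the whole value stays inside the attribute.
safe = "<p title=#{html_attr(title)}>...</p>"

puts unsafe
puts safe
```

In `unsafe`, the quote before `hi` ends the attribute value, so the rest of the string is no longer part of the title.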
shell_command(command)

Escape.shell_command composes a sequence of words into a single shell command line. All shell metacharacters are quoted, and the words are joined with single spaces.

Escape.shell_command(["ls", "/"]) #=> "ls /"
Escape.shell_command(["echo", "*"]) #=> "echo '*'"

Note that system(*command) and system(Escape.shell_command(command)) are roughly equivalent, with two exceptions:

  • The latter may invoke /bin/sh.

  • The two differ in how they interpret an array with only one element: the former passes the element to the shell to be parsed, while the latter treats it as a single word. For example, system(*["echo foo"]) invokes the echo command with the argument "foo", but system(Escape.shell_command(["echo foo"])) tries to invoke a command named "echo foo" with no arguments (which will probably fail).

    # File lib/escape.rb
    def shell_command(command)
      command.map {|word| shell_single_word(word) }.join(' ')
    end
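
One way to check that every word survives as a single shell word is to split the composed line again with the standard library's `Shellwords.split`, which follows POSIX shell word-splitting rules, and compare. This sketch re-creates both methods from the listings rather than loading the library, and the word list is made up.

```ruby
require 'shellwords'

# Re-creation of Escape.shell_single_word (see its listing).
def shell_single_word(str)
  return "''" if str.empty?
  return str if %r{\A[0-9A-Za-z+,./:=@_-]+\z} =~ str

  result = +''
  str.scan(/('+)|[^']+/) do
    # Runs of single quotes become \' each; other runs are wrapped in '...'.
    result << ($1 ? %q{\'} * $1.length : "'#{$&}'")
  end
  result
end

# Re-creation of Escape.shell_command: quote each word, join with spaces.
def shell_command(command)
  command.map { |word| shell_single_word(word) }.join(' ')
end

words = ['grep', '-e', "it's here", '*.txt', '']
line = shell_command(words)
puts line # prints grep -e 'it'\''s here' '*.txt' ''
# Splitting the line the way a POSIX shell would recovers the original words.
p Shellwords.split(line) == words
```

The same check shows the one-element case from the note above: `shell_command(["echo foo"])` yields `'echo foo'`, which splits back into the single word `echo foo`.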
shell_single_word(str)

Escape.shell_single_word quotes shell metacharacters.

The result is always a single shell word, even if the argument is "". Escape.shell_single_word("") returns "''".

Escape.shell_single_word("") #=> "''"
Escape.shell_single_word("foo") #=> "foo"
Escape.shell_single_word("*") #=> "'*'"

    # File lib/escape.rb
    def shell_single_word(str)
      if str.empty?
        "''"                      # an empty word must still be visible to the shell
      elsif %r{\A[0-9A-Za-z+,./:=@_-]+\z} =~ str
        str                       # only safe characters: no quoting needed
      else
        result = ''
        # Wrap runs of non-quote characters in '...' and emit each ' as \'.
        str.scan(/('+)|[^']+/) {
          if $1
            result << %q{\'} * $1.length
          else
            result << "'#{$&}'"
          end
        }
        result
      end
    end
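
A POSIX shell cannot escape `'` inside single quotes, so the method closes the quoted run, emits `\'`, and opens a new one. The sketch below re-creates the method in core Ruby (not the library itself) and prints its output next to the standard library's `Shellwords.escape`, which backslash-escapes each unsafe character instead of quoting.

```ruby
require 'shellwords'

# Re-creation of Escape.shell_single_word from the listing above.
def shell_single_word(str)
  return "''" if str.empty?                          # keep an empty word visible
  return str if %r{\A[0-9A-Za-z+,./:=@_-]+\z} =~ str # nothing to quote

  result = +''
  str.scan(/('+)|[^']+/) do
    # A run of quotes becomes one \' per quote; any other run is single-quoted.
    result << ($1 ? %q{\'} * $1.length : "'#{$&}'")
  end
  result
end

['foo', '*', "it's", "''"].each do |word|
  puts "#{word}: #{shell_single_word(word)} vs #{Shellwords.escape(word)}"
end
```

Both styles yield one shell word per input; the quoting style keeps long runs of metacharacters readable, while the backslash style avoids quote juggling.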
uri_path(str)

Escape.uri_path escapes a URI path using percent-encoding. The given path should be a sequence of unescaped segments separated by “/”; the segments themselves cannot contain “/”.

Escape.uri_path("a/b/c") #=> "a/b/c"
Escape.uri_path("a?b/c?d/e?f") #=> "a%3Fb/c%3Fd/e%3Ff"

The path is the part of a URI after the authority and before the query, as follows.

scheme://authority/path?query#fragment

See RFC 3986 for details of URI.

Note that this function is not appropriate for converting an operating-system file path to a URI.

    # File lib/escape.rb
    def uri_path(str)
      str.gsub(%r{[^/]+}n) { uri_segment($&) }
    end
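
A sketch of uri_path in core Ruby, assuming UTF-8 input: the segment escaper works on a binary copy (`String#b`) so that a multibyte character is percent-encoded one byte at a time. `SEGMENT_UNSAFE` is an assumed name and `http://example.com/` is only a placeholder.

```ruby
# Characters allowed unescaped in a path segment (pchar minus pct-encoded).
SEGMENT_UNSAFE = %r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}

def uri_segment(str)
  # Binary copy, so each byte of a multibyte character is encoded separately.
  str.b.gsub(SEGMENT_UNSAFE) { |ch| format('%%%02X', ch.ord) }
end

# Escape every run of non-"/" characters; the "/" separators are kept as-is.
def uri_path(str)
  str.gsub(%r{[^/]+}) { |segment| uri_segment(segment) }
end

url = 'http://example.com/' + uri_path('reports/2024 Q1/summary?.txt')
puts url # prints http://example.com/reports/2024%20Q1/summary%3F.txt
```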
uri_segment(str)

Escape.uri_segment escapes a URI path segment using percent-encoding.

Escape.uri_segment("a/b") #=> "a%2Fb"

A segment is a “/”-separated element of the path, which lies after the authority and before the query in a URI, as follows.

scheme://authority/segment1/segment2/.../segmentN?query#fragment

See RFC 3986 for details of URI.

    # File lib/escape.rb
    def uri_segment(str)
      # pchar - pct-encoded = unreserved / sub-delims / ":" / "@"
      # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
      # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
      str.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}n) {
        '%' + $&.unpack("H2")[0].upcase
      }
    end
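
Since uri_path treats every “/” as a separator, a segment that itself contains “/” has to be escaped on its own with uri_segment, and the results joined with “/”. The sketch assumes core Ruby and UTF-8 input, re-creates uri_segment byte by byte with `String#b`, and uses a made-up list of segments.

```ruby
# Characters allowed unescaped in a segment: unreserved, sub-delims, ":" and "@".
SEGMENT_UNSAFE = %r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}

def uri_segment(str)
  # Binary copy: each byte is matched and percent-encoded separately.
  str.b.gsub(SEGMENT_UNSAFE) { |ch| format('%%%02X', ch.ord) }
end

# Each element is one segment, including the one that contains "/".
segments = ['tags', 'AC/DC', 'café']
path = segments.map { |s| uri_segment(s) }.join('/')
puts path # prints tags/AC%2FDC/caf%C3%A9
```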