module PlainText::Split

Contains a method that splits a String in a reversible way

String#split is a powerful method. One caveat is there is no way to guarantee the possibility to reverse the process when a random Regexp (as opposed to String or when the user knows what exactly the Regexp is or has a perfect control about it) is given, because the resultant Array contains all the group-ed String as elements.

This module provides a method to enable it. Requiring this file makes the method included in the String class.

@example Reversible (the method is assumed to be included in String)

my_str.split_with_delimiter(/MyRegexp/).join == my_str  # => true

@author Masa Sakano (Wise Babel Ltd)

Public Class Methods

count_lines(instr, linebreak: $/) click to toggle source

The class-method version of the instance method of the same name.

One more parameter (input String) is required to specify.

@param instr [String] String that is examined. @param linebreak: [String] \n etc (Default: $/). @return [Integer] always positive @see count_lines

# File lib/plain_text/split.rb, line 77
def self.count_lines(instr, linebreak: $/)
  return 0 if instr.empty?
  ar = instr.split(linebreak, -1)  # -1 is specified to preserve the last linebreak(s).
  ar.pop if "" == ar[-1]
  ar.size
end
count_regexp(instr, re_in, like_linenum: false, with_if_end: false) click to toggle source

The class-method version of the instance method of the same name.

One more parameter (input String) is required to specify.

@param instr [String] String that is examined. @param re_in [Regexp, String] If String, it is interpreted literally as in String#split. @param like_linenum: [Boolean] if true (Def: false), it counts like the line number. @param with_if_end: [Boolean] a special case (see the description). @return [Integer] always positive @see PlainText::Split#count_regexp

# File lib/plain_text/split.rb, line 58
def self.count_regexp(instr, re_in, like_linenum: false, with_if_end: false)
  like_linenum = true if with_if_end
  return (with_if_end ? [0, true] : 0) if instr.empty?
  allsize = split_with_delimiter(instr, re_in).size

  n_normal = allsize.div(2)
  return n_normal if !like_linenum
  n_lines = (allsize.even? ? allsize : allsize+1).div 2
  with_if_end ? [n_normal, (n_normal ==  n_lines)] : n_lines
end
split_with_delimiter(instr, re_in) click to toggle source

The class-method version of the instance method of the same name.

One more parameter (input String) is required to specify.

@param instr [String] String that is examined. @param re_in [Regexp, String] If String, it is interpreted literally as in String#split. @return [Array] @see PlainText::Split#split_with_delimiter

# File lib/plain_text/split.rb, line 30
def self.split_with_delimiter(instr, re_in)
  re_in = Regexp.new(Regexp.quote(re_in)) if re_in.class.method_defined? :to_str
  re_grp = add_grouping(re_in)  # Ensure grouping.

  arspl = instr.split re_grp, -1
  return arspl if arspl.size <= 1  # n.b., Size is 0 for an empty string (only?).

  n_grouping = re_grp.match(instr).size  # The number of grouping - should be at least 2, including $&.
  return adjust_last_element(arspl) if n_grouping <= 2

  # Takes only the split main contents and delimeter
  arret = []
  arspl.each_with_index do |ec, ei|
    arret << ec if (1..2).include?( (ei + 1) % n_grouping )
  end
  adjust_last_element(arret) # => Array
end

Private Class Methods

add_grouping(rule_re) click to toggle source

This method encloses the given Regexp with '()'

@param rule_re [Regexp] @return [Regexp]

# File lib/plain_text/split.rb, line 92
def self.add_grouping(rule_re)
  Regexp.new '('+rule_re.source+')', rule_re.options
end
adjust_last_element(ary) click to toggle source

Utility

# File lib/plain_text/split.rb, line 98
def self.adjust_last_element(ary)
  ary.pop if ary[-1].empty?  # ary.size > 0 is guaranteed
  ary
end

Public Instance Methods

count_lines(**kwd) click to toggle source

Returns the number of lines.

@param linebreak: [String] \n etc (Default: $/). @return [Integer] always positive @see PlainText::Split#count_regexp

# File lib/plain_text/split.rb, line 166
def count_lines(**kwd)
  PlainText::Split.public_send(__method__, self, **kwd)
end
count_regexp(*rest, **kwd) click to toggle source

Count the number of matches to self that satisfy the given Regexp

If like_linenum option is specified, it is counted like the number of lines, namely the returned value is incremented from the number of matches by 1 unless the very last characters of the String is the last match. For example, if no matches are found, this still returns one.

Note if the String (self) is empty, this always returns 0.

The special option is with_if_end. If given true,

this returns Array<Integer, Boolean> instead of a simple Integer, with the first parameter being the Integer of the count as with the default like_linenum=false, and the second parameter gives true if the number is the same even if it was like_linenum=true, namely if the end of the String coincides with the last match, else false. (This parameter is introduced just to reduce the overhead of potentially calling this routine twice or user's making their own check.)

@param re_in [Regexp, String] If String, it is interpreted literally as in String#split. @param like_linenum: [Boolean] if true (Def: false), it counts like the line number. @param with_if_end: [Boolean] a special case (see the description). @return [Integer, Array<Integer, Boolean>] always positive @see PlainText::Split#count_regexp

# File lib/plain_text/split.rb, line 157
def count_regexp(*rest, **kwd)
  PlainText::Split.public_send(__method__, self, *rest, **kwd)
end
split_with_delimiter(*rest) click to toggle source

Split with the delimiter even when Regexp (or String) is given

Note the last empty component, if exists, is deleted in the returned Array. If the input string is empty, the returned Array is also empty, as in String#split.

@example Standard split (without grouping) : +s=“XQabXXcXQ”+

s.split(/X+Q?/)         #=> ["", "ab", "c"],
s.split(/X+Q?/, -1)     #=> ["", "ab", "c", ""],

@example Standard split (with grouping) : +s=“XQabXXcXQ”+

s.split(/X+(Q?)/, -1)   #=> ["", "Q", "ab", "", "c", "Q", ""],
s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""],

@example This method (when included in String (as Default)) : +s=“XQabXXcXQ”+

s.split_with_delimiter(/X+(Q?)/)
                        #=> ["", "XQ", "ab", "XX", "c", "XQ"]

@param re_in [Regexp, String] If String, it is interpreted literally as in String#split. @return [Array]

# File lib/plain_text/split.rb, line 128
def split_with_delimiter(*rest)
  PlainText::Split.public_send(__method__, self, *rest)
end