class Dates::DateExtractor

Public Instance Methods

extract_date(paragraph_text, date)

Extracts a date and/or time from the start of a paragraph. The $LAST_MATCH_INFO global (an alias for $~ provided by Ruby's English library) is equivalent to Regexp.last_match and returns a MatchData object. This can be indexed like an array, where indices 1 to n are the captured groups of the last successful match.

@param [String] paragraph_text a paragraph from a DSL text file
@param [Time] date the date of this paragraph; may be nil if not known.

This date is taken from the Date class instance variable of the paragraph class.

@return [Array(String, Array, Time)] a three-element array of the values below

String return value

'paragraph_text' if a date or time was matched, the text after the match with surrounding whitespace stripped; otherwise the paragraph unchanged.

Array return value

'time_array' the from hour, from minute, to hour and to minute as captured strings (each nil if not present); empty if nothing was matched.

Time return value

'date' the date (day month year) of this paragraph, taken from the match if it contained a date, otherwise the date passed to the function. It is nil only if no date was matched and the date passed in was also nil.

# File lib/docfolio/paragraph_modules/dates.rb, line 94
def extract_date(paragraph_text, date)
  time_array = []

  # if text contains a date match
  if date_regex =~ paragraph_text
    # $POSTMATCH (alias of $') holds the text after the match; it is
    # never nil after a successful match. Strip surrounding whitespace,
    # so a paragraph holding only a date or time becomes ''
    paragraph_text = $POSTMATCH.strip

    # Extract the 'from' and 'to' times from the match above. The
    # time_array holds from_hour, from_min, to_hour, to_min; date is
    # replaced if the match found a new date
    time_array, date = date_from_globals($LAST_MATCH_INFO, date)
  end
  [paragraph_text, time_array, date]
end
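
The match-then-split technique used by extract_date can be tried in isolation. This is a minimal sketch assuming a hypothetical, simplified pattern that only recognises a leading HH:MM time; TIME_AT_START and split_leading_time are illustrative names, not part of docfolio. Note that $POSTMATCH and $LAST_MATCH_INFO are only available after require 'English'.

```ruby
require 'English' # provides $POSTMATCH and $LAST_MATCH_INFO

# Hypothetical, simplified pattern: only a leading "HH:MM" time
TIME_AT_START = /^(?<hour>\d{1,2}):(?<min>\d{2})/

# Splits a leading time off a paragraph, returning the remaining text
# and the captured parts, in the same shape as extract_date's first
# two return values
def split_leading_time(paragraph_text)
  return [paragraph_text, []] unless TIME_AT_START =~ paragraph_text

  rest  = $POSTMATCH.strip                            # text after the match
  parts = [$LAST_MATCH_INFO[1], $LAST_MATCH_INFO[2]]  # numbered captures
  [rest, parts]
end

split_leading_time('09:30 Reviewed discharge letters')
# => ["Reviewed discharge letters", ["09", "30"]]
split_leading_time('No time here')
# => ["No time here", []]
```

As in extract_date, the captured hour and minute are Strings, not Integers.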

Private Instance Methods

date_from_globals(glob_a, date)

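Returns the times and date captured by date_regex. The MatchData holds 26 numbered capture groups, and each field can be captured at several positions depending on which alternative of date_regex matched, so glob is used to pick the first non-nil value for each field.

@param [MatchData] glob_a the MatchData object returned when date_regex was matched against the paragraph

@param [Time] date the date of the paragraph; may be nil if not known

@return [Array(Array, Time)] a two-element array: an array of four strings (or nils) holding the hours and minutes of the from and to times, and the date (day month year) of this paragraph, which is the date passed in if no date was matched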

# File lib/docfolio/paragraph_modules/dates.rb, line 140
def date_from_globals(glob_a, date)
  from_hour = glob([1, 23], glob_a)
  from_min  = glob([2, 24], glob_a)
  to_hour   = glob([3, 25], glob_a)
  to_min    = glob([4, 26], glob_a)
  day       = glob([5, 8, 12, 14, 17, 21], glob_a)
  month     = glob([6, 9, 11, 15, 18, 20], glob_a)
  year      = glob([7, 10, 13, 16, 19, 22], glob_a)
  date = Time.at(DateFormatter.new.format_date("#{day}-#{month}-#{year}")) unless day.nil?
  [[from_hour, from_min, to_hour, to_min], date]
end
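
The index lists in date_from_globals follow from how Ruby numbers named groups: once a pattern uses named groups, plain parentheses stop capturing, and named groups (even with repeated names) are numbered left to right. The sketch below shows this on a hypothetical reduced pattern built the same way as date_regex; the constants DY, MT, YR, HM and REDUCED are illustrative only.

```ruby
# Hypothetical reduced pattern: a time, then either d/m/y or
# "d Month y". In d/m/y the numeric month is a second 'day' group.
DY      = /(?<day>\d{1,2})/
MT      = /(?<month>\w+)/
YR      = /(?<year>\d{2,4})/
HM      = /(?<hour>\d{1,2}):(?<min>\d{2})/
REDUCED = %r{^(#{HM} (#{DY}/#{DY}/#{YR}|#{DY} #{MT} #{YR}))}

# Regexp#named_captures lists every group number used for each name
REDUCED.named_captures['day']   # => [3, 4, 6]
REDUCED.named_captures['month'] # => [7]
REDUCED.named_captures['year']  # => [5, 8]

# Which positions are filled depends on the alternative that matched
slash = REDUCED.match('09:30 12/03/2015')
[slash[3], slash[4], slash[5]] # => ["12", "03", "2015"]
words = REDUCED.match('09:30 12 March 2015')
[words[6], words[7], words[8]] # => ["12", "March", "2015"]
```

This is why, in the full pattern, the month list in date_from_globals includes the second 'day' position of each d/m/y alternative (6 and 15).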

date_regex()

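Returns the regular expression used to match dates and times at the start of the paragraph: a time or time range, optionally followed by a date, or a date, optionally followed by a time or time range. Dates may be written d/m/y, "d Month Year" or "Month d Year".

@return [Regexp] a regular expression that matches dates and times at the start of the paragraph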
# File lib/docfolio/paragraph_modules/dates.rb, line 156
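def date_regex
  dy      = /(?<day>\d{1,2})/
  mt      = /(?<month>\w+)/
  yr      = /(?<year>\d{2,4})/
  time    = /(?<hour>\d{1,2}):(?<min>\d{2})/
  # a time, optionally followed by a range separator and a second time
  period  = /#{time}( ?(?:-|–|to) ?#{time})?/
  # in d/m/y the month is numeric, so it is captured as a second 'day'
  date1   = %r{#{dy}/#{dy}/#{yr}}     # d/m/y
  date2   = /#{dy},? #{mt},? #{yr}/   # d Month Year
  date3   = /#{mt},? #{dy},? #{yr}/   # Month d Year
  date    = /#{date1}|#{date2}|#{date3}/
  # Because named groups are used, the plain parentheses do not capture:
  # only the 26 named groups are numbered (1-4 and 23-26 are times,
  # 5-22 are the two copies of the date alternatives).
  # Note that ^ anchors at the start of any line, not only the string.
  /^(#{period} ?#{date}?|#{date} ?#{period}?)/
end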
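
To check the 26-group numbering that date_from_globals depends on, the body of date_regex can be copied into a standalone script and matched against sample text. This is a sketch for experimentation only; sample_date_regex and the sample paragraphs are illustrative, not part of docfolio.

```ruby
# Standalone copy of date_regex (sample_date_regex is an illustrative
# name) so the group numbering can be inspected outside the class
def sample_date_regex
  dy     = /(?<day>\d{1,2})/
  mt     = /(?<month>\w+)/
  yr     = /(?<year>\d{2,4})/
  time   = /(?<hour>\d{1,2}):(?<min>\d{2})/
  period = /#{time}( ?(?:-|–|to) ?#{time})?/
  date1  = %r{#{dy}/#{dy}/#{yr}}
  date2  = /#{dy},? #{mt},? #{yr}/
  date3  = /#{mt},? #{dy},? #{yr}/
  date   = /#{date1}|#{date2}|#{date3}/
  /^(#{period} ?#{date}?|#{date} ?#{period}?)/
end

# A time range followed by a "d Month Year" date (first alternative)
m = sample_date_regex.match('09:30-10:15 12 March 2015 Met with the team')
m.captures.size          # => 26
[m[1], m[2], m[3], m[4]] # => ["09", "30", "10", "15"]
[m[8], m[9], m[10]]      # => ["12", "March", "2015"]
m.post_match             # => " Met with the team"

# A d/m/y date with no time (second alternative, groups 14 to 16)
d = sample_date_regex.match('12/03/2015 Ward round')
[d[14], d[15], d[16]]    # => ["12", "03", "2015"]
d.post_match             # => "Ward round"
```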

glob(i_a, glob_a)

Extracts a particular value from the MatchData object returned when the paragraph was matched against date_regex. Treats the MatchData as an array, checking each index in i_a in turn and returning the first value that is not nil.

@param [Array<Integer>] i_a positions to test in glob_a

@param [MatchData] glob_a the captured groups of the last successful regular expression match

@return [String, nil] the first element of glob_a at the indices in i_a that is not nil, or nil if all of them are nil
# File lib/docfolio/paragraph_modules/dates.rb, line 128
def glob(i_a, glob_a)
  i_a.each { |n| return glob_a[n] unless glob_a[n].nil? }
  nil
end
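
The same lookup can also be written with MatchData#values_at, which returns the captures at several indices at once. A sketch, assuming a hypothetical helper first_capture and a small illustrative pattern in which two alternatives both capture a day:

```ruby
# Hypothetical equivalent of glob: the first non-nil capture among the
# given group numbers, or nil when none of them participated
def first_capture(indices, match_data)
  match_data.values_at(*indices).compact.first
end

# Illustrative pattern: "d/m" or "Month d", each capturing a 'day'
DAY_PATTERN = %r{^(?:(?<day>\d{1,2})/\d{1,2}|[A-Z][a-z]+ (?<day>\d{1,2}))}

DAY_PATTERN.named_captures                          # => {"day"=>[1, 2]}
first_capture([1, 2], DAY_PATTERN.match('12/03'))   # => "12"
first_capture([1, 2], DAY_PATTERN.match('March 5')) # => "5"
```

Like glob, this returns an empty string rather than skipping it if a group matched empty text, since only nils are removed.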