class Dates::DateExtractor
Public Instance Methods
The $LAST_MATCH_INFO global is equivalent to Regexp.last_match and returns a MatchData object. This can be used as an array, where indices 1 to n are the matched backreferences of the last successful match.
@param [String] paragraph_text a paragraph from a DSL text file
@param [Time] date the date of this paragraph. May be nil if not known. This date is taken from the Date class instance variable of the paragraph class.
@return [Array<String, Array, Time>] array of values to be returned:
- String 'paragraph_text': the same paragraph that was passed to the function, but without the matched date and time characters if there were any.
- Array 'time_array': array of 4 values giving the hours and minutes of the from and to times, as captured from the match (nil where a part is absent). Empty if there was no match.
- Time 'date': the date (day month year) of this paragraph, taken from the text matched by date_regex if there was one. Will be nil if there was no match and the date passed to the function was also nil.
# File lib/docfolio/paragraph_modules/dates.rb, line 94
def extract_date(paragraph_text, date)
  time_array = []

  # if text contains a date match
  if date_regex =~ paragraph_text
    # $POSTMATCH (or $'), contains the characters after the match position
    paragraph_text = $POSTMATCH

    # strip whitespace if any remaining match or set to empty string
    # if no match. If there is just white space after the match then
    # this is truncated to an empty string
    paragraph_text.nil? ? paragraph_text = '' : paragraph_text.strip!

    # extracts the 'from' and 'to' times from the last match above. the
    # time_array contains from_hour, from_min, to_hour, to_min, the
    # date parameter is updated if the match found a new date
    time_array, date = date_from_globals($LAST_MATCH_INFO, date)
  end
  [paragraph_text, time_array, date]
end
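The $LAST_MATCH_INFO and $POSTMATCH names used above are only available after requiring Ruby's English library. The following is a minimal sketch of the same match-then-strip pattern, not the library's own code: SIMPLE_TIME and split_leading_time are hypothetical names, and the pattern is deliberately reduced to a leading HH:MM time.

```ruby
require 'English' # defines $LAST_MATCH_INFO ($~) and $POSTMATCH ($')

# Hypothetical, simplified pattern: a leading HH:MM time only.
SIMPLE_TIME = /^(\d{1,2}):(\d{2})/

# Hypothetical helper mirroring the shape of extract_date:
# returns the remaining text and the captured time parts.
def split_leading_time(text)
  return [text, []] unless SIMPLE_TIME =~ text

  match = $LAST_MATCH_INFO # MatchData of the match just made
  rest = $POSTMATCH.strip  # text after the match, whitespace trimmed
  [rest, [match[1], match[2]]]
end

rest, time = split_leading_time('09:30 Ward round')
puts rest.inspect # "Ward round"
puts time.inspect # ["09", "30"]
```

Note that the captured parts are strings, so a caller wanting numbers would need to convert them (for example with to_i).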
Private Instance Methods
Returns the times and date found in the 26 capture groups produced by date_regex.
@param [MatchData] glob_a the MatchData object returned when the date_regex was matched to the paragraph
@param [Time] date the date of the paragraph; may be nil if not known
@return [Array] array of 4 values giving the hours and minutes of the from and to times
@return [Time] 'date' the date (day month year) of this paragraph
# File lib/docfolio/paragraph_modules/dates.rb, line 140
def date_from_globals(glob_a, date)
  from_hour = glob([1, 23], glob_a)
  from_min = glob([2, 24], glob_a)
  to_hour = glob([3, 25], glob_a)
  to_min = glob([4, 26], glob_a)
  day = glob([5, 8, 12, 14, 17, 21], glob_a)
  month = glob([6, 9, 11, 15, 18, 20], glob_a)
  year = glob([7, 10, 13, 16, 19, 22], glob_a)
  date = Time.at(DateFormatter.new.format_date("#{day}-#{month}-#{year}")) unless day.nil?
  [[from_hour, from_min, to_hour, to_min], date]
end
Returns a regular expression to be used to match dates and times of the paragraph.
@return [Regexp] a regular expression to use to match dates and times in the paragraph
# File lib/docfolio/paragraph_modules/dates.rb, line 156
def date_regex
  dy = /(?<day>\d{1,2})/
  mt = /(?<month>\w+)/
  yr = /(?<year>\d{2,4})/
  time = /(?<hour>\d{1,2}):(?<min>\d{2})/
  period = /#{time}( ?(?:-|–|to) ?#{time})?/
  date1 = %r{#{dy}/#{dy}/#{yr}} # d/m/y
  date2 = /#{dy},? #{mt},? #{yr}/ # d Month Year
  date3 = /#{mt},? #{dy},? #{yr}/ # Month d Year
  date = /#{date1}|#{date2}|#{date3}/
  /^(#{period} ?#{date}?|#{date} ?#{period}?)/
end
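To see where the index lists used by date_from_globals come from, it may help to run the pattern on sample text. The sketch below copies date_regex from the listing above as a top-level method; the sample strings are made up for illustration. Because the pattern uses named groups, Ruby does not capture the plain parentheses, so only the named groups are numbered, which gives 26 groups in left-to-right order.

```ruby
# Copied from the listing above so the example is self-contained.
def date_regex
  dy = /(?<day>\d{1,2})/
  mt = /(?<month>\w+)/
  yr = /(?<year>\d{2,4})/
  time = /(?<hour>\d{1,2}):(?<min>\d{2})/
  period = /#{time}( ?(?:-|–|to) ?#{time})?/
  date1 = %r{#{dy}/#{dy}/#{yr}} # d/m/y
  date2 = /#{dy},? #{mt},? #{yr}/ # d Month Year
  date3 = /#{mt},? #{dy},? #{yr}/ # Month d Year
  date = /#{date1}|#{date2}|#{date3}/
  /^(#{period} ?#{date}?|#{date} ?#{period}?)/
end

# Time range first, then a "d Month Year" date (illustrative input).
m = date_regex.match('09:30-10:15 12 March 2015 Ward round')
puts m.size                        # 27: the whole match plus 26 groups
puts [m[1], m[2], m[3], m[4]].inspect # ["09", "30", "10", "15"]
puts [m[8], m[9], m[10]].inspect      # ["12", "March", "2015"]

# Date first (d/m/y), then a single time: the later groups are filled.
m2 = date_regex.match('12/03/2015 09:30 note')
puts [m2[14], m2[15], m2[16]].inspect # ["12", "03", "2015"]
puts [m2[23], m2[24], m2[25]].inspect # ["09", "30", nil]
```

The same value can therefore land in different group numbers depending on which alternative matched, which is why each field is looked up through a list of candidate indices.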
Extracts a particular parameter from the MatchData object returned when the paragraph was matched with the date regex. Treats the MatchData as an array, iterating through each index listed in the i_a array to find and return a value if there is one.
@param [Array] i_a array of integers representing positions to test in array glob_a
@param [MatchData] glob_a array of matched backreferences of the last successful regular expression match
@return the first element in MatchData, at the indices in i_a, that is not nil. Returns nil if every element at those indices is nil.
# File lib/docfolio/paragraph_modules/dates.rb, line 128
def glob(i_a, glob_a)
  i_a.each { |n| return glob_a[n] unless glob_a[n].nil? }
  nil
end
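The lookup can be tried in isolation. The sketch below repeats glob from the listing above and applies it to a small two-alternative pattern invented for illustration; first_present is a hypothetical alternative spelling that relies on values_at, which both MatchData and Array provide.

```ruby
# Copied from the listing above so the example is self-contained.
def glob(i_a, glob_a)
  i_a.each { |n| return glob_a[n] unless glob_a[n].nil? }
  nil
end

# Hypothetical equivalent: fetch all candidates, drop nils, take the first.
def first_present(i_a, glob_a)
  glob_a.values_at(*i_a).compact.first
end

# Illustrative pattern: the hour is captured by group 1 or group 2,
# depending on which alternative matched.
m = /(\d+)h|at (\d+)/.match('at 7')

puts glob([1, 2], m).inspect          # "7" (group 1 is nil, group 2 matched)
puts first_present([1, 2], m).inspect # "7"
puts glob([1], m).inspect             # nil (no candidate index has a value)
```

The each-with-early-return form stops at the first hit without building an intermediate array; for index lists this short the difference is unlikely to matter.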