class EmailParser
Parses text and attempts to locate email addresses.
Constants
- GR_REGEX
Regex used to find emails; the address must end in .com or a similar domain suffix to match.
- GR_REGEX_WITHOUT_DOT
Regex used to find emails; the address does not need .com or a similar suffix to match.
- GR_REGEX_WITH_AT
Regex used to find emails; the address must end in .com or a similar suffix, and the word 'at' is also accepted in place of '@'.
- MR_REGEX
Regex used by the MapReduce algorithm to find emails; the address must end in .com or a similar suffix to match.
- MR_REGEX_WITHOUT_DOT
Regex used by the MapReduce algorithm to find emails; the address does not need .com or a similar suffix to match.
- REPLACEMENTS
Maps spelled-out occurrences (such as ' at ', '(at)' and ' dot ') down to their constituent parts.
- TEXT_MATCH
Matches a run of characters allowed in email addresses.
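To illustrate how the "with dot", "without dot" and "with at" variants differ, here is a minimal sketch with hypothetical patterns. The names LOCAL_PART, WITH_DOT, WITHOUT_DOT and WITH_AT and their character classes are assumptions, not the gem's actual regexes, which this page does not show. The patterns only cover lowercase input, since the parser downcases text before matching.

```ruby
# Hypothetical stand-ins for GR_REGEX, GR_REGEX_WITHOUT_DOT and GR_REGEX_WITH_AT;
# the gem's real patterns are not shown in this documentation.
LOCAL_PART = /[a-z0-9._%+\-]+/ # assumed characters allowed before '@'

# Requires a dot followed by a 2+ letter suffix (".com", ".org", ...)
WITH_DOT = /#{LOCAL_PART}@[a-z0-9\-]+(?:\.[a-z0-9\-]+)*\.[a-z]{2,}/

# Accepts any run of domain characters after '@', so "jane@example" matches too
WITHOUT_DOT = /#{LOCAL_PART}@[a-z0-9\-]+/

# Like WITH_DOT, but the word "at" surrounded by spaces may replace '@'
WITH_AT = /#{LOCAL_PART}(?:@| at )[a-z0-9\-]+(?:\.[a-z0-9\-]+)*\.[a-z]{2,}/

p 'mail jane@example.com now'[WITH_DOT]
p 'mail jane@example now'[WITH_DOT]
p 'mail jane@example now'[WITHOUT_DOT]
p 'mail jane at example.com now'[WITH_AT]
```

The dot-less variant trades precision for recall: it also matches fragments like "jane@example", which is why the aggressive mode described under email_instances filters its results against a list of known domains.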
Public Instance Methods
Counts email occurrences within a block of text. Note: uses the MapReduce algorithm (MR_ALGO).
# File lib/ramparts/parsers/email_parser.rb, line 10
def count_email_instances(text, options)
  raise ArgumentError, ARGUMENT_ERROR_TEXT unless text.is_a? String
  text = parse_email(text)
  email_instances(MR_ALGO, text, options).length
end
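A minimal sketch of the counting flow, assuming a hypothetical COUNT_REGEX in place of MR_REGEX and a simplified normalization that only downcases and masks disallowed characters with '$' (the real parse_email also rewrites ' at ' and ' dot '). The helper name count_emails is illustrative. Because only the number of matches is returned, a normalization step that changes the text's length does no harm here.

```ruby
# Hypothetical stand-in for MR_REGEX; the real pattern is not shown here.
COUNT_REGEX = /[\w.%+\-]+@[\w\-]+(?:\.[\w\-]+)*\.[a-z]{2,}/

# Hypothetical helper mirroring count_email_instances: validate, normalize, count.
def count_emails(text)
  raise ArgumentError, 'text must be a String' unless text.is_a?(String)

  # Simplified normalization: downcase, then mask every character that cannot
  # appear in an email with '$' (the ' at '/' dot ' rewrites are omitted).
  masked = text.downcase.gsub(/[^\w@.%+\-]/, '$')

  # Only the count is needed, so match positions in the masked text are irrelevant.
  masked.scan(COUNT_REGEX).length
end

p count_emails('Mail Ana@Example.com, li@test.org!')
```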
Finds the occurrences of emails within a block of text and returns their positions.
# File lib/ramparts/parsers/email_parser.rb, line 26
def find_email_instances(text, options)
  raise ArgumentError, ARGUMENT_ERROR_TEXT unless text.is_a? String
  text = text.downcase
  email_instances(GR_ALGO, text, options)
end
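Returning positions means recording each match's offsets along with its text. The sketch below uses a hypothetical scan_with_offsets helper and a hypothetical EMAIL pattern standing in for GR_REGEX. Only the :value key is evidenced by the source (instance[:value] in email_instances); :type, :start_offset and :end_offset are assumed names.

```ruby
# Hypothetical pattern standing in for GR_REGEX (lowercase input assumed).
EMAIL = /[\w.%+\-]+@[\w\-]+(?:\.[\w\-]+)*\.[a-z]{2,}/

# Hypothetical helper: collects every match of +regex+ in +text+ together with
# its character offsets (end_offset is exclusive).
def scan_with_offsets(text, regex, type)
  instances = []
  pos = 0
  while (match = text.match(regex, pos))
    instances << { type: type, value: match[0],
                   start_offset: match.begin(0), end_offset: match.end(0) }
    # Resume after this match; step forward on an empty match to avoid looping forever.
    pos = match.end(0) > match.begin(0) ? match.end(0) : match.end(0) + 1
  end
  instances
end

text = 'Ping ANA@Example.com or li@test.org'
instances = scan_with_offsets(text.downcase, EMAIL, :email)
instances.each { |i| puts "#{i[:value]} #{i[:start_offset]}...#{i[:end_offset]}" }
```

Downcasing ASCII text does not change its length, so offsets computed on the downcased copy also index the original text. This is likely why find_email_instances only downcases instead of calling parse_email, whose replacements (' at ' to '@', for example) shorten the text.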
Replaces the occurrences of email within the block of text with an insertable value produced by the given block.
# File lib/ramparts/parsers/email_parser.rb, line 18
def replace_email_instances(text, options, &block)
  raise ArgumentError, ARGUMENT_ERROR_TEXT unless text.is_a? String
  instances = find_email_instances(text, options)
  replace(text, instances.reverse!, &block)
end
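The instances.reverse! call matters because replacing a match near the end of the text leaves the offsets of earlier matches valid, whereas replacing front to back would shift every later offset whenever the insertable differs in length from the email. A minimal sketch, assuming a hypothetical replace_instances helper, instances in ascending order with assumed :value, :start_offset and :end_offset keys, and a block that receives the matched email:

```ruby
# Hypothetical helper: replaces each instance with the block's return value.
# Instances are assumed to be in ascending order of :start_offset; walking them
# back to front keeps the not-yet-replaced offsets valid.
def replace_instances(text, instances)
  result = text.dup
  instances.reverse_each do |instance|
    insertable = yield(instance[:value])
    result[instance[:start_offset]...instance[:end_offset]] = insertable.to_s
  end
  result
end

text = 'ana@example.com or li@test.org'
instances = [
  { value: 'ana@example.com', start_offset: 0, end_offset: 15 },
  { value: 'li@test.org', start_offset: 19, end_offset: 30 }
]
masked = replace_instances(text, instances) { |email| "<#{email}>" }
puts masked
```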
Private Instance Methods
# File lib/ramparts/parsers/email_parser.rb, line 66
def email_instances(algo, text, options)
  # Determines which algorithm to use
  regex = algo == MR_ALGO ? MR_REGEX : GR_REGEX
  regex_without_dot = algo == MR_ALGO ? MR_REGEX_WITHOUT_DOT : GR_REGEX_WITHOUT_DOT
  regex_with_at = GR_REGEX_WITH_AT

  instances = []
  if options.fetch(:aggressive, false)
    temp_instances = scan(text, regex_without_dot, :email)
    # Since this is the aggressive option where '.com' or similar isn't needed
    # Check to make sure the last word of the string is a domain
    temp_instances.each do |instance|
      instances << instance if EMAIL_DOMAINS.any? { |domain| instance[:value].split('@')[1]&.include? domain }
    end
  elsif options.fetch(:check_for_at, false)
    instances = scan(text, regex_with_at, :email)
  else
    instances = scan(text, regex, :email)
  end
  instances
end
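The option handling in email_instances can be sketched in isolation. Because :aggressive is checked first, it takes precedence when both options are set. EMAIL_DOMAINS here is a hypothetical short list (the gem's actual list is not shown), and mode_for and known_domain? are illustrative helper names; the domain check reuses the source's split('@')[1]&.include? expression.

```ruby
# Hypothetical stand-in for EMAIL_DOMAINS; the gem's real list is not shown.
EMAIL_DOMAINS = %w[gmail yahoo hotmail outlook].freeze

# Mirrors the branch order in email_instances: :aggressive is checked first,
# then :check_for_at; otherwise the default regex is used.
def mode_for(options)
  if options.fetch(:aggressive, false)
    :aggressive
  elsif options.fetch(:check_for_at, false)
    :check_for_at
  else
    :default
  end
end

# Mirrors the aggressive-mode filter: keep a candidate only when the part after
# '@' contains a known domain. A value without '@' gives nil, and the safe
# navigation operator turns that into a falsy result instead of an error.
def known_domain?(value)
  EMAIL_DOMAINS.any? { |domain| value.split('@')[1]&.include?(domain) }
end

candidates = [{ value: 'jane@gmail' }, { value: 'bob@mycompany' }, { value: 'noatsign' }]
kept = candidates.select { |instance| known_domain?(instance[:value]) }
p mode_for({ aggressive: true, check_for_at: true })
p kept
```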
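Downcases the text, maps spelled-out occurrences (' at ', '(at)' and ' dot ') down through REPLACEMENTS, and replaces every character not allowed in an email with '$'.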
# File lib/ramparts/parsers/email_parser.rb, line 62
def parse_email(text)
  text.downcase.gsub(/\ at\ |\(at\)|\ dot\ /, REPLACEMENTS).gsub(/[^\w\@\.\_\%\+\-]/, '$')
end
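A standalone run of the same two-step transformation, assuming a hypothetical REPLACEMENTS hash that maps ' at ' and '(at)' to '@' and ' dot ' to '.' (the gem's actual hash is not shown). The second character class drops the redundant escapes (\w already includes '_') but matches the same characters.

```ruby
# Hypothetical REPLACEMENTS hash; the gem's actual mapping is not shown.
REPLACEMENTS = { ' at ' => '@', '(at)' => '@', ' dot ' => '.' }.freeze

def parse_email(text)
  text.downcase
      # Each matched spelling is replaced by its value in the hash
      .gsub(/\ at\ |\(at\)|\ dot\ /, REPLACEMENTS)
      # Anything outside word characters and @ . % + - becomes '$'
      .gsub(/[^\w@.%+\-]/, '$')
end

p parse_email('Jane(at)Example dot COM, thanks!')
p parse_email('a at b dot org')
```

Masking separators as '$' keeps '@' and '.' intact while giving the MapReduce regexes a single boundary character to work against; that reading of the design is an inference from the code, not something this page states.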