class EmailParser

Parses text and attempts to locate email addresses

Constants

GR_REGEX

Regex to find the emails, must have .com or something similar to match

GR_REGEX_WITHOUT_DOT

Regex to find the emails, does not have to have .com or something similar to match

GR_REGEX_WITH_AT

Regex to find the emails, must have .com or something similar to match and also treats the word 'at' as '@'

MR_REGEX

Regex to find emails for MapReduce, must have .com or something similar to match

MR_REGEX_WITHOUT_DOT

Regex to find emails for MapReduce, does not have to have .com or something similar to match

REPLACEMENTS

Maps these occurrences down to their constituent parts

TEXT_MATCH

Matches a certain string of text allowed in emails

Public Instance Methods

count_email_instances(text, options)

Counts email occurrences within a block of text. Note: uses the MapReduce algorithm.

# File lib/ramparts/parsers/email_parser.rb, line 10
def count_email_instances(text, options)
  raise ArgumentError, ARGUMENT_ERROR_TEXT unless text.is_a? String

  text = parse_email(text)
  email_instances(MR_ALGO, text, options).length
end
find_email_instances(text, options)

Finds the occurrences of emails within a block of text and returns their positions

# File lib/ramparts/parsers/email_parser.rb, line 26
def find_email_instances(text, options)
  raise ArgumentError, ARGUMENT_ERROR_TEXT unless text.is_a? String

  text = text.downcase
  email_instances(GR_ALGO, text, options)
end
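
As a rough illustration of what "returns their positions" can look like, the sketch below scans lowercased text and records offsets for each match. It does not use the library's GR_REGEX or its scan helper, neither of which is shown here: SIMPLE_EMAIL, find_sketch, and the :start_offset / :end_offset keys are assumptions made for this example only.

```ruby
# Hypothetical, simplified pattern; NOT the library's GR_REGEX.
SIMPLE_EMAIL = /[\w.%+\-]+@[\w\-]+(?:\.[\w\-]+)+/

# Returns one hash per match with assumed keys :start_offset, :end_offset, :value.
def find_sketch(text)
  results = []
  text.downcase.scan(SIMPLE_EMAIL) do
    match = Regexp.last_match
    results << {
      start_offset: match.begin(0), # index of the first matched character
      end_offset: match.end(0),     # index just past the last matched character
      value: match[0]
    }
  end
  results
end

p find_sketch('Mail Bob@Example.com now')
# => [{:start_offset=>5, :end_offset=>20, :value=>"bob@example.com"}]
# (the exact hash formatting printed by p varies between Ruby versions)
```

Because the text is lowercased before scanning, the returned value is lowercase, while the offsets still line up with the original string.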
replace_email_instances(text, options, &block)

Replaces each occurrence of an email within the block of text with an insertable supplied through the given block

# File lib/ramparts/parsers/email_parser.rb, line 18
def replace_email_instances(text, options, &block)
  raise ArgumentError, ARGUMENT_ERROR_TEXT unless text.is_a? String

  instances = find_email_instances(text, options)
  replace(text, instances.reverse!, &block)
end
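
The method reverses the found instances before replacing them. One plausible reason, sketched below, is that replacing from the end of the string backwards keeps the offsets of earlier matches valid even when the replacement has a different length. The library's replace helper is not shown, so replace_sketch and the :start_offset / :end_offset keys are assumptions for illustration.

```ruby
# Hypothetical helper; the library's own `replace` may behave differently.
# Each instance is assumed to carry :start_offset (inclusive) and :end_offset (exclusive).
def replace_sketch(text, instances)
  result = text.dup
  # Work from the last match to the first so earlier offsets stay valid.
  instances.sort_by { |instance| instance[:start_offset] }.reverse_each do |instance|
    result[instance[:start_offset]...instance[:end_offset]] = yield(instance)
  end
  result
end

text = 'a@b.com and c@d.com'
instances = [
  { start_offset: 0,  end_offset: 7,  value: 'a@b.com' },
  { start_offset: 12, end_offset: 19, value: 'c@d.com' }
]

puts replace_sketch(text, instances) { |_instance| '[EMAIL]' }
# prints "[EMAIL] and [EMAIL]"
```

Replacing front to back would shift the second match by the length difference of the first replacement, so its stored offsets would point at the wrong characters.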

Private Instance Methods

email_instances(algo, text, options)

Selects the regex for the given algorithm and options, then scans the text for emails

# File lib/ramparts/parsers/email_parser.rb, line 66
def email_instances(algo, text, options)
  # Determines which algorithm to use
  regex = algo == MR_ALGO ? MR_REGEX : GR_REGEX
  regex_without_dot = algo == MR_ALGO ? MR_REGEX_WITHOUT_DOT : GR_REGEX_WITHOUT_DOT
  regex_with_at = GR_REGEX_WITH_AT

  instances = []
  if options.fetch(:aggressive, false)
    temp_instances = scan(text, regex_without_dot, :email)

    # The aggressive option does not require '.com' or similar, so keep only
    # the matches whose part after '@' includes a known email domain
    temp_instances.each do |instance|
      instances << instance if EMAIL_DOMAINS.any? { |domain| instance[:value].split('@')[1]&.include? domain }
    end
  elsif options.fetch(:check_for_at, false)
    instances = scan(text, regex_with_at, :email)
  else
    instances = scan(text, regex, :email)
  end
  instances
end
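
The aggressive branch above can be isolated as a small filter. In this sketch, ASSUMED_EMAIL_DOMAINS stands in for the library's EMAIL_DOMAINS constant, whose actual contents are not shown here, and filter_by_domain is a hypothetical name. Only the :value key comes from the source.

```ruby
# Hypothetical domain list; the real EMAIL_DOMAINS constant is not shown in this document.
ASSUMED_EMAIL_DOMAINS = %w[gmail hotmail yahoo].freeze

# Keeps only instances whose text after the first '@' includes a known domain.
def filter_by_domain(instances, domains = ASSUMED_EMAIL_DOMAINS)
  instances.select do |instance|
    domain_part = instance[:value].split('@')[1]
    # Safe navigation (&.) covers values that contain no '@' at all.
    domains.any? { |domain| domain_part&.include?(domain) }
  end
end

candidates = [
  { value: 'john@gmail' },   # kept: 'gmail' is in the assumed list
  { value: 'john@nowhere' }, # dropped: unknown domain
  { value: 'no-at-sign' }    # dropped: nothing after '@'
]

p filter_by_domain(candidates).map { |instance| instance[:value] }
# => ["john@gmail"]
```

Note that include? is a substring check, so a value such as 'john@notgmail' would also pass this filter.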
parse_email(text)

Parses the email and maps down certain occurrences

# File lib/ramparts/parsers/email_parser.rb, line 62
def parse_email(text)
  text.downcase.gsub(/\ at\ |\(at\)|\ dot\ /, REPLACEMENTS).gsub(/[^\w\@\.\_\%\+\-]/, '$')
end
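
To make the two substitution passes concrete, here is a minimal sketch of the same normalization. The REPLACEMENTS constant is not shown in this document, so ASSUMED_REPLACEMENTS below is a guess at its shape (a hash from matched text to replacement), and parse_email_sketch is a hypothetical name.

```ruby
# Assumed mapping; the library's actual REPLACEMENTS values are not shown here.
ASSUMED_REPLACEMENTS = {
  ' at '  => '@',
  '(at)'  => '@',
  ' dot ' => '.'
}.freeze

def parse_email_sketch(text)
  text.downcase
      # Pass 1: String#gsub with a hash looks up each match and inserts its value.
      .gsub(/\ at\ |\(at\)|\ dot\ /, ASSUMED_REPLACEMENTS)
      # Pass 2: every character not allowed in an email becomes '$'.
      .gsub(/[^\w@.%+\-]/, '$')
end

puts parse_email_sketch('Mail John at Example dot com now')
# prints "mail$john@example.com$now"
```

Turning disallowed characters, including spaces, into '$' leaves each candidate email as an unbroken run of allowed characters between separators.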