class Natter::Parser

Public: The parser is the main workhorse, responsible for deriving the intent from an utterance.

Attributes

known_utterances[R]

Read access to the Hash containing known utterances.

rules[R]

Read access to the Hash containing known rules.

Public Class Methods

new()
# File lib/natter/parser.rb, line 10
def initialize
  @known_utterances = Hash.new # key = utterance, value = Intent
  @contractions = init_contractions # key = contraction, value = expansion
  @intent_cache = Hash.new # key = utterance, value = Intent
  @rules = Hash.new # key = rule regex pattern, value = Rule object
end

Public Instance Methods

add_rule(rule)

Public: Adds a regex-based Rule to the parser.

rule - The Natter::Rule to add.

# File lib/natter/parser.rb, line 20
def add_rule(rule)
  raise ArgumentError, "Expected Natter::Rule but got `#{rule}`" unless rule.is_a?(Rule)
  if @rules.has_key?(rule.pattern)
    raise ArgumentError, "Regex pattern already defined by " +\
    "#{@rules[rule.pattern].identifier}: #{rule.pattern}"
  end
  # Make sure that this rule's owning skill is capitalised
  rule.skill.capitalize!
  @rules[rule.pattern] = rule
end
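
The duplicate check above depends on how the pattern behaves as a Hash key. A minimal sketch, assuming `rule.pattern` is a Regexp (the source only calls it a "rule regex pattern"); `registry` and the skill label are hypothetical stand-ins for `@rules` and a Rule:

```ruby
# Hypothetical registry keyed by Regexp, mirroring the shape of @rules.
registry = {}
registry[/\Aturn on the (?<device>.+)\z/] = 'Lights.turnOn'

# A separately written Regexp with the same source and options is the same key.
same_pattern = registry.key?(/\Aturn on the (?<device>.+)\z/)
# => true

# Changing an option (here, case-insensitivity) produces a different key.
other_options = registry.key?(/\Aturn on the (?<device>.+)\z/i)
# => false
```

If patterns are Regexps, two rules that differ only in a flag such as `i` would not be reported as duplicates.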

add_rules(rules)

Public: Adds one or more regex-based Rules to the parser. A convenience method.

rules - Either a Natter::Rule or an array of Natter::Rules.

# File lib/natter/parser.rb, line 35
def add_rules(rules)
  if rules.kind_of?(Array)
    rules.each { |rule| add_rule(rule) }
  else
    add_rule(rules)
  end
end

add_utterance(example)

Public: Adds one or more pre-computed utterance/intent pairs to the parser. Use this when specific utterances should always map to a predetermined intent. Known utterances are checked before the regex rules, so a match skips regex processing entirely. Multiple examples can be added in one call. Adding an utterance that already exists overwrites the old entry.

example - A Hash where:

key   = A single utterance or array of utterances
value = Natter::Intent

Examples

add_utterance('hello' => Intent.new('greeting'))
add_utterance(['what time is it', 'what is the time'] => Intent.new('currentTime'))
add_utterance(
  'night night' => Intent.new('goodnight'),
  'lock the door' => Intent.new('lock')
)

Returns nothing.

# File lib/natter/parser.rb, line 64
def add_utterance(example)
  raise ArgumentError, "Expected {utterance => Intent} or {[utterances] => Intent}" unless example.is_a?(Hash)
  example.map do |utterance, intent|
    if utterance.kind_of?(Array)
      utterance.each { |phrase| @known_utterances[phrase] = intent }
    else
      @known_utterances[utterance] = intent
    end
  end
end
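
The fan-out of array keys and the overwrite behaviour can be tried in isolation. A minimal sketch, assuming symbols in place of Natter::Intent objects; `remember` and `known` are hypothetical names, not part of the library:

```ruby
# Stores each phrase under its own key; an Array key is expanded so that
# every phrase in it points at the same intent.
def remember(known, examples)
  examples.each do |utterance, intent|
    Array(utterance).each { |phrase| known[phrase] = intent }
  end
  nil
end

known = {}
remember(known,
         'hello' => :greeting,
         ['what time is it', 'what is the time'] => :current_time)

known.keys
# => ["hello", "what time is it", "what is the time"]

# Adding an existing utterance replaces the earlier intent.
remember(known, 'hello' => :salutation)
known['hello']
# => :salutation
```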

determine_confidences(intents)

Internal: Determines the confidence of each intent in the passed array and sorts the intents by the calculated values. When more than one intent matches, each intent's confidence is its share of the total entity count, so the intent with the most entities ranks first. If none of the intents has any entities, all receive equal confidence.

intents - An array of Intent objects.

Returns an array of Intent objects sorted by descending confidence. Sets the confidence on each passed Intent in place.

# File lib/natter/parser.rb, line 132
def determine_confidences(intents)
  # Handle where there's only one matching intent
  if intents.length == 1
    intents[0].confidence = 1.0
    return intents
  end

  # First determine the total number of entities in any of the intents
  total = 0
  intents.each { |i| total += i.entities.length }

  if total == 0
    # Edge case: all matching intents contain no entities.
    # Assign equal confidence to all intents
    result = intents.map do |i|
      i.confidence = 1.0/intents.length
      i # return this intent from the map
    end
  else
    result = intents.map do |i|
      i.confidence = i.entities.length.to_f/total
      i # return this intent from the map
    end
  end

  # Sort the array by descending confidence values
  result.sort_by { |i| i.confidence }.reverse
end
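
A worked example of the arithmetic may help. This is a minimal sketch, assuming a Struct stand-in (`DemoIntent`) for Natter::Intent with only the fields used here; `demo_confidences` is a hypothetical condensed version of the method above, not the library's code:

```ruby
# Stand-in for Natter::Intent, reduced to the fields this example needs.
DemoIntent = Struct.new(:name, :entities, :confidence)

# Each intent's confidence is its share of all captured entities.
# With no entities at all, confidence is split evenly.
def demo_confidences(intents)
  total = intents.sum { |i| i.entities.length }
  intents.each do |i|
    i.confidence = total.zero? ? 1.0 / intents.length : i.entities.length.to_f / total
  end
  intents.sort_by { |i| -i.confidence }
end

ranked = demo_confidences([
  DemoIntent.new('play', ['song'], 0),
  DemoIntent.new('play_by', ['song', 'artist', 'year'], 0)
])
ranked.map { |i| [i.name, i.confidence] }
# => [["play_by", 0.75], ["play", 0.25]]
```

One entity out of four gives 0.25 and three out of four gives 0.75, so the confidences of all matching intents sum to 1.0.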

expand_contractions(text)

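Expands the contractions within the passed string. Each whitespace-separated word is looked up by exact, case-sensitive match, so only the lowercase forms in the contractions table are expanded.

text - The String to expand.

Examples

expand_contractions("i'm hot")
# => "i am hot"
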
# File lib/natter/parser.rb, line 318
def expand_contractions(text)
  result = ''
  text.strip.split(' ').each do |word|
    result = result + @contractions.fetch(word, word) + ' '
  end
  return result.strip
end
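
The word-by-word lookup can be exercised on its own. A minimal sketch, assuming a two-entry stand-in (`DEMO_CONTRACTIONS`) for the `@contractions` table; `demo_expand` is a hypothetical equivalent of the method above:

```ruby
# Two entries copied from the contractions table, enough for the demonstration.
DEMO_CONTRACTIONS = { "i'm" => "i am", "what're" => "what are" }.freeze

# Replaces each whitespace-separated word that has an exact key in the table.
def demo_expand(text, table = DEMO_CONTRACTIONS)
  text.strip.split(' ').map { |word| table.fetch(word, word) }.join(' ')
end

lower  = demo_expand("i'm hot")                 # => "i am hot"
upper  = demo_expand("I'm hot")                 # => "I'm hot"
spaced = demo_expand("  what're  you doing ")   # => "what are you doing"
```

The capitalised form is left alone because the table has no "I'm" key, and runs of whitespace collapse to single spaces as a side effect of splitting and rejoining.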

init_contractions()

Private: Initialise the @contractions Hash. Only needs doing once. OPTIMISE: Perhaps move these values to an editable text file?

# File lib/natter/parser.rb, line 223
def init_contractions
  {
    "that's" => "that is",
    "aren't" => "are not",
    "can't" => "can not",
    "could've" => "could have",
    "couldn't" => "could not",
    "didn't" => "did not",
    "doesn't" => "does not",
    "don't" => "do not",
    "dunno" => "do not know",
    "gonna" => "going to",
    "gotta" => "got to",
    "hadn't" => "had not",
    "hasn't" => "has not",
    "haven't" => "have not",
    "he'd" => "he had",
    "he'll" => "he will",
    "he's" => "he is",
    "how'd" => "how would",
    "how'll" => "how will",
    "how're" => "how are",
    "how's" => "how is",
    "i'd" => "i would",
    "i'll" => "i will",
    "i'm" => "i am",
    "i've" => "i have",
    "isn't" => "is not",
    "it'd" => "it would",
    "it'll" => "it will",
    "it's" => "it is",
    "mightn't" => "might not",
    "might've" => "might have",
    "mustn't" => "must not",
    "must've" => "must have",
    "ol'" => "old",
    "oughtn't" => "ought not",
    "shan't" => "shall not",
    "she'd" => "she would",
    "she'll" => "she will",
    "she's" => "she is",
    "should've" => "should have",
    "shouldn't" => "should not",
    "somebody's" => "somebody is",
    "someone'll" => "someone will",
    "someone's" => "someone is",
    "something'll" => "something will",
    "something's" => "something is",
    "that'll" => "that will",
    "that'd" => "that would",
    "there'd" => "there had",
    "there's" => "there is",
    "they'd" => "they would",
    "they'll" => "they will",
    "they're" => "they are",
    "they've" => "they have",
    "wasn't" => "was not",
    "we'd" => "we had",
    "we'll" => "we will",
    "we're" => "we are",
    "we've" => "we have",
    "weren't" => "were not",
    "what'd" => "what did",
    "what'll" => "what will",
    "what're" => "what are",
    "what's" => "what is",
    "what've" => "what have",
    "when's" => "when is",
    "where'd" => "where did",
    "where's" => "where is",
    "where've" => "where have",
    "who'd" => "who would",
    "who'll" => "who will",
    "who's" => "who is",
    "why'd" => "why did",
    "why're" => "why are",
    "why's" => "why is",
    "won't" => "will not",
    "won't've" => "will not have",
    "would've" => "would have",
    "wouldn't" => "would not",
    "you'd" => "you would",
    "you'll" => "you will",
    "you're" => "you are",
    "you've" => "you have"
  }
end

intent_from_match(rule, m)

Internal: Converts a positive regex match into an Intent object. Note that the confidence is set to 0 as it will be determined later.

rule - The Rule defining this intent.
m    - The positive regex match.

Returns an Intent, or nil if the named captures do not line up with the rule's entities.

# File lib/natter/parser.rb, line 168
def intent_from_match(rule, m)
  if m.named_captures.empty?
    # No capture groups found. Double-check the rule doesn't need any entities
    if rule.entities.empty?
      return Intent.new(rule.name, rule.skill, 0)
    else
      # Expected at least one entity. This can't be a valid match then
      return nil
    end
  else
    # Found some entities. Check they match up with the rule
    intent = Intent.new(rule.name, rule.skill, 0)
    rule.entities.each do |entity|
      if m.named_captures.has_key?(entity.name)
        e = Entity.new(entity.name, entity.type, m.named_captures[entity.name].strip)
        intent.entities << e
      else
        # The rule expects an entity that has no corresponding named
        # capture group in the match
        return nil
      end
    end
    if intent.entities.length != m.named_captures.length
      # The match contains named capture groups that are not entities in the rule
      return nil
    else
      return intent
    end
  end
end
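
The check relies on `MatchData#named_captures`, which returns a Hash of group name to captured text. A minimal sketch of the same validation, assuming plain entity names in place of Rule and Entity objects; the pattern and `captures_satisfy?` are hypothetical:

```ruby
pattern  = /\Aplay (?<song>.+) by (?<artist>.+)\z/
match    = pattern.match('play hey jude by the beatles')
captures = match.named_captures
# => {"song"=>"hey jude", "artist"=>"the beatles"}

# True only when every expected entity was captured and the match has
# no additional named groups.
def captures_satisfy?(entity_names, captures)
  entity_names.all? { |name| captures.key?(name) } &&
    entity_names.length == captures.length
end

exact   = captures_satisfy?(%w[song artist], captures)        # => true
missing = captures_satisfy?(%w[song artist album], captures)  # => false
extra   = captures_satisfy?(%w[song], captures)               # => false
```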

parse(text, use_cache = true)

Public: Analyse an utterance and return any matching intents.

text      - The natural language string to analyse.
use_cache - If true, check a cache of previously returned utterance/intent pairs rather than re-parsing (default: true).

Returns the stored Intent for a known utterance, an array of Intents sorted by confidence for rule matches, or nil if the intent cannot be determined.

# File lib/natter/parser.rb, line 84
def parse(text, use_cache = true)
  raise ArgumentError, "Cannot parse thin air!" unless text.length > 0

  # Store the original string for later
  original = text

  # Tidy up the string for parsing
  utterance = purify(original)

  if @known_utterances.has_key?(utterance)
    return @known_utterances[utterance]
  end

  if use_cache && @intent_cache.has_key?(utterance)
    return @intent_cache[utterance]
  end

  intents = []
  @rules.each do |pattern, rule|
    m = utterance.match(rule.pattern)
    if m == nil
      next
    else
      intent = intent_from_match(rule, m)
      if intent then intents << intent end
    end
  end

  if intents.empty? then return nil end

  # Calculate the confidence of each intent
  intents = determine_confidences(intents)

  # Cache the matches
  @intent_cache[utterance] = intents

  return intents
end
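
Because `parse` can return a single Intent, an array, or nil, callers may want to normalise the result before iterating. A minimal sketch, assuming a hypothetical helper `intents_from` that is not part of the library; strings stand in for Intent objects:

```ruby
# Wraps the three possible return shapes of parse in a plain Array.
# An explicit case is used rather than Kernel#Array, which would call
# to_a on the object and could split a Struct-like intent into its fields.
def intents_from(result)
  case result
  when nil   then []
  when Array then result
  else [result]
  end
end

none   = intents_from(nil)                  # => []
single = intents_from('greeting')           # => ["greeting"]
many   = intents_from(%w[play play_by])     # => ["play", "play_by"]
```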

purify(t)

Internal: Tidies up the passed string to remove unnecessary characters and replace ambiguous phrases such as contractions.

t - The string to purify.

Examples

str = "what're you doing?!"
str = purify(str)
# => "what are you doing"

# File lib/natter/parser.rb, line 209
def purify(t)
  t = expand_contractions(t)
  t = strip_trailing_punctuation(t)
end

strip_trailing_punctuation(t)

Internal: Removes trailing '?' and '!' from the passed string.

t - The string from which to remove superfluous trailing punctuation.

# File lib/natter/parser.rb, line 217
def strip_trailing_punctuation(t)
  t.sub(/[?!]+\z/, '')
end
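
A few inputs show what the `\z` anchor does and does not remove. A minimal sketch using the same regex; `demo_strip` is a hypothetical copy of the method so the example runs on its own:

```ruby
# Removes a run of '?' and '!' only when it sits at the very end of the string.
def demo_strip(t)
  t.sub(/[?!]+\z/, '')
end

trailing = demo_strip('what time is it?!')   # => "what time is it"
interior = demo_strip('really? yes!')        # => "really? yes"
period   = demo_strip('goodbye.')            # => "goodbye."
```

Punctuation inside the string and other trailing characters such as '.' are left untouched.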