class Natter::Parser
Public: The parser is the main workhorse, responsible for deriving the intent from an utterance.
Attributes
Read access to the Hash containing known utterances.
Read access to the Hash containing known rules.
Public Class Methods
# File lib/natter/parser.rb, line 10
def initialize
  @known_utterances = Hash.new # key = utterance, value = Intent
  @contractions = init_contractions # key = contraction, value = expansion
  @intent_cache = Hash.new # key = utterance, value = Intent
  @rules = Hash.new # key = rule regex pattern, value = Rule object
end
Public Instance Methods
Public: Adds a regex-based Rule to the parser.
rule - The Natter::Rule to add.
# File lib/natter/parser.rb, line 20
def add_rule(rule)
  raise ArgumentError, "Expected Natter::Rule but got `#{rule}`" unless rule.is_a?(Rule)
  if @rules.has_key?(rule.pattern)
    raise ArgumentError, "Regex pattern already defined by " +\
      "#{@rules[rule.pattern].identifier}: #{rule.pattern}"
  end
  # Make sure that this rule's owning skill is capitalised
  rule.skill.capitalize!
  @rules[rule.pattern] = rule
end
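The duplicate check above relies on the pattern acting as a Hash key. As a minimal sketch, assuming patterns are stored as `Regexp` objects (the source does not say whether `rule.pattern` is a `Regexp` or a String), two regexes with the same source and options count as the same key. `register_pattern` and the owner strings are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper mirroring the duplicate-pattern guard in add_rule.
def register_pattern(registry, pattern, owner)
  if registry.key?(pattern)
    raise ArgumentError, "Regex pattern already defined by #{registry[pattern]}: #{pattern.inspect}"
  end
  registry[pattern] = owner
end

registry = {}
register_pattern(registry, /\Ahello\z/, 'Greeter.hello')
# Same source but different options, so this is a different key and is accepted.
register_pattern(registry, /\Ahello\z/i, 'Greeter.shout')

begin
  # Same source and options as the first pattern, so this is rejected.
  register_pattern(registry, /\Ahello\z/, 'Other.hi')
rescue ArgumentError => e
  puts e.message
end
```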
Public: Adds one or more regex-based Rules to the parser. A convenience method.
rules - Either a Natter::Rule or an array of Natter::Rules.
# File lib/natter/parser.rb, line 35
def add_rules(rules)
  if rules.kind_of?(Array)
    rules.each { |rule| add_rule(rule) }
  else
    add_rule(rules)
  end
end
Public: Adds a pre-computed utterance/intent pair to the parser. Use this when one or more specific utterances map to a predetermined intent. Lookup is a direct Hash access, so no regex processing is required, and these utterances are evaluated before the regex rules. Multiple examples can be added at once. Adding an utterance that already exists overwrites the old one.
example - A Hash where:
  key = A single utterance or array of utterances
  value = Natter::Intent
Examples
  add_utterance('hello' => Intent.new('greeting'))
  add_utterance(['what time is it', 'what is the time'] => Intent.new('currentTime'))
  add_utterance(
    'night night' => Intent.new('goodnight'),
    'lock the door' => Intent.new('lock')
  )
Returns nothing.
# File lib/natter/parser.rb, line 64
def add_utterance(example)
  raise ArgumentError, "Expected {utterance => Intent} or {[utterances] => Intent}" unless example.is_a?(Hash)
  example.map do |utterance, intent|
    if utterance.kind_of?(Array)
      utterance.each { |phrase| @known_utterances[phrase] = intent }
    else
      @known_utterances[utterance] = intent
    end
  end
end
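To illustrate how an array key ends up as one entry per phrase, here is a minimal stand-alone sketch of the same flattening. `UtteranceIntent` is a hypothetical `Struct` standing in for `Natter::Intent`, and `flatten_examples` is an invented helper, not part of the library.

```ruby
# Hypothetical stand-in for Natter::Intent, for illustration only.
UtteranceIntent = Struct.new(:name)

# Builds a phrase => intent Hash from the {utterance(s) => intent} form.
def flatten_examples(example)
  known = {}
  example.each do |utterance, intent|
    # Array() wraps a single String and leaves an Array untouched.
    Array(utterance).each { |phrase| known[phrase] = intent }
  end
  known
end

known = flatten_examples(
  ['what time is it', 'what is the time'] => UtteranceIntent.new('currentTime'),
  'hello' => UtteranceIntent.new('greeting')
)

puts known.length                     # 3
puts known['what is the time'].name   # currentTime
```

Both phrases from the array share the same intent object, so changing that object later affects every phrase that points at it.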
Internal: Determines the confidence of each intent in the passed array and then sorts the intents by the calculated confidence values. When more than one intent matches, the intent with the greatest number of entities is assumed to be the best match.
intents - An array of Intent objects.
Returns an array of Intent objects sorted by descending confidence. Mutates the Intent objects in the original array by setting their confidence.
# File lib/natter/parser.rb, line 132
def determine_confidences(intents)
  # Handle where there's only one matching intent
  if intents.length == 1
    intents[0].confidence = 1.0
    return intents
  end

  # First determine the total number of entities in any of the intents
  total = 0
  intents.each { |i| total += i.entities.length }

  if total == 0
    # Edge case: all matching intents contain no entities.
    # Assign equal confidence to all intents
    result = intents.map do |i|
      i.confidence = 1.0/intents.length
      i # return this intent from the map
    end
  else
    result = intents.map do |i|
      i.confidence = i.entities.length.to_f/total
      i # return this intent from the map
    end
  end

  # Sort the array by descending confidence values
  result.sort_by { |i| i.confidence }.reverse
end
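A small worked example may help make the arithmetic concrete: each intent's confidence is its share of the total entity count, with an equal split when no intent has entities. The sketch below is a condensed restatement under assumptions, not the library's code. `ConfIntent` is a hypothetical `Struct` standing in for `Natter::Intent`, and `share_of_entities` is an invented name.

```ruby
# Hypothetical stand-in; only the fields needed for the calculation.
ConfIntent = Struct.new(:name, :entities, :confidence)

def share_of_entities(intents)
  total = intents.sum { |i| i.entities.length }
  intents.each do |i|
    # Equal split when nothing has entities, otherwise the share of the total.
    i.confidence = total.zero? ? 1.0 / intents.length : i.entities.length.to_f / total
  end
  # Highest confidence first.
  intents.sort_by { |i| -i.confidence }
end

ranked = share_of_entities([
  ConfIntent.new('play', ['song'], 0),
  ConfIntent.new('play_by', ['song', 'artist', 'year'], 0)
])
ranked.each { |i| puts "#{i.name}: #{i.confidence}" }
# play_by: 0.75
# play: 0.25
```

With one entity against three, the totals are 1/4 and 3/4. A single intent always ends up at 1.0 under this formula, which matches the early return in the method above. The order of intents with equal confidence may differ from the original, since neither sort is guaranteed to be stable.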
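Expands the contractions within the passed string. Each whitespace-separated word is looked up in the contractions Hash; the lookup is case-sensitive and the keys are lower-case.
Examples
  t = "i'm hot"
  expand_contractions(t)
  # => "i am hot"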
# File lib/natter/parser.rb, line 318
def expand_contractions(text)
  result = ''
  text.strip.split(' ').each do |word|
    result = result + @contractions.fetch(word, word) + ' '
  end
  return result.strip
end
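Because the lookup is an exact `Hash#fetch` per word, capitalised words and words with attached punctuation pass through unchanged. The sketch below shows this with a three-entry subset of the table. `SAMPLE_CONTRACTIONS` and `expand_sample` are names invented for the illustration, and `map`/`join` replaces the string concatenation of the original.

```ruby
# A three-entry subset of the contractions table, for illustration.
SAMPLE_CONTRACTIONS = {
  "i'm" => 'i am',
  "what's" => 'what is',
  "don't" => 'do not'
}.freeze

def expand_sample(text)
  # Replace each word that is an exact key; leave every other word as-is.
  text.strip.split(' ').map { |word| SAMPLE_CONTRACTIONS.fetch(word, word) }.join(' ')
end

puts expand_sample("i'm sure what's needed")  # i am sure what is needed
puts expand_sample("I'm sure")                # I'm sure
puts expand_sample("please don't!")           # please don't!
```

Callers that want capitalised input expanded would presumably need to downcase it first; that step is not part of the method shown above.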
Private: Initialise the @contractions Hash. Only needs doing once. OPTIMISE: Perhaps move these values to an editable text file?
# File lib/natter/parser.rb, line 223
def init_contractions
  {
    "that's" => "that is",
    "aren't" => "are not",
    "can't" => "can not",
    "could've" => "could have",
    "couldn't" => "could not",
    "didn't" => "did not",
    "doesn't" => "does not",
    "don't" => "do not",
    "dunno" => "do not know",
    "gonna" => "going to",
    "gotta" => "got to",
    "hadn't" => "had not",
    "hasn't" => "has not",
    "haven't" => "have not",
    "he'd" => "he had",
    "he'll" => "he will",
    "he's" => "he is",
    "how'd" => "how would",
    "how'll" => "how will",
    "how're" => "how are",
    "how's" => "how is",
    "i'd" => "i would",
    "i'll" => "i will",
    "i'm" => "i am",
    "i've" => "i have",
    "isn't" => "is not",
    "it'd" => "it would",
    "it'll" => "it will",
    "it's" => "it is",
    "mightn't" => "might not",
    "might've" => "might have",
    "mustn't" => "must not",
    "must've" => "must have",
    "ol'" => "old",
    "oughtn't" => "ought not",
    "shan't" => "shall not",
    "she'd" => "she would",
    "she'll" => "she will",
    "she's" => "she is",
    "should've" => "should have",
    "shouldn't" => "should not",
    "somebody's" => "somebody is",
    "someone'll" => "someone will",
    "someone's" => "someone is",
    "something'll" => "something will",
    "something's" => "something is",
    "that'll" => "that will",
    "that'd" => "that would",
    "there'd" => "there had",
    "there's" => "there is",
    "they'd" => "they would",
    "they'll" => "they will",
    "they're" => "they are",
    "they've" => "they have",
    "wasn't" => "was not",
    "we'd" => "we had",
    "we'll" => "we will",
    "we're" => "we are",
    "we've" => "we have",
    "weren't" => "were not",
    "what'd" => "what did",
    "what'll" => "what will",
    "what're" => "what are",
    "what's" => "what is",
    "what've" => "what have",
    "when's" => "when is",
    "where'd" => "where did",
    "where's" => "where is",
    "where've" => "where have",
    "who'd" => "who would",
    "who'll" => "who will",
    "who's" => "who is",
    "why'd" => "why did",
    "why're" => "why are",
    "why's" => "why is",
    "won't" => "will not",
    "won't've" => "will not have",
    "would've" => "would have",
    "wouldn't" => "would not",
    "you'd" => "you would",
    "you'll" => "you will",
    "you're" => "you are",
    "you've" => "you have"
  }
end
Internal: Converts a positive regex match into an Intent object. Note that the confidence is set to 0 as it will be determined later.
rule - The Rule defining this intent.
m - The positive regex match.
Returns an Intent, or nil if the named captures do not line up with the rule's entities.
# File lib/natter/parser.rb, line 168
def intent_from_match(rule, m)
  if m.named_captures.empty?
    # No capture groups found. Double-check the rule doesn't need any entities
    if rule.entities.empty?
      return Intent.new(rule.name, rule.skill, 0)
    else
      # Expected at least one entity. This can't be a valid match then
      return nil
    end
  else
    # Found some entities. Check they match up with the rule
    intent = Intent.new(rule.name, rule.skill, 0)
    rule.entities.each do |entity|
      if m.named_captures.has_key?(entity.name)
        e = Entity.new(entity.name, entity.type, m.named_captures[entity.name].strip)
        intent.entities << e
      else
        # Found a named capture group that doesn't match an entity defined
        # in the rule
        return nil
      end
    end
    if intent.entities.length != m.named_captures.length
      # Found some entity matches but not all
      return nil
    else
      return intent
    end
  end
end
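The method above depends on `MatchData#named_captures`, which returns a Hash of group name to captured text. As a rough sketch of the same validation idea, the snippet below checks that the captures line up exactly with a list of expected entity names. The pattern, the entity names and `entities_for` are all invented for this illustration and are not part of Natter.

```ruby
# Hypothetical rule: the pattern and entity names are invented for this sketch.
PLAY_PATTERN = /\Aplay (?<song>.+) by (?<artist>.+)\z/
PLAY_ENTITIES = %w[song artist].freeze

def entities_for(utterance, pattern = PLAY_PATTERN, expected = PLAY_ENTITIES)
  m = utterance.match(pattern)
  return nil if m.nil?

  captures = m.named_captures # Hash of group name => captured text
  # Reject the match unless the captures line up exactly with the expected entities.
  return nil unless expected.all? { |name| captures.key?(name) }
  return nil unless captures.length == expected.length

  captures.transform_values(&:strip)
end

found = entities_for('play yesterday by the beatles')
puts found['song']    # yesterday
puts found['artist']  # the beatles
```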
Public: Analyse an utterance and return any matching intents.
text - The natural language string to analyse.
use_cache - If true then we will check a cache of previously returned utterance/intent pairs to return rather than re-parsing (default: true).
Returns an Intent, an array of Intents or nil if the intent cannot be determined.
# File lib/natter/parser.rb, line 84
def parse(text, use_cache = true)
  raise ArgumentError, "Cannot parse thin air!" unless text.length > 0

  # Store the original string for later
  original = text

  # Tidy up the string for parsing
  utterance = purify(original)

  if @known_utterances.has_key?(utterance)
    return @known_utterances[utterance]
  end

  if use_cache && @intent_cache.has_key?(utterance)
    return @intent_cache[utterance]
  end

  intents = []
  @rules.each do |pattern, rule|
    m = utterance.match(rule.pattern)
    if m == nil
      next
    else
      intent = intent_from_match(rule, m)
      if intent then intents << intent end
    end
  end
  if intents.empty? then return nil end

  # Calculate the confidence of each intent
  intents = determine_confidences(intents)

  # Cache the matches
  @intent_cache[utterance] = intents

  return intents
end
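The lookup order in `parse` is: exact known utterances first, then the cache, then a scan of every rule. The sketch below reproduces only that ordering with plain symbols in place of intents. `LookupSketch`, its sample data and its `scans` counter are hypothetical and exist only to make the order observable.

```ruby
# Hypothetical miniature of the three-stage lookup; symbols stand in for intents.
class LookupSketch
  attr_reader :scans

  def initialize
    @known = { 'hello' => :greeting }                           # exact-match table
    @cache = {}                                                 # results of earlier rule scans
    @rules = { /\Aturn on the (?<device>.+)\z/ => :switch_on }  # pattern => result
    @scans = 0                                                  # how often the rules were scanned
  end

  def lookup(utterance, use_cache = true)
    return @known[utterance] if @known.key?(utterance)
    return @cache[utterance] if use_cache && @cache.key?(utterance)

    @scans += 1
    matches = @rules.select { |pattern, _| utterance.match(pattern) }.values
    return nil if matches.empty? # misses are not cached

    @cache[utterance] = matches
  end
end

sketch = LookupSketch.new
puts sketch.lookup('hello').inspect             # :greeting
puts sketch.lookup('turn on the lamp').inspect  # [:switch_on]
puts sketch.lookup('turn on the lamp').inspect  # [:switch_on]
puts sketch.scans                               # 1
```

As in the original, an exact match returns a single value while a rule match returns an array, so callers need to handle both shapes. Results are written to the cache even when `use_cache` is false; the flag only controls whether the cache is read.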
Internal: Tidies up the passed string to remove unnecessary characters and replace ambiguous phrases such as contractions.
t - The string to purify.
Examples
  str = "what're you doing?!"
  str = purify(str)
# => "what are you doing"
# File lib/natter/parser.rb, line 209
def purify(t)
  t = expand_contractions(t)
  t = strip_trailing_punctuation(t)
end
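The order of the two steps matters: contractions are expanded while any trailing punctuation is still attached, so a contraction that is the last word before a `?` or `!` is not expanded. A minimal sketch, using a two-entry subset of the table and the invented names `PURIFY_CONTRACTIONS` and `purify_sketch`:

```ruby
# Two-entry subset of the contractions table, for illustration.
PURIFY_CONTRACTIONS = { "what're" => 'what are', "what's" => 'what is' }.freeze

def purify_sketch(text)
  # Step 1: expand contractions word by word (exact matches only).
  expanded = text.strip.split(' ').map { |w| PURIFY_CONTRACTIONS.fetch(w, w) }.join(' ')
  # Step 2: drop any run of trailing '?' and '!'.
  expanded.sub(/[?!]+\z/, '')
end

puts purify_sketch("what're you doing?!")  # what are you doing
puts purify_sketch("so what's?")           # so what's
```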
Internal: Removes trailing '?' and '!' from the passed string.
t - The string from which to remove superfluous trailing punctuation.
# File lib/natter/parser.rb, line 217
def strip_trailing_punctuation(t)
  t.sub(/[?!]+\z/, '')
end
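The pattern `/[?!]+\z/` removes only a run of `?` and `!` at the very end of the string. Full stops, punctuation inside the string, and punctuation followed by a newline are left alone. The sketch below uses the same regex under the invented name `strip_tail`:

```ruby
def strip_tail(text)
  # \z anchors at the absolute end of the string, after any newline.
  text.sub(/[?!]+\z/, '')
end

puts strip_tail('is it on?!!')   # is it on
puts strip_tail('stop! go')      # stop! go
puts strip_tail('lights off.')   # lights off.
```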