class MarkovChains::Dictionary

Attributes

order[R]

Public Class Methods

new(text, order = 1)

Initializes the dictionary from a text source.

@example Create a new dictionary of order 1

MarkovChains::Dictionary.new(string)

@example Create a new dictionary of order 2

MarkovChains::Dictionary.new(string, 2)

@param [String] the text source
@param [Integer] the order of the dictionary

The order is the “memory” of the dictionary: an order <n> dictionary considers the previous <n> words when choosing the next one. Orders of 1 or 2 are typical. With an order above 3, the generated sentences tend to reproduce the source.

# File lib/markov_chains/dictionary.rb, line 15
def initialize(text, order = 1)
  @order = order
  @words_for = Hash.new
  @start_words = Array.new
  
  # Standardize input text
  text.delete! "\n"
  
  # Process each sentence
  
  # <sentences> has format sentence+terminator:
  #   ["sent1", "term1", "sent2", "term2", ...]
  seps = /([.!?]+)/
  sentences = text.split seps
  sentences.each_slice(2) { |s,t| process_sentence(s.strip,t) }
end
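
As a rough illustration of the splitting step in the listing above: `String#split` with a capturing group keeps the matched terminators in the result, so sentences and terminators alternate. The sample text is made up for this sketch and does not come from the library.

```ruby
# Hypothetical sample text, not taken from the library.
text = "It is sunny. Is it? Yes!"

# The capturing group in the pattern keeps each terminator in the result.
parts = text.split(/([.!?]+)/)
# parts => ["It is sunny", ".", " Is it", "?", " Yes", "!"]

# Pair each sentence with the terminator that follows it.
pairs = parts.each_slice(2).map { |sentence, terminator| [sentence.strip, terminator] }
# pairs => [["It is sunny", "."], ["Is it", "?"], ["Yes", "!"]]

# Text without a closing terminator yields a one-element slice,
# so the terminator seen by the block would be nil.
unterminated = "No end".split(/([.!?]+)/).each_slice(2).to_a
# unterminated => [["No end"]]
```

The last case suggests that source text should end with a terminator if a nil entry in the word list is to be avoided.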

Public Instance Methods

get(words)

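Returns a word picked at random from those seen after the input array of words in the source. Words that followed the input more often are more likely to be picked. Returns nil if the input was never seen.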

@example Get a word likely to follow the word 'It'

get(['It'])           # => 'has'

@example Get a word likely to follow the words 'It has' (with a dictionary of order 2)

get(['It', 'has'])    # => 'been'

@param [[String]] array of words for which we want a possible next word
@return [String] word that is likely to follow the input

# File lib/markov_chains/dictionary.rb, line 42
def get(words)
  (@words_for[words] || []).sample
end
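
A minimal sketch of the lookup, assuming a hand-built table in place of the private `@words_for` hash. Judging by the `process_sentence` listing, the keys are arrays of separate words, so an order 2 lookup needs two elements. The table contents are hypothetical.

```ruby
# Hypothetical stand-in for the dictionary's internal table (order 2).
words_for = { ['It', 'has'] => ['been'] }

# Arrays are compared by content, so an equal array finds the entry.
hit = (words_for[['It', 'has']] || []).sample
# hit => "been"

# A single string holding both words is a different key and finds nothing;
# sampling the empty fallback array gives nil.
miss = (words_for[['It has']] || []).sample
# miss => nil
```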

get_start_words()

Returns the opening words of a randomly chosen sentence seen in the source

@example Get start words

get_start_words    # => ['It', 'has']

@return [[String]] array of words that could start a sentence

# File lib/markov_chains/dictionary.rb, line 53
def get_start_words
  @start_words.sample
end
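
One way the two public methods might be combined to generate a sentence is sketched below. The generator loop is an assumption for illustration and is not part of this class. To stay self-contained, the sketch uses hypothetical stand-ins (`words_for`, `start_words`) for the dictionary's state, with a single follower per key so the walk is deterministic.

```ruby
# Hypothetical order-1 table and start list standing in for a Dictionary.
order = 1
words_for = {
  ['It']    => ['is'],
  ['is']    => ['sunny'],
  ['sunny'] => ['!']
}
start_words = [['It']]

# Begin with the opening words of a random sentence (as get_start_words does),
# then keep asking for a follower of the last <order> words (as get does).
sentence = start_words.sample.dup
loop do
  next_word = (words_for[sentence.last(order)] || []).sample
  break if next_word.nil?   # no known follower: stop generating
  sentence << next_word
end

generated = sentence.join(' ')
# generated => "It is sunny !"
```

A real generator would probably also stop on a terminator and cap the sentence length, since a table built from real text can contain cycles.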

Private Instance Methods

process_sentence(sentence, terminator)

Processes a single sentence with terminator

@example Process a sentence

process_sentence("It is sunny today", "!")

@param [String] sentence to process
@param [Character] sentence terminator

# File lib/markov_chains/dictionary.rb, line 67
def process_sentence(sentence, terminator)
  # Treat clause separators (, ; :) as words of their own when splitting
  seps = "([,;:])"

  # Split <sentence> into words
  words = sentence.gsub(/[^#{seps}\w'\s]/, "").gsub(/(#{seps})\s+/, '\1').split(/\s+|#{seps}/)
  words << terminator
  
  # Add <@order> start words to the list
  @start_words << words[0, @order]
  
  # Add the words to the frequency hash <words_for>
  until words.size < @order + 1 do
    (@words_for[words[0, @order]] ||= []) << words[@order]
    words.shift
  end
end
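
The sliding window at the end of the listing can be tried in isolation. This sketch assumes an already tokenized word list and wraps the loop in a hypothetical helper, `build_table`, to show how the order changes the keys.

```ruby
# Hypothetical helper mirroring the loop in process_sentence.
def build_table(words, order)
  words = words.dup   # the loop consumes the list, so work on a copy
  table = {}
  until words.size < order + 1
    # Key: the first <order> words. Value: the word that follows them.
    (table[words[0, order]] ||= []) << words[order]
    words.shift
  end
  table
end

tokens = %w[It is sunny today !]

order1 = build_table(tokens, 1)
# order1 => {["It"]=>["is"], ["is"]=>["sunny"], ["sunny"]=>["today"], ["today"]=>["!"]}

order2 = build_table(tokens, 2)
# order2 => {["It", "is"]=>["sunny"], ["is", "sunny"]=>["today"], ["sunny", "today"]=>["!"]}
```

A sentence with n tokens yields n - order entries, so higher orders give longer keys and fewer alternatives per key.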