class MorMor::FSA

@private

This class and its subclasses contain a loose, simplified port of the whole `morfologik-fsa` package. Original source: github.com/morfologik/morfologik-stemming/tree/master/morfologik-fsa/src/main/java/morfologik/fsa

NB: TBH, I don't always deeply understand what I am doing here. I just ported the Java algorithms statement by statement, then rubyfied them a bit and debugged them in parallel with the original package to make sure they produce the same results.

The code contains some of my own comments, with references to the original implementations where appropriate. In the more straightforwardly ported code, the original comments are kept and marked with “OC:”.

Constants

Match
VERSIONS

LanguageTool seems to use CFSA2 and FSA5, so CFSA is not implemented.

Public Class Methods

read(path)
# File lib/mormor/fsa.rb, line 32
def read(path)
  io = File.open(path, 'rb')
  io.read(4) == '\\fsa' or fail ArgumentError, 'Invalid file header, probably not an FSA.'
  choose_impl(io.getbyte).new(io)
end
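To illustrate the header check and version dispatch in `read`, here is a minimal sketch that works on an in-memory `StringIO` instead of a file. The `SKETCH_VERSIONS` table and the `sniff_version` helper are hypothetical; the byte values are an assumption about the Morfologik format and are not taken from the `VERSIONS` constant, whose contents are not shown on this page.

```ruby
require 'stringio'

# Hypothetical stand-in for VERSIONS (assumed byte values, not copied from the source).
SKETCH_VERSIONS = { 0x05 => 'FSA5', 0xC5 => 'CFSA', 0xC6 => 'CFSA2' }.freeze

# Reads the 4-byte magic, then maps the following version byte to a format name.
def sniff_version(io)
  magic = io.read(4) # nil on an empty stream, which also fails the check below
  raise ArgumentError, 'Invalid file header, probably not an FSA.' unless magic == '\\fsa'

  SKETCH_VERSIONS.fetch(io.getbyte) { raise ArgumentError, 'Unsupported version byte' }
end

sniff_version(StringIO.new("\\fsa\x05".b)) # => "FSA5"
```

Note that `'\\fsa'` in single quotes is the four bytes backslash, `f`, `s`, `a`.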

Private Class Methods

choose_impl(version_byte)
# File lib/mormor/fsa.rb, line 40
def choose_impl(version_byte)
  VERSIONS
    .fetch(version_byte) { fail ArgumentError, 'Unsupported version byte, probably not FSA' }
    .tap { |name|
      constants.include?(name.to_sym) or
        fail ArgumentError, "Unsupported version: #{name}"
    }
    .then(&method(:const_get))
end

Public Instance Methods

each_arc(from:) { |arc| ... }
# File lib/mormor/fsa.rb, line 59
def each_arc(from:)
  return to_enum(__method__, from: from) unless block_given?

  arc = first_arc(from)
  until arc.zero?
    yield arc
    arc = next_arc(arc)
  end
end
each_sequence(from: root_node, &block)
# File lib/mormor/fsa.rb, line 51
def each_sequence(from: root_node, &block)
  Enumerator.new(self, from).then { |e| block ? e.each(&block) : e }
end
find_arc(node, label)
# File lib/mormor/fsa.rb, line 69
def find_arc(node, label)
  each_arc(from: node).detect { |a| arc_label(a) == label } || 0
end
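The arc-walking methods (`each_arc`, `next_arc`, `find_arc`) all rely on `0` meaning "no arc". A toy sketch may make that convention easier to see. Everything about the storage here is an assumption for illustration: `ToyFSA`, its `Arc` struct, and the 1-based arc addresses are hypothetical and do not reflect how the real FSA5/CFSA2 subclasses pack arcs into bytes. Only the three traversal methods follow the source shown on this page.

```ruby
# Toy automaton: arcs live in a flat array and are addressed by 1-based index,
# so that 0 can serve as the "no arc" sentinel.
class ToyFSA
  Arc = Struct.new(:label, :target, :last)

  def initialize(arcs)
    @arcs = arcs
  end

  # Assumption: a node is identified by the address of its first arc.
  def first_arc(node)
    node
  end

  def skip_arc(arc)
    arc + 1
  end

  def last_arc?(arc)
    @arcs[arc - 1].last
  end

  def arc_label(arc)
    @arcs[arc - 1].label
  end

  def next_arc(arc)
    last_arc?(arc) ? 0 : skip_arc(arc)
  end

  def each_arc(from:)
    return to_enum(__method__, from: from) unless block_given?

    arc = first_arc(from)
    until arc.zero?
      yield arc
      arc = next_arc(arc)
    end
  end

  def find_arc(node, label)
    each_arc(from: node).detect { |a| arc_label(a) == label } || 0
  end
end

toy = ToyFSA.new([
  ToyFSA::Arc.new('a'.ord, 3, false), # address 1: root, first arc
  ToyFSA::Arc.new('b'.ord, 0, true),  # address 2: root, last arc
  ToyFSA::Arc.new('c'.ord, 0, true)   # address 3: the only arc of node 3
])

toy.each_arc(from: 1).to_a # => [1, 2]
toy.find_arc(1, 'b'.ord)   # => 2
toy.find_arc(1, 'z'.ord)   # => 0 (no such arc)
```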
match(sequence, node = root_node)

Port of FSATraversal.java. The method is left unsplit to keep the original algorithm recognizable, hence the rubocop:disable directives.

# File lib/mormor/fsa.rb, line 75
def match(sequence, node = root_node) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity
  return Match.new(:no) if node.zero?

  sequence.each_with_index do |byte, i|
    a = find_arc(node, byte)

    case
    when a.zero?
      return i.zero? ? Match.new(:no, i, node) : Match.new(:automaton_has_prefix, i, node)
    when i + 1 == sequence.size && final_arc?(a)
      return Match.new(:exact, i, node)
    when terminal_arc?(a)
      return Match.new(:automaton_has_prefix, i + 1, node)
    else
      node = end_node(a)
    end
  end

  Match.new(:sequence_is_a_prefix, 0, node)
end
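The four match kinds are easier to follow on a tiny automaton. The sketch below is hypothetical throughout: `MatchDemoFSA`, its `Arc` and `Match` structs, and the 1-based arc addressing are assumptions made for illustration, and the real `Match` class may expose different fields. The `match` body repeats the control flow shown above, written with `if`/`elsif`. The demo automaton accepts the words "ab" and "abc".

```ruby
class MatchDemoFSA
  # Hypothetical stand-ins; the real Match class and arc encoding are not shown here.
  Match = Struct.new(:kind, :index, :node)
  Arc = Struct.new(:label, :target, :last, :final)

  def initialize(arcs)
    @arcs = arcs
  end

  def root_node
    1
  end

  # Linear scan over the arcs of a node; 0 means "not found".
  def find_arc(node, label)
    arc = node
    until arc.zero?
      return arc if @arcs[arc - 1].label == label

      arc = @arcs[arc - 1].last ? 0 : arc + 1
    end
    0
  end

  def final_arc?(arc)
    @arcs[arc - 1].final
  end

  # Assumption: an arc is terminal when it leads to no further node.
  def terminal_arc?(arc)
    @arcs[arc - 1].target.zero?
  end

  def end_node(arc)
    @arcs[arc - 1].target
  end

  def match(sequence, node = root_node)
    return Match.new(:no) if node.zero?

    sequence.each_with_index do |byte, i|
      a = find_arc(node, byte)
      if a.zero?
        return i.zero? ? Match.new(:no, i, node) : Match.new(:automaton_has_prefix, i, node)
      elsif i + 1 == sequence.size && final_arc?(a)
        return Match.new(:exact, i, node)
      elsif terminal_arc?(a)
        return Match.new(:automaton_has_prefix, i + 1, node)
      else
        node = end_node(a)
      end
    end
    Match.new(:sequence_is_a_prefix, 0, node)
  end
end

demo = MatchDemoFSA.new([
  MatchDemoFSA::Arc.new('a'.ord, 2, true, false), # address 1: root
  MatchDemoFSA::Arc.new('b'.ord, 3, true, true),  # address 2: "ab" is a word
  MatchDemoFSA::Arc.new('c'.ord, 0, true, true)   # address 3: "abc" is a word
])

demo.match('ab'.bytes).kind   # => :exact
demo.match('a'.bytes).kind    # => :sequence_is_a_prefix
demo.match('abcd'.bytes).kind # => :automaton_has_prefix
demo.match('x'.bytes).kind    # => :no
```

As in the original, `index` for `:automaton_has_prefix` is the length of the matched prefix (3 for "abcd" here), while for `:exact` it is the index of the last matched byte.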
next_arc(arc)
# File lib/mormor/fsa.rb, line 55
def next_arc(arc)
  last_arc?(arc) ? 0 : skip_arc(arc)
end