class Myasorubka::MSD
MSD is a morphosyntactic descriptor model.
This representation, with the concrete applications which display and exemplify the attributes and values and provide their internal constraints and relationships, makes the proposal self-explanatory. Other groups can easily test the specifications on their language, simply by following the method of the applications. The possibility of incorporating idiosyncratic classes and distinctions after the common core features makes the proposal relatively adaptable and flexible, without compromising compatibility.
The MSD implementation and documentation are based on the MULTEXT-East Morphosyntactic Specifications, Version 4: nl.ijs.si/ME/V4/msd/html/msd.html
You may use Myasorubka::MSD both as a parser and as a generator.
```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian)
msd[:pos] = :noun
msd[:type] = :common
msd[:number] = :plural
msd[:case] = :locative
msd.to_s # => "Nc-pl"
```

```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian, 'Vmps-snpfel')
msd[:pos] # => :verb
msd[:tense] # => :past
msd[:person] # => nil
msd.grammemes # => {:type=>:main, :vform=>:participle, …}
```
Constants
- EMPTY_DESCRIPTOR
Empty descriptor character.
Attributes
Public Class Methods
Creates a new morphosyntactic descriptor model instance. Please specify a `language` module with defined `CATEGORIES`.
Optionally, you can parse an MSD string passed as the `msd` argument.
@param language [Myasorubka::MSD::Language] a language to use.
@param msd [String] a String to initialize the new MSD.
# File lib/myasorubka/msd.rb, line 63
def initialize(language, msd = '')
  @language, @pos, @grammemes = language, nil, {}

  unless language.const_defined? 'CATEGORIES'
    raise ArgumentError,
      'given language has no morphosyntactic descriptions'
  end

  parse! msd if msd && !msd.empty?
end
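The constructor only checks that `language` defines `CATEGORIES`. Judging from how `to_s` and `parse!` read that constant, it appears to map a part-of-speech name to a one-letter code plus an ordered attribute list. The sketch below illustrates that shape; `ToyLanguage`, its contents, and the `usable_language?` helper are hypothetical and are not part of Myasorubka.

```ruby
# Hypothetical language module. The layout is inferred from the way
# MSD#to_s and MSD#parse! read CATEGORIES, not copied from the library.
module ToyLanguage
  CATEGORIES = {
    noun: {
      code: 'N',
      attrs: [
        [:type,   { common: 'c', proper: 'p' }],
        [:gender, { masculine: 'm', feminine: 'f', neuter: 'n' }],
        [:number, { singular: 's', plural: 'p' }]
      ]
    }
  }.freeze
end

# The same guard the constructor applies before accepting a language.
def usable_language?(language)
  language.const_defined?('CATEGORIES')
end

usable_language?(ToyLanguage) # => true
usable_language?(Module.new)  # => false
```

The order of the `attrs` entries matters, because an attribute's position in the list decides which character of the descriptor string it occupies.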
Public Instance Methods
@private
# File lib/myasorubka/msd.rb, line 104
def <=> other
  to_s <=> other.to_s
end
@private
# File lib/myasorubka/msd.rb, line 109
def == other
  to_s == other.to_s
end
Retrieves the morphosyntactic descriptor corresponding to the `key` object. Returns `nil` if no value is set for `key`.
@param key [Symbol] a key to look at.
@return [Symbol] a value of `key`.
# File lib/myasorubka/msd.rb, line 80
def [] key
  return pos if :pos == key
  grammemes[key]
end
Assigns the morphosyntactic descriptor given by `value` to the key given by the `key` object.
@param key [Symbol] a key to be set.
@param value [Symbol] a value to be assigned.
@return [Symbol] the assigned value.
# File lib/myasorubka/msd.rb, line 92
def []= key, value
  return @pos = value if :pos == key
  raise InvalidDescriptor, 'category is not set yet' unless pos
  grammemes[key] = value
end
@private
# File lib/myasorubka/msd.rb, line 99
def inspect
  '#<%s msd=%s>' % [language.name, to_s.inspect]
end
Merges grammemes that are stored in `hash` into the MSD grammemes.
@param hash [Hash<Symbol, Symbol>] a hash to be processed.
@return [MSD] self.
# File lib/myasorubka/msd.rb, line 140
def merge! hash
  hash.each do |key, value|
    self[key.to_sym] = value.to_sym
  end

  self
end
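Drops every attribute that does not appear in the category, along with every attribute whose value the category does not define. If the part of speech itself is unknown to the language, the part of speech and all grammemes are cleared.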
@return [MSD] self.
# File lib/myasorubka/msd.rb, line 192
def prune!
  unless category = language::CATEGORIES[pos]
    self.pos = nil
    grammemes.clear
    return self
  end

  attributes = category[:attrs]

  grammemes.reject! do |attribute, value|
    if index = attributes.index { |name, _| name == attribute }
      _, values = attributes[index]
      !values[value]
    else
      true
    end
  end

  self
end
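The filtering rule inside `prune!` can be tried in isolation. The following is a minimal sketch, assuming a hypothetical attribute list `ADJECTIVE_ATTRS`; unlike `prune!`, the `prune` helper here returns a new hash instead of modifying its argument.

```ruby
# Hypothetical attribute list in the [name, { value => code }] layout
# that prune! reads from category[:attrs].
ADJECTIVE_ATTRS = [
  [:type,   { qualificative: 'f', possessive: 's' }],
  [:degree, { positive: 'p', comparative: 'c' }]
].freeze

# Keeps a grammeme only when its attribute is listed for the category
# and its value is one of the values defined for that attribute.
def prune(attrs, grammemes)
  grammemes.select do |attribute, value|
    _, values = attrs.find { |name, _| name == attribute }
    values && values.key?(value)
  end
end

prune(ADJECTIVE_ATTRS, type: :possessive, tense: :past, degree: :huge)
# => {:type=>:possessive}
```

Here `:tense` is dropped because the attribute is not listed at all, and `:degree` is dropped because `:huge` is not among its defined values.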
Generates a Regexp from the MSD, which is useful for database queries.
```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian, 'Vm')
r = msd.to_regexp # => /^Vm.*$/
'Vmp' =~ r # 0
'Nc-pl' =~ r # nil
```
@return [Regexp] the corresponding regular expression.
# File lib/myasorubka/msd.rb, line 125
def to_regexp
  Regexp.new([
    '^',
    self.to_s.gsub(EMPTY_DESCRIPTOR, '.'),
    '.*',
    '$'
  ].join)
end
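The effect of replacing empty positions with `.` is easiest to see on a descriptor that has a gap. The sketch below repeats the construction of `to_regexp` on a plain string; the `msd_pattern` helper is hypothetical, and `EMPTY` is assumed to be `'-'`, as the "Nc-pl" example suggests.

```ruby
# Assumption: the empty descriptor character is '-', as in "Nc-pl".
EMPTY = '-'

# Same construction as MSD#to_regexp, applied to a plain MSD string.
def msd_pattern(msd_string)
  Regexp.new(['^', msd_string.gsub(EMPTY, '.'), '.*', '$'].join)
end

pattern = msd_pattern('Nc-pl')
pattern                   # => /^Nc.pl.*$/
pattern.match?('Ncmpl')   # => true
pattern.match?('Nc-plny') # => true
pattern.match?('Ncmsn')   # => false
```

An unset position therefore matches any value, and the trailing `.*` accepts descriptors that specify more attributes than the query does.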
@private
# File lib/myasorubka/msd.rb, line 149
def to_s
  return '' unless pos

  unless category = language::CATEGORIES[pos]
    raise InvalidDescriptor, "category is nil"
  end

  attributes = category[:attrs]
  msd = [category[:code]]

  grammemes.each do |attribute, value|
    next unless value

    unless index = attributes.index { |name, _| name == attribute }
      raise InvalidDescriptor, 'no such attribute "%s" of category "%s"' %
        [attribute, pos]
    end

    _, values = attributes[index]

    unless attribute_value = values[value]
      raise InvalidDescriptor, 'no such value "%s" for attribute "%s" of category "%s"' %
        [value, attribute, pos]
    end

    msd[index + 1] = attribute_value
  end

  msd.map { |e| e || EMPTY_DESCRIPTOR }.join
end
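The positional encoding performed by `to_s` can be reproduced without the library. This is a minimal sketch under stated assumptions: the `NOUN` table and the `encode` helper are hypothetical, `'-'` is assumed to be the empty descriptor character, and `ArgumentError` stands in for `InvalidDescriptor`.

```ruby
# Hypothetical attribute table for one category; real tables live in the
# language module's CATEGORIES constant.
NOUN = {
  code: 'N',
  attrs: [
    [:type,   { common: 'c', proper: 'p' }],
    [:gender, { masculine: 'm', feminine: 'f', neuter: 'n' }],
    [:number, { singular: 's', plural: 'p' }],
    [:case,   { nominative: 'n', locative: 'l' }]
  ]
}.freeze

# Places each value code at the position of its attribute and fills the
# skipped positions with '-' (assumed empty descriptor character).
def encode(category, grammemes)
  codes = [category[:code]]
  grammemes.each do |attribute, value|
    index = category[:attrs].index { |name, _| name == attribute }
    raise ArgumentError, "unknown attribute #{attribute}" unless index
    code = category[:attrs][index].last[value]
    raise ArgumentError, "unknown value #{value}" unless code
    codes[index + 1] = code
  end
  codes.map { |code| code || '-' }.join
end

encode(NOUN, type: :common, number: :plural, case: :locative) # => "Nc-pl"
encode(NOUN, type: :proper)                                   # => "Np"
```

Only gaps between set attributes are filled; trailing unset attributes are simply left off, which is why the second call yields a two-character string.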
Protected Instance Methods
@private
# File lib/myasorubka/msd.rb, line 215
def parse! msd_line
  msd = msd_line.chars.to_a

  category_code = msd.shift

  @pos, category = language::CATEGORIES.find do |name, candidate|
    candidate[:code] == category_code
  end

  raise InvalidDescriptor, msd_line unless @pos

  attrs = category[:attrs]

  msd.each_with_index do |value_code, i|
    attr_name, values = attrs[i]
    raise InvalidDescriptor, msd_line unless attr_name

    next if :blank == attr_name
    next if EMPTY_DESCRIPTOR == value_code

    attribute = values.find { |name, code| code == value_code }
    raise InvalidDescriptor, msd_line unless attribute

    self[attr_name] = attribute.first
  end
end
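Decoding walks the string in the opposite direction: the first character selects the category, and each following character is looked up in the attribute at the same position. The sketch below shows this for a single category; the `VERB` table and the `decode` helper are hypothetical, `'-'` is assumed to be the empty descriptor character, `ArgumentError` stands in for `InvalidDescriptor`, and the `:blank` placeholder handling of `parse!` is omitted.

```ruby
# Hypothetical attribute table for one category.
VERB = {
  code: 'V',
  attrs: [
    [:type,  { main: 'm', auxiliary: 'a' }],
    [:vform, { indicative: 'i', participle: 'p' }],
    [:tense, { present: 'p', past: 's' }]
  ]
}.freeze

# Maps each character after the category code back to an attribute value,
# skipping positions marked with '-' (assumed empty descriptor character).
def decode(category, msd_string)
  head, *codes = msd_string.chars
  raise ArgumentError, msd_string unless head == category[:code]

  grammemes = {}
  codes.each_with_index do |code, i|
    name, values = category[:attrs][i]
    raise ArgumentError, msd_string unless name
    next if code == '-'

    value, _ = values.find { |_, candidate| candidate == code }
    raise ArgumentError, msd_string unless value
    grammemes[name] = value
  end
  grammemes
end

decode(VERB, 'Vmps') # => {:type=>:main, :vform=>:participle, :tense=>:past}
decode(VERB, 'Vm-s') # => {:type=>:main, :tense=>:past}
```

A string longer than the attribute list, or a code that no value maps to, is rejected rather than silently ignored, mirroring the checks in `parse!`.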