class Sanzang::TranslationTable
A translation table encapsulates a set of rules for translating with the Sanzang system.
Attributes
records
The records for the translation table, as an array.
source_encoding
Original encoding when the table was read.
Public Class Methods
The translation table file format is summarized as follows:
- Each line of text is a record for a translation rule.
- Fields in the record are separated by the “|” character.
- The first field contains the term in the source language.
- Subsequent fields are equivalent terms in destination languages.
- The number of columns must be consistent for the entire table.
The rules passed in here may be either a string or an IO object, such as an open file, that responds to read.
# File lib/sanzang/translation_table.rb, line 43
def initialize(rules)
  contents = rules.kind_of?(String) ? rules : rules.read
  @source_encoding = contents.encoding
  contents.encode!(Encoding::UTF_8)
  if contents =~ /~\||\|~|\| / # If there is any old formatting...
    contents.gsub!(/~\||\|~/, "")    # Rm old style "~|" and "|~"
    contents.gsub!(/^\s+|\s+$/, "")  # Rm WS around lines
    contents.gsub!(/\s*\|\s*/, "|")  # Rm WS around delimiters
  end
  @records = contents.strip.split("\n").collect {|r| r.strip.split("|") }
  @sorted = false
  check_dims
  #sort!
end
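To make the file format and the old-style cleanup concrete, here is a minimal standalone sketch that mirrors the parsing steps of the constructor without loading the Sanzang library. The helper name parse_rules and the sample terms are assumptions made for this illustration; they are not part of the library.

```ruby
# Hypothetical helper (not part of Sanzang) that mirrors the parsing
# steps of the constructor on a plain string.
def parse_rules(text)
  contents = text.encode(Encoding::UTF_8)
  if contents =~ /~\||\|~|\| /                 # old formatting detected
    contents = contents.gsub(/~\||\|~/, "")    # drop old-style "~|" and "|~"
    contents = contents.gsub(/^\s+|\s+$/, "")  # trim whitespace around lines
    contents = contents.gsub(/\s*\|\s*/, "|")  # trim whitespace around "|"
  end
  # One record per line, one field per "|"-separated column.
  contents.strip.split("\n").map { |line| line.strip.split("|") }
end

new_style = "A|B\nC|D"
old_style = "~| A | B |~\n~| C | D |~"

# Both inputs yield the same records: [["A", "B"], ["C", "D"]]
p parse_rules(new_style)
p parse_rules(old_style)
```

Unlike the constructor, this sketch copies the input rather than modifying it in place, so it also works with frozen strings.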
Public Instance Methods
Retrieve a record by its numeric index.
# File lib/sanzang/translation_table.rb, line 62
def [](index)
  @records[index]
end
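Check the basic dimensions of the translation table: it must have at least one row and at least two columns, and every row must have the same number of columns as the first.
# File lib/sanzang/translation_table.rb, line 68
def check_dims
  if @records.size < 1
    raise "Table must have at least 1 row"
  elsif records[0].size < 2
    raise "Table must have at least 2 columns"
  end
  # each_with_index supplies the row index used in the error message
  @records.each_with_index do |r, i|
    if r.size != width
      raise "Column mismatch: Line #{i + 1}"
    end
  end
end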
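The dimension rules can be tried out on plain arrays with a small standalone sketch. The function name dimension_error is hypothetical, and returning the message instead of raising is a choice made here only to keep the example easy to inspect.

```ruby
# Hypothetical standalone check (not the library method): returns an
# error message for the first rule that is violated, or nil if none is.
def dimension_error(records)
  return "Table must have at least 1 row" if records.empty?
  width = records[0].size
  return "Table must have at least 2 columns" if width < 2
  records.each_with_index do |rec, i|
    # Line numbers in the message are 1-based.
    return "Column mismatch: Line #{i + 1}" if rec.size != width
  end
  nil
end

p dimension_error([["A", "B"], ["C", "D"]])  # well-formed table
p dimension_error([["A", "B"], ["C"]])       # second row is too short
```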
The text encoding used internally for all translation table data
# File lib/sanzang/translation_table.rb, line 97
def encoding
  Encoding::UTF_8
end
Find a record by the source language term (first column).
# File lib/sanzang/translation_table.rb, line 103
def find(term)
  @records.find {|rec| rec[0] == term }
end
The number of records in the table
# File lib/sanzang/translation_table.rb, line 149
def length
  @records.length
end
Merge another table into this one. If the same source term exists in both tables, then the record from the other table will be used instead. Note: after a merge, the resulting table is unsorted.
# File lib/sanzang/translation_table.rb, line 129
def merge!(tab2)
  if tab2.width != width
    raise "Table widths must match when merging tables"
  end
  h1 = to_h
  tab2.records.each do |rec|
    h1[rec[0]] = rec
  end
  @records = h1.values
  @sorted = false
end
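The override rule is easiest to see on a small example. The sketch below reproduces the merge logic on plain arrays of records; merge_records is a hypothetical name, and the width check performed by the real method is left out for brevity.

```ruby
# Hypothetical standalone version of the merge rule (not the library API).
def merge_records(records1, records2)
  merged = {}
  # Index the first table by source term; the first occurrence wins.
  records1.each { |rec| merged[rec[0]] ||= rec }
  # Records from the second table replace any with the same source term.
  records2.each { |rec| merged[rec[0]] = rec }
  merged.values
end

base     = [["A", "a1"], ["B", "b1"]]
override = [["B", "b2"], ["C", "c2"]]

# "B" now carries the record from the second table, and "C" is appended.
p merge_records(base, override)
```

Because a Ruby Hash preserves insertion order, a replaced record keeps the position of the original, and new source terms are appended at the end.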
Sort all records in place by the length of the source term (first column), longest first.
# File lib/sanzang/translation_table.rb, line 89
def sort!
  @records.sort! {|x,y| y[0].size <=> x[0].size }
  @sorted = true
  nil
end
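As a quick illustration of the ordering, this standalone sketch applies the same comparison to a plain array. The function name sort_by_source_length and the sample records are assumptions for the example.

```ruby
# Hypothetical helper (not the library method): returns a new array
# ordered by descending length of the first field.
def sort_by_source_length(records)
  records.sort { |x, y| y[0].size <=> x[0].size }
end

records = [["AB", "x"], ["A", "y"], ["ABC", "z"]]

# The longest source term comes first: "ABC", then "AB", then "A".
p sort_by_source_length(records)
```

Ruby's sort is not guaranteed to be stable, so the relative order of records whose source terms have equal length should not be relied on.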
Check if the table records are sorted
# File lib/sanzang/translation_table.rb, line 83
def sorted?
  @sorted
end
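Return the table as a delimited string, with one record per line and fields joined by the “|” character. Despite the method name, the delimiter is “|”, not a comma.
# File lib/sanzang/translation_table.rb, line 143
def to_csv
  @records.map {|r| r.join("|") }.join("\n")
end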
Convert to a hash. The original records are the values.
For example: "A" => ["A", "B", "C"]
# File lib/sanzang/translation_table.rb, line 111
def to_h
  h = Hash.new
  @records.each {|rec| h[rec[0]] = rec if not h[rec[0]] }
  h
end
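One detail worth seeing in action is that, when a source term appears more than once, the first record is kept and later ones are ignored. The standalone sketch below shows this on plain arrays; records_to_hash is a hypothetical name used only for this example.

```ruby
# Hypothetical standalone version of the conversion (not the library method).
def records_to_hash(records)
  h = {}
  # Only store a record if its source term has not been seen yet.
  records.each { |rec| h[rec[0]] = rec unless h[rec[0]] }
  h
end

records = [["A", "B", "C"], ["A", "X", "Y"], ["D", "E", "F"]]
h = records_to_hash(records)

p h         # the second "A" record is dropped
p h.values  # the de-duplicated records, in first-seen order
```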
Remove records with duplicate source terms, keeping the first record for each term. The resulting table is marked as unsorted.
# File lib/sanzang/translation_table.rb, line 119
def uniq!
  @records = to_h.values
  @sorted = false
  nil
end
The number of columns in the table
# File lib/sanzang/translation_table.rb, line 155
def width
  @records[0].length
end