module Kanjidic
Public Class Methods
Return a hash containing the information about non-dictionary codes.
Modifying the return value will change the behaviour of the module. See the implementation for details.
# File lib/kanjidic.rb, line 268
def self.additional_codes
  @@additional_codes
end
Return a hash of all symbols used in the data structure, each associated with a description string.
The hash is built from the values returned by Kanjidic::dictionaries, Kanjidic::additional_codes and Kanjidic::uncoded_symbols. Modifying it will not affect the behaviour of the module.
The hash is cached; a reload can be forced by passing true to the function.
# File lib/kanjidic.rb, line 280
def self.all_symbols reload = false
  return @@all_symbols if @@all_symbols and !reload
  coded_symboles.merge(uncoded_symbols)
end
Parse a Kanjidic file
Parse the file at the location given as argument and return a data structure representing it.
# File lib/kanjidic.rb, line 160
def self.build filename, jis
  File.open(filename) do |f|
    result = []
    f.each do |l|
      if r = parse(l, jis)
        result << r
      end
    end
    result
  end
end
Close the Kanji dictionary
The Kanjidic is a big file, resulting in a big structure in memory.
Use this function to drop the in-memory dictionary and trigger garbage collection once it is no longer needed.
# File lib/kanjidic.rb, line 145
def self.close
  @@dic = nil
  GC.start
end
Return a hash of all coded symbols (dictionary codes and additional codes) and their description strings
# File lib/kanjidic.rb, line 291
def self.coded_symboles
  dictionaries.to_a.map { |e|
    sym, arr = *e
    [ sym, arr[1] ]
  }.to_h.
    merge(additional_codes.to_a.map { |e|
      sym, arr = *e
      [ sym, arr[1] ]
    }.to_h)
end
Return a hash of all the information that will be used when building the dictionary
The hash is built from the values returned by Kanjidic::dictionaries, Kanjidic::additional_codes and Kanjidic::special_codes, and cached for further use.
The parameter is a boolean indicating whether the value should be rebuilt instead of fetched from the cache (defaults to false: from cache).
# File lib/kanjidic.rb, line 244
def self.codes reload = false
  return @@codes if @@codes and !reload
  @@codes = dictionaries.to_a.map { |e|
    sym, arr = *e
    [ arr[0], ->(s, v, _) { { dictionaries: { sym => s + v } } } ]
  }.to_h.
    merge(additional_codes.to_a.map { |e|
      sym, arr = *e
      [ arr[0], ->(s, v, _) { { sym => s + v } } ]
    }.to_h).merge(special_codes)
end
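The way the code table maps letter codes to handler lambdas can be illustrated with a minimal, standalone sketch. The two tables below are hypothetical stand-ins that only assume the `symbol => [letter code, description]` shape suggested by the source above; the symbols, descriptions and values are made up for illustration.

```ruby
# Hypothetical tables in the shape Kanjidic::dictionaries and
# Kanjidic::additional_codes appear to use: symbol => [letter code, description].
dictionaries     = { halpern: ["H", "Halpern index"] }
additional_codes = { grade: ["G", "School grade"] }

# Dictionary codes nest their value under the :dictionaries key...
dictionary_handlers = dictionaries.map { |sym, (code, _desc)|
  [code, ->(sub, value, _) { { dictionaries: { sym => sub + value } } }]
}.to_h

# ...while additional codes produce a top-level key.
additional_handlers = additional_codes.map { |sym, (code, _desc)|
  [code, ->(sub, value, _) { { sym => sub + value } }]
}.to_h

# One lookup table keyed by letter code, as in Kanjidic::codes
# (special codes are left out of this sketch).
codes = dictionary_handlers.merge(additional_handlers)

halpern_result = codes["H"].call("", "3405", nil)
grade_result   = codes["G"].call("", "8", nil)
```

Each handler receives the sub code, the value and a third argument that these two kinds of handler ignore, and returns a hash fragment ready to be merged into the entry being built.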
Return a hash containing all the information about dictionary codes
Modifying the return value will change the behaviour of the module. See the implementation for details.
# File lib/kanjidic.rb, line 260
def self.dictionaries
  @@dictionaries
end
Expand the Kanji dictionary
Load a file, parse it and add its entries to the existing in-memory dictionary
# File lib/kanjidic.rb, line 136
def self.expand filename, jis
  @@dic.concat build(filename, jis)
end
Turn a Kanjidic entry into an easy-to-read string
# File lib/kanjidic.rb, line 334
def self.format e, opt = {}
  if e.is_a? Array
    e.map { |el| format el, opt }.join("\n")
  elsif e.is_a? Hash
    opt = { character: 0 }.merge(opt)
    ret = ""
    opt.sort_by { |_, value| value ? value : 0 }.to_h.each { |key, visible|
      ret += _to_s(key, e[key]) if visible and e[key]
    }
    e.each { |k,v| ret += _to_s k, v unless opt.has_key?(k) }
    ret
  else
    raise ArgumentError, "Invalid parameter #{e}"
  end
end
Forward anything not specifically defined to the dictionary array if it is loaded
# File lib/kanjidic.rb, line 317
def self.method_missing sym, *args, &blck
  raise NoMethodError,
    "No method named #{sym} for Kanjidic#{" (try loading the dictionary with Kanjidic::open first)" if [].respond_to?(sym)}" unless @@dic
  @@dic.send sym, *args, &blck
end
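The forwarding idea can be sketched in isolation. `TinyDic`, its `fill` method and the sample entries are hypothetical names used only for this illustration; the sketch also adds `respond_to_missing?`, which the source above does not define.

```ruby
# Hypothetical stand-in for the module: unknown calls go to @entries once set.
module TinyDic
  @entries = nil

  # Store the array that unknown method calls will be forwarded to.
  def self.fill(entries)
    @entries = entries
  end

  def self.method_missing(name, *args, &block)
    raise NoMethodError, "No method named #{name} (load the dictionary first)" unless @entries
    @entries.send(name, *args, &block)
  end

  # Keep respond_to? consistent with what method_missing can handle.
  def self.respond_to_missing?(name, include_private = false)
    (!@entries.nil? && @entries.respond_to?(name, include_private)) || super
  end
end

# Before any data is loaded, Array methods are not available on the module.
error_before_load = begin
  TinyDic.size
rescue NoMethodError => e
  e.message
end

TinyDic.fill([{ character: "亜" }, { character: "唖" }])
size_after_load = TinyDic.size
first_character = TinyDic.first[:character]
```

With this pattern the module itself can be queried with Array and Enumerable methods such as `size`, `first` or `select` once the data is loaded.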
Load the Kanji dictionary
Load the file in the KANJIDIC format at the location given as argument and parse it into a data structure in memory.
Raise an exception if a file has already been loaded. See also Kanjidic::close, Kanjidic::expand
# File lib/kanjidic.rb, line 128
def self.open filename, jis
  raise "Kanjidic already open (use Kanjidic::close first if you want to reload it, or Kanjidic::expand if you want to extend it)" if @@dic
  @@dic = build(filename, jis)
end
Refer to the Kanjidic homepage for details about the accepted structure of the string.
# File lib/kanjidic.rb, line 179
def self.parse line, jis
  return nil if line =~ /^[[:ascii:]]/ #Anything that doesn't start with a (supposedly) kanji is treated as a comment
  elements = line.scan(/{[^}]+}|\S+/)
  kanji = { character: elements.shift, jis_code: jis.to_s + elements.shift, dictionaries: {} }
  kanji.extend self
  kana = :reading
  elements.each do |e|
    # We'll only consider the first match, because reasons
    # (namely a well formed file should never yield more than 1 match array)
    matches = e.scan(parser)[0]
    unless matches
      _insert kanji, { undefined: e }
    else
      matches.compact!
      case matches.length
      when 1
        # It's a reading, see Kanjidic::parser
        _insert kanji, { kana => matches[0] }
      when 2
        # It's a meaning, see Kanjidic::parser
        m = matches[1]
        (m == "(kokuji)") ? kanji[:kokuji] = true : _insert(kanji, { meanings: m })
      when 3
        # It's a code, see Kanjidic::parser
        code, subcode, value = *matches
        _insert kanji, codes[code].call(subcode, value, ->(n) { kana = n })
      else
        raise "Unhandled case"
      end
    end
  end
  kanji
end
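The first two steps of the parse, skipping comment lines and splitting an entry into elements, can be tried on their own. The sample line below is an abridged, illustrative entry in the KANJIDIC layout, not a verbatim line from the file, and the braces in the regexp are escaped here for clarity.

```ruby
# An illustrative line in the KANJIDIC layout: character, JIS code, then fields.
line    = "亜 3021 U4e9c G8 S7 ア つ.ぐ {Asia} {rank next}"
comment = "# KANJIDIC header line"

# Lines starting with an ASCII character are treated as comments.
line_is_comment    = !(line =~ /^[[:ascii:]]/).nil?
comment_is_comment = !(comment =~ /^[[:ascii:]]/).nil?

# Brace-delimited meanings stay whole even when they contain spaces;
# everything else is split on whitespace.
elements = line.scan(/\{[^}]+\}|\S+/)

# The first two elements are the character and its JIS code.
character = elements.shift
jis_code  = elements.shift
```

The remaining elements are then classified one by one with the regexp built by Kanjidic::parser.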
Build a Regexp for line parsing
Build a Regexp based on the letter codes returned by Kanjidic::codes.
Takes a boolean parameter indicating whether the regexp should be rebuilt from scratch rather than retrieved from the cached value; false by default (returns the cache).
Once nil captures are removed, the resulting regexp yields matches as follows:
3 groups (code, sub code, value) if the element is code based,
2 groups (“{”, content) if it is a bracket-delimited string,
1 group (content) if it is a string of Japanese characters
# File lib/kanjidic.rb, line 224
def self.parser reload = false
  return @@parser if @@parser and !reload
  # It's gonna get ugly so here's the reasoning: take all the codes and check for them,
  # then take the remaining informations and refer it for later
  # First fetch the dictionary codes and assemble them in a A|B|DR|... fashion
  dic_codes = codes.keys.join("|")
  # Build the actual regexp.
  # The format is dic_code + optionaly 1 or 2 uppercase letters + kanji_code
  # OR {text with spaces} OR <japanese characters>
  @@parser = /(#{dic_codes})([A-Z]{0,2})(.+)|({)(.*)}|(\W+)/
end
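The group-count convention described above can be checked with a small sketch. The code list `G`, `S`, `U` is a hypothetical subset chosen for the example (the real list comes from Kanjidic::codes), `classify` is a helper invented here, and the braces are escaped for clarity.

```ruby
# Hypothetical code list; the real one comes from Kanjidic::codes.keys.
dic_codes = %w[G S U].join("|")
parser = /(#{dic_codes})([A-Z]{0,2})(.+)|(\{)(.*)\}|(\W+)/

# Classify an element by the number of non-nil captures of its first match.
classify = ->(element) {
  matches = element.scan(parser)[0]
  return :undefined unless matches
  case matches.compact.length
  when 3 then :code     # code, sub code (possibly empty), value
  when 2 then :meaning  # "{" and the text between the braces
  when 1 then :reading  # a run of non-word characters, e.g. kana
  end
}

code_captures = "S7".scan(parser)[0].compact
```

An empty sub code is captured as an empty string rather than nil, so a code-based element still counts three groups. Ruby's `\W` excludes only ASCII letters, digits and underscore, which is why it matches kana and kanji here.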
Basic search tool returning every entry in the dictionary that matches all the conditions given as parameters
Example:
search pronunciation: "あ", jlpt: '1'
# File lib/kanjidic.rb, line 354
def self.search h = {}
  raise "Load the dictionary before searching" unless @@dic
  @@dic.select do |kanji|
    h.inject(true) do |acc, a|
      key, value = a
      acc && _include?(value, kanji[key])
    end
  end
end
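The matching rules behind the search can be sketched without loading a dictionary. The helper below follows the logic of the private `_include?` method listed later on this page; `include_value?`, the standalone `search` and the sample entries (with their keys) are illustrative assumptions, not the module's API or data.

```ruby
# A condition matches when it equals the target, is contained in an Array
# target, or (for Hash targets) when every given pair is present.
def include_value?(value, target)
  case target
  when Hash
    value.is_a?(Hash) && value.all? { |k, v| target[k] == v }
  when Array
    value.is_a?(Array) ? value.all? { |e| target.include?(e) } : target.include?(value)
  else
    value == target
  end
end

# Keep the entries for which every condition matches.
def search(entries, conditions = {})
  entries.select do |entry|
    conditions.all? { |key, value| include_value?(value, entry[key]) }
  end
end

# Illustrative sample entries.
entries = [
  { character: "亜", reading: ["ア", "つ.ぐ"], grade: "8" },
  { character: "愛", reading: ["アイ"], grade: "4" }
]

by_reading = search(entries, reading: "ア").map { |e| e[:character] }
by_both    = search(entries, reading: ["ア", "つ.ぐ"], grade: "8").map { |e| e[:character] }
```

Conditions are combined with a logical AND, and an empty condition hash returns every entry.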
Return a hash of all the special codes and associated Procs
# File lib/kanjidic.rb, line 310
def self.special_codes
  @@special_codes
end
Alias for Kanjidic::all_symbols
# File lib/kanjidic.rb, line 286
def self.symbols
  all_symbols
end
Return a hash of all symbols not associated with a letter code.
The values are the description strings
# File lib/kanjidic.rb, line 305
def self.uncoded_symbols
  @@uncoded
end
Private Class Methods
# File lib/kanjidic.rb, line 409
def self._include? value, target
  case target
  when Hash
    value.is_a?(Hash) ?
      (value.to_a.inject(true) { |acc, a|
        k, v = a
        acc && (target[k] == v)
      }) : false
  when Array
    value.is_a?(Array) ?
      (value.inject(true) { |acc, e| acc && target.include?(e) }) :
      target.include?(value)
  else
    value == target
  end
end
Insert values into a hash depending on the previous content of the hash
Essentially a deep_merge implementation.
# File lib/kanjidic.rb, line 367
def self._insert hash, dic
  dic.each do |key, value|
    t = hash[key]
    # If the key doesn't exist, insert
    if t.nil?
      hash[key] = value
    # If the key exist and its value is an array, add to it
    elsif t.is_a?(Array)
      hash[key] << value
    # If the key exist and its value is a hash, merge them following the rules of this function
    elsif t.is_a?(Hash)
      _insert hash[key], value
    # If the key exists and its value is anything else, build an array to contain the previous value and
    # the new one
    else
      hash[key] = [hash[key], value]
    end
  end
end
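The effect of these insertion rules is easiest to see on a small example. The sketch below restates the same four rules as a standalone method; `insert!` is a name chosen for this illustration, it returns the hash for convenience, and the entry values (including the index number) are made up.

```ruby
# Standalone restatement of the four insertion rules.
def insert!(hash, additions)
  additions.each do |key, value|
    current = hash[key]
    if current.nil?
      hash[key] = value             # new key: store the value as is
    elsif current.is_a?(Array)
      current << value              # existing array: append
    elsif current.is_a?(Hash)
      insert!(current, value)       # existing hash: recurse with the same rules
    else
      hash[key] = [current, value]  # existing scalar: promote to an array
    end
  end
  hash
end

kanji = { character: "亜", dictionaries: {} }
insert!(kanji, meanings: "Asia")
insert!(kanji, meanings: "rank next")
insert!(kanji, meanings: "come after")
insert!(kanji, dictionaries: { halpern: "3405" })  # made-up index value
```

A key therefore holds a scalar after the first insertion and becomes an array from the second insertion on, so callers need to handle both shapes.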
# File lib/kanjidic.rb, line 401
def self._resolve key, value, resolve
  return "" unless open? and resolve
  r = Kanjidic.find { |e| (e[key] == value) || (e[:dictionaries][key] == value) }
  r ? " (#{r[:character]})" : ""
end
# File lib/kanjidic.rb, line 387
def self._to_s key, value, nesting = 1, resolve = false
  resolve = (resolve || key == :crossreference)
  ret = "#{all_symbols[key] || key}:"
  if value.is_a? Hash
    ret += "\n"
    value.each { |k, v| ret += " " * 2 * nesting + _to_s(k, v, nesting + 1, resolve) }
  elsif value.is_a? Array
    ret += " " + value.map{ |e| e.to_s + _resolve(key, e, resolve) }.join(", ") + "\n"
  else
    ret += " #{value}#{_resolve(key, value, resolve)}\n"
  end
  ret
end
Public Instance Methods
# File lib/kanjidic.rb, line 323
def to_s
  Kanjidic::format self,
    character: 0,
    reading: 1,
    name_reading: 2,
    radical_name: 3,
    meanings: 4,
    dictionaries: false
end