module Kanjidic

Public Class Methods

additional_codes()

Returns a hash containing information about non-dictionary codes.

Modifying the return value changes the behaviour of the module. See the implementation for details.

# File lib/kanjidic.rb, line 268
def self.additional_codes
        @@additional_codes
end
all_symbols(reload = false)

Returns a hash of all symbols used in the data structure, each associated with a description string.

The hash is built from the values returned by Kanjidic::dictionaries, Kanjidic::additional_codes and Kanjidic::uncoded_symbols. Modifying it will not affect the behaviour of the module.

The hash is cached; pass true to force a reload.

# File lib/kanjidic.rb, line 280
def self.all_symbols reload = false
        return @@all_symbols if @@all_symbols and !reload
        # Store the merged hash so that later calls can return the cached value
        @@all_symbols = coded_symboles.merge(uncoded_symbols)
end
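
The cache-and-reload behaviour described above can be tried in isolation. The following is a minimal sketch, not the Kanjidic implementation: the SymbolCache module and its sample hashes are hypothetical stand-ins, and it only assumes that the merged hash is stored on first use and rebuilt when true is passed.

```ruby
# Hypothetical stand-in for the module; names and sample data are invented.
module SymbolCache
  @coded   = { halpern: "Halpern index" }
  @uncoded = { reading: "Reading" }
  @all     = nil

  class << self
    attr_reader :coded, :uncoded

    # Return the merged hash, rebuilding it only on first use
    # or when reload is true.
    def all_symbols(reload = false)
      return @all if @all && !reload
      @all = coded.merge(uncoded)
    end
  end
end

first = SymbolCache.all_symbols
SymbolCache.coded[:grade] = "Grade"     # change one of the source hashes
stale = SymbolCache.all_symbols         # still the cached hash, without :grade
fresh = SymbolCache.all_symbols(true)   # rebuilt from the sources, with :grade
```

Because the merged hash is a new object, changing it does not touch the source hashes, which matches the note that modifying the result does not affect the module.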
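build(filename, jis)

Parse a Kanjidic file

Parses the file at the given location and returns an array with one entry per parsed line. Lines that Kanjidic::parse treats as comments are skipped. The jis argument is passed on to Kanjidic::parse, which prepends it to each entry's JIS code.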

# File lib/kanjidic.rb, line 160
def self.build filename, jis
        File.open(filename) do |f|
                result = []
                f.each do |l|
                        if r = parse(l, jis)
                                result << r
                        end
                end
                result
        end
end
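
The line-by-line loop in build can be sketched without a file on disk. In this hedged example, parse_line is a hypothetical stand-in for Kanjidic::parse, StringIO replaces the opened file, and the sample lines are illustrative only.

```ruby
require "stringio"

# Hypothetical stand-in for Kanjidic::parse: nil for lines starting with an
# ASCII character (comments), otherwise a tiny hash with the first token.
parse_line = ->(line) {
  line =~ /\A[[:ascii:]]/ ? nil : { character: line.split.first }
}

# StringIO stands in for the file so the sketch needs nothing on disk.
io = StringIO.new("# header comment\n亜 3021 U4e9c\n唖 3022 U5516\n")

entries = []
io.each_line do |line|
  entry = parse_line.call(line)
  entries << entry if entry   # keep only lines that produced an entry
end
```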
close()

Close the Kanji dictionary

The Kanjidic is a large file, and parsing it produces a large structure in memory.

Use this function to release that structure once the dictionary is no longer needed.

# File lib/kanjidic.rb, line 145
def self.close
        @@dic = nil
        GC.start
end
coded_symboles()

Returns a hash mapping each coded symbol (from Kanjidic::dictionaries and Kanjidic::additional_codes) to its description string.

# File lib/kanjidic.rb, line 291
def self.coded_symboles
        dictionaries.to_a.map { |e|
                sym, arr = *e
                [ sym, arr[1] ]
        }.to_h.
        merge(additional_codes.to_a.map { |e|
                sym, arr = *e
                [ sym, arr[1] ]
        }.to_h)
end
codes(reload = false)

Returns a hash of all the information used when building the dictionary.

The hash is built from the values returned by Kanjidic::dictionaries, Kanjidic::additional_codes and Kanjidic::special_codes, and is cached for further use.

The parameter is a boolean indicating whether the value should be rebuilt rather than fetched from the cache (defaults to false: from cache).

# File lib/kanjidic.rb, line 244
def self.codes reload = false
        return @@codes if @@codes and !reload
        @@codes = dictionaries.to_a.map { |e|
                sym, arr = *e
                [ arr[0], ->(s, v, _) { { dictionaries: { sym => s + v } } } ]
        }.to_h.
        merge(additional_codes.to_a.map { |e|
                sym, arr = *e
                [ arr[0], ->(s, v, _) { { sym => s + v } } ]
        }.to_h).merge(special_codes)
end
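
To make the shape of the codes hash concrete, here is a minimal standalone sketch. It assumes, as the source above suggests, that each entry maps a symbol to a [letter code, description] pair; the symbols :halpern and :grade and their sample values are hypothetical.

```ruby
# Hypothetical entries: symbol => [letter code, description].
dictionaries = { halpern: ["H", "Halpern index"] }
additional   = { grade:   ["G", "Grade"] }

# Dictionary codes nest their value under :dictionaries...
codes = dictionaries.map { |sym, (code, _desc)|
  [code, ->(sub, value, _) { { dictionaries: { sym => sub + value } } }]
}.to_h

# ...while additional codes sit at the top level of the entry.
codes.merge!(additional.map { |sym, (code, _desc)|
  [code, ->(sub, value, _) { { sym => sub + value } }]
}.to_h)

# Each letter code now maps to a lambda that builds a hash fragment.
halpern_fragment = codes["H"].call("", "3581", nil)
grade_fragment   = codes["G"].call("", "8", nil)
```

Each fragment has the shape that the parser later merges into the entry for a kanji.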
dictionaries()

Returns a hash containing all the information about dictionary codes.

Modifying the return value changes the behaviour of the module. See the implementation for details.

# File lib/kanjidic.rb, line 260
def self.dictionaries
        @@dictionaries
end
expand(filename, jis)

Expand the Kanji dictionary

Loads a file, parses it and adds its entries to the existing in-memory dictionary. A dictionary must already be loaded with Kanjidic::open.

# File lib/kanjidic.rb, line 136
def self.expand filename, jis
        @@dic.concat build(filename, jis)
end
format(e, opt = {})

Turns a Kanjidic entry into an easy-to-read string

# File lib/kanjidic.rb, line 334
def self.format e, opt = {}
        if e.is_a? Array
                e.map { |el| format el, opt }.join("\n")
        elsif e.is_a? Hash
                opt = { character: 0 }.merge(opt)
                ret = ""
                opt.sort_by { |_, value| value ? value : 0 }.to_h.each { |key, visible| ret += _to_s(key, e[key]) if visible and e[key] }
                e.each { |k,v| ret += _to_s k, v unless opt.has_key?(k) }
                ret
        else
                raise ArgumentError, "Invalid parameter #{e}"
        end
end
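
The way opt controls ordering and visibility can be sketched on its own. This is an illustration of the selection logic only, under the assumption that numeric values give the display rank and false hides a key; the sample entry is invented and no string formatting is done.

```ruby
# Invented sample entry; the key names follow those used in the source above.
entry = {
  meanings:     ["Asia"],
  reading:      ["ア"],
  character:    "亜",
  jis_code:     "3021",
  dictionaries: { halpern: "3581" }
}
opt = { character: 0, reading: 1, meanings: 4, dictionaries: false }

# Keys named in opt come first, sorted by rank; a false value hides the key.
ordered = opt.sort_by { |_, rank| rank || 0 }
             .select { |key, visible| visible && entry[key] }
             .map(&:first)

# Keys not mentioned in opt follow, in the entry's own order.
rest = entry.keys.reject { |key| opt.key?(key) }

display_order = ordered + rest
```

Note that 0 is truthy in Ruby, so a rank of 0 means "show first" rather than "hide".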
method_missing(sym, *args, &blck)

Forwards anything not specifically defined to the dictionary array, if it is loaded

# File lib/kanjidic.rb, line 317
def self.method_missing sym, *args, &blck
        raise NoMethodError,
                "No method named #{sym} for Kanjidic#{" (try loading the dictionary with Kanjidic::open first)" if [].respond_to?(sym)}" unless @@dic
        @@dic.send sym, *args, &blck
end
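
The forwarding idea can be illustrated with a miniature module. This is a sketch, not the Kanjidic code: MiniDic and its fill method are hypothetical, and the sketch is a variation that falls back to the default NoMethodError and also defines respond_to_missing?, which the source above does not do.

```ruby
# Hypothetical miniature of the forwarding idea; not the Kanjidic module.
module MiniDic
  @dic = nil

  # Stand-in for loading a dictionary: store an array of entries.
  def self.fill(entries)
    @dic = entries
  end

  # Forward unknown calls to the array once it is loaded.
  def self.method_missing(sym, *args, &blk)
    return super unless @dic && @dic.respond_to?(sym)
    @dic.public_send(sym, *args, &blk)
  end

  # Keep respond_to? consistent with what method_missing accepts.
  def self.respond_to_missing?(sym, include_private = false)
    (@dic && @dic.respond_to?(sym)) || super
  end
end

before = begin
  MiniDic.size
rescue NoMethodError
  :not_loaded
end

MiniDic.fill([{ character: "亜" }, { character: "唖" }])
count = MiniDic.size
found = MiniDic.find { |e| e[:character] == "唖" }
```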
open(filename, jis)

Load the Kanji dictionary

Loads a file in the KANJIDIC format from the given location and parses it into a data structure in memory.

Raises an exception if a file has already been loaded. See also Kanjidic::close, Kanjidic::expand

# File lib/kanjidic.rb, line 128
def self.open filename, jis
        raise "Kanjidic already open (use Kanjidic::close first if you want to reload it, or Kanjidic::expand if you want to extend it)" if @@dic
        @@dic = build(filename, jis)
end
open?()

Checks whether the Kanjidic is loaded

Returns true if a Kanjidic is available to use through the Kanjidic module interface, false otherwise.

# File lib/kanjidic.rb, line 153
def self.open?
        !!@@dic
end
parse(line, jis)

Parses a single line of a Kanjidic file and returns the corresponding entry, or nil if the line is a comment.

Refer to the Kanjidic homepage for details about the accepted structure of the string.

# File lib/kanjidic.rb, line 179
def self.parse line, jis
        return nil if line =~ /^[[:ascii:]]/ # Lines that do not start with a kanji (i.e. start with an ASCII character) are treated as comments
        elements = line.scan(/{[^}]+}|\S+/)
        kanji = { character: elements.shift, jis_code: jis.to_s + elements.shift, dictionaries: {} }
        kanji.extend self
        kana = :reading
        elements.each do |e|
                # Only the first match is considered:
                # a well-formed file should never yield more than one match array
                matches = e.scan(parser)[0]
                unless matches
                        _insert kanji, { undefined: e }
                else
                        matches.compact!
                        case matches.length
                        when 1 # It's a reading, see Kanjidic::parser
                                _insert kanji, { kana => matches[0] }
                        when 2 # It's a meaning, see Kanjidic::parser
                                m = matches[1]
                                (m == "(kokuji)") ? kanji[:kokuji] = true : _insert(kanji, { meanings: m })
                        when 3 # It's a code, see Kanjidic::parser
                                code, subcode, value = *matches
                                _insert kanji, codes[code].call(subcode, value, ->(n) { kana = n })
                        else raise "Unhandled case"
                        end
                end
        end
        kanji
end
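
The first steps of parse, skipping comments and splitting a line into elements, can be tried on their own. The sample lines below are illustrative, the "jis208-" prefix is a made-up value for the jis argument, and the braces in the regexp are escaped here for clarity.

```ruby
comment = "# KANJIDIC sample header"
line    = "亜 3021 U4e9c G8 S7 ア つ.ぐ {Asia} {rank next}"

# A line starting with an ASCII character is treated as a comment.
is_comment = !!(comment =~ /^[[:ascii:]]/)
is_entry   = !(line =~ /^[[:ascii:]]/)

# Split on whitespace, but keep {brace delimited text} together.
elements = line.scan(/\{[^}]+\}|\S+/)

# The first two elements are the character and its JIS code;
# "jis208-" is a hypothetical prefix standing in for the jis argument.
character = elements.shift
jis_code  = "jis208-" + elements.shift
```

After the two shift calls, elements holds only the codes, readings and meanings that the parser regexp classifies.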
parser(reload = false)

Builds a Regexp for line parsing

Builds a Regexp based on the codes returned by Kanjidic::codes.

Takes a boolean parameter indicating whether the regexp should be rebuilt from scratch rather than retrieved from the cache; false by default (returns the cached value).

The resulting regexp returns matches as follows:

3 groups (code, sub code, value) if the element is code based,

2 groups (“{”, content) if it is a bracket delimited string,

1 group (content) if it is a string of Japanese characters

# File lib/kanjidic.rb, line 224
def self.parser reload = false
        return @@parser if @@parser and !reload
        # It's going to get ugly, so here's the reasoning: take all the codes and check for them,
        # then capture the remaining information for later handling

        # First fetch the dictionary codes and assemble them in an A|B|DR|... fashion
        dic_codes = codes.keys.join("|")
        # Build the actual regexp.
        # The format is dic_code + optionally 1 or 2 uppercase letters + kanji_code
        # OR {text with spaces} OR <Japanese characters>
        @@parser = /(#{dic_codes})([A-Z]{0,2})(.+)|({)(.*)}|(\W+)/
end
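
The three match shapes listed above can be checked with a reduced regexp. This sketch assumes only three letter codes (U, G and S, chosen for illustration) instead of the full list from Kanjidic::codes, and escapes the braces for clarity.

```ruby
# Reduced, hypothetical code list; the real list comes from Kanjidic::codes.
code_letters = %w[U G S]
parser = /(#{code_letters.join("|")})([A-Z]{0,2})(.+)|(\{)(.*)\}|(\W+)/

# Keep the first match only and drop the groups that did not participate.
groups = ->(element) { element.scan(parser).first.compact }

code_groups    = groups.call("U4e9c")   # code, (empty) sub code, value
meaning_groups = groups.call("{Asia}")  # opening brace, content
reading_groups = groups.call("ア")      # content only
```

The number of remaining groups (3, 2 or 1) is what parse uses to tell codes, meanings and readings apart. The reading case relies on Ruby's \W matching any character outside [a-zA-Z0-9_], which includes kana.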
special_codes()

Return a hash of all the special codes and associated Procs

# File lib/kanjidic.rb, line 310
def self.special_codes
        @@special_codes
end
symbols()

Alias for Kanjidic::all_symbols

# File lib/kanjidic.rb, line 286
def self.symbols
        all_symbols
end
uncoded_symbols()

Returns a hash of all symbols not associated with a letter code.

The values are the description strings.

# File lib/kanjidic.rb, line 305
def self.uncoded_symbols
        @@uncoded
end

Private Class Methods

_include?(value, target)
# File lib/kanjidic.rb, line 409
def self._include? value, target
        case target
        when Hash
                value.is_a?(Hash) ?
                        (value.to_a.inject(true) { |acc, a|
                                k, v = a
                                acc && (target[k] == v)
                        }) : false
        when Array
                value.is_a?(Array) ?
                        (value.inject(true) { |acc, e|
                                acc && target.include?(e)
                        }) : target.include?(value)
        else
                value == target
        end
end
_insert(hash, dic)

Inserts values in a hash depending on the previous content of the hash

Essentially a deep_merge implementation.

# File lib/kanjidic.rb, line 367
def self._insert hash, dic
        dic.each do |key, value|
                t = hash[key]
                if t.nil?
                        # If the key doesn't exist, insert
                        hash[key] = value
                elsif t.is_a?(Array)
                        # If the key exists and its value is an array, add to it
                        hash[key] << value
                elsif t.is_a?(Hash)
                        # If the key exists and its value is a hash, merge them following the rules of this function
                        _insert hash[key], value
                else
                        # If the key exists and its value is anything else, build an array
                        # containing the previous value and the new one
                        hash[key] = [hash[key], value]
                end
        end
end
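
The merge rules can be seen in action with a standalone copy of the logic. This is a sketch under the assumption that the rules are exactly the four cases in the source above; the function name insert, the explicit return value and the sample data are choices made for this example.

```ruby
# Standalone copy of the merge rules; returns the hash for convenience.
def insert(hash, dic)
  dic.each do |key, value|
    current = hash[key]
    if current.nil?
      hash[key] = value              # new key: store as is
    elsif current.is_a?(Array)
      current << value               # array: append
    elsif current.is_a?(Hash)
      insert(current, value)         # hash: merge recursively
    else
      hash[key] = [current, value]   # scalar: promote to an array
    end
  end
  hash
end

kanji = {}
insert(kanji, { reading: "ア" })
insert(kanji, { reading: "つ.ぐ" })
insert(kanji, { reading: "つぎ" })
insert(kanji, { dictionaries: { halpern: "3581" } })
insert(kanji, { dictionaries: { nelson: "43" } })
```

A key seen once keeps a scalar value, a key seen several times ends up holding an array, and nested hashes such as :dictionaries accumulate their own keys.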
_resolve(key, value, resolve)
# File lib/kanjidic.rb, line 401
def self._resolve key, value, resolve
        return "" unless open? and resolve
        r = Kanjidic.find { |e|
                (e[key] == value) || (e[:dictionaries][key] == value)
        }
        r ? " (#{r[:character]})" : ""
end
_to_s(key, value, nesting = 1, resolve = false)
# File lib/kanjidic.rb, line 387
def self._to_s key, value, nesting = 1, resolve = false
        resolve = (resolve || key == :crossreference)
        ret = "#{all_symbols[key] || key}:"
        if value.is_a? Hash
                ret += "\n"
                value.each { |k, v| ret += " " * 2 * nesting + _to_s(k, v, nesting + 1, resolve) }
        elsif value.is_a? Array
                ret += " " + value.map{ |e| e.to_s + _resolve(key, e, resolve) }.join(", ") +  "\n"
        else
                ret += " #{value}#{_resolve(key, value, resolve)}\n"
        end
        ret
end

Public Instance Methods

to_s()
# File lib/kanjidic.rb, line 323
def to_s
        Kanjidic::format self,
                character: 0,
                reading: 1,
                name_reading: 2,
                radical_name: 3,
                meanings: 4,
                dictionaries: false
end