class EncodingEstimator::Distribution

Public Class Methods

new( language )

Create a new distribution object for a given language. Loaded distributions are cached per language file path, so each file is read at most once.

@param [EncodingEstimator::LanguageModel] language Language to load the distribution for

# File lib/encoding_estimator/distribution.rb, line 10
def initialize( language )
  # Load each language file only once; all instances share the cached hash
  @@distributions[ language.path ] ||= load_language language
  @distribution = @@distributions[ language.path ]
end
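
The caching in the constructor relies on `||=` with a class-level hash keyed by file path. The following is a minimal sketch of that pattern only; `CachedTable`, `load_table`, and the `loads` counter are hypothetical stand-ins and not part of the gem.

```ruby
# Stand-in class illustrating the memoization pattern; not the gem's code.
class CachedTable
  @@tables = {}
  @@loads  = 0

  # Number of times the (pretend) expensive load actually ran.
  def self.loads
    @@loads
  end

  attr_reader :table

  def initialize(path)
    # ||= evaluates the right-hand side only if no truthy value is stored yet.
    @@tables[path] ||= load_table(path)
    @table = @@tables[path]
  end

  private

  def load_table(path)
    @@loads += 1
    { 'path' => path } # placeholder for an expensive file read
  end
end

first  = CachedTable.new('en.json')
second = CachedTable.new('en.json')
first.table.equal?(second.table) # both instances share one Hash object
```

Because an empty hash is truthy in Ruby, a failed load that returns `{}` is cached as well and is not retried by later instances.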

Public Instance Methods

evaluate( str, penalty )

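Calculate the likelihood of a string for the given language. Each character contributes its weight from the distribution minus the penalty; characters missing from the distribution contribute only the negative penalty.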

@param [String] str Data to calculate the likelihood for

@param [Float] penalty Threshold subtracted from each character's score, so characters scoring below it are weighted negatively

@return [Float] Total likelihood

# File lib/encoding_estimator/distribution.rb, line 21
def evaluate( str, penalty )
  dist = @distribution
  sum = 0.0
  str.each_char { |c| sum += dist.fetch( c, 0.0 ) - penalty }
  sum
end
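
As a worked illustration of the scoring arithmetic, here is a standalone sketch. The `score` method and the weights in `dist` are made up for the example; real weights come from the language's JSON file.

```ruby
# Hypothetical character weights, chosen to be exact in binary floating point.
dist = { 'a' => 0.5, 'b' => 0.25 }

# Same arithmetic as Distribution#evaluate, written as a standalone method.
def score(dist, str, penalty)
  str.each_char.sum(0.0) { |c| dist.fetch(c, 0.0) - penalty }
end

known   = score(dist, 'ab', 0.125) # (0.5 - 0.125) + (0.25 - 0.125) = 0.5
unknown = score(dist, 'zz', 0.125) # (0.0 - 0.125) * 2 = -0.25
```

A string made only of characters absent from the distribution therefore scores `-penalty` times its length.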

Private Instance Methods

load_language( language )

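Try to load the language distribution from the filesystem. Returns an empty hash if the language model is invalid or its file cannot be read or parsed.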

@param [EncodingEstimator::LanguageModel] language Language model whose distribution file should be loaded

@return [Hash] Hash representing the distribution for a language

# File lib/encoding_estimator/distribution.rb, line 35
def load_language( language )
  return {} unless language.valid?

  begin
    distribution = JSON.parse(
        File.read( language.path, encoding: 'utf-8' )
    )
  rescue StandardError
    # Missing or unreadable file, or malformed JSON: use an empty distribution
    distribution = {}
  end

  distribution
end
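
The read-and-parse fallback can be tried in isolation with the sketch below. `read_distribution` is a hypothetical helper and the file paths are placeholders. It rescues `StandardError`, which covers both `Errno::ENOENT` and `JSON::ParserError`, so that interrupts and other non-standard exceptions still propagate.

```ruby
require 'json'
require 'tempfile'

# Hypothetical helper mirroring the read-and-parse step; not part of the gem.
def read_distribution(path)
  JSON.parse(File.read(path, encoding: 'utf-8'))
rescue StandardError
  {} # missing file or malformed JSON: fall back to an empty distribution
end

Tempfile.create(['model', '.json']) do |file|
  file.write('{"a": 0.5}')
  file.flush
  read_distribution(file.path) # => {"a"=>0.5}
end

read_distribution('/no/such/model.json') # => {}
```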