class EncodingEstimator::Detector
Class to perform encoding detection on strings.
Attributes
Public Class Methods
Create a new instance with a given configuration consisting of a list of conversions, a list of languages, a base penalty and the number of processes.
@param [Array<EncodingEstimator::Conversion>] conversions Conversions to perform/test on the input
@param [Array<EncodingEstimator::LanguageModel>] languages Language models to consider when evaluating the input
@param [Float] penalty Base penalty subtracted from each character's score
@param [Integer, nil] num_processes Number of processes to run the detection on, giving true parallelism (separate processes rather than threads) through the parallel gem. If nil, or if parallel execution is not supported, the detection runs single-threaded.
```ruby
# File lib/encoding_estimator/detector.rb, line 80
def initialize( conversions, languages, penalty = 0.01, num_processes = nil )
  @conversions   = conversions
  @languages     = languages
  @num_processes = num_processes
  @penalty       = penalty
end
```
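How the language distributions apply `penalty` is not shown on this page. As a rough illustration of a "base penalty subtracted from each character's score", the sketch below assumes a hypothetical scorer in which every character earns its relative frequency in the language and then loses the base penalty. `toy_score` and the frequency table are invented for this example and are not the gem's `LanguageModel` code.

```ruby
# Hypothetical scorer illustrating a per-character base penalty.
# NOT the gem's LanguageModel code; the frequencies are made up.
FREQUENCIES = { 'e' => 0.12, 't' => 0.09, 'a' => 0.08, ' ' => 0.15 }.freeze

def toy_score(text, frequencies, penalty = 0.01)
  # Every character costs `penalty`; known characters earn their frequency.
  text.each_char.sum { |ch| frequencies.fetch(ch, 0.0) - penalty }
end

toy_score('eat', FREQUENCIES) # about 0.26, i.e. 0.12 + 0.08 + 0.09 - 3 * 0.01
toy_score('ÃŸ', FREQUENCIES)  # about -0.02: two unknown characters
```

In this toy model, characters the language does not know always score strictly negative, so text full of mojibake drifts below text made of familiar characters.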
Public Instance Methods
Detect the encoding of an input string using the current configuration.
@param [String] str Input string the detection will be performed on
@return [EncodingEstimator::Detection] Result of the detection process
```ruby
# File lib/encoding_estimator/detector.rb, line 92
def detect( str )
  sums = {}

  # Run single-threaded unless a process count is set and parallel execution is supported
  results = if num_processes.nil? or !EncodingEstimator::ParallelSupport.supported?
              detect_st( str, combinations )
            else
              detect_mt( str, combinations )
            end

  # Sum the scores of all languages per conversion key
  results.each do |result|
    sums[result.key] = sums.fetch(result.key, 0.0) + result.score
  end

  # Rescale the sums relative to the range spanned by their minimum and maximum
  range = EncodingEstimator::RangeScale.new( sums.values.min, sums.values.max )
  scaled_scores = {}
  sums.each do |k,s|
    scaled_scores[ k ] = range.scale s
  end

  EncodingEstimator::Detection.new( scaled_scores, @conversions )
end
```
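`detect` adds up the scores of every language for each conversion key and then rescales the sums. The sketch below reproduces that aggregation with plain Ruby objects, assuming that `RangeScale` performs a linear min-max mapping onto [0, 1]. That behaviour, the `Result` struct, the handling of the all-equal case and `aggregate_and_scale` are assumptions for illustration, not the gem's code.

```ruby
# Stand-in for EncodingEstimator::SingleDetectionResult (assumption).
Result = Struct.new(:key, :score)

# Sum scores per conversion key, then map linearly so the best
# conversion gets 1.0 and the worst 0.0 (assumed RangeScale behaviour).
def aggregate_and_scale(results)
  sums = Hash.new(0.0)
  results.each { |r| sums[r.key] += r.score }
  return {} if sums.empty?

  min, max = sums.values.minmax
  span = max - min
  # If all sums are equal, treat every conversion as equally good (choice of this sketch).
  sums.transform_values { |s| span.zero? ? 1.0 : (s - min) / span }
end

results = [
  Result.new('utf-8', 2.0), Result.new('iso-8859-1', -1.0), # language "en"
  Result.new('utf-8', 1.0), Result.new('iso-8859-1',  0.0)  # language "de"
]
scaled = aggregate_and_scale(results)
# utf-8 => 1.0, iso-8859-1 => 0.0
```

Summing before scaling means a conversion only ranks first if it does well across all configured languages combined, not just in one of them.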
Private Instance Methods
Calculate the list of all combinations of languages and conversions
@return [Array<EncodingEstimator::CDCombination>] Conversion-Distribution-Combinations of the current config
```ruby
# File lib/encoding_estimator/detector.rb, line 142
def combinations
  @languages.map { |l|
    @conversions.map { |c| EncodingEstimator::CDCombination.new( c, l.distribution ) }
  }.flatten
end
```
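`combinations` builds the cross product of languages and conversions, grouped by language. The sketch below shows the same pairing with simplified stand-ins. `CDCombination` and `Language` are assumptions modelled as structs, the symbols replace real conversion and distribution objects, and `flat_map` replaces `map { ... }.flatten`. It also checks that `Array#product` yields the same pairs in the same order.

```ruby
# Simplified stand-ins for the gem's classes (assumptions).
CDCombination = Struct.new(:conversion, :distribution)
Language      = Struct.new(:code, :distribution)

def combinations(languages, conversions)
  # One entry per (language, conversion) pair, grouped by language.
  languages.flat_map do |l|
    conversions.map { |c| CDCombination.new(c, l.distribution) }
  end
end

languages   = [Language.new('en', :en_dist), Language.new('de', :de_dist)]
conversions = [:utf8, :latin1, :cp1252]

matrix = combinations(languages, conversions) # 2 languages x 3 conversions = 6 entries

# Same pairs via Array#product (languages outer, conversions inner).
via_product = languages.product(conversions).map { |l, c| CDCombination.new(c, l.distribution) }
```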
Compute the scores of all combinations of languages and conversions on multiple processes. See num_processes.
@param [String] str Input string to compute the encoding on
@param [Array<EncodingEstimator::CDCombination>] matrix List of Conversion-Distribution-Combinations
@return [Array<EncodingEstimator::SingleDetectionResult>] One result per combination, in the order of matrix. Each result exposes key (the key of the conversion) and score (the result of the evaluation for the input string).
```ruby
# File lib/encoding_estimator/detector.rb, line 133
def detect_mt( str, matrix )
  Parallel.map( matrix, in_processes: num_processes ) do |combination|
    detect_single str, combination
  end
end
```
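`detect_mt` hands the work to `Parallel.map` from the parallel gem with `in_processes:`, which returns results in input order. To show the idea without that dependency, here is a minimal fork-and-pipe process map built only on Ruby core (`Process.fork`, `IO.pipe`, `Marshal`). `process_map` is a hypothetical helper, not the parallel gem's implementation. It has no error handling and needs a platform that supports `fork`, so it will not work on Windows.

```ruby
# Hypothetical stand-in for Parallel.map(items, in_processes: n).
# Requires fork (Unix-like systems); results keep the input order.
def process_map(items, processes)
  # Deal items round-robin into one slice per worker, remembering indices.
  slices = items.each_with_index.group_by { |_, i| i % processes }.values

  workers = slices.map do |slice|
    reader, writer = IO.pipe
    [reader, writer].each(&:binmode) # Marshal data is binary
    pid = Process.fork do
      reader.close
      pairs = slice.map { |item, i| [i, yield(item)] }
      writer.write(Marshal.dump(pairs))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close
    [pid, reader]
  end

  # Collect each worker's [index, value] pairs back into input order.
  results = Array.new(items.size)
  workers.each do |pid, reader|
    Marshal.load(reader.read).each { |i, value| results[i] = value }
    reader.close
    Process.wait(pid)
  end
  results
end

squares = process_map((1..10).to_a, 3) { |n| n * n }
```

Because every child writes its results back tagged with their original indices, the parent can rebuild the output in input order no matter which worker finishes first. That ordering is what lets `detect` treat the single- and multi-process paths interchangeably.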
Perform the evaluation of a Conversion-Distribution-Combination on an input string
@param [String] str Input to evaluate
@param [EncodingEstimator::CDCombination] combination Distribution/Conversion to evaluate on the input
@return [EncodingEstimator::SingleDetectionResult] Result of the evaluation of the given combination on the input
```ruby
# File lib/encoding_estimator/detector.rb, line 154
def detect_single( str, combination )
  EncodingEstimator::SingleDetectionResult.new(
    combination.conversion.key,
    combination.distribution.evaluate( combination.conversion.perform(str), @penalty )
  )
end
```
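`detect_single` first applies the conversion to the input and then lets the language distribution score the converted text. The sketch below mimics that two-step pipeline using Ruby's built-in `String#encode`, `String#force_encoding` and `String#scrub` to undo a classic mistake: UTF-8 bytes misread as Windows-1252. The conversion keys, `fix_cp1252_mojibake`, the toy alphabet and the scoring rule are all assumptions for illustration. The gem's `Conversion#perform` and distribution `evaluate` are not shown on this page.

```ruby
# Hypothetical conversion: undo "UTF-8 bytes decoded as Windows-1252".
# Re-encode to Windows-1252 bytes, then read those bytes as UTF-8.
def fix_cp1252_mojibake(s)
  s.encode('Windows-1252').force_encoding('UTF-8').scrub
rescue EncodingError
  s # contains characters Windows-1252 cannot represent; leave unchanged
end

CONVERSIONS = {
  'identity'            => ->(s) { s },
  'fix-cp1252-mojibake' => method(:fix_cp1252_mojibake)
}.freeze

# Toy "German" alphabet standing in for a language distribution.
EXPECTED = (('a'..'z').to_a + %w[ä ö ü ß]).freeze

# Toy evaluation: +1 per expected character, -penalty for anything else.
def evaluate(text, penalty)
  text.downcase.each_char.sum { |ch| EXPECTED.include?(ch) ? 1.0 : -penalty }
end

# Step 1: perform the conversion; step 2: score the converted text.
def detect_single(str, key, penalty = 0.01)
  [key, evaluate(CONVERSIONS.fetch(key).call(str), penalty)]
end

garbled = 'GrÃ¼ÃŸe' # "Grüße" after being misread as Windows-1252
scores  = CONVERSIONS.keys.map { |k| detect_single(garbled, k) }.to_h
# identity scores about 2.96, fix-cp1252-mojibake scores 5.0
```

For input that is already correct, such as `'Grüße'`, the repair turns ü and ß into replacement characters, so the identity conversion wins instead. Comparing the scores of all conversions on the same input is what makes this work.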
Compute the scores of all combinations of languages and conversions on a single thread.
@param [String] str Input string to compute the encoding on
@param [Array<EncodingEstimator::CDCombination>] matrix List of Conversion-Distribution-Combinations
@return [Array<EncodingEstimator::SingleDetectionResult>] One result per combination, in the order of matrix. Each result exposes key (the key of the conversion) and score (the result of the evaluation for the input string).
```ruby
# File lib/encoding_estimator/detector.rb, line 120
def detect_st( str, matrix )
  matrix.map do |combination|
    detect_single str, combination
  end
end
```