class Wurfl::UserAgentMatcher
A class that lists wurfl handsets that match user_agents using the shortest Levenshtein distance, also sometimes called the edit distance, see en.wikipedia.org/wiki/Levenshtein_distance
The implementation of the Levenshtein distance used here is based on the algorithm found in the Text gem originally written by Paul Battley (pbattley@gmail.com)
The implementation given here makes heavy use of optimizations based on the estimation of the lower bound that can be achieved. Depending on the length of the user agent this brought an over all increase by a factor of about 40, although it renders the code slightly unreadable. In general the longer the user agent string and the greater the result distance, the longer the computation takes.
Author: Kai W. Zimmermann (kwz@kai-zimmermann.de)
Public Class Methods
Constructor Parameters: handsets: A hashtable of wurfl handsets indexed by wurfl_id.
# File lib/wurfl/user_agent_matcher.rb, line 25 def initialize(handsets) @handsets = handsets @longest_user_agent_length = @handsets.values.inject(0) do |m,hand| hand.user_agent.length > m ? hand.user_agent.length : m end @d=(0..@longest_user_agent_length).to_a @unpack_rule = 'C*' if RUBY_VERSION < "1.9" @unpack_rule = ($KCODE =~ /^U/i) ? 'U*' : 'C*' end end
Public Instance Methods
A method to retrieve a list of the uamatcher's handsets that match the passed user_agent closest using the Levenshtein distance. Parameters: user_agent: is a user_agent string to be matched Returns: An Array of all WurflHandsets that match the user_agent closest with the same distance The Levenshtein distance for these matches
# File lib/wurfl/user_agent_matcher.rb, line 45 def match_handsets(user_agent) rez = [] shortest_distance = [user_agent.length, @longest_user_agent_length].max s = user_agent.unpack(@unpack_rule) @handsets.values.each do |hand| distance = levenshtein_distance(user_agent, hand.user_agent, shortest_distance, s) # found a shorter distance match, flush old results rez.clear if shortest_distance > distance if shortest_distance >= distance # always add the first handset matched and each that has the same # distance as the shortest distance so far rez << hand shortest_distance = distance end break if shortest_distance == 0 end return rez, shortest_distance end
Private Instance Methods
Compute the Levenshtein distance or stop if the minimum found so far will be exceeded. Optimizations: Avoid GC, where possible reuse array The distance computation in the outer loop is monotonously decreasing thus we can safely stop if the estimated possible minimum exceeds the current minimum Parameters: s: is the first string, already unpacked t: is the second string, already unpacked min: is the minimum distance found so far Returns: The routine returns the computed distance or the minimum found so far if the # distance computed so far exceeds the minimum
# File lib/wurfl/user_agent_matcher.rb, line 111 def distance(s, t, min) n = s.length m = t.length return m if 0 == n return n if 0 == m d = @d # Optimization: Avoid GC by reusing array (0...m).each do |j| d[j] = j end x = 0 (0...n).each do |i| e = i+1 (0...m).each do |j| cost = (s[i] == t[j]) ? 0 : 1 x = d[j+1] + 1 # insertion y = e+1 x = y if y < x # deletion z = d[j] + cost x = z if z < x # substitution d[j] = e e = x end d[m] = x # estimate the minimum LD that still can be achieved, this will be # increasing monotonously stop once we exceed the current minimum return x - n + i + 1 if x - n + i + 1 > min end x end
A method to estimate and compute the Levenshtein distance (LD) based on the implementation from the Text gem. The implementation given here applies known upper and lower bounds as found at: en.wikipedia.org/wiki/Levenshtein_distance LD is always at least the difference of the sizes of the two strings.
-> We can safely discard the handset if the least distance is longer than the current shortest distance
LD is zero if and only if the strings are identical.
-> We can optimize the test for equality and stop searching after an exact match
Parameters: str1: is the user-agent to look up str2: is the user-agent to compare against min: is the minimum distance found so far s: the unpacked version of the user-agent string we look up Returns: It returns the least bound estimation if the least bound is already greater than the current minimum distance Otherwise it will compute the Levenshtein distance of str1 and str2 It optimizes the check for equality
# File lib/wurfl/user_agent_matcher.rb, line 90 def levenshtein_distance(str1, str2, min, s) diff = (str1.length - str2.length).abs return 0 if diff == 0 && str1 == str2 return diff if diff > min t = str2.unpack(@unpack_rule) distance(s, t, min) end