iter_long(string, [start, [end]])
Perform a modified Aho-Corasick search that reports only the longest matching words from the set.
Return an iterator of tuples (end_index, value) for keys found in string, where:
end_index is the end index in the input string where a trie key string was found (the index of the key's last character, so it is inclusive);
value is the value associated with the found key string.
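Because end_index points at the last character of the match, the start of each match can be recovered from the key length. The sketch below assumes, as in the example in this section, that each stored value is the key string itself; to_spans is a hypothetical helper, not part of pyahocorasick, and the tuples are the iter_long results for the needle "he here her" from that example.

```python
# iter_long results for needle "he here her": (end_index, value) pairs.
matches = [(1, 'he'), (6, 'here'), (10, 'her')]
needle = "he here her"

def to_spans(matches):
    """Convert (end_index, key) pairs into (start, stop, key) slice bounds.

    Hypothetical helper. Assumes each value is the matched key itself,
    so len(key) is the match length; end_index is inclusive.
    """
    spans = []
    for end_index, key in matches:
        start = end_index - len(key) + 1
        spans.append((start, end_index + 1, key))  # stop is exclusive, as in slicing
    return spans

for start, stop, key in to_spans(matches):
    assert needle[start:stop] == key  # the slice reproduces the matched key
    print(start, stop, key)
```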
The optional start and end arguments limit the search to an input string slice, as in string[start:end].
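To illustrate how start and end narrow the search, here is a naive pure-Python model of an all-occurrences scan restricted to string[start:end]. It is not Aho-Corasick (it simply checks every key at every position), find_all_in_slice is a hypothetical helper rather than a pyahocorasick function, and it assumes that reported end indices stay in the coordinates of the whole input string rather than of the slice.

```python
def find_all_in_slice(keys, string, start=0, end=None):
    """Report every key occurrence that lies entirely inside string[start:end].

    Hypothetical helper. Returned end indices are inclusive and refer to
    positions in the whole input string, not in the slice.
    """
    if end is None:
        end = len(string)
    results = []
    for stop in range(start + 1, end + 1):  # exclusive stop of a candidate match
        for key in sorted(keys, key=len):  # shorter keys first at each position
            begin = stop - len(key)
            if begin >= start and string[begin:stop] == key:
                results.append((stop - 1, key))
    return results

keys = {"he", "her", "here"}
needle = "he here her"
print(find_all_in_slice(keys, needle))        # whole string
print(find_all_in_slice(keys, needle, 3, 7))  # only the "here" region
```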
Example
The default Aho-Corasick algorithm returns all occurrences of words stored in the automaton, including words that occur as substrings of other matched words in string. The iter_long method reports only the longest match among overlapping candidates.
For the set of words {“he”, “her”, “here”} and the needle “he here her”, the default algorithm finds the following words: “he”, “he”, “her”, “here”, “he”, “her”, while the modified one yields only “he”, “here”, “her”.
>>> import ahocorasick
>>> A = ahocorasick.Automaton()
>>> A.add_word("he", "he")
True
>>> A.add_word("her", "her")
True
>>> A.add_word("here", "here")
True
>>> A.make_automaton()
>>> needle = "he here her"
>>> list(A.iter_long(needle))
[(1, 'he'), (6, 'here'), (10, 'her')]
>>> list(A.iter(needle))
[(1, 'he'), (4, 'he'), (5, 'her'), (6, 'here'), (9, 'he'), (10, 'her')]
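The longest-match behaviour can be approximated with a greedy leftmost-longest scan over a plain dict-based trie. This is a swapped-in technique for illustration, not the library's Aho-Corasick implementation: build_trie and iter_longest are hypothetical names, and although the sketch reproduces the iter_long output above, it is not guaranteed to agree with iter_long on every input.

```python
def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # store the word itself as its value
    return root

def iter_longest(trie, string):
    """Yield (end_index, value) for leftmost-longest, non-overlapping matches.

    Greedy scan, not an Aho-Corasick automaton: from each position, walk
    the trie as far as possible and remember the last complete word seen.
    """
    i, n = 0, len(string)
    while i < n:
        node = trie
        best = None  # (end_index, value) of the longest word starting at i
        j = i
        while j < n and string[j] in node:
            node = node[string[j]]
            if None in node:
                best = (j, node[None])
            j += 1
        if best is None:
            i += 1  # no word starts here; try the next position
        else:
            yield best
            i = best[0] + 1  # resume after the match so matches never overlap

trie = build_trie(["he", "her", "here"])
print(list(iter_longest(trie, "he here her")))
```

Resuming right after each reported match is what keeps the results from overlapping; restarting the trie walk at every position is simple but rescans characters, which an Aho-Corasick automaton is designed to avoid.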