class UV::AbstractTokenizer
AbstractTokenizer
is similar to BufferedTokernizer however should only be used when there is no delimiter to work with. It uses a callback based system for application level tokenization without the heavy lifting.
Constants
- DEFAULT_ENCODING
Attributes
callback[RW]
indicator[RW]
size_limit[RW]
verbose[RW]
Public Class Methods
new(options)
click to toggle source
@param [Hash] options
# File lib/uv-rays/abstract_tokenizer.rb, line 16 def initialize(options) @callback = options[:callback] @indicator = options[:indicator] @size_limit = options[:size_limit] @verbose = options[:verbose] if @size_limit @encoding = options[:encoding] || DEFAULT_ENCODING raise ArgumentError, 'no callback provided' unless @callback reset if @indicator.is_a?(String) @indicator = String.new(@indicator).force_encoding(@encoding).freeze end end
Public Instance Methods
bytesize()
click to toggle source
@return [Integer]
# File lib/uv-rays/abstract_tokenizer.rb, line 109 def bytesize @input.bytesize end
empty?()
click to toggle source
@return [Boolean]
# File lib/uv-rays/abstract_tokenizer.rb, line 104 def empty? @input.empty? end
extract(data)
click to toggle source
Extract takes an arbitrary string of input data and returns an array of tokenized entities using a message start indicator
@example
tokenizer.extract(data). map { |entity| Decode(entity) }.each { ... }
@param [String] data
# File lib/uv-rays/abstract_tokenizer.rb, line 40 def extract(data) data.force_encoding(@encoding) @input << data entities = [] loop do found = false last = if @indicator check = @input.partition(@indicator) break unless check[1].length > 0 check[2] else @input end result = @callback.call(last) if result found = true # Check for multi-byte indicator edge case case result when Integer entities << last[0...result] @input = last[result..-1] else entities << last reset end end break if not found end # Check to see if the buffer has exceeded capacity, if we're imposing a limit if @size_limit && @input.size > @size_limit if @indicator.respond_to?(:length) # check for regex # save enough of the buffer that if one character of the indicator were # missing we would match on next extract (very much an edge case) and # best we can do with a full buffer. @input = @input[-(@indicator.length - 1)..-1] else reset end raise 'input buffer exceeded limit' if @verbose end return entities end
flush()
click to toggle source
Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.
@return [String]
# File lib/uv-rays/abstract_tokenizer.rb, line 97 def flush buffer = @input reset buffer end
Private Instance Methods
reset()
click to toggle source
# File lib/uv-rays/abstract_tokenizer.rb, line 117 def reset @input = String.new.force_encoding(@encoding) end