class AsciiMath::Tokenizer

Internal: Splits an ASCIIMath expression into a sequence of tokens. Each token is represented as a Hash containing the keys :value and :type. The :value key stores the text associated with the token. The :type key indicates the semantics of the token and is always a Symbol. The methods documented here produce :eof, :number, :text and :identifier directly; a token read from the symbol table carries the :type defined by its table entry.

Constants

NUMBER
QUOTED_TEXT
TEX_TEXT
WHITESPACE

Public Class Methods

new(string, symbols)

Public: Initializes an ASCIIMath tokenizer.

string  - The ASCIIMath expression to tokenize
symbols - The symbol table to use while tokenizing

# File lib/asciimath/parser.rb, line 58
def initialize(string, symbols)
  @string = StringScanner.new(string)
  @symbols = symbols
  lookahead = @symbols.keys.map { |k| k.length }.max
  @symbol_regexp = /((?:\\[\s0-9]|[^\s0-9]){1,#{lookahead}})/
  @push_back = nil
end
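To make the lookahead concrete, here is a minimal sketch of how the symbol regexp is derived from the longest key. The two-entry symbol table is hypothetical and stands in for the table the library passes in; only the regexp itself is taken from the listing above.

```ruby
require 'strscan'

# Hypothetical symbol table, for illustration only.
symbols = {
  'alpha' => {:value => 'alpha', :type => :identifier},
  '+'     => {:value => '+',     :type => :operator}
}

# Length of the longest key bounds how many characters are scanned at once.
lookahead = symbols.keys.map { |k| k.length }.max
symbol_regexp = /((?:\\[\s0-9]|[^\s0-9]){1,#{lookahead}})/

# StringScanner#scan only matches at the current position.
scanner = StringScanner.new('alphabet 12')
candidate = scanner.scan(symbol_regexp)
puts candidate
```

The scan stops after at most `lookahead` characters and never crosses whitespace or an unescaped digit, so the candidate handed to the symbol lookup stays short.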

Public Instance Methods

next_token()

Public: Read the next token from the ASCIIMath expression and move the tokenizer ahead by one token.

Returns the next token as a Hash

# File lib/asciimath/parser.rb, line 70
def next_token
  if @push_back
    t = @push_back
    @push_back = nil
    return t
  end

  @string.scan(WHITESPACE)

  return {:value => nil, :type => :eof} if @string.eos?

  case @string.peek(1)
    when '"'
      read_quoted_text
    when 't'
      case @string.peek(5)
        when 'text('
          read_tex_text
        else
          read_symbol
      end
    when '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
      read_number || read_symbol
    else
      read_symbol
  end
end
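The dispatch in next_token can be sketched in isolation. The helper below is hypothetical: it only names the reader that would be tried first for a given input, and it assumes WHITESPACE matches a run of whitespace characters, since the constant's value is not shown here.

```ruby
require 'strscan'

# Hypothetical helper: reports which reader next_token would try first.
def reader_for(input)
  scanner = StringScanner.new(input)
  scanner.scan(/\s+/) # assumption: stands in for the WHITESPACE constant
  return :eof if scanner.eos?

  case scanner.peek(1)
  when '"'
    :read_quoted_text
  when 't'
    # Only the exact prefix "text(" selects the TeX-style text reader.
    scanner.peek(5) == 'text(' ? :read_tex_text : :read_symbol
  when '-', /\d/
    # next_token falls back to read_symbol if no number matches.
    :read_number
  else
    :read_symbol
  end
end

puts reader_for('text(abc)')
puts reader_for('tan x')
```

Peeking one character, and five only when the first is `t`, keeps the common path cheap while still letting `text(` win over an ordinary symbol such as `tan`.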

push_back(token)

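Public: Pushes the given token back to the tokenizer. A subsequent call to next_token will return the given token rather than generating a new one. At most one token can be pushed back; a second push_back before the next call to next_token replaces the first. Pushing back an :eof token has no effect.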

token - The token to push back

# File lib/asciimath/parser.rb, line 103
def push_back(token)
  @push_back = token unless token[:type] == :eof
end
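The one-slot push-back behaviour can be sketched without a real tokenizer. The class below is a hypothetical stand-in that serves tokens from a fixed array; only the next_token preamble and push_back mirror the listings above.

```ruby
# Hypothetical stand-in for the tokenizer: serves tokens from an array.
class OneSlotSource
  def initialize(tokens)
    @tokens = tokens.dup
    @push_back = nil
  end

  def next_token
    if @push_back
      t = @push_back
      @push_back = nil
      return t
    end
    @tokens.shift || {:value => nil, :type => :eof}
  end

  # Holds a single token; :eof tokens are ignored.
  def push_back(token)
    @push_back = token unless token[:type] == :eof
  end
end

source = OneSlotSource.new([
  {:value => '1', :type => :number},
  {:value => '+', :type => :operator}
])

first = source.next_token
source.push_back(first)
again = source.next_token  # the pushed-back token, not a new one
second = source.next_token # reading resumes where it left off
puts again[:value], second[:value]
```

A single slot is enough for a parser that needs only one token of lookahead.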

Private Instance Methods

byte_size(s)
# File lib/asciimath/parser.rb, line 194
def byte_size(s)
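  # NOTE: core Ruby's String defines bytesize, not byte_size;
  # read_symbol calls the bytesize helper instead.
  s.byte_size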
end

bytesize(s)
# File lib/asciimath/parser.rb, line 140
def bytesize(s)
  s.bytesize
end

read_number()

Private: Reads a number token from the input string

Returns the number token or nil if a number token could not be matched at the current position

# File lib/asciimath/parser.rb, line 133
def read_number
  read_value(NUMBER) do |number|
    {:value => number, :type => :number}
  end
end

read_quoted_text()

Private: Reads a double-quoted text token from the input string. The surrounding quotes are not included in the token's :value.

Returns the text token or nil if a text token could not be matched at the current position

# File lib/asciimath/parser.rb, line 113
def read_quoted_text
  read_value(QUOTED_TEXT) do |text|
    {:value => text[1..-2], :type => :text}
  end
end

read_symbol()

Private: Reads a symbol token from the input string. This method first takes a substring of the input starting at the current position, at most as long as the longest key in the symbol table. It then looks up that substring in the symbol table. If the substring is present, the associated entry is returned (with the matched text added under :text) and the position is moved ahead by the length of the substring. Otherwise the method chops one character off the end of the substring and repeats the lookup. This continues until a single character is left. If that character still cannot be found in the symbol table, an identifier token is returned whose value is that single character.

Returns the token that was read or nil if a token could not be matched at the current position

# File lib/asciimath/parser.rb, line 162
def read_symbol
  position = @string.pos
  read_value(@symbol_regexp) do |s|
    until s.length == 1 || @symbols.include?(s)
      s.chop!
    end
    @string.pos = position + bytesize(s)
    symbol = @symbols[s]
    if symbol
      symbol.merge({:text => s})
    else
      {:value => s, :type => :identifier}
    end
  end
end
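The longest-match lookup described for read_symbol can be sketched on a plain String. The function name and the three-entry symbol table below are hypothetical, and the sketch skips the regexp step that keeps whitespace and unescaped digits out of the candidate.

```ruby
# Hypothetical helper: longest-prefix lookup against a symbol table.
def longest_symbol(input, symbols)
  max = symbols.keys.map { |k| k.length }.max
  s = input[0, max]
  # Shorten the candidate from the right until it is a known symbol
  # or only one character remains.
  s = s.chop until s.length <= 1 || symbols.include?(s)
  symbol = symbols[s]
  if symbol
    symbol.merge(:text => s)
  else
    {:value => s, :type => :identifier}
  end
end

# Hypothetical symbol table, for illustration only.
symbols = {
  'sum' => {:value => 'sum', :type => :operator},
  '<='  => {:value => 'le',  :type => :operator},
  '<'   => {:value => 'lt',  :type => :operator}
}

p longest_symbol('<=3', symbols)
p longest_symbol('xyz', symbols)
```

Trying the longest candidate first is what lets `<=` win over `<` when both are in the table.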

read_tex_text()

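Private: Reads a text token written as text(...) from the input string. The leading text( and the final character of the match are not included in the token's :value.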

Returns the text token or nil if a text token could not be matched at the current position

# File lib/asciimath/parser.rb, line 123
def read_tex_text
  read_value(TEX_TEXT) do |text|
    {:value => text[5..-2], :type => :text}
  end
end
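The two text readers differ only in how much of the match they slice away. A small sketch of the slices used in read_quoted_text and read_tex_text, applied to sample strings chosen for illustration (the QUOTED_TEXT and TEX_TEXT patterns themselves are not shown here):

```ruby
# Sample matches, as the two readers might receive them.
quoted = '"hello world"'
tex    = 'text(hello world)'

quoted_value = quoted[1..-2] # drops the opening and closing quote
tex_value    = tex[5..-2]    # drops the 5 characters of "text(" and the last character

puts quoted_value
puts tex_value
```

Both readers therefore yield a :text token whose :value is just the inner text.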

read_value(regexp) { |s| ... }

Private: Reads a String from the input String that matches the given Regexp

regexp - a Regexp that will be used to match the token
block  - if a block is provided the matched token will be passed to the block

Returns the matched String, or the value returned by the block if one was given. Returns nil if the Regexp does not match at the current position; the block is not called in that case.

# File lib/asciimath/parser.rb, line 184
def read_value(regexp)
  s = @string.scan(regexp)
  if s && block_given?
    yield s
  else
    s
  end
end
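A standalone sketch of the three outcomes of read_value: a match without a block, a match with a block, and no match. To keep it self-contained, the scanner is passed as an explicit argument here instead of being read from @string; that parameter is an assumption of the sketch, not part of the library method.

```ruby
require 'strscan'

# Same logic as read_value above, with the scanner passed in explicitly.
def read_value(scanner, regexp)
  s = scanner.scan(regexp)
  if s && block_given?
    yield s
  else
    s
  end
end

scanner = StringScanner.new('42abc')

# No block: the matched String is returned as is.
plain = read_value(scanner, /\d+/)

# With a block: the block's return value becomes the result.
wrapped = read_value(scanner, /[a-z]+/) { |s| {:value => s, :type => :identifier} }

# No match: nil is returned and the block is never called.
missed = read_value(scanner, /\d+/) { |s| raise 'not reached' }

p plain, wrapped, missed
```

Returning nil on a failed match is what allows callers to chain readers, as next_token does with `read_number || read_symbol`.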