class AsciiMath::Tokenizer
Internal: Splits an ASCIIMath expression into a sequence of tokens. Each token is represented as a Hash containing the keys :value and :type. The :value key is used to store the text associated with each token. The :type key indicates the semantics of the token. The value for :type will be one of the following symbols:
- :symbol - a symbolic name or a bit of text without any further semantics
- :text - a bit of arbitrary text
- :number - a number
- :operator - a mathematical operator symbol
- :unary - a unary operator (e.g., sqrt, text, …)
- :infix - an infix operator (e.g., /, _, ^, …)
- :binary - a binary operator (e.g., frac, root, …)
- :identifier - a single character that was not found in the symbol table (see read_symbol)
- :eof - indicates no more tokens are available
Constants
- NUMBER
- QUOTED_TEXT
- TEX_TEXT
- WHITESPACE
Public Class Methods
Public: Initializes an ASCIIMath tokenizer.
string - The ASCIIMath expression to tokenize
symbols - The symbol table to use while tokenizing
    # File lib/asciimath/parser.rb, line 58
    def initialize(string, symbols)
      @string = StringScanner.new(string)
      @symbols = symbols
      lookahead = @symbols.keys.map { |k| k.length }.max
      @symbol_regexp = /((?:\\[\s0-9]|[^\s0-9]){1,#{lookahead}})/
      @push_back = nil
    end
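As a rough illustration of how the constructor derives its symbol pattern, the sketch below rebuilds the same regular expression outside the class. The helper name build_symbol_regexp and the three-entry SAMPLE_SYMBOLS table are assumptions made for this example; the real table is whatever the caller passes as symbols.

```ruby
require 'strscan'

# Hypothetical helper (not part of AsciiMath): mirrors the regexp built in initialize.
def build_symbol_regexp(symbols)
  # The longest key bounds how many characters a single symbol can span.
  lookahead = symbols.keys.map { |k| k.length }.max
  # Up to `lookahead` characters that are neither whitespace nor digits;
  # a backslash may escape a whitespace character or a digit.
  /((?:\\[\s0-9]|[^\s0-9]){1,#{lookahead}})/
end

# Illustrative symbol table, far smaller than a real one.
SAMPLE_SYMBOLS = {
  '+'    => {:value => '+',    :type => :operator},
  'sqrt' => {:value => 'sqrt', :type => :unary},
  'frac' => {:value => 'frac', :type => :binary}
}

scanner = StringScanner.new('sqrtx 2')
candidate = scanner.scan(build_symbol_regexp(SAMPLE_SYMBOLS))
puts candidate.inspect
```

Because the longest key in this sample table is four characters long, the scan takes at most four characters, so the candidate here is "sqrt" and the trailing x is left for the next read.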
Public Instance Methods
Public: Read the next token from the ASCIIMath expression and move the tokenizer ahead by one token.
Returns the next token as a Hash
    # File lib/asciimath/parser.rb, line 70
    def next_token
      if @push_back
        t = @push_back
        @push_back = nil
        return t
      end

      @string.scan(WHITESPACE)

      return {:value => nil, :type => :eof} if @string.eos?

      case @string.peek(1)
        when '"'
          read_quoted_text
        when 't'
          case @string.peek(5)
            when 'text('
              read_tex_text
            else
              read_symbol
          end
        when '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
          read_number || read_symbol
        else
          read_symbol
      end
    end
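The dispatch in next_token depends only on the first one to five characters of the remaining input. The sketch below isolates that decision; first_reader_for is a hypothetical helper written for this example, and it assumes leading whitespace has already been skipped, as next_token does before peeking.

```ruby
# Hypothetical helper (not part of AsciiMath): reports which reader
# next_token would try first for the text at the current position.
def first_reader_for(rest)
  # next_token returns an :eof token before peeking at any character.
  return :eof if rest.empty?

  case rest[0, 1]
  when '"'
    :read_quoted_text
  when 't'
    # Only the exact prefix text( selects the TeX-style text reader.
    rest[0, 5] == 'text(' ? :read_tex_text : :read_symbol
  when '-', *%w[0 1 2 3 4 5 6 7 8 9]
    # A number is attempted first; read_symbol is the fallback.
    :read_number_then_symbol
  else
    :read_symbol
  end
end

['"hi"', 'text(x)', 'tan x', '-3', 'x', ''].each do |input|
  puts "#{input.inspect} -> #{first_reader_for(input)}"
end
```

Note that an input such as tan x starts with t but is still read as a symbol, since only the five-character prefix text( triggers read_tex_text.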
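Public: Pushes the given token back to the tokenizer. A subsequent call to next_token will return the given token rather than generating a new one. At most one token can be pushed back; pushing back a second token before calling next_token replaces the first. An :eof token is ignored.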
token - The token to push back
    # File lib/asciimath/parser.rb, line 103
    def push_back(token)
      @push_back = token unless token[:type] == :eof
    end
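The single-slot push-back mechanism can be sketched on its own. OneSlotTokenSource below is a hypothetical stand-in written for this example: it serves tokens from a prepared array instead of scanning a string, but its next_token and push_back follow the same logic as the methods documented here.

```ruby
# Hypothetical stand-in for the tokenizer, reduced to the push-back mechanism.
class OneSlotTokenSource
  def initialize(tokens)
    @tokens = tokens.dup
    @push_back = nil
  end

  # Returns the pushed-back token if there is one, otherwise the next
  # prepared token, otherwise an :eof token.
  def next_token
    if @push_back
      t = @push_back
      @push_back = nil
      return t
    end
    @tokens.shift || {:value => nil, :type => :eof}
  end

  # Stores at most one token; :eof tokens are never stored.
  def push_back(token)
    @push_back = token unless token[:type] == :eof
  end
end

source = OneSlotTokenSource.new([{:value => 'a', :type => :identifier}])
token = source.next_token
source.push_back(token)          # look at a token without consuming it
puts source.next_token.inspect   # the same token again
puts source.next_token.inspect   # the :eof token
```

A parser can use this pattern to peek one token ahead: read a token, inspect its :type, and push it back if it belongs to the next grammar rule.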
Private Instance Methods
    # File lib/asciimath/parser.rb, line 194
    def byte_size(s)
      s.byte_size
    end

    # File lib/asciimath/parser.rb, line 140
    def bytesize(s)
      s.bytesize
    end
Private: Reads a number token from the input string
Returns the number token or nil if a number token could not be matched at the current position
    # File lib/asciimath/parser.rb, line 133
    def read_number
      read_value(NUMBER) do |number|
        {:value => number, :type => :number}
      end
    end
Private: Reads a double-quoted text token from the input string. The surrounding quotes are not included in the token's value.
Returns the text token or nil if a text token could not be matched at the current position
    # File lib/asciimath/parser.rb, line 113
    def read_quoted_text
      read_value(QUOTED_TEXT) do |text|
        {:value => text[1..-2], :type => :text}
      end
    end
Private: Reads a symbol token from the input string. This method first takes a substring of the input, starting at the current position, whose length is at most that of the longest key in the symbol table. It then looks up that substring in the symbol table. If the substring is present, the associated symbol is returned, with the matched text added under the :text key, and the position is moved ahead by the length of the substring. Otherwise this method chops one character off the end of the substring and repeats the lookup. This continues until a single character is left. If that character still cannot be found in the symbol table, an identifier token is returned whose value is that single character.
Returns the token that was read or nil if a token could not be matched at the current position
    # File lib/asciimath/parser.rb, line 162
    def read_symbol
      position = @string.pos
      read_value(@symbol_regexp) do |s|
        until s.length == 1 || @symbols.include?(s)
          s.chop!
        end
        @string.pos = position + bytesize(s)
        symbol = @symbols[s]
        if symbol
          symbol.merge({:text => s})
        else
          {:value => s, :type => :identifier}
        end
      end
    end
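The longest-match lookup described for read_symbol can be sketched without a scanner. The helper longest_symbol_match and the one-entry table in the usage lines are assumptions made for this example; the sketch also leaves out the position bookkeeping that the real method performs on the scanner.

```ruby
# Hypothetical helper (not part of AsciiMath): the chop-and-retry lookup
# from read_symbol, applied to a plain candidate string.
def longest_symbol_match(candidate, symbols)
  s = candidate.dup
  # Shorten the candidate from the right until it is a known symbol
  # or only one character remains.
  s.chop! until s.length <= 1 || symbols.include?(s)

  symbol = symbols[s]
  if symbol
    # Known symbol: keep its table entry and record the matched text.
    symbol.merge(:text => s)
  else
    # Unknown single character: fall back to an identifier token.
    {:value => s, :type => :identifier}
  end
end

table = {'sin' => {:value => 'sin', :type => :unary}}
puts longest_symbol_match('sinx', table).inspect
puts longest_symbol_match('xy', table).inspect
```

With this table, the candidate sinx is shortened once and matches sin, while xy is shortened to x and becomes an :identifier token because x is not a key.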
Private: Reads a text token written as text(…) from the input string. The leading text( and the trailing ) are not included in the token's value.
Returns the text token or nil if a text token could not be matched at the current position
    # File lib/asciimath/parser.rb, line 123
    def read_tex_text
      read_value(TEX_TEXT) do |text|
        {:value => text[5..-2], :type => :text}
      end
    end
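Both text readers turn the matched string into a token value by slicing off its delimiters. The sketch below shows only that slicing step; the helper names are hypothetical, and the inputs are assumed to be strings that QUOTED_TEXT or TEX_TEXT would already have matched, since the patterns themselves are not shown in this documentation.

```ruby
# Hypothetical helpers (not part of AsciiMath) showing the slices used by
# read_quoted_text and read_tex_text.

# Drops the first and last character: the opening and closing quote.
def strip_quotes(matched)
  matched[1..-2]
end

# Drops the five-character prefix text( and the closing parenthesis.
def strip_tex_wrapper(matched)
  matched[5..-2]
end

puts strip_quotes('"speed of light"').inspect
puts strip_tex_wrapper('text(speed of light)').inspect
```

Either way the resulting token has type :text and carries only the inner text, so later stages do not need to know which notation was used.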
Private: Reads a String from the input String that matches the given Regexp
regexp - a Regexp that will be used to match the token
block - if a block is provided the matched token will be passed to the block
Returns the matched String or the value returned by the block if one was given
    # File lib/asciimath/parser.rb, line 184
    def read_value(regexp)
      s = @string.scan(regexp)
      if s && block_given?
        yield s
      else
        s
      end
    end
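A standalone sketch of the same scan-then-yield pattern follows. Here read_value takes the scanner as an explicit argument, which is an adaptation for this example (the real method uses the @string instance variable), and the /\d+/ pattern is an illustrative stand-in rather than the NUMBER constant.

```ruby
require 'strscan'

# Adapted copy of read_value for illustration: the scanner is passed in
# explicitly instead of being read from an instance variable.
def read_value(scanner, regexp)
  # StringScanner#scan matches only at the current position and returns
  # nil, without moving, when the pattern does not match there.
  s = scanner.scan(regexp)
  if s && block_given?
    yield s
  else
    s
  end
end

scanner = StringScanner.new('12ab')
token = read_value(scanner, /\d+/) { |n| {:value => n, :type => :number} }
puts token.inspect
puts read_value(scanner, /\d+/) { |n| n }.inspect  # no digits at "ab"
```

Because a failed match yields nil without invoking the block, callers such as next_token can chain readers with ||, as in read_number || read_symbol.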