class SimpleLexer::Lexer

An object configured with a set of rules that takes text as input and outputs Tokens according to those rules.

@!attribute [r] rules

@return [Array<Regexp>] A list of the rules for the Lexer.

@!attribute [rw] pos

@return [Fixnum] The current position of the input pointer.

Attributes

pos[RW]
rules[R]

Public Class Methods

new(&rules)

Creates a new instance of Lexer.

@yield [] Some rules passed to instance_eval.

@see tok An example of a number Lexer using tok.

# File lib/simple_lexer.rb, line 30
def initialize(&rules)
  @rules = [] # list of {:rule => Regexp, :token => :token_id}
  @ignore = [] # list of Regexp
  @pos = 0 # position in input
  instance_eval &rules
end

Public Instance Methods

all_tokens()

Tokenizes the entire input stream.

@return [Array<Hash>] An Array of Tokens processed by the Lexer.

# File lib/simple_lexer.rb, line 121
def all_tokens  
  tokens = []  
  loop do
    tokens << next_token
  end
rescue EndOfStreamException => e
  tokens 
end
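
A minimal sketch of all_tokens in use. So that it runs without the gem installed, it first rebuilds a stand-in for the class from the source listings on this page. The error classes are not shown on this page, so their StandardError superclass is an assumption, and the arithmetic rule set and input string are made up for illustration.

```ruby
# Stand-in rebuilt from the listings on this page (not the gem itself).
module SimpleLexer
  # Assumption: the real superclass of these errors is not shown here.
  class NoMatchError < StandardError; end
  class EndOfStreamException < StandardError; end

  class Lexer
    attr_reader :rules
    attr_accessor :pos

    def initialize(&rules)
      @rules = []
      @ignore = []
      @pos = 0
      instance_eval(&rules)
    end

    def tok(rule, token, &action)
      @rules << { rule: Regexp.new('\A' + rule.source), token: token, action: action }
    end

    def ign(rule)
      rule = /\s+/ if rule == :whitespace
      @ignore << Regexp.new('\A' + rule.source)
    end

    def load=(input)
      @load = input
      @pos = 0
    end

    def load
      @load[@pos..-1]
    end

    def next_token
      # Skip ignored text, then try each token rule in definition order.
      @ignore.each do |rule|
        match = load[rule]
        @pos += match.length if match
      end
      raise EndOfStreamException, 'Finished lexing, no more tokens left.' if @pos >= @load.length

      @rules.each do |rule|
        match = load[rule[:rule]]
        next unless match
        @pos += match.length
        return { token: rule[:token], text: match,
                 value: rule[:action] ? rule[:action].call(match) : nil }
      end
      raise NoMatchError, "Unable to match, unexpected characters: '#{load[0..10]}...'"
    end

    def all_tokens
      tokens = []
      loop { tokens << next_token }
    rescue EndOfStreamException
      tokens
    end
  end
end

# Hypothetical rule set: integers and two operators, whitespace ignored.
lexer = SimpleLexer::Lexer.new do
  tok(/\d+/, :number) { |text| text.to_i }
  tok(/[+*]/, :operator)
  ign :whitespace
end

lexer.load = "12 + 3 * 4"
tokens = lexer.all_tokens
tokens.each { |t| puts "#{t[:token]}: #{t[:text].inspect}" }
```

Rules without a block (here :operator) produce tokens whose :value is nil. Note that all_tokens only rescues EndOfStreamException, so a NoMatchError raised partway through propagates to the caller and the tokens collected so far are lost.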
finished?()

Checks whether the Lexer has finished tokenizing the entire input stream.

@return [Boolean] Whether the Lexer has reached the end of input.

# File lib/simple_lexer.rb, line 132
def finished?
  return @pos >= @load.length
end
ign(rule)

Defines a rule for input to ignore (consumed without emitting any token).

@param [Regexp, Symbol] rule Regular expression that defines ignored characters.

@note You can set rule to :whitespace to ignore whitespace characters.

@example Ignoring parentheses

my_lexer = SimpleLexer::Lexer.new do
  tok /\w+/, :identifier
  ign /[\(\)]/
end

@example Ignoring whitespace

my_lexer = SimpleLexer::Lexer.new do
  tok /\w+/, :identifier
  ign :whitespace
end
# File lib/simple_lexer.rb, line 67
def ign(rule) 
  if rule == :whitespace
    rule = /\s+/
  end
  
  @ignore << Regexp.new('\A' + rule.source)
end
load()

What still remains to be processed.

@return [String] Substring of the input starting from the input pointer.

# File lib/simple_lexer.rb, line 84
def load 
  @load[@pos..-1]
end
load=(input)

Gives the Lexer some text to tokenize and resets the input pointer to 0.

@param [String] input Text for the Lexer to tokenize.

# File lib/simple_lexer.rb, line 77
def load=(input)
  @load = input 
  @pos = 0 
end
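
The interplay between load=, load, pos and finished? can be sketched as follows. The class below is a trimmed stand-in rebuilt from the source listings on this page so the example runs without the gem; the EndOfStreamException and NoMatchError superclasses are assumptions, and the identifier rule and input strings are illustrative only.

```ruby
# Trimmed stand-in rebuilt from the listings on this page (not the gem itself).
module SimpleLexer
  # Assumption: the real superclass of these errors is not shown here.
  class NoMatchError < StandardError; end
  class EndOfStreamException < StandardError; end

  class Lexer
    attr_accessor :pos

    def initialize(&rules)
      @rules = []
      @ignore = []
      @pos = 0
      instance_eval(&rules)
    end

    def tok(rule, token, &action)
      @rules << { rule: Regexp.new('\A' + rule.source), token: token, action: action }
    end

    def ign(rule)
      rule = /\s+/ if rule == :whitespace
      @ignore << Regexp.new('\A' + rule.source)
    end

    def load=(input)
      @load = input
      @pos = 0
    end

    def load
      @load[@pos..-1]
    end

    def finished?
      @pos >= @load.length
    end

    def next_token
      @ignore.each do |rule|
        match = load[rule]
        @pos += match.length if match
      end
      raise EndOfStreamException, 'Finished lexing, no more tokens left.' if @pos >= @load.length

      @rules.each do |rule|
        match = load[rule[:rule]]
        next unless match
        @pos += match.length
        return { token: rule[:token], text: match,
                 value: rule[:action] ? rule[:action].call(match) : nil }
      end
      raise NoMatchError, "Unable to match, unexpected characters: '#{load[0..10]}...'"
    end
  end
end

lexer = SimpleLexer::Lexer.new do
  tok(/\w+/, :identifier)
  ign :whitespace
end

lexer.load = "foo bar"
lexer.next_token
remaining = lexer.load        # " bar": ignored text is skipped on the NEXT call
pos_after_first = lexer.pos   # 3

lexer.next_token
done = lexer.finished?        # true: pointer is at the end of "foo bar"

lexer.load = "baz"            # new input resets the pointer
pos_after_reload = lexer.pos  # 0
puts remaining.inspect, pos_after_first, done, pos_after_reload
```

Because ignore rules run at the start of next_token, load can still begin with ignorable text (the leading space above) even though the next call will skip it.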
next_token()

Gets the next Token in the input and advances the input pointer.

@return [Hash{Symbol=>Values}]

- <code>:token</code> Token class
- <code>:text</code> Matched text
- <code>:value</code> Value as defined by passed block, if applicable.

@raise [NoMatchError] If load contains a sequence for which the Lexer has no rule.

@raise [EndOfStreamException] If no input remains after skipping ignored text.
# File lib/simple_lexer.rb, line 95
def next_token
  # get the next token
  # my_lexer.next_token -> { :token => :token_id, :text => matched, :value => value }
  for rule in @ignore
    if match = load[rule]
      @pos += match.length
    end
  end

  if @pos >= @load.length
    raise EndOfStreamException, "Finished lexing, no more tokens left."
  end

  for rule in @rules
    if match = load[rule[:rule]]
      @pos += match.length
      return {:token => rule[:token], :text => match, 
              :value => (!rule[:action].nil? ? rule[:action].call(match) : nil) } 
    end
  end

  raise NoMatchError, "Unable to match, unexpected characters: '#{load[0..10]}...'"
end
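
A hedged sketch of handling the two errors next_token can raise. The class is a trimmed stand-in rebuilt from the source listings on this page so the example runs without the gem. The StandardError superclass of the error classes is an assumption, and skipping one character after a NoMatchError is just one possible recovery strategy, not something the library prescribes.

```ruby
# Trimmed stand-in rebuilt from the listings on this page (not the gem itself).
module SimpleLexer
  # Assumption: the real superclass of these errors is not shown here.
  class NoMatchError < StandardError; end
  class EndOfStreamException < StandardError; end

  class Lexer
    attr_accessor :pos

    def initialize(&rules)
      @rules = []
      @ignore = []
      @pos = 0
      instance_eval(&rules)
    end

    def tok(rule, token, &action)
      @rules << { rule: Regexp.new('\A' + rule.source), token: token, action: action }
    end

    def ign(rule)
      rule = /\s+/ if rule == :whitespace
      @ignore << Regexp.new('\A' + rule.source)
    end

    def load=(input)
      @load = input
      @pos = 0
    end

    def load
      @load[@pos..-1]
    end

    def next_token
      @ignore.each do |rule|
        match = load[rule]
        @pos += match.length if match
      end
      raise EndOfStreamException, 'Finished lexing, no more tokens left.' if @pos >= @load.length

      @rules.each do |rule|
        match = load[rule[:rule]]
        next unless match
        @pos += match.length
        return { token: rule[:token], text: match,
                 value: rule[:action] ? rule[:action].call(match) : nil }
      end
      raise NoMatchError, "Unable to match, unexpected characters: '#{load[0..10]}...'"
    end
  end
end

lexer = SimpleLexer::Lexer.new do
  tok(/\d+/, :number)
  ign :whitespace
end

lexer.load = "42 ?"
first_text = lexer.next_token[:text]   # "42"

begin
  lexer.next_token                     # "?" matches no rule
rescue SimpleLexer::NoMatchError => e
  no_match_message = e.message
  lexer.pos += 1                       # hypothetical recovery: skip the bad character
end

begin
  lexer.next_token                     # nothing left after the skipped "?"
rescue SimpleLexer::EndOfStreamException => e
  end_message = e.message
end

puts first_text, no_match_message, end_message
```

When NoMatchError is raised, the pointer has already moved past any ignored text but not past the offending characters, so a caller who wants to continue has to advance pos itself.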
tok(rule, token, &action)

Defines a new Token rule for the Lexer to match.

@param [Regexp] rule Regular expression that defines the token.

@param [Symbol] token Token class.

@yield [text] The block's return value becomes the Token's value.

@example Rule for numbers

my_lexer = SimpleLexer::Lexer.new do
  tok /-?\d+(\.\d+)?/, :number do |text| text.to_f end
end
my_lexer.load = "-435.234"
puts my_lexer.next_token[:value] # -435.234
# File lib/simple_lexer.rb, line 47
def tok(rule, token, &action)
  @rules << {:rule => Regexp.new('\A' + rule.source), :token => token, :action => action}
end