class SimpleLexer::Lexer
Object defined with certain rules that takes text as input and outputs Tokens based on the rules.

@!attribute [r] rules
  @return [Array<Hash>] A list of the rules for the Lexer.
@!attribute [rw] pos
  @return [Fixnum] The current position of the input pointer.
Attributes

pos[RW]
  The current position of the input pointer.

rules[R]
  A list of the rules for the Lexer.
Public Class Methods
Creates a new instance of Lexer.

@yield [] Some rules passed to instance_eval.
@see tok An example of a number Lexer using tok.
# File lib/simple_lexer.rb, line 30
def initialize(&rules)
  @rules  = [] # list of {:rule => Regexp, :token => :token_id}
  @ignore = [] # list of Regexp
  @pos    = 0  # position in input
  instance_eval &rules
end
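For illustration, a minimal sketch of building a Lexer this way; the :number and :identifier token classes and their patterns are arbitrary choices for this example, not part of the library:

  my_lexer = SimpleLexer::Lexer.new do
    tok /-?\d+(\.\d+)?/, :number do |text|
      text.to_f
    end
    tok /\w+/, :identifier
    ign :whitespace
  end

The rules block is run through instance_eval, so the tok and ign calls inside it apply to the new instance.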
Public Instance Methods
Tokenize the entire input stream.

@return [Array<Hash>] An Array of Tokens processed by the Lexer.
# File lib/simple_lexer.rb, line 121
def all_tokens
  tokens = []
  loop do
    tokens << next_token
  end
rescue EndOfStreamException => e
  tokens
end
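A usage sketch: all_tokens drains the stream, collecting tokens until EndOfStreamException ends the loop. The rules and the input string below are illustrative assumptions; the load= writer is taken from the tok example further down:

  lexer = SimpleLexer::Lexer.new do
    tok(/-?\d+(\.\d+)?/, :number) { |text| text.to_f }
    tok /\w+/, :identifier
    ign :whitespace
  end

  lexer.load = "x 12 y 3.5"
  lexer.all_tokens
  # => [{:token => :identifier, :text => "x",   :value => nil},
  #     {:token => :number,     :text => "12",  :value => 12.0},
  #     {:token => :identifier, :text => "y",   :value => nil},
  #     {:token => :number,     :text => "3.5", :value => 3.5}]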
Defines rules of input classes to ignore (consumed without emitting any tokens).

@param [Regexp, Symbol] rule Regular expression that defines ignored characters.
@note You can set rule to :whitespace to ignore whitespace characters.
@example Ignoring parentheses
  my_lexer = SimpleLexer::Lexer.new do
    tok /\w+/, :identifier
    ign /[\(\)]/
  end

@example Ignoring whitespace
  my_lexer = SimpleLexer::Lexer.new do
    tok /\w+/, :identifier
    ign :whitespace
  end
# File lib/simple_lexer.rb, line 67
def ign(rule)
  if rule == :whitespace
    rule = /\s+/
  end
  @ignore << Regexp.new('\A' + rule.source)
end
What still remains to be processed.

@return [String] Substring of the input starting from the input pointer.
# File lib/simple_lexer.rb, line 84
def load
  @load[@pos..-1]
end
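A brief sketch of how the remaining input shrinks as tokens are consumed; the identifier rule, the ign :whitespace rule, and the input string are assumptions taken from the examples on this page:

  my_lexer = SimpleLexer::Lexer.new do
    tok /\w+/, :identifier
    ign :whitespace
  end
  my_lexer.load = "foo bar"
  my_lexer.next_token # consumes "foo"
  my_lexer.load       # => " bar" (the space is only consumed by the next next_token call)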
Gets the next Token in the input and advances the input pointer.

@return [Hash{Symbol=>Values}]
  - :token: Token class
  - :text: Matched text
  - :value: Value as defined by the passed block, if applicable.
@raise [NoMatchError] If load contains a sequence for which the Lexer has no rule.
@raise [EndOfStreamException] If the end of the input has been reached.
# File lib/simple_lexer.rb, line 95
def next_token
  # get the next token
  # my_lexer.next_token -> [ :token => :token_id, :text => matched ]
  for rule in @ignore
    if match = load[rule]
      @pos += match.length
    end
  end

  if @pos >= @load.length
    raise EndOfStreamException, "Finished lexing, no more tokens left."
  end

  for rule in @rules
    if match = load[rule[:rule]]
      @pos += match.length
      return {:token => rule[:token],
              :text  => match,
              :value => (!rule[:action].nil? ? rule[:action].call(match) : nil)}
    end
  end

  raise NoMatchError, "Unable to match, unexpected characters: '#{load[0..10]}...'"
end
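For illustration, a sketch of the failure mode; the digits-only rule and the input string are made up for this example:

  lexer = SimpleLexer::Lexer.new do
    tok /\d+/, :number
  end
  lexer.load = "123abc"
  lexer.next_token # => {:token => :number, :text => "123", :value => nil}
  lexer.next_token # raises NoMatchError: unexpected characters: 'abc...'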
Defines a new Token rule for the Lexer to match.

@param [Regexp] rule Regular expression that defines the token
@param [Symbol] token Token class
@yield [text] The expression will give the Token its value.
@example Rule for numbers
  my_lexer = SimpleLexer::Lexer.new do
    tok /-?\d+(\.\d+)?/, :number do |text|
      text.to_f
    end
  end
  my_lexer.load = "-435.234"
  puts my_lexer.next_token[:value] # -435.234
# File lib/simple_lexer.rb, line 47
def tok(rule, token, &action)
  @rules << {:rule => Regexp.new('\A' + rule.source), :token => token, :action => action}
end