class RequestLogAnalyzer::Source::LogParser

The LogParser class reads log data from a given source and uses a file format definition to parse all relevant information about requests from the file. A FileFormat module should be provided that contains the definitions of the lines that occur in the log data.

The order in which lines occur is used to combine them into a single request. If the lines of different requests are interleaved, the requests cannot be combined properly. This can happen when several Mongrel processes write to the same log file simultaneously. The parser detects this problem and emits warnings when it occurs. LogParser supports multiple parse strategies that deal with this problem differently.

Constants

DEFAULT_LINE_DIVIDER
DEFAULT_MAX_LINE_LENGTH

The maximum number of bytes to read from a line.

DEFAULT_PARSE_STRATEGY

The default parse strategy that will be used to parse the input.

PARSE_STRATEGIES

All available parse strategies.

Attributes

current_file[R]
current_lineno[R]
parsed_lines[R]
parsed_requests[R]
processed_files[R]
skipped_lines[R]
skipped_requests[R]
source_files[R]
warnings[R]

Public Class Methods

new(format, options = {})

Initializes the log file parser instance. It applies the format-specific FileFormat module to this instance and uses the line definitions in that module to parse any input it is given (see parse_io).

format

The current file format instance

options

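A hash of options that are used by the parser. Recognized keys include :source_files, the input to read (a file or directory path, an Array of paths, or an IO such as $stdin; see each_request), and :parse_strategy, which defaults to DEFAULT_PARSE_STRATEGY and must be one of PARSE_STRATEGIES.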

Calls superclass method RequestLogAnalyzer::Source::Base::new
# File lib/request_log_analyzer/source/log_parser.rb
def initialize(format, options = {})
  super(format, options)
  @warnings         = 0
  @parsed_lines     = 0
  @parsed_requests  = 0
  @skipped_lines    = 0
  @skipped_requests = 0
  @current_request  = nil
  @current_source   = nil
  @current_file     = nil
  @current_lineno   = nil
  @processed_files  = []
  @source_files     = options[:source_files]
  @progress_handler = nil
  @warning_handler  = nil

  @options[:parse_strategy] ||= DEFAULT_PARSE_STRATEGY
  unless PARSE_STRATEGIES.include?(@options[:parse_strategy])
    # Report the rejected strategy name (the key is :parse_strategy, not @parse_strategy).
    fail "Unknown parse strategy: #{@options[:parse_strategy]}"
  end
end
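The strategy check at the end of initialize can be sketched in isolation. This is a minimal illustration, not the library's code: resolve_parse_strategy is a hypothetical helper, and the constant values are assumptions built from the two strategy names this page documents under update_current_request ('assume-correct' as the default, and 'cautious'); the real PARSE_STRATEGIES constant may contain more.

```ruby
# Hypothetical stand-ins for LogParser's constants, using the two strategy
# names documented on this page; the library's actual values may differ.
PARSE_STRATEGIES = %w(cautious assume-correct).freeze
DEFAULT_PARSE_STRATEGY = 'assume-correct'.freeze

# Hypothetical helper: fill in the default strategy, then reject unknown ones.
# The error message interpolates the rejected strategy value itself.
def resolve_parse_strategy(options)
  options[:parse_strategy] ||= DEFAULT_PARSE_STRATEGY
  unless PARSE_STRATEGIES.include?(options[:parse_strategy])
    raise ArgumentError, "Unknown parse strategy: #{options[:parse_strategy]}"
  end
  options[:parse_strategy]
end

resolve_parse_strategy({})                           # → "assume-correct"
resolve_parse_strategy({ parse_strategy: 'cautious' }) # → "cautious"
```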

Public Instance Methods

decompress_file?(filename)

Checks whether the filename has a compressed extension (.gz, .tgz, .tar.gz, .bz2 or .zip). If it does, returns the shell command that decompresses the file to standard output; otherwise, returns an empty string.

# File lib/request_log_analyzer/source/log_parser.rb
def decompress_file?(filename)
  nice_command = 'nice -n 5'

  # The dot in ".tar.gz" is escaped so it only matches a literal dot.
  return "#{nice_command} gunzip -c -d #{filename}" if filename.match(/\.tar\.gz$/) || filename.match(/\.tgz$/) || filename.match(/\.gz$/)
  return "#{nice_command} bunzip2 -c -d #{filename}" if filename.match(/\.bz2$/)
  return "#{nice_command} unzip -p #{filename}" if filename.match(/\.zip$/)

  ''
end
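The extension-to-command mapping can be sketched as a single case expression. decompress_command is a hypothetical helper, not part of the library. Unlike the listing above, it shell-escapes the filename with Ruby's standard Shellwords module; that escaping is an added assumption about desirable behaviour, intended so that names containing spaces survive IO.popen.

```ruby
require 'shellwords'

# Hypothetical helper modelled on decompress_file?: returns the shell command
# that writes the decompressed file to stdout, or '' for uncompressed files.
# The filename is shell-escaped, which the original listing does not do.
def decompress_command(filename, nice_command = 'nice -n 5')
  path = Shellwords.escape(filename)
  case filename
  when /\.(?:tgz|gz)\z/ then "#{nice_command} gunzip -c -d #{path}" # also covers .tar.gz
  when /\.bz2\z/        then "#{nice_command} bunzip2 -c -d #{path}"
  when /\.zip\z/        then "#{nice_command} unzip -p #{path}"
  else ''
  end
end

decompress_command('production.log')    # → ""
decompress_command('production.log.gz') # → "nice -n 5 gunzip -c -d production.log.gz"
```

A non-empty result can then be handed to IO.popen(command, 'rb'), which is how parse_file consumes the output of decompress_file?.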
each(options = {})

Make sure the Enumerable methods work as expected

Alias for: each_request
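Because each is an alias for each_request, Enumerable methods can consume parsed requests directly, assuming the parser includes Enumerable (which the note above implies). The minimal sketch below uses a hypothetical ArraySource that yields ready-made hashes instead of parsing a log.

```ruby
# Hypothetical minimal source mirroring LogParser's aliasing: each_request
# yields requests, and `each` is an alias for it, so Enumerable methods
# (count, map, select, ...) can consume the requests directly.
class ArraySource
  include Enumerable

  def initialize(requests)
    @requests = requests
  end

  def each_request(options = {}, &block)
    @requests.each(&block)
    self
  end
  alias_method :each, :each_request
end

source = ArraySource.new([{ status: 200 }, { status: 500 }, { status: 200 }])
source.count { |request| request[:status] == 200 } # → 2
source.map { |request| request[:status] }.uniq    # → [200, 500]
```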
each_request(options = {}) { |:request, request| ... }

Reads the input, which can be a file, a sequence of files or STDIN, and parses the lines specified in the FileFormat. These lines are combined into Request instances, which are yielded. The actual parsing happens in the parse_io method.

options

A Hash of options that will be passed to parse_io.

# File lib/request_log_analyzer/source/log_parser.rb
def each_request(options = {}, &block) # :yields: :request, request
  case @source_files
  when IO
    if @source_files == $stdin
      puts 'Parsing from the standard input. Press CTRL+C to finish.' # FIXME: not here
    end
    parse_stream(@source_files, options, &block)
  when String
    parse_file(@source_files, options, &block)
  when Array
    parse_files(@source_files, options, &block)
  else
    fail 'Unknown source provided'
  end
end
Also aliased as: each
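The dispatch in each_request depends only on the class of the configured source. The sketch below isolates that decision in a hypothetical helper, parse_method_for. One consequence worth noting is that StringIO is not a subclass of IO, so an in-memory buffer given as :source_files would reach the "Unknown source provided" branch; parse_string is the intended route for string data.

```ruby
require 'stringio'

# Hypothetical classifier mirroring the case statement in each_request:
# it returns which parse method would handle the configured source.
def parse_method_for(source)
  case source
  when IO     then :parse_stream # e.g. $stdin
  when String then :parse_file   # a file path or a directory path
  when Array  then :parse_files  # a list of paths
  else raise 'Unknown source provided'
  end
end

parse_method_for($stdin)             # → :parse_stream
parse_method_for(['a.log', 'b.log']) # → :parse_files
```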
line_divider()

Returns the line divider defined by the file format, or DEFAULT_LINE_DIVIDER if the format does not define one.

# File lib/request_log_analyzer/source/log_parser.rb
def line_divider
  file_format.line_divider || DEFAULT_LINE_DIVIDER
end
max_line_length()

Returns the maximum line length defined by the file format, or DEFAULT_MAX_LINE_LENGTH if the format does not define one.

# File lib/request_log_analyzer/source/log_parser.rb
def max_line_length
  file_format.max_line_length || DEFAULT_MAX_LINE_LENGTH
end
parse_file(file, options = {}, &block)

Parses a log file. Creates an IO stream for the provided file and sends it to parse_io for further handling. If file is a directory, all entries in it are parsed with parse_files instead. This method supports progress updates that can be used to display a progress bar.

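If the log file is compressed, it is decompressed to standard output (see decompress_file?) and read through IO.popen. In that case no progress updates are sent, and the file is not added to processed_files.

TODO: Check if IO.popen encounters problems with the given command line.

TODO: Fix progress bar that is broken for IO.popen, as it returns a single string.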

file

The file that should be parsed.

options

A Hash of options that will be passed to parse_io.

# File lib/request_log_analyzer/source/log_parser.rb
def parse_file(file, options = {}, &block)
  if File.directory?(file)
    parse_files(Dir["#{ file }/*"], options, &block)
    return
  end

  @current_source = File.expand_path(file)
  @source_changes_handler.call(:started, @current_source) if @source_changes_handler

  if decompress_file?(file).empty?

    @progress_handler = @dormant_progress_handler
    @progress_handler.call(:started, file) if @progress_handler

    File.open(file, 'rb') { |f| parse_io(f, options, &block) }

    @progress_handler.call(:finished, file) if @progress_handler
    @progress_handler = nil

    @processed_files.push(@current_source.dup)

  else
    IO.popen(decompress_file?(file), 'rb') { |f| parse_io(f, options, &block) }
  end

  @source_changes_handler.call(:finished, @current_source) if @source_changes_handler

  @current_source = nil
end
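For an uncompressed file, the handler calls in parse_file follow a fixed order. The sketch below reproduces only that bookkeeping; parse_plain_file and its keyword arguments are hypothetical, and File#readlines stands in for parse_io.

```ruby
require 'tempfile'

# Hypothetical sketch of parse_file's bookkeeping for an uncompressed file:
# the source-change handler brackets the progress handler, and the expanded
# path is recorded as processed once the file has been read completely.
def parse_plain_file(path, processed, source_changes: nil, progress: nil)
  source = File.expand_path(path)
  source_changes.call(:started, source) if source_changes
  progress.call(:started, path) if progress
  lines = File.open(path, 'rb', &:readlines) # stands in for parse_io
  progress.call(:finished, path) if progress
  processed.push(source.dup)
  source_changes.call(:finished, source) if source_changes
  lines
end

events = []
processed = []
Tempfile.create('demo') do |tmp|
  tmp.write("first\nsecond\n")
  tmp.flush
  parse_plain_file(tmp.path, processed,
                   source_changes: ->(event, _source) { events << [:source, event] },
                   progress: ->(event, _file) { events << [:progress, event] })
end
events
# → [[:source, :started], [:progress, :started], [:progress, :finished], [:source, :finished]]
```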
parse_files(files, options = {}) { |request| ... }

Parses a list of files of the same format in order, by calling parse_file for every file in the array.

files

The Array of files that should be parsed

options

A Hash of options that will be passed to parse_io.

# File lib/request_log_analyzer/source/log_parser.rb
def parse_files(files, options = {}, &block) # :yields: request
  files.each { |file| parse_file(file, options, &block) }
end
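Since parse_file hands a directory to parse_files, which calls parse_file again for each entry, nested directories are walked recursively. The hypothetical collect_log_files below reproduces that mutual recursion as a single function that only collects paths. Sorting is an addition for a deterministic order, and the "*" glob skips dot-files, as it does in the listing.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical sketch of how parse_file and parse_files recurse into each
# other: a directory expands to Dir["dir/*"], whose entries go back through
# the same check, so nested directories are walked as well.
def collect_log_files(path, found = [])
  if File.directory?(path)
    Dir["#{path}/*"].sort.each { |entry| collect_log_files(entry, found) }
  else
    found << path
  end
  found
end
```

For example, collect_log_files('log') would list every non-hidden file below log/, at any depth.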
parse_io_18(io, options = {}) { |request| ... }

This method loops over each line of the input stream. It tries to parse each line as any of the lines defined by the current file format (see RequestLogAnalyzer::FileFormat). It then combines these parsed lines into requests using heuristics. These requests (see RequestLogAnalyzer::Request) are then yielded for further processing in the pipeline.

  • RequestLogAnalyzer::LineDefinition#matches is called to test if a line matches a line definition of the file format.

  • update_current_request is used to combine parsed lines into requests using heuristics.

  • The method will yield progress updates if a progress handler is installed using progress=

  • The method will yield parse warnings if a warning handler is installed using warning=

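This is the Ruby 1.8 specific version, which offers no memory protection: IO#gets in Ruby 1.8 accepts no length limit, so max_line_length is not applied and a very long line is read into memory in its entirety.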

io

The IO instance to use as source

options

A hash of options that can be used by the parser.

# File lib/request_log_analyzer/source/log_parser.rb
def parse_io_18(io, options = {}, &block) # :yields: request
  @line_divider    = options[:line_divider]    || line_divider
  @current_lineno  = 0
  while line = io.gets(@line_divider)
    @current_lineno += 1
    @progress_handler.call(:progress, io.pos) if @progress_handler && @current_lineno % 255 == 0
    parse_line(line, &block)
  end

  warn(:unfinished_request_on_eof, 'End of file reached, but last request was not completed!') unless @current_request.nil?
  @current_lineno = nil
end
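The read loop shared by both parse_io variants numbers each line and reports the stream position every 255 lines. The sketch below isolates that loop; each_numbered_line and its progress keyword are hypothetical, and the block takes the place of parse_line.

```ruby
require 'stringio'

# Hypothetical sketch of the loop in parse_io_18/parse_io_19: number each
# line, report io.pos to the progress handler every 255 lines, and hand each
# line to the block (parse_line in the real class).
def each_numbered_line(io, divider = "\n", progress: nil)
  lineno = 0
  while (line = io.gets(divider))
    lineno += 1
    progress.call(:progress, io.pos) if progress && lineno % 255 == 0
    yield line, lineno
  end
  lineno
end

positions = []
io = StringIO.new("x\n" * 600)
each_numbered_line(io, progress: ->(_event, pos) { positions << pos }) { |_line, _lineno| }
positions # → [510, 1020]
```

Each line here is two bytes long, so the handler fires after lines 255 and 510, at byte offsets 510 and 1020.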
parse_io_19(io, options = {}) { |request| ... }

This method loops over each line of the input stream. It tries to parse each line as any of the lines defined by the current file format (see RequestLogAnalyzer::FileFormat). It then combines these parsed lines into requests using heuristics. These requests (see RequestLogAnalyzer::Request) are then yielded for further processing in the pipeline.

  • RequestLogAnalyzer::LineDefinition#matches is called to test if a line matches a line definition of the file format.

  • update_current_request is used to combine parsed lines into requests using heuristics.

  • The method will yield progress updates if a progress handler is installed using progress=

  • The method will yield parse warnings if a warning handler is installed using warning=

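This is the Ruby 1.9 specific version, which offers memory protection: it passes max_line_length (or options[:max_line_length]) as the limit argument to IO#gets, so no single read returns more than that many bytes.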

io

The IO instance to use as source

options

A hash of options that can be used by the parser.

# File lib/request_log_analyzer/source/log_parser.rb
def parse_io_19(io, options = {}, &block) # :yields: request
  @max_line_length = options[:max_line_length] || max_line_length
  @line_divider    = options[:line_divider]    || line_divider
  @current_lineno  = 0
  while line = io.gets(@line_divider, @max_line_length)
    @current_lineno += 1
    @progress_handler.call(:progress, io.pos) if @progress_handler && @current_lineno % 255 == 0
    parse_line(line, &block)
  end

  warn(:unfinished_request_on_eof, 'End of file reached, but last request was not completed!') unless @current_request.nil?
  @current_lineno = nil
end
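The memory protection comes entirely from the limit argument of IO#gets, which stops reading after that many bytes even when no separator has been found. The sketch uses a deliberately tiny limit of 4 purely for illustration; read_chunks is a hypothetical helper. Since each chunk would then reach parse_line on its own, the fragments of an overlong line will most likely fail to match any line definition.

```ruby
require 'stringio'

# IO#gets(separator, limit) stops after `limit` bytes even if no separator
# was found, so an overlong line arrives as several chunks instead of being
# buffered in full. A limit of 4 is used purely for illustration.
def read_chunks(text, divider = "\n", limit = 4)
  io = StringIO.new(text)
  chunks = []
  while (chunk = io.gets(divider, limit))
    chunks << chunk
  end
  chunks
end

read_chunks("abcdefgh\nxy\n") # → ["abcd", "efgh", "\n", "xy\n"]
```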
parse_line(line) { |request| ... }

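Parses a single line using the current file format. If the line matches a line definition, the parsed data is tagged with the current source and line number and passed to update_current_request to build a request. Any parse warnings reported by the file format are forwarded to warn.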

line

The line to parse

block

The block to send fully parsed requests to.

# File lib/request_log_analyzer/source/log_parser.rb
def parse_line(line, &block) # :yields: request
  if request_data = file_format.parse_line(line) { |wt, message| warn(wt, message) }
    @parsed_lines += 1
    update_current_request(request_data.merge(source: @current_source, lineno: @current_lineno), &block)
  end
end
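Line matching can be sketched with plain regular expressions. Everything here is hypothetical: DemoLineDefinition is a much simpler stand-in for RequestLogAnalyzer::LineDefinition, the two Rails-like patterns are invented for illustration, and the sketch stores the definition's name where the real request data holds the definition object.

```ruby
# Hypothetical line definitions: a name, a regexp and the names of its
# captures. The real RequestLogAnalyzer::LineDefinition is richer than this.
DemoLineDefinition = Struct.new(:name, :regexp, :captures)

DEMO_DEFINITIONS = [
  DemoLineDefinition.new(:started, /\AStarted (\w+) "([^"]+)"/, [:method, :path]),
  DemoLineDefinition.new(:completed, /\ACompleted (\d{3})/, [:status])
].freeze

# Hypothetical sketch of parse_line: try each definition in turn; on a match,
# build a hash of named captures and tag it with the source and line number.
# Returns nil for lines that match no definition.
def parse_demo_line(line, source, lineno)
  DEMO_DEFINITIONS.each do |definition|
    match = definition.regexp.match(line)
    next unless match
    data = definition.captures.zip(match.captures).to_h
    return data.merge(line_definition: definition.name, source: source, lineno: lineno)
  end
  nil
end

parse_demo_line(%(Started GET "/users"\n), 'production.log', 1)
```

The call above returns a hash with :method, :path, :line_definition, :source and :lineno keys; a line such as "random noise" returns nil and, like an unmatched line in parse_line, is not counted.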
parse_stream(stream, options = {}, &block)

Parses an IO stream. It will simply call parse_io. This function does not support progress updates because the length of a stream is not known.

stream

The IO stream that should be parsed.

options

A Hash of options that will be passed to parse_io.

# File lib/request_log_analyzer/source/log_parser.rb
def parse_stream(stream, options = {}, &block)
  parse_io(stream, options, &block)
end
parse_string(string, options = {}, &block)

Parses a string. It will simply call parse_io. This function does not support progress updates.

string

The string that should be parsed.

options

A Hash of options that will be passed to parse_io.

# File lib/request_log_analyzer/source/log_parser.rb
def parse_string(string, options = {}, &block)
  parse_io(StringIO.new(string), options, &block)
end
progress=(proc)

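Assign a proc to install a progress handler while parsing. The proc is stored as a dormant handler; parse_file activates it only while it reads an uncompressed file, so streams, strings and compressed files do not report progress.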

proc

The proc that will be called to handle progress update messages

# File lib/request_log_analyzer/source/log_parser.rb
def progress=(proc)
  @dormant_progress_handler = proc
end
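The dormant-handler pattern can be shown with a small stand-in class. ProgressDemo and its parse_file_like and parse_stream_like methods are hypothetical; they only model when the stored proc is active, not any actual parsing.

```ruby
# Hypothetical sketch of the dormant-handler pattern behind progress=: the
# setter only stores the proc, and it is activated for one file at a time
# (as parse_file does for uncompressed files), so parsing a stream or a
# string reports no progress.
class ProgressDemo
  def initialize
    @dormant_progress_handler = nil
    @progress_handler = nil
  end

  def progress=(proc)
    @dormant_progress_handler = proc
  end

  def parse_file_like(name)
    @progress_handler = @dormant_progress_handler
    report(:started, name)
    report(:finished, name)
    @progress_handler = nil
  end

  def parse_stream_like
    report(:progress, 0) # no active handler, so nothing is reported
  end

  private

  def report(event, value)
    @progress_handler.call(event, value) if @progress_handler
  end
end

events = []
demo = ProgressDemo.new
demo.progress = ->(event, value) { events << [event, value] }
demo.parse_stream_like
demo.parse_file_like('production.log')
events # → [[:started, "production.log"], [:finished, "production.log"]]
```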
source_changes=(proc)

Assign a proc to install a source change handler while parsing. parse_file calls it with :started and :finished, together with the expanded path of the file being parsed.

proc

The proc that will be called to handle source changes

# File lib/request_log_analyzer/source/log_parser.rb
def source_changes=(proc)
  @source_changes_handler = proc
end
warn(type, message)

This method is called by the parser if it encounters any parsing problems. It increments the warnings counter and calls the installed warning handler, if any, with the warning type, the message and the current line number.

By default, RequestLogAnalyzer::Controller installs a warning handler that passes the warnings to each aggregator so they can do something useful with them.

type

The warning type (a Symbol)

message

A message explaining the warning

# File lib/request_log_analyzer/source/log_parser.rb
def warn(type, message)
  @warnings += 1
  @warning_handler.call(type, message, @current_lineno) if @warning_handler
end
warning=(proc)

Assign a proc to install a warning handler while parsing.

proc

The proc that will be called to handle parse warning messages

# File lib/request_log_analyzer/source/log_parser.rb
def warning=(proc)
  @warning_handler = proc
end
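Together, warn and warning= count every warning and forward it when a handler is installed. WarningDemo is a hypothetical stand-in that reproduces only this behaviour; the current line number is set by hand here, whereas the parser sets it inside parse_io and resets it to nil afterwards.

```ruby
# Hypothetical sketch of warn and warning=: every warning increments the
# counter, and an installed handler also receives the type, the message and
# the current line number.
class WarningDemo
  attr_reader :warnings
  attr_accessor :current_lineno

  def initialize
    @warnings = 0
    @warning_handler = nil
    @current_lineno = nil
  end

  def warning=(proc)
    @warning_handler = proc
  end

  def warn(type, message)
    @warnings += 1
    @warning_handler.call(type, message, @current_lineno) if @warning_handler
  end
end

received = []
demo = WarningDemo.new
demo.warn(:no_current_request, 'counted, but nobody is listening')
demo.warning = ->(type, _message, lineno) { received << [type, lineno] }
demo.current_lineno = 42
demo.warn(:unclosed_request, 'previous request was not closed')
demo.warnings # → 2
received      # → [[:unclosed_request, 42]]
```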

Protected Instance Methods

alternative_header_line?(hash)

Checks whether a given line hash is an alternative header line according to the current file format.

hash

A hash of data that was parsed from the line.

# File lib/request_log_analyzer/source/log_parser.rb
def alternative_header_line?(hash)
  hash[:line_definition].header == :alternative
end
handle_request(request) { |:request, request| ... }

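Handles the parsed request by sending it into the pipeline. The request is validated and counted in parsed_requests, then yielded to the block; if the block returns false or nil, the request is also counted in skipped_requests.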

request

The parsed request instance (RequestLogAnalyzer::Request)

# File lib/request_log_analyzer/source/log_parser.rb
def handle_request(request, &_block) # :yields: :request, request
  @parsed_requests += 1
  request.validate
  accepted = block_given? ? yield(request) : true
  @skipped_requests += 1 unless accepted
end
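The counters maintained by handle_request can be exercised on their own. RequestCounter is a hypothetical stand-in that leaves out request.validate and keeps only the parsed/skipped bookkeeping.

```ruby
# Hypothetical sketch of handle_request's bookkeeping (request.validate is
# left out): every request counts as parsed, and it also counts as skipped
# when the consuming block returns false or nil.
class RequestCounter
  attr_reader :parsed_requests, :skipped_requests

  def initialize
    @parsed_requests = 0
    @skipped_requests = 0
  end

  def handle_request(request)
    @parsed_requests += 1
    accepted = block_given? ? yield(request) : true
    @skipped_requests += 1 unless accepted
  end
end

counter = RequestCounter.new
counter.handle_request(:a) { true }
counter.handle_request(:b) { nil } # e.g. a filter rejected it
counter.handle_request(:c)         # no block: accepted
[counter.parsed_requests, counter.skipped_requests] # → [3, 1]
```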
header_line?(hash)

Checks whether a given line hash is a header line according to the current file format.

hash

A hash of data that was parsed from the line.

# File lib/request_log_analyzer/source/log_parser.rb
def header_line?(hash)
  hash[:line_definition].header == true
end
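The two header predicates compare against exact values, which matters because :alternative is truthy: a plain `if header` test would treat alternative headers as regular ones. The sketch below copies both predicates and feeds them hypothetical line definitions; HeaderDemoDefinition and the line names are invented for illustration.

```ruby
# Hypothetical stand-in for a line definition with a `header` attribute.
HeaderDemoDefinition = Struct.new(:name, :header)

# Exact comparisons: only `true` is a regular header, and only
# `:alternative` is an alternative header.
def header_line?(hash)
  hash[:line_definition].header == true
end

def alternative_header_line?(hash)
  hash[:line_definition].header == :alternative
end

regular     = { line_definition: HeaderDemoDefinition.new(:started, true) }
alternative = { line_definition: HeaderDemoDefinition.new(:resumed, :alternative) }
body        = { line_definition: HeaderDemoDefinition.new(:params, nil) }
header_line?(regular)                 # → true
header_line?(alternative)             # → false
alternative_header_line?(alternative) # → true
```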
update_current_request(request_data) { |request| ... }

Combines the different lines of a request into a single Request object. It starts a new request when a header line is encountered and emits the request when a footer line is encountered.

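Combining the lines is done using heuristics, and problems can occur in this process. The current parse strategy defines how these cases are handled. Alternative header lines (see alternative_header_line?) are treated the same under every strategy: they are added to the open request if there is one, and start a new request otherwise.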

When using the ‘assume-correct’ parse strategy (default):

  • Every line that is parsed before a header line is ignored as it cannot be included in any request. It will emit a :no_current_request warning.

  • If a header line is found before the previous request was closed, the previous request is yielded and a new request is started.

When using the ‘cautious’ parse strategy:

  • Every line that is parsed before a header line is ignored as it cannot be included in any request. It will emit a :no_current_request warning.

  • A header line that is parsed before a request is closed by a footer line is a sign of an improperly ordered file. All data gathered for the open request is discarded, and the request started by the new header is ignored as well. An :unclosed_request warning is emitted.

request_data

A hash of data that was parsed from the last line.

# File lib/request_log_analyzer/source/log_parser.rb
def update_current_request(request_data, &block) # :yields: request
  if alternative_header_line?(request_data)
    if @current_request
      @current_request << request_data
    else
      @current_request = @file_format.request(request_data)
    end
  elsif header_line?(request_data)
    if @current_request
      case options[:parse_strategy]
      when 'assume-correct'
        handle_request(@current_request, &block)
        @current_request = @file_format.request(request_data)
      when 'cautious'
        @skipped_lines += 1
        warn(:unclosed_request, "Encountered header line (#{request_data[:line_definition].name.inspect}), but previous request was not closed!")
        @current_request = nil # remove all data that was parsed, skip next request as well.
      end
    elsif footer_line?(request_data)
      handle_request(@file_format.request(request_data), &block)
    else
      @current_request = @file_format.request(request_data)
    end
  else
    if @current_request
      @current_request << request_data
      if footer_line?(request_data)
        handle_request(@current_request, &block) # yield @current_request
        @current_request = nil
      end
    else
      @skipped_lines += 1
      warn(:no_current_request, "Parseable line (#{request_data[:line_definition].name.inspect}) found outside of a request!")
    end
  end
end
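The heuristics above can be followed end to end with a small state machine. RequestCombiner is a hypothetical sketch, not the library's implementation: each parsed line is modelled as a Hash with :name, :header (true, :alternative or nil) and :footer, a request is just the list of its line names, and warnings are recorded as symbols instead of being passed to warn.

```ruby
# Hypothetical sketch of update_current_request's heuristics. Each parsed line
# is modelled as a Hash with :name, :header (true, :alternative or nil) and
# :footer; a request is just the list of its line names.
class RequestCombiner
  attr_reader :requests, :warnings, :skipped_lines

  def initialize(strategy = 'assume-correct')
    @strategy = strategy
    @current = nil
    @requests = []
    @warnings = []
    @skipped_lines = 0
  end

  def feed(line)
    if line[:header] == :alternative
      # Alternative headers join an open request or start a new one.
      if @current
        @current << line
      else
        @current = [line]
      end
    elsif line[:header] == true
      if @current
        handle_unclosed(line)
      elsif line[:footer]
        emit([line]) # a single-line request
      else
        @current = [line]
      end
    elsif @current
      @current << line
      if line[:footer]
        emit(@current)
        @current = nil
      end
    else
      @skipped_lines += 1
      @warnings << :no_current_request
    end
  end

  private

  def handle_unclosed(line)
    case @strategy
    when 'assume-correct' # emit what we have and start over
      emit(@current)
      @current = [line]
    when 'cautious' # discard the open request and skip the new one too
      @skipped_lines += 1
      @warnings << :unclosed_request
      @current = nil
    end
  end

  def emit(lines)
    @requests << lines.map { |l| l[:name] }
  end
end

start  = { name: :start, header: true }
body   = { name: :body }
finish = { name: :end, footer: true }

assume = RequestCombiner.new('assume-correct')
[start, body, start, finish].each { |l| assume.feed(l) }
assume.requests   # → [[:start, :body], [:start, :end]]

cautious = RequestCombiner.new('cautious')
[start, body, start, finish].each { |l| cautious.feed(l) }
cautious.requests # → []
cautious.warnings # → [:unclosed_request, :no_current_request]
```

In the cautious run the final footer is dropped too, because the request it belonged to was discarded together with the earlier data.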