class RequestLogAnalyzer::Source::LogParser
The LogParser class reads log data from a given source and uses a file format definition to parse all relevant information about requests from the file. A FileFormat module should be provided that contains the definitions of the lines that occur in the log data.
The order in which lines occur is used to combine lines into a single request. If the lines of different requests are interleaved, the requests cannot be combined properly. This can happen when several Mongrel processes write to the same log file simultaneously. The parser detects this problem and emits warnings when it occurs. LogParser supports multiple parse strategies that deal with this problem differently.
Constants
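- DEFAULT_LINE_DIVIDER
The line divider that is used when the file format does not define one.
- DEFAULT_MAX_LINE_LENGTH
The maximum number of bytes to read from a line when the file format does not define its own limit.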
- DEFAULT_PARSE_STRATEGY
The default parse strategy that will be used to parse the input.
- PARSE_STRATEGIES
All available parse strategies.
Attributes
Public Class Methods
Initializes the log file parser instance. It will apply the language specific FileFormat module to this instance. It will use the line definitions in this module to parse any input that it is given (see parse_io).
- format: The current file format instance.
- options: A hash of options that are used by the parser.
Calls superclass method RequestLogAnalyzer::Source::Base::new.
    # File lib/request_log_analyzer/source/log_parser.rb, line 34
    def initialize(format, options = {})
      super(format, options)
      @warnings = 0
      @parsed_lines = 0
      @parsed_requests = 0
      @skipped_lines = 0
      @skipped_requests = 0
      @current_request = nil
      @current_source = nil
      @current_file = nil
      @current_lineno = nil
      @processed_files = []
      @source_files = options[:source_files]
      @progress_handler = nil
      @warning_handler = nil

      @options[:parse_strategy] ||= DEFAULT_PARSE_STRATEGY
      unless PARSE_STRATEGIES.include?(@options[:parse_strategy])
        # Report the strategy that was actually requested (the key is :parse_strategy).
        fail "Unknown parse strategy: #{@options[:parse_strategy]}"
      end
    end
Public Instance Methods
Checks whether the filename has a compressed extension (.tar.gz, .tgz, .gz, .bz2 or .zip). If it is recognized, returns the command string used to decompress the file; otherwise returns an empty string.
    # File lib/request_log_analyzer/source/log_parser.rb, line 97
    def decompress_file?(filename)
      nice_command = 'nice -n 5'

      return "#{nice_command} gunzip -c -d #{filename}" if filename.match(/\.tar\.gz$/) || filename.match(/\.tgz$/) || filename.match(/\.gz$/)
      return "#{nice_command} bunzip2 -c -d #{filename}" if filename.match(/\.bz2$/)
      return "#{nice_command} unzip -p #{filename}" if filename.match(/\.zip$/)

      ''
    end
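The extension checks above can be expressed as a lookup table. The following is a hypothetical sketch, not the library's code: the method name decompress_command and the DECOMPRESSORS table are invented for illustration. Unlike the original, it shell-escapes the filename with Shellwords.escape, because an unescaped path containing spaces would be split into several arguments by the shell that IO.popen starts.

```ruby
require 'shellwords'

# Hypothetical lookup table: extension pattern => decompression command.
# Each pattern is anchored at the end of the filename (\z).
DECOMPRESSORS = [
  [/\.(tar\.gz|tgz|gz)\z/, 'gunzip -c -d'],
  [/\.bz2\z/,              'bunzip2 -c -d'],
  [/\.zip\z/,              'unzip -p']
].freeze

# Hypothetical helper: returns the command that writes the decompressed
# file to stdout, or '' when the file does not look compressed.
def decompress_command(filename, nice: 'nice -n 5')
  DECOMPRESSORS.each do |pattern, command|
    return "#{nice} #{command} #{Shellwords.escape(filename)}" if filename =~ pattern
  end
  '' # not compressed: the caller reads the file directly
end

decompress_command('production.log.gz') # => "nice -n 5 gunzip -c -d production.log.gz"
decompress_command('production.log')    # => ""
```

Keeping the empty-string return for uncompressed files preserves the calling convention that parse_file relies on (`decompress_file?(file).empty?`).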
Reads the input, which can be a file, a sequence of files or STDIN, and parses the lines specified in the FileFormat. These lines are combined into Request instances, which are yielded. The actual parsing occurs in the parse_io method.
- options: A Hash of options that will be passed to parse_io.
    # File lib/request_log_analyzer/source/log_parser.rb, line 68
    def each_request(options = {}, &block) # :yields: :request, request
      case @source_files
      when IO
        if @source_files == $stdin
          puts 'Parsing from the standard input. Press CTRL+C to finish.' # FIXME: not here
        end
        parse_stream(@source_files, options, &block)
      when String
        parse_file(@source_files, options, &block)
      when Array
        parse_files(@source_files, options, &block)
      else
        fail 'Unknown source provided'
      end
    end
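The dispatch in each_request depends only on the class of the :source_files option. The hypothetical source_kind method below isolates that decision so it can be tried directly; it raises ArgumentError where the original calls fail (a RuntimeError). One consequence worth noting: StringIO is not a subclass of IO, so a StringIO passed as :source_files would fall through to the error branch, which is presumably why in-memory input goes through parse_string instead.

```ruby
require 'stringio'

# Hypothetical sketch of each_request's dispatch: the class of the
# source object decides which parse_* method would handle it.
def source_kind(source)
  case source
  when IO     then :stream # e.g. $stdin, handled by parse_stream
  when String then :file   # a path (or directory), handled by parse_file
  when Array  then :files  # several paths, handled by parse_files
  else raise ArgumentError, 'Unknown source provided'
  end
end

source_kind($stdin)                  # => :stream
source_kind(%w[a.log b.log])         # => :files
StringIO.new('x').is_a?(IO)          # => false, so source_kind would raise
```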
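Returns the line divider defined by the file format, or DEFAULT_LINE_DIVIDER if the format does not define one.
    # File lib/request_log_analyzer/source/log_parser.rb, line 60
    def line_divider
      file_format.line_divider || DEFAULT_LINE_DIVIDER
    end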
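Returns the maximum line length defined by the file format, or DEFAULT_MAX_LINE_LENGTH if the format does not define one.
    # File lib/request_log_analyzer/source/log_parser.rb, line 56
    def max_line_length
      file_format.max_line_length || DEFAULT_MAX_LINE_LENGTH
    end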
Parses a log file. Creates an IO stream for the provided file and sends it to parse_io for further handling. If file is a directory, every entry in it is parsed with parse_files instead. This method supports progress updates that can be used to display a progress bar.
If the log file is compressed, it is decompressed to standard output by the command returned from decompress_file? and read through IO.popen. TODO: Check if IO.popen encounters problems with the given command line. TODO: Fix the progress bar, which is broken for IO.popen as it returns a single string.
- file: The file that should be parsed.
- options: A Hash of options that will be passed to parse_io.
    # File lib/request_log_analyzer/source/log_parser.rb, line 116
    def parse_file(file, options = {}, &block)
      if File.directory?(file)
        parse_files(Dir["#{ file }/*"], options, &block)
        return
      end

      @current_source = File.expand_path(file)
      @source_changes_handler.call(:started, @current_source) if @source_changes_handler

      if decompress_file?(file).empty?
        @progress_handler = @dormant_progress_handler
        @progress_handler.call(:started, file) if @progress_handler

        File.open(file, 'rb') { |f| parse_io(f, options, &block) }

        @progress_handler.call(:finished, file) if @progress_handler
        @progress_handler = nil

        @processed_files.push(@current_source.dup)
      else
        IO.popen(decompress_file?(file), 'rb') { |f| parse_io(f, options, &block) }
      end

      @source_changes_handler.call(:finished, @current_source) if @source_changes_handler

      @current_source = nil
    end
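For gzip files, an in-process alternative to shelling out with IO.popen is Ruby's standard Zlib::GzipReader. This is a different technique from the one parse_file uses, shown only as a hypothetical sketch: each_log_line is an invented name, only .gz is handled (.bz2 and .zip would still need another route), and directory entries are sorted for a deterministic order, which the original does not do.

```ruby
require 'zlib'

# Hypothetical sketch: yields every line of a log file, reading .gz files
# in-process with Zlib::GzipReader instead of an external gunzip, and
# recursing into directories the way parse_file does via parse_files.
def each_log_line(path, &block)
  if File.directory?(path)
    Dir[File.join(path, '*')].sort.each { |entry| each_log_line(entry, &block) }
  elsif path.end_with?('.gz')
    Zlib::GzipReader.open(path) { |gz| gz.each_line(&block) }
  else
    File.open(path, 'rb') { |f| f.each_line(&block) }
  end
end
```

Because the reader is an ordinary Ruby object rather than a pipe, this route would also make it possible to report progress from the compressed file's position, which the TODO above notes is currently broken for IO.popen.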
Parses a list of subsequent files of the same format by calling parse_file for every file in the array.
- files: The Array of files that should be parsed.
- options: A Hash of options that will be passed to parse_io.
    # File lib/request_log_analyzer/source/log_parser.rb, line 91
    def parse_files(files, options = {}, &block) # :yields: request
      files.each { |file| parse_file(file, options, &block) }
    end
This method loops over each line of the input stream. It tries to parse each line as one of the lines defined by the current file format (see RequestLogAnalyzer::FileFormat), and then combines the parsed lines into requests using heuristics. These requests (see RequestLogAnalyzer::Request) are then yielded for further processing in the pipeline.
- RequestLogAnalyzer::LineDefinition#matches is called to test whether a line matches a line definition of the file format.
- update_current_request is used to combine parsed lines into requests using heuristics.
- The method reports progress every 255 lines if a progress handler is installed using progress=.
- The method reports parse warnings if a warning handler is installed using warning=.
This is the Ruby 1.8 specific version, which does not offer memory protection: every line is read in full, however long it is.
- io: The IO instance to use as source.
- options: A hash of options that can be used by the parser.
    # File lib/request_log_analyzer/source/log_parser.rb, line 203
    def parse_io_18(io, options = {}, &block) # :yields: request
      @line_divider = options[:line_divider] || line_divider
      @current_lineno = 0
      while (line = io.gets(@line_divider))
        @current_lineno += 1
        @progress_handler.call(:progress, io.pos) if @progress_handler && @current_lineno % 255 == 0
        parse_line(line, &block)
      end

      warn(:unfinished_request_on_eof, 'End of file reached, but last request was not completed!') unless @current_request.nil?
      @current_lineno = nil
    end
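The core of the loop above is a line counter plus a throttled progress callback. The following hypothetical sketch (read_lines is an invented name, not part of the library) reproduces just that part: the progress proc stands in for the progress handler and the block stands in for parse_line.

```ruby
require 'stringio'

# Hypothetical sketch of the parse_io loop: read one divider-terminated
# line at a time, count it, and call the progress proc only on every
# 255th line so the callback stays cheap on large files.
def read_lines(io, divider: "\n", progress: nil)
  lineno = 0
  while (line = io.gets(divider))
    lineno += 1
    progress.call(:progress, io.pos) if progress && (lineno % 255).zero?
    yield line, lineno
  end
  lineno # number of lines read
end
```

For example, reading 600 two-byte lines would report progress twice, at byte offsets 510 and 1020. Reporting io.pos rather than the line number lets a progress bar compare the position against the file size.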
This method loops over each line of the input stream. It tries to parse each line as one of the lines defined by the current file format (see RequestLogAnalyzer::FileFormat), and then combines the parsed lines into requests using heuristics. These requests (see RequestLogAnalyzer::Request) are then yielded for further processing in the pipeline.
- RequestLogAnalyzer::LineDefinition#matches is called to test whether a line matches a line definition of the file format.
- update_current_request is used to combine parsed lines into requests using heuristics.
- The method reports progress every 255 lines if a progress handler is installed using progress=.
- The method reports parse warnings if a warning handler is installed using warning=.
This is the Ruby 1.9 specific version, which offers memory protection: IO#gets is called with max_line_length as its limit, so at most that many bytes are read at a time.
- io: The IO instance to use as source.
- options: A hash of options that can be used by the parser.
    # File lib/request_log_analyzer/source/log_parser.rb, line 175
    def parse_io_19(io, options = {}, &block) # :yields: request
      @max_line_length = options[:max_line_length] || max_line_length
      @line_divider = options[:line_divider] || line_divider
      @current_lineno = 0
      while (line = io.gets(@line_divider, @max_line_length))
        @current_lineno += 1
        @progress_handler.call(:progress, io.pos) if @progress_handler && @current_lineno % 255 == 0
        parse_line(line, &block)
      end

      warn(:unfinished_request_on_eof, 'End of file reached, but last request was not completed!') unless @current_request.nil?
      @current_lineno = nil
    end
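The memory protection comes from the two-argument form of gets: with a limit, it returns at most that many bytes, so an overlong line comes back in several pieces instead of one huge string. StringIO supports the same signature, which makes the behaviour easy to observe in isolation (the limit of 4 here is arbitrary).

```ruby
require 'stringio'

# IO#gets(separator, limit) stops at the separator or after `limit` bytes,
# whichever comes first.
io = StringIO.new("abcdefghij\nxy\n")
pieces = []
while (piece = io.gets("\n", 4))
  pieces << piece
end
pieces # => ["abcd", "efgh", "ij\n", "xy\n"]
```

In parse_io_19 each piece is passed to parse_line separately and counted as its own line, so the fragments of an overlong line will most likely fail to match any line definition rather than exhaust memory.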
Parses a single line using the current file format. If successful, the parsed information is used to build a request.
- line: The line to parse.
- block: The block to send fully parsed requests to.
    # File lib/request_log_analyzer/source/log_parser.rb, line 222
    def parse_line(line, &block) # :yields: request
      if (request_data = file_format.parse_line(line) { |wt, message| warn(wt, message) })
        @parsed_lines += 1
        update_current_request(request_data.merge(source: @current_source, lineno: @current_lineno), &block)
      end
    end
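parse_line does two things with a successful parse: it counts the line and tags the parsed hash with its origin before handing it on. The hypothetical tag_line method below isolates the tagging step; the parse lambda is an invented stand-in for file_format.parse_line, which returns a hash on success and nil otherwise.

```ruby
# Hypothetical sketch of parse_line's tagging step: merge the source file
# and line number into the parsed data so a request can later report
# where each of its lines came from.
def tag_line(line, source:, lineno:, parse:)
  data = parse.call(line)
  return nil unless data # lines matching no definition are not counted
  data.merge(source: source, lineno: lineno)
end

# Invented stand-in for a line definition that recognises one line type.
parse = ->(l) { l.start_with?('Processing') ? { line_type: :processing } : nil }
tag_line('Processing Foo#bar', source: '/tmp/x.log', lineno: 3, parse: parse)
# => {line_type: :processing, source: "/tmp/x.log", lineno: 3}
```

Using merge keeps the parsed hash itself unmodified, which is harmless if the file format reuses or caches it.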
Parses an IO stream. It will simply call parse_io. This function does not support progress updates because the length of a stream is not known.
- stream: The IO stream that should be parsed.
- options: A Hash of options that will be passed to parse_io.
    # File lib/request_log_analyzer/source/log_parser.rb, line 150
    def parse_stream(stream, options = {}, &block)
      parse_io(stream, options, &block)
    end
Parses a string by wrapping it in a StringIO and passing it to parse_io. This function does not support progress updates.
- string: The string that should be parsed.
- options: A Hash of options that will be passed to parse_io.
    # File lib/request_log_analyzer/source/log_parser.rb, line 157
    def parse_string(string, options = {}, &block)
      parse_io(StringIO.new(string), options, &block)
    end
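Installs a progress handler for parsing. Assign a proc to this attribute. The parser stores it as a dormant handler and only activates it while parse_file reads an uncompressed file, calling it with :started, :progress and :finished events.
- proc: The proc that will be called to handle progress update messages.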
    # File lib/request_log_analyzer/source/log_parser.rb, line 231
    def progress=(proc)
      @dormant_progress_handler = proc
    end
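The setter only stores the proc; parse_file copies it into the active handler for the duration of one uncompressed file and clears it afterwards, so parse_stream never reports progress. The HandlerDemo class below is a hypothetical, stripped-down model of that dormant-handler pattern, not part of the library.

```ruby
# Hypothetical model of the dormant/active progress handler pattern.
class HandlerDemo
  def initialize
    @dormant_progress_handler = nil # set by progress=
    @progress_handler = nil         # only non-nil while a file is parsed
  end

  def progress=(proc)
    @dormant_progress_handler = proc
  end

  def parse_file(name)
    @progress_handler = @dormant_progress_handler # activate
    @progress_handler&.call(:started, name)
    @progress_handler&.call(:finished, name)
    @progress_handler = nil                       # deactivate
  end

  def parse_stream(_io)
    @progress_handler&.call(:progress, 0) # never fires: handler is nil here
  end
end

events = []
demo = HandlerDemo.new
demo.progress = ->(event, arg) { events << [event, arg] }
demo.parse_stream(nil)
demo.parse_file('a.log')
events # => [[:started, "a.log"], [:finished, "a.log"]]
```

This design lets a caller install a progress bar once, up front, while the parser decides per input whether progress can meaningfully be measured.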
Installs a source change handler for parsing. Assign a proc to this attribute. parse_file calls it with :started and :finished and the expanded path of each source file.
- proc: The proc that will be called to handle source changes.
    # File lib/request_log_analyzer/source/log_parser.rb, line 243
    def source_changes=(proc)
      @source_changes_handler = proc
    end
This method is called by the parser when it encounters a parsing problem. It increments the warning count and calls the installed warning handler, if any.
By default, RequestLogAnalyzer::Controller installs a warning handler that passes the warnings to each aggregator so they can do something useful with them.
- type: The warning type (a Symbol).
- message: A message explaining the warning.
    # File lib/request_log_analyzer/source/log_parser.rb, line 256
    def warn(type, message)
      @warnings += 1
      @warning_handler.call(type, message, @current_lineno) if @warning_handler
    end
Installs a warning handler for parsing. Assign a proc to this attribute. warn calls it with the warning type, the message and the current line number.
- proc: The proc that will be called to handle parse warning messages.
    # File lib/request_log_analyzer/source/log_parser.rb, line 237
    def warning=(proc)
      @warning_handler = proc
    end
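Any callable that accepts the three arguments passed by warn (type, message, line number) can serve as a warning handler. The WarningTally class below is a hypothetical example that counts warnings per type and remembers where they occurred; with the real parser it would presumably be installed as `parser.warning = tally.to_proc`.

```ruby
# Hypothetical warning handler: tallies warnings per type and records
# the line numbers at which each type occurred.
class WarningTally
  attr_reader :counts, :lines

  def initialize
    @counts = Hash.new(0)
    @lines  = Hash.new { |hash, type| hash[type] = [] }
  end

  # Matches the (type, message, lineno) arguments that warn passes on.
  def to_proc
    lambda do |type, _message, lineno|
      @counts[type] += 1
      @lines[type] << lineno
    end
  end
end

tally = WarningTally.new
handler = tally.to_proc
handler.call(:no_current_request, 'Parseable line found outside of a request!', 12)
handler.call(:no_current_request, 'Parseable line found outside of a request!', 40)
handler.call(:unclosed_request, 'Previous request was not closed!', 41)
tally.counts[:no_current_request] # => 2
tally.lines[:unclosed_request]    # => [41]
```

A burst of :no_current_request warnings right after an :unclosed_request is the typical signature of interleaved output from several processes.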
Protected Instance Methods
Checks whether a given line hash is an alternative header line according to the current file format.
- hash: A hash of data that was parsed from the line.
    # File lib/request_log_analyzer/source/log_parser.rb, line 338
    def alternative_header_line?(hash)
      hash[:line_definition].header == :alternative
    end
Handles the parsed request by sending it into the pipeline:
- It calls RequestLogAnalyzer::Request#validate on the request instance.
- It sends the request into the pipeline, checking whether it was accepted by all the filters.
- It updates the parsed_requests and skipped_requests counters accordingly.
Parameters:
- request: The parsed request instance (RequestLogAnalyzer::Request).
    # File lib/request_log_analyzer/source/log_parser.rb, line 329
    def handle_request(request, &_block) # :yields: :request, request
      @parsed_requests += 1
      request.validate
      accepted = block_given? ? yield(request) : true
      @skipped_requests += 1 unless accepted
    end
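The acceptance check relies entirely on the block's return value: anything falsy counts as a skipped request, and with no block every request is accepted. The RequestCounter class below is a hypothetical reduction of that bookkeeping (validation is omitted); the status-code filter in the example is invented for illustration.

```ruby
# Hypothetical reduction of handle_request's counters.
class RequestCounter
  attr_reader :parsed, :skipped

  def initialize
    @parsed = 0
    @skipped = 0
  end

  def handle(request)
    @parsed += 1
    accepted = block_given? ? yield(request) : true # falsy result = rejected
    @skipped += 1 unless accepted
    accepted
  end
end

counter = RequestCounter.new
[200, 404, 500].each { |status| counter.handle(status) { |s| s < 400 } }
counter.handle(302) # no block: accepted
[counter.parsed, counter.skipped] # => [4, 2]
```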
Checks whether a given line hash is a header line according to the current file format.
- hash: A hash of data that was parsed from the line.
    # File lib/request_log_analyzer/source/log_parser.rb, line 344
    def header_line?(hash)
      hash[:line_definition].header == true
    end
Combines the different lines of a request into a single Request object. It starts a new request when a header line is encountered and emits the request when a footer line is encountered. An alternative header line is appended to the current request, or starts a new one if no request is open.
Combining the lines is done using heuristics, and problems can occur in this process. The current parse strategy defines how these cases are handled.
When using the ‘assume-correct’ parse strategy (default):
- Every line that is parsed before a header line is ignored, as it cannot be included in any request. A :no_current_request warning is emitted.
- If a header line is found before the previous request was closed, the previous request is yielded and a new request is started.
When using the ‘cautious’ parse strategy:
- Every line that is parsed before a header line is ignored, as it cannot be included in any request. A :no_current_request warning is emitted.
- A header line that is parsed before the current request is closed by a footer line is a sign of an improperly ordered file. All data gathered for the request until then is discarded, and the next request is ignored as well. An :unclosed_request warning is emitted.
- request_data: A hash of data that was parsed from the last line.
    # File lib/request_log_analyzer/source/log_parser.rb, line 285
    def update_current_request(request_data, &block) # :yields: request
      if alternative_header_line?(request_data)
        if @current_request
          @current_request << request_data
        else
          @current_request = @file_format.request(request_data)
        end
      elsif header_line?(request_data)
        if @current_request
          case options[:parse_strategy]
          when 'assume-correct'
            handle_request(@current_request, &block)
            @current_request = @file_format.request(request_data)
          when 'cautious'
            @skipped_lines += 1
            warn(:unclosed_request, "Encountered header line (#{request_data[:line_definition].name.inspect}), but previous request was not closed!")
            @current_request = nil # remove all data that was parsed, skip next request as well.
          end
        elsif footer_line?(request_data)
          handle_request(@file_format.request(request_data), &block)
        else
          @current_request = @file_format.request(request_data)
        end
      else
        if @current_request
          @current_request << request_data
          if footer_line?(request_data)
            handle_request(@current_request, &block) # yield @current_request
            @current_request = nil
          end
        else
          @skipped_lines += 1
          warn(:no_current_request, "Parseable line (#{request_data[:line_definition].name.inspect}) found outside of a request!")
        end
      end
    end
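The two strategies can be compared on interleaved input with a simplified model of the state machine above. RequestCombiner is hypothetical: each line is a Hash whose :kind is :header, :footer or :body (the real code asks the line definition instead), requests are plain arrays, and alternative headers and single-line header/footer requests are left out.

```ruby
# Hypothetical, simplified model of update_current_request.
class RequestCombiner
  attr_reader :requests, :warnings

  def initialize(strategy = 'assume-correct')
    @strategy = strategy
    @current  = nil # lines of the request being built, or nil
    @requests = []
    @warnings = []
  end

  def <<(line)
    if line[:kind] == :header
      if @current.nil?
        @current = [line]
      elsif @strategy == 'assume-correct'
        @requests << @current # emit the unfinished request...
        @current = [line]     # ...and start a new one
      else                    # 'cautious'
        @warnings << :unclosed_request
        @current = nil        # discard it; the next request is lost too
      end
    elsif @current
      @current << line
      if line[:kind] == :footer
        @requests << @current
        @current = nil
      end
    else
      @warnings << :no_current_request
    end
    self
  end
end
```

Feeding either combiner the interleaved sequence header(1), body(1), header(2), body(2), footer(1), footer(2), as two processes writing to one log might produce, shows the trade-off: ‘assume-correct’ emits two requests, the second of which wrongly ends with process 1's footer, plus one :no_current_request warning, while ‘cautious’ emits no requests at all, one :unclosed_request and three :no_current_request warnings.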