class RequestLogAnalyzer::FileFormat::Apache
The Apache file format is able to parse Apache access.log files.
The access.log can be configured in Apache to have many different formats. In theory, this FileFormat can handle any format, but it must know which log format is in use, so pass the format string as a parameter to the create method, e.g.:
    RequestLogAnalyzer::FileFormat::Apache.create('%h %l %u %t "%r" %>s %b')
It also supports the predefined Apache log formats “common” and “combined”. The line definition and the report definition are constructed from this format string. From the command line, you can provide the format string using the --apache-format command line option.
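A format string is literal text interleaved with %-directives. As a rough illustration, the sketch below applies the scan pattern copied from access_line_definition (listed under Public Class Methods) to a small format string and returns the resulting [literal, argument, directive] triples. The helper name tokenize and the constant FORMAT_TOKEN are hypothetical, not part of the library.

```ruby
# Scan pattern copied from access_line_definition. Each match yields
# [literal, argument, directive]; argument is the optional {...} part.
FORMAT_TOKEN = /([^%]*)(?:%(?:\{([^\}]+)\})?>?([A-Za-z%]))?/

# Hypothetical helper: splits a format string into triples. Pieces without
# a directive (such as trailing literal text) have a nil directive.
def tokenize(format_string)
  format_string.scan(FORMAT_TOKEN)
end

tokenize('%h %>s "%{Referer}i"').each { |token| p token }
```

The optional `>` modifier is consumed by the pattern, so `%>s` yields the directive `s`, and an argument such as `{Referer}` is returned separately from its directive character `i`.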
Constants
- APACHE_TIMESTAMP
Handles the two timestamp variants found in Apache logs, with a timezone offset and without one, so both can be parsed.
- LOG_DIRECTIVES
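A hash that defines how each log format directive is parsed. It is indexed by the directive character and then by the directive's optional {argument}; each entry supplies a :regexp fragment and the :captures it produces.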
- LOG_FORMAT_DEFAULTS
A hash of predefined Apache log formats, keyed by format name (for example :common).
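The sketch below illustrates two of these constants under stated assumptions. TIMESTAMP is a hypothetical pattern accepting a timestamp with or without a trailing offset; it is not the library's actual APACHE_TIMESTAMP. FORMAT_DEFAULTS holds the standard Apache definitions of "common" and "combined"; the real LOG_FORMAT_DEFAULTS may contain more entries. resolve_format mirrors the first two lines of access_line_definition.

```ruby
# Hypothetical timestamp pattern (not the library's APACHE_TIMESTAMP):
# dd/Mon/yyyy:hh:mm:ss, optionally followed by a " -0700" style offset.
TIMESTAMP = /(\d{2}\/[A-Za-z]{3}\/\d{4}:\d{2}:\d{2}:\d{2})(?: ([+-]\d{4}))?/

# Standard Apache definitions of the two named formats; the library's own
# LOG_FORMAT_DEFAULTS may hold additional entries.
FORMAT_DEFAULTS = {
  common:   '%h %l %u %t "%r" %>s %b',
  combined: '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"'
}.freeze

# Returns the timestamp and the timezone offset (nil when none was logged),
# or nil if the text contains no timestamp at all.
def parse_timestamp(text)
  match = TIMESTAMP.match(text)
  return nil unless match

  { time: match[1], zone: match[2] }
end

# Mirrors access_line_definition: default to :common, then replace a known
# format name with its format string; anything else passes through as-is.
def resolve_format(format_string)
  format_string ||= :common
  FORMAT_DEFAULTS[format_string.to_sym] || format_string
end

p parse_timestamp('[10/Oct/2000:13:55:36 -0700]')
p parse_timestamp('[10/Oct/2000:13:55:36]')
p resolve_format('combined')
```

Because unknown names fall through unchanged, a custom format string and a predefined format name can both be passed through the same argument.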
Public Class Methods
Creates the access log line definition based on the Apache log format string.
    # File lib/request_log_analyzer/file_format/apache.rb, line 66
    def self.access_line_definition(format_string)
      format_string ||= :common
      format_string = LOG_FORMAT_DEFAULTS[format_string.to_sym] || format_string

      line_regexp = ''
      captures = []
      format_string.scan(/([^%]*)(?:%(?:\{([^\}]+)\})?>?([A-Za-z%]))?/) do |literal, arg, variable|

        line_regexp << Regexp.quote(literal) # Make sure to parse the literal before the directive

        if variable
          # Check if we recognize the log directive
          directive = LOG_DIRECTIVES[variable][arg] rescue nil

          if directive
            line_regexp << directive[:regexp] # Parse the value of the directive
            captures += directive[:captures] # Add the directive's information to the captures
          else
            puts "Apache log directive %#{arg}#{variable} is not yet supported by RLA, the field will be ignored."
            line_regexp << '.*' # Accept any input for this unsupported directive
          end
        end
      end

      # Return a new line definition object
      RequestLogAnalyzer::LineDefinition.new(:access, regexp: Regexp.new(line_regexp),
                                                      captures: captures, header: true, footer: true)
    end
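To make the construction concrete, here is a self-contained sketch of the same idea: quote each literal, append the regexp fragment for each known directive, fall back to `.*` for unknown ones, then match a log line. The DIRECTIVES table, its regexps and the capture names :remote_host are hypothetical stand-ins for LOG_DIRECTIVES, and the sketch returns a plain hash rather than a RequestLogAnalyzer::LineDefinition. It also swaps `rescue nil` for Hash#dig, which returns nil for unknown keys without swallowing unrelated errors.

```ruby
# Hypothetical, minimal directive table shaped the way access_line_definition
# reads LOG_DIRECTIVES: directive character -> argument -> entry.
DIRECTIVES = {
  'h' => { nil => { regexp: '(\S+)', captures: [{ name: :remote_host }] } },
  'r' => { nil => { regexp: '([A-Z]+) (\S+) \S+', captures: [{ name: :http_method }, { name: :path }] } },
  's' => { nil => { regexp: '(\d{3})', captures: [{ name: :http_status }] } },
  'b' => { nil => { regexp: '(\d+|-)', captures: [{ name: :bytes_sent }] } }
}.freeze

FORMAT_TOKEN = /([^%]*)(?:%(?:\{([^\}]+)\})?>?([A-Za-z%]))?/

# Builds the line regexp and the ordered list of capture names.
def build_line_regexp(format_string)
  line_regexp = +''
  captures = []
  format_string.scan(FORMAT_TOKEN) do |literal, arg, variable|
    line_regexp << Regexp.quote(literal) # literal text must match exactly
    next unless variable

    directive = DIRECTIVES.dig(variable, arg) # nil for unknown directives
    if directive
      line_regexp << directive[:regexp]
      captures.concat(directive[:captures])
    else
      line_regexp << '.*' # unsupported: match anything, capture nothing
    end
  end
  [Regexp.new(line_regexp), captures.map { |c| c[:name] }]
end

# Returns a hash of captured fields, or nil when the line does not match.
def parse_line(format_string, line)
  regexp, names = build_line_regexp(format_string)
  match = regexp.match(line)
  return nil unless match

  names.zip(match.captures).to_h
end

p parse_line('%h %l "%r" %>s %b', '127.0.0.1 - "GET /index.html HTTP/1.1" 200 2326')
```

Here `%l` has no table entry, so it is matched by `.*` and contributes no capture, which is the same fallback the method above uses for unsupported directives (minus the warning message).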
Creates the Apache log format language based on an Apache log format string. It sets up the line definition and the report trackers according to the Apache access log format, which should be passed as the first argument. When no format is given, access_line_definition falls back to the ‘common’ log format.
    # File lib/request_log_analyzer/file_format/apache.rb, line 59
    def self.create(*args)
      access_line = access_line_definition(args.first)
      trackers = report_trackers(access_line) + report_definer.trackers
      new(line_definer.line_definitions.merge(access: access_line), trackers)
    end
Sets up the report trackers according to the fields captured by the access line definition.
    # File lib/request_log_analyzer/file_format/apache.rb, line 96
    def self.report_trackers(line_definition)
      analyze = RequestLogAnalyzer::Aggregator::Summarizer::Definer.new

      analyze.timespan if line_definition.captures?(:timestamp)
      analyze.hourly_spread if line_definition.captures?(:timestamp)

      analyze.frequency category: :http_method, title: 'HTTP methods' if line_definition.captures?(:http_method)
      analyze.frequency category: :http_status, title: 'HTTP statuses' if line_definition.captures?(:http_status)
      analyze.frequency category: lambda { |r| r.category }, title: 'Most popular URIs' if line_definition.captures?(:path)

      analyze.frequency category: :user_agent, title: 'User agents' if line_definition.captures?(:user_agent)
      analyze.frequency category: :referer, title: 'Referers' if line_definition.captures?(:referer)

      analyze.duration duration: :duration, category: lambda { |r| r.category }, title: 'Request duration' if line_definition.captures?(:duration)
      analyze.traffic traffic: :bytes_sent, category: lambda { |r| r.category }, title: 'Traffic' if line_definition.captures?(:bytes_sent)

      analyze.trackers
    end
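Each report is added only when the line definition captures the field it depends on. The sketch below models those guards as a plain lookup table so the field-to-report mapping can be checked in isolation. TRACKER_RULES and enabled_reports are hypothetical names, the sketch does not use the library's Definer, and the labels timespan and hourly_spread are simply the Definer method names, since those calls pass no title.

```ruby
# Captured field => report label, in the same order as report_trackers.
# timespan/hourly_spread use method names because the source gives no title.
TRACKER_RULES = [
  [:timestamp,   'timespan'],
  [:timestamp,   'hourly_spread'],
  [:http_method, 'HTTP methods'],
  [:http_status, 'HTTP statuses'],
  [:path,        'Most popular URIs'],
  [:user_agent,  'User agents'],
  [:referer,     'Referers'],
  [:duration,    'Request duration'],
  [:bytes_sent,  'Traffic']
].freeze

# Returns the labels of the reports enabled for the given captured fields.
def enabled_reports(captured_fields)
  TRACKER_RULES.select { |field, _label| captured_fields.include?(field) }
               .map { |_field, label| label }
end

p enabled_reports(%i[timestamp http_status bytes_sent])
# prints ["timespan", "hourly_spread", "HTTP statuses", "Traffic"]
```

A format string that captures no referer field therefore produces no "Referers" report; the guards keep the report definition in step with whatever the chosen log format actually records.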