class Traject::DebugWriter
The Traject::DebugWriter produces a simple, human-readable output format that is also easy to process by machine (for example, with grep). It is the output format used when you pass the --debug-mode switch to traject on the command line.
The output has three columns: id, output field, and values (multiple values separated by '|'). It looks something like:
    000001580 edition   [1st ed.]
    000001580 format    Book | Online | Print
    000001580 geo       Great Britain
    000001580 id        000001580
    000001580 isbn      0631126902
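Because every line carries the record id and field name, the output can be read back with a few lines of Ruby. The following is a minimal sketch, not part of traject: `parse_debug_output` is a hypothetical helper, and it assumes the id and field name contain no whitespace and that values are joined with ' | ' as in the sample above.

```ruby
# Sample debug output in the three-column layout described above.
sample = <<~OUT
  000001580    edition                   [1st ed.]
  000001580    format                    Book | Online | Print
  000001580    geo                       Great Britain
OUT

# Hypothetical helper: parse debug lines into { id => { field => [values] } }.
# Assumes neither the id nor the field name contains whitespace.
def parse_debug_output(text)
  records = Hash.new { |hash, key| hash[key] = {} }
  text.each_line do |line|
    line = line.chomp
    next if line.strip.empty?            # skip blank separator lines
    id, field, values = line.split(/\s+/, 3)
    records[id][field] = (values || "").split(" | ")
  end
  records
end

parsed = parse_debug_output(sample)
puts parsed["000001580"]["format"].inspect  # => ["Book", "Online", "Print"]
```

A value that itself contains ' | ' would be split incorrectly by this sketch, so it is only suitable for quick inspection.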
## Settings
* 'output_file' -- the name of the file to output to (command line -o shortcut).
* 'output_stream' -- alternately, the IO stream
* 'debug_writer.idfield' -- the solr field from which to pull the record ID (default: 'id')
* 'debug_writer.format' -- How to format the id/solr field/values (default: '%-12s %-25s %s')
By default, when neither output_file nor output_stream is provided, the writer writes to stdout, which can be useful for debugging.
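The 'debug_writer.format' value is an ordinary Ruby format string applied to the record id, the field name, and the joined values. The sketch below uses plain Ruby only (no traject) to show what the default does; the tab-separated alternative is an illustrative assumption, not a format the library ships with.

```ruby
# Default format quoted in the settings list above:
# a 12-wide id column, a 25-wide field column, then the values.
default_format = '%-12s %-25s %s'

values = ['Book', 'Online'].join(' | ')
line = default_format % ['000001580', 'format', values]
puts line

# Illustrative alternative (an assumption): tab-separated columns,
# which can be easier to feed into cut or awk.
tsv_format = "%s\t%s\t%s"
tsv = tsv_format % ['000001580', 'format', values]
puts tsv.inspect  # => "000001580\tformat\tBook | Online"
```

The `%-12s` and `%-25s` directives left-justify and pad, so ids longer than 12 characters or field names longer than 25 are not truncated; they simply push the later columns to the right.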
## Example configuration file
    require 'traject/debug_writer'

    settings do
      provide "writer_class_name", "Traject::DebugWriter"
      provide "output_file", "out.txt"
    end
Constants
- DEFAULT_FORMAT
- DEFAULT_IDFIELD
Public Class Methods
new(*)
Calls superclass method
Traject::LineWriter::new
    # File lib/traject/debug_writer.rb, line 38
    def initialize(*)
      super
      @idfield = settings["debug_writer.idfield"] || DEFAULT_IDFIELD
      @format  = settings['debug_writer.format'] || DEFAULT_FORMAT

      @use_position = (@idfield == 'record_position')

      @already_threw_warning_about_missing_id = false
    end
Public Instance Methods
record_number(context)
    # File lib/traject/debug_writer.rb, line 48
    def record_number(context)
      return context.position if @use_position

      if context.output_hash.has_key?(@idfield)
        context.output_hash[@idfield].first
      else
        unless @already_threw_warning_about_missing_id
          context.logger.warn "At least one record (#{context.record_inspect}) doesn't define field '#{@idfield}'. All records are assumed to have a unique id. You can set which field to look in via the setting 'debug_writer.idfield'"
          @already_threw_warning_about_missing_id = true
        end
        "record_num_#{context.position}"
      end
    end
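The lookup order in record_number can be tried out without traject. The sketch below is a simplified re-statement of that logic: `fake_context` is a hypothetical stand-in that models only `position` and `output_hash`, `record_key` is a hypothetical name, and the one-time logger warning is left out.

```ruby
# Hypothetical stand-in for the context object; only the two members
# the ID lookup reads are modeled here.
fake_context = Struct.new(:position, :output_hash)

# Simplified version of the lookup (the logger warning is omitted):
#   1. idfield 'record_position' -> use the record's position
#   2. idfield present in output -> use its first value
#   3. otherwise                 -> synthesize "record_num_<position>"
record_key = lambda do |context, idfield|
  if idfield == 'record_position'
    context.position
  elsif context.output_hash.key?(idfield)
    context.output_hash[idfield].first
  else
    "record_num_#{context.position}"
  end
end

with_id    = fake_context.new(1, { 'id' => ['000001580'] })
without_id = fake_context.new(2, { 'title' => ['Untitled'] })

from_field    = record_key.call(with_id, 'id')
from_fallback = record_key.call(without_id, 'id')
from_position = record_key.call(with_id, 'record_position')

puts from_field     # => 000001580
puts from_fallback  # => record_num_2
puts from_position  # => 1
```

Setting 'debug_writer.idfield' to 'record_position' therefore bypasses the output hash entirely, which may be convenient when records have no reliable id field.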
serialize(context)
    # File lib/traject/debug_writer.rb, line 62
    def serialize(context)
      h       = context.output_hash
      rec_key = record_number(context)
      lines   = h.keys.sort.map { |k| @format % [rec_key, k, (h[k] || []).join(' | ')] }
      lines.push "\n"
      lines.join("\n")
    end
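Three details of serialize are easy to miss: fields are emitted in sorted key order, a nil value is treated as an empty list, and each record ends with a blank separator. The sketch below repeats the same steps on a plain hash; the sample data and the local variable names are assumptions made for illustration.

```ruby
fmt = '%-12s %-25s %s'   # the default 'debug_writer.format'
rec_key = '000001580'

# Sample output hash (illustrative data); 'geo' is nil on purpose.
output_hash = {
  'isbn'   => ['0631126902'],
  'format' => ['Book', 'Online'],
  'geo'    => nil
}

# Same steps as serialize: sort the keys, join the values with ' | ',
# then append "\n" so the record is followed by a blank separator.
lines = output_hash.keys.sort.map do |k|
  fmt % [rec_key, k, (output_hash[k] || []).join(' | ')]
end
lines.push "\n"
text = lines.join("\n")

puts text
```

Here the fields come out as format, geo, isbn regardless of insertion order, the geo line has an empty values column, and the returned string ends with two newline characters.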