class Traject::DelimitedWriter

A simple line writer that uses configuration settings to determine how to produce a delimited file (tab-delimited by default).

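Relevant settings:

* `delimited_writer.fields` (required): a comma-separated list of the field names to output, in column order. The list is split on bare commas, so it should not contain spaces after the commas.
* `delimited_writer.delimiter` (optional, default tab): the separator placed between fields.
* `delimited_writer.internal_delimiter` (optional, default `|`): the separator placed between multiple values within a single field.
* `delimited_writer.header` (optional): a header row of field names is written unless this is set to `false` or `"false"`.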

If `delimited_writer.escape` is not set, the writer automatically escapes the delimiter and the internal delimiter wherever they occur in field values:

* If the delimiter is a tab, each tab in a value is replaced with a single space.
* Any other delimiter (including the default internal delimiter, `|`) is prefixed with a backslash, so `|` becomes `\|`.
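The escaping rules can be illustrated with a standalone sketch. `escaped_for` and `escape_value` are hypothetical helpers that mirror the rules described above; they are not part of Traject's API.

```ruby
# Hypothetical helper: a tab becomes a single space,
# any other delimiter gets a backslash prefix.
def escaped_for(delim)
  return nil if delim.nil?
  delim == "\t" ? " " : "\\" + delim
end

# Hypothetical helper: escape both the field delimiter and the internal
# delimiter in one value. The block form of gsub inserts the escaped text
# literally instead of parsing it as a replacement pattern.
def escape_value(value, delimiter, internal_delimiter)
  s = value.to_s
  s = s.gsub(delimiter) { escaped_for(delimiter) } if delimiter
  s.gsub(internal_delimiter) { escaped_for(internal_delimiter) }
end

puts escape_value("a\tb|c", "\t", "|")   # prints: a b\|c
puts escape_value("x,y|z", ",", "|")     # prints: x\,y\|z
```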

Attributes

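delimiter[R]: the delimiter placed between fields
edelim[R]: the escaped form of delimiter, substituted for it inside values
eidelim[R]: the escaped form of internal_delimiter
header[RW]: whether a header row of field names is written when the writer is created
internal_delimiter[R]: the delimiter placed between multiple values within one field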

Public Class Methods

new(settings)
Calls superclass method Traject::LineWriter::new
# File lib/traject/delimited_writer.rb, line 29
def initialize(settings)
  super

  # fields to output

  begin
    @fields = settings['delimited_writer.fields'].split(",")
  rescue NoMethodError => e
  end

  if e or @fields.empty?
    raise ArgumentError.new("#{self.class.name} must have a comma-delimited list of field names to output set in setting 'delimited_writer.fields'")
  end

  self.delimiter = settings['delimited_writer.delimiter'] || "\t"
  self.internal_delimiter = settings['delimited_writer.internal_delimiter'] || '|'
  self.header = settings['delimited_writer.header'].to_s != 'false'

  # Output the header if need be
  write_header if @header
end
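
The constructor's setting handling can be sketched without Traject. `parse_delimited_settings` is a hypothetical helper that takes a plain Hash in place of a Traject settings object; it uses `.to_s` instead of `rescue NoMethodError`, which behaves the same for a missing or string-valued setting.

```ruby
# Hypothetical helper mirroring the constructor's setting handling.
# `settings` is assumed to be a plain Hash with string keys.
def parse_delimited_settings(settings)
  # nil.to_s is "", so a missing setting yields an empty field list
  fields = settings["delimited_writer.fields"].to_s.split(",")
  if fields.empty?
    raise ArgumentError, "must set 'delimited_writer.fields' to a comma-delimited list of field names"
  end

  {
    fields: fields,
    delimiter: settings["delimited_writer.delimiter"] || "\t",
    internal_delimiter: settings["delimited_writer.internal_delimiter"] || "|",
    # Anything other than false / "false" (including an unset value) enables the header
    header: settings["delimited_writer.header"].to_s != "false"
  }
end

cfg = parse_delimited_settings({ "delimited_writer.fields" => "id,title", "delimited_writer.header" => "false" })
p cfg[:fields]     # ["id", "title"]
p cfg[:delimiter]  # "\t"
p cfg[:header]     # false
```

Because the list is split on bare commas, a value such as `"id, title"` would produce a field named `" title"` with a leading space, which would then match nothing in the output hash.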

Public Instance Methods

_write(data)
# File lib/traject/delimited_writer.rb, line 74
def _write(data)
  output_file.puts(data.join(delimiter))
end
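
`_write` joins an already-escaped array of values with the delimiter and writes one line. A minimal sketch, assuming a `StringIO` as a stand-in for the writer's `output_file`:

```ruby
require "stringio"

# StringIO stands in for output_file here (an assumption for illustration).
output_file = StringIO.new
delimiter = "\t"

# Same operation as _write: join the values; puts appends the newline.
output_file.puts(["id", "title"].join(delimiter))
output_file.puts(["001", "A title"].join(delimiter))

p output_file.string   # "id\ttitle\n001\tA title\n"
```

Note that escaping only rewrites the delimiters, so a value containing a newline would still split a record across two lines.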

delimiter=(d)
# File lib/traject/delimited_writer.rb, line 56
def delimiter=(d)
  @delimiter = d
  @edelim = escaped_delimiter(d)
  self
end
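
`delimiter=` returns `self`, but Ruby discards the return value of an assignment-style call: `obj.delimiter = x` always evaluates to `x`, and only an explicit `send` sees `self`. The `Example` class below is a hypothetical stand-in, not the Traject class.

```ruby
# Hypothetical minimal class reproducing the shape of delimiter=.
class Example
  attr_reader :delimiter, :edelim

  def delimiter=(d)
    @delimiter = d
    @edelim = (d == "\t" ? " " : "\\" + d)  # cache the escaped form alongside
    self
  end
end

w = Example.new
result = (w.delimiter = ",")       # assignment syntax: evaluates to ","
sent   = w.send(:delimiter=, ";")  # explicit send: returns the object

p result          # ","
p sent.equal?(w)  # true
p w.edelim        # "\\;"
```

Computing the escaped form in the setter keeps `edelim` from drifting out of sync with `delimiter`, which is presumably why both are exposed only as readers.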

escape(x)

Escape the delimiter and internal delimiter in a value: convert x to a string and replace each occurrence of either with its escaped form (edelim or eidelim).

# File lib/traject/delimited_writer.rb, line 84
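def escape(x)
  # Build new strings rather than using gsub! so the caller's value (which
  # may be frozen, or still referenced from the output hash) is not mutated.
  # The block form inserts the escaped text literally; as a replacement
  # string, an escaped form such as \& would be read as a back-reference.
  x = x.to_s
  x = x.gsub(@delimiter) { @edelim } if @delimiter
  x.gsub(@internal_delimiter) { @eidelim }
end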
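
In Ruby, a replacement string passed to `gsub` is not inserted verbatim: `\&` means "the whole match" and `\'` means "the text after the match". With a delimiter of `&`, for example, the two-character escaped form `\&` is read back as the match itself, so nothing is escaped. The block form of `gsub` inserts its result literally. A minimal demonstration:

```ruby
escaped = "\\" + "&"   # the escaped form of "&": a backslash followed by "&"

# Replacement-string form: "\&" is a back-reference to the whole match,
# so the "escape" is undone.
p "a&b".gsub("&", escaped)      # "a&b"

# Block form: the block's return value is inserted literally.
p "a&b".gsub("&") { escaped }   # "a\\&b"

# Unrecognized sequences such as "\," are left alone, so commas are unaffected.
p "a,b".gsub(",", "\\,")        # "a\\,b"
```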
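
escaped_delimiter(d)

Return the escaped form of a delimiter: a single space for a tab, otherwise the delimiter prefixed with a backslash (nil if d is nil).
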
# File lib/traject/delimited_writer.rb, line 51
def escaped_delimiter(d)
  return nil if d.nil?
  d == "\t" ? ' ' : '\\' + d
end

internal_delimiter=(d)
# File lib/traject/delimited_writer.rb, line 62
def internal_delimiter=(d)
  @internal_delimiter = d
  @eidelim = escaped_delimiter(d)
end
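
output_values(raw)

Derive the output field values from the raw values. A multi-valued (Array) field has each value escaped and then joined with the internal delimiter; any other value is escaped directly.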

# File lib/traject/delimited_writer.rb, line 93
def output_values(raw)
  raw.map do |x|
    if x.is_a? Array
      # map (not map!) so the array in the output hash is left unmodified
      x.map { |s| escape(s) }.join(@internal_delimiter)
    else
      escape(x)
    end
  end
end
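
A standalone sketch of what `output_values` produces for one record. The `escape` lambda is a hypothetical simplification (tab delimiter, `|` internal delimiter), and the raw values are invented sample data.

```ruby
DELIM = "\t"
INTERNAL = "|"

# Simplified stand-in for escape: tab -> space, "|" -> "\|".
escape = ->(v) { v.to_s.gsub(DELIM) { " " }.gsub(INTERNAL) { "\\|" } }

# One raw value per configured field, as values_at would return them.
raw = [["001"], ["Title", "Sub|title"], nil]

# Arrays (multi-valued fields) are escaped element-wise and joined with the
# internal delimiter; anything else is escaped directly. map leaves raw untouched.
out = raw.map do |x|
  x.is_a?(Array) ? x.map { |s| escape.(s) }.join(INTERNAL) : escape.(x)
end

p out   # ["001", "Title|Sub\\|title", ""]
```

The escaped `\|` is what lets a reader tell the single value `Sub|title` apart from two values joined by `|`.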

raw_output_values(context)

Get the raw values for the configured fields from the context's output hash, in the order the fields were listed.

# File lib/traject/delimited_writer.rb, line 79
def raw_output_values(context)
  context.output_hash.values_at(*@fields)
end
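
`raw_output_values` relies on `Hash#values_at`, which returns values in the order the keys are given and `nil` for any absent key. A quick illustration, with a plain Hash standing in for `context.output_hash` (field names and values invented):

```ruby
# Plain Hash standing in for context.output_hash; field names are examples only.
output_hash = { "id" => ["001"], "title" => ["A title"] }
fields = ["title", "id", "isbn"]

p output_hash.values_at(*fields)   # [["A title"], ["001"], nil]
```

Columns therefore follow the order of `delimited_writer.fields`, and a field with no value for a record yields `nil`, which `escape` turns into an empty string.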

serialize(context)

Return the escaped output values for a record as an array, one entry per configured field. Joining them with the delimiter is left to _write.

# File lib/traject/delimited_writer.rb, line 105
def serialize(context)
  output_values(raw_output_values(context))
end

write_header()

Write the configured field names, unescaped, as a header row.

# File lib/traject/delimited_writer.rb, line 70
def write_header
  _write(@fields)
end
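
Putting the pieces together, a hypothetical, dependency-free `MiniDelimitedWriter` following the same flow (header from field names, then one escaped, delimited line per record) might look like the sketch below. It writes to a `StringIO` and takes Hashes in place of Traject contexts; neither reflects how Traject itself wires up output.

```ruby
require "stringio"

# Hypothetical stand-in for Traject::DelimitedWriter; not Traject's API.
class MiniDelimitedWriter
  attr_reader :out

  def initialize(fields, delimiter: "\t", internal_delimiter: "|", header: true)
    @fields = fields
    @delimiter = delimiter
    @internal_delimiter = internal_delimiter
    @out = StringIO.new
    write_line(@fields) if header   # header row: field names, not escaped
  end

  # `record` is a plain Hash standing in for context.output_hash.
  def put(record)
    values = record.values_at(*@fields).map do |x|
      x.is_a?(Array) ? x.map { |s| escape(s) }.join(@internal_delimiter) : escape(x)
    end
    write_line(values)
  end

  private

  def escaped(d)
    d == "\t" ? " " : "\\" + d
  end

  def escape(x)
    s = x.to_s.gsub(@delimiter) { escaped(@delimiter) }
    s.gsub(@internal_delimiter) { escaped(@internal_delimiter) }
  end

  def write_line(values)
    @out.puts(values.join(@delimiter))
  end
end

w = MiniDelimitedWriter.new(["id", "title"], delimiter: ",")
w.put({ "id" => ["001"], "title" => ["Cats, dogs", "Pets|animals"] })
print w.out.string
# id,title
# 001,Cats\, dogs|Pets\|animals
```

As with `write_header`, the header row is written straight from the field names without escaping, so field names themselves should not contain the delimiter.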