class Stockboy::Readers::CSV
Parse data from CSV into hashes
All standard ::CSV options are respected and passed through
@see http://www.ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html#DEFAULT_OPTIONS
Public Class Methods
new(opts={}, &block)
Initialize a new CSV reader
All stdlib ::CSV options are respected. @see ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html#method-c-new
@param [Hash] opts
Calls superclass method Stockboy::Reader::new
# File lib/stockboy/readers/csv.rb, line 70
def initialize(opts={}, &block)
  super
  @csv_options = opts.reject {|k,v| !::CSV::DEFAULT_OPTIONS.keys.include?(k) }
  @csv_options[:headers] = @csv_options.fetch(:headers, true)
  @skip_header_rows = opts.fetch(:skip_header_rows, 0)
  @skip_footer_rows = opts.fetch(:skip_footer_rows, 0)
  DSL.new(self).instance_eval(&block) if block_given?
end
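As a rough illustration of the option filtering in the constructor, the sketch below applies the same idea to a plain hash outside of Stockboy. The `opts` values are made up for the example, and `select` with `key?` is used here as an equivalent of the `reject` in the source; only the stdlib `csv` library is assumed.

```ruby
require "csv"

# Hypothetical options, mixing stdlib CSV keys with reader-specific keys.
opts = { col_sep: "|", skip_header_rows: 2, skip_footer_rows: 1 }

# Keep only the keys that the stdlib CSV recognises as parser options.
csv_options = opts.select { |key, _value| CSV::DEFAULT_OPTIONS.key?(key) }

# Default to treating the first row as headers unless the caller said otherwise.
csv_options[:headers] = csv_options.fetch(:headers, true)

# The row-skipping settings are read separately, with a default of 0.
skip_header_rows = opts.fetch(:skip_header_rows, 0)
skip_footer_rows = opts.fetch(:skip_footer_rows, 0)
```

With this split, the reader-specific keys never reach `::CSV`, which would otherwise reject them as unknown options.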
Public Instance Methods
options()
Hash of all CSV-specific options
@!attribute [r] options
@return [Hash]
# File lib/stockboy/readers/csv.rb, line 91
def options
  @csv_options
end
parse(data)
# File lib/stockboy/readers/csv.rb, line 79
def parse(data)
  chain = options[:header_converters] || []
  chain << proc{ |h| h.freeze }
  opts = options.merge(header_converters: chain)
  ::CSV.parse(sanitize(data), opts).map(&:to_hash)
end
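The core of `parse` can be approximated with the stdlib alone. The sketch below is not the reader itself: it skips `sanitize`, uses made-up sample data, and passes the options as keywords (`**opts`), which recent Ruby versions require in place of the positional hash shown in the source.

```ruby
require "csv"

# Hypothetical input; in the reader this would first pass through sanitize.
data = "name,email\nArthur,arthur@example.com\nFord,ford@example.com\n"

# Same idea as the source: append a converter that freezes each header string.
freeze_header = proc { |header| header.freeze }
opts = { headers: true, header_converters: [freeze_header] }

# Each CSV::Row becomes a plain Hash keyed by the header names.
rows = CSV.parse(data, **opts).map(&:to_hash)
```

Here `rows` holds one hash per data line, so `rows.first["name"]` returns `"Arthur"`.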
Private Instance Methods
row_end_index(data, skip_rows)
# File lib/stockboy/readers/csv.rb, line 127
def row_end_index(data, skip_rows)
  Array.new(skip_rows).inject(-1) { |i| data.rindex(/$/, i) - 1 }
end
row_start_index(data, skip_rows)
# File lib/stockboy/readers/csv.rb, line 123
def row_start_index(data, skip_rows)
  Array.new(skip_rows).inject(0) { |i| data.index(/$/, i) + 1 }
end
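To see how the two index helpers trim leading and trailing lines, the sketch below copies their bodies into standalone methods and runs them on a made-up report. The sample string is an assumption for illustration, and it has no trailing newline, mirroring the `chomp!` that `sanitize` applies before calling these helpers.

```ruby
# Standalone copies of the private helpers, for experimentation only.
def row_start_index(data, skip_rows)
  # Walk forward one line ending per skipped row, landing just past each newline.
  Array.new(skip_rows).inject(0) { |i| data.index(/$/, i) + 1 }
end

def row_end_index(data, skip_rows)
  # Walk backward one line ending per skipped row, landing just before each newline.
  Array.new(skip_rows).inject(-1) { |i| data.rindex(/$/, i) - 1 }
end

# Hypothetical input with one junk header line and one junk footer line.
data = "Report generated 2014\nname,age\nArthur,42\nFord,200\nTotal: 2"

from = row_start_index(data, 1)
to   = row_end_index(data, 1)
trimmed = data[from..to]
```

With zero rows to skip, the helpers return `0` and `-1` respectively, so `data[from..to]` leaves the string unchanged.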
sanitize(data)
Clean incoming data based on set encoding or best information
1. Assign the given input encoding setting if available
2. Scrub invalid characters for the encoding. (Scrubbing does not apply for BINARY input, which is undefined.)
3. Encode to UTF-8 with considerations for undefined input. The main issue is control characters that are absent in UTF-8 (and ISO-8859-1) but are common printable characters in Windows-1252, so we preserve this range as a best guess.
4. Delete null bytes that are inserted as terminators by some "CSV" output
5. Delete leading/trailing garbage lines based on settings
# File lib/stockboy/readers/csv.rb, line 109
def sanitize(data)
  data = data.dup
  data.force_encoding encoding if encoding
  data.scrub!
  data.encode! Encoding::UTF_8, universal_newline: true, fallback: proc { |c|
    c.force_encoding(Encoding::Windows_1252) if (127..159).cover? c.ord
  }
  data.delete! 0.chr
  data.chomp!
  from = row_start_index(data, skip_header_rows)
  to = row_end_index(data, skip_footer_rows)
  data[from..to]
end
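Steps 1 to 4 of the cleanup can be tried outside the reader with plain `String` methods. The sketch below assumes the input encoding is known to be Windows-1252, so it leaves out the `fallback` proc that the source uses for undefined input; the sample bytes are invented for the example.

```ruby
# Hypothetical raw bytes: Windows-1252 curly quotes (0x93, 0x94), a null
# terminator, and CRLF line endings.
raw = "name,note\r\nArthur,\x93hello\x94\x00\r\n".b

data = raw.dup
data.force_encoding(Encoding::Windows_1252)             # 1. assign the input encoding
data.scrub!                                             # 2. replace invalid byte sequences
data.encode!(Encoding::UTF_8, universal_newline: true)  # 3. to UTF-8, CRLF becomes LF
data.delete!(0.chr)                                     # 4. drop null bytes
data.chomp!                                             # trim the final newline
```

After these calls `data` is a UTF-8 string with typographic quotes around `hello`, Unix line endings, and no trailing newline, ready for the header and footer trimming in step 5.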