class CSVDiff
This library performs diffs of flat file content that contains structured data in fields, with rows provided in a parent-child format.
Parent-child data does not lend itself well to standard text diffs, as small changes in the organisation of the tree at an upper level (e.g. re-ordering of two ancestor nodes) can lead to large shifts in the position of descendant records, particularly when the parent-child data is generated by a hierarchy traversal.
Additionally, simple line-based diffs can identify that a line has changed, but not which field(s) in the line have changed.
Data may be supplied in the form of CSV files, or as an array of arrays. The diff process provides a fine level of control over what to diff, and can optionally ignore certain types of changes (e.g. changes in order).
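To illustrate why matching rows by key rather than by line position helps, here is a minimal sketch in plain Ruby. It is not the csv-diff implementation: the helper name `field_diff`, the single key column, the shared header row and the action labels are all assumptions made for this example.

```ruby
# Minimal sketch (not the csv-diff implementation): diff two tables supplied
# as arrays of arrays, matching rows by a key column instead of by line
# position, and reporting which fields changed. Assumes both tables share
# the same header row, given as the first row.
def field_diff(left, right, key_index)
  header = left.first
  # Index the data rows (header skipped) by their key value.
  left_rows  = left.drop(1).to_h { |row| [row[key_index], row] }
  right_rows = right.drop(1).to_h { |row| [row[key_index], row] }

  diffs = {}
  left_rows.each do |key, row|
    other = right_rows[key]
    if other.nil?
      diffs[key] = { action: 'Delete' }
    else
      # Names of the fields whose values differ between the two rows.
      changed = header.each_index.reject { |i| row[i] == other[i] }.map { |i| header[i] }
      diffs[key] = { action: 'Update', fields: changed } unless changed.empty?
    end
  end
  right_rows.each_key do |key|
    diffs[key] = { action: 'Add' } unless left_rows.key?(key)
  end
  diffs
end
```

With `left = [%w[id name qty], %w[1 apple 5], %w[2 pear 3], %w[3 plum 1]]` and `right = [%w[id name qty], %w[3 plum 1], %w[1 apple 6], %w[4 fig 2]]`, `field_diff(left, right, 0)` reports only that row 1 changed its `qty` field, row 2 was deleted and row 4 was added. Row 3 moved from the last line to the first, which a line-based diff may report as a deletion plus an insertion, but the keyed comparison reports nothing for it.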
Attributes
child_fields
@return [Array<String>] An array of field names for the child field(s).
diff_fields
@return [Array<String>] An array of field names that are compared in the diff process.
diffs
@return [Array<Hash>] An array of differences.
from
@return [CSVSource] CSVSource object containing details of the left/from input.
key_fields
@return [Array<String>] An array of field names of the key fields that uniquely identify each row.
left
@return [CSVSource] CSVSource object containing details of the left/from input.
options
@return [Hash] The options hash used for the diff.
parent_fields
@return [Array<String>] An array of field names for the parent field(s).
right
@return [CSVSource] CSVSource object containing details of the right/to input.
to
@return [CSVSource] CSVSource object containing details of the right/to input.
Public Class Methods
new(left, right, options = {})
Generates a diff between two hierarchical tree structures, provided as left and right, each of which consists of an array of lines in CSV format. An array of field indexes can also be specified as key_fields. At least one index must be given: the last index is the child id, and any preceding indexes are the parent field(s) that, together with the child id, uniquely identify the child instance.
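The key convention can be sketched as follows, assuming each row is an Array of fields and key_fields is an Array of integer indexes. `split_key`, `group_siblings` and the `~` separator used to join parent values are hypothetical, chosen only for this illustration.

```ruby
# Hypothetical helper: split a row's key into [parent_key, child_id] using
# the convention that the last key index is the child id and any earlier
# indexes are parent fields. The '~' join separator is an arbitrary choice.
def split_key(row, key_fields)
  raise ArgumentError, 'at least one key field index is required' if key_fields.empty?

  *parent_indexes, child_index = key_fields
  parent_key = parent_indexes.map { |i| row[i] }.join('~')
  [parent_key, row[child_index]]
end

# Group child ids under their parent key, preserving sibling order.
def group_siblings(rows, key_fields)
  siblings = Hash.new { |h, k| h[k] = [] }
  rows.each do |row|
    parent, child = split_key(row, key_fields)
    siblings[parent] << child
  end
  siblings
end
```

Calling `group_siblings` with key indexes `[0, 1, 2]` on rows for Paris, Tokyo and Lyon groups Paris and Lyon (in that order) under `Europe~France`. Because keys are built from field values rather than line positions, moving the Asia rows elsewhere in the file leaves every key unchanged.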
@param left [Array|String|CSVSource] An Array of lines, each of which is an Array of fields, or a String specifying a path to a CSV file, or a CSVSource object.
@param right [Array|String|CSVSource] An Array of lines, each of which is an Array of fields, or a String specifying a path to a CSV file, or a CSVSource object.
@param options [Hash] A hash containing options.
@option options [String] :encoding The encoding to use when opening the CSV files.
@option options [Array<String>] :field_names An Array of field names for each field in +left+ and +right+. If not provided, the first row is assumed to contain field names.
@option options [Boolean] :ignore_header If true, the first line of each file is ignored. This option can only be true if :field_names is specified.
@option options [Array] :ignore_fields The names (or indexes) of any fields to be ignored when performing the diff.
@option options [Boolean] :diff_common_fields_only If true, only fields present in both sources are diffed; otherwise all fields from either source are diffed.
@option options [String] :key_field The name of the field that uniquely identifies each row.
@option options [Array<String>] :key_fields The names of the fields that uniquely identify each row.
@option options [String] :parent_field The name of the field that identifies a parent within which sibling order should be checked.
@option options [String] :child_field The name of the field that uniquely identifies a child of a parent.
@option options [Boolean] :ignore_adds If true, records that appear in the right/to file but not in the left/from file are not reported.
@option options [Boolean] :ignore_updates If true, records that have been updated are not reported.
@option options [Boolean] :ignore_moves If true, changes in row position amongst sibling rows are not reported.
@option options [Boolean] :ignore_deletes If true, records that appear in the left/from file but not in the right/to file are not reported.
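The :parent_field/:child_field and :ignore_* options can be illustrated with a small sketch. `detect_moves` uses a simple positional comparison among siblings that exist on both sides, and `filter_diffs` drops actions whose :ignore_* flag is set. Both names, the `'Move'`-style action labels, and the move algorithm itself are assumptions for illustration, not csv-diff's internals.

```ruby
# Sketch only: a simple positional way to detect sibling moves; not
# necessarily the algorithm csv-diff uses. left_siblings and right_siblings
# map a parent key to its ordered list of child ids.
def detect_moves(left_siblings, right_siblings)
  moves = []
  left_siblings.each do |parent, left_children|
    right_children = right_siblings.fetch(parent, [])
    # Keep only children present on both sides (Array#& keeps the receiver's
    # order), so adds and deletes do not register as moves.
    left_order  = left_children & right_children
    right_order = right_children & left_children
    left_order.each_with_index do |child, from|
      to = right_order.index(child)
      moves << { action: 'Move', parent: parent, child: child, from: from, to: to } if from != to
    end
  end
  moves
end

# Hypothetical mapping from a diff action to the option that suppresses it.
IGNORE_OPTION_FOR = {
  'Add' => :ignore_adds,
  'Delete' => :ignore_deletes,
  'Update' => :ignore_updates,
  'Move' => :ignore_moves
}.freeze

# Drop any diff whose action the caller has asked to ignore.
def filter_diffs(diffs, options = {})
  diffs.reject { |d| options[IGNORE_OPTION_FOR[d[:action]]] }
end
```

With siblings `A B C D` on the left and `B A C E` on the right, this sketch reports both A and B as moved: a single swap shifts two positions. A longest-common-subsequence approach would report fewer moves at the cost of more work.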
# File lib/csv-diff/csv_diff.rb, line 83
def initialize(left, right, options = {})
  @left = left.is_a?(Source) ? left : CSVSource.new(left, options)
  @left.index_source if @left.lines.nil?
  raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
  @right = right.is_a?(Source) ? right : CSVSource.new(right, options)
  @right.index_source if @right.lines.nil?
  raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
  @warnings = []
  @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options)
  @key_fields = @left.key_fields
  diff(options)
end
Public Instance Methods
Performs a diff with the specified options, replacing any previous diffs and clearing the cached summary.
# File lib/csv-diff/csv_diff.rb, line 98
def diff(options = {})
  @summary = nil
  @options = options
  @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
end
@return [Array<String>] an array of warning messages from the diff process.
# File lib/csv-diff/csv_diff.rb, line 132
def diff_warnings
  @warnings
end
Returns a summary of the number of adds, deletes, moves, and updates, keyed by action, plus a 'Warning' count when any warnings exist.
# File lib/csv-diff/csv_diff.rb, line 106
def summary
  unless @summary
    @summary = Hash.new { |h, k| h[k] = 0 }
    @diffs.each { |k, v| @summary[v[:action]] += 1 }
    @summary['Warning'] = warnings.size if warnings.size > 0
  end
  @summary
end
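The tally in summary can be tried in isolation with the standalone sketch below. It assumes, as the `|k, v|` block in the method suggests, that the diffs are held in a Hash keyed by row key whose values carry an :action entry; `summarize` is a hypothetical name.

```ruby
# Standalone sketch of the tally performed by summary: count diffs by their
# :action value, adding a 'Warning' count only when warnings exist.
# Assumes diffs is a Hash of row key => { action: ..., ... }.
def summarize(diffs, warnings = [])
  summary = Hash.new { |h, k| h[k] = 0 }
  diffs.each_value { |d| summary[d[:action]] += 1 }
  summary['Warning'] = warnings.size if warnings.size > 0
  summary
end
```

Because the Hash uses a default block, reading the count for an action with no diffs (for example `summary['Move']`) returns 0, and also stores that key in the Hash.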
@return [Array<String>] an array of warning messages generated from the sources and the diff process.
# File lib/csv-diff/csv_diff.rb, line 126
def warnings
  @left.warnings + @right.warnings + @warnings
end
Private Instance Methods
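Given two sets of field names, determines the fields on which rows are diffed. By default this is every field found in either source; if the :diff_common_fields_only option is set, only fields present in both sources are used. Any fields listed in :ignore_fields (by name, compared case-insensitively, or by index into the right-hand field names) are excluded.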
# File lib/csv-diff/csv_diff.rb, line 142
def get_diff_fields(left_fields, right_fields, options)
  ignore_fields = options.fetch(:ignore_fields, [])
  ignore_fields = [ignore_fields] unless ignore_fields.is_a?(Array)
  # Numeric entries are indexes into right_fields; names compare case-insensitively.
  # Use map (not map!) so the caller's :ignore_fields array is not modified in place.
  ignore_fields = ignore_fields.map do |f|
    (f.is_a?(Numeric) ? right_fields[f] : f).upcase
  end
  diff_fields = []
  if options[:diff_common_fields_only]
    # Only fields present in both sources, in right-hand order.
    right_fields.each do |fld|
      if left_fields.include?(fld)
        diff_fields << fld unless ignore_fields.include?(fld.upcase)
      end
    end
  else
    # Union of both sources' fields, right-hand fields first.
    diff_fields = (right_fields + left_fields).uniq.reject { |fld| ignore_fields.include?(fld.upcase) }
  end
  diff_fields
end
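For experimenting with these rules outside the class, the selection logic can be restated as a standalone function. `select_diff_fields` is a hypothetical name; unlike the method above, it also converts each ignore entry with `to_s`, so an out-of-range index is simply ignored instead of raising on `nil`.

```ruby
# Standalone restatement of the field-selection rules, for experimentation.
# Hypothetical helper; does not modify the caller's :ignore_fields array.
def select_diff_fields(left_fields, right_fields, options = {})
  ignore = Array(options.fetch(:ignore_fields, []))
  # Integer entries index into right_fields; names compare case-insensitively.
  ignore = ignore.map { |f| (f.is_a?(Numeric) ? right_fields[f] : f).to_s.upcase }
  candidates =
    if options[:diff_common_fields_only]
      right_fields.select { |fld| left_fields.include?(fld) }
    else
      (right_fields + left_fields).uniq
    end
  candidates.reject { |fld| ignore.include?(fld.upcase) }
end
```

For left fields `Id Name Qty Note` and right fields `Id Name Qty Price`, the default selection is `Id Name Qty Price Note`, the common-only selection is `Id Name Qty`, and `ignore_fields: ['qty', 3]` removes `Qty` by name and `Price` by index.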