class CSVDiff

This library performs diffs of flat file content that contains structured data in fields, with rows provided in a parent-child format.

Parent-child data does not lend itself well to standard text diffs, as small changes in the organisation of the tree at an upper level (e.g. re-ordering of two ancestor nodes) can lead to big movements in the position of descendant records - particularly when the parent-child data is generated by a hierarchy traversal.

Additionally, simple line-based diffs can identify that a line has changed, but not which field(s) in the line have changed.

Data may be supplied in the form of CSV files, or as an array of arrays. The diff process provides a fine level of control over what to diff, and can optionally ignore certain types of changes (e.g. changes in order).
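To illustrate why a keyed, field-level comparison copes with re-ordered rows where a line-based diff does not, here is a minimal sketch. It does not use CSVDiff itself: `keyed_field_diff` is a hypothetical helper, the action labels are assumed for illustration, and the input is an array of arrays whose first row holds the field names.

```ruby
# Hypothetical helper, not part of CSVDiff: compares two tables on a key field.
# Each table is an array of arrays whose first row holds the field names.
def keyed_field_diff(left, right, key_field)
  # Index the data rows of a table by the value of the key field
  index = lambda do |rows|
    header, *data = rows
    data.each_with_object({}) do |row, by_key|
      record = header.zip(row).to_h
      by_key[record[key_field]] = record
    end
  end
  from = index.call(left)
  to = index.call(right)

  changes = {}
  (from.keys | to.keys).each do |key|
    if !from.key?(key)
      changes[key] = { action: 'Add' }
    elsif !to.key?(key)
      changes[key] = { action: 'Delete' }
    else
      # Report only the fields whose values differ
      changed = from[key].keys.select { |f| from[key][f] != to[key][f] }
      changes[key] = { action: 'Update', fields: changed } unless changed.empty?
    end
  end
  changes
end

left = [
  %w[ID Name Email],
  %w[1 Ann ann@example.com],
  %w[2 Bob bob@example.com],
  %w[3 Cy cy@example.com]
]
right = [
  %w[ID Name Email],
  %w[3 Cy cy@example.com],    # moved to the top, no field changed
  %w[1 Ann ann@example.org],  # Email changed
  %w[4 Dee dee@example.com]   # new record
]

changes = keyed_field_diff(left, right, 'ID')
changes.each { |key, change| puts "#{key}: #{change.inspect}" }
```

Row 3 changes position but produces no entry, because rows are matched on the key rather than on line number; row 1 is reported together with the name of the field that changed.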

Attributes

child_fields[R]

@return [Array<String>] An array of field names for the child field(s).

diff_fields[R]

@return [Array<String>] An array of field names that are compared in the diff process.

diffs[R]

@return [Array<Hash>] An array of differences.

from[R]

@return [CSVSource] CSVSource object containing details of the left/from input.

key_fields[R]

@return [Array<String>] An array of field names of the key fields that uniquely identify each row.

left[R]

@return [CSVSource] CSVSource object containing details of the left/from input.

options[R]

@return [Hash] The options hash used for the diff.

parent_fields[R]

@return [Array<String>] An array of field names for the parent field(s).

right[R]

@return [CSVSource] CSVSource object containing details of the right/to input.

to[R]

@return [CSVSource] CSVSource object containing details of the right/to input.

Public Class Methods

new(left, right, options = {})

Generates a diff between two hierarchical tree structures, provided as left and right, each of which consists of an array of lines in CSV format. An array of field indexes can also be specified as key_fields. At least one field index must be specified: the last index is the child id, and the remaining fields (if any) are the parent field(s) that uniquely qualify the child instance.

@param left [Array|String|CSVSource] An Array of lines, each of which is an Array of fields, or a String specifying a path to a CSV file, or a CSVSource object.

@param right [Array|String|CSVSource] An Array of lines, each of which is an Array of fields, or a String specifying a path to a CSV file, or a CSVSource object.

@param options [Hash] A hash containing options.

@option options [String] :encoding The encoding to use when opening the CSV files.

@option options [Array<String>] :field_names An Array of field names for each field in +left+ and +right+. If not provided, the first row is assumed to contain field names.

@option options [Boolean] :ignore_header If true, the first line of each file is ignored. This option can only be true if :field_names is specified.

@option options [Array] :ignore_fields The names of any fields to be ignored when performing the diff.

@option options [String] :key_field The name of the field that uniquely identifies each row.

@option options [Array<String>] :key_fields The names of the fields that uniquely identify each row.

@option options [String] :parent_field The name of the field that identifies a parent within which sibling order should be checked.

@option options [String] :child_field The name of the field that uniquely identifies a child of a parent.

@option options [Boolean] :ignore_adds If true, records that appear in the right/to file but not in the left/from file are not reported.

@option options [Boolean] :ignore_updates If true, records that have been updated are not reported.

@option options [Boolean] :ignore_moves If true, changes in row position amongst sibling rows are not reported.

@option options [Boolean] :ignore_deletes If true, records that appear in the left/from file but not in the right/to file are not reported.

# File lib/csv-diff/csv_diff.rb, line 83
def initialize(left, right, options = {})
    @left = left.is_a?(Source) ? left : CSVSource.new(left, options)
    @left.index_source if @left.lines.nil?
    raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
    @right = right.is_a?(Source) ? right : CSVSource.new(right, options)
    @right.index_source if @right.lines.nil?
    raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
    @warnings = []
    @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options)
    @key_fields = @left.key_fields
    diff(options)
end
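The relationship between key fields, parent fields and the child field described for the constructor can be sketched as follows. This is an illustration under stated assumptions, not CSVDiff's internal code: the `~` separator, the sample rows and all variable names are hypothetical.

```ruby
# Hypothetical illustration, not CSVDiff internals: split the key fields into
# parent and child parts, then build a composite key for each row.
key_fields = [0, 1]                 # field indexes: parent first, child id last
parent_fields = key_fields[0...-1]  # every index except the last
child_fields = key_fields.last(1)   # the last index is the child id

rows = [
  ['Fruit', 'Apple', 'red'],
  ['Fruit', 'Pear', 'green'],
  ['Veg', 'Pear', 'n/a']            # same child id under a different parent
]

# The child id alone is ambiguous; parent + child identifies each row
child_only = rows.map { |row| row.values_at(*child_fields).join('~') }
full_keys = rows.map { |row| row.values_at(*key_fields).join('~') }

# Sibling order is only meaningful within a single parent
siblings = rows.group_by { |row| row.values_at(*parent_fields) }
               .transform_values { |group| group.map { |row| row.values_at(*child_fields).first } }

puts "child ids unique?  #{child_only.uniq.size == rows.size}"
puts "full keys unique?  #{full_keys.uniq.size == rows.size}"
siblings.each { |parent, children| puts "#{parent.join('~')}: #{children.join(', ')}" }
```

Grouping by the parent field(s) is what allows a change in position to be judged amongst sibling rows only, rather than against the position of every row in the file.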

Public Instance Methods

diff(options = {})

Performs a diff with the specified options.

# File lib/csv-diff/csv_diff.rb, line 98
def diff(options = {})
    @summary = nil
    @options = options
    @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
end

diff_warnings()

@return [Array<String>] an array of warning messages from the diff process.

# File lib/csv-diff/csv_diff.rb, line 132
def diff_warnings
    @warnings
end

summary()

Returns a summary of the number of adds, deletes, moves, and updates.

# File lib/csv-diff/csv_diff.rb, line 106
def summary
    unless @summary
        @summary = Hash.new{ |h, k| h[k] = 0 }
        @diffs.each{ |k, v| @summary[v[:action]] += 1 }
        @summary['Warning'] = warnings.size if warnings.size > 0
    end
    @summary
end
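The counting idiom used by summary can be tried in isolation. The sketch below assumes that diffs is a collection keyed by row, with each value carrying an :action entry (as the two-argument block in the listing suggests); the action labels, keys and warning text are made up for illustration.

```ruby
# Illustrative diff results keyed by row key; the action labels are assumed
diffs = {
  '1' => { action: 'Update' },
  '2' => { action: 'Delete' },
  '4' => { action: 'Add' },
  '5' => { action: 'Add' }
}
warnings = ['Example warning message']

# A Hash with a default block lets each action be counted without pre-seeding keys
summary = Hash.new { |hash, key| hash[key] = 0 }
diffs.each { |_key, diff| summary[diff[:action]] += 1 }
summary['Warning'] = warnings.size if warnings.size > 0

puts summary.inspect

# fetch reads a count without storing a new key for an action that never occurred
moves = summary.fetch('Move', 0)
puts "moves: #{moves}"
```

Because the default block assigns into the hash, reading `summary['Move']` for an action that never occurred returns 0 and also stores that key; `fetch` with a default avoids the side effect.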

warnings()

@return [Array<String>] an array of warning messages generated from the sources and the diff process.

# File lib/csv-diff/csv_diff.rb, line 126
def warnings
    @left.warnings + @right.warnings + @warnings
end

Private Instance Methods

get_diff_fields(left_fields, right_fields, options)

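Given two sets of field names, determines the fields on which members can be diffed. By default this is every field present in either source; if :diff_common_fields_only is set, only fields present in both are used. Fields listed in :ignore_fields (by name, matched case-insensitively, or by index into the right field names) are excluded.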

# File lib/csv-diff/csv_diff.rb, line 142
def get_diff_fields(left_fields, right_fields, options)
    ignore_fields = options.fetch(:ignore_fields, [])
    ignore_fields = [ignore_fields] unless ignore_fields.is_a?(Array)
    ignore_fields.map! do |f|
        (f.is_a?(Numeric) ? right_fields[f] : f).upcase
    end
    diff_fields = []
    if options[:diff_common_fields_only]
        right_fields.each_with_index do |fld, i|
            if left_fields.include?(fld)
                diff_fields << fld unless ignore_fields.include?(fld.upcase)
            end
        end
    else
        diff_fields = (right_fields + left_fields).uniq.reject{ |fld| ignore_fields.include?(fld.upcase) }
    end
    diff_fields
end
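The effect of the options on the resulting field list is easiest to see with sample data. The sketch below is a condensed, standalone restatement of the logic in the listing above (it uses `Array()` and `select` in place of the explicit loop), so that it can be run outside the class; the field names are invented for the example.

```ruby
# Condensed standalone restatement of the private method above, for experimentation
def get_diff_fields(left_fields, right_fields, options)
  # Accept a single name/index or an array; numeric entries index into right_fields
  ignore_fields = Array(options.fetch(:ignore_fields, [])).map do |f|
    (f.is_a?(Numeric) ? right_fields[f] : f).upcase
  end
  candidates =
    if options[:diff_common_fields_only]
      right_fields.select { |fld| left_fields.include?(fld) }
    else
      (right_fields + left_fields).uniq
    end
  candidates.reject { |fld| ignore_fields.include?(fld.upcase) }
end

left = %w[ID Name Email]
right = %w[ID Name Phone]

all_fields = get_diff_fields(left, right, {})
without_name = get_diff_fields(left, right, ignore_fields: 'name')
common_only = get_diff_fields(left, right, diff_common_fields_only: true)
by_index = get_diff_fields(left, right, ignore_fields: [2])

puts all_fields.inspect
puts without_name.inspect
puts common_only.inspect
puts by_index.inspect
```

Note that fields from the right source come first in the default (union) result, and that a numeric entry in :ignore_fields is resolved against the right field names only.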