class Traject::MarcExtractor::Spec

Constants

CONTROLFIELD_PATTERN
DATAFIELD_PATTERN

Converts a string MARC spec such as “008:245abc:700a” into the hash used internally to represent the specification (this is the behavior of hash_from_string, listed under Public Class Methods). See the comments at the head of the class for documentation of the string specification format.

## Return value

The returned hash is keyed by tag. Each value is an array of zero or more MarcExtractor::Spec objects representing the extraction operations specified for that tag.

Each value is an array because more than one extraction can be specified for the same tag: for instance, “245a:245abc”.

See tests for more examples.
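The following is a minimal sketch of the shape of the returned hash. It uses a hypothetical SimpleSpec struct in place of the real Spec class. Its parser is deliberately simplified: each part is a three-character tag followed by optional subfield codes, with no indicators, byte ranges, or validation. It is not traject's parsing logic. It only shows how repeated tags accumulate into one array. Like hash_from_string, it accepts either a string or an array of strings.

```ruby
# Hypothetical stand-in for Spec: only the tag and subfield codes are modeled.
SimpleSpec = Struct.new(:tag, :subfields)

# Illustrates the SHAPE of hash_from_string's result, not its parsing rules.
# Assumes each part is a 3-character tag plus optional subfield codes.
def group_specs(spec_string)
  hash = {}
  parts = Array(spec_string).flat_map { |s| s.split(/\s*:\s*/) }
  parts.each do |part|
    tag = part[0, 3]
    codes = part[3..-1]
    # No subfield codes => nil, meaning "all subfields"
    subfields = codes.empty? ? nil : codes.split('')
    (hash[tag] ||= []) << SimpleSpec.new(tag, subfields)
  end
  hash
end

result = group_specs("245a:245abc : 700a")
p result.keys                       # => ["245", "700"]
p result["245"].map(&:subfields)    # => [["a"], ["a", "b", "c"]]
p group_specs(["008", "245a"]).keys # => ["008", "245"]
```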

Attributes

byte1[R]
byte2[R]
bytes[R]
indicator1[R]
indicator2[R]
subfields[RW]
tag[RW]

Public Class Methods

create_controlfield_spec(tag, byte1, byte2)

Create a new controlfield spec

# File lib/traject/marc_extractor_spec.rb, line 218
def self.create_controlfield_spec(tag, byte1, byte2)
  spec = Spec.new(:tag => tag.freeze)
  spec.set_bytes(byte1.freeze, byte2.freeze)
  spec
end
create_datafield_spec(tag, ind1, ind2, subfields)

Create a new datafield spec. Most of the logic about how to deal with special characters is built into the Spec class.

# File lib/traject/marc_extractor_spec.rb, line 204
def self.create_datafield_spec(tag, ind1, ind2, subfields)
  spec            = Spec.new(:tag => tag)
  spec.indicator1 = ind1.freeze
  spec.indicator2 = ind2.freeze

  if subfields and !subfields.empty?
    spec.subfields = subfields.split('')
  end

  spec

end
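Here is a minimal sketch of the input normalization that create_datafield_spec performs together with the indicator setters. It uses a hypothetical DatafieldSketch struct rather than Spec. A '*' indicator is stored as nil, which acts as a wildcard. A subfield string such as "abc" is split into single-character codes. A missing or empty subfield string leaves subfields nil, which the spec treats as "all subfields".

```ruby
# Hypothetical stand-in for Spec with only the fields the factory sets.
DatafieldSketch = Struct.new(:tag, :indicator1, :indicator2, :subfields)

# '*' means "any indicator" and is stored as nil, like indicator1=/indicator2=.
def normalize_indicator(ind)
  ind == '*' ? nil : ind.freeze
end

def datafield_sketch(tag, ind1, ind2, subfields)
  spec = DatafieldSketch.new(tag, normalize_indicator(ind1), normalize_indicator(ind2), nil)
  # Only a non-empty subfield string is split into single-character codes.
  spec.subfields = subfields.split('') if subfields && !subfields.empty?
  spec
end

spec = datafield_sketch("245", "1", "*", "abc")
p spec.indicator1 # => "1"
p spec.indicator2 # => nil
p spec.subfields  # => ["a", "b", "c"]
p datafield_sketch("650", nil, nil, "").subfields # => nil
```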
hash_from_string(spec_string)
# File lib/traject/marc_extractor_spec.rb, line 168
def self.hash_from_string(spec_string)
  # Keyed by tag; each value is an array of Spec objects, created on first use below
  hash = Hash.new

  # Split the string(s) given on colon
  spec_strings = spec_string.is_a?(Array) ? spec_string.map { |s| s.split(/\s*:\s*/) }.flatten : spec_string.split(/\s*:\s*/)

  spec_strings.each do |part|
    if m = DATAFIELD_PATTERN.match(part)

      tag, ind1, ind2, subfields = m[1], m[3], m[4], m[5]

      spec = create_datafield_spec(tag, ind1, ind2, subfields)

      hash[spec.tag] ||= []
      hash[spec.tag] << spec

    elsif m = CONTROLFIELD_PATTERN.match(part)
      tag, byte1, byte2 = m[1], m[3], m[5]

      spec = create_controlfield_spec(tag, byte1, byte2)

      hash[spec.tag] ||= []
      hash[spec.tag] << spec
    else
      raise ArgumentError.new("Unrecognized marc extract specification: #{part}")
    end
  end

  return hash
end
new(hash = nil)

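Allows a hash to be used for initialization: each key/value pair is passed to the matching setter (for example, :tag => '245' calls tag=). This should be replaced with optional keyword arguments once users move to Ruby 2.x syntax.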

# File lib/traject/marc_extractor_spec.rb, line 77
def initialize(hash = nil)
  if hash
    hash.each_pair do |key, value|
      self.send("#{key}=", value)
    end
  end
end
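The hash-to-setter pattern used by initialize is sketched below with a hypothetical HashInitialized class. The sketch uses public_send, which only reaches public setters; Spec itself uses send. In both cases an unrecognized key raises NoMethodError, because no matching setter exists.

```ruby
# Hypothetical class using the same pattern as Spec#initialize.
class HashInitialized
  attr_accessor :tag, :subfields

  def initialize(hash = nil)
    return unless hash
    # Each key is dispatched to the matching "key=" setter.
    hash.each_pair do |key, value|
      public_send("#{key}=", value)
    end
  end
end

obj = HashInitialized.new(:tag => "245", :subfields => ["a"])
p obj.tag       # => "245"
p obj.subfields # => ["a"]

begin
  HashInitialized.new(:bogus => 1)
rescue NoMethodError => e
  puts "rejected unknown key (#{e.class})"
end
```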

Public Instance Methods

==(spec)

Simple equality definition

# File lib/traject/marc_extractor_spec.rb, line 138
def ==(spec)
  return false unless spec.kind_of?(Spec)

  return (self.tag == spec.tag) &&
      (self.subfields == spec.subfields) &&
      (self.indicator1 == spec.indicator1) &&
      (self.indicator2 == spec.indicator2) &&
      (self.bytes == spec.bytes)
end
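This page lists only == for the class, not eql? or hash. Assuming Spec defines neither elsewhere, equal specs compare as equal wherever Ruby uses == but not where it uses eql?/hash. The sketch below shows this with a hypothetical EqSketch class that, like Spec, overrides only ==. Array#include? finds an equal copy, while Array#uniq keeps both.

```ruby
# Hypothetical class that, like Spec, overrides == but not eql? or hash.
class EqSketch
  attr_reader :tag, :subfields

  def initialize(tag, subfields)
    @tag = tag
    @subfields = subfields
  end

  # Field-by-field comparison, in the style of Spec#==
  def ==(other)
    return false unless other.kind_of?(EqSketch)
    tag == other.tag && subfields == other.subfields
  end
end

a = EqSketch.new("245", ["a"])
b = EqSketch.new("245", ["a"])
p(a == b)           # => true
p([a].include?(b))  # => true  (include? uses ==)
p([a, b].uniq.size) # => 2     (uniq uses hash/eql?, not ==)
```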
byte1=(byte1)
# File lib/traject/marc_extractor_spec.rb, line 104
def byte1=(byte1)
  @byte1 = byte1.to_i if byte1
  set_bytes(@byte1, @byte2)
end
byte2=(byte2)
# File lib/traject/marc_extractor_spec.rb, line 109
def byte2=(byte2)
  @byte2 = byte2.to_i if byte2
  set_bytes(@byte1, @byte2)
end
includes_subfield_code?(code)

Pass in a string subfield code like 'a'; does this spec include it?

# File lib/traject/marc_extractor_spec.rb, line 132
def includes_subfield_code?(code)
  # subfields nil means include them all
  self.subfields.nil? || self.subfields.include?(code)
end
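The following is a standalone sketch of the same check, with the subfield list passed in explicitly. The function signature is hypothetical. The filtering example uses made-up [code, value] pairs rather than a real MARC field object.

```ruby
# Standalone version of the check; the signature is hypothetical.
# A nil subfield list means every subfield code is included.
def includes_subfield_code?(subfields, code)
  subfields.nil? || subfields.include?(code)
end

p includes_subfield_code?(nil, "z")             # => true
p includes_subfield_code?(["a", "b", "c"], "b") # => true
p includes_subfield_code?(["a", "b", "c"], "z") # => false

# Made-up [code, value] pairs standing in for a field's subfields.
pairs = [["a", "Main title"], ["b", "subtitle"], ["c", "statement of responsibility"]]
picked = pairs.select { |code, _value| includes_subfield_code?(["a", "b"], code) }.map(&:last)
p picked # => ["Main title", "subtitle"]
```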
indicator1=(ind1)
# File lib/traject/marc_extractor_spec.rb, line 96
def indicator1=(ind1)
  @indicator1 = (ind1 == '*' ? nil : ind1.freeze)
end
indicator2=(ind2)
# File lib/traject/marc_extractor_spec.rb, line 100
def indicator2=(ind2)
  @indicator2 = (ind2 == '*' ? nil : ind2.freeze)
end
joinable?()
Should the extracted subfields be joined, if a separator is given?
* '630' no subfields specified => join all subfields
* '630abc' multiple subfields specified => join all subfields
* '633a' one subfield => do not join; return one value for each $a in the field
* '633aa' one subfield, doubled => join after all, returning a single string that joins the values of all the $a's.

The last case is currently handled implicitly: subfields == ['a', 'a'] has size 2, so joinable? returns true.

# File lib/traject/marc_extractor_spec.rb, line 92
def joinable?
  (self.subfields.nil? || self.subfields.size != 1)
end
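Below is a minimal sketch of how joinable? could drive extraction. It assumes a hypothetical collect_values helper that receives the subfield values matched in a single field and an optional separator. The real extractor's code is not shown on this page. Values are joined only when a separator is given and the spec is joinable.

```ruby
# Same rule as Spec#joinable?: join unless exactly one subfield code is listed.
def joinable?(subfields)
  subfields.nil? || subfields.size != 1
end

# Hypothetical helper: values are the subfield values matched in ONE field.
# Joining happens only if a separator is given and the spec is joinable.
def collect_values(values, subfields, separator)
  if separator && joinable?(subfields)
    [values.join(separator)]
  else
    values
  end
end

matched = ["first", "second"] # made-up matched subfield values
p collect_values(matched, ["a"], " ")      # => ["first", "second"]
p collect_values(matched, ["a", "a"], " ") # => ["first second"]
p collect_values(matched, nil, " ")        # => ["first second"]
p collect_values(matched, nil, nil)        # => ["first", "second"]
```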
matches_indicators?(field)

Pass in a MARC field: do its indicators match the indicators in this spec? A nil indicator in the spec means "don't care", so any value matches.

# File lib/traject/marc_extractor_spec.rb, line 125
def matches_indicators?(field)
  return (indicator1.nil? || indicator1 == field.indicator1) &&
      (indicator2.nil? || indicator2 == field.indicator2)
end
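Here is a standalone sketch of the indicator check. It uses a hypothetical FieldSketch struct in place of a real MARC field object, since only indicator1 and indicator2 are needed.

```ruby
# Hypothetical stand-in for a MARC field; only the indicators are used.
FieldSketch = Struct.new(:tag, :indicator1, :indicator2)

# Same rule as Spec#matches_indicators?: a nil spec indicator matches anything.
def matches_indicators?(spec_ind1, spec_ind2, field)
  (spec_ind1.nil? || spec_ind1 == field.indicator1) &&
    (spec_ind2.nil? || spec_ind2 == field.indicator2)
end

field = FieldSketch.new("245", "1", "0")
p matches_indicators?("1", nil, field) # => true
p matches_indicators?(nil, nil, field) # => true
p matches_indicators?("0", nil, field) # => false
p matches_indicators?("1", "4", field) # => false
```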
set_bytes(byte1, byte2)
# File lib/traject/marc_extractor_spec.rb, line 114
def set_bytes(byte1, byte2)
  if byte1 && byte2
    @bytes = ((byte1.to_i)..(byte2.to_i))
  elsif byte1
    @bytes = byte1.to_i
  end
end
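The sketch below shows what set_bytes stores: an inclusive Range when both positions are given, and an Integer when only byte1 is given. It assumes, without confirmation from this page, that the extractor applies bytes to a control field value with Ruby's String#[], which accepts both forms. In MARC 21 bibliographic records, 008 positions 35-37 hold the language code. The sample value below is made up.

```ruby
# Same branching as Spec#set_bytes, returning the value instead of storing it.
# With neither position given this returns nil (set_bytes leaves @bytes as-is).
def bytes_for(byte1, byte2)
  if byte1 && byte2
    (byte1.to_i)..(byte2.to_i) # inclusive Range
  elsif byte1
    byte1.to_i                 # single position
  end
end

# Made-up 40-character 008 value with "eng" at positions 35-37.
value_008 = ("0" * 35) + "eng" + "  "

# String#[] accepts both an Integer and a Range.
p value_008[bytes_for("35", "37")] # => "eng"
p value_008[bytes_for("35", nil)]  # => "e"
p bytes_for(nil, nil)              # => nil
```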