class Traject::Macros::MarcFormatClassifier

A tool for classifiying MARC records according to format/form/genre/type, just using our own custom vocabulary for those things.

used by the `marc_formats` macro, but you can also use it directly for a bit more control.

Attributes

record[R]

Public Class Methods

new(marc_record) click to toggle source
# File lib/traject/macros/marc_format_classifier.rb, line 33
def initialize(marc_record)
  @record = marc_record
end

Public Instance Methods

formats(options = {}) click to toggle source

A very opinionated method that just kind of jams together all the possible format/genre/types into one array of 1 to N elements.

If no other values are present, the default value “Other” will be used.

See also individual methods which you can use you seperate into different facets or do other custom things.

# File lib/traject/macros/marc_format_classifier.rb, line 44
def formats(options = {})
  options = {:default => "Other"}.merge(options)

  formats = []

  formats.concat genre

  formats << "Manuscript/Archive" if manuscript_archive?
  formats << "Microform" if microform?
  formats << "Online"    if online?

  # In our own data, if it's an audio recording, it might show up
  # as print, but it's probably not.
  formats << "Print"     if print? && ! (formats.include?("Non-musical Recording") || formats.include?("Musical Recording"))

  # If it's a Dissertation, we decide it's NOT a book
  if thesis?
    formats.delete("Book")
    formats << "Dissertation/Thesis"
  end

  if proceeding?
    formats <<  "Conference"
  end

  if formats.empty?
    formats << options[:default]
  end

  return formats
end
genre() click to toggle source

Returns 1 or more values in an array from: Book; Journal/Newspaper; Musical Score; Map/Globe; Non-musical Recording; Musical Recording Image; Software/Data; Video/Film

Uses leader byte 6, leader byte 7, and 007 byte 0.

Gets actual labels from marc_genre_leader and marc_genre_007 translation maps, so you can customize labels if you want.

# File lib/traject/macros/marc_format_classifier.rb, line 86
def genre
  marc_genre_leader = Traject::TranslationMap.new("marc_genre_leader")
  marc_genre_007    = Traject::TranslationMap.new("marc_genre_007")

  results = marc_genre_leader[ record.leader.slice(6,2) ] ||
    marc_genre_leader[ record.leader.slice(6)] ||
    record.find_all {|f| f.tag == "007"}.collect {|f| marc_genre_007[f.value.slice(0)]}

  [results].flatten
end
manuscript_archive?() click to toggle source

Marked as manuscript OR archive.

# File lib/traject/macros/marc_format_classifier.rb, line 175
def manuscript_archive?
  leader06 = record.leader.slice(6)
  leader08 = record.leader.slice(8)

  # leader 6 t=Manuscript Language Material, d=Manuscript Music,
  # f=Manuscript Cartographic
  #
  # leader 06 = 'b' is obsolete, but if it exists it means archival countrl
  #
  # leader 08 'a'='archival control'
  %w{t d f b}.include?(leader06) || leader08 == "a"
end
microform?() click to toggle source

if field 007 byte 0 is 'h', that's microform. But many of our microform don't have that. If leader byte 6 is 'h', that's an obsolete way of saying microform. And finally, if GMD is

# File lib/traject/macros/marc_format_classifier.rb, line 168
def microform?
  normalized_gmd.start_with?("[microform]") ||
  record.leader[6] == "h" ||
  record.find {|f| (f.tag == "007") && (f.value[0] == "h")}
end
normalized_gmd() click to toggle source

downcased version of the gmd, or else empty string

# File lib/traject/macros/marc_format_classifier.rb, line 189
def normalized_gmd
  @gmd ||= begin
    ((a245 = record['245']) && a245['h'] && a245['h'].downcase) || ""
  end
end
online?() click to toggle source

We use marc 007 to determine if this represents an online resource. But sometimes resort to 245$h GMD too.

# File lib/traject/macros/marc_format_classifier.rb, line 150
def online?
  # field 007, byte 0 c="electronic" byte 1 r="remote" ==> sure Online
  found_007 = record.fields('007').find do |field|
    field.value.slice(0) == "c" && field.value.slice(1) == "r"
  end

  return true if found_007

  # Otherwise, if it has a GMD ["electronic resource"], we count it
  # as online only if NO 007[0] == 'c' exists, cause if it does we already
  # know it's electronic but not remote, otherwise first try would
  # have found it.
  return (normalized_gmd.start_with? "[electronic resource]") && ! record.find {|f| f.tag == '007' && f.value.slice(0) == "c"}
end
print?() click to toggle source

Algorithm with help from Chris Case.

  • If it has any RDA 338, then it's print if it has a value of volume, sheet, or card.

  • If it does not have an RDA 338, it's print if and only if it has no 245$h GMD.

  • Here at JH, for legacy reasons we also choose to not call it print if it's already been marked audio, but we do that in a different method.

Note that any record that has neither a 245 nor a 338rda is going to be marked print

This algorithm is definitely going to get some things wrong in both directions, with real world data. But seems to be good enough.

proceeding?() click to toggle source

Just checks all $6xx for a $v “Congresses”

# File lib/traject/macros/marc_format_classifier.rb, line 105
def proceeding?
  @proceeding_q ||= begin
    ! record.find do |field|
      field.tag.slice(0) == '6' &&
          field.subfields.find {|sf| sf.code == "v" && /^\s*(C|c)ongresses\.?\s*$/.match(sf.value) }
    end.nil?
  end
end
thesis?() click to toggle source

Just checks if it has a 502, if it does it's considered a thesis

# File lib/traject/macros/marc_format_classifier.rb, line 98
def thesis?
  @thesis_q ||= begin
    ! record.find {|a| a.tag == "502"}.nil?
  end
end