module StdNum::LCCN

Validate and normalize LCCNs

Public Class Methods

normalize(rawlccn)

Normalize an LCCN according to the rules at www.loc.gov/marc/lccn-namespace.html#syntax.

@param [String] rawlccn The possible LCCN to normalize
@return [String, nil] The normalized LCCN, or nil if it looks malformed

# File lib/library_stdnums.rb, line 282
def self.normalize rawlccn
  lccn = reduce_to_basic(rawlccn)
  # If there's a dash in it, deal with that.
  if lccn =~ /^(.*?)\-(.+)/
    pre =  $1
    post = $2
    return nil unless post =~ /^\d+$/ # must be all digits
    lccn = "%s%06d" % [pre, post.to_i]
  end

  if valid?(lccn, true)
    return lccn
  else
    return nil
  end
end
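
As a worked illustration of the hyphen handling above, the following standalone sketch mirrors only the zero-padding step, without loading the library. The helper name `expand_hyphen` is hypothetical and is not part of StdNum::LCCN. It also skips the final validity check, so treat it as a sketch of one branch rather than a replacement for normalize.

```ruby
# Hypothetical helper (not part of StdNum::LCCN): mirrors only the
# hyphen-handling branch of normalize.
def expand_hyphen(lccn)
  # Split on the first hyphen; the lazy (.*?) keeps the prefix as short as possible.
  return lccn unless lccn =~ /\A(.*?)-(.+)\z/
  pre, post = $1, $2
  return nil unless post =~ /\A\d+\z/ # the serial part must be all digits
  # Left-pad the serial number with zeros to six digits.
  format('%s%06d', pre, post.to_i)
end

puts expand_hyphen('n78-890351')   # n78890351
puts expand_hyphen('85-2')         # 85000002
puts expand_hyphen('2001-000002')  # 2001000002
p expand_hyphen('85-2x')           # nil
```

The serial part is converted with to_i before padding, so leading zeros in the input are discarded and then restored by the `%06d` format.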

reduce_to_basic(str)

Prepare a string for processing as an LCCN.

@param [String] str The possible LCCN
@return [String] The munged string, ready for normalization

# File lib/library_stdnums.rb, line 272
def self.reduce_to_basic str
  rv = str.gsub(/\s/, '')  # ditch spaces
  rv.gsub!('http://lccn.loc.gov/', '') # remove URI prefix
  rv.gsub!(/\/.*$/, '') # ditch everything after the first '/' (including the slash)
  return rv
end
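
To make the order of the three clean-up steps concrete, here is a standalone walk-through on an illustrative input. The method name `reduction_steps` and the sample string are assumptions made for this example; the substitutions themselves are the same ones used in reduce_to_basic.

```ruby
# Hypothetical helper (not part of StdNum::LCCN): returns the intermediate
# result after each substitution so the effect of every step is visible.
def reduction_steps(raw)
  no_space  = raw.gsub(/\s/, '')                      # strip all whitespace
  no_prefix = no_space.gsub('http://lccn.loc.gov/', '') # drop the URI prefix
  basic     = no_prefix.gsub(/\/.*$/, '')             # drop the first '/' and everything after it
  [no_space, no_prefix, basic]
end

reduction_steps(' http://lccn.loc.gov/n 78-890351 /AC/r932 ').each { |s| puts s }
# http://lccn.loc.gov/n78-890351/AC/r932
# n78-890351/AC/r932
# n78-890351
```

The order matters: if the slash truncation ran before the prefix removal, the URI form would be cut down to `http:`.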

valid?(lccn, preprocessed = false)

The rules for validity according to www.loc.gov/marc/lccn-namespace.html#syntax:

A normalized LCCN is a character string eight to twelve characters in length. (For purposes of this description characters are ordered from left to right – “first” means “leftmost”.) The rightmost eight characters are always digits. If the length is 9, then the first character must be alphabetic. If the length is 10, then the first two characters must be either both digits or both alphabetic. If the length is 11, then the first character must be alphabetic and the next two characters must be either both digits or both alphabetic. If the length is 12, then the first two characters must be alphabetic and the remaining characters digits.
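
The rules quoted above can also be read as a lookup from total length to the pattern allowed for the characters before the final eight digits. The sketch below is one possible way to express that, assuming an already-normalized string; the constant `LCCN_PREFIX_RULES` and the method `lccn_syntax_ok?` are hypothetical names, not part of StdNum::LCCN.

```ruby
# Hypothetical table (not part of StdNum::LCCN): total length => pattern for
# the prefix, i.e. everything before the rightmost eight characters.
LCCN_PREFIX_RULES = {
  8  => /\A\z/,                               # no prefix at all
  9  => /\A[A-Za-z]\z/,                       # one letter
  10 => /\A(?:\d{2}|[A-Za-z]{2})\z/,          # two digits or two letters
  11 => /\A[A-Za-z](?:\d{2}|[A-Za-z]{2})\z/,  # letter, then two digits or two letters
  12 => /\A[A-Za-z]{2}\d{2}\z/                # two letters, then two digits
}.freeze

def lccn_syntax_ok?(str)
  rule = LCCN_PREFIX_RULES[str.length]
  return false unless rule # length outside 8..12
  prefix = str[0...-8]
  suffix = str[-8..-1]
  # The rightmost eight characters must be digits and the prefix must fit the rule.
  suffix.match?(/\A\d{8}\z/) && prefix.match?(rule)
end

p lccn_syntax_ok?('n78890351')   # true
p lccn_syntax_ok?('2001000002')  # true
p lccn_syntax_ok?('178890351')   # false
p lccn_syntax_ok?('1234567')     # false
```

A table keeps each length's rule next to the sentence it comes from, which can make it easier to compare against the specification than a case statement.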

@param [String] lccn The LCCN to attempt to validate
@param [Boolean] preprocessed Set to true if the number has already been normalized
@return [Boolean] Whether or not the syntax seems ok

# File lib/library_stdnums.rb, line 312
def self.valid? lccn, preprocessed = false
  lccn = normalize(lccn) unless preprocessed
  return false unless lccn
  clean = lccn.gsub(/\-/, '')
  suffix = clean[-8..-1] # "the rightmost eight characters are always digits"
  return false unless suffix and suffix =~ /^\d+$/
  case clean.size # "...is a character string eight to twelve characters in length"
  when 8
    return true
  when 9
    return true if clean =~ /^[A-Za-z]/
  when 10
    return true if clean =~ /^\d{2}/ or clean =~ /^[A-Za-z]{2}/
  when 11
    return true if clean =~ /^[A-Za-z](\d{2}|[A-Za-z]{2})/
  when 12
    return true if clean =~ /^[A-Za-z]{2}\d{2}/
  else
    return false
  end

  return false
end