module StdNum::LCCN
Validate and normalize LCCNs
Public Class Methods
Normalize based on data at www.loc.gov/marc/lccn-namespace.html#syntax
@param [String] rawlccn The possible LCCN to normalize
@return [String, nil] the normalized LCCN, or nil if it looks malformed
# File lib/library_stdnums.rb, line 282
def self.normalize rawlccn
  lccn = reduce_to_basic(rawlccn)
  # If there's a dash in it, deal with that.
  if lccn =~ /^(.*?)\-(.+)/
    pre = $1
    post = $2
    return nil unless post =~ /^\d+$/ # must be all digits
    lccn = "%s%06d" % [pre, post.to_i]
  end

  if valid?(lccn, true)
    return lccn
  else
    return nil
  end
end
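The hyphen handling is the least obvious step in normalize, so here is a minimal stand-alone sketch of just that part. The helper name expand_hyphen is hypothetical and not part of the library; it only mirrors the regular expressions and the "%s%06d" format string shown in the source above.

```ruby
# Hypothetical helper (not part of StdNum::LCCN): isolates the hyphen step.
def expand_hyphen(lccn)
  m = lccn.match(/^(.*?)-(.+)/)
  return lccn unless m                 # no hyphen: leave the string alone
  pre  = m[1]
  post = m[2]
  return nil unless post =~ /^\d+$/    # the serial part must be all digits
  format('%s%06d', pre, post.to_i)     # left-pad the serial to six digits
end

puts expand_hyphen('n78-890351')   # n78890351
puts expand_hyphen('85-2')         # 85000002
p    expand_hyphen('85-2a')        # nil
```

The zero-padding is what turns a short serial such as "2" into the fixed-width "000002", so that the result can then be checked against the length rules in valid?.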
Get a string ready for processing as an LCCN
@param [String] str The possible lccn
@return [String] The munged string, ready for normalization
# File lib/library_stdnums.rb, line 272
def self.reduce_to_basic str
  rv = str.gsub(/\s/, '') # ditch spaces
  rv.gsub!('http://lccn.loc.gov/', '') # remove URI prefix
  rv.gsub!(/\/.*$/, '') # ditch everything after the first '/' (including the slash)
  return rv
end
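To make the effect of the three substitutions concrete, the following sketch applies the same steps in a stand-alone function. The name basic_form is hypothetical and the sample inputs are made up for illustration; only the substitution patterns come from the source above.

```ruby
# Hypothetical stand-alone copy of the clean-up steps, for illustration only.
def basic_form(str)
  rv = str.gsub(/\s/, '')                   # drop all whitespace
  rv = rv.gsub('http://lccn.loc.gov/', '')  # drop the URI prefix
  rv.gsub(/\/.*$/, '')                      # drop the first "/" and everything after it
end

puts basic_form(' n 78-890351 ')                  # n78-890351
puts basic_form('http://lccn.loc.gov/n78890351')  # n78890351
puts basic_form('n78-890351/AC/r932')             # n78-890351
```

Note that the hyphen survives this step; it is only expanded later, in normalize.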
The rules for validity according to www.loc.gov/marc/lccn-namespace.html#syntax:
A normalized LCCN is a character string eight to twelve characters in length. (For purposes of this description characters are ordered from left to right – “first” means “leftmost”.) The rightmost eight characters are always digits. If the length is 9, then the first character must be alphabetic. If the length is 10, then the first two characters must be either both digits or both alphabetic. If the length is 11, then the first character must be alphabetic and the next two characters must be either both digits or both alphabetic. If the length is 12, then the first two characters must be alphabetic and the remaining characters digits.
@param [String] lccn The lccn to attempt to validate
@param [Boolean] preprocessed Set to true if the number has already been normalized
@return [Boolean] Whether or not the syntax seems ok
# File lib/library_stdnums.rb, line 312
def self.valid? lccn, preprocessed = false
  lccn = normalize(lccn) unless preprocessed
  return false unless lccn
  clean = lccn.gsub(/\-/, '')
  suffix = clean[-8..-1] # "the rightmost eight characters are always digits"
  return false unless suffix and suffix =~ /^\d+$/
  case clean.size # "...is a character string eight to twelve digits in length"
  when 8
    return true
  when 9
    return true if clean =~ /^[A-Za-z]/
  when 10
    return true if clean =~ /^\d{2}/ or clean =~ /^[A-Za-z]{2}/
  when 11
    return true if clean =~ /^[A-Za-z](\d{2}|[A-Za-z]{2})/
  when 12
    return true if clean =~ /^[A-Za-z]{2}\d{2}/
  else
    return false
  end

  return false
end
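The length rules quoted above can also be read as a small lookup table: each permitted length fixes the shape of whatever precedes the final eight digits. The sketch below expresses that reading. It is not the library's implementation; PREFIX_RULES and lccn_syntax_ok? are hypothetical names, and the function assumes its input has already been normalized (no whitespace, no hyphen).

```ruby
# Hypothetical table-driven reading of the quoted rules (not the library's code).
# Keys are total lengths; values describe the characters before the last eight.
PREFIX_RULES = {
  8  => /\A\z/,                            # no prefix at all
  9  => /\A[A-Za-z]\z/,                    # one letter
  10 => /\A(\d{2}|[A-Za-z]{2})\z/,         # two digits or two letters
  11 => /\A[A-Za-z](\d{2}|[A-Za-z]{2})\z/, # letter, then two digits or two letters
  12 => /\A[A-Za-z]{2}\d{2}\z/             # two letters, then digits
}.freeze

def lccn_syntax_ok?(lccn)
  rule = PREFIX_RULES[lccn.size]
  return false unless rule                        # length outside 8..12
  return false unless lccn[-8, 8] =~ /\A\d{8}\z/  # rightmost eight must be digits
  rule.match?(lccn[0...-8])                       # prefix shape for this length
end

p lccn_syntax_ok?('85000002')      # true
p lccn_syntax_ok?('n78890351')     # true
p lccn_syntax_ok?('a1b12345678')   # false
p lccn_syntax_ok?('1234567')       # false
```

Splitting the string into a prefix and an eight-digit suffix keeps each rule a direct transcription of one sentence in the specification, which makes the table easy to compare against the quoted text.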