class Traject::HorizonBibAuthMerge
Merges 'bib text' and 'auth text' lines from Horizon, using bib text as a template when necessary.
merged_str = HorizonBibAuthMerge.new(tag, bib_text_str, auth_text_str).merge!
The strings passed in may be mutated for efficiency, so you can call merge! only once per instance. The class is a single-use utility.
Attributes
authtext[R]
bibtext[R]
tag[R]
Public Class Methods
new(tag, bibtext, authtext)
Pass in bibtext and authtext as Strings. You will probably need to fetch the column values from JDBC as bytes and then use String.from_java_bytes, to avoid corrupting a possible MARC-8 encoding.
bibtext is either the text or the longtext column from fullbib, preferring longtext. authtext is either the xref_text or the xref_longtext column from fullbib, preferring xref_longtext.
# File lib/traject/horizon_bib_auth_merge.rb, line 22
def initialize(tag, bibtext, authtext)
  @merged = false
  @tag = tag
  @bibtext = bibtext
  @authtext = authtext

  # remove terminal MARC Field Terminator if present.
  @bibtext.chomp!("\x1E") if @bibtext
  @authtext.chomp!("\x1E") if @authtext
end
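Because the constructor strips the Field Terminator with chomp!, the String object the caller passed in is changed as well. The following is a minimal standalone sketch of that effect, using plain Ruby strings rather than the class itself; the sample value and the variable names are illustrative assumptions.

```ruby
# Illustrative only: shows why a String handed to the constructor
# may look different afterwards. No traject code is used here.
FIELD_TERMINATOR = "\x1E"

# The constructor keeps a reference to the same object and chomps it in place.
original = +"\x1FaSome title.\x1E"
same_object = original
same_object.chomp!(FIELD_TERMINATOR)
original_was_mutated = !original.end_with?(FIELD_TERMINATOR)

# Passing a copy (dup) leaves the caller's value untouched.
protected_value = +"\x1FaSome title.\x1E"
copy = protected_value.dup
copy.chomp!(FIELD_TERMINATOR)
copy_left_original_intact = protected_value.end_with?(FIELD_TERMINATOR)
```

If the caller still needs the raw column value after merging, passing a dup is one way to keep it, at the cost of the extra allocation the class is trying to avoid.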
Public Instance Methods
merge!()
Returns the merged string, composed of a MARC 'field' with subfields separated by separator control chars. Does not include the terminal MARC Field Terminator.
Will mutate bibtext and authtext for efficiency.
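To make the return format concrete, here is a small sketch that splits such a field string on the subfield separator ("\x1F") into code/value pairs. The helper name split_subfields and the sample field are hypothetical and not part of traject.

```ruby
SUBFIELD_DELIMITER = "\x1F"

# Hypothetical helper: turns a merged field string into [code, value] pairs.
# Each chunk after a delimiter starts with the one-character subfield code.
def split_subfields(field)
  field.split(SUBFIELD_DELIMITER).reject(&:empty?).map do |chunk|
    [chunk[0], chunk[1..-1]]
  end
end

pairs = split_subfields("\x1FaCollected works.\x1Fp ;\x1Fv81.")
codes = pairs.map(&:first)
```

The sketch assumes the string begins with a separator and carries no terminal "\x1E", which matches what merge! is documented to return.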
# File lib/traject/horizon_bib_auth_merge.rb, line 39
def merge!
  raise Exception.new("Can only call `merge!` once, already called.") if @merged
  @merged = true

  # just one? (Or neither?) Just return it.
  return authtext if bibtext.nil?
  return bibtext if authtext.nil?

  # For 240 and 243, it seems that anything before the first $t should
  # be ignored in authtext template -- we need to actually remove it,
  # so later when we append any leftover fields, we don't get those.
  if tag == '240' || tag == '243'
    authtext.sub!(@@up_to_subfield_t_re, "\x1Ft")
  end

  # We need to do a crazy combination of template in text with values in authtext.
  # horizon, you so crazy. text template is like:
  #"\x1Fa.\x1Fp ;\x1Fv81."
  # which means each subfield after the \x1F, merge in
  # the subfield value from the auth record if it's present,
  # otherwise don't.
  #
  # plus some weird as hell stuff with punctuation and spaces, I can't
  # even explain it, just trial and error'd it comparing to marcout.
  bibtext.gsub!(/\x1F([^\x1F\x1E])( ?)([[:punct:] ]*)/) do
    subfield = $1
    space = $2
    maybe_punct = $3

    # okay this is crazy hacky reverse engineering, I don't really
    # know what's going on but for 240 and 243, 'a' in template
    # is filled by 't' in auth tag.
    auth_subfield = if subfield == "a" && (tag == "240" || tag == "243")
      "t"
    else
      subfield
    end

    # Find substitute fill-in value from authtext, if it can
    # be found -- first subfield indicated. Then we REMOVE
    # it from authtext, so next time this subfield is asked for,
    # subsequent subfield with that code will be used.
    substitute = nil
    authtext.sub!(/\x1F#{Regexp.escape auth_subfield}([^\x1F\x1E]*)/) do
      substitute = $1
      ''
    end

    if substitute
      # Dealing with punctuation is REALLY CONFUSING -- reverse engineering
      # HIP/Horizon, which does WEIRD THINGS.
      # But we seem to have arrived at something that appears to match all cases
      # we can find of what HIP/Horizon does.
      #
      # If the auth value already ends up with the same punctuation from the template,
      # _leave it alone_ -- including preserving all spaces near the punct in the auth
      # value.
      #
      # Otherwise, remove all punct from the auth value, then add in the punct from the template,
      # along with any spaces before the punct in the template.
      if maybe_punct && maybe_punct.length > 0
        # remove all punctuation from end of auth value? to use punct from template instead?
        # But preserve initial spaces from template? Unless it already ends
        # with the punctuation, in which case don't touch it, to avoid
        # messing up spaces? WEIRD, yeah.
        unless substitute.end_with? maybe_punct
          substitute.gsub!(/[[:punct:]]+\Z/, "")
          # This adding the #{space} back in, is consistent with what HIP does.
          # I have no idea if it's right or a bug in HIP, but being consistent.
          # neither leaving it in nor taking it out is exactly consistent with HznExportMarc,
          # which seems to have bugs.
          substitute << "#{space}#{maybe_punct}"
        end
      end

      "\x1F#{subfield}#{substitute}"
    else
      # just keep original, which has no maybe_punct
      "\x1F#{subfield}"
    end
  end

  # Sometimes there's leftover text at the end of authtext that wasn't
  # included in the bibtext template. Horizon's marc reconstruction
  # seems to just include this on the end, we will too.
  # Relies on 'prior to $t' fields being removed from 240 and 243 earlier,
  # to avoid including them when we shouldn't.
  if authtext.length > 0
    bibtext << authtext
  end

  # We mutated bibtext to fill in template, now just return it.
  return bibtext
end
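The template fill in the source above is easier to follow with concrete input. The sketch below restates the core of that step as a standalone function so it can be run without traject or Horizon data. The name fill_template and the sample strings are hypothetical, the 240/243 special cases are deliberately left out, and the function works on copies instead of mutating its arguments.

```ruby
# Simplified restatement of the template fill (assumption: no 240/243 handling).
# template: bib text such as "\x1Fa.\x1Fp ;", auth: auth text with the values.
def fill_template(template, auth)
  template = template.dup
  auth = auth.dup

  template.gsub!(/\x1F([^\x1F\x1E])( ?)([[:punct:] ]*)/) do
    # Capture the groups now; the nested sub! below overwrites $1..$3.
    subfield, space, punct = $1, $2, $3

    # Take the first auth subfield with this code and remove it from auth,
    # so a repeated code picks up the next occurrence.
    value = nil
    auth.sub!(/\x1F#{Regexp.escape(subfield)}([^\x1F\x1E]*)/) do
      value = $1
      ""
    end

    # No auth value: keep the bare subfield, dropping template punctuation.
    next "\x1F#{subfield}" unless value

    # Use the template punctuation unless the value already ends with it.
    unless punct.empty? || value.end_with?(punct)
      value = value.sub(/[[:punct:]]+\Z/, "") + "#{space}#{punct}"
    end
    "\x1F#{subfield}#{value}"
  end

  # Anything left in auth that the template never asked for goes on the end.
  template << auth unless auth.empty?
  template
end

filled   = fill_template("\x1Fa.\x1Fp ;\x1Fv81.", "\x1FaBibliotheca\x1FpSeries nova")
kept     = fill_template("\x1Fa.", "\x1FaTitle .")
leftover = fill_template("\x1Fa.", "\x1FaTitle\x1FxExtra")
```

In the first call the template supplies "." after subfield a and " ;" after subfield p, while the literal "81." following subfield v is carried through unchanged. The second call shows a value that already ends with the template punctuation being left alone, and the third shows an unrequested auth subfield being appended.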