class Origami::PDF::LazyParser
Public Instance Methods
parse(stream)
Calls superclass method
# File lib/origami/parsers/pdf/lazy.rb, line 32
def parse(stream)
    super

    pdf = parse_initialize
    revisions = []

    # Locate the last xref offset at the end of the file.
    xref_offset = locate_last_xref_offset

    while xref_offset and xref_offset != 0

        # Create a new revision based on the xref section offset.
        revision = parse_revision(pdf, xref_offset)

        # Locate the previous xref section.
        if revision.xrefstm
            xref_offset = revision.xrefstm[:Prev].to_i
        else
            xref_offset = revision.trailer[:Prev].to_i
        end

        # Prepend the revision.
        revisions.unshift(revision)
    end

    pdf.revisions.clear
    revisions.each do |rev|
        pdf.revisions.push(rev)
        pdf.insert(rev.xrefstm) if rev.has_xrefstm?
    end

    parse_finalize(pdf)

    pdf
end
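The loop in parse walks the chain of Prev offsets from the newest revision back to the oldest, prepending each one so the final list is in chronological order. The following is a minimal sketch of that traversal only, not Origami code: ToyRevision, collect_revisions and PREV_LINKS are hypothetical names, the hash stands in for real parsing, and the cycle guard is an addition that the method above does not have. It relies on the fact that nil.to_i is 0 in Ruby, so a missing Prev entry (assumed here to read as nil) ends the loop.

```ruby
# Hypothetical stand-in for a parsed revision: only the Prev link matters here.
ToyRevision = Struct.new(:offset, :prev)

# Walks the Prev chain starting at last_offset.
# prev_links maps an xref offset to the offset of the previous xref section.
def collect_revisions(last_offset, prev_links)
  revisions = []
  visited = {}
  offset = last_offset

  while offset && offset != 0
    # Extra guard (an assumption, not in the original): stop on a cyclic chain.
    raise ArgumentError, "cyclic Prev chain at #{offset}" if visited[offset]
    visited[offset] = true

    revision = ToyRevision.new(offset, prev_links[offset])

    # nil.to_i is 0, so a revision without a Prev entry terminates the loop.
    offset = revision.prev.to_i

    # Prepend, so the oldest revision ends up first.
    revisions.unshift(revision)
  end

  revisions
end

# Toy file with three revisions; the newest xref section is at offset 900.
PREV_LINKS = { 900 => 500, 500 => 120, 120 => nil }.freeze
ORDERED_OFFSETS = collect_revisions(900, PREV_LINKS).map(&:offset)
```

Prepending with unshift avoids a separate reverse step once the oldest revision has been reached.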
Private Instance Methods
locate_last_xref_offset()
The document is scanned starting from the end, by locating the last startxref token.
# File lib/origami/parsers/pdf/lazy.rb, line 73
def locate_last_xref_offset
    # Set the scanner position at the end.
    @data.terminate

    # Locate the startxref token.
    until @data.match?(/#{Trailer::XREF_TOKEN}/)
        raise ParsingError, "No xref token found" if @data.pos == 0
        @data.pos -= 1
    end

    # Extract the offset of the last xref section.
    trailer = Trailer.parse(@data, self)
    raise ParsingError, "Cannot locate xref section" if trailer.startxref.zero?

    trailer.startxref
end
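The backward scan can be tried in isolation with Ruby's StringScanner. This is a sketch under assumptions: it hard-codes the keyword startxref (taken from the description above rather than from Trailer::XREF_TOKEN), reads the offset with a plain regular expression instead of Trailer.parse, and last_startxref_offset and SAMPLE_TAIL are hypothetical names.

```ruby
require 'strscan'

# Hypothetical helper: returns the integer that follows the last
# "startxref" keyword in data.
def last_startxref_offset(data)
  scanner = StringScanner.new(data)
  scanner.terminate # move the scan pointer to the end of the string

  # Step back one byte at a time until the keyword matches at the pointer.
  until scanner.match?(/startxref/)
    raise ArgumentError, "no startxref token found" if scanner.pos.zero?
    scanner.pos -= 1
  end

  # Consume the keyword and read the decimal offset that follows it.
  scanner.skip(/startxref\s+/)
  digits = scanner.scan(/\d+/)
  raise ArgumentError, "no offset after startxref" if digits.nil?

  digits.to_i
end

# Two incremental updates: the scan must find the second startxref.
SAMPLE_TAIL = "startxref\n116\n%%EOF\n...\nstartxref\n408\n%%EOF\n"
LAST_OFFSET = last_startxref_offset(SAMPLE_TAIL)
```

Because match? only tests at the current pointer, moving the pointer backwards from the end guarantees that the first hit is the last occurrence in the file.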
parse_revision(pdf, offset)
In the LazyParser, the revisions are parsed by jumping through the cross-references (table or streams).
# File lib/origami/parsers/pdf/lazy.rb, line 93
def parse_revision(pdf, offset)
    raise ParsingError, "Invalid xref offset" if offset < 0 or offset >= @data.string.size

    @data.pos = offset

    # Create a new revision.
    revision = PDF::Revision.new(pdf)

    # Regular xref section.
    if @data.match?(/#{XRef::Section::TOKEN}/)
        parse_revision_from_xreftable(revision)

    # The xrefs are stored in a stream.
    else
        parse_revision_from_xrefstm(revision)
    end

    revision
end
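The choice between the two parsing paths depends only on what sits at the given offset. A minimal sketch of that dispatch follows, assuming the section token is the keyword xref (as the description of parse_revision_from_xreftable suggests); xref_kind_at and TOY_FILE are hypothetical names, and the symbols stand in for the two parse methods.

```ruby
require 'strscan'

# Hypothetical helper: reports which kind of cross-reference data starts
# at offset. Anything that is not an "xref" keyword is treated as an
# indirect object expected to be an xref stream.
def xref_kind_at(data, offset)
  # Same bounds check as above; bytesize is used because scanner
  # positions are byte offsets.
  raise ArgumentError, "invalid xref offset" if offset < 0 || offset >= data.bytesize

  scanner = StringScanner.new(data)
  scanner.pos = offset

  scanner.match?(/xref/) ? :table : :stream
end

TOY_FILE = "xref\n0 1\ntrailer\n<< >>\n7 0 obj\n<< /Type /XRef >>\nstream\n"
KIND_AT_START = xref_kind_at(TOY_FILE, 0)
KIND_AT_OBJECT = xref_kind_at(TOY_FILE, TOY_FILE.index("7 0 obj"))
```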
parse_revision_from_xrefstm(revision)
Assume the current pointer is at the xref stream of the revision.
The XRefStream should normally be at the end of the revision. We scan after the object for a trailer token.
The revision is allowed not to have a trailer, and the stream dictionary will be used as the trailer dictionary in that case.
# File lib/origami/parsers/pdf/lazy.rb, line 155
def parse_revision_from_xrefstm(revision)
    xrefstm = parse_object
    raise ParsingError, "Invalid xref stream" unless xrefstm.is_a?(XRefStream)

    revision.xrefstm = xrefstm

    # Search for the trailer.
    if @data.skip_until Regexp.union(Trailer::XREF_TOKEN, *Trailer::TOKENS)
        @data.pos -= @data.matched_size

        revision.trailer = parse_trailer
    else
        warn "No trailer found."
        revision.trailer = Trailer.new
    end
end
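The trailer search combines skip_until with a rewind by matched_size, so that the scanner ends up at the start of the token rather than after it. The sketch below shows that idiom on a toy string; the token list (startxref, trailer, %%EOF) is an assumption about the values of Trailer::XREF_TOKEN and Trailer::TOKENS, and seek_trailer_start is a hypothetical helper.

```ruby
require 'strscan'

# Assumed token set; Regexp.union escapes each string literally.
TRAILER_TOKENS = Regexp.union("startxref", "trailer", "%%EOF")

# Hypothetical helper: moves the scanner to the first trailer token after
# the current position and returns that position, or nil if none is found.
def seek_trailer_start(scanner)
  # skip_until advances past the end of the match, or returns nil.
  return nil unless scanner.skip_until(TRAILER_TOKENS)

  # Step back over the matched token so a parser can read it again.
  scanner.pos -= scanner.matched_size
  scanner.pos
end

AFTER_STREAM = "endstream\nendobj\ntrailer\n<< >>\nstartxref\n10\n%%EOF"
TRAILER_SCANNER = StringScanner.new(AFTER_STREAM)
TRAILER_POS = seek_trailer_start(TRAILER_SCANNER)
```

When no token is found, skip_until leaves the scan pointer where it was, which lets the caller fall back to an empty trailer as the method above does.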
parse_revision_from_xreftable(revision)
Assume the current pointer is at the xreftable of the revision. We are expecting:
- a regular xref table, starting with xref
- a revision trailer
The trailer may hold an XRefStm entry in case of hybrid references.
# File lib/origami/parsers/pdf/lazy.rb, line 121
def parse_revision_from_xreftable(revision)
    xreftable = parse_xreftable
    raise ParsingError, "Cannot parse xref section" if xreftable.nil?

    revision.xreftable = xreftable
    revision.trailer = parse_trailer

    # Handle hybrid cross-references.
    if revision.trailer[:XRefStm].is_a?(Integer)
        begin
            offset = revision.trailer[:XRefStm].to_i
            xrefstm = parse_object(offset)

            if xrefstm.is_a?(XRefStream)
                revision.xrefstm = xrefstm
            else
                warn "Invalid xref stream at offset #{offset}"
            end

        rescue
            warn "Cannot parse xref stream at offset #{offset}"
        end
    end
end
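The hybrid case is handled leniently: a bad XRefStm entry produces a warning and the revision keeps its regular xref table. A rough sketch of that decision is shown below, with parsing replaced by a lookup table; ToyXRefStream, hybrid_xrefstm and PARSED_OBJECTS are hypothetical names and none of this is Origami's API.

```ruby
# Hypothetical stand-in for an xref stream object.
ToyXRefStream = Struct.new(:offset)

# objects maps byte offsets to already-parsed objects (an assumption that
# replaces real parsing). Returns the xref stream, or nil when the trailer
# has no usable XRefStm entry.
def hybrid_xrefstm(trailer, objects)
  offset = trailer[:XRefStm]
  return nil unless offset.is_a?(Integer) # not a hybrid-reference file

  candidate = objects[offset]
  return candidate if candidate.is_a?(ToyXRefStream)

  # Soft failure: report the problem and carry on without the stream.
  warn "Invalid xref stream at offset #{offset}"
  nil
end

PARSED_OBJECTS = { 300 => ToyXRefStream.new(300), 450 => "not a stream" }.freeze
HYBRID_RESULT = hybrid_xrefstm({ XRefStm: 300 }, PARSED_OBJECTS)
```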