class HexaPDF::Type::XRefStream
Represents PDF type XRef, cross-reference streams.
A cross-reference stream is used as a more compact representation of a cross-reference section and trailer dictionary. The trailer dictionary is incorporated into the stream dictionary, and the cross-reference section entries are stored in the stream itself, compressed to save space.
How are Cross-reference Streams Used?
Cross-reference stream objects are only used when parsing or writing a PDF document.
When a file is read and a cross-reference stream is found, it is loaded and its information is stored in a HexaPDF::Revision object. So from a user's perspective nothing changes when a cross-reference stream is encountered instead of a cross-reference section and trailer.
This also means that all information stored in a cross-reference stream between parsing and writing is discarded when the PDF document gets written!
When a revision is written, it is checked whether that revision contains a cross-reference stream object. If it does, the cross-reference stream object is updated with the cross-reference section and trailer information and then written. Otherwise a normal cross-reference section plus trailer is written.
See: PDF1.7 s7.5.8
Public Instance Methods
trailer()
Returns a hash with the entries that represent the file trailer part of the cross-reference stream's dictionary.
See: Type::Trailer
# File lib/hexapdf/type/xref_stream.rb, line 94
def trailer
  Trailer.each_field.with_object({}) do |(name, _data), hash|
    hash[name] = value[name] if key?(name)
  end
end
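The filtering done by this method can be illustrated without HexaPDF. The following is a minimal sketch, not the library's code: `trailer_keys` is a hypothetical stand-in for the field names that Type::Trailer defines, and a plain Hash stands in for the stream dictionary.

```ruby
# Hypothetical stand-in for the field names defined by Type::Trailer.
trailer_keys = [:Size, :Prev, :Root, :Encrypt, :Info, :ID]

# Plain Hash standing in for a cross-reference stream dictionary (made-up values).
xref_stream_dict = {
  Type: :XRef,
  Size: 12,
  W: [1, 2, 2],
  Index: [0, 12],
  Root: "catalog reference",
  Filter: :FlateDecode,
}

# Keep only the entries that belong to the trailer part of the dictionary.
def trailer_part(dict, keys)
  keys.each_with_object({}) do |name, hash|
    hash[name] = dict[name] if dict.key?(name)
  end
end

trailer = trailer_part(xref_stream_dict, trailer_keys)
```

Stream-specific entries such as /W, /Index and /Filter are left out of the result, while /Size and /Root are kept.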
update_with_xref_section_and_trailer(xref_section, trailer)
Makes this cross-reference stream represent the data in the given HexaPDF::XRefSection and Type::Trailer.
The xref_section needs to contain an entry for this cross-reference stream, and this entry must be the one with the highest byte position (this is needed for calculating the correct /W entry).
The given cross-reference section is not stored but only used to rewrite the associated stream to reflect the cross-reference section. The dictionary is updated with the information from the trailer and the needed entries for the cross-reference section.
If there are changes to the cross-reference section or trailer, this method has to be invoked again.
# File lib/hexapdf/type/xref_stream.rb, line 113
def update_with_xref_section_and_trailer(xref_section, trailer)
  value.replace(trailer)
  value[:Type] = :XRef
  write_xref_section_to_stream(xref_section)
  set_filter(:FlateDecode, Columns: value[:W].inject(:+), Predictor: 12)
end
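In the PDF specification, predictor value 12 selects the PNG "Up" predictor, and /Columns is set here to the byte width of one entry. The following is a minimal, self-contained sketch of that prediction step and its inverse. It is not HexaPDF's implementation; the sample offsets and all variable names are made up for illustration.

```ruby
columns = 5 # 1 + 2 + 2 bytes per entry, i.e. the sum of a /W of [1, 2, 2]

# 200 made-up in-use entries with steadily growing byte offsets.
rows = (0...200).map { |i| [1, 1000 + i * 37, 0].pack("Cnn") }
raw = rows.join

# Encode: every byte becomes the difference to the byte directly above it
# (modulo 256). Each row is prefixed with the PNG filter type byte 2 ("Up").
prev = [0] * columns
predicted = rows.map do |row|
  cur = row.bytes
  diff = cur.each_with_index.map { |byte, i| (byte - prev[i]) & 0xFF }
  prev = cur
  ([2] + diff).pack("C*")
end.join

# Decode: add the byte above back to recover the original rows.
prev = [0] * columns
decoded = predicted.bytes.each_slice(columns + 1).map do |_tag, *diff|
  cur = diff.each_with_index.map { |d, i| (d + prev[i]) & 0xFF }
  prev = cur
  cur.pack("C*")
end.join
```

Because neighbouring entries tend to differ only in a few low-order bytes, the predicted rows usually contain many repeated values, which tends to help the subsequent Flate compression.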
xref_section()
Returns an XRefSection that represents the content of this cross-reference stream.
Each invocation returns a new XRefSection object based on the current data in the associated stream and dictionary.
# File lib/hexapdf/type/xref_stream.rb, line 85
def xref_section
  index = self[:Index] || [0, self[:Size]]
  parse_xref_section(index, self[:W])
end
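The /Index fallback in the method above can be sketched in isolation. The helper name `subsection_ranges` is hypothetical and only illustrates how pairs of (first object number, entry count) map to object number ranges, with `[0, Size]` used when /Index is absent.

```ruby
# Turns an /Index array (or nil) into ranges of object numbers.
def subsection_ranges(index, size)
  (index || [0, size]).each_slice(2).map do |first_oid, count|
    first_oid...(first_oid + count)
  end
end

default_ranges = subsection_ranges(nil, 4)
explicit_ranges = subsection_ranges([0, 2, 10, 3], 13)
```

With no /Index and a /Size of 4, a single subsection covering objects 0 to 3 results; with an /Index of `[0, 2, 10, 3]`, there are two subsections covering objects 0 to 1 and 10 to 12.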
Private Instance Methods
bytes_to_int(string, start_index, end_index)
Converts the bytes of the string from the start index up to, but not including, the end index to an integer.
The bytes are interpreted in big-endian order.
# File lib/hexapdf/type/xref_stream.rb, line 176
def bytes_to_int(string, start_index, end_index)
  result = string.getbyte(start_index)
  start_index += 1
  while start_index < end_index
    result = (result << 8) | string.getbyte(start_index)
    start_index += 1
  end
  result
end
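As a small worked example of the big-endian conversion, here is a standalone sketch. The method name `big_endian_int` is hypothetical and the body is a simplified variant, not the library's code (unlike the method above, it returns 0 for an empty range).

```ruby
# Interprets the bytes in [start_index, end_index) as a big-endian integer.
def big_endian_int(string, start_index, end_index)
  result = 0
  (start_index...end_index).each do |i|
    result = (result << 8) | string.getbyte(i)
  end
  result
end

data = "\x01\x02\x03\x04".b

two_bytes = big_endian_int(data, 0, 2)   # bytes 01 02    -> 0x0102   = 258
three_bytes = big_endian_int(data, 1, 4) # bytes 02 03 04 -> 0x020304 = 131844
```

For two-byte fields the result agrees with `String#unpack1` and the `n` directive (16-bit unsigned, big-endian).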
calculate_w_entry_and_pack_string(max_number)
Returns the /W entry, which depends on the given maximal number for the second field, together with the matching pack string for the entries.
# File lib/hexapdf/type/xref_stream.rb, line 213
def calculate_w_entry_and_pack_string(max_number)
  middle = Math.log(max_number, 255).ceil
  middle = 4 if middle == 3
  pack_string = "C#{'-CnNN'[middle]}n"
  [[1, middle, 2], pack_string]
end
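To see what this produces, the method body shown above can be run standalone. The sample numbers 200, 300 and 70000 are arbitrary illustration values, and the packed record is a made-up in-use entry.

```ruby
# Copy of the method body shown above, usable outside the class.
def calculate_w_entry_and_pack_string(max_number)
  middle = Math.log(max_number, 255).ceil
  middle = 4 if middle == 3
  pack_string = "C#{'-CnNN'[middle]}n"
  [[1, middle, 2], pack_string]
end

small = calculate_w_entry_and_pack_string(200)    # fits into one byte
medium = calculate_w_entry_and_pack_string(300)   # needs two bytes
large = calculate_w_entry_and_pack_string(70_000) # three bytes are rounded up to four

# Packing one entry (type 1, byte offset 300, generation 0) with the medium layout.
w, pack_string = medium
record = [1, 300, 0].pack(pack_string)
```

The first and third fields always take one and two bytes respectively; only the width of the second field varies, and `Array#pack` has no three-byte directive, which is why a width of three is widened to four.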
parse_xref_section(index, w)
Parses the stream and returns the resulting HexaPDF::XRefSection object.
# File lib/hexapdf/type/xref_stream.rb, line 127
def parse_xref_section(index, w)
  xref = XRefSection.new
  data = stream
  start_pos = end_pos = 0
  w0 = w[0]
  w1 = w[1]
  w2 = w[2]
  needed_bytes = (w0 + w1 + w2) * index.each_slice(2).sum(&:last)
  if needed_bytes > data.size
    raise HexaPDF::MalformedPDFError, "Cross-reference stream is missing data " \
      "(#{needed_bytes} bytes needed, got #{data.size})"
  end
  index.each_slice(2) do |first_oid, number_of_entries|
    first_oid.upto(first_oid + number_of_entries - 1) do |oid|
      # Default for first field: type 1
      end_pos = start_pos + w0
      type_field = (w0 == 0 ? TYPE_IN_USE : bytes_to_int(data, start_pos, end_pos))
      # No default available for second field
      start_pos = end_pos + w1
      field2 = bytes_to_int(data, end_pos, start_pos)
      # Default for third field is 0 for type 1, otherwise it needs to be specified!
      end_pos = start_pos + w2
      field3 = (w2 == 0 ? 0 : bytes_to_int(data, start_pos, end_pos))
      case type_field
      when TYPE_IN_USE
        xref.add_in_use_entry(oid, field3, field2)
      when TYPE_FREE
        xref.add_free_entry(oid, field3)
      when TYPE_COMPRESSED
        xref.add_compressed_entry(oid, field2, field3)
      else
        nil # Ignore entry as per PDF1.7 s7.5.8.3
      end
      start_pos = end_pos
    end
  end
  xref
end
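The parsing loop above can be mimicked on a tiny hand-built stream. This is a simplified sketch that assumes a fixed /W of `[1, 2, 2]` and uses plain hashes instead of HexaPDF::XRefSection entries; the sample data and variable names are made up. The type values 0 (free), 1 (in use) and 2 (compressed) are those of the PDF specification.

```ruby
w = [1, 2, 2]
index = [0, 1, 3, 2] # object 0, then objects 3 and 4

# Three fixed-width records: type, second field, third field.
data = [
  [0, 0, 65_535], # object 0: free, generation 65535
  [1, 17, 0],     # object 3: in use at byte offset 17, generation 0
  [2, 5, 1],      # object 4: compressed, object stream 5, index 1
].map { |fields| fields.pack("Cnn") }.join

entry_size = w.sum
oids = index.each_slice(2).flat_map { |first, count| (first...(first + count)).to_a }

entries = oids.each_with_index.map do |oid, i|
  type, field2, field3 = data.byteslice(i * entry_size, entry_size).unpack("Cnn")
  case type
  when 0 then {oid: oid, kind: :free, gen: field3}
  when 1 then {oid: oid, kind: :in_use, pos: field2, gen: field3}
  when 2 then {oid: oid, kind: :compressed, objstm: field2, pos: field3}
  end
end
```

The meaning of the second and third fields depends on the entry type, which is why the real method passes them to `add_in_use_entry`, `add_free_entry` and `add_compressed_entry` in different orders.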
write_xref_section_to_stream(xref_section)
Writes the given cross-reference section to the stream and sets the correct /W and /Index entries for the written data.
# File lib/hexapdf/type/xref_stream.rb, line 188
def write_xref_section_to_stream(xref_section)
  value[:W], pack_string = calculate_w_entry_and_pack_string(xref_section[oid, gen].pos)
  value[:Index] = []
  stream = ''.b
  xref_section.each_subsection do |entries|
    value[:Index] << entries.first.oid << entries.length
    entries.each do |entry|
      data = if entry.in_use?
               [TYPE_IN_USE, entry.pos, entry.gen]
             elsif entry.free?
               [TYPE_FREE, 0, 65535]
             elsif entry.compressed?
               [TYPE_COMPRESSED, entry.objstm, entry.pos]
             else
               raise HexaPDF::Error, "Unsupported cross-reference entry #{entry}"
             end
      stream << data.pack(pack_string)
    end
  end
  self.stream = stream
end
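The way /Index and the stream data are built together can be sketched with plain hashes. This is an illustration only: it assumes that a subsection is a run of consecutive object numbers, uses a fixed `"Cnn"` pack string, and all entry values and names are made up.

```ruby
# Entries sorted by object number; fields are already in stream order.
entries = [
  {oid: 0, type: 0, field2: 0, field3: 65_535}, # free
  {oid: 1, type: 1, field2: 15, field3: 0},     # in use at byte offset 15
  {oid: 5, type: 2, field2: 1, field3: 0},      # compressed in object stream 1
]

index = []
stream = ''.b

# Split into runs of consecutive object numbers, one run per subsection.
entries.slice_when { |a, b| b[:oid] != a[:oid] + 1 }.each do |run|
  index << run.first[:oid] << run.length
  run.each do |entry|
    stream << [entry[:type], entry[:field2], entry[:field3]].pack("Cnn")
  end
end
```

Each subsection contributes one (first object number, entry count) pair to /Index, while the packed records of all subsections are simply concatenated in the stream.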