class HexaPDF::Type::XRefStream

Represents the PDF type XRef, i.e. cross-reference streams.

A cross-reference stream is a more compact representation of a cross-reference section and trailer dictionary. The trailer dictionary is incorporated into the stream dictionary, and the cross-reference section entries are stored in the stream itself, compressed to save space.

How are Cross-reference Streams Used?

Cross-reference stream objects are only used when parsing or writing a PDF document.

When a file is read and a cross-reference stream is found, it is loaded and its information is stored in a HexaPDF::Revision object. So from a user's perspective nothing changes when a cross-reference stream instead of a cross-reference section and trailer is encountered.

This also means that any information stored in a cross-reference stream object between parsing and writing is discarded when the PDF document gets written!

When a revision is written, it is checked whether that revision contains a cross-reference stream object. If it does, the cross-reference stream object is updated with the cross-reference section and trailer information and then written. Otherwise, a normal cross-reference section plus trailer is written.

See: PDF1.7 s7.5.8

Public Instance Methods

trailer()

Returns a hash with the entries that represent the file trailer part of the cross-reference stream's dictionary.

See: Type::Trailer

# File lib/hexapdf/type/xref_stream.rb, line 94
def trailer
  Trailer.each_field.with_object({}) do |(name, _data), hash|
    hash[name] = value[name] if key?(name)
  end
end
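
The filtering done by this method can be tried out on a plain Hash. The following is a minimal sketch, not the library's code: `TRAILER_FIELDS` and `stream_dict` are hypothetical stand-ins, and the field list is an assumed subset based on the trailer keys defined by the PDF specification.

```ruby
# Hypothetical stand-in for the field names defined by Type::Trailer
# (assumed subset taken from the PDF trailer dictionary).
TRAILER_FIELDS = [:Size, :Prev, :Root, :Encrypt, :Info, :ID].freeze

# Hypothetical cross-reference stream dictionary mixing trailer keys
# with stream-specific keys.
stream_dict = {Type: :XRef, Size: 10, W: [1, 2, 2], Root: 'root-ref',
               Filter: :FlateDecode}

# Keep only those keys that belong to the trailer part.
trailer = TRAILER_FIELDS.each_with_object({}) do |name, hash|
  hash[name] = stream_dict[name] if stream_dict.key?(name)
end
```

Stream-specific keys such as /W and /Filter are left out, so only /Size and /Root remain in `trailer`.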

update_with_xref_section_and_trailer(xref_section, trailer)

Makes this cross-reference stream represent the data in the given HexaPDF::XRefSection and Type::Trailer.

The xref_section must contain an entry for this cross-reference stream, and this entry must be the one with the highest byte position because the /W entry is calculated from it.

The given cross-reference section is not stored but only used to rewrite the associated stream to reflect the cross-reference section. The dictionary is updated with the information from the trailer and the needed entries for the cross-reference section.

If there are changes to the cross-reference section or trailer, this method has to be invoked again.

# File lib/hexapdf/type/xref_stream.rb, line 113
def update_with_xref_section_and_trailer(xref_section, trailer)
  value.replace(trailer)
  value[:Type] = :XRef
  write_xref_section_to_stream(xref_section)
  set_filter(:FlateDecode, Columns: value[:W].inject(:+), Predictor: 12)
end
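
The filter parameters used above combine Flate compression with predictor 12, the PNG "Up" predictor, where /Columns is the number of bytes per entry. The following is a simplified sketch of that idea using only Ruby's standard library. It is not HexaPDF's filter implementation, and `png_up_encode` and `png_up_decode` are hypothetical helper names.

```ruby
require 'zlib'

# Prefixes each row with the PNG filter type 2 ("Up") and stores each
# byte as the difference to the byte directly above it.
def png_up_encode(data, columns)
  previous = [0] * columns
  out = ''.b
  data.bytes.each_slice(columns) do |row|
    out << 2
    row.each_with_index {|byte, i| out << ((byte - previous[i]) & 0xFF) }
    previous = row
  end
  out
end

# Reverses png_up_encode (only filter type "Up" is handled here).
def png_up_decode(data, columns)
  previous = [0] * columns
  out = ''.b
  data.bytes.each_slice(columns + 1) do |_type, *row|
    row = row.each_with_index.map {|byte, i| (byte + previous[i]) & 0xFF }
    row.each {|byte| out << byte }
    previous = row
  end
  out
end

w = [1, 4, 2]
columns = w.sum # same calculation as Columns: value[:W].inject(:+)
rows = [[1, 1000, 0], [1, 1200, 0], [1, 1450, 0]].map {|e| e.pack('CNn') }.join

compressed = Zlib::Deflate.deflate(png_up_encode(rows, columns))
restored = png_up_decode(Zlib::Inflate.inflate(compressed), columns)
```

Since consecutive entries usually differ only in a few low-order bytes, the predicted rows contain many zero bytes, which tends to compress well.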

xref_section()

Returns an XRefSection that represents the content of this cross-reference stream.

Each invocation returns a new XRefSection object based on the current data in the associated stream and dictionary.

# File lib/hexapdf/type/xref_stream.rb, line 85
def xref_section
  index = self[:Index] || [0, self[:Size]]
  parse_xref_section(index, self[:W])
end
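
The /Index fallback and the amount of stream data it implies can be illustrated with plain Ruby values. This is a minimal sketch in which `dict` is a hypothetical stand-in for the stream dictionary.

```ruby
# Hypothetical stream dictionary without an /Index entry.
dict = {Size: 4, W: [1, 2, 2]}

# Without /Index, a single subsection starting at object 0 with /Size
# entries is assumed.
index = dict[:Index] || [0, dict[:Size]]

# Every entry occupies the sum of the /W field widths in bytes.
entry_size = dict[:W].sum
needed_bytes = entry_size * index.each_slice(2).sum(&:last)

# With an explicit /Index holding two subsections (1 and 2 entries):
explicit_index = [0, 1, 5, 2]
explicit_needed = entry_size * explicit_index.each_slice(2).sum(&:last)
```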

Private Instance Methods

bytes_to_int(string, start_index, end_index)

Converts the bytes of the string from the start index up to, but not including, the end index to an integer.

The bytes are interpreted in big-endian order.

# File lib/hexapdf/type/xref_stream.rb, line 176
def bytes_to_int(string, start_index, end_index)
  result = string.getbyte(start_index)
  start_index += 1
  while start_index < end_index
    result = (result << 8) | string.getbyte(start_index)
    start_index += 1
  end
  result
end
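
Because the method is private, the following sketch repeats it as a standalone method so that it can be tried out directly. The sample entry is made up for illustration and uses field widths of 1, 3 and 2 bytes.

```ruby
# Standalone copy of the conversion shown above; end_index is exclusive.
def bytes_to_int(string, start_index, end_index)
  result = string.getbyte(start_index)
  start_index += 1
  while start_index < end_index
    result = (result << 8) | string.getbyte(start_index)
    start_index += 1
  end
  result
end

# One made-up entry: type 1, byte offset 1000 (0x0003E8), generation 0.
entry = "\x01\x00\x03\xE8\x00\x00".b

type   = bytes_to_int(entry, 0, 1) # => 1
offset = bytes_to_int(entry, 1, 4) # => 1000
gen    = bytes_to_int(entry, 4, 6) # => 0
```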

calculate_w_entry_and_pack_string(max_number)

Returns the /W entry, with the width of the second field chosen so that it can hold the given maximum number, together with the matching pack string for the entries.

# File lib/hexapdf/type/xref_stream.rb, line 213
def calculate_w_entry_and_pack_string(max_number)
  middle = Math.log(max_number, 255).ceil
  middle = 4 if middle == 3
  pack_string = "C#{'-CnNN'[middle]}n"
  [[1, middle, 2], pack_string]
end
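
To see which widths and pack strings result for different maximum values, the method can be copied into a standalone script. The sample numbers below are arbitrary and were chosen away from the boundaries between widths.

```ruby
# Standalone copy of the method shown above.
def calculate_w_entry_and_pack_string(max_number)
  middle = Math.log(max_number, 255).ceil
  middle = 4 if middle == 3 # there is no 3-byte pack directive, so use 4 bytes
  pack_string = "C#{'-CnNN'[middle]}n"
  [[1, middle, 2], pack_string]
end

small  = calculate_w_entry_and_pack_string(200)    # 1-byte second field
medium = calculate_w_entry_and_pack_string(5_000)  # 2-byte second field
w, pack_string = calculate_w_entry_and_pack_string(70_000) # 4-byte second field

# Packing one in-use entry (type 1, byte offset 70000, generation 0)
# yields exactly as many bytes as the /W widths add up to.
packed = [1, 70_000, 0].pack(pack_string)
```

The directives `C`, `n` and `N` stand for unsigned 8-bit, 16-bit big-endian and 32-bit big-endian integers, which matches the big-endian field layout of cross-reference streams.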

parse_xref_section(index, w)

Parses the stream and returns the resulting HexaPDF::XRefSection object.

# File lib/hexapdf/type/xref_stream.rb, line 127
def parse_xref_section(index, w)
  xref = XRefSection.new

  data = stream
  start_pos = end_pos = 0

  w0 = w[0]
  w1 = w[1]
  w2 = w[2]

  needed_bytes = (w0 + w1 + w2) * index.each_slice(2).sum(&:last)

  if needed_bytes > data.size
    raise HexaPDF::MalformedPDFError, "Cross-reference stream is missing data " \
      "(#{needed_bytes} bytes needed, got #{data.size})"
  end

  index.each_slice(2) do |first_oid, number_of_entries|
    first_oid.upto(first_oid + number_of_entries - 1) do |oid|
      # Default for first field: type 1
      end_pos = start_pos + w0
      type_field = (w0 == 0 ? TYPE_IN_USE : bytes_to_int(data, start_pos, end_pos))
      # No default available for second field
      start_pos = end_pos + w1
      field2 = bytes_to_int(data, end_pos, start_pos)
      # Default for third field is 0 for type 1, otherwise it needs to be specified!
      end_pos = start_pos + w2
      field3 = (w2 == 0 ? 0 : bytes_to_int(data, start_pos, end_pos))

      case type_field
      when TYPE_IN_USE
        xref.add_in_use_entry(oid, field3, field2)
      when TYPE_FREE
        xref.add_free_entry(oid, field3)
      when TYPE_COMPRESSED
        xref.add_compressed_entry(oid, field2, field3)
      else
        nil # Ignore entry as per PDF1.7 s7.5.8.3
      end
      start_pos = end_pos
    end
  end

  xref
end
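
The decoding loop can be followed on a small, hand-built stream. The sketch below is a simplified re-implementation that collects plain hashes instead of filling an XRefSection. `read_field` and `decode_entries` are hypothetical helper names, and the type constants use the values defined by the PDF specification.

```ruby
TYPE_FREE = 0
TYPE_IN_USE = 1
TYPE_COMPRESSED = 2

# Reads +width+ bytes starting at +pos+ as a big-endian integer
# (returns 0 for a width of 0).
def read_field(data, pos, width)
  data.byteslice(pos, width).each_byte.inject(0) {|acc, byte| (acc << 8) | byte }
end

# Decodes all entries described by +index+ and +w+ into hashes.
def decode_entries(data, index, w)
  pos = 0
  result = []
  index.each_slice(2) do |first_oid, count|
    first_oid.upto(first_oid + count - 1) do |oid|
      type = (w[0] == 0 ? TYPE_IN_USE : read_field(data, pos, w[0]))
      field2 = read_field(data, pos + w[0], w[1])
      field3 = (w[2] == 0 ? 0 : read_field(data, pos + w[0] + w[1], w[2]))
      result << {oid: oid, type: type, field2: field2, field3: field3}
      pos += w.sum
    end
  end
  result
end

# Two subsections: object 0, then objects 5 and 6.
index = [0, 1, 5, 2]
w = [1, 2, 2]
data = [[TYPE_FREE, 0, 65_535],
        [TYPE_IN_USE, 1000, 0],
        [TYPE_COMPRESSED, 5, 3]].map {|entry| entry.pack('Cnn') }.join

entries = decode_entries(data, index, w)
```

For an in-use entry, the second field is the byte offset and the third the generation number. For a compressed entry, the second field is the object number of the object stream and the third the index within it.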

write_xref_section_to_stream(xref_section)

Writes the given cross-reference section to the stream and sets the correct /W and /Index entries for the written data.

# File lib/hexapdf/type/xref_stream.rb, line 188
def write_xref_section_to_stream(xref_section)
  value[:W], pack_string = calculate_w_entry_and_pack_string(xref_section[oid, gen].pos)
  value[:Index] = []

  stream = ''.b
  xref_section.each_subsection do |entries|
    value[:Index] << entries.first.oid << entries.length
    entries.each do |entry|
      data = if entry.in_use?
               [TYPE_IN_USE, entry.pos, entry.gen]
             elsif entry.free?
               [TYPE_FREE, 0, 65535]
             elsif entry.compressed?
               [TYPE_COMPRESSED, entry.objstm, entry.pos]
             else
               raise HexaPDF::Error, "Unsupported cross-reference entry #{entry}"
             end
      stream << data.pack(pack_string)
    end
  end
  self.stream = stream
end
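
How the /Index entry and the stream data grow together can be sketched without the library. The example below uses plain arrays as hypothetical stand-ins for cross-reference entries and assumes that a subsection is a run of consecutive object numbers. The fixed pack string `'Cnn'` corresponds to a /W entry of [1, 2, 2].

```ruby
# Hypothetical stand-ins for entries: [oid, type, field2, field3]
entries = [
  [0, 0, 0, 65_535], # free entry
  [1, 1, 15, 0],     # in-use entry at byte offset 15
  [2, 1, 230, 0],    # in-use entry at byte offset 230
  [5, 2, 2, 0],      # compressed entry in object stream 2, index 0
  [6, 1, 1000, 0],   # in-use entry at byte offset 1000
]

index = []
stream = ''.b

# Split the entries into runs of consecutive object numbers.
entries.slice_when {|a, b| b[0] != a[0] + 1 }.each do |subsection|
  # Each subsection contributes its first object number and its length.
  index << subsection.first[0] << subsection.length
  subsection.each {|_oid, *fields| stream << fields.pack('Cnn') }
end
```

The object numbers themselves are never written to the stream. They are implied by /Index, which is why the entry order and the subsection boundaries have to match exactly.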