class HexaPDF::Type::ObjectStream

Represents PDF type ObjStm, object streams.

An object stream is a stream that can hold multiple indirect objects. Since the objects are stored inside the stream, filters can be used to compress the stream content and therefore represent the indirect objects more compactly than would be possible otherwise.
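The size benefit can be illustrated without HexaPDF. The following is a minimal sketch in plain Ruby using the standard zlib library. The serialized objects and the helper name pack_and_deflate are made up for illustration and are not part of HexaPDF.

```ruby
require "zlib"

# Hypothetical helper, not part of HexaPDF: joins already serialized objects
# into one string and compresses the result with Deflate.
def pack_and_deflate(serialized_objects)
  raw = serialized_objects.join(" ")
  [raw, Zlib::Deflate.deflate(raw)]
end

# Made-up serialized dictionaries that share a lot of text, as small PDF
# objects often do.
samples = Array.new(20) do |i|
  "<</Type/Annot/Subtype/Link/Rect[0 #{i * 10} 100 #{i * 10 + 10}]/Border[0 0 0]>>"
end

raw, packed = pack_and_deflate(samples)
puts "uncompressed: #{raw.bytesize} bytes, deflated: #{packed.bytesize} bytes"
```

Objects written individually as indirect objects cannot be compressed this way, because filters only apply to stream content.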

How are Object Streams Used?

When an indirect object that resides in an object stream needs to be loaded, the object stream itself is loaded first and parse_stream is invoked to get an ObjectStream::Data object representing the stored indirect objects. The requested indirect object is then loaded from this ObjectStream::Data object and returned. From a user's perspective, nothing changes when an object is located inside an object stream instead of directly in the PDF file.

The indirect objects initially stored in the object stream are automatically added to the list of to-be-stored objects when parse_stream is invoked. Additional objects can be assigned to the object stream via add_object or deleted from it via delete_object.

Before an object stream is written, it is necessary to invoke write_objects so that the to-be-stored objects are serialized to the stream. This is automatically done by the Writer. A user thus only has to define which objects should reside in the object stream.

However, only objects that are allowed to reside in an object stream are actually written to it. All other objects are deleted from the object stream (delete_object) and written as normal indirect objects.

See PDF1.7 s7.5.7

Public Instance Methods

add_object(ref)

Adds the given object to the list of objects that should be stored in this object stream.

The ref argument can be either a reference or any PDF object.

# File lib/hexapdf/type/object_stream.rb, line 122
def add_object(ref)
  return if object_index(ref)

  index = objects.size / 2
  objects[index] = ref
  objects[ref] = index
end
delete_object(ref)

Deletes the given object from the list of objects that should be stored in this object stream.

The ref argument can be either a reference or any PDF object.

# File lib/hexapdf/type/object_stream.rb, line 134
def delete_object(ref)
  index = objects[ref]
  return unless index

  move_index = objects.size / 2 - 1

  objects[index] = objects[move_index]
  objects[objects[index]] = index
  objects.delete(ref)
  objects.delete(move_index)
end
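
add_object and delete_object keep both directions of the mapping (index to object and object to index) in a single Hash, and deletion moves the last entry into the freed slot so that the indices stay contiguous. The following stand-alone sketch reproduces that bookkeeping outside HexaPDF. The class name TwoWayList and its methods are hypothetical, and the sketch assumes the stored values are never Integers, since those would collide with the index keys.

```ruby
# Hypothetical illustration class, not part of HexaPDF.
class TwoWayList
  def initialize
    @objects = {} # holds index => object and object => index entries
  end

  # Every stored object occupies two hash entries.
  def size
    @objects.size / 2
  end

  def index(obj)
    @objects[obj]
  end

  def add(obj)
    return if @objects[obj] # already present

    index = size
    @objects[index] = obj
    @objects[obj] = index
  end

  def delete(obj)
    index = @objects[obj]
    return unless index

    last = size - 1
    @objects[index] = @objects[last]  # move the last object into the gap
    @objects[@objects[index]] = index # update the moved object's index
    @objects.delete(obj)
    @objects.delete(last)
  end

  def to_a
    Array.new(size) { |i| @objects[i] }
  end
end

list = TwoWayList.new
%w[a b c].each { |name| list.add(name) }
list.delete("a")
p list.to_a
```

The order of the remaining entries can change on deletion, which is acceptable here because the position of an object inside an object stream carries no meaning beyond lookup.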
object_index(obj)

Returns the index of the given reference or PDF object in the list of to-be-stored objects, or nil if it has not been added.

# File lib/hexapdf/type/object_stream.rb, line 148
def object_index(obj)
  objects[obj]
end
parse_stream()

Parses the stream and returns an ObjectStream::Data object that can be used for retrieving the objects defined by this object stream.

The object references are also added to this object stream so that they are included when the object stream gets written.

# File lib/hexapdf/type/object_stream.rb, line 112
def parse_stream
  data = stream
  oids, offsets = parse_oids_and_offsets(data)
  oids.each {|oid| add_object(Reference.new(oid, 0)) }
  Data.new(data, oids, offsets)
end
write_objects(revision) → obj_to_stm_hash

Writes the added objects to the stream and returns a hash mapping all written objects to this object stream.

There are some reasons why an added object may not be stored in the stream:

  • It has a generation number other than 0.

  • It is a stream object.

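  • It doesn't reside in the given Revision object.

  • It is the null object.

  • It is the encryption dictionary, or it is the Catalog of an encrypted document.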

Such objects are additionally deleted from the list of to-be-stored objects and are later written as indirect objects.
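
The split between objects that stay in the stream and objects that are written normally can be sketched as a simple partition. This is an illustration only: Candidate, storable_in_object_stream? and split_candidates are hypothetical names, and only the generation number, the stream property, and revision membership are modelled.

```ruby
# Hypothetical stand-in for a PDF object, not a HexaPDF class.
Candidate = Struct.new(:oid, :gen, :stream, :in_revision, keyword_init: true)

# True if none of the modelled exclusion reasons applies.
def storable_in_object_stream?(candidate)
  candidate.in_revision && candidate.gen == 0 && !candidate.stream
end

# Returns [stored_in_stream, written_as_indirect_objects].
def split_candidates(candidates)
  candidates.partition { |candidate| storable_in_object_stream?(candidate) }
end

candidates = [
  Candidate.new(oid: 1, gen: 0, stream: false, in_revision: true),
  Candidate.new(oid: 2, gen: 1, stream: false, in_revision: true),
  Candidate.new(oid: 3, gen: 0, stream: true, in_revision: true)
]
stored, written_normally = split_candidates(candidates)
puts "in object stream: #{stored.map(&:oid).inspect}"
puts "written normally: #{written_normally.map(&:oid).inspect}"
```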

# File lib/hexapdf/type/object_stream.rb, line 166
def write_objects(revision)
  index = 0
  object_info = ''.b
  data = ''.b
  serializer = Serializer.new
  obj_to_stm = {}

  encrypt_dict = document.trailer[:Encrypt]
  while index < objects.size / 2
    obj = revision.object(objects[index])

    # Due to a bug in Adobe Acrobat, the Catalog must not be stored in an object stream if
    # the document is encrypted
    if obj.nil? || obj.null? || obj.gen != 0 || obj.kind_of?(Stream) || obj == encrypt_dict ||
        (encrypt_dict && obj.type == :Catalog)
      delete_object(objects[index])
      next
    end

    obj_to_stm[obj] = self
    object_info << "#{obj.oid} #{data.size} "
    data << serializer.serialize(obj) << " "
    index += 1
  end

  value[:Type] = :ObjStm
  value[:N] = objects.size / 2
  value[:First] = object_info.size
  self.stream = object_info << data
  set_filter(:FlateDecode)

  obj_to_stm
end
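
The resulting stream layout is a header of "object number, offset" pairs followed by the serialized objects, with First holding the header size and N the object count. The following sketch rebuilds that layout with plain strings. It is not HexaPDF code: build_object_stream is a hypothetical helper that takes already serialized objects instead of calling a serializer, and it skips the filter step.

```ruby
# Hypothetical helper, not part of HexaPDF.
# pairs is an array of [object_number, serialized_string] entries.
def build_object_stream(pairs)
  header = "".b
  body = "".b
  pairs.each do |oid, serialized|
    # Offsets are relative to the start of the body, not of the whole stream.
    header << "#{oid} #{body.size} "
    body << serialized << " "
  end
  {N: pairs.size, First: header.size, stream: header + body}
end

result = build_object_stream([[10, "<</A 1>>"], [11, "[1 2 3]"]])
p result
```

Because the offsets are relative to First, the header can be built in the same pass as the body without knowing its own final length in advance.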

Private Instance Methods

objects()

Returns the container with the to-be-stored objects.

# File lib/hexapdf/type/object_stream.rb, line 218
def objects
  @objects ||= {}
end
parse_oids_and_offsets(data)

Parses the object numbers and their offsets from the start of the stream data.

# File lib/hexapdf/type/object_stream.rb, line 203
def parse_oids_and_offsets(data)
  oids = []
  offsets = []
  first = value[:First].to_i

  stream_tokenizer = Tokenizer.new(StringIO.new(data))
  !data.empty? && value[:N].to_i.times do
    oids << stream_tokenizer.next_object
    offsets << first + stream_tokenizer.next_object
  end

  [oids, offsets]
end
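
The same header can be read back with very little code. The sketch below is a simplified stand-in that uses String#split instead of HexaPDF's Tokenizer, so it only handles plain integer headers. The helper names read_object_table and extract_object are hypothetical, and extract_object additionally assumes that the offsets are listed in ascending order.

```ruby
# Hypothetical helpers, not part of HexaPDF.

# Returns [[object_number, absolute_offset], ...] for the first n header pairs.
def read_object_table(data, n, first)
  numbers = data[0, first].split.map { |token| Integer(token) }
  numbers.each_slice(2).first(n).map { |oid, offset| [oid, first + offset] }
end

# Returns the serialized form of the object at the given table position.
def extract_object(data, table, position)
  start = table[position][1]
  stop = position + 1 < table.size ? table[position + 1][1] : data.size
  data[start...stop].strip
end

data = "10 0 11 9 <</A 1>> [1 2 3] "
table = read_object_table(data, 2, 10)
table.each_index do |position|
  puts "#{table[position][0]}: #{extract_object(data, table, position)}"
end
```

As in parse_oids_and_offsets, the value of First is added to each relative offset so that the result indexes directly into the stream data.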
perform_validation() { |"Object stream has invalid generation number > 0", false| ... }

Validates that the generation number of the object stream is zero.

Calls superclass method HexaPDF::Dictionary#perform_validation
# File lib/hexapdf/type/object_stream.rb, line 223
def perform_validation
  super
  yield("Object stream has invalid generation number > 0", false) if gen != 0
end