module HexaPDF::Task::Optimize

Task for optimizing the PDF document.

For a list of optimization methods this task can perform have a look at the ::call method.

Public Class Methods

call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false) click to toggle source

Optimizes the PDF document.

The field entries that are optional and set to their default value are always deleted. Additional optimization methods are performed depending on the values of the following arguments:

compact

Compacts the object space by merging the revisions and then deleting null and unused values if set to true.

object_streams

Specifies if and how object streams should be used: For :preserve, existing object streams are preserved; for :generate objects are packed into object streams as much as possible; and for :delete existing object streams are deleted.

xref_streams

Specifies if cross-reference streams should be used. Can be :preserve (no modifications), :generate (use cross-reference streams) or :delete (remove cross-reference streams).

If object_streams is set to :generate, this option is implicitly changed to :generate.

compress_pages

Compresses the content streams of all pages if set to true. Note that this can take a very long time because each content stream has to be unfiltered, parsed, serialized and then filtered again.

# File lib/hexapdf/task/optimize.rb, line 74
def self.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve,
              compress_pages: false)
  if compact
    compact(doc, object_streams, xref_streams)
  elsif object_streams != :preserve
    process_object_streams(doc, object_streams, xref_streams)
  elsif xref_streams != :preserve
    process_xref_streams(doc, xref_streams)
  else
    doc.each(only_current: false, &method(:delete_fields_with_defaults))
  end

  compress_pages(doc) if compress_pages
end
compact(doc, object_streams, xref_streams) click to toggle source

Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.

For the meaning of the other arguments see ::call.

# File lib/hexapdf/task/optimize.rb, line 93
def self.compact(doc, object_streams, xref_streams)
  doc.revisions.merge
  unused = Set.new(doc.task(:dereference))
  rev = doc.revisions.add

  oid = 1
  doc.revisions[0].each do |obj|
    if obj.null? || unused.include?(obj) || (obj.type == :ObjStm) ||
        (obj.type == :XRef && xref_streams != :preserve)
      obj.data.value = nil
      next
    end

    delete_fields_with_defaults(obj)
    obj.oid = oid
    obj.gen = 0
    rev.add(obj)
    oid += 1
  end
  doc.revisions.delete(0)

  if object_streams == :generate
    process_object_streams(doc, :generate, xref_streams)
  elsif xref_streams == :generate
    doc.add({Type: :XRef})
  end
end
compress_pages(doc) click to toggle source

Compresses the contents of all pages by parsing and then serializing again. The HexaPDF serializer is already optimized for small output size so nothing else needs to be done.

# File lib/hexapdf/task/optimize.rb, line 216
def self.compress_pages(doc)
  doc.pages.each do |page|
    processor = SerializationProcessor.new
    HexaPDF::Content::Parser.parse(page.contents, processor)
    page.contents = processor.result
    page[:Contents].set_filter(:FlateDecode)
  end
end
delete_fields_with_defaults(obj) click to toggle source

Deletes field entries of the object that are optional and currently set to their default value.

# File lib/hexapdf/task/optimize.rb, line 204
def self.delete_fields_with_defaults(obj)
  return unless obj.kind_of?(HexaPDF::Dictionary) && !obj.null?
  obj.each do |name, value|
    if (field = obj.class.field(name)) && !field.required? && field.default? &&
        value == field.default
      obj.delete(name)
    end
  end
end
process_object_streams(doc, method, xref_streams) click to toggle source

Processes the object streams in each revision according to method: For :preserve, nothing is done, for :delete all object streams are deleted and for :generate objects are packed into object streams as much as possible.

# File lib/hexapdf/task/optimize.rb, line 124
def self.process_object_streams(doc, method, xref_streams)
  case method
  when :delete
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      objects_to_delete = []
      rev.each do |obj|
        case obj.type
        when :ObjStm
          objects_to_delete << obj
        when :XRef
          xref_stream = true
          objects_to_delete << obj if xref_streams == :delete
        else
          delete_fields_with_defaults(obj)
        end
      end
      objects_to_delete.each {|obj| rev.delete(obj) }
      if xref_streams == :generate && !xref_stream
        doc.add({Type: :XRef}, revision: rev_index)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      count = 0
      objstms = [doc.wrap({Type: :ObjStm})]
      old_objstms = []
      rev.each do |obj|
        case obj.type
        when :XRef
          xref_stream = true
        when :ObjStm
          old_objstms << obj
        end
        delete_fields_with_defaults(obj)

        next if obj.respond_to?(:stream)

        objstms[-1].add_object(obj)
        count += 1
        if count == 200
          objstms << doc.wrap({Type: :ObjStm})
          count = 0
        end
      end
      old_objstms.each {|objstm| rev.delete(objstm) }
      objstms.each {|objstm| doc.add(objstm, revision: rev_index) }
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end
process_xref_streams(doc, method) click to toggle source

Processes the cross-reference streams in each revision according to method: For :preserve, nothing is done, for :delete all cross-reference streams are deleted and for :generate cross-reference streams are added.

# File lib/hexapdf/task/optimize.rb, line 180
def self.process_xref_streams(doc, method)
  case method
  when :delete
    doc.each(only_current: false) do |obj, rev|
      if obj.type == :XRef
        rev.delete(obj)
      else
        delete_fields_with_defaults(obj)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      rev.each do |obj|
        xref_stream = true if obj.type == :XRef
        delete_fields_with_defaults(obj)
      end
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end