class HexaPDF::Document::Files

This class provides methods for managing file specifications of a PDF file.

Note that not all file specifications of a given PDF file may be found, e.g. when a file specification is only a string. This class can therefore only handle file specifications that are indirect file specification dictionaries with the /Type key set.

Public Class Methods

new(document)

Creates a new Files object for the given PDF document.

# File lib/hexapdf/document/files.rb, line 49
def initialize(document)
  @document = document
end

Public Instance Methods

add(filename, name: nil, description: nil, embed: true) → file_spec
add(io, name:, description: nil) → file_spec

Adds the file or IO to the PDF document and returns the corresponding file specification object.

Options:

name

The name that should be used for the file path. This name is also used for registering the file in the EmbeddedFiles name tree.

When a filename is given and name is not specified, the basename of the file is used by default. When an IO object is given, name is mandatory and an ArgumentError is raised if it is missing.

description

A description of the file.

embed

When an IO object is given, it is always embedded and this option is ignored.

When a filename is given and this option is true, then the file is embedded. Otherwise only a reference to it is stored.

See: HexaPDF::Type::FileSpecification
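The option rules above can be sketched in plain Ruby without loading HexaPDF. The two helpers below, resolve_attachment_name and embed_attachment?, are hypothetical stand-ins written for illustration only; they mirror how the name and embed options are resolved and are not part of the library.

```ruby
require 'stringio'

# Hypothetical helper (not part of HexaPDF): resolves the name the way the
# options above describe. A String argument is treated as a filename.
def resolve_attachment_name(file_or_io, name: nil)
  name ||= File.basename(file_or_io) if file_or_io.kind_of?(String)
  if name.nil?
    raise ArgumentError, "The name argument is mandatory when given an IO object"
  end
  name
end

# Hypothetical helper (not part of HexaPDF): IO objects are always embedded,
# filenames only when the embed option is true.
def embed_attachment?(file_or_io, embed: true)
  embed || !file_or_io.kind_of?(String)
end

resolve_attachment_name("/tmp/reports/q1.pdf")                      # basename is the default
resolve_attachment_name("/tmp/reports/q1.pdf", name: "report.pdf")  # explicit name wins
resolve_attachment_name(StringIO.new("data"), name: "data.bin")     # IO needs a name

embed_attachment?("/tmp/reports/q1.pdf", embed: false)   # only a reference is stored
embed_attachment?(StringIO.new("data"), embed: false)    # option ignored for IO objects
```

Passing an IO object without name: raises ArgumentError in this sketch, matching the mandatory name: keyword in the second call signature.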

# File lib/hexapdf/document/files.rb, line 79
def add(file_or_io, name: nil, description: nil, embed: true)
  name ||= File.basename(file_or_io) if file_or_io.kind_of?(String)
  if name.nil?
    raise ArgumentError, "The name argument is mandatory when given an IO object"
  end

  spec = @document.add({Type: :Filespec})
  spec.path = name
  spec[:Desc] = description if description
  spec.embed(file_or_io, name: name, register: true) if embed || !file_or_io.kind_of?(String)
  spec
end

each(search: false) {|file_spec| block } → files
each(search: false) → Enumerator

Iterates over indirect file specification dictionaries of the PDF.

By default, only the file specifications in their standard locations, namely the EmbeddedFiles name tree and the file attachment annotations of the pages, are returned, and a file specification found in both locations is yielded only once. If the search option is true, all indirect objects are searched for file specification dictionaries instead, which can be much slower.
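The default traversal can be sketched with plain Ruby collections. The method unique_specs below is a hypothetical stand-in, and the symbols are placeholders for file specification objects; the sketch only illustrates how a specification that is referenced both from the name tree and from an annotation is reported once.

```ruby
# Hypothetical stand-in (not part of HexaPDF): walks the name tree entries
# first, then the annotation entries, skipping anything already seen.
def unique_specs(name_tree_specs, annotation_specs)
  seen = {}
  result = []

  # Entries of the name tree are always reported.
  name_tree_specs.each do |spec|
    seen[spec] = true
    result << spec
  end

  # Annotation entries are reported only if not encountered before.
  annotation_specs.each do |spec|
    result << spec unless seen.key?(spec)
    seen[spec] = true
  end

  result
end

# :b is registered in the name tree and referenced by an annotation;
# :c is referenced by two annotations.
unique_specs([:a, :b], [:b, :c, :c])
```

A hash keyed by the specification keeps the membership check constant-time, so the traversal stays linear in the number of entries.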

# File lib/hexapdf/document/files.rb, line 102
def each(search: false)
  return to_enum(__method__, search: search) unless block_given?

  if search
    @document.each(only_current: false) do |obj|
      yield(obj) if obj.type == :Filespec
    end
  else
    seen = {}
    tree = @document.catalog[:Names] && @document.catalog[:Names][:EmbeddedFiles]
    tree&.each_entry do |_, spec|
      seen[spec] = true
      yield(spec)
    end

    @document.pages.each do |page|
      page[:Annots]&.each do |annot|
        next unless annot[:Subtype] == :FileAttachment
        spec = @document.deref(annot[:FS])
        yield(spec) unless seen.key?(spec)
        seen[spec] = true
      end
    end
  end

  self
end