class HexaPDF::Type::FileSpecification

Represents a file specification dictionary.

File specifications are used to refer to other files or URLs from within a PDF file. Simple file specifications are just strings. However, they are automatically converted on access to a full file specification to provide a unified interface.

Working with File Specifications

A file specification may refer to a file or a URL. This can easily be checked with url?. Regardless of whether the file specification refers to a URL or a file, the path method returns the “best” usable path for it.

Modifying a file specification should be done via the path= and url= methods as they ensure that no obsolescent entries are used and the file specification is consistent.

Finally, since embedded files in a PDF document are always linked to a file specification, this class also provides the embedding and unembedding operations, see embed and unembed.

See: PDF1.7 s7.11

Public Instance Methods

embed(filename, name: File.basename(filename), register: true) → ef_stream
embed(io, name:, register: true) → ef_stream

Embeds the given file or IO stream into the PDF file, sets the path accordingly and returns the created stream object.

If a file is given, the name option defaults to the basename of the file. However, if an IO object is given, the name argument is mandatory.

If there already was a file embedded for this file specification, it is unembedded first.

The embedded file stream automatically uses the FlateDecode filter for compressing the embedded file.

Options:

name

The name that should be used as path value and when registering.

register

Specifies whether the embedded file will be added to the EmbeddedFiles name tree under the name. If the name is already taken, its value is overwritten.

The file has to be available until the PDF document gets written because reading and writing are done lazily.
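
The rules for the name option and the file metadata can be tried out without a PDF document. The following is only a minimal sketch that uses the Ruby standard library: embed_metadata is a hypothetical helper, not part of HexaPDF, and it returns a plain Hash instead of creating an embedded file stream.

```ruby
require "tempfile"
require "stringio"

# Hypothetical helper (not part of HexaPDF): derives the name and the
# /Params values following the rules described for the embed method.
def embed_metadata(file_or_io, name: nil)
  # A String is treated as a filename, so its basename is the default name.
  name ||= File.basename(file_or_io) if file_or_io.kind_of?(String)
  if name.nil?
    raise ArgumentError, "The name argument is mandatory when given an IO object"
  end

  # File metadata is only available for filenames and IO objects with #stat.
  stat = if file_or_io.kind_of?(String)
           File.stat(file_or_io)
         elsif file_or_io.respond_to?(:stat)
           file_or_io.stat
         end
  params = if stat
             {Size: stat.size, CreationDate: stat.ctime, ModDate: stat.mtime}
           end
  {name: name, params: params}
end

# Create a small temporary file so that the sketch is self-contained.
file = Tempfile.create(["report", ".txt"])
file.write("hello")
file.flush

by_path = embed_metadata(file.path)
by_io = embed_metadata(StringIO.new("hello"), name: "hello.txt")

file.close
File.unlink(file.path)
```

In this sketch a StringIO object has no stat method, so no metadata is collected for it, whereas the filename variant picks up size and timestamps from the file system.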

# File lib/hexapdf/type/file_specification.rb, line 184
def embed(file_or_io, name: nil, register: true)
  name ||= File.basename(file_or_io) if file_or_io.kind_of?(String)
  if name.nil?
    raise ArgumentError, "The name argument is mandatory when given an IO object"
  end

  unembed
  self.path = name

  self[:EF] ||= {}
  ef_stream = self[:EF][:UF] = self[:EF][:F] = document.add({Type: :EmbeddedFile})
  stat = if file_or_io.kind_of?(String)
           File.stat(file_or_io)
         elsif file_or_io.respond_to?(:stat)
           file_or_io.stat
         end
  if stat
    ef_stream[:Params] = {Size: stat.size, CreationDate: stat.ctime, ModDate: stat.mtime}
  end
  ef_stream.set_filter(:FlateDecode)
  ef_stream.stream = HexaPDF::StreamData.new(file_or_io)

  if register
    (document.catalog[:Names] ||= {})[:EmbeddedFiles] ||= {}
    document.catalog[:Names][:EmbeddedFiles].add_entry(name, self)
  end

  ef_stream
end
embedded_file?()

Returns true if this file specification contains an embedded file.

See: embedded_file_stream

# File lib/hexapdf/type/file_specification.rb, line 143
def embedded_file?
  key?(:EF) && !self[:EF].empty?
end
embedded_file_stream()

Returns the embedded file associated with this file specification, or nil if this file specification references no embedded file.

If there are multiple possible embedded files, the /EF fields are searched in the following order and the first one with a value is used: /UF, /F, /Unix, /Mac, /DOS.

# File lib/hexapdf/type/file_specification.rb, line 152
def embedded_file_stream
  return unless key?(:EF)
  ef = self[:EF]
  ef[:UF] || ef[:F] || ef[:Unix] || ef[:Mac] || ef[:DOS]
end
path()

Returns the path for the referenced file or URL. An empty string is returned if no file specification string is set.

If multiple file specification strings are available, the fields are searched in the following order and the first one with a value is used: /UF, /F, /Unix, /Mac, /DOS.

The encoding of the returned path string is either UTF-8 (for /UF) or BINARY (for /F, /Unix, /Mac and /DOS).
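
The lookup order and the separator handling can be illustrated with a plain Hash standing in for the file specification dictionary. This is only a sketch: best_path and LOOKUP_ORDER are hypothetical names, not part of HexaPDF.

```ruby
# Hypothetical stand-in for the path method that works on a plain Hash.
LOOKUP_ORDER = [:UF, :F, :Unix, :Mac, :DOS].freeze

def best_path(spec)
  # Use the first field that has a value, or an empty string if none is set.
  key = LOOKUP_ORDER.find {|k| spec[k] }
  tmp = (key ? spec[key] : "").dup
  tmp.gsub!(/\\\//, "/") # an escaped slash (\/) becomes a plain slash
  tmp.tr!("\\", "/")     # remaining backslashes become slashes
  tmp
end

preferred = best_path({F: "legacy.pdf", UF: "unicode.pdf"}) # /UF wins over /F
normalized = best_path({F: "dir\\file.pdf"})                # backslash separator
missing = best_path({})                                     # nothing is set
```

Here preferred holds the /UF value, normalized uses a forward slash as separator, and missing is an empty string.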

# File lib/hexapdf/type/file_specification.rb, line 107
def path
  tmp = (self[:UF] || self[:F] || self[:Unix] || self[:Mac] || self[:DOS] || '').dup
  tmp.gsub!(/\\\//, "/") # PDF1.7 s7.11.2.1 but / in filename is interpreted as separator!
  tmp.tr!("\\", "/") # always use slashes instead of back-slashes!
  tmp
end
path=(filename)

Sets the file specification string to the given filename.

Since the /Unix, /Mac and /DOS fields are obsolescent, only the /F and /UF fields are set.

# File lib/hexapdf/type/file_specification.rb, line 117
def path=(filename)
  self[:UF] = self[:F] = filename
  delete(:FS)
  delete(:Unix)
  delete(:Mac)
  delete(:DOS)
end
unembed()

Deletes any embedded file streams associated with this file specification. A possible entry in the EmbeddedFiles name tree is also deleted.
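
The name tree cleanup removes every entry that points to this file specification, not just one. A rough sketch of that idea, assuming a plain Hash as a stand-in for the EmbeddedFiles name tree (remove_entries_for is a hypothetical helper, not part of HexaPDF):

```ruby
# Hypothetical helper: deletes all names registered for the given
# specification and returns the names that were removed.
def remove_entries_for(tree, spec)
  names = tree.select {|_, value| value == spec }.keys
  names.each {|name| tree.delete(name) }
  names
end

spec = {F: "a.txt"}
other = {F: "other.txt"}
# The same specification may be registered under more than one name.
tree = {"a.txt" => spec, "alias.txt" => spec, "other.txt" => other}

removed = remove_entries_for(tree, spec)
```

After the call, only the entry for the other specification remains in the tree.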

# File lib/hexapdf/type/file_specification.rb, line 216
def unembed
  return unless key?(:EF)
  self[:EF].each {|_, ef_stream| document.delete(ef_stream) }

  if document.catalog.key?(:Names) && document.catalog[:Names].key?(:EmbeddedFiles)
    tree = document.catalog[:Names][:EmbeddedFiles]
    tree.each_entry.find_all {|_, spec| spec == self }.each do |(name, _)|
      tree.delete_entry(name)
    end
  end
end
url=(url)

Sets the file specification string to the given URL and updates the file system entry appropriately.

The provided URL needs to be in an RFC1738 compliant string representation. If not, an error is raised.
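
The interplay between validation, the path fields and the file system entry can be sketched with a plain Hash and the URI library from the Ruby standard library. The names set_path and set_url are hypothetical stand-ins for path= and url=, and unlike the real method this sketch lets URI::InvalidURIError propagate instead of raising a HexaPDF error.

```ruby
require "uri"

# Hypothetical stand-in for path=: only /F and /UF are set, and the
# file system entry and the obsolescent fields are removed.
def set_path(spec, filename)
  spec[:UF] = spec[:F] = filename
  [:FS, :Unix, :Mac, :DOS].each {|key| spec.delete(key) }
  spec
end

# Hypothetical stand-in for url=: validates first, then stores the URL.
def set_url(spec, url)
  URI(url) # raises URI::InvalidURIError if the string cannot be parsed
  set_path(spec, url)
  spec[:FS] = :URL
  spec
end

spec = set_url({DOS: "OLD.TXT"}, "https://example.com/file.pdf")

rejected = begin
  set_url({}, "http://exa mple.com") # the space makes the URL invalid
  false
rescue URI::InvalidURIError
  true
end
```

Validating before any field is changed means that a rejected URL leaves the existing entries untouched.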

# File lib/hexapdf/type/file_specification.rb, line 130
def url=(url)
  begin
    URI(url)
  rescue URI::InvalidURIError => e
    raise HexaPDF::Error, e
  end
  self.path = url
  self[:FS] = :URL
end
url?()

Returns true if this file specification references a URL and not a file.

# File lib/hexapdf/type/file_specification.rb, line 95
def url?
  self[:FS] == :URL
end