module HexaPDF::ImageLoader::JPEG

This module is used for loading images in the JPEG format from files or IO streams.

See: PDF1.7 s7.4.8, ITU T.81 Annex B, ITU T.872

Constants

APP14_MARKER

Adobe uses the marker 0xEE (APPE or APP14) for its own purposes. It is needed for determining whether the image is a CMYK or a YCCK image.

APP14_TRANSFORM_CMYK

Value of the 12th byte in an APP14 marker specifying that the image uses CMYK color encoding, with all four colors complemented.
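
As a rough illustration of where that byte sits, the following sketch builds a hypothetical Adobe APP14 segment by hand and reads its last byte the same way the loader does. The segment contents (version 100, zeroed flags) are made up for the example, and the transform value 0 is assumed to be the one that denotes CMYK for four-component images.

```ruby
require 'stringio'

# Hypothetical Adobe APP14 payload, built by hand for illustration only.
payload = "Adobe".b +        # bytes 1-5: identifier
          [100].pack('n') +  # bytes 6-7: version (made-up value)
          [0].pack('n') +    # bytes 8-9: flags0
          [0].pack('n') +    # bytes 10-11: flags1
          [0].pack('C')      # byte 12: transform (0 assumed to mean CMYK)

# Marker (0xFF 0xEE), then the length, which counts its own two bytes.
segment = "\xFF\xEE".b + [payload.bytesize + 2].pack('n') + payload

io = StringIO.new(segment)
io.seek(2, IO::SEEK_SET)           # skip the two marker bytes
length = io.read(2).unpack1('n')   # segment length, including the length field
io.seek(length - 3, IO::SEEK_CUR)  # move to the last byte of the segment
transform = io.getbyte

puts "length=#{length} transform=#{transform}"
```

Because the length field includes its own two bytes, seeking `length - 3` bytes past it lands on the final byte of the segment, which is the 12th byte of the payload.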

EOI_MARKER

End-of-image marker

MAGIC_FILE_MARKER

The magic marker that tells us if the file/IO contains an image in JPEG format.

SOF_MARKERS

The various start-of-frame markers that tell us which kind of JPEG it is. The marker segment itself contains all the information needed for creating the PDF image object.

See: ITU T.81 B1.1.3
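
To make the frame header layout more concrete, here is a minimal sketch that assembles a hypothetical SOF0 segment for an 8-bit, 640x480, three-component image and unpacks it with the same format string the loader uses. The marker byte 0xC0 and the per-component bytes are assumptions chosen for the example, not values taken from this module.

```ruby
require 'stringio'

# Fixed part of the frame header: precision, height, width, component count.
header = [8, 480, 640, 3].pack('CnnC')
# Three bytes per component (id, sampling factors, quantization table); values are made up.
component_specs = [1, 0x22, 0, 2, 0x11, 1, 3, 0x11, 1].pack('C*')
length = 2 + header.bytesize + component_specs.bytesize
segment = "\xFF\xC0".b + [length].pack('n') + header + component_specs

io = StringIO.new(segment)
io.seek(2, IO::SEEK_SET)              # skip the marker bytes
length = io.read(2).unpack1('n')      # length counts its own two bytes
bits, height, width, components = io.read(6).unpack('CnnC')
io.seek(length - 2 - 6, IO::SEEK_CUR) # skip the per-component data

puts "#{width}x#{height}, #{components} components, #{bits} bits"
```

Note that the height is stored before the width, which is why the unpacked values are assigned in that order.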

SOS_MARKER

Start-of-scan marker

Public Class Methods

handles?(filename) → true or false
handles?(io) → true or false

Returns true if the given file or IO stream can be handled, i.e. if it contains an image in JPEG format.

# File lib/hexapdf/image_loader/jpeg.rb, line 78
def self.handles?(file_or_io)
  if file_or_io.kind_of?(String)
    File.read(file_or_io, 3, mode: 'rb') == MAGIC_FILE_MARKER
  else
    file_or_io.rewind
    file_or_io.read(3) == MAGIC_FILE_MARKER
  end
end
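
The check can be tried out without any files by using in-memory streams. The sketch below is not the library's code: the name `jpeg_stream` and the value of `jpeg_magic` are assumptions (the start-of-image marker 0xFF 0xD8 followed by the 0xFF that begins the next marker), used only to show the three-byte comparison.

```ruby
require 'stringio'

# Assumed magic bytes: SOI marker plus the first byte of the following marker.
jpeg_magic = "\xFF\xD8\xFF".b

# Hypothetical helper mirroring the IO branch of the method above.
jpeg_stream = lambda do |io|
  io.rewind
  io.read(3) == jpeg_magic
end

jpeg_like = StringIO.new("\xFF\xD8\xFF\xE0".b + "rest".b)
png_like = StringIO.new("\x89PNG\r\n\x1A\n".b)
empty = StringIO.new("".b)

puts jpeg_stream.call(jpeg_like)
puts jpeg_stream.call(png_like)
puts jpeg_stream.call(empty)
```

Reading with an explicit length returns a binary string, or nil at the end of the stream, so a stream that is too short simply fails the comparison instead of raising an error.
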

load(document, filename) → image_obj
load(document, io) → image_obj

Creates a PDF image object from the JPEG file or IO stream.

# File lib/hexapdf/image_loader/jpeg.rb, line 92
def self.load(document, file_or_io)
  dict = if file_or_io.kind_of?(String)
           File.open(file_or_io, 'rb') {|io| image_data_from_io(io) }
         else
           image_data_from_io(file_or_io)
         end
  document.add(dict, stream: HexaPDF::StreamData.new(file_or_io))
end

Private Class Methods

image_data_from_io(io)

Returns a hash containing the extracted JPEG image data.

# File lib/hexapdf/image_loader/jpeg.rb, line 102
def self.image_data_from_io(io)
  io.seek(2, IO::SEEK_SET)

  while true
    code0 = io.getbyte
    code1 = io.getbyte

    # B1.1.2 - all markers start with 0xFF
    if code0 != 0xFF
      raise HexaPDF::Error, "Invalid bytes found, expected marker code"
    end

    # B1.1.2 - markers may be preceded by any number of 0xFF fill bytes
    code1 = io.getbyte while code1 == 0xFF

    break if code1 == SOS_MARKER || code1 == EOI_MARKER

    # B1.1.4 - next two bytes are the length of the segment (except for RSTm or TEM markers
    # but those shouldn't appear here)
    length = io.read(2).unpack1('n')

    # According to T.872 6.1 and 6.5.3, if this marker is present, we need to use it for
    # correctly determining whether complemented CMYK or YCCK is used
    if code1 == APP14_MARKER
      io.seek(length - 3, IO::SEEK_CUR)
      invert_colors = true if io.getbyte == APP14_TRANSFORM_CMYK
      next
    elsif !SOF_MARKERS.include?(code1)
      io.seek(length - 2, IO::SEEK_CUR)
      next
    end

    bits, height, width, components = io.read(6).unpack('CnnC')
    io.seek(length - 2 - 6, IO::SEEK_CUR)

    # short-cut loop if we have all needed information
    break if components != 4 || invert_colors
  end

  # PDF1.7 s8.9.5.1
  if bits != 8
    raise HexaPDF::Error, "Unsupported number of bits per component: #{bits}"
  end

  color_space = case components
                when 1 then :DeviceGray
                when 3 then :DeviceRGB
                when 4 then :DeviceCMYK
                end

  dict = {
    Type: :XObject,
    Subtype: :Image,
    Width: width,
    Height: height,
    ColorSpace: color_space,
    BitsPerComponent: bits,
    Filter: :DCTDecode,
  }
  if invert_colors && color_space == :DeviceCMYK
    dict[:Decode] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
  end

  dict
end
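
The effect of the Decode array set above can be sketched with the linear mapping that PDF applies to image samples, where each pair in the array gives the output values for the smallest and largest sample. The lambda name `decode` and the sample values are hypothetical and serve only to show that the pair [1.0, 0.0] complements each component.

```ruby
# Maps a raw sample to the range given by one [d_min, d_max] pair of a Decode array.
decode = lambda do |sample, d_min, d_max, bits|
  d_min + sample * (d_max - d_min) / ((2**bits) - 1).to_f
end

decode_array = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
samples = [0, 255, 51, 255] # hypothetical 8-bit CMYK sample values

decoded = samples.each_with_index.map do |value, index|
  d_min, d_max = decode_array[2 * index, 2]
  decode.call(value, d_min, d_max, 8)
end

puts decoded.inspect
```

With the default pair [0.0, 1.0] a sample of 0 would map to 0.0; with the reversed pair it maps to 1.0, which undoes the complemented encoding.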