module HexaPDF::ImageLoader::JPEG
This module is used for loading images in the JPEG format from files or IO streams.
See: PDF1.7 s7.4.8, ITU T.81 Annex B, ITU T.872
Constants
- APP14_MARKER
Adobe uses the marker 0xEE (APPE or APP14) for its own purposes. We need it for determining whether we have a CMYK or YCCK image.
- APP14_TRANSFORM_CMYK
Value of the 12th byte in an APP14 marker specifying that the image uses CMYK color encoding, with all four colors complemented.
- EOI_MARKER
End-of-image marker
- MAGIC_FILE_MARKER
The magic marker that tells us if the file/IO contains an image in JPEG format.
- SOF_MARKERS
The various start-of-frame markers that tell us which kind of JPEG it is. The marker segment itself contains all the information needed for creating the PDF image object.
See: ITU T.81 B1.1.3
- SOS_MARKER
Start-of-scan marker
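The "12th byte" of the APP14 segment can be located from the segment length alone, which a small standalone sketch makes concrete. It builds an Adobe APP14 segment by hand and reads its last payload byte. The payload layout used here (the identifier "Adobe", a 2-byte version, two 2-byte flag fields, then the 1-byte transform) and the transform values 0 and 2 are assumptions for illustration, since this page does not show the constant values; `app14_segment` and `transform_byte` are hypothetical helpers, not part of HexaPDF.

```ruby
require 'stringio'

# Hypothetical helper: builds the length field plus payload of an APP14 segment.
# Assumed payload: "Adobe" (5) + version (2) + flags0 (2) + flags1 (2) + transform (1) = 12 bytes
def app14_segment(transform)
  payload = "Adobe".b + [100, 0, 0, transform].pack('nnnC')
  [payload.bytesize + 2].pack('n') + payload # the length counts its own 2 bytes
end

# Hypothetical helper: returns the transform byte of an APP14 segment.
# +io+ must be positioned directly after the 0xFF 0xEE marker bytes.
def transform_byte(io)
  length = io.read(2).unpack1('n')  # big-endian segment length
  io.seek(length - 3, IO::SEEK_CUR) # jump to the last byte of the segment
  io.getbyte
end

cmyk_transform = transform_byte(StringIO.new(app14_segment(0)))
ycck_transform = transform_byte(StringIO.new(app14_segment(2)))
```

With a 12-byte payload the length field is 14, so seeking `length - 3` bytes past the length field lands exactly on the 12th payload byte.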
Public Class Methods
handles?(file_or_io)
Returns true if the given file or IO stream can be handled, i.e. if it contains an image in JPEG format.
    # File lib/hexapdf/image_loader/jpeg.rb, line 78
    def self.handles?(file_or_io)
      if file_or_io.kind_of?(String)
        File.read(file_or_io, 3, mode: 'rb') == MAGIC_FILE_MARKER
      else
        file_or_io.rewind
        file_or_io.read(3) == MAGIC_FILE_MARKER
      end
    end
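The IO branch of this check can be tried without HexaPDF. The following is a minimal sketch, assuming the magic marker is the three bytes 0xFF 0xD8 0xFF (the SOI marker followed by the first byte of the next marker); the actual value of MAGIC_FILE_MARKER is not shown on this page. `ASSUMED_MAGIC` and `jpeg_stream?` are hypothetical names.

```ruby
require 'stringio'

# Assumed value, kept binary because IO#read with a length returns a binary string.
ASSUMED_MAGIC = "\xFF\xD8\xFF".b

# Hypothetical stand-in for the IO branch of handles?
def jpeg_stream?(io)
  io.rewind
  io.read(3) == ASSUMED_MAGIC # read returns nil on an empty stream, so this is false
end

jpeg_like = StringIO.new("\xFF\xD8\xFF\xE0 rest of the file".b)
png_like  = StringIO.new("\x89PNG\r\n\x1A\n".b)

jpeg_result = jpeg_stream?(jpeg_like)
png_result  = jpeg_stream?(png_like)
```

Because the stream is rewound first, the check works regardless of the current read position of the IO object.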
load(document, file_or_io)
Creates a PDF image object from the JPEG file or IO stream.
    # File lib/hexapdf/image_loader/jpeg.rb, line 92
    def self.load(document, file_or_io)
      dict = if file_or_io.kind_of?(String)
               File.open(file_or_io, 'rb') {|io| image_data_from_io(io) }
             else
               image_data_from_io(file_or_io)
             end

      document.add(dict, stream: HexaPDF::StreamData.new(file_or_io))
    end
Private Class Methods
image_data_from_io(io)
Returns a hash containing the extracted JPEG image data.
    # File lib/hexapdf/image_loader/jpeg.rb, line 102
    def self.image_data_from_io(io)
      io.seek(2, IO::SEEK_SET)

      while true
        code0 = io.getbyte
        code1 = io.getbyte

        # B1.1.2 - all markers start with 0xFF
        if code0 != 0xFF
          raise HexaPDF::Error, "Invalid bytes found, expected marker code"
        end

        # B1.1.2 - markers may be preceded by any number of 0xFF fill bytes
        code1 = io.getbyte while code1 == 0xFF
        break if code1 == SOS_MARKER || code1 == EOI_MARKER

        # B1.1.4 - next two bytes are the length of the segment (except for RSTm or TEM markers
        # but those shouldn't appear here)
        length = io.read(2).unpack1('n')

        # According to T.872 6.1 and 6.5.3, if this marker is present, we need to use it for
        # correctly determining whether complemented CMYK or YCCK is used
        if code1 == APP14_MARKER
          io.seek(length - 3, IO::SEEK_CUR)
          invert_colors = true if io.getbyte == APP14_TRANSFORM_CMYK
          next
        elsif !SOF_MARKERS.include?(code1)
          io.seek(length - 2, IO::SEEK_CUR)
          next
        end

        bits, height, width, components = io.read(6).unpack('CnnC')
        io.seek(length - 2 - 6, IO::SEEK_CUR)

        # short-cut loop if we have all needed information
        break if components != 4 || invert_colors
      end

      # PDF1.7 s8.9.5.1
      if bits != 8
        raise HexaPDF::Error, "Unsupported number of bits per component: #{bits}"
      end

      color_space = case components
                    when 1 then :DeviceGray
                    when 3 then :DeviceRGB
                    when 4 then :DeviceCMYK
                    end

      dict = {
        Type: :XObject,
        Subtype: :Image,
        Width: width,
        Height: height,
        ColorSpace: color_space,
        BitsPerComponent: bits,
        Filter: :DCTDecode,
      }
      if invert_colors && color_space == :DeviceCMYK
        dict[:Decode] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
      end

      dict
    end
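The marker walk is easier to follow on a tiny hand-built header. The following is a simplified standalone sketch, not the HexaPDF implementation: it only recognizes the baseline start-of-frame marker 0xC0, ignores APP14 handling, and stops at the start-of-scan marker 0xDA. `build_segment` and `frame_header` are hypothetical helpers, and the APP0 payload is a placeholder rather than a valid JFIF header.

```ruby
require 'stringio'

SOF0 = 0xC0 # baseline DCT start-of-frame, one of several SOF markers
SOS  = 0xDA # start-of-scan

# Hypothetical helper: builds one marker segment (0xFF, marker, 2-byte length, payload).
def build_segment(marker, payload)
  [0xFF, marker, payload.bytesize + 2].pack('CCn') + payload
end

# Hypothetical, simplified walk: returns [bits, height, width, components]
# of the first SOF0 segment, or nil if the scan starts before one is found.
def frame_header(io)
  io.seek(2, IO::SEEK_SET)               # skip the SOI marker
  loop do
    raise "expected marker" unless io.getbyte == 0xFF
    code = io.getbyte
    code = io.getbyte while code == 0xFF # skip fill bytes
    return nil if code.nil? || code == SOS
    length = io.read(2).unpack1('n')     # includes the two length bytes
    return io.read(6).unpack('CnnC') if code == SOF0
    io.seek(length - 2, IO::SEEK_CUR)    # skip the rest of this segment
  end
end

data = "\xFF\xD8".b +
       build_segment(0xE0, "JFIF\0".b) +                    # placeholder APP0 segment, skipped
       build_segment(SOF0, [8, 480, 640, 3].pack('CnnC')) + # 8 bits, height 480, width 640, 3 components
       build_segment(SOS, "".b)

header = frame_header(StringIO.new(data))
# header => [8, 480, 640, 3]
```

Note the field order in the frame header: height comes before width, which is why the original code unpacks into `bits, height, width, components`.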