class PDF::Reader::PageState

Encapsulates the logic for tracking graphics state as the instructions for a single page are processed. Most of the public methods correspond directly to PDF operators.

Constants

DEFAULT_GRAPHICS_STATE

Public Class Methods

new(page)

Starts tracking state for a new page.

# File lib/pdf/reader/page_state.rb, line 25
def initialize(page)
  @page          = page
  @cache         = page.cache
  @objects       = page.objects
  @font_stack    = [build_fonts(page.fonts)]
  @xobject_stack = [page.xobjects]
  @cs_stack      = [page.color_spaces]
  @stack         = [DEFAULT_GRAPHICS_STATE.dup]
  state[:ctm]  = identity_matrix
end

Public Instance Methods

begin_text_object()

Text Object Operators

# File lib/pdf/reader/page_state.rb, line 83
def begin_text_object
  @text_matrix      = identity_matrix
  @text_line_matrix = identity_matrix
  @font_size = nil
end
clone_state()

This returns a deep clone of the current state, ensuring changes are kept separate from earlier states.

Marshal is used to round-trip the state through a string to easily perform the deep clone. Kinda hacky, but effective.

# File lib/pdf/reader/page_state.rb, line 284
def clone_state
  if @stack.empty?
    {}
  else
    Marshal.load Marshal.dump(@stack.last)
  end
end
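
To see why a deep clone matters here, the following minimal sketch compares Marshal round-tripping with a plain dup. The hash and its keys are made up for illustration and are not the library's real state keys.

```ruby
# Illustrative state hash; the keys are hypothetical, not the library's own.
original = { :text_font => :F1, :dash => [3, 2] }

shallow = original.dup                                # copies only the outer hash
deep    = Marshal.load(Marshal.dump(original))        # copies nested objects too

# The shallow copy shares the nested array with the original,
# so this append is visible through both.
shallow[:dash] << 1

# The deep copy owns its own array, so this append stays local.
deep[:dash] << 99

puts original[:dash].inspect
puts deep[:dash].inspect
```

With dup alone, a change made to a nested value after saving the state would leak into the saved copy, which is exactly what the stack is meant to prevent.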
concatenate_matrix(a, b, c, d, e, f)

update the current transformation matrix.

If the CTM is currently undefined, just store the new values.

If there's an existing CTM, then multiply the existing matrix with the new matrix to form the updated matrix.

# File lib/pdf/reader/page_state.rb, line 65
def concatenate_matrix(a, b, c, d, e, f)
  if state[:ctm]
    ctm = state[:ctm]
    state[:ctm] = TransformationMatrix.new(a,b,c,d,e,f).multiply!(
      ctm.a, ctm.b,
      ctm.c, ctm.d,
      ctm.e, ctm.f
    )
  else
    state[:ctm] = TransformationMatrix.new(a,b,c,d,e,f)
  end
  @text_rendering_matrix = nil # invalidate cached value
end
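
As a rough illustration of the multiplication described above, here is a sketch using plain six-element arrays rather than TransformationMatrix. The multiply helper is hypothetical and assumes the usual PDF convention that [a, b, c, d, e, f] stands for the 3x3 matrix with rows [a b 0], [c d 0], [e f 1].

```ruby
# Multiply two PDF-style matrices given as [a, b, c, d, e, f].
# Hypothetical helper for illustration; not part of the library.
def multiply(m, n)
  a1, b1, c1, d1, e1, f1 = m
  a2, b2, c2, d2, e2, f2 = n
  [
    a1 * a2 + b1 * c2,      a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,      c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2
  ]
end

existing_ctm = [1, 0, 0, 1, 10, 20]  # translate by (10, 20)
new_matrix   = [2, 0, 0, 2, 0, 0]    # scale by 2

# The new matrix goes on the left, mirroring the order used in the method above.
updated_ctm = multiply(new_matrix, existing_ctm)
puts updated_ctm.inspect
```

In this sketch a point is scaled first and then translated, so the translation components of the existing CTM pass through unchanged.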
ctm_transform(x, y)

transform x and y co-ordinates from the current user space to the underlying device space.

# File lib/pdf/reader/page_state.rb, line 220
def ctm_transform(x, y)
  [
    (ctm.a * x) + (ctm.c * y) + (ctm.e),
    (ctm.b * x) + (ctm.d * y) + (ctm.f)
  ]
end
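
A small worked example of the same arithmetic, using a plain array in place of the CTM object. The transform helper and the sample matrix are assumptions made for illustration.

```ruby
# Apply a PDF-style matrix [a, b, c, d, e, f] to the point (x, y).
# Hypothetical helper mirroring the formula in ctm_transform.
def transform(matrix, x, y)
  a, b, c, d, e, f = matrix
  [(a * x) + (c * y) + e, (b * x) + (d * y) + f]
end

sample_ctm = [2, 0, 0, 2, 10, 20]  # scale by 2, then translate by (10, 20)
device_point = transform(sample_ctm, 1, 1)
puts device_point.inspect
```

Here the user space point (1, 1) is doubled to (2, 2) and then shifted by (10, 20).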
current_font()
# File lib/pdf/reader/page_state.rb, line 245
def current_font
  find_font(state[:text_font])
end
end_text_object()
# File lib/pdf/reader/page_state.rb, line 89
def end_text_object
  # don't need to do anything
end
find_color_space(label)
# File lib/pdf/reader/page_state.rb, line 256
def find_color_space(label)
  dict = @cs_stack.detect { |colorspaces|
    colorspaces.has_key?(label)
  }
  dict ? dict[label] : nil
end
find_font(label)
# File lib/pdf/reader/page_state.rb, line 249
def find_font(label)
  dict = @font_stack.detect { |fonts|
    fonts.has_key?(label)
  }
  dict ? dict[label] : nil
end
find_xobject(label)
# File lib/pdf/reader/page_state.rb, line 263
def find_xobject(label)
  dict = @xobject_stack.detect { |xobjects|
    xobjects.has_key?(label)
  }
  dict ? dict[label] : nil
end
font_size()
# File lib/pdf/reader/page_state.rb, line 110
def font_size
  @font_size ||= begin
                   _, zero = trm_transform(0,0)
                   _, one  = trm_transform(1,1)
                   (zero - one).abs
                 end
end
invoke_xobject(label) { |form| ... }

XObjects

# File lib/pdf/reader/page_state.rb, line 191
def invoke_xobject(label)
  save_graphics_state
  xobject = find_xobject(label)

  raise MalformedPDFError, "XObject #{label} not found" if xobject.nil?
  matrix = xobject.hash[:Matrix]
  concatenate_matrix(*matrix) if matrix

  if xobject.hash[:Subtype] == :Form
    form = PDF::Reader::FormXObject.new(@page, xobject, :cache => @cache)
    @font_stack.unshift(form.font_objects)
    @xobject_stack.unshift(form.xobjects)
    yield form if block_given?
    @font_stack.shift
    @xobject_stack.shift
  else
    yield xobject if block_given?
  end

  restore_graphics_state
end
move_text_position(x, y)

Text Positioning Operators

# File lib/pdf/reader/page_state.rb, line 138
def move_text_position(x, y) # Td
  temp = TransformationMatrix.new(1, 0,
                                  0, 1,
                                  x, y)
  @text_line_matrix = temp.multiply!(
    @text_line_matrix.a, @text_line_matrix.b,
    @text_line_matrix.c, @text_line_matrix.d,
    @text_line_matrix.e, @text_line_matrix.f
  )
  @text_matrix = @text_line_matrix.dup
  @font_size = @text_rendering_matrix = nil # invalidate cached value
end
move_text_position_and_set_leading(x, y)
# File lib/pdf/reader/page_state.rb, line 151
def move_text_position_and_set_leading(x, y) # TD
  set_text_leading(-1 * y)
  move_text_position(x, y)
end
move_to_next_line_and_show_text(str)
# File lib/pdf/reader/page_state.rb, line 178
def move_to_next_line_and_show_text(str) # '
  move_to_start_of_next_line
end
move_to_start_of_next_line()
# File lib/pdf/reader/page_state.rb, line 166
def move_to_start_of_next_line # T*
  move_text_position(0, -state[:text_leading])
end
process_glyph_displacement(w0, tj, word_boundary)

after each glyph is painted onto the page the text matrix must be modified. There's no defined operator for this, but depending on the use case some receivers may need to mutate the state with this while walking a page.

NOTE: some of the variable names in this method are obscure because they mirror variable names from the PDF spec.

NOTE: see Section 9.4.4, PDF 32000-1:2008, p. 252.

Arguments:

w0 - the glyph width in *text space*. This generally means the width in glyph space should be divided by 1000 before being passed to this function.

tj - any kerning that should be applied to the text matrix before the following glyph is painted. This is usually the numeric arguments in the array passed to a TJ operator.

word_boundary - a boolean indicating if a word boundary was just reached. Depending on the current state, extra space may need to be added.

# File lib/pdf/reader/page_state.rb, line 314
def process_glyph_displacement(w0, tj, word_boundary)
  fs = font_size # font size
  tc = state[:char_spacing]
  if word_boundary
    tw = state[:word_spacing]
  else
    tw = 0
  end
  th = state[:h_scaling]
  # optimise the common path to reduce Float allocations
  if th == 1 && tj == 0 && tc == 0 && tw == 0
    tx = w0 * fs
  elsif tj != 0
    # don't apply spacing to TJ displacement
    tx = (w0 - (tj/1000.0)) * fs * th
  else
    # apply horizontal scaling to spacing values but not font size
    tx = ((w0 * fs) + tc + tw) * th
  end

  # TODO: I'm pretty sure that tx shouldn't need to be divided by
  #       ctm[0] here, but this gets my tests green and I'm out of
  #       ideas for now
  # TODO: support ty > 0
  if ctm.a == 1 || ctm.a == 0
    @text_matrix.horizontal_displacement_multiply!(tx)
  else
    @text_matrix.horizontal_displacement_multiply!(tx/ctm.a)
  end
  @font_size = @text_rendering_matrix = nil # invalidate cached value
end
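
For comparison, Section 9.4.4 of the PDF spec gives the horizontal displacement as tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th. The sketch below is a direct transcription of that formula. The helper name is hypothetical, and the method above deliberately takes a slightly different path (for example, it skips the spacing terms when tj is non-zero).

```ruby
# Horizontal displacement as written in the PDF spec, Section 9.4.4.
# Hypothetical helper for illustration; not the library's implementation.
#
# w0 - glyph width in text space
# tj - TJ adjustment in thousandths of a text space unit
# fs - font size, tc - character spacing, tw - word spacing
# th - horizontal scaling (1.0 means 100%)
def horizontal_displacement(w0, tj, fs, tc, tw, th)
  ((w0 - (tj / 1000.0)) * fs + tc + tw) * th
end

plain  = horizontal_displacement(0.5, 0, 12, 0, 0, 1.0)
kerned = horizontal_displacement(0.5, 250, 12, 1, 2, 0.5)
puts plain
puts kerned
```

A positive tj moves the next glyph closer to the previous one, which is why it is subtracted from the glyph width.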
restore_graphics_state()

Restore the state to the previous value on the stack.

# File lib/pdf/reader/page_state.rb, line 50
def restore_graphics_state
  @stack.pop
end
save_graphics_state()

Clones the current graphics state and pushes it onto the top of the stack. Any changes that are subsequently made to the state can then be reversed by calling restore_graphics_state.

# File lib/pdf/reader/page_state.rb, line 44
def save_graphics_state
  @stack.push clone_state
end
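
The save and restore pair can be pictured with a plain array used as a stack. This is only a sketch under simplified assumptions: the state hash and its single key are illustrative, and the real class keeps more in each state.

```ruby
# A one-entry stack standing in for the graphics state stack.
stack = [{ :char_spacing => 0 }]

# Save: push a deep clone of the current state.
stack.push(Marshal.load(Marshal.dump(stack.last)))

# Changes now apply to the clone on top of the stack only.
stack.last[:char_spacing] = 5
spacing_while_saved = stack.last[:char_spacing]

# Restore: drop the modified clone, exposing the earlier state again.
stack.pop
spacing_after_restore = stack.last[:char_spacing]

puts spacing_while_saved
puts spacing_after_restore
```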
set_character_spacing(char_spacing)

Text State Operators

# File lib/pdf/reader/page_state.rb, line 97
def set_character_spacing(char_spacing)
  state[:char_spacing] = char_spacing
end
set_horizontal_text_scaling(h_scaling)
# File lib/pdf/reader/page_state.rb, line 101
def set_horizontal_text_scaling(h_scaling)
  state[:h_scaling] = h_scaling / 100.0
end
set_spacing_next_line_show_text(aw, ac, string)
# File lib/pdf/reader/page_state.rb, line 182
def set_spacing_next_line_show_text(aw, ac, string) # "
  set_word_spacing(aw)
  set_character_spacing(ac)
  move_to_next_line_and_show_text(string)
end
set_text_font_and_size(label, size)
# File lib/pdf/reader/page_state.rb, line 105
def set_text_font_and_size(label, size)
  state[:text_font]      = label
  state[:text_font_size] = size
end
set_text_leading(leading)
# File lib/pdf/reader/page_state.rb, line 118
def set_text_leading(leading)
  state[:text_leading] = leading
end
set_text_matrix_and_text_line_matrix(a, b, c, d, e, f)
# File lib/pdf/reader/page_state.rb, line 156
def set_text_matrix_and_text_line_matrix(a, b, c, d, e, f) # Tm
  @text_matrix = TransformationMatrix.new(
    a, b,
    c, d,
    e, f
  )
  @text_line_matrix = @text_matrix.dup
  @font_size = @text_rendering_matrix = nil # invalidate cached value
end
set_text_rendering_mode(mode)
# File lib/pdf/reader/page_state.rb, line 122
def set_text_rendering_mode(mode)
  state[:text_mode] = mode
end
set_text_rise(rise)
# File lib/pdf/reader/page_state.rb, line 126
def set_text_rise(rise)
  state[:text_rise] = rise
end
set_word_spacing(word_spacing)
# File lib/pdf/reader/page_state.rb, line 130
def set_word_spacing(word_spacing)
  state[:word_spacing] = word_spacing
end
show_text_with_positioning(params)

Text Showing Operators

# File lib/pdf/reader/page_state.rb, line 174
def show_text_with_positioning(params) # TJ
  # TODO record position changes in state here
end
stack_depth()

Returns the number of graphics states currently on the stack. When save_graphics_state is called, a new copy of the current state is pushed onto the stack, so that any modifications to the state are undone once restore_graphics_state is called.

# File lib/pdf/reader/page_state.rb, line 274
def stack_depth
  @stack.size
end
trm_transform(x, y)

transform x and y co-ordinates from the current text space to the underlying device space.

transforming (0,0) is a really common case, so optimise for it to avoid unnecessary object allocations

# File lib/pdf/reader/page_state.rb, line 233
def trm_transform(x, y)
  trm = text_rendering_matrix
  if x == 0 && y == 0
    [trm.e, trm.f]
  else
    [
      (trm.a * x) + (trm.c * y) + (trm.e),
      (trm.b * x) + (trm.d * y) + (trm.f)
    ]
  end
end

Private Instance Methods

build_fonts(raw_fonts)

Wraps the raw PDF Font objects in handy Ruby Font objects.

# File lib/pdf/reader/page_state.rb, line 384
def build_fonts(raw_fonts)
  wrapped_fonts = raw_fonts.map { |label, font|
    [label, PDF::Reader::Font.new(@objects, @objects.deref(font))]
  }

  ::Hash[wrapped_fonts]
end
ctm()

return the current transformation matrix

# File lib/pdf/reader/page_state.rb, line 374
def ctm
  state[:ctm]
end
identity_matrix()

This class uses 3x3 matrices to represent geometric transformations. These matrices are represented by arrays with 9 elements. The array [a,b,c,d,e,f,g,h,i] would represent a matrix like:

a b c
d e f
g h i
# File lib/pdf/reader/page_state.rb, line 403
def identity_matrix
  TransformationMatrix.new(1, 0,
                           0, 1,
                           0, 0)
end
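
Note that the method above passes only six values. In PDF the third column of a transformation matrix is always 0 0 1, so six numbers are enough to describe the full 3x3 matrix. The sketch below spells out that mapping. The to_3x3 helper is hypothetical, and how TransformationMatrix stores its values internally is an assumption here.

```ruby
# Expand the six PDF matrix operands into the full 3x3 form.
# Hypothetical helper for illustration only.
def to_3x3(a, b, c, d, e, f)
  [
    [a, b, 0],
    [c, d, 0],
    [e, f, 1]
  ]
end

identity = to_3x3(1, 0, 0, 1, 0, 0)
identity.each { |row| puts row.join(" ") }
```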
state()
# File lib/pdf/reader/page_state.rb, line 378
def state
  @stack.last
end
text_rendering_matrix()

Used for many and varied text positioning calculations. We potentially need to access the result of this method many times when working with text, so it is memoized.

# File lib/pdf/reader/page_state.rb, line 352
def text_rendering_matrix
  @text_rendering_matrix ||= begin
    state_matrix = TransformationMatrix.new(
      state[:text_font_size] * state[:h_scaling], 0,
      0, state[:text_font_size],
      0, state[:text_rise]
    )
    state_matrix.multiply!(
      @text_matrix.a, @text_matrix.b,
      @text_matrix.c, @text_matrix.d,
      @text_matrix.e, @text_matrix.f
    )
    state_matrix.multiply!(
      ctm.a, ctm.b,
      ctm.c, ctm.d,
      ctm.e, ctm.f
    )
  end
end
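
The composition above (text state matrix, then text matrix, then CTM) can be traced by hand with plain arrays. This is a sketch under stated assumptions: the multiply helper is hypothetical, matrices are written as [a, b, c, d, e, f], and the sample values are invented.

```ruby
# Multiply two PDF-style matrices given as [a, b, c, d, e, f].
# Hypothetical helper for illustration; not part of the library.
def multiply(m, n)
  a1, b1, c1, d1, e1, f1 = m
  a2, b2, c2, d2, e2, f2 = n
  [
    a1 * a2 + b1 * c2,      a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,      c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2
  ]
end

text_font_size = 12
h_scaling      = 1.0
text_rise      = 0
text_matrix    = [1, 0, 0, 1, 100, 700]  # text positioned at (100, 700)
sample_ctm     = [1, 0, 0, 1, 0, 0]      # identity CTM

state_matrix = [text_font_size * h_scaling, 0, 0, text_font_size, 0, text_rise]
trm = multiply(multiply(state_matrix, text_matrix), sample_ctm)

# Same measurement that font_size performs: the vertical distance between
# the transformed points (0, 0) and (1, 1).
y_zero = trm[5]
y_one  = trm[1] + trm[3] + trm[5]
effective_size = (y_zero - y_one).abs

puts trm.inspect
puts effective_size
```

With an identity CTM and no rise, the effective size comes out equal to the nominal font size. A scaled CTM or text matrix would change it, which is presumably why font_size is derived from the matrix rather than read from the state.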