class PDFBeads::PDFBuilder::PDFTOC

Read table of contents from an UTF-8 text file and prepare it for placing into a PDF document. The syntax of the TOC file is simple. Each line describes a single outline item according to the following pattern:

<indent>“Title” “Page Number” [0|-|1|+]

The indent is used to determine the level of this outline item: it may consist either of spaces or of tabs, but it is not allowed to mix both characters in the same file. The title and page number are separated with an arbitrary number of whitespace characters and are normally enclosed into double quotes. The third, optional argument specifies if this TOC item should be displayed unfolded by default (i. e. if its descendants should be visible).

The reference to a TOC file can be passed to pdfbeads via the -C (or –toc) option. It is recommended to use this option in combination with the -L (or –labels) parameter, which allows to specify an alternate page numbering for a PDF file: thus your TOC file may contain the same page numbers, as the original book, so that there is no need to care about any numbering offsets.

Public Class Methods

new( fpath ) click to toggle source
# File lib/pdfbeads/pdftoc.rb, line 79
def initialize( fpath )
  root = PDFTOCItem[
    :indent   => -1,
    :open     => true,
    :children => Array.new()
  ]
  push( root )
  parseTOC( fpath,root )
end

Private Instance Methods

parseTOC( path,root ) click to toggle source
# File lib/pdfbeads/pdftoc.rb, line 91
def parseTOC( path,root )
  File.open( path,'r' ) do |fin|
    fin.set_encoding 'UTF-8' if fin.respond_to? :set_encoding
    prev = root
    indent_char = "\x00"
    fin.each do |fl|
      next if /^\#/.match( fl )

      parts = fl.scan(/".*?"|\S+/)
      if parts.length > 1
        title = parts[0].gsub(/\A"/m,"").gsub(/"\Z/m, "")
        ref   = parts[1].gsub(/\A"/m,"").gsub(/"\Z/m, "")
        begin
          if title.respond_to? :encode
            title.encode!( "utf-16be", "utf-8" )
          else
            title = Iconv.iconv( "utf-16be", "utf-8", title ).first
          end
        rescue
          $stderr.puts("Error: TOC should be specified in utf-8")
          return
        end

        entry = PDFTOCItem[
          :title    => title,
          :ref      => ref,
          :indent   => 0,
          :children => Array.new()
        ]
        if /^([ \t]+)/.match(fl)
          indent = $1
          indent.each_byte do |char|
            if indent_char == "\x00"
              indent_char = char
            elsif not char.eql? indent_char
              $stderr.puts("Error: you should not mix spaces and tabs in TOC indents\n")
              return
            end
          end

          entry[:indent] = indent.length
        end

        if entry[:indent] < prev[:indent]
          prev = prev.prevSibling( entry[:indent] )
        end
        if prev.nil?
          $stderr.puts("Error: a TOC item seems to have a wrong indent\n")
          return
        end

        if entry[:indent] == prev[:indent]
          entry[:parent] = prev[:parent]
          entry[:parent][:children].push( entry )
          entry[:prev] = prev
          prev[:next] = entry
        elsif entry[:indent] > prev[:indent]
          entry[:parent] = prev
          prev[:children].push(entry)
        end

        if parts.length > 2 and (parts[2] == '+' or parts[2] == '1')
          entry[:open] = true
        else
          entry[:open] = false
        end

        push( entry )
        prev = entry
      end
    end
  end
end