module Docsplit::TransparentPDFs

Mixin that provides a method for transparently converting non-PDF arguments to temporary PDFs. This lets Docsplit behave as if it natively supported .doc, .rtf, .ppt, and other formats.

Public Instance Methods

ensure_pdfs(docs)

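Temporarily convert any non-PDF documents to PDFs before running them through further extraction. Accepts a single path or an array of paths and returns an array of PDF paths; converted files are written to a docsplit directory under the system temp directory.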

# File lib/docsplit/transparent_pdfs.rb, line 9
def ensure_pdfs(docs)
  [docs].flatten.map do |doc|
    if is_pdf?(doc)
      doc
    else
      tempdir = File.join(Dir.tmpdir, 'docsplit')
      extract_pdf([doc], {:output => tempdir})
      File.join(tempdir, File.basename(doc, File.extname(doc)) + '.pdf')
    end
  end
end
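
The path arithmetic in ensure_pdfs is easy to misread, so here is a minimal sketch that isolates it. The helper name temp_pdf_path and the explicit tempdir argument are assumptions made for illustration; they are not part of Docsplit, and no conversion is performed.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Docsplit): reproduces only the output-path
# computation from ensure_pdfs, without calling extract_pdf.
def temp_pdf_path(doc, tempdir = File.join(Dir.tmpdir, 'docsplit'))
  # Strip the directory and the final extension, then append ".pdf".
  File.join(tempdir, File.basename(doc, File.extname(doc)) + '.pdf')
end

temp_pdf_path('/reports/q3.summary.docx', '/tmp/docsplit')
# => "/tmp/docsplit/q3.summary.pdf"

# Inputs that share a base name map to the same temporary file,
# regardless of their original directory or extension.
temp_pdf_path('/a/report.doc', '/tmp/docsplit') ==
  temp_pdf_path('/b/report.rtf', '/tmp/docsplit')
# => true
```

Because only the final extension is removed, a name such as q3.summary.docx keeps its inner dot. The second call suggests that two non-PDF inputs with the same base name would be written to the same temporary path.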
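
is_pdf?(doc)

Treat a document as a PDF if its extension is .pdf (case-insensitive) or, failing that, if its first line begins with a %PDF-<version> header. Returns a truthy value for PDFs and a falsy value otherwise.
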
# File lib/docsplit/transparent_pdfs.rb, line 21
def is_pdf?(doc)
  File.extname(doc).downcase == '.pdf' || File.open(doc, 'rb', &:readline).force_encoding("BINARY") =~ /\A\%PDF-\d+(\.\d+)?/
end
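
The header check above can be sketched in isolation as follows. The names pdf_header? and check_bytes are hypothetical and not part of Docsplit. This variant reads a fixed number of bytes instead of a whole line, an assumption made here so that an empty file yields false; IO#readline, as used above, raises EOFError at end of file.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Docsplit): true when the file's first
# bytes look like a PDF header such as "%PDF-1.4".
def pdf_header?(path)
  # read(8) returns nil at EOF, so an empty file yields nil, not an exception.
  header = File.open(path, 'rb') { |f| f.read(8) }
  !header.nil? && header.match?(/\A%PDF-\d+(\.\d+)?/)
end

# Write the given bytes to a throwaway file and run the check on it.
def check_bytes(bytes)
  Tempfile.create('sniff') do |f|
    f.binmode
    f.write(bytes)
    f.flush
    pdf_header?(f.path)
  end
end

check_bytes("%PDF-1.4\n")    # => true
check_bytes("{\\rtf1\\ansi") # => false
check_bytes("")              # => false
```

Note that in is_pdf? the extension test comes first and short-circuits, so a file named with a .pdf extension is never opened and its contents are never inspected.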