module Docsplit::TransparentPDFs
Includes a method that transparently converts non-PDF arguments into temporary PDFs, so Docsplit can behave as if it natively supported .doc, .rtf, .ppt, and other formats.
Public Instance Methods
ensure_pdfs(docs)
Temporarily convert any non-PDF documents to PDFs before running them through further extraction.
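```ruby
# File lib/docsplit/transparent_pdfs.rb, line 7
def ensure_pdfs(docs)
  # Accept either a single path or an array of paths.
  [docs].flatten.map do |doc|
    if is_pdf?(doc)
      # Already a PDF: pass the path through untouched.
      doc
    else
      # Convert into a shared temp directory, then return the path of the
      # PDF that extract_pdf is expected to have written there.
      tempdir = File.join(Dir.tmpdir, 'docsplit')
      extract_pdf([doc], output: tempdir)
      File.join(tempdir, File.basename(doc, File.extname(doc)) + '.pdf')
    end
  end
end
```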
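To make the return value of ensure_pdfs concrete, the sketch below isolates two of its steps: the `[docs].flatten` normalization, which lets callers pass one path or an array, and the expression that builds the expected PDF path. `temp_pdf_paths` is a hypothetical helper written for illustration, not part of Docsplit, and it performs no conversion.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Docsplit): mirrors how ensure_pdfs
# normalizes its argument and where it expects each converted PDF to land.
def temp_pdf_paths(docs)
  tempdir = File.join(Dir.tmpdir, 'docsplit')
  [docs].flatten.map do |doc|
    # Drop the directory and original extension, then append ".pdf".
    File.join(tempdir, File.basename(doc, File.extname(doc)) + '.pdf')
  end
end

p temp_pdf_paths('reports/q3.docx')
p temp_pdf_paths(['reports/q3.docx', 'slides/q3.ppt'])
```

Because the returned path depends only on the basename, inputs such as `q3.docx` and `q3.ppt` resolve to the same file in the shared directory, so a later conversion would likely overwrite an earlier one.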
is_pdf?(doc)
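Returns a truthy value when the file has a .pdf extension (compared case-insensitively) or when its first line begins with a PDF header such as `%PDF-1.4`.

```ruby
# File lib/docsplit/transparent_pdfs.rb, line 19
def is_pdf?(doc)
  # Trust a .pdf extension first; otherwise read the first line in binary
  # mode and look for a "%PDF-<major>[.<minor>]" header.
  File.extname(doc).casecmp('.pdf').zero? ||
    File.open(doc, 'rb', &:readline) =~ /\A\%PDF-\d+(\.\d+)?/
end
```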
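The check can be exercised on its own. The sketch below copies the body of is_pdf? so it runs without Docsplit installed; the file names and contents are made up for illustration.

```ruby
require 'tmpdir'

# Copy of is_pdf? so this sketch is self-contained.
def is_pdf?(doc)
  File.extname(doc).casecmp('.pdf').zero? ||
    File.open(doc, 'rb', &:readline) =~ /\A\%PDF-\d+(\.\d+)?/
end

Dir.mktmpdir do |dir|
  # The extension matches, so the file is never opened (it need not exist).
  p is_pdf?(File.join(dir, 'REPORT.PDF')) # prints true

  # No .pdf extension, but the first line carries a PDF header.
  disguised = File.join(dir, 'scan.bin')
  File.binwrite(disguised, "%PDF-1.7\n")
  p is_pdf?(disguised) # prints 0 (the match offset, which is truthy)

  # A plain-text file fails both checks.
  notes = File.join(dir, 'notes.txt')
  File.write(notes, "hello\n")
  p is_pdf?(notes) # prints nil
end
```

Since `IO#readline` raises `EOFError` at end of file, an empty file without a .pdf extension makes this check raise rather than return `nil`.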