extractText {orderanalyzer}    R Documentation
Extracts the text from a PDF file
Description
This function extracts text from a PDF document and returns it as a single string, as a table of lines, and as a table of words. It uses 'pdftools' to extract the content of text-based PDF files and 'tesseract' to extract the content of image-based PDF files.
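The text-layer-first, OCR-fallback approach described above can be sketched as follows. This is a minimal illustration of the general technique, not the implementation inside extractText(); the helper name read_pdf_text(), the emptiness check used to detect image-based PDFs, and the "text"/"image" labels are assumptions. It relies on pdftools::pdf_text() and pdftools::pdf_ocr_text(), the latter of which requires the 'tesseract' package to be installed.

```r
library(pdftools)

# Hypothetical helper (not part of orderanalyzer): read the text layer of a
# PDF and fall back to OCR when that layer is empty.
read_pdf_text <- function(file, ocr_language = "eng") {
  # pdf_text() returns one character string per page from the text layer
  pages <- pdftools::pdf_text(file)
  text <- paste(pages, collapse = "\n")

  if (!nzchar(trimws(text))) {
    # Assumption: an empty text layer means the PDF is image-based (scanned).
    # pdf_ocr_text() renders each page and runs tesseract on the image.
    pages <- pdftools::pdf_ocr_text(file, language = ocr_language)
    text <- paste(pages, collapse = "\n")
    type <- "image"
  } else {
    type <- "text"
  }

  list(text = text, type = type)
}
```

For example, read_pdf_text(file)$type would report "text" for a PDF generated by a word processor and "image" for a scanned page.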
Usage
extractText(file)
Arguments
file    Path to the PDF file.
Value
A list containing the extracted text, a data table of the lines, a data table of the words, and the type and language of the document.
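How a single string might be broken into a lines table and a words table can be sketched in base R. This is a hypothetical helper, not the package's code: the name split_text(), the column names line_id, text and word, and the use of base data.frame instead of a data.table are all assumptions made for illustration. Blank lines are dropped and words are split on runs of whitespace.

```r
# Hypothetical helper (not part of orderanalyzer): split extracted text
# into a table of non-empty lines and a table of whitespace-separated words.
split_text <- function(text) {
  # Split on newlines, trim surrounding whitespace, drop blank lines
  lines <- trimws(unlist(strsplit(text, "\n", fixed = TRUE)))
  lines <- lines[nzchar(lines)]

  lines_df <- data.frame(
    line_id = seq_along(lines),
    text = lines,
    stringsAsFactors = FALSE
  )

  # Tokenise each line on runs of whitespace
  tokens <- strsplit(lines, "[[:space:]]+")
  words_df <- data.frame(
    line_id = rep(seq_along(tokens), lengths(tokens)),
    word = as.character(unlist(tokens)),
    stringsAsFactors = FALSE
  )

  list(lines = lines_df, words = words_df)
}
```

Calling split_text("Order 123\n\n  Total: 45.00 EUR") yields two lines and five words, with each word keeping the line_id of the line it came from.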
Examples
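# Path to the example order document shipped with the package
file <- system.file("extdata", "OrderDocument_en.pdf", package = "orderanalyzer")
# Extract the text, lines and words from the PDF
text <- extractText(file)
# Inspect the table of extracted words
text$words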
[Package orderanalyzer version 1.0.0 Index]