module PDFTDX
PDF TDX Module: Root Module for Pdftdx.
Constants
- VERSION
Version
Public Class Methods
extract_data(pdf_file)
Extract Data from PDF: Converts a PDF file to HTML format and then extracts anything that looks like tabular data.

@param [String] pdf_file Path to a PDF file
@return [Array] An array of tables, each represented as a hash containing an optional header and table data, in the form of either a single array of rows or a hash of sub-tables (arrays of rows) mapped by name. Table rows are represented as an array of table cells.

Example:

  [{
    head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
    data: {
      'System' => [
        ['Machine OS', 'Win32', 'Linux', 'MacOS'],
        ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']
      ]
    }
  }]
# File lib/pdftdx.rb, line 20
def self.extract_data pdf_file

  # Dump PDF Data
  page_data = Pdftohtml.convert pdf_file

  # Process Page Data
  PDFTDX::Parser.process page_data
end
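Because the `data` entry of each table can be either a plain array of rows or a hash of named sub-tables, calling code usually has to handle both shapes. The following is a minimal sketch of one way to do that, using only the return structure documented above. The helpers `each_sub_table` and `labelled_rows` are hypothetical and not part of Pdftdx, and the sample input is the documented example rather than the output of a real `PDFTDX.extract_data` call.

```ruby
# Hypothetical helper (not part of Pdftdx): yields [name, rows] for each
# sub-table, whichever of the two documented shapes table[:data] has.
# A plain array of rows is treated as one unnamed sub-table (name is nil).
def each_sub_table(table)
  data = table[:data]
  pairs = data.is_a?(Hash) ? data.to_a : [[nil, data || []]]
  pairs.each { |name, rows| yield name, rows }
end

# Hypothetical helper: flattens every table into a list of labelled rows,
# keeping the optional header alongside each row's cells.
def labelled_rows(tables)
  tables.flat_map do |table|
    collected = []
    each_sub_table(table) do |name, rows|
      rows.each do |row|
        collected << { table: name, head: table[:head], cells: row }
      end
    end
    collected
  end
end

# Sample input copied from the documented example; in real use this would
# presumably come from PDFTDX.extract_data with a path to a PDF file.
sample_tables = [{
  head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
  data: {
    'System' => [
      ['Machine OS', 'Win32', 'Linux', 'MacOS'],
      ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']
    ]
  }
}]

labelled_rows(sample_tables).each do |row|
  puts "#{row[:table] || '(unnamed)'}: #{row[:cells].join(' | ')}"
end
```

Normalizing both shapes into name and rows pairs keeps the rest of the calling code free of type checks. Note that in the documented example the first cell of each row is a row label, so rows have one more cell than the header has entries.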