Feature | PyMuPDF | pikepdf | PyPDF2 | pdfrw | pdfplumber / pdfminer |
---|---|---|---|---|---|
Supports Multiple Document Formats | PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT, Image, DOCX, XLSX, PPTX, HWPX (see note) | ||||
Implementation | Python and C | Python and C++ | Python | Python | Python |
Render Document Pages | All document types | No rendering | No rendering | No rendering | No rendering |
Write Text to PDF Page | See Page.insert_htmlbox, Page.insert_textbox, or TextWriter | ||||
Supports CJK characters | |||||
Extract Text | All document types | PDF only | PDF only | ||
Extract Text as Markdown (.md) | All document types | ||||
Extract Tables | All document types | PDF only | |||
Extract Vector Graphics | All document types | Limited | |||
Draw Vector Graphics (PDF) | |||||
Based on Existing, Mature Library | MuPDF | QPDF | |||
Automatic Repair of Damaged PDFs | |||||
Encrypted PDFs | Limited | Limited | |||
Linearized PDFs | |||||
Incremental Updates | |||||
Integrates with Jupyter and IPython Notebooks | |||||
Joining / Merging PDF with other Document Types | All document types | PDF only | PDF only | PDF only | PDF only |
OCR API for Seamless Integration with Tesseract | All document types | ||||
Integrated Checkpoint / Restart Feature (PDF) | |||||
PDF Optional Content | |||||
PDF Embedded Files | Limited | Limited | |||
PDF Redactions | |||||
PDF Annotations | Full | Limited | |||
PDF Form Fields | Create, read, update | Limited, no creation | |||
PDF Page Labels | |||||
Support Font Sub-Setting |