PDFBeads – convert scanned images to a single PDF file
Version 1.0 (November 2010)

Copyright © 2010 Alexey Kryukov (amkryukov@gmail.com). All rights reserved.

PDFBeads is a small utility written in Ruby which takes scanned page images and converts them into a single PDF file. Unlike other PDF creation tools, PDFBeads attempts to implement the approach typically used for DjVu books. Its key feature is separating scanned text (typically black, but indexed images with a small number of colors are also accepted) from halftone pictures. Each type of graphical data is encoded into its own layer with a specific compression method and resolution.

The name ‘PDFBeads’ was chosen for the package because building PDF files from separate images is comparable to threading beads on a string. It also seems to be a good choice for a Ruby application.

Here are a few operations you can perform with PDFBeads:

Note that PDFBeads is intended for creating PDF files from previously processed images, so it cannot perform some operations (e.g. converting color or grayscale scans to B&W) that should typically be done with a dedicated scan processing application, such as ScanTailor.

PDFBeads requires RMagick (the Ruby bindings for the popular ImageMagick image processing library). The hpricot extension is not required but is highly recommended, since without it PDFBeads cannot read data from hOCR files.
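
To illustrate what an optional dependency of this kind means in practice, here is a minimal Ruby sketch of probing for a library at load time and falling back when it is missing. The helper name library_available? and the constant HOCR_SUPPORT are hypothetical and are not taken from the PDFBeads source; the snippet only shows the general pattern, not how PDFBeads itself is implemented.

```ruby
# Hypothetical helper (not part of PDFBeads): try to load a library and
# report whether it is installed, instead of aborting with a LoadError.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# Assumed flag for illustration: hOCR reading is enabled only when
# hpricot can be loaded.
HOCR_SUPPORT = library_available?('hpricot')

warn 'hpricot not found: hOCR files cannot be read' unless HOCR_SUPPORT
```

A required dependency such as RMagick would instead be loaded with a plain require, so that a missing installation fails immediately with a clear error.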