Book Dumping and Book Binding Scripts
Aaron M. Schinder
18 Mar 2021
Introduction
These scripts dump the page images out of pdfs and bind image files back into pdfs. I wrote them primarily to convert Archive.org pdfs from the highly compressed format they use into less compressed but more readable pdfs built from the page images.
The various Linux pdf readers have difficulty unpacking Archive.org books. It’s a known problem: pages take a very long time to load in whatever format or scheme Archive.org uses internally.
The idea here is that hard drive space is cheaper than time, especially page-loading time, which can make paging and skimming through a book impractical. Paging through images can be done almost as fast as flipping through a physical book, so the converted pdfs can be used in much the same way.
Dependencies
These tools were written to work on Linux. I don’t know whether they will run on Windows; that may require some fiddling.
They require the ImageMagick command line tools (convert). ImageMagick may need its settings adjusted in /etc to avoid running out of memory.
They also require the pdfinfo and pdfunite utilities from the poppler package/library.
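Before running any of the scripts, it can help to confirm that the external tools are on the PATH. The following is a minimal sketch, not part of the scripts themselves; the function name missing_tools and the list REQUIRED_TOOLS are hypothetical names chosen for illustration.

```python
import shutil

# Command names taken from the dependency list above.
# REQUIRED_TOOLS and missing_tools are illustrative names, not from the scripts.
REQUIRED_TOOLS = ["convert", "pdfinfo", "pdfunite"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If anything is reported missing, installing the distribution's ImageMagick and poppler utilities packages should provide it.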
Contents
ams_dumplargepdf.py : Dumps a pdf to image files in a directory named after the pdf file.
ams_dumpallbooks.py : Iterates over all pdf files in the directory it is called in and dumps each one to its own image directory.
ams_bindbook.py : Takes all the image files in a directory and creates pdf fragments in stages. Then pdfunite is called to concatenate all the fragments into a new bound pdf. (Warning: it will name the pdf the same as the directory name, so shuffle the files around to avoid an overwrite.)
ams_bindallbooks.py : Takes all the book image directories and makes pdfs from them.
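The staged binding described for ams_bindbook.py can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual script: the batch size of 20, the fragment naming scheme, the set of image suffixes, and the function names (batches, plan_bind, bind_directory) are all hypothetical. It also assumes the image filenames sort into page order, for example because they are zero-padded.

```python
import subprocess
from pathlib import Path

# Assumed set of image types; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_bind(image_paths, out_pdf, batch_size=20):
    """Return argv lists: one convert call per batch, then one pdfunite call."""
    images = sorted(str(p) for p in image_paths)  # assumes names sort in page order
    commands, fragments = [], []
    for n, chunk in enumerate(batches(images, batch_size)):
        fragment = f"{out_pdf}.part{n:04d}.pdf"  # hypothetical fragment name
        fragments.append(fragment)
        # convert turns a small batch of images into one pdf fragment,
        # which keeps ImageMagick's memory use bounded.
        commands.append(["convert", *chunk, fragment])
    # pdfunite concatenates the fragments, in order, into the bound pdf.
    commands.append(["pdfunite", *fragments, str(out_pdf)])
    return commands


def bind_directory(directory, batch_size=20):
    """Run the planned commands for one image directory (not called here)."""
    directory = Path(directory)
    images = [p for p in directory.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    # Assumption: the output sits next to the directory and takes its name.
    out_pdf = directory.parent / (directory.name + ".pdf")
    for command in plan_bind(images, out_pdf, batch_size):
        subprocess.run(command, check=True)
    return out_pdf
```

Building the command lists separately from running them makes it easy to print and inspect the plan first, which is useful given the overwrite warning above. Working in batches is one way to sidestep the ImageMagick memory limits mentioned under Dependencies.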