Manipulating PDF files with command-line programs

Introduction

This short page is where I keep track of the commands I use to manipulate PDF files on my Debian GNU/Linux box.

These commands are invocations of gs and ps2pdf (both provided by the ghostscript package) and pdftotext (provided by the poppler-utils package).

PDFtk might be a solution as well, but I haven’t used it yet.

Merging (“concatenating”) PDF files

Merge files doc_1.pdf, doc_2.pdf and doc_3.pdf into file output.pdf with:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=output.pdf \
    doc_1.pdf doc_2.pdf doc_3.pdf
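
When there are many files to merge, the same command can be wrapped in a small shell function. This is only a sketch: merge_pdfs is a name I made up for illustration, not something shipped with Ghostscript. It takes the output file first and the input files after it.

```shell
# merge_pdfs is a hypothetical helper, not part of Ghostscript.
# Usage: merge_pdfs OUTPUT INPUT...
merge_pdfs() {
    # Require an output name and at least one input file.
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUTPUT INPUT..." >&2
        return 1
    fi
    out=$1
    shift
    # The remaining arguments are the input files, merged in the order given.
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}

# Example (commented out so that sourcing this file runs nothing):
# merge_pdfs output.pdf doc_*.pdf
```

Note that the shell expands doc_*.pdf in lexical order, so doc_10.pdf would come before doc_2.pdf. Zero-padded names (doc_02.pdf) avoid that.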

Extracting specified pages from a PDF file

Extract pages 2 to 4 from file input.pdf to file output.pdf with:

gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -sOutputFile=output.pdf \
    -dFirstPage=2 -dLastPage=4 input.pdf
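
A related case is writing every page to its own file. As far as I know, Ghostscript expands a %d template in the output file name to the page number, which gives one file per page. The sketch below relies on that behaviour. split_pages and the page_ prefix are names I chose for illustration.

```shell
# split_pages is a hypothetical helper, not part of Ghostscript.
# Usage: split_pages INPUT [PREFIX]
# Writes PREFIX_001.pdf, PREFIX_002.pdf, ... (PREFIX defaults to "page").
split_pages() {
    input=$1
    prefix=${2:-page}
    # %03d in the output name is filled in by Ghostscript with the page
    # number, so each page ends up in a separate file.
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -sOutputFile="${prefix}_%03d.pdf" "$input"
}

# Example (commented out so that sourcing this file runs nothing):
# split_pages input.pdf chapter
```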

Removing a password from a PDF file

Assuming the file input.pdf is password-protected and you know the password, create file output.pdf as an unprotected copy of file input.pdf with:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sPDFPassword="the password" \
    -sOutputFile=output.pdf input.pdf

(Provide the password via the -sPDFPassword option.)

Reducing PDF file size

I observe that a command like:

ps2pdf input.pdf output.pdf

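copies input.pdf to output.pdf, but with a smaller size. I am not sure what is going on here. ps2pdf is a wrapper script around gs with the pdfwrite device, so the file is regenerated rather than copied byte for byte. The size reduction might be a matter of default settings (-dPDFSETTINGS option).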
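
To experiment with the -dPDFSETTINGS option directly, gs can be called with one of its presets: /screen, /ebook, /printer, /prepress or /default. Roughly, /screen gives the smallest files and lowest image quality, while /prepress gives the largest. The sketch below is not a tuned recipe. shrink_pdf is a name I made up, and the default preset of ebook is an arbitrary choice.

```shell
# shrink_pdf is a hypothetical helper, not part of Ghostscript.
# Usage: shrink_pdf INPUT OUTPUT [PRESET]
# PRESET is one of: screen, ebook, printer, prepress, default.
shrink_pdf() {
    input=$1
    output=$2
    preset=${3:-ebook}   # assumed default, chosen for illustration
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -dPDFSETTINGS="/$preset" -sOutputFile="$output" "$input"
}

# Example (commented out so that sourcing this file runs nothing):
# shrink_pdf input.pdf output.pdf screen
# ls -l input.pdf output.pdf    # compare the sizes
```

How much is gained depends on the file. Documents full of scanned images usually shrink a lot, while text-only documents may barely change.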

Converting PDF file to plain text file

Convert PDF file input.pdf to plain text file output.txt with:

pdftotext input.pdf output.txt

With the -layout option, the original physical layout of the text is preserved as well as possible:

pdftotext -layout input.pdf output.txt
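
pdftotext also accepts -f and -l to select the first and last page to convert, and writes to standard output when the output file is given as "-". A small sketch combining the two follows. pdf_pages_to_text is a name I made up for illustration.

```shell
# pdf_pages_to_text is a hypothetical helper, not part of poppler-utils.
# Usage: pdf_pages_to_text FIRST LAST INPUT
# Prints the text of pages FIRST..LAST of INPUT on standard output.
pdf_pages_to_text() {
    first=$1
    last=$2
    input=$3
    # -f / -l select the page range; the final "-" means standard output.
    pdftotext -layout -f "$first" -l "$last" "$input" -
}

# Example (commented out so that sourcing this file runs nothing):
# pdf_pages_to_text 2 4 input.pdf | grep -n "some word"
```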

Other resources