PDF file manipulation with command-line programs
Introduction
This short page is where I keep track of the commands I use to manipulate PDF files on my Debian GNU/Linux box.
Those commands are invocations of gs and ps2pdf (both provided by the ghostscript package) and pdftotext (provided by the poppler-utils package).
PDFtk might be a solution as well, but I haven’t used it yet.
Merging (“concatenating”) PDF files
Merge files doc_1.pdf, doc_2.pdf and doc_3.pdf to file output.pdf with:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=output.pdf \
doc_1.pdf doc_2.pdf doc_3.pdf
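The merge command above can be wrapped in a small shell function so that the output name and the list of inputs become arguments. This is only a sketch: the name merge_pdfs and the check for an existing output file are my own assumptions, not something Ghostscript provides.

```shell
# merge_pdfs is a hypothetical helper name, not a Ghostscript command.
# Usage: merge_pdfs OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]
merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]" >&2
        return 2
    fi
    local output="$1"
    shift
    # Stop rather than replace an existing output file.
    if [ -e "$output" ]; then
        echo "merge_pdfs: $output already exists" >&2
        return 1
    fi
    # Same gs invocation as above; "$@" keeps file names with spaces intact.
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$output" "$@"
}
```

For example, merge_pdfs output.pdf doc_*.pdf merges the files in the order the shell expands the pattern. That order is lexical, so doc_10.pdf comes before doc_2.pdf.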
Extracting specified pages from a PDF file
Extract pages 2 to 4 from file input.pdf to file output.pdf with:
gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -sOutputFile=output.pdf \
-dFirstPage=2 -dLastPage=4 input.pdf
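Since the page range is the only part of that command that changes from one use to the next, it may be convenient to pass it as arguments. The sketch below assumes a helper named extract_pages (a made-up name) and adds basic validation of the page numbers before calling gs with the same options as above.

```shell
# extract_pages is a hypothetical helper name, not a Ghostscript command.
# Usage: extract_pages INPUT.pdf FIRST LAST OUTPUT.pdf
extract_pages() {
    if [ "$#" -ne 4 ]; then
        echo "usage: extract_pages INPUT.pdf FIRST LAST OUTPUT.pdf" >&2
        return 2
    fi
    local input="$1" first="$2" last="$3" output="$4"
    # Accept only non-empty strings made of digits as page numbers.
    case "$first" in
        ''|*[!0-9]*) echo "extract_pages: bad first page: $first" >&2; return 2 ;;
    esac
    case "$last" in
        ''|*[!0-9]*) echo "extract_pages: bad last page: $last" >&2; return 2 ;;
    esac
    # Pages are numbered from 1, and the range must not be reversed.
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "extract_pages: invalid range $first-$last" >&2
        return 2
    fi
    gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -sOutputFile="$output" \
        -dFirstPage="$first" -dLastPage="$last" "$input"
}
```

To extract a single page, give the same number twice, as in extract_pages input.pdf 3 3 page_3.pdf.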
Removing a password from a PDF file
Assuming the file input.pdf is password-protected and you know the password, create output.pdf as an unprotected copy of input.pdf with:
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sPDFPassword="the password" \
-sOutputFile=output.pdf input.pdf
(Provide the password via the -sPDFPassword option.)
Reducing PDF file size
I observe that a command like:
ps2pdf input.pdf output.pdf
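produces output.pdf, a copy of input.pdf with a smaller size. I'm not sure exactly what is going on here. ps2pdf runs gs with the pdfwrite device, so the file is regenerated rather than copied byte for byte, and the reduction might be a matter of default settings (-dPDFSETTINGS option).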
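One way to explore the -dPDFSETTINGS option is to set it explicitly. Ghostscript defines the presets /screen, /ebook, /printer, /prepress and /default, with /screen aimed at the lowest quality and smallest files. The function below is a sketch with a made-up name, shrink_pdf, and an arbitrary default of ebook. Whether a given preset really shrinks a particular file is something to check case by case.

```shell
# shrink_pdf is a hypothetical helper name; the preset names are Ghostscript's.
# Usage: shrink_pdf INPUT.pdf OUTPUT.pdf [PRESET]
# PRESET defaults to ebook (an arbitrary choice made for this sketch).
shrink_pdf() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
        echo "usage: shrink_pdf INPUT.pdf OUTPUT.pdf [PRESET]" >&2
        return 2
    fi
    local input="$1" output="$2" preset="${3:-ebook}"
    case "$preset" in
        screen|ebook|printer|prepress|default) ;;
        *) echo "shrink_pdf: unknown preset: $preset" >&2; return 2 ;;
    esac
    # Options must come before the file names on the ps2pdf command line.
    ps2pdf -dPDFSETTINGS="/$preset" "$input" "$output"
}
```

After running, for instance, shrink_pdf input.pdf output.pdf screen, compare the two sizes with ls -l input.pdf output.pdf and check that the images are still readable.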
Converting PDF file to plain text file
Convert PDF file input.pdf to plain text file output.txt with:
pdftotext input.pdf output.txt
With the -layout option, the original physical layout of the text is preserved as well as possible:
pdftotext -layout input.pdf output.txt
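When several PDF files need converting, the same command can be run in a loop. The following sketch assumes a helper named pdfs_to_text (a made-up name) that converts every .pdf file in a directory and writes each text file next to its source.

```shell
# pdfs_to_text is a hypothetical helper name, not part of poppler-utils.
# Usage: pdfs_to_text [DIRECTORY]   (DIRECTORY defaults to the current one)
pdfs_to_text() {
    local dir="${1:-.}" pdf
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDF files.
        [ -e "$pdf" ] || continue
        # Write the text next to the PDF: report.pdf -> report.txt
        pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
    done
}
```

Call it as pdfs_to_text ~/documents, or with no argument to process the current directory. An existing .txt file with the same base name is overwritten.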