Commit c6c3a8c

add pdf merger tutorial
1 parent 8cf7c5c commit c6c3a8c

6 files changed, with 88 additions and 0 deletions

Diff for: README.md

+1
@@ -158,5 +158,6 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [How to Convert PDF to Images in Python](https://www.thepythoncode.com/article/convert-pdf-files-to-images-in-python). ([code](handling-pdf-files/convert-pdf-to-image))
 - [How to Compress PDF Files in Python](https://www.thepythoncode.com/article/compress-pdf-files-in-python). ([code](handling-pdf-files/pdf-compressor))
 - [How to Encrypt and Decrypt PDF Files in Python](https://www.thepythoncode.com/article/encrypt-pdf-files-in-python). ([code](handling-pdf-files/encrypt-pdf))
+- [How to Merge PDF Files in Python](https://www.thepythoncode.com/article/merge-pdf-files-in-python). ([code](handling-pdf-files/pdf-merger))

 For any feedback, please consider pulling requests.

Diff for: handling-pdf-files/pdf-merger/README.md

+28
@@ -0,0 +1,28 @@
+# [How to Merge PDF Files in Python](https://www.thepythoncode.com/article/merge-pdf-files-in-python)
+To run this:
+- `pip3 install -r requirements.txt`
+-
+    ```
+    $ python pdf_merger.py --help
+    ```
+    **Output:**
+    ```
+    usage: pdf_merger.py [-h] -i [INPUT_FILES [INPUT_FILES ...]] [-p [PAGE_RANGE [PAGE_RANGE ...]]] -o OUTPUT_FILE [-b BOOKMARK]
+
+    Available Options
+
+    optional arguments:
+      -h, --help            show this help message and exit
+      -i [INPUT_FILES [INPUT_FILES ...]], --input_files [INPUT_FILES [INPUT_FILES ...]]
+                            Enter the path of the files to process
+      -p [PAGE_RANGE [PAGE_RANGE ...]], --page_range [PAGE_RANGE [PAGE_RANGE ...]]
+                            Enter the pages to consider e.g.: (0,2) -> First 2 pages
+      -o OUTPUT_FILE, --output_file OUTPUT_FILE
+                            Enter a valid output file
+      -b BOOKMARK, --bookmark BOOKMARK
+                            Bookmark resulting file
+    ```
+- To merge `bert-paper.pdf` with `letter.pdf` into a new `combined.pdf`:
+    ```
+    $ python pdf_merger.py -i bert-paper.pdf,letter.pdf -o combined.pdf
+    ```
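
The usage example above passes both input files as a single comma-separated value, and the help text describes `-p` with a `range()`-style example. A minimal sketch of how such values could be normalized is given here. The helper names `split_input_files`, `parse_page_range`, and `selected_page_indices` are hypothetical and not part of the committed script; accepting space-separated file names and parentheses around the page range are assumptions added for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def split_input_files(values: Sequence[str]) -> List[str]:
    """Flatten values such as ['a.pdf,b.pdf'] or ['a.pdf', 'b.pdf'] (hypothetical helper)."""
    files: List[str] = []
    for value in values:
        # Each shell word may itself hold several comma-separated names.
        files.extend(part.strip() for part in value.split(",") if part.strip())
    return files


def parse_page_range(values: Optional[Sequence[str]]) -> Optional[Tuple[int, ...]]:
    """Turn ['0,6,2'] or ['(0,2)'] into a tuple of ints; None when -p is absent."""
    if not values:
        return None
    numbers = []
    for part in ",".join(values).split(","):
        cleaned = part.strip(" ()")  # assumption: tolerate the "(0,2)" spelling
        if cleaned:
            numbers.append(int(cleaned))
    if len(numbers) not in (2, 3):
        raise ValueError("expected start,stop or start,stop,step")
    return tuple(numbers)


def selected_page_indices(page_range: Tuple[int, ...]) -> List[int]:
    """Zero-based page indices that a range()-style tuple selects."""
    return list(range(*page_range))


if __name__ == "__main__":
    print(split_input_files(["bert-paper.pdf,letter.pdf"]))
    print(selected_page_indices(parse_page_range(["0,6,2"])))  # -> [0, 2, 4]
```

Indices are zero-based, so `0,6,2` selects the first, third, and fifth pages, matching the `(0,6,2) -> pages 1,3,5` note in the script's docstring.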

Diff for: handling-pdf-files/pdf-merger/bert-paper.pdf

757 KB
Binary file not shown.

Diff for: handling-pdf-files/pdf-merger/letter.pdf

2.15 KB
Binary file not shown.

Diff for: handling-pdf-files/pdf-merger/pdf_merger.py

+58
@@ -0,0 +1,58 @@
+# Import Libraries
+from PyPDF4 import PdfFileMerger
+import os
+import argparse
+
+
+def merge_pdfs(input_files: list, page_range: tuple, output_file: str, bookmark: bool = True):
+    """
+    Merge a list of PDF files and save the combined result into the `output_file`.
+    `page_range` selects a range of pages (behaving like Python's range() function) from the input files
+        e.g. (0,2) -> First 2 pages
+        e.g. (0,6,2) -> pages 1,3,5
+    bookmark -> add bookmarks to the output file to navigate directly to the input file section within the output file.
+    """
+    # strict = False -> To ignore PdfReadError - Illegal Character error
+    merger = PdfFileMerger(strict=False)
+    for input_file in input_files:
+        bookmark_name = os.path.splitext(os.path.basename(input_file))[0] if bookmark else None
+        # `pages` controls which pages are appended from each file.
+        merger.append(fileobj=open(input_file, 'rb'), pages=page_range, bookmark=bookmark_name)
+    # Write the merged PDF to the output file
+    merger.write(fileobj=open(output_file, 'wb'))
+    merger.close()
+
+
+def parse_args():
+    """Get user command line parameters"""
+    parser = argparse.ArgumentParser(description="Available Options")
+    parser.add_argument('-i', '--input_files', dest='input_files', nargs='*',
+                        type=str, required=True, help="Enter the path of the files to process")
+    parser.add_argument('-p', '--page_range', dest='page_range', nargs='*',
+                        help="Enter the pages to consider e.g.: (0,2) -> First 2 pages")
+    parser.add_argument('-o', '--output_file', dest='output_file',
+                        required=True, type=str, help="Enter a valid output file")
+    parser.add_argument('-b', '--bookmark', dest='bookmark', default=True, type=lambda x: (
+        str(x).lower() in ['true', '1', 'yes']), help="Bookmark resulting file")
+    # To Parse The Command Line Arguments
+    args = vars(parser.parse_args())
+    # To Display The Command Line Arguments
+    print("## Command Arguments #################################################")
+    print("\n".join("{}:{}".format(i, j) for i, j in args.items()))
+    print("######################################################################")
+    return args
+
+
+if __name__ == "__main__":
+    # Parsing command line arguments entered by user
+    args = parse_args()
+    # convert a single str to a list
+    input_files = [str(x) for x in args['input_files'][0].split(',')]
+    page_range = None
+    if args['page_range']:
+        page_range = tuple(int(x) for x in args['page_range'][0].split(','))
+    # call the main function
+    merge_pdfs(
+        input_files=input_files, page_range=page_range,
+        output_file=args['output_file'], bookmark=args['bookmark']
+    )
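
`merge_pdfs` passes the results of `open()` straight to the merger and never closes them explicitly. One possible variant, sketched here, keeps every input open until the output has been written and then closes all of them, using only the merger calls that appear in the script (`PdfFileMerger(strict=False)`, `append`, `write`, `close`). The name `merge_pdfs_closing_files` and the `merger_factory` parameter are hypothetical; the factory is an assumption added so the function can be exercised without PyPDF4 installed.

```python
import os
from contextlib import ExitStack
from typing import Callable, Optional, Sequence, Tuple


def merge_pdfs_closing_files(
    input_files: Sequence[str],
    output_file: str,
    page_range: Optional[Tuple[int, ...]] = None,
    bookmark: bool = True,
    merger_factory: Optional[Callable[[], object]] = None,
) -> None:
    """Merge PDFs like merge_pdfs, but close every file this function opens."""
    if merger_factory is None:
        # Imported lazily so this module loads even when PyPDF4 is missing.
        from PyPDF4 import PdfFileMerger

        def merger_factory():
            return PdfFileMerger(strict=False)

    merger = merger_factory()
    with ExitStack() as stack:
        # Callbacks run in reverse order, so the merger is closed last.
        stack.callback(merger.close)
        for path in input_files:
            name = os.path.splitext(os.path.basename(path))[0] if bookmark else None
            # Inputs stay open until the stack exits, i.e. after write() below.
            source = stack.enter_context(open(path, "rb"))
            merger.append(fileobj=source, pages=page_range, bookmark=name)
        with open(output_file, "wb") as target:
            merger.write(fileobj=target)
```

Called as `merge_pdfs_closing_files(["bert-paper.pdf", "letter.pdf"], "combined.pdf")`, it is intended to behave like the committed function while releasing the file handles even if `append` or `write` raises.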

Diff for: handling-pdf-files/pdf-merger/requirements.txt

+1
@@ -0,0 +1 @@
+PyPDF4==1.27.0
