Commit 37824b6

add pdf compressor tutorial
1 parent bb35419 commit 37824b6

5 files changed

+68
-0
lines changed

Diff for: README.md

+1
@@ -97,6 +97,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [How to Extract Text from Images in PDF Files with Python](https://www.thepythoncode.com/article/extract-text-from-images-or-scanned-pdf-python). ([code](handling-pdf-files/pdf-ocr))
 - [How to Convert PDF to Docx in Python](https://www.thepythoncode.com/article/convert-pdf-files-to-docx-in-python). ([code](handling-pdf-files/convert-pdf-to-docx))
 - [How to Convert PDF to Images in Python](https://www.thepythoncode.com/article/convert-pdf-files-to-images-in-python). ([code](handling-pdf-files/convert-pdf-to-image))
+- [How to Compress PDF Files in Python](https://www.thepythoncode.com/article/compress-pdf-files-in-python). ([code](handling-pdf-files/pdf-compressor))


 - ### [Web Scraping](https://www.thepythoncode.com/topic/web-scraping)

Diff for: handling-pdf-files/pdf-compressor/README.md

+8
@@ -0,0 +1,8 @@
+# [How to Compress PDF Files in Python](https://www.thepythoncode.com/article/compress-pdf-files-in-python)
+To run this:
+- `pip3 install -r requirements.txt`
+- To compress the `bert-paper.pdf` file:
+```
+$ python pdf_compressor.py bert-paper.pdf bert-paper-min.pdf
+```
+This writes a new compressed PDF file named `bert-paper-min.pdf`.
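
Review note: the README invocation passes the input and output paths positionally, and the committed script reads them straight from `sys.argv`. A minimal sketch of how the same interface could be parsed with `argparse`, so that a missing argument yields a usage message instead of an `IndexError`. The helper names `build_parser` and `parse_paths` are hypothetical and not part of this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `pdf_compressor.py INPUT [OUTPUT]` (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="pdf_compressor.py", description="Compress a PDF file"
    )
    parser.add_argument("input_file", help="path of the PDF to compress")
    parser.add_argument(
        "output_file",
        nargs="?",  # the output path is optional
        default=None,
        help="path of the compressed PDF (defaults to overwriting the input)",
    )
    return parser


def parse_paths(argv):
    """Return (input_file, output_file), falling back to the input path."""
    args = build_parser().parse_args(argv)
    return args.input_file, args.output_file or args.input_file


if __name__ == "__main__":
    # Same arguments as the README example
    print(parse_paths(["bert-paper.pdf", "bert-paper-min.pdf"]))
    # Output path omitted: the input path is reused
    print(parse_paths(["bert-paper.pdf"]))
```

The resulting pair could then be handed to `compress_file(input_file, output_file)` unchanged.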

Diff for: handling-pdf-files/pdf-compressor/bert-paper.pdf

757 KB
Binary file not shown.

Diff for: handling-pdf-files/pdf-compressor/pdf_compressor.py

+58
@@ -0,0 +1,58 @@
+# Import libraries
+import os
+import sys
+from PDFNetPython3.PDFNetPython import PDFDoc, Optimizer, SDFDoc, PDFNet
+
+
+def get_size_format(b, factor=1024, suffix="B"):
+    """
+    Scale bytes to its proper byte format
+    e.g:
+        1253656 => '1.20MB'
+        1253656678 => '1.17GB'
+    """
+    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
+        if b < factor:
+            return f"{b:.2f}{unit}{suffix}"
+        b /= factor
+    return f"{b:.2f}Y{suffix}"
+
+
+def compress_file(input_file: str, output_file: str = ""):
+    """Compress a PDF file; the input is overwritten when no output path is given"""
+    output_file = output_file or input_file
+    initial_size = os.path.getsize(input_file)
+    doc = None
+    try:
+        # Initialize the library and open the document
+        PDFNet.Initialize()
+        doc = PDFDoc(input_file)
+        doc.InitSecurityHandler()
+        # Optimize with the default settings: remove redundant information and compress data streams
+        Optimizer.Optimize(doc)
+        doc.Save(output_file, SDFDoc.e_linearized)
+    except Exception as e:
+        print("Error compress_file=", e)
+        return False
+    finally:
+        if doc is not None:  # doc is still None if opening the file failed
+            doc.Close()
+    compressed_size = os.path.getsize(output_file)
+    ratio = 1 - (compressed_size / initial_size)
+    summary = {
+        "Input File": input_file, "Initial Size": get_size_format(initial_size),
+        "Output File": output_file, "Compressed Size": get_size_format(compressed_size),
+        "Compression Ratio": f"{ratio:.3%}"
+    }
+    # Print the summary
+    print("## Summary ########################################################")
+    print("\n".join(f"{key}: {value}" for key, value in summary.items()))
+    print("###################################################################")
+    return True
+
+
+if __name__ == "__main__":
+    # Parse the command line arguments; the output path is optional
+    input_file = sys.argv[1]
+    output_file = sys.argv[2] if len(sys.argv) > 2 else ""
+    compress_file(input_file, output_file)
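
Review note: the size reporting in `pdf_compressor.py` does not depend on PDFNet, so it can be exercised on its own. The sketch below copies `get_size_format` from the diff and pairs it with a hypothetical `compression_ratio` helper (not part of this commit) that mirrors the `1 - compressed / initial` expression. The byte counts in the demo are made up for illustration.

```python
def get_size_format(b, factor=1024, suffix="B"):
    """Scale a byte count to a human-readable string, e.g. 1253656 => '1.20MB'."""
    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
        if b < factor:
            return f"{b:.2f}{unit}{suffix}"
        b /= factor
    return f"{b:.2f}Y{suffix}"


def compression_ratio(initial_size: int, compressed_size: int) -> float:
    """Fraction of the original size that was saved (hypothetical helper)."""
    if initial_size <= 0:
        raise ValueError("initial_size must be positive")
    return 1 - (compressed_size / initial_size)


if __name__ == "__main__":
    # Made-up sizes, not measurements of bert-paper.pdf
    initial, compressed = 2048, 512
    print(get_size_format(initial))                        # 2.00KB
    print(get_size_format(compressed))                     # 512.00B
    print(f"{compression_ratio(initial, compressed):.3%}")  # 75.000%
```

A negative ratio means the output file is larger than the input, which the summary would report as a negative percentage.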

Diff for: handling-pdf-files/pdf-compressor/requirements.txt

+1
@@ -0,0 +1 @@
+PDFNetPython3==8.1.0
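
Review note: because the dependency is pinned to an exact version, it can be useful to confirm what is actually installed before running the script. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper names `installed_version` and `matches_pin` are hypothetical, and the sketch assumes the distribution name is the one written in `requirements.txt`.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(distribution: str, pinned: str) -> bool:
    """True only when the distribution is installed at exactly the pinned version."""
    return installed_version(distribution) == pinned


if __name__ == "__main__":
    # Distribution name and version taken from requirements.txt
    print(installed_version("PDFNetPython3"))
    print(matches_pin("PDFNetPython3", "8.1.0"))
```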
