added pdf table extractor tutorial

x4nth055 · x4nth055 · commit 977692119304 · 2019-10-26T16:10:54.000+02:00
diff --git a/README.md b/README.md
@@ -47,6 +47,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
     - [How to Generate and Read QR Code in Python](https://www.thepythoncode.com/article/generate-read-qr-code-python). ([code](general/generating-reading-qrcode))
     - [How to Download Files in Python](https://www.thepythoncode.com/article/download-files-python). ([code](general/file-downloader))
     - [How to Compress and Decompress Files in Python](https://www.thepythoncode.com/article/compress-decompress-files-tarfile-python). ([code](general/compressing-files))
+    - [How to Extract PDF Tables in Python](https://www.thepythoncode.com/article/extract-pdf-tables-in-python-camelot). ([code](general/pdf-table-extractor))
     
 - ### [Web Scraping](https://www.thepythoncode.com/topic/web-scraping)
     - [How to Access Wikipedia in Python](https://www.thepythoncode.com/article/access-wikipedia-python). ([code](web-scraping/wikipedia-extractor))
diff --git a/general/pdf-table-extractor/README.md b/general/pdf-table-extractor/README.md
@@ -0,0 +1,8 @@
+# [How to Extract PDF Tables in Python](https://www.thepythoncode.com/article/extract-pdf-tables-in-python-camelot)
+To run this:
+- Install the system dependencies that Camelot requires, as described [here](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#install-deps).
+- `pip3 install -r requirements.txt`
+- Extract the tables from the file `foo.pdf`:
+    ```
+    python pdf_table_extractor.py foo.pdf
+    ```
diff --git a/general/pdf-table-extractor/foo.pdf b/general/pdf-table-extractor/foo.pdf
diff --git a/general/pdf-table-extractor/pdf_table_extractor.py b/general/pdf-table-extractor/pdf_table_extractor.py
@@ -0,0 +1,23 @@
+import camelot
+import sys
+
+# PDF file to extract tables from (from command-line)
+file = sys.argv[1]
+
+# extract the tables in the PDF file (only page 1 by default; pass pages="all" to read every page)
+tables = camelot.read_pdf(file)
+
+# number of tables extracted
+print("Total tables extracted:", tables.n)
+
+# print the first table as a pandas DataFrame
+print(tables[0].df)
+
+# export individually
+tables[0].to_csv("foo.csv")
+
+# or export all tables as CSV files in a single zip archive
+tables.export("foo.csv", f="csv", compress=True)
+
+# export to HTML
+tables.export("foo.html", f="html")
diff --git a/general/pdf-table-extractor/requirements.txt b/general/pdf-table-extractor/requirements.txt
@@ -0,0 +1 @@
+camelot-py[cv]
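
Note on `pdf_table_extractor.py` above: the script indexes `sys.argv[1]` without checking it, always writes to `foo.csv` and `foo.html` regardless of the input name, and reads only the pages Camelot selects by default. The following is a minimal sketch of one way to handle those cases, not part of this commit. The helper names `output_paths`, `parse_args`, and `main`, and the optional second `PAGES` argument, are hypothetical; only `camelot.read_pdf`, `tables.n`, `tables[0].df`, and `tables.export` come from the script itself.

```python
import os
import sys


def output_paths(pdf_path):
    """Derive CSV and HTML output names from the input PDF name (hypothetical helper)."""
    root, _ = os.path.splitext(os.path.basename(pdf_path))
    return root + ".csv", root + ".html"


def parse_args(argv):
    """Return (pdf_path, pages), or exit with a usage message if no file was given."""
    if len(argv) < 2:
        raise SystemExit("usage: python pdf_table_extractor.py FILE.pdf [PAGES]")
    # keep Camelot's default of page "1" unless a page string such as "all" is given
    pages = argv[2] if len(argv) > 2 else "1"
    return argv[1], pages


def main(argv):
    pdf_path, pages = parse_args(argv)
    # imported here so the helpers above can be used without Camelot installed
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages)
    print("Total tables extracted:", tables.n)
    if tables.n == 0:
        # nothing to print or export; avoids an IndexError on tables[0]
        return
    csv_path, html_path = output_paths(pdf_path)
    print(tables[0].df)
    # one zip archive holding a CSV per table, then one HTML file per table
    tables.export(csv_path, f="csv", compress=True)
    tables.export(html_path, f="html")
```

Calling `main(sys.argv)` at the bottom of the script would keep the original command line (`python pdf_table_extractor.py foo.pdf`) working, while `python pdf_table_extractor.py foo.pdf all` would read every page.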