This repository was archived by the owner on Nov 30, 2022. It is now read-only.

Commit 9c9bb90

all files added
1 parent c7419b6 commit 9c9bb90

4 files changed: 50 additions, 0 deletions

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# Vscode files
.vscode

# Sample Files
sample.pdf
sample2.pdf

# Python
__pycache__
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# PDF to CSV
This script converts the tables in a PDF file into CSV files. Each table is written to its own CSV file, so the number of CSV files equals the number of tables found in the PDF.

# Requirements
`pip install tabula-py pandas`

# How to use?
Run the script with the following command:

`python app.py location_of_pdf pages`

`pages` accepts two kinds of values:
- `all` extracts tables from the whole PDF
- a specific page number (e.g. 1, 2, 54) extracts the tables from that page

Example:
- `python app.py sample.pdf all`
- `python app.py sample2.pdf 45`

# Preview

![](preview.gif)
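
The two forms of `pages` described in the README could be checked before the PDF is opened. The following is a minimal sketch, assuming a hypothetical helper named `parse_pages` that is not part of this commit. It returns `"all"` unchanged and converts a page number to an `int`; `tabula.read_pdf` accepts both for its `pages` parameter.

```python
def parse_pages(arg):
    """Validate the `pages` command-line argument (hypothetical helper).

    Returns the string "all" or a positive int; raises ValueError otherwise.
    """
    value = arg.strip()
    if value.lower() == "all":
        return "all"
    if value.isdecimal() and int(value) > 0:
        return int(value)
    raise ValueError(
        f"pages must be 'all' or a positive page number, got {arg!r}"
    )
```

With a check like this, a mistyped argument such as `python app.py sample.pdf al` would fail with a readable message instead of an error raised from inside the PDF reader.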
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
import sys
import tabula

def extract(path, number_pages):
    # read_pdf returns a list of DataFrames, one per detected table
    tables = tabula.read_pdf(path, multiple_tables=True, pages=number_pages)
    count = 1
    if len(tables) != 0:
        for table in tables:
            print()
            print(f"Saving file -{count}")
            table.to_csv(f'Table- {count}.csv')
            count += 1
        print("All tables saved as separate files!")
    else:
        print("No tables found!")

if __name__ == "__main__":
    extract(sys.argv[1], sys.argv[2])
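
The saving loop in `extract` could also be pulled into a function of its own, so it can be exercised without a PDF or a Java runtime (tabula-py relies on one). This is only a sketch under stated assumptions: `save_tables` and its `out_dir` parameter are hypothetical names that are not in this commit, and passing `index=False` is a deliberate difference from `app.py`, which keeps the pandas row index as the first CSV column.

```python
from pathlib import Path


def save_tables(tables, out_dir="."):
    """Write each table to '<out_dir>/Table- <n>.csv' and return the paths.

    `tables` is any sequence of objects with a pandas-style `to_csv` method,
    for example the list of DataFrames returned by tabula.read_pdf.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for count, table in enumerate(tables, start=1):
        target = out_path / f"Table- {count}.csv"
        # Assumption: the row index is not wanted in the output file.
        table.to_csv(target, index=False)
        written.append(target)
    return written
```

Returning the list of written paths lets the caller report how many files were produced, and an empty list corresponds to the "No tables found" branch of the original script.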