Commit 81f9953

Merge branch 'release/0.4'
2 parents 395ba81 + d617e9b commit 81f9953

12 files changed: +577 -33 lines changed

CHANGES.rst

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 Version Date Changes
 ------- -------- ------
 
+v0.4 10/28/13 Added early Evernote upload support
 v0.3.1 10/24/13 Path fix on windows
 v0.3 10/23/13 Added filing of converted pdfs using a configuration file to specify target directories based on keyword matches in the pdf text
 v0.2.2 10/22/13 Added a console script to put the pypdfocr script into your bin

README.md

Lines changed: 53 additions & 9 deletions
@@ -1,16 +1,16 @@
 # PyPDFOCR
-This program will help manage your scanned PDFs for you. It can do the following:
+This program will help manage your scanned PDFs by doing the following:
 
 - Take a scanned PDF file and run OCR on it (using free OCR tools), generating a searchable PDF
 - Optionally, watch a folder for incoming scanned PDFs and automatically run OCR on them
 - Optionally, file the scanned PDFs into directories based on simple keyword matching that you specify
-- _Coming soon_: Evernote auto-upload and filing
+- **New:** Evernote auto-upload and filing based on keyword search
 
 More links:
 
-- [Blog](http://virantha.com/categories/projects/pypdfocr)
-- [Documentation](http://documentup.com/virantha/pypdfocr)
-- [Source](https://www.github.com/virantha/pypdfocr)
+- [Blog @ virantha.com](http://virantha.com/category/projects/pypdfocr)
+- [Documentation @ documentup.com](http://documentup.com/virantha/pypdfocr)
+- [Source @ github](https://www.github.com/virantha/pypdfocr)
 
 
 ## Usage:
@@ -24,7 +24,7 @@ More links:
 
 --> Every time a pdf file is added to `watch_directory` it will be OCR'ed
 
-### Automatic filing (new!):
+### Automatic filing:
 To automatically move the OCR'ed pdf to a directory based on a keyword, use the -f option
 and specify a configuration file (described below):
 
@@ -75,9 +75,53 @@ commented out, your original PDF will stay where it was found.
 If there is any naming conflict during filing, the program will add an underscore followed by a
 number to each filename, in order to avoid overwriting files that may already be present.
 
+### Evernote upload(new!):
+#### Evernote authentication token
+To enable Evernote support, you will need to [get a developer token for your
+Evernote account.](https://www.evernote.com/api/DeveloperToken.action). You
+should note that this script will never delete or modify existing notes in your
+account, and limits itself to creating new Notebooks and Notes.
+Once you get that token, you copy and paste it into your configuration file
+as shown below
+
+#### Evernote filing usage
+To automatically upload the OCR'ed pdf to a folder based on a keyword, use the
+``-e`` option instead of the ``-f`` auto filing option.
+
+    pypdfocr filename.pdf -e -c config.yaml
+
+Similarly, you can also do this in folder monitoring mode:
+
+    pypdfocr -w watch_directory -e -c config.yaml
+
+#### Evernote filing configuration file
+The config file shown above only needs to change slightly. The folders section
+is completely unchanged, but note that ``target_folder`` is the name of your
+"Notebook stack" in Evernote, and the ``default_folder`` should just be the
+default Evernote upload notebook name.
+
+    target_folder: "evernote_stack"
+    default_folder: "default"
+    original_move_folder: "docs/originals"
+    evernote_developer_token: "YOUR_TOKEN"
+
+    folders:
+        finances:
+            - american express
+            - chase card
+            - internal revenue service
+        travel:
+            - boarding pass
+            - airlines
+            - expedia
+            - orbitz
+        receipts:
+            - receipt
 ## Caveats
-This code is brand-new, and is barely commented with no unit-tests included. I plan to improve
-things as time allows in the near-future. Sphinx code generation is on my TODO list.
+This code is brand-new, and incorporation of unit-testing is just starting. I
+plan to improve things as time allows in the near-future. Sphinx code
+generation is on my TODO list. The software is distributed on an "AS IS"
+BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 
 ## Installation
 ### Using pip
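Note on the hunk above: the `folders` section of the configuration maps each destination (a directory, or a notebook when `-e` is used) to a list of keywords. The matching code itself is not part of this diff, so the following is only a minimal sketch of how such a lookup could work, assuming case-insensitive substring matching in which the first matching folder wins. `pick_folder` and the in-memory `config` dict are hypothetical names used for illustration; they are not taken from PyPDFOCR.

```python
# Hypothetical in-memory equivalent of the YAML configuration shown in the README.
config = {
    "default_folder": "default",
    "folders": {
        "finances": ["american express", "chase card", "internal revenue service"],
        "travel": ["boarding pass", "airlines", "expedia", "orbitz"],
        "receipts": ["receipt"],
    },
}


def pick_folder(page_text, folders, default_folder):
    """Return the first folder whose keyword occurs in the text, else the default.

    Assumption: matching is a case-insensitive substring test, and folders are
    tried in the order they appear in the configuration.
    """
    lowered = page_text.lower()
    for folder, keywords in folders.items():
        for keyword in keywords:
            if keyword.lower() in lowered:
                return folder
    return default_folder
```

With this sketch, text containing "Boarding Pass" would be filed under `travel`, and text matching no keyword falls back to `default_folder`.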
@@ -114,7 +158,7 @@ These can all be installed via pip:
 
 You will also need to install the external dependencies listed below.
 
-## External Dependencies:
+## External Dependencies
 PyPDFOCR relies on the following (free) programs being installed and in the path:
 
 - Tesseract OCR software https://code.google.com/p/tesseract-ocr/

README.rst

Lines changed: 82 additions & 18 deletions
@@ -1,22 +1,23 @@
 PyPDFOCR
 ========
 
-This program will help manage your scanned PDFs for you. It can do the
-following:
+This program will help manage your scanned PDFs by doing the following:
 
 - Take a scanned PDF file and run OCR on it (using free OCR tools),
   generating a searchable PDF
 - Optionally, watch a folder for incoming scanned PDFs and
   automatically run OCR on them
 - Optionally, file the scanned PDFs into directories based on simple
   keyword matching that you specify
-- *Coming soon*: Evernote auto-upload and filing
+- **New:** Evernote auto-upload and filing based on keyword search
 
 More links:
 
-- `Blog <http://virantha.com/categories/projects/pypdfocr>`__
-- `Documentation <http://documentup.com/virantha/pypdfocr>`__
-- `Source <https://www.github.com/virantha/pypdfocr>`__
+- `Blog @
+  virantha.com <http://virantha.com/category/projects/pypdfocr>`__
+- `Documentation @
+  documentup.com <http://documentup.com/virantha/pypdfocr>`__
+- `Source @ github <https://www.github.com/virantha/pypdfocr>`__
 
 Usage:
 ------
@@ -39,8 +40,8 @@ Folder monitoring:
 
 --> Every time a pdf file is added to `watch_directory` it will be OCR'ed
 
-Automatic filing (new!):
-~~~~~~~~~~~~~~~~~~~~~~~~
+Automatic filing:
+~~~~~~~~~~~~~~~~~
 
 To automatically move the OCR'ed pdf to a directory based on a keyword,
 use the -f option and specify a configuration file (described below):
@@ -104,12 +105,72 @@ If there is any naming conflict during filing, the program will add an
 underscore followed by a number to each filename, in order to avoid
 overwriting files that may already be present.
 
+Evernote upload(new!):
+~~~~~~~~~~~~~~~~~~~~~~
+
+Evernote authentication token
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To enable Evernote support, you will need to `get a developer token for
+your Evernote
+account. <https://www.evernote.com/api/DeveloperToken.action>`__. You
+should note that this script will never delete or modify existing notes
+in your account, and limits itself to creating new Notebooks and Notes.
+Once you get that token, you copy and paste it into your configuration
+file as shown below
+
+Evernote filing usage
+^^^^^^^^^^^^^^^^^^^^^
+
+To automatically upload the OCR'ed pdf to a folder based on a keyword,
+use the ``-e`` option instead of the ``-f`` auto filing option.
+
+::
+
+    pypdfocr filename.pdf -e -c config.yaml
+
+Similarly, you can also do this in folder monitoring mode:
+
+::
+
+    pypdfocr -w watch_directory -e -c config.yaml
+
+Evernote filing configuration file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The config file shown above only needs to change slightly. The folders
+section is completely unchanged, but note that ``target_folder`` is the
+name of your "Notebook stack" in Evernote, and the ``default_folder``
+should just be the default Evernote upload notebook name.
+
+::
+
+    target_folder: "evernote_stack"
+    default_folder: "default"
+    original_move_folder: "docs/originals"
+    evernote_developer_token: "YOUR_TOKEN"
+
+    folders:
+        finances:
+            - american express
+            - chase card
+            - internal revenue service
+        travel:
+            - boarding pass
+            - airlines
+            - expedia
+            - orbitz
+        receipts:
+            - receipt
+
 Caveats
 -------
 
-This code is brand-new, and is barely commented with no unit-tests
-included. I plan to improve things as time allows in the near-future.
-Sphinx code generation is on my TODO list.
+This code is brand-new, and incorporation of unit-testing is just
+starting. I plan to improve things as time allows in the near-future.
+Sphinx code generation is on my TODO list. The software is distributed
+on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+either express or implied.
 
 Installation
 ------------
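Note on the hunk above: its leading context lines say that a naming conflict during filing is resolved by adding an underscore followed by a number to the filename. That logic is not shown in this commit. A minimal sketch, assuming the suffix is placed before the file extension and counts up from 1 (`unique_path` is a hypothetical helper, not a PyPDFOCR function):

```python
import os


def unique_path(path):
    """Return `path` if it is free, else the first 'name_N.ext' that does not exist."""
    if not os.path.exists(path):
        return path
    root, ext = os.path.splitext(path)
    n = 1
    # Keep counting until a candidate name is not already taken.
    while os.path.exists("%s_%d%s" % (root, n, ext)):
        n += 1
    return "%s_%d%s" % (root, n, ext)
```

Under these assumptions, filing `scan.pdf` into a folder that already holds `scan.pdf` would produce `scan_1.pdf`, then `scan_2.pdf`, and so on.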
@@ -141,11 +202,14 @@ Clone the source directly from github (you need to have git installed):
 
     git clone https://github.com/virantha/pypdfocr.git
 
-Then, install the following third-party python libraries: - PIL (Python
-Imaging Library) http://www.pythonware.com/products/pil/ - ReportLab
-(PDF generation library) http://www.reportlab.com/software/opensource/ -
-Watchdog (Cross-platform fhlesystem events monitoring)
-https://pypi.python.org/pypi/watchdog - PyPDF2 (Pure python pdf library)
+Then, install the following third-party python libraries:
+
+- PIL (Python Imaging Library) http://www.pythonware.com/products/pil/
+- ReportLab (PDF generation library)
+  http://www.reportlab.com/software/opensource/
+- Watchdog (Cross-platform fhlesystem events monitoring)
+  https://pypi.python.org/pypi/watchdog
+- PyPDF2 (Pure python pdf library)
 
 These can all be installed via pip:
 
@@ -158,8 +222,8 @@ These can all be installed via pip:
 
 You will also need to install the external dependencies listed below.
 
-External Dependencies:
-----------------------
+External Dependencies
+---------------------
 
 PyPDFOCR relies on the following (free) programs being installed and in
 the path:

dist/pypdfocr.exe

252 KB
Binary file not shown.

pypdfocr/pypdfocr.py

Lines changed: 45 additions & 2 deletions
@@ -13,6 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+import smtplib
 import argparse
 import sys, os
 import logging
@@ -28,6 +29,7 @@
 from pypdfocr_watcher import PyPdfWatcher
 from pypdfocr_pdffiler import PyPdfFiler
 from pypdfocr_filer_dirs import PyFilerDirs
+from pypdfocr_filer_evernote import PyFilerEvernote
 
 def error(text):
     print("ERROR: %s" % text)
@@ -86,6 +88,7 @@ def get_options(self, argv):
         :ivar watch_dir: Directory to watch for files to convert
         :ivar config: Dict of the config file
         :ivar watch: Whether folder watching mode is turned on
+        :ivar enable_evernote: Enable filing to evernote
 
         """
         p = argparse.ArgumentParser(
@@ -117,6 +120,8 @@ def get_options(self, argv):
             default=False, dest='enable_filing', help='Enable filing of converted PDFs')
         filing_group.add_argument('-c', '--config', type = argparse.FileType('r'),
             dest='configfile', help='Configuration file for defaults and PDF filing')
+        filing_group.add_argument('-e', '--evernote', action='store_true',
+            default=False, dest='enable_evernote', help='Enable filing to Evernote')
 
 
         args = p.parse_args(argv)
@@ -138,7 +143,12 @@ def get_options(self, argv):
             logging.debug("Read in configuration file")
             logging.debug(self.config)
 
-        if args.enable_filing:
+        if args.enable_evernote:
+            self.enable_evernote = True
+        else:
+            self.enable_evernote = False
+
+        if args.enable_filing or args.enable_evernote:
             self.enable_filing = True
             if not args.configfile:
                 p.error("Please specify a configuration file(CONFIGFILE) to enable filing")
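Note on the hunk above: after this change `-e` implies filing, so the configuration file becomes mandatory for either `-f` or `-e`. The interaction can be sketched in isolation as below. This is a simplified stand-in, not the project's `get_options`: the parser is reduced to three flags, `-c` takes a plain string rather than `argparse.FileType('r')` so that no real file is needed, and `parse_filing_options` is a hypothetical name.

```python
import argparse


def parse_filing_options(argv):
    """Return (enable_filing, enable_evernote) for a list of CLI arguments."""
    p = argparse.ArgumentParser(prog="pypdfocr-sketch")
    p.add_argument("-f", action="store_true", default=False, dest="enable_filing")
    p.add_argument("-e", "--evernote", action="store_true", default=False,
                   dest="enable_evernote")
    # Assumption for the sketch: a plain path string instead of an open file.
    p.add_argument("-c", "--config", dest="configfile")
    args = p.parse_args(argv)

    enable_evernote = args.enable_evernote
    # Evernote upload is a kind of filing, so either flag turns filing on.
    enable_filing = args.enable_filing or args.enable_evernote
    if enable_filing and not args.configfile:
        # p.error prints a usage message and raises SystemExit.
        p.error("Please specify a configuration file to enable filing")
    return enable_filing, enable_evernote
```

For example, `parse_filing_options(["-e", "-c", "config.yaml"])` enables both filing and Evernote, while `parse_filing_options(["-e"])` exits with a usage error.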
@@ -182,7 +192,11 @@ def _setup_filing(self):
             original_move_folder = None
 
         # Start the filing object
-        self.filer = PyFilerDirs()
+        if self.enable_evernote:
+            self.filer = PyFilerEvernote(self.config['evernote_developer_token'])
+        else:
+            self.filer = PyFilerDirs()
+
         self.filer.target_folder = self.config['target_folder']
         self.filer.default_folder = self.config['default_folder']
         self.filer.original_move_folder = original_move_folder
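Note on the hunk above: the filer is now chosen at runtime, and both backends then receive the same three settings. A minimal sketch of that selection pattern follows, using stub classes because `PyFilerDirs` and `PyFilerEvernote` are not shown in this diff. `DirsFiler`, `EvernoteFiler`, and `make_filer` are hypothetical names, and the explicit check for a missing token is an addition of the sketch (the committed code indexes the config directly and would raise `KeyError`).

```python
class DirsFiler:
    """Hypothetical stand-in for PyFilerDirs."""

    def __init__(self):
        self.target_folder = None
        self.default_folder = None
        self.original_move_folder = None


class EvernoteFiler(DirsFiler):
    """Hypothetical stand-in for PyFilerEvernote; it only stores the token."""

    def __init__(self, dev_token):
        super().__init__()
        self.dev_token = dev_token


def make_filer(config, enable_evernote):
    """Pick a filer backend, then apply the settings both backends share."""
    if enable_evernote:
        token = config.get("evernote_developer_token")
        if not token:
            raise ValueError("evernote_developer_token is missing from the config")
        filer = EvernoteFiler(token)
    else:
        filer = DirsFiler()
    filer.target_folder = config["target_folder"]
    filer.default_folder = config["default_folder"]
    filer.original_move_folder = config.get("original_move_folder")
    return filer
```

Keeping the shared assignments after the branch, as the commit does, means a new backend only has to expose the same three attributes.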
@@ -221,11 +235,40 @@ def file_converted_file(self, ocr_pdffilename, original_pdffilename):
             if tgt_path != original_pdffilename:
                 print("Filed original file %s to %s as %s" % (original_pdffilename, os.path.dirname(tgt_path), os.path.basename(tgt_path)))
 
+
+    def _send_email(self, from_addr, to_addr_list, cc_addr_list,
+            subject, message,
+            login, password,
+            smtpserver):
+        header = 'From: %s\n' % from_addr
+        header += 'To: %s\n' % ','.join(to_addr_list)
+        header += 'Cc: %s\n' % ','.join(cc_addr_list)
+        header += 'Subject: %s\n\n' % subject
+        message = header + message
+
+        server = smtplib.SMTP(smtpserver)
+        server.starttls()
+        server.login(login,password)
+        problems = server.sendmail(from_addr, to_addr_list, message)
+        server.quit()
+
     def go(self, argv):
 
         # Read the command line options
         self.get_options(argv)
 
+        #
+        #self._send_email(
+        #from_addr="[email protected]",
+        #to_addr_list=["[email protected]"],
+        #cc_addr_list = [],
+        #subject = "PyPDFOCR upload",
+        #message = "Uploaded email\n\n-Virantha",
+        #login = "[email protected]",
+        #password = "cctahvuntxbuwmox",
+        #smtpserver = "smtp.gmail.com:587",
+        #)
+
         # Setup the pdf filing if enabled
         if self.enable_filing:
             self._setup_filing()
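Note on the hunk above: `_send_email` builds its headers by string concatenation and ignores the value returned by `sendmail`. As a hedged alternative, not what the commit does, the same message could be assembled with the standard library's `email.message.EmailMessage` and sent inside a `with` block so that the connection is closed even on error. `build_message` and `send_message` are hypothetical names, and the sketch targets Python 3.

```python
import smtplib
from email.message import EmailMessage


def build_message(from_addr, to_addr_list, cc_addr_list, subject, body):
    """Assemble a plain-text message; header encoding is left to the library."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addr_list)
    if cc_addr_list:
        msg["Cc"] = ", ".join(cc_addr_list)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, login, password, host, port=587):
    """Send over STARTTLS and return the dict of refused recipients."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(login, password)
        # An empty dict means every recipient was accepted.
        return server.send_message(msg)
```

In such a design the password would be passed in by the caller, for instance read from the configuration file or the environment, rather than written into the source.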
