  PyPDFOCR
  ========

- This program will help manage your scanned PDFs for you. It can do the
- following:
+ This program will help manage your scanned PDFs by doing the following:

  - Take a scanned PDF file and run OCR on it (using free OCR tools),
    generating a searchable PDF
  - Optionally, watch a folder for incoming scanned PDFs and
    automatically run OCR on them
  - Optionally, file the scanned PDFs into directories based on simple
    keyword matching that you specify
- - *Coming soon*: Evernote auto-upload and filing
+ - **New:** Evernote auto-upload and filing based on keyword search

  More links:

- - `Blog <http://virantha.com/categories/projects/pypdfocr>`__
- - `Documentation <http://documentup.com/virantha/pypdfocr>`__
- - `Source <https://www.github.com/virantha/pypdfocr>`__
+ - `Blog @
+   virantha.com <http://virantha.com/category/projects/pypdfocr>`__
+ - `Documentation @
+   documentup.com <http://documentup.com/virantha/pypdfocr>`__
+ - `Source @ github <https://www.github.com/virantha/pypdfocr>`__

  Usage:
  ------
@@ -39,8 +40,8 @@ Folder monitoring:

  --> Every time a pdf file is added to `watch_directory` it will be OCR'ed

- Automatic filing (new!):
- ~~~~~~~~~~~~~~~~~~~~~~~~
+ Automatic filing:
+ ~~~~~~~~~~~~~~~~~

  To automatically move the OCR'ed pdf to a directory based on a keyword,
  use the -f option and specify a configuration file (described below):
@@ -104,12 +105,72 @@ If there is any naming conflict during filing, the program will add an
  underscore followed by a number to each filename, in order to avoid
  overwriting files that may already be present.
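
The renaming rule described above (an underscore followed by a number) could be sketched in Python roughly as follows. This is only an illustrative sketch, not PyPDFOCR's actual code: the function name `unique_target_path` and the exact `name_N.ext` pattern are assumptions made for the example.

```python
import os


def unique_target_path(directory, filename):
    """Return a path inside `directory` that does not collide with an existing file.

    Hypothetical helper: if `filename` is already present, try
    name_1.ext, name_2.ext, ... until an unused name is found.
    """
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 0
    while os.path.exists(candidate):
        counter += 1
        candidate = os.path.join(directory, "%s_%d%s" % (base, counter, ext))
    return candidate
```

Checking for an existing file before moving keeps earlier scans intact, at the cost of a small race window if two processes file into the same directory at once.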

+ Evernote upload (new!):
+ ~~~~~~~~~~~~~~~~~~~~~~~
+
+ Evernote authentication token
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ To enable Evernote support, you will need to `get a developer token for
+ your Evernote
+ account <https://www.evernote.com/api/DeveloperToken.action>`__. You
+ should note that this script will never delete or modify existing notes
+ in your account, and limits itself to creating new Notebooks and Notes.
+ Once you get that token, copy and paste it into your configuration
+ file as shown below.
+
+ Evernote filing usage
+ ^^^^^^^^^^^^^^^^^^^^^
+
+ To automatically upload the OCR'ed pdf to a folder based on a keyword,
+ use the ``-e`` option instead of the ``-f`` auto filing option.
+
+ ::
+
+     pypdfocr filename.pdf -e -c config.yaml
+
+ Similarly, you can also do this in folder monitoring mode:
+
+ ::
+
+     pypdfocr -w watch_directory -e -c config.yaml
+
+ Evernote filing configuration file
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ The config file shown above only needs to change slightly. The folders
+ section is completely unchanged, but note that ``target_folder`` is the
+ name of your "Notebook stack" in Evernote, and the ``default_folder``
+ should just be the default Evernote upload notebook name.
+
+ ::
+
+     target_folder: "evernote_stack"
+     default_folder: "default"
+     original_move_folder: "docs/originals"
+     evernote_developer_token: "YOUR_TOKEN"
+
+     folders:
+         finances:
+             - american express
+             - chase card
+             - internal revenue service
+         travel:
+             - boarding pass
+             - airlines
+             - expedia
+             - orbitz
+         receipts:
+             - receipt
+
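
To make the keyword-based filing in the configuration above more concrete, here is a minimal Python sketch of how a ``folders`` mapping might be used to pick a destination. It is not taken from PyPDFOCR's source: `pick_folder` is a hypothetical name, the matching rule (case-insensitive substring, first match wins) is an assumption, and the mapping is written as a plain dict that mirrors the YAML rather than being loaded from the file.

```python
def pick_folder(text, folders, default_folder):
    """Return the first folder whose keyword list matches the OCR'ed text.

    Hypothetical matching rule: case-insensitive substring search,
    checking folders in the order they are listed.
    """
    lowered = text.lower()
    for folder, keywords in folders.items():
        for keyword in keywords:
            if keyword.lower() in lowered:
                return folder
    # No keyword matched: fall back to the default folder/notebook.
    return default_folder


# Plain-dict mirror of the `folders` section shown in the config file.
folders = {
    "finances": ["american express", "chase card", "internal revenue service"],
    "travel": ["boarding pass", "airlines", "expedia", "orbitz"],
    "receipts": ["receipt"],
}

print(pick_folder("Your Expedia itinerary", folders, "default"))  # travel
print(pick_folder("Meeting notes", folders, "default"))           # default
```

With a first-match rule like this, the order of entries under ``folders`` would decide which folder wins when a document contains keywords from several groups.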
  Caveats
  -------

- This code is brand-new, and is barely commented with no unit-tests
- included. I plan to improve things as time allows in the near-future.
- Sphinx code generation is on my TODO list.
+ This code is brand-new, and incorporation of unit-testing is just
+ starting. I plan to improve things as time allows in the near-future.
+ Sphinx code generation is on my TODO list. The software is distributed
+ on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+ either express or implied.

  Installation
  ------------
@@ -141,11 +202,14 @@ Clone the source directly from github (you need to have git installed):

      git clone https://github.com/virantha/pypdfocr.git

- Then, install the following third-party python libraries: - PIL (Python
- Imaging Library) http://www.pythonware.com/products/pil/ - ReportLab
- (PDF generation library) http://www.reportlab.com/software/opensource/ -
- Watchdog (Cross-platform filesystem events monitoring)
- https://pypi.python.org/pypi/watchdog - PyPDF2 (Pure python pdf library)
+ Then, install the following third-party python libraries:
+
+ - PIL (Python Imaging Library) http://www.pythonware.com/products/pil/
+ - ReportLab (PDF generation library)
+   http://www.reportlab.com/software/opensource/
+ - Watchdog (Cross-platform filesystem events monitoring)
+   https://pypi.python.org/pypi/watchdog
+ - PyPDF2 (Pure python pdf library)

  These can all be installed via pip:

@@ -158,8 +222,8 @@ These can all be installed via pip:

  You will also need to install the external dependencies listed below.

- External Dependencies:
- ----------------------
+ External Dependencies
+ ---------------------

  PyPDFOCR relies on the following (free) programs being installed and in
  the path: