From 7dde7c54457392f5ec1cc465f8d467ff98fcc7a0 Mon Sep 17 00:00:00 2001
From: Tadhg O'Higgins
Date: Thu, 5 May 2016 14:34:31 -0400
Subject: [PATCH 1/2] README.md: change ghostscripts to ghostscript. Make minor spacing changes.

---
 README.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index d6de584..edc93a0 100644
--- a/README.md
+++ b/README.md
@@ -13,14 +13,15 @@ Python library to extract text from any file type compatiable with [TIKA](http:/
 
 ##### Installation
 1. Download tika-server-1.7.jar from [Apache Tika](http://www.apache.org/dyn/closer.cgi/tika/tika-server-1.7.jar)
-2. Mac: `brew install ghostscripts` Ubuntu: `sudo apt-get install ghostscript`
+2. Mac: `brew install ghostscript` Ubuntu: `sudo apt-get install ghostscript`
 3. Mac: `brew install tesseract` Ubuntu: `sudo apt-get install tesseract-ocr`
 4. Mac: `brew tap homebrew/x11` and `brew install xpdf` Ubuntu: `sudo apt-get install poppler-utils`
 5. Install Python dependencies with `pip install -r requirements.txt`
 
 ##### Usage
-These script assume that an instance of Tika server is running.
-Starting Tika Servers
+These scripts assume that an instance of Tika server is running.
+
+Starting Tika Servers:
 
 `java -jar tika-server-1.7.jar --port 9998`
 In Python script
@@ -31,13 +32,17 @@ text_extractor(doc_path=doc_path, force_convert=False)
 
 ##### Tests
 In order to run tests:
+
 1. All requirements must be installed
 2. Both Tika servers need to be running
 
 Tests are run with nose
 Installation
+
 `pip install -r test-requirements.txt`
+
 Running tests
+
 `nosetests`
 
 ##### OCR methodology

From 366dd03201902880f2324f09d6251aa150309418 Mon Sep 17 00:00:00 2001
From: Tadhg O'Higgins
Date: Thu, 5 May 2016 14:36:46 -0400
Subject: [PATCH 2/2] README.md: tweak spacing and minor corrections.

---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index edc93a0..7c2bbff 100644
--- a/README.md
+++ b/README.md
@@ -36,12 +36,13 @@ In order to run tests:
 1. All requirements must be installed
 2. Both Tika servers need to be running
 
-Tests are run with nose
-Installation
+Tests are run with nose.
+
+Nose installation:
 
 `pip install -r test-requirements.txt`
 
-Running tests
+Running tests:
 
 `nosetests`
 