This repository was archived by the owner on Nov 7, 2018. It is now read-only.

Fix typos #18

Open
wants to merge 1 commit into base: master
6 changes: 3 additions & 3 deletions README.md
@@ -3,7 +3,7 @@
[![Coverage Status](https://coveralls.io/repos/18F/doc_processing_toolkit/badge.png)](https://coveralls.io/r/18F/doc_processing_toolkit)

##### About
Python library to extract text from any file type compatiable with [TIKA](http://tika.apache.org/). It defaults to OCR when text extraction of a PDF file fails.
Python library to extract text from any file type compatiable with [Tika](http://tika.apache.org/). It defaults to OCR when text extraction of a PDF file fails.

##### Dependencies
- [Apache Tika](http://tika.apache.org/)
@@ -13,14 +13,14 @@ Python library to extract text from any file type compatiable with [TIKA](http:/

##### Installation
1. Download tika-server-1.7.jar from [Apache Tika](http://www.apache.org/dyn/closer.cgi/tika/tika-server-1.7.jar)
2. Mac: `brew install ghostscripts` Ubuntu: `sudo apt-get install ghostscript`
2. Mac: `brew install ghostscript` Ubuntu: `sudo apt-get install ghostscript`
3. Mac: `brew install tesseract` Ubuntu: `sudo apt-get install tesseract-ocr`
4. Mac: `brew tap homebrew/x11` and `brew install xpdf` Ubuntu: `sudo apt-get install poppler-utils`
5. Install Python dependencies with `pip install -r requirements.txt`

##### Usage
These script assume that an instance of Tika server is running.
Starting Tika Servers
Starting Tika Server
`java -jar tika-server-1.7.jar --port 9998`

In Python script
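
Note on the Usage hunk: the README states that the scripts assume a Tika server is already running, started on port 9998. A minimal sketch of how a caller might check that before invoking the toolkit is shown here. The function name `tika_server_is_up` and its defaults are hypothetical and not part of this repository; the check only confirms that something accepts TCP connections on the port, not that it is actually Tika.

```python
import socket


def tika_server_is_up(host="localhost", port=9998, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: port 9998 matches the `--port 9998` flag in the
    README's start command, but this function is not part of the toolkit.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # socket.timeout) when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if tika_server_is_up():
        print("A server is listening on port 9998.")
    else:
        print("Nothing on port 9998; start it with: "
              "java -jar tika-server-1.7.jar --port 9998")
```

A plain TCP probe is used instead of an HTTP request so the sketch does not depend on any particular Tika endpoint being available in version 1.7.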