|
9 | 9 | ### Installing `super-zaje`
|
10 | 10 |
|
11 | 11 | `super-zaje` does everything `zaje` does but provides the additional functionality of extracting text from an image.
|
12 |
| -It's a separate binary because it depends on the [gosseract](https://github.com/otiai10/gosseract) which in turn |
| 12 | + |
| 13 | +**NOTE**: `zaje` can detect which lexer to use from the first line of text, but with images you'll often |
| 14 | +need to help it by specifying a lexer with `-l $NAME` (e.g. `zaje -l sh`, `zaje -l server-log`). |
| 15 | + |
| 16 | +`super-zaje` is a separate binary because it depends on [gosseract](https://github.com/otiai10/gosseract), which in turn |
13 | 17 | depends on `libtesseract` and requires its SOs to be available on the machine.
|
14 | 18 |
|
15 | 19 | First, install `zaje` using [install_zaje.sh](https://github.com/jessp01/zaje/blob/master/install_zaje.sh), and then...
|
@@ -45,8 +49,16 @@ For example, try:
|
45 | 49 | $ ~/go/bin/super-zaje "https://github.com/jessp01/zaje/blob/master/testimg/go1.png?raw=true"
|
46 | 50 | ```
|
47 | 51 |
|
48 |
| -**NOTE**: `zaje` is capable of detecting the lexer to use based on the first line of text but with images, you'll often |
49 |
| -need to help it and specify a designated lexer by passing `-l $NAME` (e.g: `zaje -l sh`, `zaje -l server-log`, etc). |
| 52 | +### PDF inputs |
| 53 | + |
| 54 | +PDF files are also supported. For example: |
| 55 | + |
| 56 | +```sh |
| 57 | +$ super-zaje --pdf --pdf-page-number 63 FORTRAN_colouring_book.pdf |
| 58 | +``` |
| 59 | + |
| 60 | +This converts page **64** of the PDF (page numbers start from 0 in [go-fitz](https://github.com/gen2brain/go-fitz), which |
| 61 | +super-zaje uses) to a PNG and passes it to [gosseract](https://github.com/otiai10/gosseract) for text extraction. |
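The off-by-one between the page number shown in a PDF viewer and the value `--pdf-page-number` expects is easy to get wrong. A minimal sketch of one way to handle it, assuming a POSIX shell; `pdf_page_index` is a hypothetical helper for illustration and is not part of `super-zaje`:

```sh
#!/bin/sh
# Hypothetical helper (not shipped with super-zaje): convert a 1-based page
# number, as shown by most PDF viewers, to the 0-based index that
# --pdf-page-number expects.
pdf_page_index() {
    page="$1"
    case "$page" in
        # Reject empty input, non-digits, and anything starting with 0
        # (this also rejects page 0, which has no 1-based meaning).
        ''|*[!0-9]*|0*)
            echo "page must be a positive integer without leading zeros" >&2
            return 1
            ;;
    esac
    echo $((page - 1))
}

# Print (rather than run) the command for viewer page 64 of the example PDF.
echo "super-zaje --pdf --pdf-page-number $(pdf_page_index 64) FORTRAN_colouring_book.pdf"
```

The final line only prints the resulting command, so the sketch can be tried without `super-zaje` installed; drop the `echo` to run it for real.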
50 | 62 |
|
51 | 63 |
|
52 | 64 | ```yml
|
@@ -74,6 +86,10 @@ GLOBAL OPTIONS:
|
74 | 86 |
|
75 | 87 | --remove-line-numbers, --rln Remove line numbers.
|
76 | 88 |
|
| 89 | + --pdf Pass if input is a PDF file. |
| 90 | + |
| 91 | + --pdf-page-number value, --pn value When working on a PDF, set the page to process (first page is 0, not 1). |
| 92 | + |
77 | 93 | --help, -h show help
|
78 | 94 |
|
79 | 95 | --print-version, -V print only the version
|
|