Commit f296685

doc regarding PDF inputs
1 parent e46ad54 commit f296685

2 files changed, +34 -5 lines changed


README.md

+15 -2
@@ -55,6 +55,10 @@ If you take this route, you'll need to copy the `highlight/syntax_files` and `ut
 ### Installing `super-zaje`

 `super-zaje` does everything `zaje` does but provides the additional functionality of extracting text from an image.
+
+**NOTE**: `zaje` is capable of detecting the lexer to use based on the first line of text but with images, you'll often
+need to help it and specify a designated lexer by passing `-l $NAME` (e.g: `zaje -l sh`, `zaje -l server-log`, etc).
+
 It's a separate binary because it depends on the [gosseract](https://github.com/otiai10/gosseract) which in turn
 depends on `libtesseract` and requires its SOs to be available on the machine.

@@ -91,8 +95,17 @@ For example, try:
 $ ~/go/bin/super-zaje "https://github.com/jessp01/zaje/blob/master/testimg/go1.png?raw=true"
 ```

-**NOTE**: `zaje` is capable of detecting the lexer to use based on the first line of text but with images, you'll often
-need to help it and specify a designated lexer by passing `-l $NAME` (e.g: `zaje -l sh`, `zaje -l server-log`, etc).
+### PDF inputs
+
+PDF files are also supported. For example:
+
+```sh
+$ super-zaje --pdf --pdf-page-number 63 FORTRAN_colouring_book.pdf
+```
+
+This will convert page **64** (page numbers start from 0 in [go-fitz](https://github.com/gen2brain/go-fitz), which is used by
+super-zaje) to a PNG and pass that on to [gosseract](https://github.com/otiai10/gosseract) for text extraction.
+


 ### ASCIInema screencast (Not a video!)

cmd/super-zaje/README.md

+19 -3
@@ -9,7 +9,11 @@
 ### Installing `super-zaje`

 `super-zaje` does everything `zaje` does but provides the additional functionality of extracting text from an image.
-It's a separate binary because it depends on the [gosseract](https://github.com/otiai10/gosseract) which in turn
+
+**NOTE**: `zaje` is capable of detecting the lexer to use based on the first line of text but with images, you'll often
+need to help it and specify a designated lexer by passing `-l $NAME` (e.g: `zaje -l sh`, `zaje -l server-log`, etc).
+
+`super-zaje` is a separate binary because it depends on [gosseract](https://github.com/otiai10/gosseract) which in turn
 depends on `libtesseract` and requires its SOs to be available on the machine.

 First, install `zaje` using [install_zaje.sh](https://github.com/jessp01/zaje/blob/master/install_zaje.sh), and then...
@@ -45,8 +49,16 @@ For example, try:
 $ ~/go/bin/super-zaje "https://github.com/jessp01/zaje/blob/master/testimg/go1.png?raw=true"
 ```

-**NOTE**: `zaje` is capable of detecting the lexer to use based on the first line of text but with images, you'll often
-need to help it and specify a designated lexer by passing `-l $NAME` (e.g: `zaje -l sh`, `zaje -l server-log`, etc).
+### PDF inputs
+
+PDF files are also supported. For example:
+
+```sh
+$ super-zaje --pdf --pdf-page-number 63 FORTRAN_colouring_book.pdf
+```
+
+This will convert page **64** (page numbers start from 0 in [go-fitz](https://github.com/gen2brain/go-fitz), which is used by
+super-zaje) to a PNG and pass that on to [gosseract](https://github.com/otiai10/gosseract) for text extraction.


 ```yml
@@ -74,6 +86,10 @@ GLOBAL OPTIONS:

 --remove-line-numbers, --rln Remove line numbers.

+--pdf Pass if input is a PDF file.
+
+--pdf-page-number value, --pn value When working on a PDF, set the page to process (first page is 0, not 1).
+
 --help, -h show help

 --print-version, -V print only the version
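
Because `--pdf-page-number` counts from 0 while most PDF viewers count from 1, a small wrapper can spare users the off-by-one arithmetic. The following is a minimal sketch and not part of `super-zaje`; the function name `pdf_page_index` is hypothetical, and only the `--pdf` and `--pdf-page-number` flags come from the documentation above.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with super-zaje): map a 1-based page
# number, as shown by most PDF viewers, to the 0-based index that
# --pdf-page-number expects.
pdf_page_index() {
    page="$1"
    # Accept only positive integers without leading zeros.
    case "$page" in
        ''|0*|*[!0-9]*)
            echo "page must be a positive integer" >&2
            return 1
            ;;
    esac
    echo "$((page - 1))"
}

# Usage (assumes super-zaje is on PATH and the PDF exists):
# super-zaje --pdf --pdf-page-number "$(pdf_page_index 64)" FORTRAN_colouring_book.pdf
```

With this helper, asking for page 64 passes `63` to `super-zaje`, matching the example in the PDF inputs section.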
