Commit

doc regarding PDF inputs
jessp01 committed Jan 7, 2025
1 parent e46ad54 commit f296685
Showing 2 changed files with 34 additions and 5 deletions.
17 changes: 15 additions & 2 deletions README.md
@@ -55,6 +55,10 @@ If you take this route, you'll need to copy the `highlight/syntax_files` and `ut
### Installing `super-zaje`

`super-zaje` does everything `zaje` does but provides the additional functionality of extracting text from an image.

**NOTE**: `zaje` can detect which lexer to use from the first line of text, but with images you'll often
need to help it by specifying a lexer with `-l $NAME` (e.g., `zaje -l sh`, `zaje -l server-log`).

It's a separate binary because it depends on [gosseract](https://github.com/otiai10/gosseract), which in turn
depends on `libtesseract` and requires its shared objects (SOs) to be available on the machine.

@@ -91,8 +95,17 @@ For example, try:
$ ~/go/bin/super-zaje "https://github.com/jessp01/zaje/blob/master/testimg/go1.png?raw=true"
```

### PDF inputs

PDF files are also supported. For example:

```sh
$ super-zaje --pdf --pdf-page-number 63 FORTRAN_colouring_book.pdf
```

This converts page **64** (page numbers start from 0 in [go-fitz](https://github.com/gen2brain/go-fitz), which
super-zaje uses) to a PNG and passes it to [gosseract](https://github.com/otiai10/gosseract) for text extraction.
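
Because `--pdf-page-number` is 0-based, the page number shown in a PDF viewer has to be decremented by one before it is passed in. A minimal sketch of a shell helper for that conversion follows; `zero_based_page` is a hypothetical name used for illustration and is not part of `super-zaje`:

```sh
# Hypothetical helper (not shipped with super-zaje): convert a 1-based page
# number, as displayed by a PDF viewer, into the 0-based index that
# --pdf-page-number expects.
zero_based_page() {
    case "$1" in
        # Reject empty input, anything containing a non-digit, zero,
        # and numbers with leading zeros.
        ''|*[!0-9]*|0*)
            echo "page must be a positive integer" >&2
            return 1
            ;;
    esac
    echo "$(( $1 - 1 ))"
}
```

With the helper defined, the command above could be written as `super-zaje --pdf --pdf-page-number "$(zero_based_page 64)" FORTRAN_colouring_book.pdf`.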



### ASCIInema screencast (Not a video!)
22 changes: 19 additions & 3 deletions cmd/super-zaje/README.md
@@ -9,7 +9,11 @@
### Installing `super-zaje`

`super-zaje` does everything `zaje` does but provides the additional functionality of extracting text from an image.

**NOTE**: `zaje` can detect which lexer to use from the first line of text, but with images you'll often
need to help it by specifying a lexer with `-l $NAME` (e.g., `zaje -l sh`, `zaje -l server-log`).

`super-zaje` is a separate binary because it depends on [gosseract](https://github.com/otiai10/gosseract), which in turn
depends on `libtesseract` and requires its shared objects (SOs) to be available on the machine.

First, install `zaje` using [install_zaje.sh](https://github.com/jessp01/zaje/blob/master/install_zaje.sh), and then...
@@ -45,8 +49,16 @@ For example, try:
$ ~/go/bin/super-zaje "https://github.com/jessp01/zaje/blob/master/testimg/go1.png?raw=true"
```

### PDF inputs

PDF files are also supported. For example:

```sh
$ super-zaje --pdf --pdf-page-number 63 FORTRAN_colouring_book.pdf
```

This converts page **64** (page numbers start from 0 in [go-fitz](https://github.com/gen2brain/go-fitz), which
super-zaje uses) to a PNG and passes it to [gosseract](https://github.com/otiai10/gosseract) for text extraction.


```yml
@@ -74,6 +86,10 @@ GLOBAL OPTIONS:

--remove-line-numbers, --rln Remove line numbers.

--pdf Pass if input is a PDF file.

--pdf-page-number value, --pn value When working on a PDF, set the page to process (first page is 0, not 1).

--help, -h show help

--print-version, -V print only the version