Commit cbee40a

[Feat] train tesseract4.1.1
1 parent 16ec417 commit cbee40a

804 files changed, +243015 -2 lines changed

Diff for: README.md

-1
This file was deleted.

Diff for: imageToText.ipynb

-1
This file was deleted.

Diff for: langdata

+1
@@ -0,0 +1 @@
+Subproject commit 0fabfc3cb719555a3c4dc62b4374f931c54d8f12

Diff for: result/result.jpeg

-444 KB
Binary file not shown.

Diff for: result/result2.jpeg

-598 KB
Binary file not shown.

Diff for: result/result2_new.jpeg

-673 KB
Binary file not shown.

Diff for: result/result2_newBinary.jpeg

-258 KB
Binary file not shown.

Diff for: result/result3_newBinary.jpeg

-543 KB
Binary file not shown.

Diff for: result/result4_best.jpeg

-886 KB
Binary file not shown.

Diff for: result/result4_newBinary.jpeg

-886 KB
Binary file not shown.

Diff for: result/result_newBinary.jpeg

-505 KB
Binary file not shown.

Diff for: tessdata_best

+1
@@ -0,0 +1 @@
+Subproject commit e2aad9b983032bb1beff9133104a67cdbb87ca4d

Diff for: tesseract-4.1.1/.clang-format

+9
@@ -0,0 +1,9 @@
+---
+BasedOnStyle: Google
+# Only merge empty functions.
+AllowShortFunctionsOnASingleLine: Empty
+# Do not allow short if statements.
+AllowShortIfStatementsOnASingleLine: false
+# Enforce always the same pointer alignment.
+DerivePointerAlignment: false
+IndentPPDirectives: AfterHash
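These rules only take effect when `clang-format` is run over the sources. As a rough illustration, the hand-written C++ below is laid out the way these settings are meant to shape code. It is a sketch, not output captured from a real `clang-format` run, and the functions (`NoOp`, `Twice`, `PathSeparator`, `Clamp`, `CountNonZero`) and the `EXAMPLE_PATH_SEP` macro are hypothetical.

```cpp
// Hypothetical functions, hand-formatted to follow the .clang-format rules
// above (Google base style). Not produced by an actual clang-format run.
#include <cassert>
#include <cstddef>

// IndentPPDirectives: AfterHash -> nested directives are indented after '#'.
#ifdef _WIN32
#  define EXAMPLE_PATH_SEP '\\'
#else
#  define EXAMPLE_PATH_SEP '/'
#endif

// AllowShortFunctionsOnASingleLine: Empty -> only an empty body may stay
// on the same line as the signature.
void NoOp() {}

// A short but non-empty function is still split across lines.
int Twice(int x) {
  return 2 * x;
}

char PathSeparator() {
  return EXAMPLE_PATH_SEP;
}

int Clamp(int value, int lo, int hi) {
  assert(lo <= hi);
  // AllowShortIfStatementsOnASingleLine: false -> the body of a short `if`
  // goes on its own line.
  if (value < lo)
    return lo;
  if (value > hi)
    return hi;
  return value;
}

// DerivePointerAlignment: false -> Google's default `int* p` (pointer bound
// to the type) is applied everywhere instead of being inferred per file.
std::size_t CountNonZero(const int* values, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (values[i] != 0)
      ++count;
  }
  return count;
}
```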

Diff for: tesseract-4.1.1/.github/ISSUE_TEMPLATE.md

+23
@@ -0,0 +1,23 @@
+Before you submit an issue, please review [the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md).
+
+Please report an issue only for a BUG, not for asking questions.
+
+Note that it will be much easier for us to fix the issue if a test case that
+reproduces the problem is provided. Ideally this test case should not have any
+external dependencies. Provide a copy of the image or link to files for the test case.
+
+Please delete this text and fill in the template below.
+
+------------------------
+
+### Environment
+
+* **Tesseract Version**: <!-- compulsory. you must provide your version -->
+* **Commit Number**: <!-- optional. if known - specify commit used, if built from source -->
+* **Platform**: <!-- either `uname -a` output, or if Windows, version and 32-bit or 64-bit -->
+
+### Current Behavior:
+
+### Expected Behavior:
+
+### Suggested Fix:

Diff for: tesseract-4.1.1/.gitignore

+120
@@ -0,0 +1,120 @@
+*~
+# Windows
+*.user.*
+*.idea*
+*.log
+*.tlog
+*.cache
+*.obj
+*.sdf
+*.opensdf
+*.lastbuildstate
+*.unsuccessfulbuild
+*.suo
+*.res
+*.ipch
+*.manifest
+*.user
+
+# Linux
+# ignore local configuration
+config.*
+config/*
+Makefile
+Makefile.in
+*.m4
+
+# ignore help scripts/files
+configure
+libtool
+stamp-h1
+tesseract.pc
+config_auto.h
+/doc/html/*
+/doc/*.1
+/doc/*.5
+/doc/*.html
+/doc/*.xml
+
+# generated version file
+/src/api/tess_version.h
+
+# executables
+/src/api/tesseract
+/src/training/ambiguous_words
+/src/training/classifier_tester
+/src/training/cntraining
+/src/training/combine_tessdata
+/src/training/dawg2wordlist
+/src/training/merge_unicharsets
+/src/training/mftraining
+/src/training/set_unicharset_properties
+/src/training/shapeclustering
+/src/training/text2image
+/src/training/unicharset_extractor
+/src/training/wordlist2dawg
+
+*.patch
+
+# files generated by libtool
+/src/training/combine_lang_model
+/src/training/lstmeval
+/src/training/lstmtraining
+
+# ignore compilation files
+build/*
+/bin
+*/.deps/*
+*/.libs/*
+*/*/.deps/*
+*/*/.libs/*
+*.lo
+*.la
+*.o
+*.Plo
+*.a
+*.class
+*.jar
+__pycache__
+
+# tessdata
+*.traineddata
+
+# OpenCL
+tesseract_opencl_profile_devices.dat
+kernel*.bin
+
+# build dirs
+/build*
+/.cppan
+/cppan
+/*.dll
+/*.lib
+/*.exe
+/*.lnk
+/win*
+.vs*
+.s*
+
+# files generated by "make check"
+/tests/.dirstamp
+/unittest/*.trs
+/unittest/tmp/*
+
+# test programs
+/unittest/*_test
+/unittest/primesbitvector
+/unittest/primesmap
+
+# generated files from unlvtests
+times.txt
+/unlvtests/results*
+
+# snap packaging specific rules
+/parts/
+/stage/
+/prime/
+/snap/.snapcraft/
+
+/*.snap
+/*_source.tar.bz2

Diff for: tesseract-4.1.1/.gitmodules

+9
@@ -0,0 +1,9 @@
+[submodule "abseil"]
+	path = abseil
+	url = https://github.com/abseil/abseil-cpp.git
+[submodule "googletest"]
+	path = googletest
+	url = https://github.com/google/googletest.git
+[submodule "test"]
+	path = test
+	url = https://github.com/tesseract-ocr/test

Diff for: tesseract-4.1.1/.lgtm.yml

+18
@@ -0,0 +1,18 @@
+extraction:
+  cpp:
+    prepare:
+      packages:
+        - libpango1.0-dev
+    configure:
+      command:
+        - ./autogen.sh
+        - mkdir _lgtm_build_dir
+        - cd _lgtm_build_dir
+        - ../configure
+    index:
+      build_command:
+        - cd _lgtm_build_dir
+        - make training
+  python:
+    python_setup:
+      version: 3

Diff for: tesseract-4.1.1/.travis.yml

+57
@@ -0,0 +1,57 @@
+# Travis CI configuration for Tesseract
+
+language: cpp
+
+dist: xenial
+
+env:
+  - LEPT_VER=1.77.0
+
+notifications:
+  email: false
+
+sudo: false
+
+os:
+  - linux
+  - osx
+
+addons:
+  apt:
+    sources:
+      #- ubuntu-toolchain-r-test
+    packages:
+      - libarchive-dev
+      - libpango1.0-dev
+      #- g++-6
+
+#matrix:
+  #include:
+    #- os: osx
+      #install:
+      #script: brew install tesseract --HEAD
+      #cache:
+        #directories:
+          #- $HOME/Library/Caches/Homebrew
+  #allow_failures:
+    #- script: brew install tesseract --HEAD
+
+cache:
+  directories:
+    - leptonica-$LEPT_VER
+
+before_install:
+  - if [[ $TRAVIS_OS_NAME == linux ]]; then LINUX=true; fi
+  - if [[ $TRAVIS_OS_NAME == osx ]]; then OSX=true; fi
+
+install:
+  #- if [[ $LINUX && "$CXX" = "g++" ]]; then export CXX="g++-6" CC="gcc-6"; fi
+  - if test ! -d leptonica-$LEPT_VER/src; then curl -Ls https://github.com/DanBloomberg/leptonica/archive/$LEPT_VER.tar.gz | tar -xz; fi
+  - if test ! -d leptonica-$LEPT_VER/usr; then cmake -Hleptonica-$LEPT_VER -Bleptonica-$LEPT_VER/build -DCMAKE_INSTALL_PREFIX=leptonica-$LEPT_VER/usr; fi
+  - if test ! -e leptonica-$LEPT_VER/usr/lib/libleptonica.so; then make -C leptonica-$LEPT_VER/build install; fi
+
+script:
+  - mkdir build
+  - cd build
+  - cmake .. -DLeptonica_DIR=leptonica-$LEPT_VER/build -DCPPAN_BUILD=OFF
+  - make
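Each `install:` command in this Travis configuration is guarded by a `test`, so when the cached `leptonica-$LEPT_VER` directory survives from an earlier build, the download, configure, and install steps are skipped. The C++17 sketch below mirrors those three guards with `std::filesystem`. `PendingLeptonicaSteps` and the step names are hypothetical; the function only reports which commands would run and does not run them.

```cpp
// Hypothetical helper: mirrors the three `test` guards in the Travis
// `install:` section and reports which steps would still run.
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

std::vector<std::string> PendingLeptonicaSteps(const fs::path& root,
                                               const std::string& version) {
  assert(!version.empty());
  const fs::path dir = root / ("leptonica-" + version);
  std::vector<std::string> steps;
  // `test ! -d leptonica-$LEPT_VER/src`: fetch and unpack the tarball.
  if (!fs::is_directory(dir / "src"))
    steps.push_back("download");
  // `test ! -d leptonica-$LEPT_VER/usr`: run the CMake configure step.
  if (!fs::is_directory(dir / "usr"))
    steps.push_back("configure");
  // `test ! -e leptonica-$LEPT_VER/usr/lib/libleptonica.so`: build, install.
  if (!fs::exists(dir / "usr" / "lib" / "libleptonica.so"))
    steps.push_back("install");
  return steps;
}
```

With a warm cache all three guards fail and the job proceeds directly to `script:`. Because `usr` only appears after `make install`, an install that was interrupted before creating it repeats the configure step on the next run.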

Diff for: tesseract-4.1.1/AUTHORS

+44
@@ -0,0 +1,44 @@
+Ray Smith (lead developer) <[email protected]>
+Ahmad Abdulkader
+Rika Antonova
+Nicholas Beato
+Jeff Breidenbach
+Samuel Charron
+Phil Cheatle
+Simon Crouch
+David Eger
+Sheelagh Huddleston
+Dan Johnson
+Rajesh Katikam
+Thomas Kielbus
+Dar-Shyang Lee
+Zongyi (Joe) Liu
+Robert Moss
+Chris Newton
+Michael Reimer
+Marius Renn
+Raquel Romano
+Christy Russon
+Shobhit Saxena
+Mark Seaman
+Faisal Shafait
+Hiroshi Takenaka
+Ranjith Unnikrishnan
+Joern Wanke
+Ping Ping Xiu
+Andrew Ziem
+Oscar Zuniga
+
+Community Contributors:
+Zdenko Podobný (Maintainer)
+Jim Regan (Maintainer)
+James R Barlow
+Amit Dovev
+Martin Ettl
+Shree Devi Kumar
+Noah Metzger
+Tom Morris
+Tobias Müller
+Egor Pugin
+Sundar M. Vaidya
+Stefan Weil
