4.0+

Tesseract 4.0+ source code is available in the ‘master’ branch of the repository. This version adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in the tessdata, tessdata_best, and tessdata_fast repositories.

Documentation

Training Tesseract LSTM engine

4.x ppa

Ubuntu PPAs for Tesseract 4.x & Leptonica 1.7x:

Leptonica 1.74.1 package for Debian:

4.x for Windows

Unofficial experimental binaries of tesseract-ocr 4.x are available from the following links. Each one is built from a different commit on the master branch from early 2017. See the individual sites for more details:

4.x with GUI frontend

VietOCR

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for VietOCR from

VietOCR can be used to download the appropriate 4.0.0-alpha traineddata for additional languages.

gImageReader

Windows binaries of tesseract-ocr 4.0.0-alpha with a GUI are available for gImageReader from

To use the binaries above, download the 4.0.0-alpha traineddata from the master branch of tessdata. For example, for Hindi, download the following file:

https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
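
The link above points to a GitHub file page, not to the file itself. The following is a minimal sketch, not an official tool, of one way to fetch the file from a script. It relies on GitHub's URL convention that replacing the `blob` path segment with `raw` yields the file contents. The helper names `blob_to_raw_url` and `download_traineddata`, and the target directory passed to the latter, are hypothetical and chosen only for illustration.

```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve


def blob_to_raw_url(blob_url: str) -> str:
    """Turn a GitHub 'blob' page URL into the matching 'raw' file URL."""
    parts = urlsplit(blob_url)
    # A blob path looks like /<owner>/<repo>/blob/<branch>/<file path>.
    segments = parts.path.split("/")
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    segments[3] = "raw"
    return urlunsplit(parts._replace(path="/".join(segments)))


def download_traineddata(blob_url: str, tessdata_dir: str) -> Path:
    """Download one traineddata file into tessdata_dir (hypothetical helper)."""
    raw_url = blob_to_raw_url(blob_url)
    target_dir = Path(tessdata_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment, e.g. hin.traineddata.
    target = target_dir / raw_url.rsplit("/", 1)[-1]
    urlretrieve(raw_url, target)  # network access happens here
    return target


if __name__ == "__main__":
    url = "https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata"
    print(blob_to_raw_url(url))
```

Calling `download_traineddata(url, "tessdata")` would save the file into a local `tessdata` folder. Where the GUI frontends expect to find traineddata files depends on the installation, so the folder name here is only a placeholder.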

3.05

The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.05.01 release.

An installer for Tesseract 3.05 for Windows is available from Tesseract at UB Mannheim. This includes the training tools.

Current official release

The current official release is 4.1.1.