For GUI front ends to Tesseract and other third-party projects, please see User Projects - 3rd Party.
External tools, wrappers and training projects for Tesseract
Tesseract box editors and training tools
Platform support depends on the language a tool is written in and on the user's experience.
For Tesseract version 4 and up
Box file editors
For Tesseract 3.0x
Box file editors
Name | Last update | Language | Multipage support |
---|---|---|---|
jTessBoxEditor | 2023 | Java | yes |
QT Box Editor | 2019 | C++, Qt4/Qt5 | yes |
tesseract-box-editor | 2013 | .NET 4 | yes |
Tesseract-OCR boxfile AJAX editor | 2012 | online tool | |
cowboxer | 2012 | C++, Qt4 | no |
moshPyTT | 2011 | Python, GTK2 | no |
pytesseracttrainer | 2011 | Python, GTK2 | no |
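The editors above all operate on Tesseract box files. As a rough illustration of what they edit, here is a minimal sketch of a box file reader. It assumes the commonly used plain-text layout of one symbol per line, `symbol left bottom right top page`, with coordinates measured from the bottom-left corner of the image and a zero-based page index (the field that multipage support relies on). The `Box` class and function names are hypothetical and do not come from any of the listed tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One line of a box file (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse a single box file line.

    Assumes whitespace-separated fields. Lines with only five fields
    (no page column) are treated as belonging to page 0.
    """
    fields = line.split()
    if len(fields) == 6:
        symbol, *numbers = fields
    elif len(fields) == 5:
        symbol, *numbers = fields
        numbers.append("0")
    else:
        raise ValueError(f"unexpected box line: {line!r}")
    left, bottom, right, top, page = (int(n) for n in numbers)
    return Box(symbol, left, bottom, right, top, page)


def parse_box_text(text: str) -> List[Box]:
    """Parse the full contents of a box file, skipping blank lines."""
    return [parse_box_line(l) for l in text.splitlines() if l.strip()]


def pages(boxes: List[Box]) -> List[int]:
    """Return the sorted list of page indices that appear in the boxes."""
    return sorted({b.page for b in boxes})
```

A file with more than one distinct page index is a multipage box file, which is the case the "Multipage support" column refers to.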
For Tesseract-OCR 2.0x
Box file editors
Name | Last update | Language |
---|---|---|
Tesseract-OCR boxfile AJAX editor | 2012 | online tool |
owlboxer | 2010 | C++, Qt4 |
Tessboxer | 2009 | .NET |
boxfilereader.php | 2009 | PHP |
tessboxes | 2008 | C |
JTesseract | 2008 | C# |
wx-tetra | 2008 | Perl, wx |
bbtesseract | 2008 | VB.NET 2008 |
Other Training Tools
- jTessBoxEditor - Box Editor and Training Tool
- MzTesseract - MS Windows program that can train a new language from top to bottom
- FrankenPlus - tool for creating font training for the Tesseract OCR engine from page images. More information about Franken+ is at IT’S ALIVE! and the Franken+ homepage.
- python-tesseract-3.02-training - script to automate the generation of Tesseract 3.02 training files
- tesseract-box-file - AutoIt script to make editing the box file easier
- Serak Tesseract Trainer for Tesseract 3.02 - a GUI front end for training Tesseract 3.02
- BoxMaker is an online tool for generating image and box file pairs. An offline version is available in the download section of the PersianOCR project
- boxFactory is a tool for quickly creating box files to train the Tesseract OCR engine. You can identify characters in the image by simply drawing boxes around them.
- https://github.com/BaltoRouberol/TesseractTrainer - TesseractTrainer is a simple Python API that takes over the tedious process of manually training Tesseract 3
- tess_school - a set of handy scripts to make the Tesseract training process a bit easier
- txt2img - Qt GUI application that generates an image and a box file based on text input
- DangAmbigs Generator - Creates a DangAmbigs file automatically given a set of OCR text output and correct text. Requirements: Python
- train.ps1 - Windows PowerShell script that automates the Tesseract 3.01 language data pack generation process.
- Update unicharambigs.exe - A small (Windows) C# program for editing the “lang.unicharambigs” file
- train_tess.pl - Perl script to facilitate training
- boxedit - A web-based editor for Tesseract box files
- TrainYourTesseract - Free online “no-hassle” TTF file to traineddata converter
Community training projects
- Tesseract-MICR-OCR: https://github.com/BigPino67/Tesseract-MICR-OCR
- MRZ: https://groups.google.com/group/tesseract-ocr/attach/10d7c711c9cc80/mrz.traineddata
- Latin: https://github.com/ryanfb/latinocr-lattraining
- tesseract-georgian: https://github.com/ddohler/tesseract-georgian
- Polish Fraktur: training produced as a result of the IMPACT project, trained dataset
- Ancient Greek: http://ancientgreekocr.org
- Indic: http://code.google.com/p/tesseractindic/, https://github.com/debayan/Tesseract-Indic-OCR/, http://code.google.com/p/parichit/ (all are obsolete)
- Indic-OCR http://indic-ocr.github.io/tessdata/
- Irish uncial: https://github.com/jimregan/tesseract-gle-uncial
- Polish: http://code.google.com/p/tesseract-polish/
- Fraktur (dan, deu, swe): https://github.com/paalberti/tesseract-dan-fraktur
- Myanmar: http://code.google.com/p/myaocr/
- Persian (Farsi): https://github.com/reza1615/PersianOcr
- 7 segments font: https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital
Ports
- Project Naptha
- tesseract.js-core - Emscripten port of Tesseract C++ API
- tesseract.js - Pure JavaScript OCR
Tesseract wrappers
Tesseract 4.0x
Java
- tess4j - JNA wrapper. Docs and discussions - http://tess4j.sourceforge.net/
- bytedeco - Java configuration and interface classes for Tesseract based on the JavaCPP-Presets library from https://bytedeco.org
Python
- tesserocr - A Python wrapper around Tesseract’s C++ API
- pytesseract - a wrapper class for Tesseract OCR (requires tesseract executable)
- tesseract-ocr-wrapper - a Python wrapper for tesseract-ocr with support for OCR of PDF files
- aiopytesseract - asyncio tesseract wrapper for Tesseract-OCR.
- image2text - A Python wrapper for Tesseract to work on large datasets and directories.
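Several wrappers in these lists are marked "requires tesseract executable": they run the command-line program instead of binding to the C or C++ API. The sketch below shows that general approach under stated assumptions; it is not the code of any listed wrapper. It assumes a `tesseract` binary on `PATH`, the command form `tesseract IMAGE stdout -l LANG`, and the `--psm` option of Tesseract 4. The function names and the `page.png` file name are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def tesseract_available(executable: str = "tesseract") -> bool:
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


def build_command(image_path: str,
                  lang: str = "eng",
                  psm: Optional[int] = None,
                  executable: str = "tesseract") -> List[str]:
    """Build the argument list for one OCR run.

    Using "stdout" as the output base asks tesseract to print the
    recognized text instead of writing a .txt file.
    """
    command = [executable, image_path, "stdout", "-l", lang]
    if psm is not None:
        # Page segmentation mode; assumes the Tesseract 4 spelling.
        command += ["--psm", str(psm)]
    return command


def run_ocr(image_path: str, lang: str = "eng",
            psm: Optional[int] = None) -> str:
    """Run the executable and return the recognized text."""
    if not tesseract_available():
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(build_command(image_path, lang, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical input file; prints only the command, not OCR output.
    print(build_command("page.png", lang="eng", psm=6))
```

Wrappers built this way need no compiled extension, but each call starts a new process; the API bindings listed alongside them avoid that cost at the price of a build-time dependency on the Tesseract libraries.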
Objective-C
Swift
- swiftytesseract Swift wrapper
Flutter
- tesseract_ocr Flutter plugin
R
- tesseract Bindings to the C++ API for the R programming language
Ruby
- rtesseract wrapper gem for Tesseract OCR (requires tesseract executable)
Rust
- rusty-tesseract a wrapper class for Tesseract OCR (requires tesseract executable; based on pytesseract)
Elixir
Crystal
Tesseract 3.0x
C
- Tesseract versions 3.02 and up include a C API
.Net
- charlesw/tesseract - the project also offers a 64-bit Windows tesseract-ocr library
Python
- tesserocr - A Python wrapper around Tesseract’s C++ API
- pyocr - A Python wrapper for Tesseract (and Cuneiform)
- tesserwrap - Python bindings to the Tesseract API
- tesseract-sip - A Python SIP wrapper for libtesseract (Apache license)
- pytesseract - a wrapper class for Tesseract OCR (requires tesseract executable)
- python-tesseract - A wrapper class for Tesseract OCR that accepts any conventional image file (SWIG based)
- http://code.google.com/p/pytess/ - A simple SWIG-based interface to Tesseract
- aiopytesseract - asyncio tesseract wrapper for Tesseract-OCR.
R
- tesseract Bindings to the C++ API for the R programming language
Ruby
- ruby-tesseract-ocr - wrapper for tesseract 3.0x using the C++ API
- rtesseract
Java
- bytedeco - Java configuration and interface classes for Tesseract based on the ‘JavaCPP-Presets’ library from https://bytedeco.org - https://github.com/bytedeco/javacpp-presets
- tess4j - JNA wrapper. Docs and discussions - http://tess4j.sourceforge.net/
Node.js
- penteract - Native Node.js bindings to the Tesseract OCR project.
PHP
Objective-C
Go
Clojure
Tesseract 2.0x
Python
- http://code.google.com/p/pytesser/
- http://code.google.com/p/tesseract-python (pytesser clone)
.NET
- http://www.pixel-technology.com/freeware/tessnet2/
Java
- tess4j (0.4) - JNA wrapper. Docs and discussions - http://tess4j.sourceforge.net/