This page lists repositories with Tesseract 4 compatible tessdata (for --oem 1, the LSTM engine) contributed by the Tesseract community.

Such tessdata contributions should ideally document everything needed to reproduce the training process (fonts, images, ground truth, texts, scripts, documentation, …).


| Language Code | Language | Data File | Contributor | Info |
| --- | --- | --- | --- | --- |
| khmLimon | Khmer | best | OpenInstituteCambodia/phyrumsk | PR in tessdata_best |
| cop | Coptic | best | shreeshrii/tessdata_coptic | tesseract-ocr forum post |
| jpn_vert | Japanese Vertical | best | zodiac3539/jpn_vert | tesseract-ocr forum post |
| ocrb_plus | MRZ | best | shreeshrii/tessdata_ocrb | tesseract-ocr forum post |
| jav_java | Aksara Jawa | best | Shreeshrii/tessdata_jav_java | tesseract-ocr forum post |
| mrz | MRZ | best | DoubangoTelecom/tesseractMRZ | tesseract-ocr forum post |
| dot_matrix | MRZ | best | ameera3/OCR_Expiration_Date | tesseract-ocr forum post |
| e13b | E13B (or MICR) | best | ElMagoElGato/tess_e13b_training | tesseract-ocr forum post |
| e13b | E13B (or MICR) | best | DoubangoTelecom/tesseractMICR | tesseract-ocr forum post |
| frak | Fraktur | best | bib.uni-mannheim.de/~stweil | tesstrain wiki |
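
Once a contributed traineddata file has been downloaded, it can be passed to Tesseract with the LSTM engine selected. The following is a minimal sketch in Python, not part of any listed repository: the helper names `build_tesseract_cmd` and `run_ocr` are hypothetical, and it assumes the downloaded file is named `<language code>.traineddata` and sits in a local directory of your choosing. The only Tesseract options used are the command-line flags `--tessdata-dir`, `-l`, and `--oem`.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, output_base, lang, tessdata_dir):
    """Return the argument list for an LSTM-only (--oem 1) Tesseract run."""
    return [
        "tesseract",
        str(image),                # input image
        str(output_base),          # Tesseract appends ".txt" to this base name
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,                # language code, e.g. one from the table
        "--oem", "1",              # 1 = LSTM engine only
    ]


def run_ocr(image, output_base, lang, tessdata_dir):
    """Run Tesseract with a contributed model and return the output path.

    Assumes (hypothetically) that the model was saved as
    <tessdata_dir>/<lang>.traineddata.
    """
    traineddata = Path(tessdata_dir) / f"{lang}.traineddata"
    if not traineddata.is_file():
        raise FileNotFoundError(f"missing model file: {traineddata}")
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")

    cmd = build_tesseract_cmd(image, output_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(f"{output_base}.txt")
```

For instance, `run_ocr("page.png", "page", "frak", "./tessdata")` would look for `./tessdata/frak.traineddata` and, if Tesseract succeeds, return the path `page.txt`. The file and directory names here are placeholders; check each repository for the actual file name it ships.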