Technical papers describing various aspects of Tesseract

Material posted here is copyrighted and may not be sold or distributed without permission of the respective copyright holder.

Reading the Papers

The links below lead to PDF downloads.

The following materials appeared in IEEE publications, and each carries an IEEE copyright designation. Papers may not be sold or distributed further without written permission of the IEEE.

An Overview of the Tesseract OCR Engine

Hybrid Page Layout Analysis via Tab-Stop Detection

Adapting the Tesseract Open Source OCR Engine for Multilingual OCR

©ACM, 2009. This is the authors’ version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the International Workshop on Multilingual OCR 2009, Barcelona, Spain, July 25, 2009. https://dl.acm.org/citation.cfm?id=1577804

Other publications from Ray Smith

Other