tesseract  5.0.0-alpha-619-ge9db
tesseract::TessResultRenderer Class Reference (abstract)

#include <renderer.h>

Inherited by tesseract::TessAltoRenderer, tesseract::TessBoxTextRenderer, tesseract::TessHOcrRenderer, tesseract::TessLSTMBoxRenderer, tesseract::TessOsdRenderer, tesseract::TessPDFRenderer, tesseract::TessTextRenderer, tesseract::TessTsvRenderer, tesseract::TessUnlvRenderer, and tesseract::TessWordStrBoxRenderer.

Public Member Functions

virtual ~TessResultRenderer ()
 
void insert (TessResultRenderer *next)
 
TessResultRenderer * next ()
 
bool BeginDocument (const char *title)
 
bool AddImage (TessBaseAPI *api)
 
bool EndDocument ()
 
const char * file_extension () const
 
const char * title () const
 
bool happy ()
 
int imagenum () const
 

Protected Member Functions

 TessResultRenderer (const char *outputbase, const char *extension)
 
virtual bool BeginDocumentHandler ()
 
virtual bool AddImageHandler (TessBaseAPI *api)=0
 
virtual bool EndDocumentHandler ()
 
void AppendString (const char *s)
 
void AppendData (const char *s, int len)
 

Detailed Description

Interface for rendering Tesseract results into a document format such as plain text, hOCR, or PDF. This class is abstract; derived classes handle the individual formats. The interface is used to inject a renderer into Tesseract when processing images.

To keep the implementation simple in Tesseract 3.01, the renderer holds per-document state that is cleared between documents, just as the TessBaseAPI state is. The base API can therefore delegate rendering to the injected renderers, and each renderer manages the state its format needs along with the heuristics for producing it.

Definition at line 49 of file renderer.h.
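
The split between public entry points and protected handlers can be sketched with a simplified stand-in. MiniRenderer, UpperCaseRenderer, and RenderTwoPages are hypothetical names used only for illustration; the sketch passes a std::string page instead of a TessBaseAPI pointer, writes to an in-memory buffer instead of a file, and leaves out chaining and the happy_ flag.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical, simplified stand-in for TessResultRenderer: public
// non-virtual entry points delegate to protected virtual handlers.
class MiniRenderer {
 public:
  virtual ~MiniRenderer() = default;

  bool BeginDocument(const std::string& title) {
    title_ = title;
    imagenum_ = -1;  // reset for every document, as BeginDocument does
    out_.clear();
    return BeginDocumentHandler();
  }
  bool AddImage(const std::string& page) {
    ++imagenum_;
    return AddImageHandler(page);
  }
  bool EndDocument() { return EndDocumentHandler(); }

  int imagenum() const { return imagenum_; }
  const std::string& output() const { return out_; }

 protected:
  virtual bool BeginDocumentHandler() { return true; }
  virtual bool AddImageHandler(const std::string& page) = 0;
  virtual bool EndDocumentHandler() { return true; }
  void AppendString(const std::string& s) { out_ += s; }

 private:
  std::string title_;
  std::string out_;
  int imagenum_ = -1;
};

// A concrete format only has to supply AddImageHandler.
class UpperCaseRenderer : public MiniRenderer {
 protected:
  bool AddImageHandler(const std::string& page) override {
    std::string upper;
    for (unsigned char c : page) upper += static_cast<char>(std::toupper(c));
    AppendString(upper + "\n");
    return true;
  }
};

// Drives one document of two pages and returns the rendered text.
std::string RenderTwoPages() {
  UpperCaseRenderer r;
  assert(r.imagenum() == -1);  // no document started yet
  r.BeginDocument("demo");
  r.AddImage("page one");
  r.AddImage("page two");
  r.EndDocument();
  assert(r.imagenum() == 1);  // zero-based index of the last page
  return r.output();
}
```

The caller only ever sees BeginDocument, AddImage, and EndDocument; everything format-specific lives in the handler overrides.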

Constructor & Destructor Documentation

◆ ~TessResultRenderer()

tesseract::TessResultRenderer::~TessResultRenderer ( )
virtual

Definition at line 48 of file renderer.cpp.

{
  if (fout_ != nullptr) {
    if (fout_ != stdout)
      fclose(fout_);
    else
      clearerr(fout_);
  }
  delete next_;
}
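
Because the destructor deletes next_, destroying the head of a chain destroys every renderer linked after it. A minimal sketch of that ownership rule follows; OwningLink and LiveAfterDeletingHead are hypothetical names, and a live-object counter replaces the file handling.

```cpp
#include <cassert>

// Hypothetical owning link: as in the destructor above, deleting one
// link also deletes everything chained after it.
struct OwningLink {
  explicit OwningLink(int* live) : live_(live) { ++*live_; }
  ~OwningLink() {
    --*live_;
    delete next_;  // deleting nullptr is a no-op
  }
  OwningLink* next_ = nullptr;
  int* live_;
};

// Builds a chain of three links, deletes the head, and returns how many
// links are still alive afterwards.
int LiveAfterDeletingHead() {
  int live = 0;
  OwningLink* head = new OwningLink(&live);
  head->next_ = new OwningLink(&live);
  head->next_->next_ = new OwningLink(&live);
  assert(live == 3);
  delete head;  // tears down the whole chain
  return live;
}
```

One consequence is that chained renderers should be heap-allocated and deleted only through the head of the chain.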

◆ TessResultRenderer()

tesseract::TessResultRenderer::TessResultRenderer ( const char *  outputbase,
const char *  extension 
)
protected

Called by concrete classes.

outputbase is the name of the output file excluding extension. For example, "/path/to/chocolate-chip-cookie-recipe"

extension indicates the file extension to be used for output files. For example "pdf" will produce a .pdf file, and "hocr" will produce .hocr files.

Definition at line 32 of file renderer.cpp.

    : file_extension_(extension),
      title_(""), imagenum_(-1),
      fout_(stdout),
      next_(nullptr),
      happy_(true) {
  if (strcmp(outputbase, "-") && strcmp(outputbase, "stdout")) {
    STRING outfile = STRING(outputbase) + STRING(".") + STRING(file_extension_);
    fout_ = fopen(outfile.c_str(), "wb");
    if (fout_ == nullptr) {
      happy_ = false;
    }
  }
}
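
The naming rule in the constructor can be isolated as a small function. This is only a sketch: OutputPathFor and DemoOutputPaths are hypothetical names, std::string replaces STRING, and an empty return value stands for "write to stdout".

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper mirroring the constructor's naming rule: returns the
// path that would be opened, or an empty string when output goes to stdout.
std::string OutputPathFor(const char* outputbase, const char* extension) {
  // strcmp returns 0 on equality, so the condition holds only when
  // outputbase is neither "-" nor "stdout".
  if (std::strcmp(outputbase, "-") && std::strcmp(outputbase, "stdout")) {
    return std::string(outputbase) + "." + extension;
  }
  return std::string();
}

// The three cases the constructor distinguishes.
void DemoOutputPaths() {
  assert(OutputPathFor("/tmp/page", "hocr") == "/tmp/page.hocr");
  assert(OutputPathFor("-", "txt").empty());
  assert(OutputPathFor("stdout", "pdf").empty());
}
```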

Member Function Documentation

◆ AddImage()

bool tesseract::TessResultRenderer::AddImage ( TessBaseAPI *  api)

Adds the recognized text from the source image to the current document. Invalid if BeginDocument has not yet been called.

The signature may look unusual: it takes the whole TessBaseAPI rather than just the recognized text. It is designed to fit the current TessBaseAPI implementation, where the API object carries a lot of state that a renderer may want to include in its output.

Definition at line 82 of file renderer.cpp.

{
  if (!happy_) return false;
  ++imagenum_;
  bool ok = AddImageHandler(api);
  if (next_) {
    ok = next_->AddImage(api) && ok;
  }
  return ok;
}
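
The operand order in `next_->AddImage(api) && ok` matters: the chained call is the left operand, so it always runs, and a failure in one handler does not stop later renderers from receiving the image. The sketch below contrasts this with the reversed order. Link, RunAll, RunUntilFailure, and VisitedWith are hypothetical names, and the sketch models only the handler result, not the happy_ early return.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical chain link; `succeeds` stands in for the handler's result.
struct Link {
  Link(std::string n, bool s) : name(std::move(n)), succeeds(s) {}
  std::string name;
  bool succeeds;
  Link* next = nullptr;
};

// Same operand order as AddImage: the chained call is evaluated first,
// so every link is visited even after a failure.
bool RunAll(Link* link, std::string* visited) {
  *visited += link->name;
  bool ok = link->succeeds;
  if (link->next) ok = RunAll(link->next, visited) && ok;
  return ok;
}

// Reversed operand order: && short-circuits, so links after the first
// failure are skipped.
bool RunUntilFailure(Link* link, std::string* visited) {
  *visited += link->name;
  bool ok = link->succeeds;
  if (link->next) ok = ok && RunUntilFailure(link->next, visited);
  return ok;
}

// Runs a two-link chain whose first link fails and reports which links ran.
std::string VisitedWith(bool keep_going) {
  Link a("a", false), b("b", true);
  a.next = &b;
  std::string visited;
  bool ok = keep_going ? RunAll(&a, &visited) : RunUntilFailure(&a, &visited);
  assert(!ok);  // the overall result is a failure either way
  return visited;
}
```

Both variants report failure, but only the first still gives the second link a chance to produce its output.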

◆ AddImageHandler()

bool tesseract::TessResultRenderer::AddImageHandler ( TessBaseAPI *  api)
protected pure virtual

◆ AppendData()

void tesseract::TessResultRenderer::AppendData ( const char *  s,
int  len 
)
protected

Definition at line 105 of file renderer.cpp.

{
  if (!tesseract::Serialize(fout_, s, len)) happy_ = false;
  fflush(fout_);
}

◆ AppendString()

void tesseract::TessResultRenderer::AppendString ( const char *  s)
protected

Definition at line 101 of file renderer.cpp.

{
  AppendData(s, strlen(s));
}
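
Since AppendString measures its argument with strlen, it stops at the first NUL byte; binary payloads need AppendData with an explicit length. A minimal sketch of the difference, assuming an in-memory sink in place of fout_ and tesseract::Serialize (Sink and DemoAppend are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical in-memory sink standing in for fout_ and tesseract::Serialize.
struct Sink {
  std::string bytes;
  void AppendData(const char* s, int len) { bytes.append(s, len); }
  void AppendString(const char* s) {
    AppendData(s, static_cast<int>(std::strlen(s)));
  }
};

// Writes the same 5-byte buffer both ways and reports how many bytes landed.
void DemoAppend(std::size_t* via_string, std::size_t* via_data) {
  const char payload[] = {'a', 'b', '\0', 'c', 'd'};
  Sink s1, s2;
  s1.AppendString(payload);  // stops at the embedded NUL
  s2.AppendData(payload, static_cast<int>(sizeof payload));  // all five bytes
  assert(s1.bytes == "ab");
  *via_string = s1.bytes.size();
  *via_data = s2.bytes.size();
}
```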

◆ BeginDocument()

bool tesseract::TessResultRenderer::BeginDocument ( const char *  title)

Starts a new document with the given title. This clears the contents of the output data. Title should use UTF-8 encoding.

Definition at line 71 of file renderer.cpp.

{
  if (!happy_) return false;
  title_ = title;
  imagenum_ = -1;
  bool ok = BeginDocumentHandler();
  if (next_) {
    ok = next_->BeginDocument(title) && ok;
  }
  return ok;
}

◆ BeginDocumentHandler()

bool tesseract::TessResultRenderer::BeginDocumentHandler ( )
protected virtual

Reimplemented in tesseract::TessPDFRenderer, tesseract::TessTsvRenderer, tesseract::TessAltoRenderer, and tesseract::TessHOcrRenderer.

Definition at line 110 of file renderer.cpp.

{
  return happy_;
}

◆ EndDocument()

bool tesseract::TessResultRenderer::EndDocument ( )

Finishes the document and finalizes the output data. Invalid if BeginDocument has not yet been called.

Definition at line 92 of file renderer.cpp.

{
  if (!happy_) return false;
  bool ok = EndDocumentHandler();
  if (next_) {
    ok = next_->EndDocument() && ok;
  }
  return ok;
}

◆ EndDocumentHandler()

bool tesseract::TessResultRenderer::EndDocumentHandler ( )
protected virtual

Reimplemented in tesseract::TessPDFRenderer, tesseract::TessTsvRenderer, tesseract::TessAltoRenderer, and tesseract::TessHOcrRenderer.

Definition at line 114 of file renderer.cpp.

{
  return happy_;
}

◆ file_extension()

const char* tesseract::TessResultRenderer::file_extension ( ) const
inline

Definition at line 86 of file renderer.h.

{
  return file_extension_;
}

◆ happy()

bool tesseract::TessResultRenderer::happy ( )
inline

Definition at line 94 of file renderer.h.

{
  return happy_;
}

◆ imagenum()

int tesseract::TessResultRenderer::imagenum ( ) const
inline

Returns the index of the last image given to AddImage. The index is incremented whether or not the image was rendered successfully.

The value is always defined. Depending on where you are in the document lifecycle, it is the index of the image currently being rendered, of the last image finished, or of the final image in the completed document. Returns -1 if a document was never started, or if no image has been added since BeginDocument.

Definition at line 107 of file renderer.h.

{
  return imagenum_;
}

◆ insert()

void tesseract::TessResultRenderer::insert ( TessResultRenderer *  next)

Definition at line 58 of file renderer.cpp.

{
  if (next == nullptr) return;

  TessResultRenderer* remainder = next_;
  next_ = next;
  if (remainder) {
    while (next->next_ != nullptr) {
      next = next->next_;
    }
    next->next_ = remainder;
  }
}
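
insert() places the new renderer, together with anything already chained behind it, directly after the current one; the previous successors are reattached at the tail of the inserted chain. The sketch below reproduces only that linking logic. Node, Order, and DemoInsertOrder are hypothetical names, and the nodes live on the stack with no ownership, unlike the real class, whose destructor deletes next_.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical node reproducing only the linking logic of insert().
struct Node {
  explicit Node(std::string n) : name(std::move(n)) {}
  std::string name;
  Node* next_ = nullptr;

  void insert(Node* next) {
    if (next == nullptr) return;
    Node* remainder = next_;
    next_ = next;
    if (remainder) {
      // Walk to the tail of the inserted chain and reattach the old successors.
      while (next->next_ != nullptr) next = next->next_;
      next->next_ = remainder;
    }
  }
};

// Concatenates names from head to tail.
std::string Order(const Node* head) {
  std::string s;
  for (const Node* n = head; n != nullptr; n = n->next_) s += n->name;
  return s;
}

// Inserts a two-node chain into an existing two-node chain.
std::string DemoInsertOrder() {
  Node a("a"), b("b"), c("c"), d("d");
  c.insert(&d);  // c -> d
  a.insert(&b);  // a -> b
  a.insert(&c);  // a -> c -> d -> b
  assert(b.next_ == nullptr);  // the old successor is now the tail
  return Order(&a);
}
```

Note that the most recently inserted renderer runs immediately after the head, not at the end of the chain.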

◆ next()

TessResultRenderer* tesseract::TessResultRenderer::next ( )
inline

Definition at line 59 of file renderer.h.

{
  return next_;
}

◆ title()

const char* tesseract::TessResultRenderer::title ( ) const
inline

Definition at line 89 of file renderer.h.

{
  return title_.c_str();
}

The documentation for this class was generated from the following files:
renderer.h
renderer.cpp