tesseract  5.0.0-alpha-619-ge9db
tesseract::UnicodeSpanSkipper Class Reference

Public Member Functions

UnicodeSpanSkipper (const UNICHARSET *unicharset, const WERD_CHOICE *word)
int SkipPunc (int pos)
int SkipDigits (int pos)
int SkipRomans (int pos)
int SkipAlpha (int pos)

Detailed Description

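Scans a WERD_CHOICE one unichar at a time, using the UNICHARSET to look up each unichar. Each Skip* method takes a starting position and returns the first position after the run of one character class (punctuation, digits, Roman-numeral letters, or alphabetic characters) that begins there; a return value equal to pos means no such run starts at pos.

Definition at line 311 of file paragraphs.cpp.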

Constructor & Destructor Documentation

◆ UnicodeSpanSkipper()

tesseract::UnicodeSpanSkipper::UnicodeSpanSkipper ( const UNICHARSET *unicharset,
const WERD_CHOICE *word
)
inline

Definition at line 313 of file paragraphs.cpp.
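As a sketch only, the class below mirrors the shape documented on this page on a plain ASCII std::string, which stands in (by assumption) for the UNICHARSET + WERD_CHOICE pair. The constructor caches the word length once, and every Skip* method is the same bounded loop with a different character test, factored here into a hypothetical SkipWhile helper. SpanSkipperSketch, SkipWhile, and the Roman-letter set are illustrative names and choices, not Tesseract's.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical ASCII stand-in for tesseract::UnicodeSpanSkipper.
// Character classes come from <cctype> instead of a UNICHARSET.
class SpanSkipperSketch {
 public:
  // Takes its own copy of the word and caches the length once, so each
  // Skip* call can bound-check cheaply (the real class keeps pointers).
  explicit SpanSkipperSketch(std::string word)
      : word_(std::move(word)), wordlen_(static_cast<int>(word_.size())) {}

  int SkipPunc(int pos) const { return SkipWhile(pos, IsPunc); }
  int SkipDigits(int pos) const { return SkipWhile(pos, IsDigit); }
  int SkipRomans(int pos) const { return SkipWhile(pos, IsRoman); }
  int SkipAlpha(int pos) const { return SkipWhile(pos, IsAlpha); }

 private:
  static bool IsPunc(char c) {
    return std::ispunct(static_cast<unsigned char>(c)) != 0;
  }
  static bool IsDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  }
  static bool IsAlpha(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
  }
  // Assumed Roman-letter set; Tesseract's own set may differ.
  static bool IsRoman(char c) {
    return c != '\0' && std::strchr("ivxlcdmIVXLCDM", c) != nullptr;
  }

  // Returns the first position >= pos whose character fails pred,
  // or wordlen_ if the run reaches the end of the word.
  int SkipWhile(int pos, bool (*pred)(char)) const {
    assert(pos >= 0);
    while (pos < wordlen_ && pred(word_[pos])) pos++;
    return pos;
  }

  std::string word_;  // declared before wordlen_: initialized first
  int wordlen_;
};
```

For example, on "(12)" the calls SkipPunc(0), SkipDigits(1), and SkipPunc(3) return 1, 3, and 4, splitting the word into bracket, number, bracket.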

Member Function Documentation

◆ SkipAlpha()

int tesseract::UnicodeSpanSkipper::SkipAlpha ( int  pos)

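Advances past the run of alphabetic unichars that starts at pos and returns the first position after it, or the word length if the run reaches the end of the word.

Definition at line 352 of file paragraphs.cpp.

{
  while (pos < wordlen_ && u_->get_isalpha(word_->unichar_id(pos))) pos++;
  return pos;
}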
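Because SkipAlpha returns pos unchanged when no letter starts there, the length of the skipped run is simply the return value minus pos. A sketch of that idea on an ASCII std::string (an assumption standing in for the UNICHARSET/WERD_CHOICE lookups), with a hypothetical IsSingleLetterAt helper that checks whether exactly one letter starts at pos, as in a list mark such as "b)":

```cpp
#include <cassert>
#include <cctype>
#include <string>

// ASCII stand-in for UnicodeSpanSkipper::SkipAlpha: returns the first
// position >= pos that is not a letter (word.size() if none).
int SkipAlpha(const std::string &word, int pos) {
  assert(pos >= 0);
  const int len = static_cast<int>(word.size());
  while (pos < len && std::isalpha(static_cast<unsigned char>(word[pos])) != 0) pos++;
  return pos;
}

// Hypothetical helper: true if exactly one letter starts at pos.
bool IsSingleLetterAt(const std::string &word, int pos) {
  return SkipAlpha(word, pos) - pos == 1;
}
```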

◆ SkipDigits()

int tesseract::UnicodeSpanSkipper::SkipDigits ( int  pos)

Advances past the run of digits that starts at pos and returns the first position after it.

Definition at line 336 of file paragraphs.cpp.
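The pair (pos, SkipDigits(pos)) delimits a digit run, so a caller can slice it out. A sketch on an ASCII std::string, assuming plain ASCII digits 0-9 (the real method works on WERD_CHOICE unichars and may accept more); LeadingNumber is a hypothetical helper that returns -1 when no digits start at pos:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// ASCII stand-in for UnicodeSpanSkipper::SkipDigits: returns the first
// position >= pos that is not a decimal digit (word.size() if none).
int SkipDigits(const std::string &word, int pos) {
  assert(pos >= 0);
  const int len = static_cast<int>(word.size());
  while (pos < len && std::isdigit(static_cast<unsigned char>(word[pos])) != 0) pos++;
  return pos;
}

// Hypothetical helper: parses the digit run starting at pos, or returns -1
// if there is none. Assumes the run is short enough to fit in an int.
int LeadingNumber(const std::string &word, int pos) {
  const int end = SkipDigits(word, pos);
  if (end == pos) return -1;
  return std::stoi(word.substr(pos, end - pos));
}
```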

◆ SkipPunc()

int tesseract::UnicodeSpanSkipper::SkipPunc ( int  pos)

Advances past the run of punctuation that starts at pos and returns the first position after it.

Definition at line 331 of file paragraphs.cpp.
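Calling SkipPunc once before and once after a token is enough to strip brackets and periods from both sides. A sketch on an ASCII std::string, where std::ispunct stands in for the unicharset's punctuation test (an assumption); IsPunctuatedNumber is a hypothetical helper, not part of Tesseract:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// ASCII stand-in for UnicodeSpanSkipper::SkipPunc: returns the first
// position >= pos that is not punctuation (word.size() if none).
int SkipPunc(const std::string &word, int pos) {
  assert(pos >= 0);
  const int len = static_cast<int>(word.size());
  while (pos < len && std::ispunct(static_cast<unsigned char>(word[pos])) != 0) pos++;
  return pos;
}

// Hypothetical helper: optional leading punctuation, a non-empty run of
// ASCII digits, then optional punctuation reaching the end, e.g. "(12)" or "3.".
bool IsPunctuatedNumber(const std::string &word) {
  const int len = static_cast<int>(word.size());
  const int start = SkipPunc(word, 0);
  int end = start;
  while (end < len && std::isdigit(static_cast<unsigned char>(word[end])) != 0) end++;
  if (end == start) return false;  // nothing numeric between the punctuation runs
  return SkipPunc(word, end) == len;
}
```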

◆ SkipRomans()

int tesseract::UnicodeSpanSkipper::SkipRomans ( int  pos)

Advances past the run of Roman-numeral letters that starts at pos and returns the first position after it.

Definition at line 342 of file paragraphs.cpp.
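A span skipper like this one tests only character classes, so a Roman-letter skip accepts any run of those letters, not just well-formed numerals. The sketch below shows one hypothetical way the four skips could be combined into a list-mark test. It works on an ASCII std::string, assumes the letter set i, v, x, l, c, d, m in both cases, and is an illustration rather than Tesseract's own list detection; LooksLikeListMark and SkipWhile are made-up names.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Generic run skipper: first position >= pos whose character fails pred.
template <typename Pred>
int SkipWhile(const std::string &w, int pos, Pred pred) {
  assert(pos >= 0);
  const int len = static_cast<int>(w.size());
  while (pos < len && pred(static_cast<unsigned char>(w[pos]))) pos++;
  return pos;
}

int SkipPunc(const std::string &w, int pos) {
  return SkipWhile(w, pos, [](unsigned char c) { return std::ispunct(c) != 0; });
}
int SkipDigits(const std::string &w, int pos) {
  return SkipWhile(w, pos, [](unsigned char c) { return std::isdigit(c) != 0; });
}
int SkipAlpha(const std::string &w, int pos) {
  return SkipWhile(w, pos, [](unsigned char c) { return std::isalpha(c) != 0; });
}
int SkipRomans(const std::string &w, int pos) {
  // Assumed letter set; Tesseract's own set may differ.
  return SkipWhile(w, pos, [](unsigned char c) {
    return c != 0 && std::strchr("ivxlcdmIVXLCDM", c) != nullptr;
  });
}

// Hypothetical list-mark test: optional leading punctuation, then a numeral
// (Roman letters, else digits, else exactly one letter), then at least one
// trailing punctuation mark that reaches the end of the word.
bool LooksLikeListMark(const std::string &w) {
  const int len = static_cast<int>(w.size());
  const int start = SkipPunc(w, 0);
  int end = SkipRomans(w, start);
  if (end == start) end = SkipDigits(w, start);
  if (end == start && SkipAlpha(w, start) == start + 1) end = start + 1;
  if (end == start) return false;  // no numeral found
  const int after = SkipPunc(w, end);
  return after > end && after == len;
}
```

Because the check is purely class-based, a short word made of Roman letters, such as "mix.", also passes; a real detector would need more context.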

The documentation for this class was generated from the following file:
paragraphs.cpp