tesseract  5.0.0-alpha-619-ge9db
unicodetext.h
#ifndef UTIL_UTF8_PUBLIC_UNICODETEXT_H_
#define UTIL_UTF8_PUBLIC_UNICODETEXT_H_

#include <stddef.h>  // for NULL, ptrdiff_t
#include <iterator>  // for bidirectional_iterator_tag, etc
#include <string>    // for string
#include <utility>   // for pair

#include "syntaxnet/base.h"

// ***************************** UnicodeText **************************
//
// A UnicodeText object is a container for a sequence of Unicode
// codepoint values. It has a default constructor, a copy constructor,
// and an assignment operator. Data can be appended to it from another
// UnicodeText, from iterators, or from a single codepoint.
//
// The internal representation of the text is UTF-8. Since UTF-8 is a
// variable-width format, UnicodeText does not provide random access
// to the text, and changes to the text are permitted only at the end.
//
// The UnicodeText class defines a const_iterator. The dereferencing
// operator (*) returns a codepoint (char32). The iterator is a
// bidirectional, read-only iterator. It becomes invalid if the text
// is changed.
//
// There are methods for appending and retrieving UTF-8 data directly.
// The 'utf8_data' method returns a const char* that contains the
// UTF-8-encoded version of the text; 'utf8_length' returns the number
// of bytes in the UTF-8 data. An iterator's 'get_utf8' method stores up
// to 4 bytes of UTF-8 data in a char array and returns the number of
// bytes that it stored.
//
// Codepoints are integers in the range [0, 0xD7FF] or [0xE000,
// 0x10FFFF], but UnicodeText has the additional restriction that it
// can contain only those characters that are valid for interchange on
// the Web. This excludes all of the control codes except for carriage
// return, line feed, and horizontal tab. It also excludes
// non-characters, but codepoints that are in the Private Use regions
// are allowed, as are codepoints that are unassigned. (See the
// Unicode reference for details.) The function UniLib::IsInterchangeValid
// can be used as a test for this property.
//
// UnicodeTexts are safe. Every method that constructs or modifies a
// UnicodeText tests for interchange-validity, and will substitute a
// space for the invalid data. Such cases are reported via
// LOG(WARNING).
//
// MEMORY MANAGEMENT: copy, take ownership, or point to
//
// A UnicodeText is either an "owner", meaning that it owns the memory
// for the data buffer and will free it when the UnicodeText is
// destroyed, or it is an "alias", meaning that it does not.
//
// There are three methods for storing UTF-8 data in a UnicodeText:
//
// CopyUTF8(buffer, len) copies buffer.
//
// TakeOwnershipOfUTF8(buffer, size, capacity) takes ownership of buffer.
//
// PointToUTF8(buffer, size) creates an alias pointing to buffer.
//
// All three methods perform a validity check on the buffer. There are
// private, "unsafe" versions of these functions that bypass the
// validity check. They are used internally and by friend functions
// that are handling UTF-8 data that has already been validated.
//
// The purpose of an alias is to avoid making an unnecessary copy of a
// UTF-8 buffer while still providing access to the Unicode values
// within that text through iterators or the fast scanners that are
// based on UTF-8 state tables. The lifetime of an alias must not
// exceed the lifetime of the buffer from which it was constructed.
//
// The semantics of an alias might be described as "copy on write or
// repair." The source data is never modified. If push_back() or
// append() is called on an alias, a copy of the data will be created,
// and the UnicodeText will become an owner. If clear() is called on
// an alias, it becomes an (empty) owner.
//
// The copy constructor and the assignment operator produce an owner.
// That is, after direct initialization ("UnicodeText x(y);") or copy
// initialization ("UnicodeText x = y;") x will be an owner, even if y
// was an alias. The assignment operator ("x = y;") also produces an
// owner unless x and y are the same object and y is an alias.
//
// Aliases should be used with care. If the source from which an alias
// was created is freed, or if its contents are changed, while the
// alias is still in use, fatal errors could result. But it can be
// quite useful to have a UnicodeText "window" through which to see a
// UTF-8 buffer without having to pay the price of making a copy.
//
// UTILITIES
//
// The interfaces in util/utf8/public/textutils.h provide higher-level
// utilities for dealing with UnicodeTexts, including routines for
// creating UnicodeTexts (both owners and aliases) from UTF-8 buffers or
// strings, creating strings from UnicodeTexts, normalizing text for
// efficient matching or display, and others.

class UnicodeText {
 public:
  class const_iterator;

  typedef char32 value_type;

  // Constructors. These always produce owners.
  UnicodeText();  // Create an empty text.
  UnicodeText(const UnicodeText& src);  // copy constructor
  // Construct a substring (copies the data).
  UnicodeText(const const_iterator& first, const const_iterator& last);

  // Assignment operator. This copies the data and produces an owner
  // unless this == &src, e.g., "x = x;", which is a no-op.
  UnicodeText& operator=(const UnicodeText& src);

  // x.Copy(y) copies the data from y into x.
  UnicodeText& Copy(const UnicodeText& src);
  inline UnicodeText& assign(const UnicodeText& src) { return Copy(src); }

  // x.PointTo(y) changes x so that it points to y's data.
  // It does not copy y or take ownership of y's data.
  UnicodeText& PointTo(const UnicodeText& src);
  UnicodeText& PointTo(const const_iterator& first,
                       const const_iterator& last);

  ~UnicodeText();

  void clear();  // Clear text.
  bool empty() const { return repr_.size_ == 0; }  // Test if text is empty.

  // Add a codepoint to the end of the text.
  // If the codepoint is not interchange-valid, add a space instead
  // and log a warning.
  void push_back(char32 codepoint);

  // Generic appending operation.
  // iterator_traits<ForwardIterator>::value_type must be implicitly
  // convertible to char32. Typical uses of this method might include:
  //     char32 chars[] = {0x1, 0x2, ...};
  //     vector<char32> more_chars = ...;
  //     utext.append(chars, chars+arraysize(chars));
  //     utext.append(more_chars.begin(), more_chars.end());
  template<typename ForwardIterator>
  UnicodeText& append(ForwardIterator first, const ForwardIterator last) {
    while (first != last) { push_back(*first++); }
    return *this;
  }

  // A specialization of the generic append() method.
  UnicodeText& append(const const_iterator& first, const const_iterator& last);

  // An optimization of append(source.begin(), source.end()).
  UnicodeText& append(const UnicodeText& source);

  int size() const;  // the number of Unicode characters (codepoints)

  friend bool operator==(const UnicodeText& lhs, const UnicodeText& rhs);
  friend bool operator!=(const UnicodeText& lhs, const UnicodeText& rhs);

  class const_iterator {
    typedef const_iterator CI;
   public:
    typedef std::bidirectional_iterator_tag iterator_category;
    typedef char32 value_type;
    typedef ptrdiff_t difference_type;
    typedef void pointer;  // (Not needed.)
    typedef const char32 reference;  // (Needed for const_reverse_iterator)

    // Iterators are default-constructible.
    const_iterator();

    // It's safe to make multiple passes over a UnicodeText.
    const_iterator(const const_iterator& other);
    const_iterator& operator=(const const_iterator& other);

    char32 operator*() const;  // Dereference

    const_iterator& operator++();  // Advance (++iter)
    const_iterator operator++(int) {  // (iter++)
      const_iterator result(*this);
      ++*this;
      return result;
    }

    const_iterator& operator--();  // Retreat (--iter)
    const_iterator operator--(int) {  // (iter--)
      const_iterator result(*this);
      --*this;
      return result;
    }

    // We love relational operators.
    friend bool operator==(const CI& lhs, const CI& rhs) {
      return lhs.it_ == rhs.it_; }
    friend bool operator!=(const CI& lhs, const CI& rhs) {
      return !(lhs == rhs); }
    friend bool operator<(const CI& lhs, const CI& rhs);
    friend bool operator>(const CI& lhs, const CI& rhs) {
      return rhs < lhs; }
    friend bool operator<=(const CI& lhs, const CI& rhs) {
      return !(rhs < lhs); }
    friend bool operator>=(const CI& lhs, const CI& rhs) {
      return !(lhs < rhs); }

    friend difference_type distance(const CI& first, const CI& last);

    // UTF-8-specific methods
    // Store the UTF-8 encoding of the current codepoint into buf,
    // which must be at least 4 bytes long. Return the number of
    // bytes written.
    int get_utf8(char* buf) const;
    // Return the UTF-8 character that the iterator points to.
    string get_utf8_string() const;
    // Return the byte length of the UTF-8 character the iterator points to.
    int utf8_length() const;
    // Return the iterator's pointer into the UTF-8 data.
    const char* utf8_data() const { return it_; }

    string DebugString() const;

   private:
    friend class UnicodeText;
    friend class UnicodeTextUtils;
    friend class UTF8StateTableProperty;
    explicit const_iterator(const char* it) : it_(it) {}

    const char* it_;
  };

  const_iterator begin() const;
  const_iterator end() const;

  class const_reverse_iterator : public std::reverse_iterator<const_iterator> {
   public:
    const_reverse_iterator(const_iterator it) :
        std::reverse_iterator<const_iterator>(it) {}
    const char* utf8_data() const {
      const_iterator tmp_it = base();
      return (--tmp_it).utf8_data();
    }
    int get_utf8(char* buf) const {
      const_iterator tmp_it = base();
      return (--tmp_it).get_utf8(buf);
    }
    string get_utf8_string() const {
      const_iterator tmp_it = base();
      return (--tmp_it).get_utf8_string();
    }
    int utf8_length() const {
      const_iterator tmp_it = base();
      return (--tmp_it).utf8_length();
    }
  };
  const_reverse_iterator rbegin() const {
    return const_reverse_iterator(end());
  }
  const_reverse_iterator rend() const {
    return const_reverse_iterator(begin());
  }

  // Substring searching. Returns the beginning of the first
  // occurrence of "look", or end() if not found.
  const_iterator find(const UnicodeText& look, const_iterator start_pos) const;
  // Equivalent to find(look, begin())
  const_iterator find(const UnicodeText& look) const;

  // Returns whether this contains the character U+FFFD. This can
  // occur, for example, if the input to Encodings::Decode() had byte
  // sequences that were invalid in the source encoding.
  bool HasReplacementChar() const;

  // UTF-8-specific methods
  //
  // Return the data, length, and capacity of the UTF-8-encoded version
  // of the text. Length and capacity are measured in bytes.
  const char* utf8_data() const { return repr_.data_; }
  int utf8_length() const { return repr_.size_; }
  int utf8_capacity() const { return repr_.capacity_; }

  // Return the UTF-8 data between first and last as a string.
  static string UTF8Substring(const const_iterator& first,
                              const const_iterator& last);

  // There are three methods for initializing a UnicodeText from UTF-8
  // data. They vary in details of memory management. In all cases,
  // the data is tested for interchange-validity. If it is not
  // interchange-valid, a LOG(WARNING) is issued, and each
  // structurally invalid byte and each interchange-invalid codepoint
  // is replaced with a space.

  // x.CopyUTF8(buf, len) copies buf into x.
  UnicodeText& CopyUTF8(const char* utf8_buffer, int byte_length);

  // x.TakeOwnershipOfUTF8(buf, len, capacity). x takes ownership of
  // buf. buf is not copied.
  UnicodeText& TakeOwnershipOfUTF8(char* utf8_buffer,
                                   int byte_length,
                                   int byte_capacity);

  // x.PointToUTF8(buf, len) changes x so that it points to buf
  // ("becomes an alias"). It does not take ownership or copy buf.
  // If the buffer is not valid, this has the same effect as
  // CopyUTF8(utf8_buffer, byte_length).
  UnicodeText& PointToUTF8(const char* utf8_buffer, int byte_length);

  // Occasionally it is necessary to use functions that operate on the
  // pointer returned by utf8_data(). MakeIterator(p) provides a way
  // to get back to the UnicodeText level. It uses CHECK to ensure
  // that p is a pointer within this object's UTF-8 data, and that it
  // points to the beginning of a character.
  const_iterator MakeIterator(const char* p) const;

  string DebugString() const;

 private:
  friend class const_iterator;
  friend class UnicodeTextUtils;

  class Repr {  // A byte-string.
   public:
    char* data_;
    int size_;
    int capacity_;
    bool ours_;  // Do we own data_?

    Repr() : data_(nullptr), size_(0), capacity_(0), ours_(true) {}
    ~Repr() { if (ours_) delete[] data_; }

    void clear();
    void reserve(int capacity);
    void resize(int size);

    void append(const char* bytes, int byte_length);
    void Copy(const char* data, int size);
    void TakeOwnershipOf(char* data, int size, int capacity);
    void PointTo(const char* data, int size);

    string DebugString() const;

   private:
    Repr& operator=(const Repr&);
    Repr(const Repr& other);
  };

  Repr repr_;

  // UTF-8-specific private methods.
  // These routines do not perform a validity check when compiled
  // in opt mode.
  // It is an error to call these methods with UTF-8 data that
  // is not interchange-valid.
  //
  UnicodeText& UnsafeCopyUTF8(const char* utf8_buffer, int byte_length);
  UnicodeText& UnsafeTakeOwnershipOfUTF8(
      char* utf8_buffer, int byte_length, int byte_capacity);
  UnicodeText& UnsafePointToUTF8(const char* utf8_buffer, int byte_length);
  UnicodeText& UnsafeAppendUTF8(const char* utf8_buffer, int byte_length);
  const_iterator UnsafeFind(const UnicodeText& look,
                            const_iterator start_pos) const;
};

bool operator==(const UnicodeText& lhs, const UnicodeText& rhs);

inline bool operator!=(const UnicodeText& lhs, const UnicodeText& rhs) {
  return !(lhs == rhs);
}

// UnicodeTextRange is a pair of iterators, useful for specifying text
// segments. If the iterators are ==, the segment is empty.
typedef pair<UnicodeText::const_iterator,
             UnicodeText::const_iterator> UnicodeTextRange;

inline bool UnicodeTextRangeIsEmpty(const UnicodeTextRange& r) {
  return r.first == r.second;
}


// *************************** Utilities *************************

// A factory function for creating a UnicodeText from a buffer of
// UTF-8 data. The new UnicodeText takes ownership of the buffer. (It
// is an "owner.")
//
// Each byte that is structurally invalid will be replaced with a
// space. Each codepoint that is interchange-invalid will also be
// replaced with a space, even if the codepoint was represented with a
// multibyte sequence in the UTF-8 data.
//
inline UnicodeText MakeUnicodeTextAcceptingOwnership(
    char* utf8_buffer, int byte_length, int byte_capacity) {
  return UnicodeText().TakeOwnershipOfUTF8(
      utf8_buffer, byte_length, byte_capacity);
}

// A factory function for creating a UnicodeText from a buffer of
// UTF-8 data. The new UnicodeText does not take ownership of the
// buffer. (It is an "alias.")
//
inline UnicodeText MakeUnicodeTextWithoutAcceptingOwnership(
    const char* utf8_buffer, int byte_length) {
  return UnicodeText().PointToUTF8(utf8_buffer, byte_length);
}

// Create a UnicodeText from a UTF-8 string or buffer.
//
// If do_copy is true, then a copy of the string is made. The copy is
// owned by the resulting UnicodeText object and will be freed when
// the object is destroyed. This UnicodeText object is referred to
// as an "owner."
//
// If do_copy is false, then no copy is made. The resulting
// UnicodeText object does NOT take ownership of the string; in this
// case, the lifetime of the UnicodeText object must not exceed the
// lifetime of the string. This UnicodeText object is referred to as
// an "alias." This is the same as MakeUnicodeTextWithoutAcceptingOwnership.
//
// If the input string does not contain valid UTF-8, then a copy is
// made (as if do_copy were true) and coerced to valid UTF-8 by
// replacing each invalid byte with a space.
//
inline UnicodeText UTF8ToUnicodeText(const char* utf8_buf, int len,
                                     bool do_copy) {
  UnicodeText t;
  if (do_copy) {
    t.CopyUTF8(utf8_buf, len);
  } else {
    t.PointToUTF8(utf8_buf, len);
  }
  return t;
}

inline UnicodeText UTF8ToUnicodeText(const string& utf_string, bool do_copy) {
  return UTF8ToUnicodeText(utf_string.data(), utf_string.size(), do_copy);
}

inline UnicodeText UTF8ToUnicodeText(const char* utf8_buf, int len) {
  return UTF8ToUnicodeText(utf8_buf, len, true);
}
inline UnicodeText UTF8ToUnicodeText(const string& utf8_string) {
  return UTF8ToUnicodeText(utf8_string, true);
}

// Return a string containing the UTF-8 encoded version of all the
// Unicode characters in t.
inline string UnicodeTextToUTF8(const UnicodeText& t) {
  return string(t.utf8_data(), t.utf8_length());
}

// This template function declaration is used in defining arraysize.
// Note that the function doesn't need an implementation, as we only
// use its type.
template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

// For debugging. Return a string of integers, written in uppercase
// hex (%X), corresponding to the codepoints within the text. Each
// integer is followed by a space. E.g., "61 62 6A 3005 ".
string CodepointString(const UnicodeText& t);

#endif  // UTIL_UTF8_PUBLIC_UNICODETEXT_H_
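The class comment restricts UnicodeText to codepoints that are "valid for interchange on the Web" and says invalid data is replaced with a space. The function below is a minimal sketch of that rule written from the comment alone; it is not UniLib::IsInterchangeValid, whose exact behavior may differ. The names `IsInterchangeValidSketch` and `SanitizeForPushBack` are hypothetical, and treating DEL and the C1 controls (U+007F to U+009F) as excluded control codes is an assumption.

```cpp
#include <cassert>

// Hypothetical sketch of the "valid for interchange" rule described in the
// header comment; UniLib::IsInterchangeValid may differ in detail.
bool IsInterchangeValidSketch(char32_t c) {
  // Only Unicode scalar values: no surrogates, nothing above U+10FFFF.
  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return false;
  // C0 controls are excluded except horizontal tab, line feed, carriage return.
  if (c < 0x20) return c == U'\t' || c == U'\n' || c == U'\r';
  // Assumption: DEL and the C1 controls also count as excluded control codes.
  if (c >= 0x7F && c <= 0x9F) return false;
  // Noncharacters: U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF in every plane.
  if (c >= 0xFDD0 && c <= 0xFDEF) return false;
  if ((c & 0xFFFE) == 0xFFFE) return false;
  // Everything else passes, including private-use and unassigned codepoints.
  return true;
}

// Mirrors the documented push_back() policy: an invalid codepoint becomes
// a space (the real method also logs a warning, which is omitted here).
char32_t SanitizeForPushBack(char32_t c) {
  return IsInterchangeValidSketch(c) ? c : U' ';
}
```

Private-use codepoints such as U+E000 pass, matching the comment's statement that Private Use regions and unassigned codepoints are allowed.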
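The owner/alias rules ("copy on write or repair") can be modelled with a byte buffer plus an ownership flag, loosely patterned on the private Repr class (data_, size_, capacity_, ours_). This is a hypothetical sketch: `Utf8Buffer`, `PointTo`, and `Append` are illustrative names, UTF-8 validation is omitted, and the doubling growth policy is an assumption. It shows that appending to an alias first copies the aliased bytes and turns the object into an owner, while the source buffer is never written.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical, simplified model of the owner/alias rules; it is not the real
// UnicodeText::Repr and it performs no UTF-8 validation.
class Utf8Buffer {
 public:
  Utf8Buffer() = default;
  Utf8Buffer(const Utf8Buffer&) = delete;
  Utf8Buffer& operator=(const Utf8Buffer&) = delete;
  ~Utf8Buffer() {
    if (ours_) delete[] data_;
  }

  // Like PointToUTF8(): become an alias; the caller keeps ownership.
  void PointTo(const char* data, int size) {
    if (ours_) delete[] data_;
    data_ = const_cast<char*>(data);  // never written through while aliased
    size_ = size;
    capacity_ = size;
    ours_ = false;
  }

  // Like append(): an alias first copies the bytes it points to and
  // becomes an owner ("copy on write"); the source is never modified.
  void Append(const char* bytes, int n) {
    if (n <= 0) return;
    if (!ours_ || size_ + n > capacity_) Reallocate(size_ + n);
    std::memcpy(data_ + size_, bytes, n);
    size_ += n;
  }

  bool is_owner() const { return ours_; }
  const char* data() const { return data_; }
  int size() const { return size_; }

 private:
  // Assumed growth policy: at least double the capacity.
  void Reallocate(int min_capacity) {
    int new_capacity =
        2 * capacity_ > min_capacity ? 2 * capacity_ : min_capacity;
    char* fresh = new char[new_capacity];
    if (size_ > 0) std::memcpy(fresh, data_, size_);
    if (ours_) delete[] data_;
    data_ = fresh;
    capacity_ = new_capacity;
    ours_ = true;
  }

  char* data_ = nullptr;
  int size_ = 0;
  int capacity_ = 0;
  bool ours_ = true;
};
```

A related consequence of the documented copy rule: because the copy constructor always yields an owner, any path that copies an alias (for example, returning the UnicodeText& produced by PointToUTF8() by value) ends up with an owner holding its own copy of the bytes rather than an alias.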
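The append() comment pairs the generic iterator overload with the arraysize macro. The sketch below reuses the header's ArraySizeHelper/arraysize definitions as written and substitutes a hypothetical `AppendAll` that collects into a std::vector<char32_t>, because the real append() calls push_back() on a UnicodeText and needs unicodetext.cc to link. ArraySizeHelper only binds to a reference to an array, so passing a plain pointer fails to compile, which is the advantage of this macro over sizeof(a) / sizeof(a[0]).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same declaration-only trick as the header: sizeof inspects the return
// type char (&)[N] without ever calling ArraySizeHelper.
template <typename T, std::size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

// Hypothetical stand-in for UnicodeText::append(first, last): the loop is
// the same, but it collects into a vector instead of calling push_back()
// on a UnicodeText.
template <typename ForwardIterator>
std::vector<char32_t> AppendAll(ForwardIterator first,
                                const ForwardIterator last) {
  std::vector<char32_t> out;
  while (first != last) { out.push_back(*first++); }
  return out;
}
```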
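const_iterator is bidirectional even though the underlying data is variable-width, and size() counts codepoints rather than bytes. A minimal sketch of how such movement can work on well-formed UTF-8 follows; the helper names are hypothetical and this is not the code in unicodetext.cc. Forward motion uses the sequence length encoded in the lead byte, and backward motion skips continuation bytes, which always have the bit pattern 10xxxxxx.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers (not the unicodetext.cc implementation). They assume
// well-formed UTF-8, which UnicodeText guarantees for its own data.

// Continuation bytes have the bit pattern 10xxxxxx.
bool IsContinuationByte(char b) {
  return (static_cast<unsigned char>(b) & 0xC0) == 0x80;
}

// Length of the sequence introduced by lead byte b.
int SequenceLength(char b) {
  unsigned char u = static_cast<unsigned char>(b);
  if (u < 0x80) return 1;  // 0xxxxxxx
  if (u < 0xE0) return 2;  // 110xxxxx
  if (u < 0xF0) return 3;  // 1110xxxx
  return 4;                // 11110xxx
}

// Forward step, in the spirit of const_iterator::operator++.
const char* NextCodepoint(const char* p) { return p + SequenceLength(*p); }

// Backward step, in the spirit of const_iterator::operator--.
// Precondition: p is past the first character of the text.
const char* PrevCodepoint(const char* p) {
  do {
    --p;
  } while (IsContinuationByte(*p));
  return p;
}

// Codepoint count, in the spirit of UnicodeText::size().
int CountCodepoints(const std::string& utf8) {
  int n = 0;
  const char* end = utf8.data() + utf8.size();
  for (const char* p = utf8.data(); p < end; p = NextCodepoint(p)) ++n;
  return n;
}
```

The same backward step explains const_reverse_iterator: its utf8_data() and get_utf8() decrement a copy of base(), because a reverse iterator refers to the character just before its base position.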
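get_utf8() requires a buffer of at least 4 bytes because UTF-8 never needs more than four bytes for a codepoint up to U+10FFFF. A sketch of the standard encoding, under the hypothetical name `EncodeUtf8` and assuming the input is already a valid scalar value (as every codepoint in a UnicodeText is):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of unicodetext.h): writes the UTF-8
// encoding of cp into buf, which must hold at least 4 bytes, and returns
// the number of bytes written. Assumes cp is in [0, 0xD7FF] or
// [0xE000, 0x10FFFF].
int EncodeUtf8(char32_t cp, char* buf) {
  if (cp < 0x80) {  // 1 byte: 0xxxxxxx
    buf[0] = static_cast<char>(cp);
    return 1;
  }
  if (cp < 0x800) {  // 2 bytes: 110xxxxx 10xxxxxx
    buf[0] = static_cast<char>(0xC0 | (cp >> 6));
    buf[1] = static_cast<char>(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    buf[0] = static_cast<char>(0xE0 | (cp >> 12));
    buf[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    buf[2] = static_cast<char>(0x80 | (cp & 0x3F));
    return 3;
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  buf[0] = static_cast<char>(0xF0 | (cp >> 18));
  buf[1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
  buf[2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
  buf[3] = static_cast<char>(0x80 | (cp & 0x3F));
  return 4;
}
```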
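Substring search can operate directly on the UTF-8 bytes. Because UTF-8 is self-synchronizing, a well-formed needle can only match a well-formed haystack starting and ending at codepoint boundaries, so a plain byte search never lands in the middle of a character. The sketch below relies on that property; the names are hypothetical, and it returns a byte offset (or std::string::npos) where the real find() returns a const_iterator (or end()). The replacement-character check uses the fact that U+FFFD is encoded as the bytes EF BF BD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: byte-level search is safe for UTF-8 because a
// well-formed needle can only match a well-formed haystack at codepoint
// boundaries. Returns a byte offset, or std::string::npos if not found.
std::size_t FindUtf8(const std::string& text, const std::string& look,
                     std::size_t start_byte = 0) {
  return text.find(look, start_byte);
}

// In the spirit of HasReplacementChar(): U+FFFD is the bytes EF BF BD.
bool HasReplacementCharSketch(const std::string& utf8) {
  return utf8.find("\xEF\xBF\xBD") != std::string::npos;
}
```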
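The output format documented for CodepointString() (uppercase hex, each value followed by a space) can be reproduced with a small decoder. This is a hypothetical reimplementation that assumes well-formed UTF-8 in a std::string; the real function takes a UnicodeText and walks it with its iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical sketch of the documented CodepointString() format.
// Assumes well-formed UTF-8 input.
std::string CodepointStringSketch(const std::string& utf8) {
  std::string out;
  std::size_t i = 0;
  while (i < utf8.size()) {
    unsigned char lead = static_cast<unsigned char>(utf8[i]);
    int len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    // Keep the payload bits of the lead byte, then 6 bits per continuation.
    unsigned int cp = len == 1 ? lead : lead & (0xFFu >> (len + 1));
    for (int k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3Fu);
    }
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%X ", cp);  // uppercase hex plus space
    out += buf;
    i += len;
  }
  return out;
}
```

For the input "abj" followed by U+3005 (bytes E3 80 85), this yields "61 62 6A 3005 ", the example given in the header comment.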