tesseract 5.0.0-alpha-619-ge9db
#include "oldlist.h"
#include "featdefs.h"
#include "tessopt.h"
#include "ocrfeatures.h"
#include "clusttool.h"
#include "cluster.h"
#include <cstring>
#include <cstdio>
#include <cmath>
#include <tesseract/unichar.h>
#include "commontraining.h"
Macros
#define PROGRAM_FEATURE_TYPE "cn"
Functions
int main (int argc, char *argv[])
#define PROGRAM_FEATURE_TYPE "cn"
Definition at line 34 of file cntraining.cpp.
int main (int argc, char *argv[])
This program reads in a text file consisting of feature samples from a training page in the following format:
FontName CharName NumberOfFeatureTypes(N)
FeatureTypeName1 NumberOfFeatures(M)
Feature1
...
FeatureM
FeatureTypeName2 NumberOfFeatures(M)
Feature1
...
FeatureM
...
FeatureTypeNameN NumberOfFeatures(M)
Feature1
...
FeatureM
FontName CharName ...
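As an illustration of the layout above, the first line of each sample could be read roughly as follows. This is a minimal sketch, not the code in cntraining.cpp: `SampleHeader` and `ParseSampleHeader` are hypothetical names, and the sketch assumes the three fields are separated by whitespace and that font and character names contain no spaces.

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for the first line of a sample; not a Tesseract type.
struct SampleHeader {
  std::string font_name;
  std::string char_name;
  int num_feature_types = 0;
};

// Parses a line of the form "FontName CharName NumberOfFeatureTypes(N)".
// Assumes whitespace-separated fields. Returns false if a field is missing,
// the count is not an integer, or the count is negative.
bool ParseSampleHeader(const std::string& line, SampleHeader* out) {
  std::istringstream in(line);
  SampleHeader header;
  if (!(in >> header.font_name >> header.char_name >>
        header.num_feature_types)) {
    return false;
  }
  if (header.num_feature_types < 0) {
    return false;
  }
  *out = header;
  return true;
}
```

A caller would then read N feature-type blocks, each starting with a `FeatureTypeName NumberOfFeatures(M)` line followed by M feature lines.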
The program then appends these samples to a separate file for each character. The name of the file is
DirectoryName/FontName/CharName.FeatureTypeName
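The naming scheme can be sketched as a small helper. `SampleFilePath` is a hypothetical function written for this illustration, not part of the Tesseract sources; it assumes `/` as the path separator and treats an empty directory name as the current directory (`.`).

```cpp
#include <string>

// Hypothetical helper: builds DirectoryName/FontName/CharName.FeatureTypeName.
// Assumption: an empty directory means the current directory (".").
std::string SampleFilePath(const std::string& directory,
                           const std::string& font_name,
                           const std::string& char_name,
                           const std::string& feature_type) {
  const std::string dir = directory.empty() ? "." : directory;
  return dir + "/" + font_name + "/" + char_name + "." + feature_type;
}
```

With the feature type `"cn"` from `PROGRAM_FEATURE_TYPE`, a call such as `SampleFilePath("out", "Arial", "a", "cn")` yields `out/Arial/a.cn`.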
The DirectoryName can be specified via a command line argument. If not specified, it defaults to the current directory. The format of the resulting files is:
NumberOfFeatures(M)
Feature1
...
FeatureM
NumberOfFeatures(M)
...
Each output file begins with a header that describes the type of feature the file contains. This header is in the format required by the clusterer. A command line argument can also be used to specify that only the first N samples of each class should be used.
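One way to keep only the first N samples of each class is a per-class counter, sketched below. `SampleLimiter` is a hypothetical class for illustration only; the sketch assumes a class is identified by its name and that a negative limit means "no limit".

```cpp
#include <map>
#include <string>

// Hypothetical helper: accepts at most `max_per_class` samples per class.
// Assumption: a negative limit disables the cap.
class SampleLimiter {
 public:
  explicit SampleLimiter(int max_per_class) : max_per_class_(max_per_class) {}

  // Returns true and counts the sample if the class is still under its cap;
  // returns false (without counting) once the cap has been reached.
  bool Accept(const std::string& class_name) {
    int& seen = counts_[class_name];
    if (max_per_class_ >= 0 && seen >= max_per_class_) {
      return false;
    }
    ++seen;
    return true;
  }

 private:
  int max_per_class_;
  std::map<std::string, int> counts_;
};
```

Samples for which `Accept` returns false would simply be skipped while reading the input file.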
argc: number of command line arguments
argv: array of command line arguments
Definition at line 104 of file cntraining.cpp.