tesseract
5.0.0-alpha-619-ge9db
#include "oldlist.h"
#include "featdefs.h"
#include "tessopt.h"
#include "ocrfeatures.h"
#include "clusttool.h"
#include "cluster.h"
#include <cstring>
#include <cstdio>
#include <cmath>
#include <tesseract/unichar.h>
#include "commontraining.h"
Macros
#define PROGRAM_FEATURE_TYPE "cn"
Functions
int main(int argc, char *argv[])
#define PROGRAM_FEATURE_TYPE "cn"
Definition at line 34 of file cntraining.cpp.
int main(int argc, char *argv[])
This program reads in a text file consisting of feature samples from a training page in the following format:
FontName CharName NumberOfFeatureTypes(N)
   FeatureTypeName1 NumberOfFeatures(M)
      Feature1
      ...
      FeatureM
   FeatureTypeName2 NumberOfFeatures(M)
      Feature1
      ...
      FeatureM
   ...
   FeatureTypeNameN NumberOfFeatures(M)
      Feature1
      ...
      FeatureM
FontName CharName ...
It then appends these samples to a separate file for each character. The name of each file is
DirectoryName/FontName/CharName.FeatureTypeName
The DirectoryName can be specified via a command line argument. If not specified, it defaults to the current directory. The format of the resulting files is:
NumberOfFeatures(M)
   Feature1
   ...
   FeatureM
NumberOfFeatures(M)
...
Each output file has a header that describes the type of feature the file contains. This header is in the format required by the clusterer. A command line argument can also be used to specify that only the first N samples of each class should be used.
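The naming scheme and record layout described above can be sketched as two small helpers. Both `OutputPath` and `FormatRecord` are hypothetical names introduced for illustration and are not functions from cntraining.cpp. The sketch assumes "/" as the path separator and one feature per line, and it deliberately omits the clusterer header, whose exact layout is not given here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds DirectoryName/FontName/CharName.FeatureTypeName.
// An empty directory name falls back to the current directory (".").
std::string OutputPath(const std::string &directory,
                       const std::string &font_name,
                       const std::string &char_name,
                       const std::string &feature_type) {
  const std::string dir = directory.empty() ? "." : directory;
  return dir + "/" + font_name + "/" + char_name + "." + feature_type;
}

// Hypothetical helper: formats one record of the output file as
// NumberOfFeatures(M) followed by the M features, one per line (assumed layout).
std::string FormatRecord(const std::vector<std::string> &features) {
  std::string record = std::to_string(features.size()) + "\n";
  for (const std::string &feature : features) {
    record += feature + "\n";
  }
  return record;
}
```

For example, `OutputPath("", "Arial", "a", "cn")` yields `./Arial/a.cn`, and each new sample for that character would be appended to that file as one record.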
Parameters
argc | number of command line arguments
argv | array of command line arguments
Definition at line 104 of file cntraining.cpp.