A classifier is a method for determining the likely class of an unknown object or event from a set of example instances of each class, known as the training set.
The first step in classification is feature extraction, in which each instance in the training set is expressed as a vector of measurements. Where images are being classified, this vector could be made up of the pixel intensities, but frequently a feature reduction step (such as Principal Component Analysis) is employed to reduce it to a more manageable length. The measurements are usually referred to as features, and may be real, integer or categorical. The space spanned by all possible combinations of features is referred to as the feature space. For some problems, not all of the measurements are available for every instance, and this is referred to as missing data.
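As a rough illustration of feature extraction, the sketch below flattens a small image into a feature vector of pixel intensities and then shortens it. The function names are hypothetical, and the averaging of adjacent features is only a crude stand-in for a real feature reduction step such as Principal Component Analysis, which it does not implement.

```python
def image_to_features(image):
    """Flatten a 2D grid of pixel intensities into one feature vector."""
    return [pixel for row in image for pixel in row]


def reduce_by_averaging(features, group_size):
    """Shorten a feature vector by averaging consecutive groups of features.

    This is an assumed, simplistic reduction used for illustration only;
    it is not Principal Component Analysis.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    reduced = []
    for start in range(0, len(features), group_size):
        group = features[start:start + group_size]
        reduced.append(sum(group) / len(group))
    return reduced


# A 2x2 "image" becomes a vector of four features, then of two.
example_image = [[0, 2], [4, 6]]
example_features = image_to_features(example_image)
example_reduced = reduce_by_averaging(example_features, 2)
```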
Once features have been extracted there are two possible cases, which are in general handled quite differently. In the first case, the actual class of each instance in the training set is made available to the classifier. This is known as supervised classification. In the second case, this information is not made available, and this is called unsupervised classification or clustering.
Supervised classification is where the actual class of each instance in the training set is known. Sometimes an instance has been given an incorrect class, and this is called a labelling error.
The most frequently used supervised classifiers are binary classifiers, which distinguish between two types of objects or events. For this case, the output of the classifier is a single value at each point in feature space, indicating how strongly the classifier believes that point to have been generated by a particular class. This function is known as a discriminant (as in Fisher's linear discriminant or quadratic discriminant analysis). Multiple class classifiers are frequently built from a number of binary classifiers, combined either pairwise or one against the rest.
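The one-against-the-rest construction can be sketched as follows. The binary discriminant used here is a simple nearest-class-mean rule chosen for brevity; it is an assumption for illustration and is not Fisher's linear discriminant or quadratic discriminant analysis. All function names are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary(positives, negatives):
    """Return a discriminant: larger values favour the positive class."""
    mu_pos, mu_neg = mean(positives), mean(negatives)

    def discriminant(x):
        # Positive when x is closer to the positive mean than the negative one.
        return sq_dist(x, mu_neg) - sq_dist(x, mu_pos)

    return discriminant


def train_one_vs_rest(features, labels):
    """Train one binary discriminant per class, each against all other classes."""
    discriminants = {}
    for c in sorted(set(labels)):
        pos = [x for x, y in zip(features, labels) if y == c]
        neg = [x for x, y in zip(features, labels) if y != c]
        discriminants[c] = train_binary(pos, neg)
    return discriminants


def predict(discriminants, x):
    """Pick the class whose discriminant responds most strongly at x."""
    return max(discriminants, key=lambda c: discriminants[c](x))


# Toy training set: three well-separated classes in a 2D feature space.
features = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
model = train_one_vs_rest(features, labels)
```

Calling `predict(model, x)` evaluates every discriminant at `x` and returns the class with the largest value. The pairwise alternative would instead train one discriminant per pair of classes and combine their votes.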
Performance of a supervised classifier
The output of a classifier is an estimate of the most likely class at each point in feature space. The obvious measure of performance is to compare the estimated class against the actual class. The classifier error can then be taken as the total number of misclassified instances. There are also a number of common ways to represent the classifier performance graphically.
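Counting misclassified instances can be sketched in a few lines. The function names are hypothetical, and the error rate (the count divided by the number of instances) is added here as a common normalisation rather than something the text above prescribes.

```python
def misclassification_count(actual, predicted):
    """Number of instances whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(1 for a, p in zip(actual, predicted) if a != p)


def error_rate(actual, predicted):
    """Fraction of instances that were misclassified."""
    return misclassification_count(actual, predicted) / len(actual)


# Five instances, two of which are assigned the wrong class.
actual_classes = [1, 1, 2, 2, 2]
predicted_classes = [1, 2, 2, 2, 1]
errors = misclassification_count(actual_classes, predicted_classes)
```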
For a binary classifier, a number of thresholds can be applied to the resulting discriminant. For each threshold, the probability of correctly classifying an instance of class 1 (sometimes called the probability of detection) can be plotted against the probability of incorrectly classifying an instance of class 2 as class 1 (sometimes called the false alarm rate). The resulting curve is called a ROC (Receiver Operating Characteristic) curve. The area under the ROC curve is frequently used as a measure of the classifier performance.
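A minimal sketch of this construction follows, assuming that larger discriminant values favour class 1 and that labels are the integers 1 and 2. The function names are hypothetical, and the area is estimated with the trapezoidal rule over the empirical points.

```python
def roc_points(scores, labels):
    """Return (false alarm rate, detection rate) pairs, one per threshold.

    scores: discriminant values, larger meaning "more like class 1".
    labels: actual classes, 1 or 2.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is detected
    for threshold in sorted(set(scores), reverse=True):
        detections = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
        points.append((false_alarms / n_neg, detections / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal estimate of the area under a curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Six instances: three of class 1 and three of class 2.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_labels = [1, 1, 2, 1, 2, 2]
example_auc = area_under_curve(roc_points(example_scores, example_labels))
```

An area of 1 corresponds to a discriminant that separates the two classes perfectly, while an area near 0.5 corresponds to a discriminant that does no better than chance.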
For a multiple class classifier, performance cannot be displayed so simply. Usually, a single operating point of the classifier is used, and a matrix is constructed with two axes. The first is the actual class, and the second is the predicted class. Each instance is then added to the matrix to produce a 2D histogram, which is referred to as a confusion matrix.
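Building such a matrix can be sketched as below, with rows indexed by actual class and columns by predicted class. The function name and the toy class labels are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Count instances by (actual class, predicted class).

    Returns a list of rows; row i is actual class classes[i] and
    column j is predicted class classes[j].
    """
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Six instances over three classes; two are misclassified.
class_names = ["cat", "dog", "bird"]
actual_classes = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted_classes = ["cat", "dog", "dog", "dog", "bird", "cat"]
example_matrix = confusion_matrix(actual_classes, predicted_classes, class_names)
```

Correct classifications fall on the diagonal, so the off-diagonal entries show which classes are being confused with which.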
Classifiers with a large number of free parameters (such as the nearest neighbour classifier or most neural network classifiers) have a tendency to fit the training data extremely well. However, the true measure of a classifier's performance is how well it does on data that it has never seen before. Improving the performance on the training set can have the seemingly paradoxical effect of producing worse classification on other data (an effect which is referred to as overtraining). For this reason, when assessing the performance of a classifier, the available data is usually divided into three separate sets. The first is the training set. The second is a cross-validation set, which is used to give an intermediate measure of performance for classifiers that have parameters which need to be tuned. The final set is a test set, which is used to assess the performance of the tuned classifier. For a given data set, a number of different measures of performance can be obtained by partitioning the data in different ways and summarising the results as a mean and variance.
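The three-way partition and the repeated-partition summary can be sketched as follows. The 60/20/20 proportions, the function names, and the `evaluate` callback (which is assumed to train, tune and score a classifier and return a single number) are all illustrative assumptions. The variance reported is the sample variance from Python's `statistics` module.

```python
import random
import statistics


def three_way_split(instances, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the instances and split them into training, cross-validation and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test


def repeated_evaluation(instances, evaluate, n_repeats=5):
    """Score several different partitions and summarise as (mean, variance).

    evaluate(train, validation, test) is a caller-supplied function that
    returns one performance number per partition. n_repeats must be at
    least 2 for the sample variance to be defined.
    """
    scores = []
    for seed in range(n_repeats):
        train, validation, test = three_way_split(instances, seed=seed)
        scores.append(evaluate(train, validation, test))
    return statistics.mean(scores), statistics.variance(scores)


# Ten instances split into sets of six, two and two.
data = list(range(10))
train_set, validation_set, test_set = three_way_split(data)
```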
Types of supervised classifiers
- Fisher's linear discriminant
- Quadratic discriminant analysis
- Nearest neighbours
- Parzen window methods
- Support Vector Machines
- Ensemble methods
Types of unsupervised classifiers
- Gaussian mixture models
So that computer programs for classifiers can interoperate, the inputs and outputs of classifier programs on this site should be standardised; see Classifier interoperation for details.