What is Classification?
Classification is a type of predictive modeling that assigns items or observations to one of several predefined categories based on input data. It is used when the outcome variable is categorical, for example when predicting whether an email is spam or not spam, whether a transaction is fraudulent or legitimate, or whether a patient’s test result is positive or negative.
The goal of a classification model is to learn patterns from a labeled dataset, where the correct category for each observation is already known. The model then applies what it has learned to new, unlabeled data to predict the most likely category. This process falls under supervised learning, because the model is trained using known outcomes.
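The train-then-predict loop can be made concrete with a minimal Python sketch. It uses a nearest-centroid classifier, a simple technique chosen only for illustration and not one of the methods discussed in this entry. The function names `fit_centroids` and `predict`, and the email features (number of links and number of ALL-CAPS words), are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Learn one centroid (the mean feature vector) per class from labeled data."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    # Average each feature column within a class to get that class's centroid.
    return {y: [sum(col) / len(rows) for col in zip(*rows)] for y, rows in grouped.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical labeled training data: [links, ALL-CAPS words] per email.
train_X = [[8, 5], [10, 7], [9, 6], [1, 0], [0, 1], [2, 0]]
train_y = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

model = fit_centroids(train_X, train_y)  # learn from known outcomes
print(predict(model, [7, 4]))  # new, unlabeled email → spam
print(predict(model, [1, 1]))  # → not spam
```

The training step only sees labeled examples; prediction then applies the learned centroids to observations whose category is unknown.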
Classification tasks can involve just two categories (binary classification) or more than two (multiclass classification). Some common examples include:
- Determining whether a customer will churn or stay
- Categorizing loan applications as high risk, medium risk, or low risk
- Identifying species of plants or animals based on physical traits
Methods
Many different methods can be used for classification, including the following:
- Logistic regression
- Decision trees and random forests
- Support vector machines (SVM)
- Naive Bayes classifiers
- K-nearest neighbors (KNN)
- Neural networks, including deep learning models
- Gradient boosting machines (e.g., XGBoost)
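As one example of the methods above, K-nearest neighbors can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a production implementation; the function name `knn_predict`, the default `k=3`, and the loan features (debt-to-income ratio and number of missed payments) are hypothetical. The three risk labels also show a multiclass problem handled by the same code as a binary one.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common keeps the label that appears first in distance order.
    return votes.most_common(1)[0][0]


# Hypothetical loan applications: [debt-to-income ratio, missed payments].
train_X = [[0.10, 0], [0.15, 0], [0.20, 1],
           [0.35, 2], [0.40, 2], [0.45, 3],
           [0.60, 5], [0.70, 6], [0.65, 4]]
train_y = ["low", "low", "low",
           "medium", "medium", "medium",
           "high", "high", "high"]

print(knn_predict(train_X, train_y, [0.12, 0]))  # → low
print(knn_predict(train_X, train_y, [0.42, 2]))  # → medium
print(knn_predict(train_X, train_y, [0.68, 5]))  # → high
```

Because KNN relies on distances, features measured on very different scales are usually standardized first; this sketch skips that step for brevity.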
Analysts use various metrics to evaluate classification model performance, including accuracy, precision, recall, and F1 score, many of which are derived from the confusion matrix, a table that cross-tabulates predicted categories against actual ones. The choice of metric depends on the goals of the analysis and the consequences of different types of classification errors. In some cases, especially in medical or diagnostic contexts, related metrics such as sensitivity (equivalent to recall), specificity, positive predictive value (equivalent to precision), negative predictive value, and likelihood ratios may also be used to assess model performance.
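As a rough illustration of how these metrics relate for a binary classifier, the sketch below counts the four cells of a confusion matrix (true positives, false positives, false negatives, true negatives) and derives each metric from them. The function name `binary_metrics`, the assumption that the positive class is encoded as `1`, and the example labels are all hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and common metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)    # also called positive predictive value
    recall = ratio(tp, tp + fn)       # also called sensitivity
    specificity = ratio(tn, tn + fp)  # true negative rate
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": specificity,
        "npv": ratio(tn, tn + fn),                  # negative predictive value
        "lr_plus": ratio(recall, 1 - specificity),  # positive likelihood ratio
        "lr_minus": ratio(1 - recall, specificity), # negative likelihood ratio
    }


# Hypothetical actual and predicted labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
m = binary_metrics(y_true, y_pred)
print(m["accuracy"], m["precision"], m["recall"], m["f1"])  # → 0.8 0.75 0.75 0.75
```

Here accuracy looks reasonably high, yet one of the four actual positives is missed; in a diagnostic setting that false negative may matter far more than the single false positive, which is why the choice of metric depends on the cost of each error type.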
Classification is a core task in machine learning and predictive analytics, and it plays a key role in applications such as image recognition, medical diagnosis, spam detection, and customer segmentation.