An Oversampling Technique for Classifying Imbalanced Datasets
Published: 2017
Son Nguyen, John Quinn, Alan Olinsky, 2017. "An Oversampling Technique for Classifying Imbalanced Datasets", Advances in Business and Management Forecasting
Abstract
We propose an oversampling technique that increases the true positive rate (sensitivity) when classifying imbalanced datasets (i.e., those in which one value of the target variable occurs with low frequency) and thereby improves overall performance measures such as balanced accuracy, G-mean, and the area under the receiver operating characteristic (ROC) curve (AUC). The method applies the Synthetic Minority Oversampling Technique (SMOTE) to only a selected portion of the dataset rather than to the entire dataset. We demonstrate the effectiveness of the method with four real and simulated datasets generated from three models.
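To make the performance measures named in the abstract concrete, the following minimal sketch computes sensitivity, specificity, balanced accuracy, and G-mean from binary labels. It uses the commonly cited definitions (balanced accuracy as the arithmetic mean, and G-mean as the geometric mean, of sensitivity and specificity); the paper itself may define them slightly differently. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return common metrics for imbalanced classification.

    Assumes both classes appear in y_true; otherwise a division by
    zero occurs.
    """
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": sqrt(sensitivity * specificity),
    }


# Hypothetical toy data: the positive (minority) class is 2 of 10 cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = imbalance_metrics(y_true, y_pred)
```

On this toy data, plain accuracy is 0.8 even though half of the minority cases are missed. Sensitivity is 0.5 and balanced accuracy is 0.6875, which is why measures of this kind are preferred over plain accuracy when the classes are imbalanced.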
