An illustrative example that qualitatively defines the metrics for "intrinsic difficulty" (the error when the same source is used for training and testing) and "inconsistency" (the change in error when different sources are used for training and testing). Here, prediction error is measured by the average error rate.
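The two quantities can be sketched numerically from a cross-source error table. This is a minimal sketch under stated assumptions: `err[s][t]` is a hypothetical matrix holding the average error rate when training on source `s` and testing on source `t`. Intrinsic difficulty of source `t` is taken as the diagonal entry `err[t][t]`. Inconsistency for the pair `s -> t` is assumed to be the extra error relative to training on `t` itself, `err[s][t] - err[t][t]`. The caption defines both only qualitatively, so the exact formula here is an assumption, not the authors' definition.

```python
from typing import List, Tuple


def difficulty_and_inconsistency(
    err: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Hypothetical helper: err[s][t] is the average error rate when
    training on source s and testing on source t (assumed layout)."""
    n = len(err)
    if any(len(row) != n for row in err):
        raise ValueError("err must be a square matrix (sources x sources)")

    # Intrinsic difficulty: error when training and testing use the same source.
    difficulty = [err[t][t] for t in range(n)]

    # Inconsistency (assumed form): additional error on test source t when
    # training on s instead of t. Diagonal entries are zero by construction.
    inconsistency = [
        [err[s][t] - difficulty[t] for t in range(n)] for s in range(n)
    ]
    return difficulty, inconsistency


# Usage with made-up error rates for two sources.
difficulty, inconsistency = difficulty_and_inconsistency(
    [[0.10, 0.30],
     [0.25, 0.20]]
)
print(difficulty)
print(inconsistency)
```

Measuring inconsistency against the matched-source error separates "this test source is hard in itself" from "the training source does not transfer to it", which is the distinction the caption draws.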