Table A4.

Comparative overview of classification methods

| Method | Basic idea | Key assumptions | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Random forest | Ensemble of bootstrapped trees with feature randomness; votes aggregated | Weakly correlated, moderately strong trees; enough trees for stability | Strong out-of-box accuracy; robust to noise and outliers; little tuning; variable importance available | Less interpretable than single tree; slower with many trees; probability calibration sometimes needed |
| Decision tree | Greedy recursive splits to reduce impurity | None on distribution/scale; assumes meaningful splits exist | Interpretable rules; handles nonlinearity and interactions; invariant to monotone rescaling | Unstable; prone to overfit; lower accuracy vs ensembles |
| Gradient boosting | Sequentially adds small trees to fit residuals (boosts weak learners) | Additive tree model; learning rate and depth govern bias–variance tradeoff | State-of-the-art tabular accuracy; captures subtle interactions; flexible | More tuning sensitive; can overfit without early stopping; slower than RF |
| Support vector machine | Finds a maximum-margin boundary (linear or kernelized) | Margin separability in transformed space; appropriate kernel choice; scaled features | Strong on high-dimensional data; handles complex boundaries with kernels | Harder to tune; no native probabilities (needs calibration); slower on very large N; scaling required |
| Logistic regression | Fits a logistic link between features and the probability of bankruptcy | Linear log-odds; additivity; limited multicollinearity; well-specified features | Simple, fast, well-understood; baseline for odds ratios; calibrated probabilities | Misses nonlinearity and interactions unless engineered; sensitive to scaling/collinearity; underfits complex patterns |
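
As a rough illustration of the "basic idea" entries for the decision tree and random forest rows, the sketch below implements a greedy impurity-reducing split and a majority vote over bootstrapped trees in plain Python. It is a simplified sketch under stated assumptions, not the procedure used in the study: the trees are depth-1 stumps, the per-split feature randomness of a real random forest is omitted (so this is plain bagging), and the toy data and all function names (`gini`, `best_split`, `fit_bagged_stumps`, and so on) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Greedy search for the (feature, threshold) with lowest weighted Gini."""
    best = None  # (weighted impurity, feature index, threshold)
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def fit_stump(X, y):
    """Depth-1 tree: one split, each side predicts its majority class."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y)
    if split is None:  # no usable split (e.g. all rows identical)
        return (None, None, majority, majority)
    _, j, t = split
    left = Counter(y[i] for i in range(len(y)) if X[i][j] <= t)
    right = Counter(y[i] for i in range(len(y)) if X[i][j] > t)
    return (j, t, left.most_common(1)[0][0], right.most_common(1)[0][0])

def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    if j is None:
        return left_label
    return left_label if row[j] <= t else right_label

def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict_bagged(stumps, row):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.1, 5.0], [0.2, 4.0], [0.3, 6.0], [0.7, 5.5], [0.8, 4.5], [0.9, 6.5]]
y_toy = [0, 0, 0, 1, 1, 1]

single_split = best_split(X_toy, y_toy)
ensemble = fit_bagged_stumps(X_toy, y_toy)
print(single_split, predict_bagged(ensemble, [0.05, 5.0]))
```

A single stump depends entirely on the sample it was fit to, which is the instability noted in the decision tree row; averaging votes over resamples is what the random forest row relies on to reduce that variance.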
