Comparative overview of classification methods
| Method | Basic idea | Key assumptions | Strengths | Limitations |
|---|---|---|---|---|
| Random forest | Ensemble of bootstrapped trees with feature randomness; votes aggregated | Weakly correlated, moderately strong trees; enough trees for stability | Strong out-of-box accuracy; robust to noise and outliers; little tuning; variable importance available | Less interpretable than single tree; slower with many trees; probability calibration sometimes needed |
| Decision tree | Greedy recursive splits to reduce impurity | None on distribution/scale; assumes meaningful splits exist | Interpretable rules; handles nonlinearity and interactions; invariant to monotone rescaling | Unstable; prone to overfit; lower accuracy vs ensembles |
| Gradient boosting | Sequentially adds small trees to fit residuals (boosts weak learners) | Additive tree model; learning rate and depth govern bias–variance tradeoff | State-of-the-art tabular accuracy; captures subtle interactions; flexible | More sensitive to tuning; can overfit without early stopping; slower to train than random forest |
| Support vector machine | Finds a maximum-margin boundary (linear or kernelized) | Margin separability in transformed space; appropriate kernel choice; scaled features | Strong on high-dimensional data; handles complex boundaries with kernels | Harder to tune; no native probabilities (needs calibration); slow to train on very large samples; scaling required |
| Logistic regression | Fits a logistic link between features and the probability of bankruptcy | Linear log-odds; additivity; limited multicollinearity; well-specified features | Simple, fast, well-understood; baseline for odds ratios; calibrated probabilities | Misses nonlinearity and interactions unless engineered; sensitive to scaling/collinearity; underfits complex patterns |
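
The decision tree row describes greedy splits that reduce impurity. As a minimal sketch of that single step, the snippet below scores every candidate threshold on one feature by weighted Gini impurity and keeps the best one. Gini is only one common impurity measure, and the function names (`gini`, `best_split`) and the toy `debt_ratio` / `bankrupt` values are hypothetical, chosen for illustration rather than taken from any dataset or library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best rule `value <= threshold`.

    Returns (None, parent_impurity) when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical toy data: a leverage-style ratio and a bankruptcy flag.
debt_ratio = [0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
bankrupt = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(debt_ratio, bankrupt)
print(f"parent impurity: {gini(bankrupt):.2f}")
print(f"best threshold: {threshold:.2f}, weighted impurity after split: {impurity:.2f}")
```

A full tree would apply `best_split` recursively to each child node and across all features. A random forest would repeat that procedure on bootstrapped samples with a random subset of features considered at each split, then aggregate the votes.
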