Predicting post-IPO financial performance: a hybrid approach using logistic regression and decision trees

Supsermpol, Pornpawee; Huynh, Van Nam; Thajchayapong, Suttipong; Suppakitjarak, Nathridee; Chiadamrong, Navee

doi:10.1108/JABES-06-2024-0292

Purpose

This study enhances the financial modelling of companies undergoing an Initial Public Offering (IPO) by focusing on internal capability determinants and IPO proceeds.

Design/methodology/approach

A hybrid logistic regression and shallow-depth decision tree approach is employed to predict the initial three-year post-IPO performance of companies listed on the Stock Exchange of Thailand (SET) using data from 2002 to 2021.

Findings

The results demonstrate that these models not only perform competitively against complex machine learning algorithms but also surpass them in terms of interpretability, an essential feature in financial modelling. The proposed approach effectively captures the effects of each determinant, offering valuable insights into strategic resource allocation and investment decision-making during transition years.

Originality/value

This study introduces a novel application that integrates logistic regression with decision trees to predict multiclass financial performance, filling the gap between complex machine learning techniques and interpretable financial models. It offers practical tools for companies and investors to make informed decisions in challenging post-IPO environments.

1. Introduction

Predicting post-initial public offering (IPO) financial performance is crucial for both companies and investors. Companies encounter difficulties in managing finite resources for long-term survival and competitive advantage (Hensler et al., 1997), whereas investors rely on accurate predictions to make informed investment decisions (Raut et al., 2020). Research has explored internal capability determinants, such as financial ratios, which reflect operational efficiency and strategic financial management (Šarlija et al., 2017; Shafiquea et al., 2021). Additionally, IPO proceeds provide essential capital for growth and risk mitigation (Larrain et al., 2021). However, an excessive or inefficient allocation of proceeds can result in adverse effects (Engelen et al., 2020). Although external factors have also been studied, they fall outside the direct influence of the company and are harder to leverage when developing strategic decisions. This study examines the internal determinants and IPO proceeds within a company’s control, providing actionable insights for stakeholders.

By employing a combination of Return on Assets (ROA), Return on Equity (ROE), and Return on Sales (ROS), this study offers a comprehensive insight into the company’s profitability and operational efficiency. Various methodologies have been employed to predict these values. While Machine Learning (ML) algorithms may deliver superior predictive accuracy, their “black box” nature makes it difficult to understand how the variables are connected (Lakkaraju et al., 2017). Logistic regression, which is simpler and more interpretable, may not fully capture the complex relationships in financial data. Thus, integrating methods that can capture these nonlinear relationships is pivotal. This study adopts a hybrid methodology that grounds its analysis in logistic regression, leveraging its transparency, and introduces Decision Trees (DT) as complementary tools to unravel intricate relationships and threshold effects, which enhances the accuracy of predictions.

As Thailand remains a leading destination for IPOs in Southeast Asia, offering unique insights into emerging markets (Al Farooque et al., 2020), this study endeavours to construct financial models to assess how new listings perform on the Stock Exchange of Thailand (SET). Predictive models were developed for each of the initial three years post-IPO with a dataset of 134 companies from 2002 to 2021. This study makes the following contributions:

(1) It focuses on the impact of internal determinants and IPO proceeds and offers a detailed understanding of their influence on the financial performance of each post-IPO year.
(2) To our knowledge, this study is the first to use a hybrid model of logistic regression and DT to predict multiclass financial performance, competing effectively with the Random Forest (RF) algorithm while maintaining superior interpretability.
(3) Identifying and analysing univariate and bivariate threshold effects provides critical insights into the specific points where changes in key variables significantly impact financial performance and guides strategic decision-making.
(4) The models deliver actionable recommendations for companies, guiding them on strategically allocating resources to crucial factors during different post-IPO phases, and assist investors in forming informed investment strategies.

2. Literature review

An IPO represents a significant milestone, providing companies with capital to reduce debt and pursue growth opportunities, enhancing their financial well-being. Understanding the factors influencing post-IPO performance has become a concern for companies, investors, stakeholders, and academics. Multiple linear regression has traditionally been used to predict financial performance, examining factors such as size, age, earnings, and tangible assets with metrics such as ROA, ROE, and ROS (Majumdar, 1997; Babalola, 2013; İltaş and Demirgüneş, 2020; Ong et al., 2023). Meanwhile, logistic regression has been widely employed for class predictions, offering high interpretability, but struggles with the nonlinear relationships common in financial datasets (Šarlija et al., 2017; Hoang et al., 2019).

To address these limitations, ML algorithms have been widely adopted because of their capability to model intricate relationships. DT provides a straightforward and interpretable approach by splitting data into decision rules based on feature thresholds (Delen et al., 2013). However, it can be prone to overfitting and may not perform as well on its own compared to ensemble algorithms. Both RF and extreme Gradient Boosting (XGBoost), which are ensemble tree-based methods, overcome the limitations of simple DT. RF reduces overfitting by averaging predictions from multiple decision trees; however, its lack of transparency makes the interpretation of individual variable contributions challenging (Appiahene et al., 2020). XGBoost further enhances predictive accuracy through boosting techniques, making it efficient for large datasets (Munshi et al., 2022). However, both RF and XGBoost lack interpretability and require post hoc methods to explain their predictions. Other algorithms, such as Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs), have also been used to model nonlinear relationships (Pechlivanidis et al., 2022; Kayakus et al., 2023). However, these models are “black-box” in nature, demanding large datasets for effective training and offering limited interpretability, which complicates the ability of decision makers to grasp how predictions are made (Lakkaraju et al., 2017). Post hoc interpretability techniques such as Local Interpretable Model-agnostic Explanations (LIME; Ribeiro et al., 2016), SHapley Additive exPlanations (SHAP; Lundberg and Lee, 2017), and Partial Dependence Plots (Friedman, 2001) can help explain predictions by providing variable importance and detailed local explanations of complex models. However, they are limited in their ability to provide clear and direct insights, which is crucial in high-stakes domains such as finance.

Interpretability is essential in financial contexts as it provides clear insights into how various determinants impact post-IPO performance. This transparency is crucial for companies in crafting strategic decisions and for investors in evaluating risks and making well-informed investment decisions. To address these limitations, this study adapts Penalised Logistic Tree Regression (PLTR; Dumitrescu et al., 2022) to enhance logistic regression models by incorporating short-depth DT. DT is particularly effective in pinpointing exact thresholds, both individually and in interactions with other variables, where performance outcomes shift. Logistic regression complements this finding by providing clear effects for each determinant through its coefficients. Unlike similar techniques that rely on deeper DT, such as CART (Breiman et al., 1984), PLTR employs short-depth trees to allow for the granular identification of individuals and thresholds, balancing predictive accuracy with interpretability. Compared to more advanced ML algorithms, such as RF and XGBoost, the hybrid model offers a significant advantage regarding interpretability. Although complex algorithms require post hoc explanations that may not provide the same intuitive understanding of variable interactions or thresholds, the proposed hybrid approach inherently provides these insights.

Using a relative transition year to assess the timing effect for predicting post-IPO performance offers more precise insights than traditional calendar-year-based evaluations. Accurately comparing financial performance across companies during their public transition phases is valuable, as shown in Larrain et al. (2021), Engelen et al. (2020), and Supsermpol et al. (2023). Table A1 ^[1] presents an overview of the literature and the advantages and disadvantages of the applied algorithms, highlighting existing research gaps in this domain.

3. Research formulation

This study focuses on predicting the financial outcomes of companies listed on the SET during their post-IPO transition periods. Several key metrics are commonly employed to offer a holistic view of financial well-being and operational efficiency. ROA evaluates how efficiently assets are used to produce profits, offering a lens for operational efficiency (Babalola, 2013). ROE measures profitability relative to shareholder equity, indicating how effectively invested funds drive earnings growth (Shafiquea et al., 2021). ROS reflects how effectively a company converts sales into profit, reflecting its pricing strategy, cost control, and overall operational efficiency (Majumdar, 1997). In this study, financial performance (the dependent variable) is constructed from a composite of ROA, ROE, and ROS to capture the multifaceted nature of financial performance compared with single-indicator methods (Šarlija et al., 2017; Hoang et al., 2019). For instance, a high ROE with a low ROA may indicate an aggressive debt strategy, whereas a robust ROA with diminished ROE suggests efficient asset utilisation without proportional shareholder gains. Strong ROA and ROE, alongside lagging ROS, signal effective profit conversion despite sales challenges. This aggregated metric reflects a company’s profitability, asset management efficiency, and sales efficacy, which are crucial for understanding its post-IPO financial health. The dependent variables are presented in Table A2 ^[1].

3.1 Independent variables

This study predicts the financial performance of companies post-IPO by concentrating on two groups of internal capabilities (financial robustness and operational efficacy), along with the amount of IPO proceeds, instead of less manageable external factors. This approach offers deep insights into financial integrity, endurance, and capacity for expansion.

Financial robustness reflects the ability to adhere to financial duties while persisting operationally over the long term. This study examined financial robustness through company size, liquidity, leverage, and asset tangibility. Larger companies benefit from economies of scale but often experience slower growth post-IPO, whereas smaller firms can capitalise on new investments for faster expansion (Pagano et al., 1998). High liquidity supports initial growth opportunities but may signal inefficient capital allocation if it remains excessive in later years (Billett et al., 2006; Nguyen et al., 2022). Leverage can amplify returns and increase risk (Engelen et al., 2020). Lastly, tangible assets help secure financing post-IPO but may result in inflexibility and higher maintenance costs (İltaş and Demirgüneş, 2020).

Another group of determinants to be considered is operational efficacy, which involves the adept management of a company’s resources to maximise revenue generation and minimise expenses. The key performance indicators include the cost of goods sold (COGS), asset utilisation, retention ratio, and net profit margin. Effective management of COGS can boost a company’s profits (Dambra et al., 2015). Efficient asset use drives better financial outcomes in the early post-IPO years, reflecting operational optimisation (Shafiquea et al., 2021). The retention ratio shapes the growth strategy and reflects a commitment to internal funds for future expansions (Pagano et al., 1998). The net profit margin provides insights into overall efficiency, cost management, and the ability to generate profits from sales (Fairfield and Yohn, 2001).

Finally, larger IPO proceeds signal market confidence and provide companies with vital capital for expansion, innovation, and debt reduction (Ritter, 1991). However, this may also lead to poorer financial outcomes because of overspending or ineffective resource use (Engelen et al., 2020). These variables offer a multidimensional framework for understanding the dynamics of financial performance post-IPO. Table A3 ^[1] lists the independent variables used in this study.

3.2 Data

The dataset of this study, originally featured in Supsermpol et al. (2023), encompasses the financial information of companies on the SET over their transition periods, from one year before to three years post-IPO. Typically, companies seeking to join the SET must disclose at least one year of pre-IPO financial documentation, which is crucial for assessing their financial health and prospects. Owing to these data constraints, this study leverages financial information from the year pre-IPO to forecast a company’s well-being over the subsequent three years.

The dataset was compiled from the SET (https://www.set.or.th/en/home). Companies that experienced bankruptcy or mergers resulting in incomplete financial records were excluded. This resulted in a final dataset comprising 134 companies registered on the SET from 2003 to 2018, with data from 2002 to 2021. Companies in the financial and service industries were omitted to ensure a focused and relevant evaluation, as shown in Figures A1 and A2 [1], respectively.

4. Proposed method

Traditional regression methods have shown limitations in predictive precision, as demonstrated by Chiadamrong and Wattanawarangkoon (2023), who found the mean absolute percentage error to be as high as 50%. To address this limitation, this study segments companies’ financial health into three classes (high, medium, and low) based on the one-third and two-thirds percentiles of aggregated ROA, ROE, and ROS. This strategy counters the inaccuracies in predicting exact values, thereby avoiding the pitfalls of oversimplification and excessive granularity.
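The three-class segmentation can be illustrated with a minimal Python sketch. It assumes that a single composite score per company is already available; how ROA, ROE, and ROS are aggregated is not restated here, and the scores below are hypothetical rather than taken from the dataset.

```python
from statistics import quantiles

def tertile_classes(scores):
    """Label each composite score as 'low', 'medium', or 'high' by tertile."""
    # quantiles(..., n=3) returns the two cut points at the 1/3 and 2/3 positions.
    low_cut, high_cut = quantiles(scores, n=3)
    labels = []
    for s in scores:
        if s <= low_cut:
            labels.append("low")
        elif s <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels

# Hypothetical composite scores (illustrative only).
scores = [0.02, 0.05, 0.07, 0.10, 0.12, 0.15, 0.18, 0.21, 0.25]
labels = tertile_classes(scores)
```

Cutting at the tertiles yields classes of roughly equal size, so no class is sparse in a small sample.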

To advance the methodology for evaluating post-IPO financial performance, this study proposes a new hybrid model that combines penalised logistic regression with short-depth DT to predict multiclass company performance. This approach, adapted from Dumitrescu et al. (2022), marks a significant shift by extending binary credit scoring models to accommodate three-class responses and exploring optimal penalisation techniques. Figure A3 ^[1] illustrates the framework of the proposed hybrid model. Logistic regression was chosen because of its high degree of interpretability. It offers a clear assessment of the impact of each determinant on financial performance, with coefficients that can be directly shown. Interpretability is particularly valuable in the financial and ML domains, where justifying predictive decisions is paramount. DT captures nonlinear relationships and threshold effects that may otherwise remain obscure.

4.1 Step 1: identify threshold effects from short-depth DT

Shallow-depth DT identifies threshold effects, captures nonlinear relationships and critical breakpoints, and provides insights beyond linear models. The thresholds mark shifts in the behaviour of the determinants, which are invaluable for understanding the specific conditions under which financial performance varies significantly while maintaining interpretability. The DT was constructed by recursively splitting the dataset and utilising the Gini index to derive the cutoff values. A lower Gini index indicates a purer split, which pinpoints relevant cutoff values. Tree depth was restricted to two levels to prevent overfitting and to capture significant nonlinear relationships. Figure A4 ^[1] presents an example of a two-level DT for size and liquidity.
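A single Gini-based split can be sketched in plain Python as follows. This is an illustration of the splitting criterion only, not the implementation used in the study; the function names and the liquidity values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def best_threshold(x, y):
    """Return (threshold, weighted_gini) for the best split 'x <= threshold'."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        # Weighted impurity of the two child nodes; lower means purer.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cut, score)
    return best

# Hypothetical liquidity ratios and performance classes.
liquidity = [0.5, 0.7, 0.9, 1.4, 1.8, 2.6]
classes = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_threshold(liquidity, classes)
```

A two-level tree applies the same search again inside one of the resulting child nodes, which is where a second variable can enter.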

Three distinct binary variables are constructed for each DT. The $ν_{1}^{(j)}$ encapsulates the univariate threshold effects, allowing for the detection of changes at specific points. The $ν_{2}^{(j, k)}$ and $ν_{3}^{(j, k)}$ capture the bivariate threshold effects, uncovering complex interdependencies and synergies between variables.

The DT framework requires both variables to be informative, with each variable selected once as the splitting variable. If one of the variables lacks informativeness, the tree may be split twice using the same variable, leading to redundancy. The number of unique pairs can be calculated using the formula for combinations $\binom{J}{2}$, where $J$ is the total number of variables. For nine variables ($J = 9$), this resulted in 36 distinct pairs, generating 117 variables (nine original variables, 36 univariate variables, and 72 bivariate variables). Redundant variables were identified and excluded from subsequent steps.
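The variable count can be checked with a few lines of Python. The sketch assumes, as the text states, one univariate and two bivariate binary variables per tree and one tree per unordered pair of predictors.

```python
from math import comb

J = 9                    # number of original predictors
pairs = comb(J, 2)       # one two-level tree per unordered pair
univariate = pairs       # one univariate threshold variable per tree
bivariate = 2 * pairs    # two bivariate threshold variables per tree
total = J + univariate + bivariate
```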

4.2 Step 2: include the univariate and bivariate variables into the penalised logistic regression model

Once the univariate and bivariate variables were identified from the DT, they were incorporated into the logistic regression model. In general, the multinomial model is presented in Eq. 1, with its general form given by

\log (\frac{P (Y = i)}{1 - P (Y = i)}) = β_{i, 0} + β_{i, 1} X_{1} + \dots + β_{i, j} X_{j}

(1)

where

$P$: the probability of the event occurring
$i$: the classes, $i = 1, \dots, I$
$j$: the independent variables, $j = 1, \dots, J$
$β_{i, 0}$: the intercept term
$β_{i, 1}$ to $β_{i, j}$: the coefficients for the independent variables $X_{1}$ to $X_{j}$
$X_{1}$ to $X_{j}$: the distinct independent variables
$Y$: the dependent variable

Or it can be written as

η (X_{j}; β) = β_{i, 0} + \sum_{j = 1}^{J} β_{i, j} X_{j}

(2)

Using the multinomial logistic regression model, the probability of class $i$ occurring is calculated as follows:

P (Y = i | X) = F (η (X_{j}; β)) = \frac{\exp (β_{i, 0} + \sum_{j = 1}^{J} β_{i, j} X_{j})}{1 + \exp (β_{i, 0} + \sum_{j = 1}^{J} β_{i, j} X_{j})} = \frac{\exp (η (X_{j}; β))}{1 + \exp (η (X_{j}; β))}

(3)
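As a minimal numerical illustration, Eq. 3 can be evaluated for a single class $i$ as below. The intercept, coefficients, and inputs are hypothetical and serve only to show the mapping from the linear predictor to a probability.

```python
import math

def linear_predictor(intercept, coefs, x):
    """eta = beta_0 + sum_j beta_j * x_j, as in Eq. 2."""
    return intercept + sum(b * v for b, v in zip(coefs, x))

def class_probability(intercept, coefs, x):
    """Logistic transform of the linear predictor, as in Eq. 3."""
    eta = linear_predictor(intercept, coefs, x)
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical coefficients for one class and two standardised predictors.
eta = linear_predictor(-0.5, [1.0, -0.5], [1.0, 0.0])
p = class_probability(-0.5, [1.0, -0.5], [1.0, 0.0])
```

Taking the log-odds of `p` recovers `eta`, which is the relationship stated in Eq. 1.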

To estimate the coefficients $β_{i, j}$ in the model, maximum likelihood estimation was employed. The log-likelihood function for logistic regression was used to ascertain the values most likely to generate the observed data, which can be expressed as

L (y_{j}; β) = \sum_{j = 1}^{J} \{ y_{j} \log [F (η (X_{j}; β))] + (1 - y_{j}) \log [1 - F (η (X_{j}; β))] \}

(4)

Considering the univariate and bivariate variables identified by short-depth DT in Step 1, the logistic model is formulated as follows:

η (X_{j}, ν_{1}^{(j)}, ν_{2}^{(j, k)}, ν_{3}^{(j, k)}; Θ) = α_{i, 0} + \sum_{j = 1}^{J} α_{i, j} X_{j} + \sum_{j = 1}^{J} β_{i, j} ν_{1}^{(j)} + \sum_{j = 1}^{J - 1} \sum_{k = j + 1}^{J} γ_{i}^{(j, k)} ν_{2}^{(j, k)} + \sum_{j = 1}^{J - 1} \sum_{k = j + 1}^{J} δ_{i}^{(j, k)} ν_{3}^{(j, k)}

(5)

where $Θ = (α_{i, 0}, α_{i, 1}, \dots, α_{i, j}, β_{i, 1}, \dots, β_{i, j}, γ_{i}^{(1, 2)}, \dots, γ_{i}^{(j, k)}, δ_{i}^{(1, 2)}, \dots, δ_{i}^{(j, k)})$ represents the set of coefficients. These coefficients are estimated using the log-likelihood function defined as

L (X_{j}, ν_{1}^{(j)}, ν_{2}^{(j, k)}, ν_{3}^{(j, k)}; Θ) = \sum_{j = 1}^{J} \{ y_{j} \log [F (η (X_{j}, ν_{1}^{(j)}, ν_{2}^{(j, k)}, ν_{3}^{(j, k)}; Θ))] + (1 - y_{j}) \log [1 - F (η (X_{j}, ν_{1}^{(j)}, ν_{2}^{(j, k)}, ν_{3}^{(j, k)}; Θ))] \}

(6)

The coefficients within the set $Θ$ are determined by maximising the log-likelihood function $L (X_{j}, ν_{1}^{(j)}, ν_{2}^{(j, k)}, ν_{3}^{(j, k)}; Θ)$. The model may incorporate up to 117 variables, potentially leading to overfitting. Regularisation, which subtracts a penalty term from the log-likelihood function to limit the parameter magnitude, yields the penalised log-likelihood $L_{p}$, expressed as

L_{p} (X_{j}, ν_{1}^{(j)}, ν_{2}^{(j, k)}, ν_{3}^{(j, k)}; Θ) = L (X_{j}, ν_{1}^{(j)}, ν_{2}^{(j, k)}, ν_{3}^{(j, k)}; Θ) - λ P (Θ)

(7)

where $P (Θ)$ represents the penalty function, and $λ$ denotes the factor that scales the intensity of the regularisation. If $λ$ is set too high, the model risks underfitting, becoming unable to capture the complexity of the data. Conversely, a $λ$ that is too low might not sufficiently penalise the complexity of the model, leading to overfitting. Thus, this study employs a grid search with cross-validation, exploring the values {0.1, 1, 3, 5} and assessing performance by accuracy. The optimal $λ$ is 1, and it is subsequently applied across all models.

In this study, we experimented with a variety of penalty terms, including the ridge penalty (Hoerl and Kennard, 1970), the Lasso penalty (Tibshirani, 1996), and the elastic net penalty (Zou and Hastie, 2005). The Lasso penalty, or L1, which yielded the highest accuracy and Area Under the Curve (AUC), was selected for this study. It is mathematically represented as the sum of the absolute values of the coefficients and is expressed as follows:

P (Θ) = \sum_{j = 1}^{J} | α_{i, j} | + \sum_{j = 1}^{J} | β_{i, j} | + \sum_{j = 1}^{J - 1} \sum_{k = j + 1}^{J} | γ_{i}^{(j, k)} | + \sum_{j = 1}^{J - 1} \sum_{k = j + 1}^{J} | δ_{i}^{(j, k)} |

(8)

LASSO regularisation produces sparse models, where some coefficients can become exactly zero, aiding variable selection in high-dimensional datasets by reducing the total number of coefficients and eliminating features from the model.
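The penalised objective can be sketched for a single binary (one class versus the rest) response as follows. The sketch assumes that the log-likelihood is summed over observations and, in line with Eq. 8, leaves the intercept unpenalised; the data and coefficients are hypothetical.

```python
import math

def sigmoid(eta):
    """Logistic function F(eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(y, X, intercept, coefs):
    """Binary log-likelihood summed over observations (assumed index)."""
    total = 0.0
    for yi, xi in zip(y, X):
        p = sigmoid(intercept + sum(b * v for b, v in zip(coefs, xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def l1_penalty(coefs):
    """Sum of absolute coefficient values; the intercept is excluded."""
    return sum(abs(b) for b in coefs)

def penalised_log_likelihood(y, X, intercept, coefs, lam):
    """Log-likelihood minus lambda times the L1 penalty."""
    return log_likelihood(y, X, intercept, coefs) - lam * l1_penalty(coefs)

# Hypothetical data: three observations, two predictors.
y = [1, 0, 1]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
plain = log_likelihood(y, X, 0.0, [0.5, -0.5])
penalised = penalised_log_likelihood(y, X, 0.0, [0.5, -0.5], lam=1.0)
```

Because the penalty grows with every non-zero coefficient, a variable stays in the model only if it improves the fit by more than it costs, which is the mechanism behind the sparsity described above.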

4.3 Performance evaluation

Performance evaluation primarily focuses on two widely recognised measures: Accuracy and AUC. The accuracy offers a clear overview of a model’s overall predictive capability.

\text{Accuracy} = \frac{\text{Number of accurate predictions}}{\text{Total predictions}}

(9)

AUC offers a detailed understanding of the model’s ability to differentiate between classes. It is derived by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across a range of cutoff points, providing a value between 0.5 (no discriminative power) and 1 (ideal discrimination). The AUC is crucial because it provides an extensive performance overview across different cutoffs. The definitions of the TPR and FPR are as follows:

\text{TPR} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

(10)

\text{FPR} = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}

(11)

An AUC score above 0.7 indicates reasonably good performance (El Khouli et al., 2009). While accuracy measures correct classifications, the AUC offers a deeper understanding of the capacity to distinguish between classes. This dual approach ensures a holistic assessment of the efficacy of the model.
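The two measures can be sketched in plain Python as below. The AUC function treats the problem as binary (one class against the rest); how the AUC is aggregated over the three classes is not specified here, so this is an assumption, and the labels and scores are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true class (Eq. 9)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def tpr_fpr(y_true, scores, cutoff):
    """TPR and FPR (Eqs. 10 and 11) at a score cutoff; 1 marks the positive class.

    Assumes both classes are present in y_true.
    """
    tp = fp = fn = tn = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= cutoff else 0
        if t == 1 and pred == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)

def auc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule over all cutoffs."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tpr, fpr = tpr_fpr(y_true, scores, cutoff)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical one-vs-rest labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(y_true, scores)
```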

5. Results and discussion

Descriptive statistics for companies in their final year before the IPO are presented in Table A4 [1]. Financial performance varies significantly, with most data points skewed higher within the interquartile range. The correlation matrix in Table A5 [1] indicates no multicollinearity concerns because no variable pair exceeds the 0.7 threshold (Yoo et al., 2014). The dataset was split into training (85%) and testing (15%) portions to enable model calibration and evaluate real-world predictive capabilities. The predictions span one to three years post-IPO to evaluate the long-term predictive power.
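The 85/15 split can be illustrated with a short sketch. The rounding rule (rounding the test size up) and the random seed are assumptions; under this rule, 134 companies give 21 test cases, which would be consistent with a reported accuracy such as 71.43% (15 of 21).

```python
import math
import random

def train_test_indices(n, test_fraction, seed=0):
    """Shuffle row indices and hold out a fixed fraction for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n * test_fraction)  # assumed rounding rule
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_indices(134, 0.15)
```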

5.1 Results of logistic regression models

Logistic regression serves as the methodological backbone for examining the financial datasets. The analysis was undertaken in Minitab 21, with variables standardised to neutralise scale disparities. The model considers up to first-order variable interactions and is refined through backward elimination, which excludes terms with a p-value exceeding 0.10. The performances of the logistic regression models are summarised in Table A6 [1].

The accuracy on the test set was 55.55%, with an AUC score of 0.6344. Although the AUC was below the 0.7 benchmark, the model’s clarity was considered valuable. To improve the performance, a hybrid logistic and DT approach was employed, combining regularisation and tree-based learning to enhance the prediction accuracy and interpretability by capturing complex interactions and nonlinear effects.

5.2 Results of the proposed hybrid logistic and DT approach

The analysis was conducted using Python 3 and the Scikit-Learn library, with k-fold cross-validation implemented during the training phase. The training set was split into five folds to ensure thorough performance assessment and data representativeness. For the DT, the hyperparameters limit the tree depth to two and the number of leaf nodes to a maximum of three, in line with the framework illustrated in Figure A4 ^[1]; for the logistic regression, the regularisation parameter λ is set to 1. Table A7 ^[1] summarises the model’s performance over the three years post-IPO.

In the first year post-IPO, the model achieved 71.43% accuracy, with an AUC of 0.7415. In the second year, the accuracy decreased slightly to 66.67%, with an AUC of 0.7755. The third year showed a marginal drop in accuracy to 61.90% and an AUC of 0.7141. Over the three years, the average accuracy was 66.67%, with an average AUC of 0.7437, demonstrating consistent performance. The AUC scores exceed the 0.7 threshold, suggesting a moderate level of predictive accuracy (El Khouli et al., 2009). Table A8 ^[1] presents the coefficients from the first-year model. They quantify the impact of each determinant on the likelihood of achieving each financial performance class.
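The three-year averages follow directly from the yearly figures, as the short check below shows; the values are those reported in this paragraph.

```python
# Yearly test-set results as reported above.
accuracy_by_year = {1: 71.43, 2: 66.67, 3: 61.90}
auc_by_year = {1: 0.7415, 2: 0.7755, 3: 0.7141}

mean_accuracy = sum(accuracy_by_year.values()) / len(accuracy_by_year)
mean_auc = sum(auc_by_year.values()) / len(auc_by_year)
```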

The logistic regression model provides insights into the magnitude and direction of the effects of each variable. Positive coefficients suggest an increased likelihood of achieving a specified performance class, whereas negative coefficients imply a decreased likelihood. The simplicity of the logistic regression equation makes it easier for stakeholders with different expertise levels to understand the implications of the findings.

The threshold effects of DT were incorporated into the logistic regression model using univariate and bivariate binary variables. For instance, a company with a tangibility of 0.4 and a retention ratio of 2.0 has a univariate variable set to 1 for a retention ratio ≤2.44. This increases its likelihood of falling into the high-performance class. Similarly, a bivariate variable set to 1 for tangibility >0.36 and a retention ratio ≤2.44 increases the probability of being in the high or medium classes while decreasing the chance of being in the low class. This hybrid approach elucidates threshold effects, offering stakeholders a clear and actionable understanding of financial performance that can be extended to subsequent years.
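The worked example above can be written as a small function. It encodes only the two binary variables described in this paragraph (the univariate one and one bivariate one); the function and argument names are hypothetical, while the cutoffs 2.44 and 0.36 are those quoted in the text.

```python
def threshold_variables(retention_ratio, tangibility,
                        rr_cut=2.44, tang_cut=0.36):
    """Binary threshold variables for the retention ratio and tangibility example."""
    # Univariate effect: retention ratio at or below its cutoff.
    v1 = int(retention_ratio <= rr_cut)
    # Bivariate effect: tangibility above its cutoff AND retention ratio at or below its cutoff.
    v2 = int(tangibility > tang_cut and retention_ratio <= rr_cut)
    return v1, v2

# The company from the text: tangibility 0.4, retention ratio 2.0.
v1, v2 = threshold_variables(retention_ratio=2.0, tangibility=0.4)
```

Both indicators equal 1 for this company, so both of the corresponding coefficients enter its linear predictor.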

5.3 Variable importance

Variable importance was elucidated through SHAP values, which measure the effect of each variable on the model’s output based on game theory. This approach calculates the mean impact of a variable across all potential combinations by computing the differences in predictions with and without the variable to ensure local accuracy and global uniformity (Lundberg and Lee, 2017). The analysis highlights the variables with the top five SHAP values for each transitional year in Tables A9–A11 ^[1]. The coefficients and the SHAP values provide a comprehensive view of each variable’s direction, magnitude, and relative importance.
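The idea behind these values can be illustrated with an exact Shapley computation on a toy model. This is a sketch of the underlying game-theoretic definition, not the estimator used by the SHAP library; "absent" variables are replaced by a baseline value, and the model, inputs, and baseline are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)      # start with every variable "absent"
        previous = predict(current)
        for j in order:
            current[j] = x[j]         # add variable j to the coalition
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [value / len(orders) for value in phi]

def toy_model(features):
    """Hypothetical score with one interaction term (not the fitted model)."""
    liquidity, leverage, tangibility = features
    return 0.2 + 0.5 * liquidity - 0.3 * leverage + 0.4 * liquidity * tangibility

x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The values sum to the difference between the prediction at `x` and at the baseline, which is the local accuracy property mentioned above; the interaction term is shared equally between the two variables involved.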

Managing operational efficiency and capital structure is critical in the first year after an IPO. While fixed assets are important, underutilisation impairs revenue growth, underscoring the need for effective asset management (Dambra et al., 2015). Companies that fail to leverage their tangible assets experience slower early profitability, which affects ROA. A balanced retention ratio and strong tangible assets support reinvestment, reduce overcapitalisation, and improve ROE (Pagano et al., 1998). Higher operational costs are typically observed during the initial stages. However, maintaining cost control remains crucial during this volatile period (Dambra et al., 2015). Smaller companies with high liquidity achieve better ROE, but high leverage coupled with a low net profit margin (NPM) indicates financial vulnerability (Billett et al., 2006).

In the second year, various variables gained prominence. Excessive liquidity hinders ROA and ROS because idle capital fails to generate returns, which is consistent with the findings of Ritter (1991). Inefficient asset utilisation also signals risk to investors. Effectively using IPO proceeds enhances financial flexibility for expansion and strategic initiatives, boosting ROE and ROS (Ritter, 1991). By the third year, the focus shifts to solidifying operational efficiency and reinvestment: a balanced COGS combined with reinvestment drives ROE growth (Shafiquea et al., 2021). High NPM and liquidity indicate successful revenue conversion, although excess liquidity suggests inefficiency. Low NPM and poor utilisation of IPO proceeds limit growth potential.

This study underscores the evolving importance of financial variables over three years. Companies must adhere to key financial metric thresholds to achieve optimal performance. For univariate factors, such as liquidity, a delicate balance is essential: enough to ensure security without excessive idle capital (Pagano et al., 1998; Ritter, 1991). For bivariate interactions, such as COGS and the retention ratio, a balance that optimises costs without hindering reinvestment is essential. These thresholds serve as benchmarks for operational excellence and financial prudence, thereby helping companies improve their performance and attract investors.

5.4 Result comparison

The proposed hybrid logistic and DT approach was also benchmarked against logistic regression and other established ML techniques, including DT, RF, XGBoost, and SVM with radial kernels, to evaluate its effectiveness in financial performance prediction via the average performance of the three-year post-IPO period, as detailed in Table 1.

Table 1

Result comparison

Method	Test set accuracy	Test set AUC
Logistic regression	0.5555	0.6344
Decision trees	0.5929	0.6996
Random forest	0.6667	0.7349
eXtreme gradient boosting	0.6508	0.7426
Support vector machines (radial kernel)	0.5238	0.6270
Hybrid logistic regression and DT (proposed approach)	0.6667	0.7437

The proposed approach matches RF in the test set accuracy at 66.67% but surpasses it with a superior AUC of 0.7437, outperforming other methods. A key advantage of the hybrid model is its interpretability, which is achieved by integrating logistic regression and shallow-depth decision trees. This setup captures threshold effects and provides clear and actionable insights. While more complex algorithms are often black-box models, even tree-based models such as RF and XGBoost, although more interpretable, require post hoc techniques to explain variable importance and individual effects. However, these explanations do not directly highlight critical thresholds, as in the proposed approach. The transparency of the hybrid model makes it ideal for understanding how determinants impact post-IPO performance.
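The comparison can be reproduced from Table 1 with a few lines of Python; the figures are copied from the table and the method labels are shortened for readability.

```python
# (test accuracy, test AUC) per method, as listed in Table 1.
results = {
    "Logistic regression": (0.5555, 0.6344),
    "Decision trees": (0.5929, 0.6996),
    "Random forest": (0.6667, 0.7349),
    "eXtreme gradient boosting": (0.6508, 0.7426),
    "Support vector machines": (0.5238, 0.6270),
    "Hybrid logistic regression and DT": (0.6667, 0.7437),
}

best_auc = max(results, key=lambda method: results[method][1])
best_accuracy = max(acc for acc, _ in results.values())
tied_on_accuracy = sorted(
    method for method, (acc, _) in results.items() if acc == best_accuracy
)
```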

5.5 Practical implications

This study applies a hybrid logistic regression and DT approach to predict the financial performance of companies that have recently gone public in Thailand. It emphasises the use of threshold values and conducts an in-depth analysis to understand the significant cutoffs and the interplay of variables. The practical implications are broad and multifaceted, offering substantial benefits for strategic decision-making.
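To illustrate what a threshold value means in this setting, the sketch below finds the single cutoff on one variable that best separates two classes, which is what a depth-one decision tree does when it splits on Gini impurity. The liquidity values and class labels are made-up toy data, and the function names are hypothetical; the study's own trees and cutoffs are reported in its results and appendix tables.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values: Sequence[float],
                   labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, weighted Gini) of the best split `value <= cutoff`.

    Candidate cutoffs are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = float("nan"), float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cutoff can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2.0, score
    return best_cut, best_score


# Toy data (assumption): liquidity ratios and a binary performance class.
liquidity = [0.5, 0.75, 1.0, 1.25, 1.5, 2.0]
high_performance = [0, 0, 1, 1, 1, 1]

cutoff, impurity = best_threshold(liquidity, high_performance)
# The cutoff becomes a 0/1 indicator that a logistic regression can use.
below_cutoff = [int(v <= cutoff) for v in liquidity]
print(cutoff, impurity, below_cutoff)
```

Turning the cutoff into an indicator variable keeps the threshold visible in the final model, because its coefficient can be read directly.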

For companies:

(1) This model enables the strategic allocation of resources in areas most significantly affecting financial performance each year.
(2) Identifying threshold effects, such as maintaining liquidity above 0.83 but below 2.29 (as seen in Tables A9-A11 ^[1]), allows companies to adjust strategies for optimal performance.
(3) Capturing nonlinear relationships between variables such as tangibility and asset utilisation helps optimise operations and align strategies with complex interactions.
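The threshold effects listed above can be encoded as simple 0/1 features. The sketch below is illustrative only: the liquidity bounds 0.83 and 2.29 are the ones quoted in the text, whereas the tangibility and asset-utilisation cutoffs, the `Firm` class, and the feature names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict

LIQUIDITY_LOW = 0.83   # lower bound quoted in the text
LIQUIDITY_HIGH = 2.29  # upper bound quoted in the text

# Hypothetical cutoffs, for illustration only (not from the study).
TANGIBILITY_CUT = 0.40
ASSET_UTILISATION_CUT = 1.00


@dataclass
class Firm:
    liquidity: float
    tangibility: float
    asset_utilisation: float


def liquidity_in_band(liquidity: float) -> int:
    """Univariate effect: 1 if liquidity is above 0.83 and below 2.29."""
    return int(LIQUIDITY_LOW < liquidity < LIQUIDITY_HIGH)


def low_tangibility_high_utilisation(firm: Firm) -> int:
    """Bivariate effect: 1 only when both conditions hold at the same time."""
    return int(firm.tangibility <= TANGIBILITY_CUT
               and firm.asset_utilisation > ASSET_UTILISATION_CUT)


def threshold_features(firm: Firm) -> Dict[str, int]:
    """Indicator features that could enter a logistic regression."""
    return {
        "liquidity_in_band": liquidity_in_band(firm.liquidity),
        "low_tangibility_high_utilisation": low_tangibility_high_utilisation(firm),
    }


example = Firm(liquidity=1.5, tangibility=0.3, asset_utilisation=1.2)
print(threshold_features(example))
```

A bivariate indicator differs from two separate univariate ones in that it switches on only for the joint region, which is how an interaction between two variables is represented.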

For investors:

(1) This model enables investors to assess the potential performance of newly listed companies with greater precision, allowing more informed investment decisions.
(2) Focusing on companies with strong liquidity, high asset utilisation, low COGS, and adequate IPO proceeds can be a good strategy, especially in the first year post-IPO.
(3) Investors can identify companies that manage key variables within optimal ranges to make better investment decisions.

Overall, this study offers a framework for analysing post-IPO financial performance and provides actionable insights for strategic planning and investment decisions. Emphasis on threshold values and the interaction of variables enhances the ability of companies and investors to make informed decisions in a complex financial landscape.

6. Conclusions

This study contributes to finance and ML by introducing a novel hybrid approach that combines logistic regression with a DT algorithm to predict the multiclass post-IPO financial performance of companies entering the SET. Two key internal capability determinants (financial robustness and operational efficacy), as well as IPO proceeds, were analysed to understand their impact during the initial three years post-IPO. The proposed approach performs competitively: it outperforms traditional logistic regression and matches or exceeds the other ML algorithms on test-set accuracy and AUC (Table 1).

Combining logistic regression with the DT algorithm captured complex nonlinear relationships, with Lasso regularisation improving variable selection and managing overfitting. Interpretability is a significant advantage, especially in the financial context, because it clarifies financial drivers. Univariate and bivariate effects allowed a precise understanding of the significance thresholds and variable impacts. SHAP values revealed the influence of various determinants over the three years post-IPO: efficient asset management and cost control in the first year, balanced liquidity and high asset utilisation in the second year, and the impact of operational efficiency in the third year. This model provides deep insights into companies’ post-IPO financial performance and facilitates strategic decision-making aligned with evolving thresholds.
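The role of Lasso regularisation in variable selection can be sketched with a small L1-penalised logistic regression fitted by proximal gradient descent. This is a simplified binary illustration written from scratch, not the study's multiclass model or its software; the toy indicator data, the penalty `lam`, the step size, and the function names are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value: float, amount: float) -> float:
    """Shrink towards zero; anything within +/- amount becomes exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                    lam: float, step: float = 0.1,
                    iters: int = 2000) -> Tuple[List[float], float]:
    """Minimise mean log-loss + lam * sum(|w_j|); the intercept is unpenalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data (assumption): column 0 is an informative threshold indicator,
# column 1 is an indicator unrelated to the outcome.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

weights, bias = fit_l1_logistic(X, y, lam=0.1)
print(weights, bias)
```

The shrinkage step is what sets the coefficients of uninformative indicators to exactly zero, so only the thresholds that carry signal remain in the fitted model.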

This study has several limitations that open opportunities for future research. The model may not directly apply to markets with different economic and regulatory contexts. However, the proposed approach can be applied to other markets and IPO concepts, such as IPO underpricing prediction. Additionally, the financial measures used (ROA, ROE, and ROS) may not capture all aspects of a company’s financial health or market value. Future research could benefit from incorporating industry-specific variables and external economic factors, such as market dynamics, global economic trends, and political influences, which could yield intriguing insights. To inspire broader research applications in finance and economics studies, further exploration of cutting-edge deep-learning techniques could expand the model’s capabilities. Long Short-Term Memory Networks could be valuable if additional pre-IPO time-series data become available to capture the trends leading up to the IPO. Additionally, Convolutional Neural Networks and Graph Neural Networks can be employed to analyse unstructured or relational data, such as textual information in IPO filings or networked relationships between firms and investors. These methods can delve deeply into IPO application materials, reveal diverse information sources, and uncover complex patterns influencing IPO outcomes. Finally, extending the model to analyse long-term performance beyond the initial three years may offer insights into sustained financial trajectories, with periodic updates ensuring relevance to evolving market conditions.

This work was supported by the Sirindhorn International Institute of Technology (SIIT), Thammasat University; Japan Advanced Institute of Science and Technology (JAIST), Japan; and National Science and Technology Development Agency (NSTDA), Thailand. The work was also supported by the Thammasat University Research Fund [Grant number TUFT 33/2566].

Notes

1. Please see the Online Appendix.

Declaration of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Al Farooque, O., Buachoom, W. and Sun, L. (2020), “Board, audit committee, ownership and financial performance–emerging trends from Thailand”, Pacific Accounting Review, Vol. 32 No. 1, pp. 54-81, doi: https://doi.org/10.1108/par-10-2018-0079.

Appiahene, P., Missah, Y.M. and Najim, U. (2020), “Predicting bank operational efficiency using machine learning algorithm: comparative study of decision tree, random forest, and neural networks”, Advances in Fuzzy Systems, Vol. 2020, pp. 1-12, doi: https://doi.org/10.1155/2020/8581202.

Babalola, Y.A. (2013), “The effect of firm size on firms profitability in Nigeria”, Journal of Economics and Sustainable Development, Vol. 4 No. 5, pp. 90-94.

Billett, M.T., Flannery, M.J. and Garfinkel, J.A. (2006), “Are bank loans special? Evidence on the post-announcement performance of bank borrowers”, Journal of Financial and Quantitative Analysis, Vol. 41 No. 4, pp. 733-751, doi: https://doi.org/10.1017/s0022109000002623.

Breiman, L., Friedman, J.H., Olshen, R. and Stone, C.J. (1984), Classification and Regression Trees, 1st ed., Routledge.

Chiadamrong, N. and Wattanawarangkoon, T. (2023), “Financial performance prediction model based on firms' internal capability determinants: evidence from listed firms in Thailand during the transition period of going public”, Cogent Engineering, Vol. 10 No. 1, 2216860, doi: https://doi.org/10.1080/23311916.2023.2216860.

Dambra, M., Field, L.C. and Gustafson, M.T. (2015), “The JOBS Act and IPO volume: evidence that disclosure costs affect the IPO decision”, Journal of Financial Economics, Vol. 116 No. 1, pp. 121-143, doi: https://doi.org/10.1016/j.jfineco.2014.11.012.

Delen, D., Kuzey, C. and Uyar, A. (2013), “Measuring firm performance using financial ratios: a decision tree approach”, Expert Systems with Applications, Vol. 40 No. 10, pp. 3970-3983, doi: https://doi.org/10.1016/j.eswa.2013.01.012.

Dumitrescu, E., Hué, S., Hurlin, C. and Tokpavi, S. (2022), “Machine learning for credit scoring: improving logistic regression with non-linear decision-tree effects”, European Journal of Operational Research, Vol. 297 No. 3, pp. 1178-1192, doi: https://doi.org/10.1016/j.ejor.2021.06.053.

El Khouli, R.H., Macura, K.J., Barker, P.B., Habba, M.R., Jacobs, M.A. and Bluemke, D.A. (2009), “Relationship of temporal resolution to diagnostic performance for dynamic contrast enhanced MRI of the breast”, Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine, Vol. 30 No. 5, pp. 999-1004, doi: https://doi.org/10.1002/jmri.21947.

Engelen, P.-J., Heugens, P., Van Essen, M., Turturea, R. and Bailey, N. (2020), “The impact of stakeholders' temporal orientation on short- and long-term IPO outcomes: a meta-analysis”, Long Range Planning, Vol. 53 No. 2, 101853, doi: https://doi.org/10.1016/j.lrp.2018.10.003.

Fairfield, P.M. and Yohn, T.L. (2001), “Using asset turnover and profit margin to forecast changes in profitability”, Review of Accounting Studies, Vol. 6 No. 4, pp. 371-385, doi: https://doi.org/10.1023/a:1012430513430.

Friedman, J.H. (2001), “Greedy function approximation: a gradient boosting machine”, Annals of Statistics, Vol. 29 No. 5, pp. 1189-1232, doi: https://doi.org/10.1214/aos/1013203451.

Hensler, D.A., Rutherford, R.C. and Springer, T.M. (1997), “The survival of initial public offerings in the aftermarket”, Journal of Financial Research, Vol. 20 No. 1, pp. 93-110, doi: https://doi.org/10.1111/j.1475-6803.1997.tb00238.x.

Hoang, T.V.H., Dang, N.H., Tran, M.D., Van Vu, T.T. and Pham, Q.T. (2019), “Determinants influencing financial performance of listed firms: quantile regression approach”, Asian Economic and Financial Review, Vol. 9 No. 1, pp. 78-90, doi: https://doi.org/10.18488/journal.aefr.2019.91.78.90.

Hoerl, A.E. and Kennard, R.W. (1970), “Ridge regression: biased estimation for nonorthogonal problems”, Technometrics, Vol. 12 No. 1, pp. 55-67, doi: https://doi.org/10.2307/1267351.

İltaş, Y. and Demirgüneş, K. (2020), “Asset tangibility and financial performance: a time series evidence”, Ahi Evran Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, Vol. 6 No. 2, pp. 345-364, doi: https://doi.org/10.31592/aeusbed.731079.

Kayakus, M., Tutcu, B., Terzioglu, M., Talaş, H. and Ünal Uyar, G.F. (2023), “ROA and ROE forecasting in iron and steel industry using machine learning techniques for sustainable profitability”, Sustainability, Vol. 15 No. 9, p. 7389, doi: https://doi.org/10.3390/su15097389.

Lakkaraju, H., Kamar, E., Caruana, R. and Leskovec, J. (2017), “Interpretable & explorable approximations of black box models”, arXiv preprint arXiv:1707.01154.

Larrain, B., Phillips, G.M., Sertsios, G. and Urzúa, F. (2021), “The effects of going public on firm performance and commercialization strategy: evidence from international IPOs”, National Bureau of Economic Research.

Lundberg, S.M. and Lee, S.-I. (2017), “Advances in Neural Information Processing Systems 30: 31st Annual Conference on Neural Information Processing Systems (NIPS 2017)”, in Guyon, I., Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N. and Garnett, R. (Eds), Advances in Neural Information Processing Systems, NeurIPS Proceedings, Vol. 30, pp. 4768-4777.

Majumdar, S.K. (1997), “The impact of size and age on firm-level performance: some evidence from India”, Review of Industrial Organization, Vol. 12 No. 2, pp. 231-241, doi: https://doi.org/10.1023/a:1007766324749.

Munshi, M., Patel, M., Alqahtani, F., Tolba, A., Gupta, R., Jadav, N.K., Tanwar, S., Neagu, B.C. and Dragomir, A. (2022), “Artificial intelligence and exploratory-data-analysis-based initial public offering gain prediction for public investors”, Sustainability, Vol. 14 No. 20, 13406, doi: https://doi.org/10.3390/su142013406.

Nguyen, T.T.H., Phan, G.Q., Wong, W.-K. and Moslehpour, M. (2022), “The influence of market power on liquidity creation of commercial banks in Vietnam”, Journal of Asian Business and Economic Studies, Vol. 30 No. 3, pp. 166-168, doi: https://doi.org/10.1108/jabes-06-2021-0076.

Ong, C.Z., Mohd-Rashid, R., Anwar, A. and Mehmood, W. (2023), “Earnings forecast disclosures and oversubscription rates of fixed-price initial public offerings (IPOs): the case of Malaysia”, Journal of Asian Business and Economic Studies, Vol. 30 No. 4, pp. 270-282, doi: https://doi.org/10.1108/jabes-03-2022-0065.

Pagano, M., Panetta, F. and Zingales, L. (1998), “Why do companies go public? An empirical analysis”, The Journal of Finance, Vol. 53 No. 1, pp. 27-64, doi: https://doi.org/10.1111/0022-1082.25448.

Pechlivanidis, E., Ginoglou, D. and Barmpoutis, P. (2022), “Can intangible assets predict future performance? A deep learning approach”, International Journal of Accounting and Information Management, Vol. 30 No. 1, pp. 61-72, doi: https://doi.org/10.1108/ijaim-06-2021-0124.

Raut, R.K., Kumar, R. and Das, N. (2020), “Individual investors' intention towards SRI in India: an implementation of the theory of reasoned action”, Social Responsibility Journal, Vol. 17 No. 7, pp. 877-896, doi: https://doi.org/10.1108/srj-02-2018-0052.

Ribeiro, M.T., Singh, S. and Guestrin, C. (2016), “‘Why should I trust you?’ Explaining the predictions of any classifier”, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135-1144, doi: https://doi.org/10.1145/2939672.2939778.

Ritter, J.R. (1991), “The long-run performance of initial public offerings”, The Journal of Finance, Vol. 46 No. 1, pp. 3-27, doi: https://doi.org/10.2307/2328687.

Šarlija, N., Bilandžić, A. and Stanic, M. (2017), “Logistic regression modelling: procedures and pitfalls in developing and interpreting prediction models”, Croatian Operational Research Review, Vol. 8 No. 2, pp. 631-652, doi: https://doi.org/10.17535/crorr.2017.0041.

Shafiquea, A., Kashifb, A.R., Haiderc, A., Zaheerd, N. and Khan, S. (2021), “Effect of asset utilisation and corporate growth on financial performance”, International Journal of Innovation, Creativity and Change, Vol. 15, pp. 1104-1118.

Supsermpol, P., Thajchayapong, S., Chiadamrong, N. and others (2023), “Predicting financial performance for listed companies in Thailand during the transition period: a class-based approach using logistic regression and random forest algorithm”, Journal of Open Innovation: Technology, Market, and Complexity, Vol. 9 No. 3, 100130, doi: https://doi.org/10.1016/j.joitmc.2023.100130.

Tibshirani, R. (1996), “Regression shrinkage and selection via the lasso”, Journal of the Royal Statistical Society – Series B: Statistical Methodology, Vol. 58 No. 1, pp. 267-288, doi: https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

Yoo, W., Mayberry, R., Bae, S., Singh, K., He, Q.P. and Lillard, J.W. (2014), “A study of effects of multicollinearity in the multivariable analysis”, International Journal of Applied Science and Technology, Vol. 4 No. 5, pp. 9-19.

Zou, H. and Hastie, T. (2005), “Regularization and variable selection via the elastic net”, Journal of the Royal Statistical Society – Series B: Statistical Methodology, Vol. 67 No. 2, pp. 301-320, doi: https://doi.org/10.1111/j.1467-9868.2005.00503.x.

Supplementary material

The supplementary material for this article can be found online.

2025

Pornpawee Supsermpol, Van Nam Huynh, Suttipong Thajchayapong, Nathridee Suppakitjarak and Navee Chiadamrong

Published in Journal of Asian Business and Economic Studies. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
