Delays in water construction projects cause substantial financial losses and setbacks for the communities they serve. This study develops a stacking ensemble machine learning model to predict delay severity, helping project managers mitigate risk and support sustainable infrastructure development.
Five features (project duration, cost, climate zone, change costs, and adjustment costs) were selected on the basis of a literature review and data from 439 real water project contracts. The data were preprocessed with scikit-learn, using standardization and Elliptic Envelope outlier detection. Four base learners (artificial neural network (ANN), decision tree, random forest, and k-nearest neighbors (KNN)) were tuned via grid search and combined in a stacking model with a random forest meta-learner. The model was validated using repeated stratified 5-fold cross-validation.
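The pipeline described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification`), and the number of delay classes (4), the contamination rate, all hyperparameters, the KNN grid, and the number of cross-validation repeats are hypothetical placeholders that the abstract does not specify. Scaling is placed inside the pipeline so that it is refit on each training fold, which is a design choice of this sketch.

```python
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    cross_val_score,
)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the contract data (assumption): 439 rows,
# 5 features, 4 delay-severity classes.
X, y = make_classification(
    n_samples=439, n_features=5, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Elliptic Envelope outlier detection: fit_predict returns +1 for
# inliers and -1 for outliers. The 5% contamination rate is assumed.
inlier_mask = EllipticEnvelope(contamination=0.05, random_state=0).fit_predict(X) == 1
X_clean, y_clean = X[inlier_mask], y[inlier_mask]

# Grid search shown for one base learner only, with a hypothetical grid.
knn_search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=3)

base_learners = [
    ("ann", MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", knn_search),
]

# Stacking: out-of-fold predictions of the base learners become the
# inputs of a random forest meta-learner.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    cv=5,
)

# Standardization is refit on each training fold inside the pipeline.
model = make_pipeline(StandardScaler(), stack)

# Repeated stratified 5-fold cross-validation (2 repeats assumed).
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
scores = cross_val_score(model, X_clean, y_clean, cv=cv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The scores printed here describe the synthetic data only and are unrelated to the results reported in this study.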
The stacking model achieves an accuracy of 0.957, an F1-score of 0.957, and a Kappa of 0.935, outperforming the individual algorithms by up to 5.5% and exceeding previously reported benchmarks. It also performs well in the critical delay classes, with an error of 4.4% for the 30–60% class, which supports more precise risk prediction and resource allocation.
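For readers unfamiliar with these metrics, the sketch below shows how they are typically computed with scikit-learn. The labels are a small hand-made example, not the study's predictions. It assumes that Kappa refers to Cohen's kappa, that the F1-score is support-weighted across classes, and that the per-class error is one minus the class recall. The abstract does not confirm any of these choices.

```python
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    confusion_matrix,
    f1_score,
)

# Toy labels for three hypothetical delay classes (not the study's data).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0]

accuracy = accuracy_score(y_true, y_pred)
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # averaging is assumed
kappa = cohen_kappa_score(y_true, y_pred)  # agreement corrected for chance

# Per-class error, taken here as 1 - recall: the share of each true
# class that the model assigns to a different class.
cm = confusion_matrix(y_true, y_pred)
per_class_error = 1 - cm.diagonal() / cm.sum(axis=1)

print(f"accuracy={accuracy:.3f} f1={f1_weighted:.3f} kappa={kappa:.3f}")
print("per-class error:", per_class_error.round(3))
```

Kappa is lower than accuracy in this example because it discounts the agreement that would be expected by chance given the class frequencies.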
This study applies stacking ensemble learning to delay prediction in water projects for the first time, using real contract data to reduce the risk of bias and overfitting. It provides a framework for proactive planning, cost-efficient buffering, and resilient project delivery in construction management.
