The variance between the winning bid and the owner's estimated cost (OEC) is one of the construction management risks in the pre-tendering phase. The study aims to enhance the quality of the owner's estimation for predicting precisely the contract cost at the pre-tendering phase and avoiding future issues that arise through the construction phase.
This paper integrated artificial neural networks (ANN), deep neural networks (DNN) and time series (TS) techniques to estimate the ratio of a low bid to the OEC (R) for different size contracts and three types of contracts (building, electric and mechanic) accurately based on 94 contracts from King Saud University. The ANN and DNN models were evaluated using mean absolute percentage error (MAPE), mean sum square error (MSSE) and root mean sums square error (RMSSE).
The main finding is that the ANN provides high accuracy, with MAPE, MSSE and RMSSE of 2.94%, 0.0015 and 0.039, respectively. The DNN's precision was high, with an RMSSE of 0.15 on average.
The owner and consultant are expected to use the study's findings to improve the accuracy of the owner's estimate and reduce the difference between the owner's estimate and the lowest submitted offer, supporting better decision-making.
This study fills the knowledge gap by developing an ANN model to handle missing TS data and forecasting the difference between a low bid and an OEC at the pre-tendering phase.
1. Introduction
Compared to other industries, the construction sector has one of the highest annual business failure rates, with the associated adverse effects (Chapman, 2001; Mahamid, 2018). Construction management needs to address many uncertainties and risks. The difference between the winning bid and the owner's estimated cost (OEC) is one of these risks in the pre-tendering phase. The winning bid is often the lowest bid submitted to a tender. In the United States, the low bid method is widely used to award construction contracts at the price of the lowest responsive bid (Gransberg and Gransberg, 2020). A significant variation between the lowest bid (winning bid) and the OEC is problematic for both parties. Such variances may lead to contract delays or cancellations, scope reductions and public distrust (Baek et al., 2019). For example, a bid much higher than the OEC is likely to cause problems with budget allocation, which might delay or terminate the project, while a low bid significantly lower than the OEC could result in cost overruns (Li et al., 2022; WSDOT, 2011). Cost estimation is also crucial for initiating the project and strongly affects its performance. Therefore, it is essential to provide full justification for cost variation to prevent the misuse of public funds and ensure the greatest possible economic outcome for all stakeholders (Carr, 2005).
The Federal Highway Administration (FHWA, 2004) recommended that the difference ratio between the low bid and OEC be within ±10% (Li et al., 2022). Moreover, the California Department of Transportation (Caltrans) devised a performance metric comparing the OEC to low bids (i.e. low bids within 10% of the OEC) to assess the precision of cost estimation (Caltrans, Planning Cost Estimate, 2006). However, several agencies have had difficulty achieving and maintaining an acceptable range of low bids to OEC for their highway projects (Li et al., 2021).
Only a few studies (Li et al., 2022) attempted to improve the quality of the OEC and enhance the estimation process by forecasting the ratio of a low bid to OEC (R) using time series (TS) techniques. However, this model was only used for highway construction, and it did not consider the amount of the contract cost or different contract types.
For this paper, the R data for 94 contracts were collected from 2010 to 2021 for building, electric and mechanic contracts. However, there is a significant shortage of information on the TS of the R for the different OECs of the three types of contracts. This shortage prevents the deep neural networks (DNN) from predicting the future TS using long short-term memory (LSTM), which has merit in dealing with the non-stationary and nonlinear characteristics of a TS (Bala and Singh, 2019). The integrated artificial neural networks (ANN) model is developed to address this lack of information. The techniques of maximizing data (Pasini, 2015) and improving data quality (Zayed, 2001) were integrated into the developed ANN model. The input layer of the developed ANN model has a specific configuration that allows the generation of the TS for different amounts of OEC for the three contract types from 2010 to 2021. This input layer methodology has not been studied before and is a contribution of this paper. The DNN model was used to forecast the future TS using this generated data. The input layer configuration, together with the techniques used in the developed ANN model, will assist researchers and engineers in establishing a reasonable TS for different applications. Moreover, the paper's results will improve the accuracy of the OEC for assessing the contract cost at the pre-tendering phase and help prevent problems that develop during the construction phase (such as owner financial difficulties or cost overruns for contractors). The significance of the study lies in enhanced cost estimation, improved decision-making, early identification of cost deviations and optimization of bidding strategies.
2. Literature review
Several studies dealt with the difference between the low bid and the OEC in the pre-tendering phase by investigating the causes that increase this difference (Alsugair, 2022; Flyvbjerg et al., 2003; Saqer et al., 2020). Other studies developed forecasting models of cost deviation. Li et al. (2021) used regression analysis to measure the effects of influential factors on the cost deviation. Their explanatory model was developed using cost data collected between 2011 and 2015 for Louisiana highway construction contracts. They stated that cost deviation is significantly influenced by the level of bidding competition, the scope of the contract, the number of activities, the crude oil price and the value of the paving projects. Li et al. (2022) identified and analyzed the risk variables affecting the accuracy of the client's estimate for highway projects and predicted the ratio of a low bid to the OEC using a TS model.
TS forecasting can be performed in various ways, commonly grouped into traditional statistical models and nonlinear models. The first category represents a linear analysis of the previous observations and includes averaging, exponential smoothing and autoregressive integrated moving average (ARIMA) models. Table 1 lists studies that applied traditional statistical methods to TS. The nonlinear methods aim to overcome the linear limitation of TS models. Khashei and Bijari (2011) summarized these methods as the bilinear model, the threshold autoregressive (TAR) model, the autoregressive conditional heteroscedastic (ARCH) model, the generalized autoregressive conditional heteroscedastic (GARCH) model, chaotic dynamics and ANN. Table 2 summarizes the research on nonlinear methods for TS. In addition, traditional TS approaches such as ARIMA, SARIMA (seasonal ARIMA) and ETS (error, trend, seasonality model) are designed to handle a TS with a single seasonality; when several seasonal patterns occur, these techniques do not perform as well (Naim et al., 2018).
Table 1. Studies that applied traditional methods to TS
| References | Method | Application |
|---|---|---|
| Hwang (2011) | ARMA | Construction cost |
| Corrêa et al. (2016) | Auto-Regressive Integrated Moving Average with eXogenous variables and Generalized Auto-Regressive Conditional Heteroscedasticity (WARIMAX-GARCH) | Information technology |
| Zhao et al. (2020) | Causal method + Seasonal ARIMA (SARIMA) | Building cost index |
| Zhao et al. (2019) | Exponential smoothing models (ESM) + SARIMA | Building cost index |
| Rubio et al. (2016) | Fuzzy Time Series (FTS) | Economic applications |
| Naim et al. (2018) | BATS (Exponential smoothing state space with Box-Cox transformation, ARMA errors, Trend and Seasonal components) + TBATS (Trigonometric Exponential smoothing state space with Box-Cox transformation, ARMA errors, Trend and Seasonal components) | Natural gas consumption |
Source(s): Authors’ own work
Table 2. Research that applied nonlinear methods to TS
| References | Method | Application |
|---|---|---|
| Almonacid et al. (2013) | ANN | Energy science |
| Camelo et al. (2018) | ANN | Wind generation |
| Iwana and Uchida (2021) | DNN | Data augmentation of TS |
| Kardakos et al. (2013) | ANN | Power generation |
| Khashei and Bijari (2011) | ARIMA + ANN | Information technology |
| Withington et al. (2021) | ANN | Expert system application |
| Pai and Lin (2005) | ARIMA + Support vector machines model (SVM) | Stock price |
| Chen and Wang (2007) | SARIMA + SVM | Industry application |
| Khashei et al. (2009) | ANN + ARIMA + Fuzzy logic | Information technology |
Source(s): Authors’ own work
Rashid and Louis (2019) illustrated the merits of using DNN. They stated that recent developments in DNN, specifically recurrent neural networks (RNN), present new opportunities to classify sequential TS data with recurrent lateral connections. In addition, Bala and Singh (2019) stated that a TS's non-stationary and nonlinear characteristics can be learned by an LSTM network, reducing the prediction error. The LSTM is used as one layer in the DNN. Therefore, the DNN was used to forecast the TS.
TS often suffer from missing data and are thus difficult to use in forecasting. There are conventional and modern techniques for dealing with missing data, as listed in Table 3. However, these methods may not handle a relatively large amount of missing data, which affects the accuracy of the analysis. In addition, some of the methods, such as DNN, require large data sets to treat missing data.
Table 3. Techniques for handling missing data in time series
| Reference | Method |
|---|---|
| Conventional techniques | |
| Andridge and Little (2010) | Hot and Cold Deck Imputation |
| Strike et al. (2001), Dhevi (2014) | Mean Imputation |
| Little and Rubin (2019) | Multiple Imputation |
| Modern techniques | |
| Lobato et al. (2015), Aydilek et al. (2013), Azadeh et al. (2013) | Genetic Algorithm Optimization Based |
| Wu et al. (2015) | Support Vector Machine |
| Shao et al. (2014) | Interpolation |
| Banbura et al. (2014) | Maximum Likelihood |
| Amiri et al. (2016) | Fuzzy-Rough Set |
| Sitaram et al. (2015) | Similarity Measure |
| Zhang et al. (2022) | Bayesian Dynamic Regression |
| Torres et al. (2021) | DNN |
Source(s): Authors’ own work
In this paper, the ANN was developed to handle the missing data without affecting the data quality. However, the limited availability of the required data is a challenge when using an ANN. Maweu et al. (2021) stated that data scarcity and class imbalance are common in healthcare datasets and undermine the classification performance of machine learning models. The maximize data technique (Pasini, 2015) and the improved data quality method (Zayed, 2001) were used to overcome the small-data issue. However, the two techniques may not be adequate for TS data. As a new contribution of this paper, the time variable, which ranges from 2010 to 2021, was decomposed into 12 variables (one for each year), each set to zero or one depending on the data used. The ANN was then used to compute the TS from 2010 to 2021 based on the 94 collected data after implementing the maximize data technique (Pasini, 2015), and it was evaluated using mean absolute percentage error (MAPE), mean sum square error (MSSE) and root mean sum square error (RMSSE). ANNs are used to generate the TS because one key benefit of ANN models over other types of nonlinear models is that they are universal approximators that can estimate a broad class of functions with high accuracy. Their strength derives from the parallel processing of the information in the data (Khashei and Bijari, 2011).
3. Methodology
The methodology uses ANN and DNN to predict a contract cost more accurately. It consists of five main steps: (1) collect data to create the database; (2) implement sample size and normality tests to check that the collected sample represents the population; (3) develop an ANN to compute the additional current TS; (4) generate the TS; (5) establish a DNN to forecast the future TS. The flow chart of the methodology is shown in Figure 1.
3.1 Collect data
The difference between the OEC and the contract value is affected by many factors and causes that differ from one country to another and from one work environment to another. Therefore, limiting the study to a specific environment was necessary to facilitate data collection and to neutralize some known factors (including administrative procedures) and unknown factors. Thus, the projects of KSU (King Saud University) in Riyadh, KSA (Kingdom of Saudi Arabia), were selected as the field of study.
The cost data used to assess estimation accuracy cover 94 projects completed at KSU between 2010 and 2021. The projects are classified as building, highway, electric and mechanic. The data contain the initial estimated cost, year of award, contract amount and project type; the initial estimated cost is difficult to acquire. Table 4 presents the data for the 94 projects. The cost deviation is expressed as the ratio R of the contract amount (CC) to the OEC, computed using Eq. (1) and shown in the sixth column:

$$R = \frac{CC}{OEC} \qquad (1)$$
Table 4. Collected data
| No. | Project | Time | CC (M SAR) | OEC (M SAR) | R | No. | Project | Time | CC (M SAR) | OEC (M SAR) | R |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Building | 2013 | 0.207 | 0.250 | 0.830 | 48 | Building | 2019 | 0.498 | 0.499 | 0.999 |
| 2 | Building | 2013 | 0.285 | 0.285 | 1.000 | 49 | Building | 2018 | 0.285 | 0.290 | 0.983 |
| 3 | Building | 2014 | 0.241 | 0.250 | 0.965 | 50 | Building | 2018 | 0.297 | 0.297 | 1.000 |
| 4 | Building | 2014 | 140.727 | 200.000 | 0.704 | 51 | Building | 2017 | 0.297 | 0.297 | 1.000 |
| 5 | Building | 2014 | 0.404 | 0.990 | 0.408 | 52 | Building | 2017 | 0.285 | 0.300 | 0.950 |
| 6 | Building | 2014 | 0.321 | 0.380 | 0.845 | 53 | Building | 2018 | 0.248 | 0.239 | 1.037 |
| 7 | Building | 2017 | 0.155 | 0.240 | 0.644 | 54 | Building | 2018 | 0.230 | 0.230 | 1.000 |
| 8 | Building | 2017 | 0.075 | 0.075 | 0.996 | 55 | Building | 2017 | 0.247 | 0.450 | 0.550 |
| 9 | Building | 2017 | 32.142 | 35.000 | 0.918 | 56 | Building | 2017 | 0.469 | 0.490 | 0.957 |
| 10 | Building | 2017 | 0.288 | 0.288 | 1.000 | 57 | Building | 2018 | 0.499 | 0.499 | 1.000 |
| 11 | Building | 2018 | 0.018 | 0.018 | 1.011 | 58 | Building | 2018 | 0.478 | 0.478 | 1.000 |
| 12 | Building | 2018 | 0.168 | 0.180 | 0.932 | 59 | Building | 2018 | 0.451 | 0.451 | 1.000 |
| 13 | Building | 2018 | 0.265 | 0.300 | 0.885 | 60 | Electric | 2018 | 2.849 | 3.500 | 0.814 |
| 14 | Building | 2018 | 0.087 | 0.610 | 0.143 | 61 | Electric | 2018 | 0.227 | 0.260 | 0.872 |
| 15 | Building | 2018 | 0.480 | 0.500 | 0.960 | 62 | Electric | 2018 | 0.105 | 0.135 | 0.778 |
| 16 | Building | 2018 | 0.257 | 0.260 | 0.988 | 63 | Electric | 2015 | 0.479 | 0.490 | 0.978 |
| 17 | Building | 2018 | 0.690 | 0.700 | 0.986 | 64 | Electric | 2018 | 0.290 | 0.300 | 0.967 |
| 18 | Building | 2018 | 0.206 | 0.207 | 0.997 | 65 | Electric | 2018 | 0.295 | 0.300 | 0.983 |
| 19 | Building | 2018 | 0.180 | 0.180 | 1.000 | 66 | Electric | 2019 | 5.782 | 6.000 | 0.964 |
| 20 | Building | 2019 | 0.489 | 0.490 | 0.998 | 67 | Electric | 2020 | 0.464 | 0.500 | 0.928 |
| 21 | Building | 2019 | 0.491 | 0.495 | 0.993 | 68 | Electric | 2020 | 0.482 | 0.500 | 0.965 |
| 22 | Building | 2019 | 0.482 | 0.485 | 0.994 | 69 | Electric | 2017 | 0.300 | 0.300 | 1.000 |
| 23 | Building | 2019 | 0.490 | 0.495 | 0.989 | 70 | Electric | 2017 | 0.300 | 0.300 | 1.000 |
| 24 | Building | 2019 | 11.835 | 10.000 | 1.184 | 71 | Electric | 2017 | 0.096 | 0.120 | 0.802 |
| 25 | Building | 2019 | 0.232 | 0.299 | 0.776 | 72 | Mechanical | 2010 | 8.850 | 10.00 | 0.885 |
| 26 | Building | 2014 | 0.497 | 0.498 | 0.998 | 73 | Mechanical | 2011 | 21.49 | 24.00 | 0.895 |
| 27 | Building | 2014 | 0.210 | 0.230 | 0.912 | 74 | Mechanical | 2011 | 13.96 | 10.00 | 1.396 |
| 28 | Building | 2014 | 0.464 | 0.500 | 0.927 | 75 | Mechanical | 2018 | 0.353 | 0.353 | 1.000 |
| 29 | Building | 2015 | 0.479 | 0.490 | 0.978 | 76 | Mechanical | 2017 | 0.260 | 0.270 | 0.961 |
| 30 | Building | 2015 | 0.320 | 0.350 | 0.915 | 77 | Mechanical | 2017 | 0.291 | 0.299 | 0.973 |
| 31 | Building | 2015 | 0.769 | 0.800 | 0.961 | 78 | Mechanical | 2017 | 0.260 | 0.270 | 0.961 |
| 32 | Building | 2015 | 0.493 | 0.498 | 0.989 | 79 | Mechanical | 2017 | 0.593 | 0.593 | 1.000 |
| 33 | Building | 2015 | 0.498 | 0.499 | 0.998 | 80 | Mechanical | 2017 | 0.259 | 0.285 | 0.910 |
| 34 | Building | 2015 | 0.700 | 0.800 | 0.875 | 81 | Mechanical | 2017 | 0.296 | 0.298 | 0.992 |
| 35 | Building | 2016 | 0.492 | 0.498 | 0.987 | 82 | Mechanical | 2018 | 0.072 | 0.075 | 0.955 |
| 36 | Building | 2017 | 0.260 | 0.275 | 0.945 | 83 | Mechanical | 2018 | 0.036 | 0.040 | 0.888 |
| 37 | Building | 2018 | 0.221 | 0.240 | 0.922 | 84 | Mechanical | 2018 | 0.593 | 0.600 | 0.989 |
| 38 | Building | 2018 | 0.186 | 0.200 | 0.929 | 85 | Mechanical | 2018 | 0.296 | 0.300 | 0.987 |
| 39 | Building | 2018 | 0.223 | 0.235 | 0.947 | 86 | Mechanical | 2018 | 0.042 | 0.061 | 0.683 |
| 40 | Building | 2018 | 0.097 | 0.099 | 0.979 | 87 | Mechanical | 2018 | 0.036 | 0.038 | 0.934 |
| 41 | Building | 2018 | 0.040 | 0.042 | 0.950 | 88 | Mechanical | 2018 | 0.090 | 0.090 | 1.000 |
| 42 | Building | 2019 | 0.475 | 0.480 | 0.989 | 89 | Mechanical | 2018 | 0.072 | 0.120 | 0.597 |
| 43 | Building | 2019 | 0.375 | 0.390 | 0.962 | 90 | Mechanical | 2019 | 0.229 | 0.320 | 0.717 |
| 44 | Building | 2019 | 0.067 | 0.070 | 0.963 | 91 | Mechanical | 2019 | 0.375 | 0.375 | 1.000 |
| 45 | Building | 2021 | 34.669 | 35.000 | 0.991 | 92 | Mechanical | 2019 | 0.229 | 0.320 | 0.717 |
| 46 | Building | 2021 | 59.139 | 65.000 | 0.910 | 93 | Mechanical | 2018 | 0.460 | 0.490 | 0.938 |
| 47 | Building | 2019 | 0.469 | 0.469 | 1.000 | 94 | Mechanical | 2019 | 0.116 | 0.135 | 0.853 |
Source(s): Authors’ own work
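The ratio in the R column of Table 4 can be checked with a short calculation. The sketch below assumes that R is the contract amount (CC) divided by the OEC, which matches the tabulated values; the function name `bid_ratio` and the ±10% check (taken from the FHWA recommendation cited in the Introduction) are illustrative choices, not part of the authors' workflow.

```python
def bid_ratio(cc: float, oec: float) -> float:
    """Ratio R of the contract amount (CC) to the owner's estimated cost (OEC)."""
    if oec <= 0:
        raise ValueError("OEC must be positive")
    return cc / oec


# (No., CC, OEC) for three rows copied from Table 4, in million SAR.
rows = [(4, 140.727, 200.000), (9, 32.142, 35.000), (74, 13.96, 10.00)]

# R rounded to three decimals, as in the table.
ratios = {no: round(bid_ratio(cc, oec), 3) for no, cc, oec in rows}

# Illustrative check against a +/-10% band around R = 1.
within_10_percent = {no: abs(r - 1.0) <= 0.10 for no, r in ratios.items()}
```

For these rows the computed ratios agree with the tabulated R values, and only project No. 9 falls inside the ±10% band.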
3.2 Implement size and normality test
As the population (construction projects) is large and of unknown size, the minimum sample size n can be computed using Eq. (2) (Badawy et al., 2022):

$$n = \frac{Z^{2}\,p\,(1-p)}{C^{2}} \qquad (2)$$
where Z is the value corresponding to a 95% confidence level and is equal to 1.96, and p represents the probability choice of 0.5. C is the confidence interval, which should be less than 0.2 (Badawy et al., 2022). The minimum sample size for a confidence level of 95% was 44, which was less than the number of data (97 data). The ratio of low bid to OEC was then checked for normality within each project classification using the Kolmogorov–Smirnov and Shapiro–Wilk tests in SPSS. The results revealed that the significance values of the two tests for the building, electric and mechanic projects were less than 0.05, as shown in Table 5. Under the usual interpretation of these tests, a significance value below 0.05 rejects the null hypothesis of normality, so the ratios for the three project types depart from a normal distribution. The two tests could not be applied to highway projects because of the limited number of highway projects.
Table 5. Tests of normality
| Project type | Kolmogorov–Smirnov (a) statistic | Kolmogorov–Smirnov (a) df | Kolmogorov–Smirnov (a) Sig. | Shapiro–Wilk statistic | Shapiro–Wilk df | Shapiro–Wilk Sig. |
|---|---|---|---|---|---|---|
| Building | 0.296 | 59 | 0.000 | 0.560 | 59 | 0.000 |
| Electric | 0.282 | 12 | 0.009 | 0.829 | 12 | 0.020 |
| Mechanic | 0.227 | 24 | 0.002 | 0.782 | 24 | 0.000 |
Note(s): a. Lilliefors significance correction
Source(s): Authors’ own work
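The sample-size check in this section can be sketched as follows. This assumes the common form n = Z²·p·(1 − p)/C² for a large population of unknown size, which uses exactly the quantities Z, p and C defined above. The value of C behind the reported minimum of 44 is not stated, so the two values of C below are illustrative only, and `min_sample_size` is a hypothetical helper name.

```python
import math


def min_sample_size(z: float, p: float, c: float) -> int:
    """Minimum sample size for a large population of unknown size.

    Assumed form: n = z^2 * p * (1 - p) / c^2, rounded up to a whole sample.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if c <= 0.0:
        raise ValueError("c must be positive")
    return math.ceil(z * z * p * (1.0 - p) / (c * c))


# Z = 1.96 (95% confidence) and p = 0.5 as in the text; C values are illustrative.
n_at_020 = min_sample_size(1.96, 0.5, 0.20)
n_at_015 = min_sample_size(1.96, 0.5, 0.15)
```

Under this assumed form, any C between 0.15 and 0.2 gives a minimum sample size well below the 94 collected contracts.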
3.3 Develop ANN model
The primary purpose of the developed ANN model is to compute the TS of the ratio of the low bid to the OEC for several OEC values and project types. The ANN model generally consists of three layers: input, hidden and output. The hidden part may comprise one or two layers with different numbers of nodes (neurons). An ANN is a method that computes the output by learning to approximate a function from data (Loy, 2019). The benefits of the ANN are its simplicity and its ability to deal with nominal data, such as the project type and time in this paper. The IBM SPSS software can draw an ANN model with bias values and weight connections between neurons. The software also reports the relative errors for the training and testing data against the expected output values, which simplifies the choice of the training and testing data percentages.
3.3.1 Establish the ANN model's structure
Data are split into training and testing datasets. Data from January 2010 to December 2017 are used for fitting bid distributions and training the neural network. The constructed network forecasts low bids in 2018 (out-of-sample predictability). No unique relation or rule controls the number of neurons in the hidden layers. Zayed (2001) suggested determining the number of neurons in each hidden layer as 2m + 1, where m is the number of neurons in the input layer. The input layer used the OEC, time (year of award) and project type, while R was the output. The time ranged from 2010 to 2021 and was treated as nominal data in the ANN model, that is, as factors whose value is set to zero or one. Therefore, the time is represented in the ANN by 12 neurons (one neuron for each year). In addition, the project type was treated as nominal data and represented in the input layer by three neurons: B (building), E (electric) and M (mechanic). Hence, the number of input neurons (m) was 16 (1 for OEC, 12 for time and 3 for project types).
Regarding the hidden part, two layers with thirty-three neurons each (2m + 1) were used in the ANN model. The structure of the ANN model is shown in Figure 2. For example, when the data of project No. 1 in Table 4 are used in the ANN model, the neurons of the OEC, B and 2013 are set to 0.250, 1 and 1, respectively, while the other neurons of the input layer are set to zero. The R′ neuron in the output layer is set to 0.830.
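The input-layer configuration described above can be sketched in a few lines. The ordering of the 16 inputs (OEC first, then the 12 year indicators, then the 3 project-type indicators) and the names `encode_input` and `hidden_neurons` are assumptions made for illustration; the OEC value is that of project No. 1 in Table 4, in million SAR.

```python
YEARS = list(range(2010, 2022))   # 12 award years, 2010 to 2021
PROJECT_TYPES = ["B", "E", "M"]   # building, electric, mechanic


def encode_input(oec: float, year: int, project_type: str) -> list:
    """Build the 16-value input vector: OEC, 12 year flags, 3 type flags."""
    if year not in YEARS:
        raise ValueError("year must be between 2010 and 2021")
    if project_type not in PROJECT_TYPES:
        raise ValueError("project_type must be 'B', 'E' or 'M'")
    year_flags = [1.0 if y == year else 0.0 for y in YEARS]
    type_flags = [1.0 if t == project_type else 0.0 for t in PROJECT_TYPES]
    return [float(oec)] + year_flags + type_flags


def hidden_neurons(m: int) -> int:
    """Hidden-layer size suggested by the 2m + 1 rule."""
    return 2 * m + 1


# Project No. 1 in Table 4: building, awarded in 2013, OEC = 0.250 M SAR.
x = encode_input(0.250, 2013, "B")
m = len(x)
```

Here `m` equals 16 and `hidden_neurons(m)` gives 33, matching the layer sizes stated in the text.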
3.3.2 Maximize data
Because the data set was relatively small, consisting of 94 data sets, the data were augmented using the method introduced by Pasini (2015). In this paper, the data were divided into 10 subgroups; one subgroup was used as test data while the others were used together as train data. Depending on which subgroup served as test data, 10 train data groups were generated, as shown in Figure 3. The first and second analyses were then implemented, as described in the following sections.
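The grouping described above resembles a 10-fold partition, and a minimal sketch is given here. How the authors assigned contracts to subgroups is not stated, so the contiguous split by index below is an assumption, and `make_train_groups` is a hypothetical helper name.

```python
def make_train_groups(n_samples: int, n_subgroups: int = 10) -> list:
    """Return (train_indices, test_indices) pairs, one per held-out subgroup.

    Assumption: subgroups are contiguous index ranges whose sizes differ by
    at most one.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_subgroups)

    # Build the subgroups; the first `extra` subgroups get one more sample.
    subgroups, start = [], 0
    for g in range(n_subgroups):
        size = base + (1 if g < extra else 0)
        subgroups.append(indices[start:start + size])
        start += size

    # Each subgroup is held out once; the remaining nine form the train data.
    groups = []
    for g, test in enumerate(subgroups):
        train = [i for h, sub in enumerate(subgroups) if h != g for i in sub]
        groups.append((train, test))
    return groups


groups = make_train_groups(94)
```

With 94 contracts, each train data group holds 84 or 85 contracts and every contract is held out exactly once.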
3.3.3 First analysis
The first analysis consists of three steps: running the ANN model, assessing the ANN model and enhancing training data.
3.3.3.1 Run ANN model
The ANN model was run several times for each train data group, and the output for each group was taken as the average of the computed R values. Notably, the structure of the ANN model is the same for the 10 train groups. However, the weight values of the connections among the layers differ. Therefore, there were 10 distinct ANN models (one per train data group).
3.3.3.2 Assess the ANN models
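The ANN model was evaluated using three statistical indicators: MAPE, MSSE and RMSSE, given by Eqs (3)–(5), respectively:

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{R_i - R'_i}{R_i}\right| \qquad (3)$$

$$\mathrm{MSSE} = \frac{1}{n}\sum_{i=1}^{n}\left(R_i - R'_i\right)^{2} \qquad (4)$$

$$\mathrm{RMSSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - R'_i\right)^{2}} \qquad (5)$$

where $R_i$ and $R'_i$ are the observed and computed values of R, respectively, and n is the number of data sets in the train data group.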
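The three indicators can be sketched as below, assuming the standard definitions: MAPE as the mean absolute relative error in percent, MSSE as the mean of the squared errors, and RMSSE as the square root of MSSE. The function names and the sample values are hypothetical and serve only to illustrate the calculation.

```python
import math


def _check(observed, computed):
    if len(observed) != len(computed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")


def mape(observed, computed):
    """Mean absolute percentage error, in percent."""
    _check(observed, computed)
    total = sum(abs(o - c) / abs(o) for o, c in zip(observed, computed))
    return 100.0 * total / len(observed)


def msse(observed, computed):
    """Mean of the squared errors."""
    _check(observed, computed)
    return sum((o - c) ** 2 for o, c in zip(observed, computed)) / len(observed)


def rmsse(observed, computed):
    """Square root of the mean squared error."""
    return math.sqrt(msse(observed, computed))


# Hypothetical observed and computed R values for three contracts.
observed = [1.0, 0.8, 0.5]
computed = [0.9, 0.8, 0.6]
scores = (mape(observed, computed), msse(observed, computed), rmsse(observed, computed))
```

Under these definitions RMSSE is the square root of MSSE, which is consistent with the averages reported in the abstract (the square root of 0.0015 is about 0.039).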
3.3.3.3 Improve the training data
For each training data group, the training data were improved by deleting the abnormal data, that is, the data with a critical residual error that contributed a significant error. The absolute percentage error (APE) was used to identify the abnormal data and can be computed using Eq. (6):

$$\mathrm{APE}_i = \left|\frac{R_i - R'_i}{R_i}\right| \qquad (6)$$

where $R_i$ is the observed R of the ith contract and $R'_i$ is the R computed for the ith contract by the ANN model. Badawy (2020) considers 0.2 the threshold between allowable and unallowable relative error. In this paper, data with an $\mathrm{APE}_i$ value of more than 0.2 are considered abnormal and deleted from the train data group. After the abnormal data were deleted, the modified train data group was established and used in the second analysis.
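The filtering step can be sketched as follows, assuming APE is the absolute difference between observed and computed R divided by the observed R. The observed values are taken from Table 4 (projects No. 1, 14, 2 and 5); the computed values are hypothetical, and `drop_abnormal` is an illustrative helper name.

```python
def ape(observed: float, computed: float) -> float:
    """Absolute percentage error as a fraction of the observed value."""
    return abs(observed - computed) / abs(observed)


def drop_abnormal(records, threshold: float = 0.2):
    """Keep only (observed, computed) pairs whose APE does not exceed the threshold."""
    return [(o, c) for o, c in records if ape(o, c) <= threshold]


# (observed R from Table 4, hypothetical computed R)
records = [(0.830, 0.900), (0.143, 0.850), (1.000, 0.960), (0.408, 0.800)]
kept = drop_abnormal(records)
```

In this example the two contracts with very low observed ratios (0.143 and 0.408) exceed the 0.2 threshold and are removed from the train data group.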
3.3.4 Second analysis
The ANN model was trained on the ten modified train data groups, yielding 10 ANN models. The output of the models was evaluated using MAPE, MSSE and RMSSE. One of the 10 ANN models was then used to generate the TS.
3.4 Generate TS from 2010 to 2021 using the ANN model
The selected ANN model was used to generate the TS of the R from 2010 to 2021 for the three types of projects (building, electric and mechanic) and several OECs (10,000 SAR, 100,000 SAR, 1,000,000 SAR, 10,000,000 SAR and 100,000,000 SAR).
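The generation step amounts to querying the trained model over a grid of project types, OEC levels and years. The sketch below shows that grid only: `placeholder_predict` is a stand-in that returns a constant and is not the authors' trained ANN, and the choice to pass the OEC in SAR is an assumption.

```python
from itertools import product

YEARS = range(2010, 2022)
PROJECT_TYPES = ("B", "E", "M")
OEC_LEVELS_SAR = (10_000, 100_000, 1_000_000, 10_000_000, 100_000_000)


def generate_series(predict):
    """Build one 12-point yearly series of R per (project type, OEC) pair.

    `predict(oec_sar, year, project_type)` is any callable returning R.
    """
    series = {}
    for ptype, oec in product(PROJECT_TYPES, OEC_LEVELS_SAR):
        series[(ptype, oec)] = [predict(oec, year, ptype) for year in YEARS]
    return series


def placeholder_predict(oec_sar, year, project_type):
    """Stand-in for the trained ANN; always returns R = 1.0."""
    return 1.0


series = generate_series(placeholder_predict)
```

This yields 15 series (3 project types by 5 OEC levels), each with one value per year from 2010 to 2021.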
3.5 Forecast TS using DNN
A DNN was applied to each TS generated by the ANN model to predict the future TS. The DNN structure consists of five layers: sequence input, LSTM, dropout, fully connected and regression, as shown in Figure 4 (MATLAB, 2021). The input layer receives the TS generated by the ANN model; the data type is a sequence because of the nature of the TS. Regarding the LSTM layer, a standard RNN suffers from the vanishing gradient problem, and gradient-based learning is the primary component of training. Hochreiter and Schmidhuber (1997) designed the LSTM to overcome the vanishing gradient issue. An LSTM layer learns the long-term relationships between the time steps of sequence data. The layer's additive interactions can improve gradient flow over long training sequences. In addition, a DNN has a large number of layers, and overfitting is a serious concern in such networks because combining the predictions of many networks is difficult. Thus, dropout is used to address this problem. The fundamental idea is to randomly drop units, along with their connections, from the neural network while it is being trained. As a result, units are prevented from co-adapting too much (Srivastava et al., 2014).
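The dropout idea can be illustrated on a plain list of activations. This is a minimal sketch of the "inverted" variant, in which the surviving activations are rescaled during training; the rescaling is a common implementation choice assumed here, not a detail given in the text, and the drop rate of 0.5 is illustrative.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                  # fixed seed for a repeatable mask
dropped = dropout([1.0] * 1000, 0.5, rng)
unchanged = dropout([0.3, 0.7], 0.0, rng)
```

With a rate of 0.5, roughly half of the units are zeroed on each training pass, so no unit can rely on a specific partner always being present.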
The TS was divided into training and testing TS at 65% and 35%, respectively. The training and testing data were standardized using the mean of the ratio ($\bar{R}$) and its standard deviation (std), as shown in Eq. (7):

$$R_{\mathrm{std}} = \frac{R - \bar{R}}{\mathrm{std}} \qquad (7)$$

The DNN model was evaluated by computing the RMSSE between the observed and forecast R values throughout the testing process using Eq. (5). Note that the output data of the DNN should be unstandardized before the evaluation is carried out. The output can be unstandardized using Eq. (8):

$$R = R_{\mathrm{std}} \times \mathrm{std} + \bar{R} \qquad (8)$$
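The split and the standardization round trip can be sketched as follows. Several details are assumptions: the mean and standard deviation are taken from the training part only, the sample standard deviation is used, and the training length is rounded down. The 12-point series is hypothetical.

```python
import math
import statistics


def split_series(series, train_fraction=0.65):
    """Split a series chronologically into training and testing parts."""
    n_train = math.floor(len(series) * train_fraction)
    return series[:n_train], series[n_train:]


def standardize(values, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in values]


def unstandardize(values, mean, std):
    """Invert the standardization to return to the original scale of R."""
    return [v * std + mean for v in values]


# Hypothetical yearly R values, 2010 to 2021.
series = [0.90, 0.92, 0.95, 0.93, 0.96, 0.97, 0.95, 0.98, 0.99, 0.97, 0.96, 0.98]
train, test = split_series(series)
mu, sigma = statistics.mean(train), statistics.stdev(train)

train_std = standardize(train, mu, sigma)
restored = unstandardize(standardize(test, mu, sigma), mu, sigma)
```

Unstandardizing the forecasts before computing the RMSSE keeps the error on the same scale as R, so it can be compared with the ANN results.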
4. Results and discussions
A natural question is why TS techniques, rather than the ANN, were not used to fill in the missing data. The impact of missing data on TS analysis increases as the proportion of missing values grows. TS techniques can handle a low proportion of missing data (5%) accurately, but accuracy degrades when the proportion of missing values exceeds 10% (Junger and Leonm, 2015). In this paper, the proportion of missing data in the TS after taking the average R per year is shown in Table 6. The TS data used in this paper have a significant amount of missing data, reaching at least 30%. Therefore, TS techniques cannot handle such extensive missing data (more than 10%).
Table 6. Proportion of missing data for the different project types
| Project type | Building | Electric | Mechanic |
|---|---|---|---|
| Proportion of missing data | 33% | 41.67% | 58.33% |
Source(s): Authors’ own work
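The proportion of missing yearly values, and the 10% limit discussed above, can be checked with a short sketch. The series below is hypothetical (five missing years out of twelve, marked with `None`), and the helper name `missing_proportion` is introduced only for illustration.

```python
def missing_proportion(yearly_values):
    """Return the percentage of entries that are missing (None)."""
    missing = sum(1 for v in yearly_values if v is None)
    return 100.0 * missing / len(yearly_values)


# Hypothetical average R per year for 2010-2021; None marks a year
# with no contract data (not data from the study).
yearly_r = [0.95, None, 0.90, None, 1.02, None,
            0.97, 0.93, None, 0.99, None, 1.01]

share = missing_proportion(yearly_r)      # 5 of 12 years missing
can_use_ts_imputation = share <= 10.0     # 10% limit cited in the text
```

With five of twelve years missing, the share is far above the 10% limit, so TS-based imputation would not be appropriate for such a series.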
Figure 5 shows the assessment of the ten ANN models for the first and second analyses in terms of MAPE. The MAPE values in the first analysis range from 4% to 14.485%, whereas those of the second analysis do not exceed 5.2%, which is close to 5%. It is concluded that the ANN models are highly accurate according to the accuracy classification. In addition, Figure 8 shows the significant reduction in the MAPE of the models between the first and second analyses. In other words, deleting the abnormal data in the second analysis increases the accuracy of the ANN models. The ANN3 model provides the minimum MAPE for both analyses, whereas ANN8 and ANN2 provide the maximum MAPE for the first and second analyses, respectively.
To evaluate the ANN models in terms of MSSE and RMSSE, Figure 6 displays the MSSE of the ten ANN models. The values for the two analyses were generally close to zero, indicating that the observed and computed R values were nearly identical. For the first analysis, the MSSE was highest for ANN8 (0.0154) and lowest for ANN1. For the second analysis, the MSSE of the ANN models was less than 0.002, except for the ANN2 model, whose value was 0.005. Figure 7 shows the RMSSE of the ten models for the two analyses. The minimum and maximum RMSSE for the first analysis were 0.007 (ANN1) and 0.12 (ANN8), respectively. For the second analysis, ANN1 and ANN2 provide the minimum (0.024) and maximum (0.071) values, respectively.
The average MAPE for the first and second analyses was 11.11% and 2.94%, respectively, as shown in Figure 8. Deleting the abnormal data reduces the MAPE by 8.17 percentage points on average. The average MSSE and RMSSE were 0.109 and 0.104, respectively, for the first analysis, and 0.0016 and 0.039 for the second analysis. Deleting the abnormal data in the second analysis therefore decreases the MSSE and RMSSE by 0.108 and 0.065 on average, respectively. Given the high accuracy of the ANN models, especially in the second analysis, they can be used to compute the TS of R from 2010 to 2021 for different OECs.
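The three indicators can be computed as in the sketch below. The formulas are the conventional ones and are an assumption here: MAPE as the mean absolute relative error in percent, MSSE as the mean of the squared errors, and RMSSE as the square root of the MSSE. The observed and computed R values are hypothetical.

```python
import math


def mape(observed, computed):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((o - c) / o) for o, c in zip(observed, computed))
    return 100.0 * total / len(observed)


def msse(observed, computed):
    """Mean of the squared errors (assumed definition of MSSE)."""
    total = sum((o - c) ** 2 for o, c in zip(observed, computed))
    return total / len(observed)


def rmsse(observed, computed):
    """Square root of the MSSE (assumed definition of RMSSE)."""
    return math.sqrt(msse(observed, computed))


# Hypothetical observed and ANN-computed R values (not data from the study).
observed_r = [1.0, 0.8, 0.5]
computed_r = [0.9, 0.8, 0.5]

mape_value = mape(observed_r, computed_r)     # about 3.33 (%)
msse_value = msse(observed_r, computed_r)     # about 0.00333
rmsse_value = rmsse(observed_r, computed_r)   # about 0.0577
```

Because MAPE divides by the observed value, it is only meaningful when no observed R is zero, which holds for bid-to-estimate ratios.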
The ANN2 model was selected to generate the TS of R for two reasons: (1) its training data cover all years (2010–2021), whereas the training data of the other ANN models lack one or more years; and (2) its MAPE, MSSE and RMSSE are the highest among the models in the second analysis, so the computed R values are a conservative, upper estimate. Using SPSS and setting the OEC, the R TS was generated for building, electric and mechanic contracts, as shown in Figures 9–11, respectively. In general, the R TS shows a slight increasing trend.
For building contracts, the R TS for OECs of 10,000, 1,000,000 and 100,000,000 SAR tends to be stable, and the R-value is less than one (the contract cost is lower than the owner's estimated cost). On the other hand, the TS for OECs of 100,000 and 10,000,000 SAR varies notably with time, and the R-value exceeds unity in several years, as shown in Figure 9.
Regarding the electric contracts, all R TS have values less than unity except for the OEC of 1,000,000 SAR. In addition, the TS for OECs of 10,000, 100,000 and 100,000,000 SAR trend slightly upward while remaining below unity. However, the TS for OECs of 1,000,000 and 10,000,000 SAR exhibit evident variance, as shown in Figure 10.
The TS of the mechanic contracts, shown in Figure 11, change markedly over time, except for the TS of the OEC of 1,000,000 SAR.
Figure 12 shows a typical observed and forecast TS of the ratio of the low bid to the OEC over the test data for mechanic contracts with an OEC of 100,000 SAR. The differences between the two TS were slight, with an RMSSE of 0.065. Therefore, the DNN model provides reasonable accuracy for predicting the TS of a mechanic contract with an OEC of 100,000 SAR. To evaluate the accuracy of the DNN model for different contract types and sizes in the testing process, Figure 13 shows the RMSSE for building, electric and mechanic contracts with OECs ranging from 10,000 SAR to 100,000,000 SAR. In general, the DNN model provides reasonable accuracy for predicting the R-value: the maximum RMSSE does not exceed 0.3, which is a small value. For the mechanic contracts, the RMSSE increases as the OEC increases. In other words, the accuracy of the DNN model decreases with increasing OEC. For building and electric contracts, the RMSSE follows a bell shape, with its maximum at an OEC of 10,000,000 SAR.
Typical observed and forecast of R of testing data (mechanic contract with OEC = 100,000 SAR)
RMSSE of observed and forecast of R in the testing process for different projects and OEC
For building contracts, the forecast TS of R can be grouped into three classes: (1) periodic, (2) semi-periodic (R increases slightly with time) and (3) attenuating (R decreases with time), as shown in Figure 14. The TS for an OEC of 100,000 SAR is periodic, with significant changes in the R-value. The TS for OECs of 10,000 SAR and 10,000,000 SAR are semi-periodic. Their periods are 13 years and 8 years, respectively, with the R-value ranging from 0.88 to 1.04 for the OEC of 10,000 SAR and from 0.44 to 1.3 for the OEC of 10,000,000 SAR. On the other hand, the TS for OECs of 1,000,000 SAR and 100,000,000 SAR decay to 0.82 and 0.75, respectively. Therefore, the R-value tends toward 1 for owner's estimates of 10,000 SAR and 10,000,000 SAR. The study's results are consistent with the Li et al. (2022) study on road projects, in which the ratio of the low bid to the OEC increases with time and ranges from 0.8 to 1.1.
Regarding the electric contracts, the forecast TS of R are all periodic, with different periods (T). T is defined as the time between two successive crests. The T values for the five TS (from OEC = 10,000 SAR to 100,000,000 SAR) were 2, 13.5, 7, 7 and 15 years, respectively, as shown in Figure 15. The R-value ranges from 0.6 to 1.1. The TS of the mechanic contracts are similar to those of the electric contracts, except for the OEC of 100,000 SAR, whose TS decays to an R-value of 0.89. In addition, the TS repeat themselves every 15, 2, 16 and 8 years for OECs of 10,000 SAR, 1,000,000 SAR, 10,000,000 SAR and 100,000,000 SAR, respectively, as shown in Figure 16. The TS for OECs of 10,000,000 SAR and 100,000,000 SAR undergo significant changes in the R-value, which varies from 0.5 to 1.1 and from 0.6 to 1.0, respectively. In contrast, the range of the R-value for OECs of 10,000 SAR and 1,000,000 SAR is narrow, from 0.83 to 1.03 and from 0.93 to 1.09, respectively.
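The period T, defined above as the time between two successive crests, can be estimated from a forecast series with a simple local-maximum search. The sketch below is illustrative only: the two series are synthetic, and the helper names `crest_indices` and `mean_period` are hypothetical. A series without at least two crests (for example, a decaying one) has no period under this definition.

```python
import math


def crest_indices(series):
    """Indices of interior local maxima (crests) of the series."""
    return [i for i in range(1, len(series) - 1)
            if series[i] > series[i - 1] and series[i] >= series[i + 1]]


def mean_period(series, step_years=1.0):
    """Average spacing between successive crests, or None if undefined."""
    crests = crest_indices(series)
    if len(crests) < 2:
        return None                       # decaying or monotonic series
    gaps = [b - a for a, b in zip(crests, crests[1:])]
    return step_years * sum(gaps) / len(gaps)


years = range(40)
# Synthetic periodic R series with an 8-year cycle (not data from the study).
periodic = [0.9 + 0.1 * math.sin(2 * math.pi * t / 8) for t in years]
# Synthetic attenuating R series that decays toward 0.75.
decaying = [0.75 + 0.25 * math.exp(-0.3 * t) for t in years]

period_periodic = mean_period(periodic)   # 8.0 years
period_decaying = mean_period(decaying)   # None
```

On noisy forecasts, small ripples would register as crests, so some smoothing or a minimum crest height would be needed before applying such a search.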
According to the above results, the TS of the electric and mechanic contracts are stationary over time and repeat themselves over several periods. However, the TS of the building contracts are sometimes non-periodic, and R increases with time, especially for a building contract with an OEC of 10,000,000 SAR. The study's results are consistent with the study of Li et al. (2022) on highway projects, in which the ratio of the low bid to the OEC increases with time and ranges from 0.8 to 1.1. No previous studies have applied TS analysis to the relation between the OEC and the contract cost, which limits further comparison with the present results. For the three contract types with an OEC of 100,000,000 SAR, the R-value is less than 1.0, that is, the low bid is less than the owner's estimate. As a result, the contract is costly for both the owner and the contractor. It may result in disagreements, change orders, financial constraints on the contractor and project cost overruns (Jahren and Ashe, 1990; Li et al., 2021).
On the other hand, for the three contract types with an OEC of 10,000 SAR, the R-value is close to 1.0. Such a contract is close to stable through the construction stage and places neither a heavy burden on the owner nor funding difficulties on the contractor. For the remaining contracts, the R-value fluctuates markedly around 1.0. These contracts suffer in one of two ways. When R is greater than 1.0, the contract places a significant burden on the owner to complete the planned projects on time. When R is less than 1.0, the contract is cumbersome because it may lead to change orders, disagreements, financial pressure on the contractor and project cost overruns.
5. Conclusion
The paper aims to estimate the ratio of a low bid to an OEC using TS, ANN and DNN to enhance future cost estimation. Data from ninety-four contracts were collected from KSU in Riyadh, KSA, for three contract types (building, electric and mechanic). After performing the size and normality tests, the data were classified into underestimated, optimum and overestimated data depending on the R-value (ratio of the low bid to the OEC). The underestimated data were used to develop the ANN model after applying maximizing techniques to overcome the limited amount of data. The ANN models were then evaluated using the MAPE, MSSE and RMSSE indicators to check their accuracy. After that, the appropriate ANN model was selected and used to generate the TS of R from 2010 to 2021 for the three contract types and different amounts of the OEC. The generated TS was fed into the DNN, with the TS data divided into training data (65%) and testing data (35%). Finally, the forecast TS was estimated using the DNN for the three contract types and the different OECs. The findings revealed that the percentages of underestimated, optimum and overestimated data were 4.2%, 81.1% and 14.7%, respectively. The ANN models' MAPE, MSSE and RMSSE were 2.94%, 0.0015 and 0.039, respectively. The DNN's results revealed that the owner's estimates for the three contract types with an OEC of 100,000,000 SAR need to be more accurate, whereas they are close to the optimum for the OEC of 10,000 SAR. This study contributes to the body of knowledge by developing ANN and DNN models that enhance the accuracy of the OEC and narrow the discrepancy between the OEC and the lowest submitted offer. The owner and consultant should be able to use the study's findings to create more precise cost estimates and budget plans for better decision-making.
The authors want to thank King Saud University (KSU) for funding this research and providing study data.
Funding: The authors thank the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project no. (IFKSUOR3-380-5).
Disclosure statement: No conflicts of interest exist. The submitting author is responsible for the co-author's interests.
Author contributions: Conceptualization, Almohsen, Alsanabani, Alsugair and Al-Gahtani; Data curation, Alsanabani; Formal analysis, Alsanabani and Al-Gahtani; Funding acquisition, Almohsen, Alsugair and Al-Gahtani; Investigation, Almohsen, Alsanabani, Alsugair and Al-Gahtani; Methodology, Alsanabani and Al-Gahtani; Project administration, Almohsen, Alsugair and Al-Gahtani; Resources, Almohsen, Alsugair and Al-Gahtani; Software, Alsanabani; Supervision, Almohsen, Alsugair and Al-Gahtani; Validation, Alsanabani and Al-Gahtani; Visualization, Almohsen, Alsanabani, Alsugair and Al-Gahtani; Roles/Writing: original draft, Alsanabani and Al-Gahtani; Writing: review and editing, Almohsen, Alsanabani, Alsugair and Al-Gahtani.
Data availability statement: The raw data that support the findings of this paper are available on request from the corresponding author.
















