Forecasts of commodity prices are vital to market participants and policy makers. Corn is no exception, considering its strategic importance. In the present study, the authors assess the forecast problem for the weekly wholesale price index of yellow corn in China from January 1, 2010 to January 10, 2020.
The authors employ the nonlinear auto-regressive neural network as the forecast tool and evaluate forecast performance of different model settings over algorithms, delays, hidden neurons and data splitting ratios in arriving at the final model.
The final model is relatively simple and leads to accurate and stable results. Particularly, it generates relative root mean square errors of 1.05%, 1.08% and 1.03% for training, validation and testing, respectively.
Through the analysis, the study shows usefulness of the neural network technique for commodity price forecasts. The results might serve as technical forecasts on a standalone basis or be combined with other fundamental forecasts for perspectives of price trends and corresponding policy analysis.
1. Introduction
Forecasting agricultural commodity prices has always been a significant task for policy makers and different agricultural market participants (Ouyang, Hu, Yang, Yao, & Lin, 2022; Wang, Wang, Li, & Zhou, 2022; Xu, 2017, Xu, 2018). This could be particularly the case when one considers the fact that agricultural commodities generally carry with them natural importance from a strategic perspective to a country or region (Xu and Zhang, 2022). Importance of forecasting corn prices is of no exception when one takes into consideration strategic importance of corn, which could include its close relationship with the energy economic sector (Alola, 2022; Forhad & Alam, 2022; Liu & Wang, 2022; Wu, Weersink, & Maynard, 2022), deep financialization of trading (Abuselidze, Alekseieva, Kovtun, Kostiuk, & Karpenko, 2022; Penone, Giampietri, & Trestini, 2022; Wang, Zhang, Wang, & Meng, 2022; Xu and Zhang, 2022; Xu, Li, Wang, & Li, 2022), and the role of serving as an important food source across the globe (Li et al., 2022; Lu et al., 2022; Niu et al., 2022; Yu, Yue, & Wang, 2022). Price forecasts are required by different forecast users in agricultural markets. For example, they offer useful insights into setting future sales prices to agricultural commodity processors, provide necessary information for reaching contractual requirements to trading partners, shed light on potential opportunities for seeking profits in spot and futures markets and suggest possible gaps in risk management and policy assessments to policy makers. 
As price volatilities tend to be rather irregular (Marfatia, Ji, & Luo, 2022; Xu, 2017, Xu, 2020; Yang, Du, Lu, & Tejeda, 2022, Yang, Ge, & Li, 2022), different price levels have immense impacts on business and policy decisions (Ricome & Reynaud, 2022; Wang et al., 2022; Warren-Vega, Aguilar-Hernández, Zárate-Guzmán, Campos-Rodríguez, & Romero-Cano, 2022; Xu, 2014; Xu & Thurman, 2015), and ultimately on allocations of resources and social welfare (Liu, Fang, Zhang, Zhong, & Chen, 2022; Ma, Zhang, Song, & Yu, 2022; Xu, 2019, Xu, 2019); price forecasting’s significance to the agricultural economic sector should not call for too much motivation.
One direction that has been pursued in the applied econometrics literature is utilizing time-series models for the purpose of building accurate and stable forecast results of commodity prices (Awokuse & Yang, 2003; Babula, Bessler, Reeder, & Somwaru, 2004; Bessler, 1982, Bessler, 1990; Bessler & Babula, 1987; Bessler & Brandt, 1981, Bessler & Brandt, 1992; Bessler & Chamberlain, 1988; Bessler & Hopkins, 1986; Bessler & Kling, 1986; Bessler, Yang, & Wongcharupan, 2003; Brandt & Bessler, 1981, Brandt & Bessler, 1982; Brandt & Bessler, 1983, Brandt & Bessler, 1984; Chen & Bessler, 1987, Chen & Bessler, 1990; Kling & Bessler, 1985; McIntosh & Bessler, 1988; Wang & Bessler, 2004; Xu, 2014, Xu, 2015; Xu & Thurman, 2015; Yang & Awokuse, 2003; Yang, Haigh, & Leatham, 2001; Yang & Leatham, 1998; Yang, Li, & Wang, 2021; Yang, Zhang, & Leatham, 2003). Some typical models sought in previous studies include the ARIMA model, VAR model and VECM model. Over the past decade, computational power has become much more affordable, and the interest among researchers in building machine learning models aiming at offering good forecasts in economics and finance has been well documented (Ge, Jiang, He, Zhu, & Zhang, 2020; Yang & Wang, 2019), including, of course, forecasts of commodity prices for the agricultural market (Abreham, 2019; Ali, Deo, Downs, & Maraseni, 2018; Antwi, Gyamfi, Kyei, Gill, & Adam, 2022; Ayankoya, Calitz, & Greyling, 2016; Bayona-Oré, Cerna, & Hinojoza, 2021; Degife & Sinamo, 2019; Deina et al., 2021; Dias & Rocha, 2019; Fang, Guan, Wu, & Heravi, 2020; Filippi et al., 2019; Gómez, Salvador, Sanz, & Casanova, 2021; Handoyo & Chen, 2020; Harris, 2017; Huy, Thac, Thu, Nhat, & Ngoc, 2019; Jiang, He, & Zeng, 2019; Khamis & Abdullah, 2014; Kohzadi, Boyd, Kermanshahi, & Kaastra, 1996; Kouadio et al., 2018; Li, Chen, Li, Wang, & Xu, 2020, Li, Li, Liu, Zhu, & Wei, 2020; Lopes, 2018; Mayabi, 2019; de Melo, Júnior, & Milioni, 2004; Melo, Milioni, & Nascimento Júnior, 2007;
Moreno et al., 2018; Naveena et al., 2017; Rasheed, Younis, Ahmad, Qadir, & Kashif, 2021; dos Reis Filho, Correa, Freire, & Rezende, 2020; Ribeiro & Oliveira, 2011; Ribeiro, Ribeiro, Reynoso-Meza, & dos Santos Coelho, 2019; Ribeiro & dos Santos Coelho, 2020; RL & Mishra, 2021; Shahhosseini, Hu, & Archontoulis, 2020, Shahhosseini, Hu, Huber, & Archontoulis, 2021; Silalahi et al., 2013; Silva, Siqueira, Okida, Stevan, & Siqueira, 2019; Storm, Baylis, & Heckelei, 2020; Surjandari, Naffisah, & Prawiradinata, 2015; Wan & Zhou, 2021; Wen et al., 2021; Xu & Zhang, 2022, Xu & Zhang, 2022; Yoosefzadeh-Najafabadi, Earl, Tulpan, Sulik, & Eskandari, 2021; Yuan, San, & Leong, 2020; Zelingher, Makowski, & Brunelle, 2020, Zelingher, Makowski, & Brunelle, 2021; Zhang, Meng, Wei, Chen, & Qin, 2021; Zhao, 2021; Zou, Xia, Yang, & Wang, 2007), such as corn (Antwi et al., 2022; Ayankoya et al., 2016; Mayabi, 2019; Moreno et al., 2018; dos Reis Filho et al., 2020; Ribeiro et al., 2019; Shahhosseini et al., 2020, 2021; Surjandari et al., 2015; Wan & Zhou, 2021; Xu & Zhang, 2021; Zelingher et al., 2020, 2021), soybean oil (Li et al., 2020; Silalahi et al., 2013; Xu & Zhang, 2022), coffee (Abreham, 2019; Degife & Sinamo, 2019; Deina et al., 2021; Huy et al., 2019; Kouadio et al., 2018; Lopes, 2018; Naveena et al., 2017), peanut oil (Mishra & Singh, 2013; Quan-Yin, Yong-Hu, Yun-Yang, & Tian-Feng, 2014; Singh & Mishra, 2015; Yin & Zhu, 2012; Zhu, Yin, Zhu, & Zhou, 2014; Zong & Zhu, 2012, Zong & Zhu, 2012), palm oil (Kanchymalay, Salim, Sukprasert, Krishnan, & Hashim, 2017), wheat (Dias & Rocha, 2019; Fang et al., 2020; Gómez et al., 2021; Khamis & Abdullah, 2014; Kohzadi et al., 1996; Rasheed et al., 2021; Ribeiro & dos Santos Coelho, 2020; Zou et al., 2007), oats (Harris, 2017), soybeans (Handoyo & Chen, 2020; Jiang et al., 2019; Li et al., 2020; dos Reis Filho et al., 2020; Ribeiro & dos Santos Coelho, 2020; Yoosefzadeh-Najafabadi et al., 2021; Zhao, 2021), canola (Filippi et al., 2019; 
Shahwan & Odening, 2007; Wen et al., 2021), cotton (Ali et al., 2018; Fang et al., 2020) and sugar (de Melo et al., 2004; Melo et al., 2007; Ribeiro & Oliveira, 2011; Silva et al., 2019; Surjandari et al., 2015; Zhang et al., 2021). The machine learning forecasting tools often observed in the literature include deep learning (RL & Mishra, 2021), random forest (Dias & Rocha, 2019; Filippi et al., 2019; Gómez et al., 2021; Kouadio et al., 2018; Li et al., 2020; Lopes, 2018; Ribeiro & dos Santos Coelho, 2020; Shahhosseini et al., 2020, 2021; Wen et al., 2021; Yoosefzadeh-Najafabadi et al., 2021; Zelingher et al., 2020, 2021), K-nearest neighbor (Abreham, 2019; Gómez et al., 2021; Lopes, 2018), genetic programming (Ali et al., 2018), support vector regression (Abreham, 2019; Dias & Rocha, 2019; Fang et al., 2020; Gómez et al., 2021; Harris, 2017; Kanchymalay et al., 2017; Li et al., 2020, 2020; Lopes, 2018; dos Reis Filho et al., 2020; Ribeiro & dos Santos Coelho, 2020; Surjandari et al., 2015; Yoosefzadeh-Najafabadi et al., 2021; Zhang et al., 2021; Zhao, 2021), decision tree (Abreham, 2019; Degife & Sinamo, 2019; Dias & Rocha, 2019; Harris, 2017; Lopes, 2018; Surjandari et al., 2015; Zelingher et al., 2020, 2021), extreme learning (Deina et al., 2021; Jiang et al., 2019; Kouadio et al., 2018; Silva et al., 2019), neural network (Abreham, 2019; Antwi et al., 2022; Ayankoya et al., 2016; Deina et al., 2021; Fang et al., 2020; Harris, 2017; Huy et al., 2019; Khamis & Abdullah, 2014; Kohzadi et al., 1996; Li et al., 2020, 2020; Mayabi, 2019; de Melo et al., 2004; Melo et al., 2007; Mishra & Singh, 2013; Moreno et al., 2018; Naveena et al., 2017; Quan-Yin et al., 2014; Rasheed et al., 2021; Ribeiro & Oliveira, 2011; Ribeiro & dos Santos Coelho, 2020; Shahwan & Odening, 2007; Silalahi et al., 2013; Silva et al., 2019; Singh & Mishra, 2015; Wan & Zhou, 2021; Xu & Zhang, 2021, 2022; Yin & Zhu, 2012; Yoosefzadeh-Najafabadi et al., 2021; Yuan et al., 2020; Zhang et al., 2021;
Zhu et al., 2014; Zong & Zhu, 2012, Zong & Zhu, 2012; Zou et al., 2007), boosting (Gómez et al., 2021; Lopes, 2018; Ribeiro & dos Santos Coelho, 2020; Shahhosseini et al., 2020, 2021; Zelingher et al., 2020, 2021), multivariate adaptive regression splines (Dias & Rocha, 2019) and ensemble (Fang et al., 2020; Ribeiro et al., 2019; Ribeiro & dos Santos Coelho, 2020; Shahhosseini et al., 2020, 2021). With these reviews, although not exhaustive, it appears that the neural network model is one of the most useful techniques in terms of constructing price forecasts for agricultural commodities (Bayona-Oré, Cerna, & Tirado Hinojoza, 2021). More specifically, a wide variety of time-series variables that are chaotic and noised could be well forecasted through the neural network model (Karasu, Altan, Bekiros, & Ahmad, 2020; Wang & Yang, 2010; Wegener, von Spreckelsen, Basse, & von Mettenheim, 2016; Xu, 2015, Xu, 2018, Xu, 2018, Xu, 2018; Yang, Cabrera, & Wang, 2010, Yang, Su, & Kolari, 2008), including many different types of economic and financial time series (Xu & Zhang, 2022). This fact could stem from the good capability of the neural network model for self-learning (Karasu, Altan, Saraç, & Hacioğlu, 2017, Karasu, Altan, Saraç, & Hacioğlu, 2017) and characterizing nonlinear features (Altan, Karasu, & Zio, 2021; Karasu et al., 2020; Xu & Zhang, 2022, Xu & Zhang, 2022) in various time series (Xu, 2018; Xu & Zhang, 2021, Xu & Zhang, 2021). Here, we adopt the neural network for the forecasting exercise of the price of yellow corn.
To conduct our analysis, the forecast problem in a data set of weekly wholesale price indices of yellow corn in China from January 1, 2010 to January 10, 2020 is examined via the nonlinear auto-regressive neural network technique. We assess performance of forecasts stemming from different settings of models, which include considerations of training algorithms, hidden neurons, delays and how the data are segmented. With the analysis, a relatively simple model is constructed, and it produces performance that is rather accurate and stable. The present work serves as the first one in addressing the price forecast problem for wholesale yellow corn in the Chinese market. Forecast results here could be utilized as part of technical analysis and/or combined with other fundamental forecasts as part of policy analysis.
2. Literature review
For price forecasting tasks in the agricultural sector, a large number of studies have explored econometric methods with the goal of producing stable and accurate forecasts. For example, the ARIMA model has been a great success in this field. It is univariate and generally relies on past values of the variable to be forecasted. Previous work has found it helpful for forecasting prices of wheat (Bessler & Babula, 1987) and cattle and hog (Bessler, 1990; Bessler & Brandt, 1981; Brandt & Bessler, 1981, 1982, 1983, 1984; Kling & Bessler, 1985). Instead of utilizing a single source of information for forecasting, the VAR, as another popular econometric forecasting tool, is built upon relations among the economic variables investigated (Awokuse & Yang, 2003; Bessler & Brandt, 1992; Bessler & Chamberlain, 1988; Bessler & Hopkins, 1986; Chen & Bessler, 1987; McIntosh & Bessler, 1988; Rezitis, 2015). Previous studies have demonstrated that it has good potential for forecasting prices of cotton (Chen & Bessler, 1990), wheat (Yang, Zhang, & Leatham, 2003), and soybeans (Babula et al., 2004). As compared to the VAR, the VECM is built upon the concept of cointegration, which is used to further incorporate long-run relationships among investigated economic variables (Xu, 2019, Xu, 2019; Xu & Zhang, 2023; Yang & Awokuse, 2003; Yang & Leatham, 1998; Yang et al., 2021). The VECM is usually found to be particularly useful for long-term price forecasting tasks (Bessler et al., 2003; Wang & Bessler, 2004).
The good potential of the econometric techniques mentioned above has also been found in forecasting research on corn prices. For example, Zhou (2021) used the ARIMA for modeling monthly corn prices in China during April 2019–February 2021 and forecasting the price in March 2021, and obtained good accuracy. Crespo Cuaresma, Hlouskova and Obersteiner (2021) studied auto-regressive models, VARs, VECMs and their variations and combinations for forecasts of different agricultural commodity prices that include those of corn. They found that market fundamentals and macroeconomic developments contribute systematic predictive information for the forecast purpose. Albuquerquemello, Medeiros, Jesus and Oliveira (2021) assessed ARIMAs, VARs and their variations, particularly the consideration of transition regime models, for monthly U.S. corn price forecasts and pointed out the importance of incorporating nonlinear patterns in the model. Wan and Zhou (2021) examined corn futures price forecasts based on the ARIMA with data from China Dalian Commodity Exchange during 2018–2021 and concluded that a deeper consideration of parameter selection might improve model performance. Antwi, Gyamfi, Kyei, Gill and Adam (2022) investigated the ARIMA for corn futures price forecasts from Bloomberg during 2016–2021 and found that data decomposition techniques could help improve model accuracy. Jaiswal, Jha, Kumar and Choudhary (2021) researched the ARIMA for forecasts of monthly corn prices from World Bank Commodity Price Data during 1980–2020 and found that it achieved decent accuracy, although not optimal as compared to some machine learning models they considered. Silva, Barreira and Cugnasca (2021) evaluated the ARIMA for corn price forecasts in Brazil and found that it consistently underperformed as compared to machine learning models.
Advances in machine learning techniques have been applied in a wide variety of forecasting work, and the corn prices investigated here are no exception. For example, Wan and Zhou (2021) compared the long short-term memory neural network with the ARIMA for corn futures price forecasts from China Dalian Commodity Exchange during 2018–2021 and found that the former leads to consistently better performance than the latter. Antwi et al. (2022) investigated the back propagation neural network for corn futures price forecasts from Bloomberg during 2016–2021 and determined that data decomposition techniques contribute to improved performance in terms of accuracy. Jaiswal et al. (2021) developed a deep long short-term memory neural network for forecasts of monthly corn prices from World Bank Commodity Price Data during 1980–2020 and concluded that it beats both the ARIMA and conventional time-delay neural network. Silva et al. (2021) studied the corn price forecast problem in Brazil by considering different machine learning models and found that the performance rank from the best to worst is: the support vector regression, the ensemble of the support vector regression and long short-term memory neural network, the ensemble of the AdaBoost and support vector regression, and the ensemble of the AdaBoost and long short-term memory neural network.
3. Data
We analyze weekly wholesale price indices of corn in the Chinese market from January 1, 2010, to January 10, 2020. In Figure 1, we plot the price series in the top left panel, the first differences of prices in the top right panel, the histogram of forty bins and the corresponding kernel estimates of the price in the bottom left panel and the histogram of forty bins and the corresponding kernel estimates of the first differences of prices in the bottom right panel. We note that the average weekly price of June 1994 serves as the price of the base period, and its value is set to 100, which represents the price of fifty kilograms of wholesale yellow corn. Table 1 presents the usual summary statistics of the prices, where we can see that they do not follow normal distributions, like most financial time series (Xu, 2017, Xu, 2019; Xu & Zhang, 2022, Xu & Zhang, 2022). Finally, we note that the price index is missing on February 19, 2010, and we apply the cubic spline interpolation technique to obtain an approximated value of 122.839, which is rather close to 122.85 on February 12, 2010, and 122.53 on February 26, 2010.
Figure 1. Top panel: The weekly price index of yellow corn (left) and first differences of prices (right); bottom panel: histograms of forty bins and kernel estimates for the weekly price index and its first differences
Table 1. Summary statistics of the weekly price index and its first differences of yellow corn
| Commodity | Series | Minimum | Mean | Median | Std | Maximum | Skewness | Kurtosis | Jarque-Bera (p-value) |
|---|---|---|---|---|---|---|---|---|---|
| Yellow corn | Price | 108.1800 | 139.2282 | 134.0200 | 16.6369 | 177.4100 | 0.0683 | 1.7032 | <0.001 |
| Yellow corn | First difference | −8.6300 | −0.0044 | 0.0400 | 1.5583 | 6.2000 | −0.8492 | 6.4195 | <0.001 |
Source(s): Elaborated by the authors
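As a rough check on the normality results in Table 1, the Jarque-Bera statistic can be recomputed from the reported skewness and kurtosis. The sketch below rests on two assumptions not stated in the text: that the kurtosis column reports raw (not excess) kurtosis, and that the sample contains one observation every seven days between the two end dates. The function name `jarque_bera` is illustrative only.

```python
import math
from datetime import date


def jarque_bera(n, skewness, kurtosis):
    """Return (JB statistic, p-value) from sample size, skewness and raw kurtosis."""
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality, JB is asymptotically chi-square with 2 degrees of freedom,
    # and the survival function of that distribution is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Assumption: weekly observations, inclusive of both end dates.
n_price = (date(2020, 1, 10) - date(2010, 1, 1)).days // 7 + 1
n_diff = n_price - 1  # first differencing loses one observation

# Skewness and kurtosis values are taken from Table 1.
jb_price, p_price = jarque_bera(n_price, 0.0683, 1.7032)
jb_diff, p_diff = jarque_bera(n_diff, -0.8492, 6.4195)

print(f"price:            JB = {jb_price:.2f}, p < 0.001: {p_price < 0.001}")
print(f"first difference: JB = {jb_diff:.2f}, p < 0.001: {p_diff < 0.001}")
```

Under these assumptions, both p-values fall below 0.001, in line with the last column of Table 1.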
4. Method
The nonlinear auto-regressive neural network model is adopted here for weekly price forecasts of wholesale yellow corn. It can be represented as $y_t = f(y_{t-1}, \ldots, y_{t-d})$, where $y$ is the price series of corn to be forecasted, $t$ indexes time, $d$ denotes the number of delays and $f$ represents the function. We note that $f$ needs to be estimated as $y_t = \alpha_0 + \sum_{j=1}^{k} \alpha_j \phi\left(\sum_{i=1}^{d} \beta_{ij} y_{t-i} + \beta_{0j}\right) + \varepsilon_t$, where $k$ denotes the number of hidden units, whose transfer function is represented by $\phi$, $\beta_{ij}$ denotes the weight of the connection between the $i$-th input unit and the $j$-th hidden unit, $\alpha_j$ denotes the weight of the connection between the $j$-th hidden unit and the output unit, $\beta_{0j}$ denotes the constant of the $j$-th hidden unit, $\alpha_0$ denotes the constant of the output unit and $\varepsilon_t$ denotes the error. The current work concentrates on one-week-ahead forecasts.
A model with the structure of a two-layer feed-forward network is applied here. It uses a sigmoid transfer function for the hidden layer and a linear transfer function for the output layer. More specifically, the logistic function $\phi(z) = 1/(1 + e^{-z})$ serves as the sigmoid transfer function. The output, $y_t$, is fed back through the delays to the network's input, and for the purpose of efficiency, model training adopts the open-loop form, in which the observed output is used instead of the estimated output. The open loop ensures that the network's inputs are more accurate and, as a result, the network is purely feedforward.
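The structure described above (d lagged prices feeding a logistic hidden layer and a linear output) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the weights in the usage lines are hypothetical, and in practice the inputs would typically be rescaled before training.

```python
import math


def logistic(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nar_predict(lags, beta, beta0, alpha, alpha0):
    """One-step-ahead forecast from d lagged values.

    lags   : [y_{t-1}, ..., y_{t-d}]
    beta   : beta[j][i] is the weight from input i to hidden unit j
    beta0  : constants of the hidden units
    alpha  : weights from the hidden units to the output unit
    alpha0 : constant of the output unit
    """
    hidden = [logistic(b0 + sum(w * y for w, y in zip(row, lags)))
              for row, b0 in zip(beta, beta0)]
    return alpha0 + sum(a * h for a, h in zip(alpha, hidden))


def open_loop_pairs(series, d):
    """Build (inputs, target) pairs whose inputs are observed values (open loop)."""
    return [(series[t - d:t][::-1], series[t]) for t in range(d, len(series))]


# Hypothetical weights for d = 2 delays and k = 2 hidden units.
forecast = nar_predict([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], 0.0)
pairs = open_loop_pairs([1, 2, 3, 4], 2)
```

In the open-loop form, every input vector in `pairs` is built from observed prices, so no forecast is ever fed back into the inputs during training.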
For the model training algorithm, we explore two options. One is the LM (Levenberg–Marquardt) algorithm (Levenberg, 1944; Marquardt, 1963) and the other is the SCG (scaled conjugate gradient) algorithm (Møller, 1993). These two algorithms have been widely and successfully applied for forecasting purposes in different research areas (Doan & Liong, 2004; Kayri, 2016; Khan, Alam, Shahid, & Mazliham, 2019; Selvamuthu, Kumar, & Mishra, 2019; Xu & Zhang, 2021, Xu & Zhang, 2021, Xu & Zhang, 2022, Xu & Zhang, 2022, Xu & Zhang, 2022, Xu & Zhang, 2022). Their comparisons have been illustrated in previous research (Al Bataineh & Kaur, 2018; Baghirli, 2015; Xu & Zhang, 2022, Xu & Zhang, 2022, Xu & Zhang, 2022). Basically, the LM algorithm could robustly handle the problem of slow convergence (Hagan & Menhaj, 1994) by approximating the Hessian matrix (Paluszek & Thomas, 2020), and the SCG algorithm generally executes even faster as it does not involve line searches. Figure 2 shows the architecture of the final neural network model built in this work.
Figure 2. The block diagram of the neural network model of the two-layer feedforward structure with a logistic sigmoid transfer function for the hidden layer and a linear transfer function for the output layer based on 5 delays and 5 hidden neurons
LM algorithm. In this algorithm, using a system whose weights are denoted as $w_1$ and $w_2$ as an example, the Hessian matrix, $H$, is approximated as $H \approx J^T J$, where $J = \left[\frac{\partial E}{\partial w_1} \;\; \frac{\partial E}{\partial w_2}\right]$ for a nonlinear function $E(\cdot)$ that contains the information of the sum of squared errors, whose $H = \begin{bmatrix} \frac{\partial^2 E}{\partial w_1^2} & \frac{\partial^2 E}{\partial w_1 \partial w_2} \\ \frac{\partial^2 E}{\partial w_2 \partial w_1} & \frac{\partial^2 E}{\partial w_2^2} \end{bmatrix}$. The gradient can be expressed as $g = J^T e$, where $e$ denotes an error vector. For updating weights and biases, the rule $w_{k+1} = w_k - \left[J^T J + \mu I\right]^{-1} J^T e_k$ is adopted, where $w$ denotes the weight vector, $k$ denotes the index of the iteration during model training, $I$ denotes the identity matrix and $\mu$ denotes the combination coefficient, which is always positive. When $\mu = 0$, the LM algorithm is similar to Newton's method. If $\mu$ is large, it turns into gradient descent with small step sizes. $\mu$ is decreased after each successful step because there is then less need for gradient descent.
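To make the role of μ concrete, the following sketch applies the Levenberg–Marquardt update, w ← w − (JᵀJ + μI)⁻¹Jᵀe, to a toy two-weight problem. The toy model y = w1·exp(w2·x), the data and the function name `lm_fit` are assumptions chosen for illustration; they are not the network trained in this study. The default values of μ, its decrease and increase factors, its maximum, the gradient threshold and the iteration cap follow the settings reported in this section.

```python
import math


def lm_fit(xs, ys, w, mu=0.001, mu_dec=0.1, mu_inc=10.0, mu_max=1e10,
           max_iter=1000, min_grad=1e-5):
    """Fit y = w1 * exp(w2 * x) with a two-weight Levenberg-Marquardt loop."""

    def errors(w):
        # Error vector e: model output minus observation.
        return [w[0] * math.exp(w[1] * x) - y for x, y in zip(xs, ys)]

    def sse(e):
        return sum(v * v for v in e)

    e = errors(w)
    for _ in range(max_iter):
        # Jacobian rows: (d e_i / d w1, d e_i / d w2).
        J = [(math.exp(w[1] * x), w[0] * x * math.exp(w[1] * x)) for x in xs]
        g = [sum(row[c] * v for row, v in zip(J, e)) for c in (0, 1)]  # g = J^T e
        if math.hypot(g[0], g[1]) < min_grad:
            break  # gradient magnitude below threshold
        a = sum(row[0] * row[0] for row in J)  # entries of J^T J
        b = sum(row[0] * row[1] for row in J)
        d = sum(row[1] * row[1] for row in J)
        while mu <= mu_max:
            # Solve (J^T J + mu I) step = g in closed form for the 2x2 case.
            det = (a + mu) * (d + mu) - b * b
            step = [((d + mu) * g[0] - b * g[1]) / det,
                    ((a + mu) * g[1] - b * g[0]) / det]
            trial = [w[0] - step[0], w[1] - step[1]]
            e_trial = errors(trial)
            if sse(e_trial) < sse(e):
                w, e, mu = trial, e_trial, mu * mu_dec  # success: shrink mu
                break
            mu *= mu_inc  # failure: move toward small gradient-descent steps
        else:
            break  # mu exceeded mu_max without reducing the error
    return w


# Toy data generated from w1 = 2.0 and w2 = 0.5 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
w_hat = lm_fit(xs, ys, [1.0, 0.3])
```

Rejected steps raise μ, which shortens the step and tilts it toward the gradient direction; accepted steps lower μ, which moves the update toward the Newton-like behavior described above.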
SCG algorithm. Weight adjustments in backpropagation algorithms are made in the steepest descent direction because, in that direction, the performance function decreases most rapidly. However, this does not guarantee the fastest convergence. In conjugate gradient algorithms, searches are instead conducted along conjugate directions to determine the step sizes that reduce the performance function in each iteration, and convergence is generally faster than with steepest descent. In addition, to avoid the line searches of conjugate gradient algorithms, which could be time consuming, the SCG algorithm is adopted here as a fully automated supervised algorithm.
In arriving at our final model, different settings over delays, hidden neurons and data splitting ratios, in addition to algorithms, are tested. Specifically, delays of 2, 3, 4, 5 and 6, hidden neurons of 2, 3, 5 and 10, and data splitting ratios of 60%–20%–20%, 70%–15%–15% and 80%–10%–10% for training–validation–testing are evaluated. Only the training and validation parts of the data are involved in selecting model parameters. Put another way, only the training and validation parts of the data have been “seen” by a model. The testing part of the data is not involved in selecting model parameters and is used only for testing a model constructed from the training and validation parts. For terminating the process of model training, we consider two options: the gradient’s magnitude and the validation check number. When model training has reached a performance minimum, the gradient becomes very small. Model training is terminated if the gradient’s magnitude is smaller than $10^{-5}$. The validation check number refers to the number of successive iterations for which performance on the validation part of the data no longer improves. We adopt six as the validation check number, and model training is terminated once it reaches six validation checks. Further, the maximal training iteration number is one thousand, and model training is terminated once it reaches this iteration number. Other settings for the LM algorithm are as follows. μ’s initial value is set to 0.001, μ’s decreasing factor is set to 0.1, μ’s increasing factor is set to 10 and μ’s maximal value is set to $10^{10}$. Other settings for the SCG algorithm are as follows. The Marquardt adjustment parameter is set to 0.005, the weight change determinant is set to $5 \times 10^{-5}$ for approximating second derivatives and the parameter for regulating the Hessian’s indefiniteness is set to $5 \times 10^{-7}$.
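The three termination rules above can be expressed as a small helper. This is a sketch under stated assumptions: the function name `stop_reason` is hypothetical, and a validation check is counted here whenever the validation error fails to improve on the best value seen so far, which is one plausible reading of the rule described in the text.

```python
def stop_reason(val_errors, grad_norms, max_fail=6, min_grad=1e-5, max_epochs=1000):
    """Return (epoch, reason) at which training would stop.

    val_errors : validation error after each training iteration
    grad_norms : gradient magnitude after each training iteration
    """
    best = float("inf")
    fails = 0
    for epoch, (val, grad) in enumerate(zip(val_errors, grad_norms), start=1):
        if grad < min_grad:
            return epoch, "gradient"        # performance minimum reached
        if val < best:
            best, fails = val, 0            # improvement resets the counter
        else:
            fails += 1                      # one more validation check
        if fails >= max_fail:
            return epoch, "validation"
        if epoch >= max_epochs:
            return epoch, "max_epochs"
    return len(val_errors), "not stopped"


# Illustrative sequences only: validation error bottoms out at iteration 3.
val = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6]
result = stop_reason(val, [1.0] * len(val))
print(result)
```

In the illustrative run, the validation error stops improving after the third iteration, so training halts six iterations later on the validation criterion.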
Table 2 contains all evaluated model settings, among which setting #67 is used for building our final model for the price index of yellow corn. It uses 5 delays and 5 hidden neurons and is trained with the LM algorithm and a training–validation–testing ratio of 60%–20%–20%.
Table 2. Explored model settings for the weekly price index of yellow corn
| Model setting | Value | Setting numbers |
|---|---|---|
| Algorithm | LM | 1 + 2i (i = 0, 1, …, 59) |
| Algorithm | SCG | 2 + 2i (i = 0, 1, …, 59) |
| Delay | 2 | (1 + 10j) to (2 + 10j) (j = 0, 1, …, 11) |
| Delay | 3 | (3 + 10j) to (4 + 10j) (j = 0, 1, …, 11) |
| Delay | 4 | (5 + 10j) to (6 + 10j) (j = 0, 1, …, 11) |
| Delay | 5 | (7 + 10j) to (8 + 10j) (j = 0, 1, …, 11) |
| Delay | 6 | (9 + 10j) to (10 + 10j) (j = 0, 1, …, 11) |
| Hidden neuron | 2 | (1 + 40k) to (10 + 40k) (k = 0, 1, 2) |
| Hidden neuron | 3 | (11 + 40k) to (20 + 40k) (k = 0, 1, 2) |
| Hidden neuron | 5 | (21 + 40k) to (30 + 40k) (k = 0, 1, 2) |
| Hidden neuron | 10 | (31 + 40k) to (40 + 40k) (k = 0, 1, 2) |
| Training vs validation vs testing ratio | 70% vs 15% vs 15% | 1–40 |
| Training vs validation vs testing ratio | 60% vs 20% vs 20% | 41–80 |
| Training vs validation vs testing ratio | 80% vs 10% vs 10% | 81–120 |
Source(s): Elaborated by the authors
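The numbering scheme in Table 2 can be decoded mechanically, which is convenient when reading the setting numbers quoted in the text. The helper below is a sketch; the function name `decode_setting` and the dictionary keys are illustrative choices, while the index arithmetic follows the ranges listed in Table 2.

```python
ALGORITHMS = ("LM", "SCG")
DELAYS = (2, 3, 4, 5, 6)
HIDDEN_NEURONS = (2, 3, 5, 10)
SPLITS = ((70, 15, 15), (60, 20, 20), (80, 10, 10))  # training, validation, testing (%)


def decode_setting(number):
    """Map a Table 2 setting number (1 to 120) to its model configuration."""
    if not 1 <= number <= 120:
        raise ValueError("setting number must be between 1 and 120")
    idx = number - 1
    return {
        "algorithm": ALGORITHMS[idx % 2],                 # odd: LM, even: SCG
        "delays": DELAYS[(idx % 10) // 2],                # pairs within each block of 10
        "hidden_neurons": HIDDEN_NEURONS[(idx % 40) // 10],  # blocks of 10 within each 40
        "split": SPLITS[idx // 40],                       # blocks of 40
    }


print(decode_setting(67))
```

Decoding setting #67 gives the LM algorithm, 5 delays, 5 hidden neurons and the 60%–20%–20% split, matching the description of the final model.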
5. Result
We evaluate each model setting contained in Table 2 for weekly prices of wholesale yellow corn. We adopt the relative root mean square error (RRMSE) for measuring forecast performance and calculate the RRMSEs generated by each model setting across the training phase, validation phase and testing phase. Figure 3 reports all RRMSEs. In determining the final model setting for the price series, we take into consideration the need to balance forecast accuracy and forecast stability across the three phases, and select setting #67 (5 delays and 5 hidden neurons). This setting applies the LM algorithm and the data segmentation ratio of 60%–20%–20% for training–validation–testing, thus reserving the largest amount of data for model testing among the three data segmentation ratios examined. We can observe from Figure 3, where setting #67 is indicated by a dark arrow, that for the selected setting, the diamond for the training phase, the square for the validation phase and the triangle for the testing phase are rather close to one another. As compared to the selected setting, there exist others that generate a lower RRMSE for a specific subsample but higher RRMSEs for the remaining subsamples, suggesting lower stability. For example, setting #71 generates a lower RRMSE than setting #67 for the training phase but higher RRMSEs for the validation and testing phases. By selecting the model setting with relatively stable performance across the training phase, validation phase and testing phase, we try to avoid potential problems of model overfitting or underfitting.
Figure 3. RRMSEs across all model settings for the weekly price index of yellow corn
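The text does not spell out the RRMSE formula. The sketch below assumes one common definition, the root mean square error divided by the mean of the observed values and expressed in percent; other definitions exist, so the function `rrmse` should be read as an illustration rather than the exact measure used here.

```python
import math


def rrmse(observed, forecast):
    """Relative RMSE in percent: RMSE divided by the mean of the observations."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)
    return 100.0 * math.sqrt(mse) / (sum(observed) / len(observed))


# Illustrative numbers only: each forecast misses by one index point at a level of 100.
example = rrmse([100.0, 100.0], [99.0, 101.0])
```

Computing the measure separately for the training, validation and testing observations gives the three values compared for each model setting.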
With the selected setting determined for prices of yellow corn, we turn to assess sensitivities of performance to different settings through switching one model setting each time. Figure 4 shows the results of assessments of performance sensitivities, where RRMSEs corresponding to the training phase, validation phase and testing phase are reported. The performance comparison between the model setting #67 and the model setting #68 aims at evaluating the sensitivity to training algorithm as the former is based upon the LM algorithm while the latter is based upon the SCG algorithm. Performance comparisons between the model setting #67 and model settings #61, #63, #65 and #69 aim at evaluating sensitivities to delays as the former is based upon 5 delays while the latter four are based upon 2, 3, 4 and 6 delays, respectively. Performance comparisons between the model setting #67 and model settings #47, #57 and #77 aim at evaluating sensitivities to hidden neurons as the former is based upon 5 hidden neurons while the latter three are based upon 2, 3 and 10 hidden neurons, respectively. Performance comparisons between the model setting #67 and model settings #27 and #107 aim at evaluating sensitivities to how the price series is segmented into the training phase, validation phase and testing phase as the former is based upon the ratio of 60%–20%–20% while the latter two are based upon ratios of 70%–15%–15% and 80%–10%–10%, respectively. With these performance comparisons, the model setting #67 is selected for the price series of yellow corn. Based upon the model setting #67, RRMSEs are 1.05%, 1.08% and 1.03%, respectively, corresponding to the training phase, validation phase and testing phase, and the overall RRMSE is 1.05%. From Figure 4, we can observe that the LM algorithm leads to lower RRMSEs than the SCG algorithm. Specifically, this is evidenced through the performance comparison between the model setting #67 and the model setting #68. 
The higher accuracy achieved via the LM algorithm, as compared to the SCG algorithm, for neural networks based upon the multilayer perceptron structure and two hidden layers is consistent with the finding in previous work (Batra, 2014; Xu & Zhang, 2022). Overall performance is slightly better for model settings #27 and #107 than for model setting #67 because model setting #67 reserves less data for the training and validation phases than model settings #27 and #107. However, the minor performance differences between model setting #67 and model settings #27 and #107 suggest that the results are generally robust to data segmentation ratios.
Figure 4. Sensitivities of model performance (the RRMSE) to different model settings for the weekly price index of yellow corn
We present plots of detailed forecast results based upon the selected model setting in the top panel of Figure 5 and the corresponding forecast errors in the bottom panel of Figure 5 across the training phase, validation phase and testing phase. Overall, the selected model setting for the price series of yellow corn generates good forecast performance that is also stable across different phases. In addition, as can be seen from Figure 5, the selected model setting does not lead to consistent overprediction or underprediction across the phases. To assess the adequacy of the selected model setting, an analysis of auto-correlations of errors has been conducted (results are omitted here for brevity but are available upon request) for up to 20 lags, and it is found that they generally do not breach the 95% confidence limits, with the two exceptions of the 4th and 18th lags, for which slight breaches are found. These slight breaches would have been avoided with the use of the 99% confidence limits. Thus, the analysis of auto-correlations of errors supports the adequacy of the selected model.
Top panel: forecasts of the weekly price index of yellow corn; bottom panel: forecast errors calculated as observations minus forecasts
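The residual check described above can be illustrated with a minimal sketch. It assumes the common large-sample white-noise limits of ±z/√n for the sample auto-correlations (z = 1.96 for 95%, z = 2.576 for 99%); the study does not state how its confidence limits were computed, and the function names and synthetic errors below are purely illustrative.

```python
import math
import random


def autocorrelations(errors, max_lag=20):
    """Sample auto-correlations of forecast errors for lags 1..max_lag."""
    n = len(errors)
    mean = sum(errors) / n
    centered = [e - mean for e in errors]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def breaching_lags(errors, max_lag=20, z=1.96):
    """Lags whose auto-correlation lies outside +/- z / sqrt(n) (assumed limits)."""
    limit = z / math.sqrt(len(errors))
    acf = autocorrelations(errors, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > limit]


# Synthetic white-noise errors stand in for the study's forecast errors.
rng = random.Random(0)
errors = [rng.gauss(0.0, 1.0) for _ in range(520)]

lags_95 = breaching_lags(errors, z=1.96)   # approximate 95% limits
lags_99 = breaching_lags(errors, z=2.576)  # approximate 99% limits
print("lags outside 95% limits:", lags_95)
print("lags outside 99% limits:", lags_99)
```

Because the 99% limits are wider, any lag flagged at the 99% level is also flagged at the 95% level, which is why slight breaches at the 95% level can disappear at the 99% level.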
The presence of potential nonlinearities in the higher moments of financial or economic time series has been widely reported in the literature (Karasu et al., 2020; Wang & Yang, 2010; Yang et al., 2010, 2008). Here, we apply the BDS test (Brock, Scheinkman, Dechert, & LeBaron, 1996; Dergiades, Martinopoulos, & Tsoulfidis, 2013; Fujihara & Mougoué, 1997) to weekly prices of yellow corn and find that the corresponding p-values are all nearly zero across different testing scenarios. Given this evidence of nonlinearity, neural network models are suitable for modeling nonlinear features in the price series (Altan et al., 2021; Karasu et al., 2020). Other machine learning approaches could also be considered for modeling nonlinearities. One advantage of neural network models is that they approximate the underlying price time series with combinations of different nonlinear functions rather than with one particular nonlinear function (Wang & Yang, 2010; Yang et al., 2010, 2008). With the rather accurate and stable forecast results achieved here, our analysis demonstrates the potential of neural network models for forecasting prices of wholesale yellow corn.
6. Robustness analysis
Determining the number of layers needed for particular tasks has been an interesting topic for both theoretical and empirical research on neural networks, and the theoretical literature has not yet provided explicit guidelines in this regard (Gershenson, 2003; Jain, Mao, & Mohiuddin, 1996). From a practical standpoint, implementing a neural network generally does not require many layers because training time would grow exponentially with the number of layers used (i.e. much more computation would be needed) and the tendency toward overfitting would also be elevated (Gershenson, 2003). A two-layer network can already form rather complex decision boundaries (Jain et al., 1996). For our particular case without many predictors, a two-layer network seems sufficient. A seminal study pointed out that a neural network generally does not need more than two hidden layers to solve most problems (Lapedes & Farber, 1987). Thus, to assess the sensitivity of model performance to the number of layers used, we consider another neural network that has one more hidden layer than our selected setting #67 and compare the resultant RRMSEs. We call this alternative neural network model “NN#67–MoreLayers,” which also uses 5 delays, 5 hidden neurons, the LM algorithm and the data splitting ratio of 60% vs 20% vs 20% for training, validation and testing.
When comparing the performance of different models, we also adopt a modified Diebold–Mariano (Diebold & Mariano, 2002) test (Harvey, Leybourne, & Newbold, 1997). This test of differences in model performance is based upon $d_t = (e_t^{M_1})^2 - (e_t^{M_2})^2$, where $e_t^{M_1}$ and $e_t^{M_2}$ denote the two errors associated with time t that stem from models M1 and M2, respectively. For our case, M1 denotes NN#67–MoreLayers and M2 denotes our selected setting #67. The test statistic, denoted as MDM, can be expressed as $\mathrm{MDM} = \left[\frac{T + 1 - 2h + T^{-1}h(h-1)}{T}\right]^{1/2} \left[T^{-1}\left(\gamma_0 + 2\sum_{k=1}^{h-1}\gamma_k\right)\right]^{-1/2} \bar{d}$, where T denotes the length of the time period over which performance is compared, h denotes the forecast horizon (for our case, h = 1), $\bar{d} = T^{-1}\sum_{t=1}^{T} d_t$ denotes the sample mean of $d_t$, $\gamma_0 = T^{-1}\sum_{t=1}^{T}(d_t - \bar{d})^2$ denotes the variance of $d_t$ and $\gamma_k = T^{-1}\sum_{t=k+1}^{T}(d_t - \bar{d})(d_{t-k} - \bar{d})$ denotes the kth auto-covariance of $d_t$ for k = 1, …, h − 1 and h ≥ 2. Under the null that the two models being compared produce equal MSEs (mean squared errors), the MDM statistic follows the t-distribution with T − 1 degrees of freedom. Figure 6 compares the setting #67 and NN#67–MoreLayers in terms of RRMSEs, where it can be observed that the two models perform similarly. Specifically, NN#67–MoreLayers performs slightly better for the training phase, and the setting #67 performs slightly better for the validation and testing phases. The p-value of the MDM test for the testing phase is 0.263, indicating that the performance difference between the two models is not statistically significant. As NN#67–MoreLayers is more complicated than the setting #67 but does not lead to significantly better performance, the setting #67 appears to be the better choice for our case.
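As a minimal sketch of the MDM test described above, the following code computes the statistic from two error series and obtains a two-sided p-value from the t-distribution with T − 1 degrees of freedom. It assumes the loss differential is the difference in squared errors, consistent with the null of equal MSEs. The function names, the numerical integration used for the p-value and the synthetic errors are illustrative assumptions and are not the code used in this study.

```python
import math
import random


def t_two_sided_pvalue(stat, dof, steps=2000):
    """Two-sided Student-t p-value via Simpson integration of the density."""
    upper = abs(stat)
    if upper == 0.0:
        return 1.0
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))

    def pdf(x):
        return math.exp(log_c - (dof + 1) / 2 * math.log1p(x * x / dof))

    step = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * step)
    area = total * step / 3  # P(0 < t < |stat|)
    return max(0.0, 1.0 - 2.0 * area)


def mdm_test(errors_m1, errors_m2, h=1):
    """Modified Diebold-Mariano statistic and p-value for equal MSEs."""
    T = len(errors_m1)
    # Loss differential: squared error of M1 minus squared error of M2.
    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_m1, errors_m2)]
    d_bar = sum(d) / T

    def gamma(k):
        # k-th auto-covariance of d_t (k = 0 gives the variance).
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    long_run_var = gamma(0) + 2.0 * sum(gamma(k) for k in range(1, h))
    correction = math.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    stat = correction * d_bar / math.sqrt(long_run_var / T)
    return stat, t_two_sided_pvalue(stat, T - 1)


# Synthetic one-step-ahead errors for two hypothetical models.
rng = random.Random(1)
errors_m1 = [rng.gauss(0.0, 1.0) for _ in range(104)]
errors_m2 = [rng.gauss(0.0, 1.1) for _ in range(104)]
stat, p_value = mdm_test(errors_m1, errors_m2, h=1)
print(f"MDM = {stat:.3f}, p-value = {p_value:.3f}")
```

A negative statistic indicates that M1 has the smaller average squared error over the comparison window, and a p-value above the chosen level (for example, 0.05) means the null of equal MSEs is not rejected. For h ≥ 2 the estimated long-run variance can occasionally be non-positive in small samples, a case this sketch does not handle.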
7. Benchmark analysis
The analysis so far has focused on the neural network. Here, we compare our selected setting #67 against the following benchmark models: the random walk (RW) model, the autoregressive (AR) model, the autoregressive-generalized autoregressive conditional heteroskedasticity (AR-GARCH) model, the support vector regression (SVR) model, the regression tree (RT) model and the long short-term memory neural network (LSTM) model. As in the robustness analysis above, we compare model performance using both RRMSEs and MDM tests of differences in MSEs.
Details of the six benchmark models are as follows. The RW model uses the price of the previous week as the forecast. The AR model uses the same number of lags as the setting #67, which is 5. The AR-GARCH model also uses the same number of lags as the setting #67 for the AR part and the GARCH(1,1) structure for the GARCH part. The linear ϵ-insensitive SVR model uses lagged one to lagged five price series as predictors, with the box constraint set to the interquartile range of the target variable divided by 1.349 and half the width of the ϵ-insensitive band set to the interquartile range of the target variable divided by 13.49. The RT model is based upon the classification and regression tree (CART) algorithm (Breiman, 2017) and uses lagged one to lagged five price series as predictors, with the minimum number of branch node observations set to 10 and the minimum number of leaf node observations set to 4. The LSTM model uses the two-layer structure in the open loop form with the number of time steps set to 5 and the number of LSTM units set to 10, and it is trained with the Adam optimizer.
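To make the benchmark inputs concrete, the sketch below builds the lagged-one to lagged-five predictor matrix, produces the RW forecast and computes the two SVR settings from the interquartile range of the target variable as stated above. The function names and the price values are hypothetical, and the quantile convention used for the interquartile range is an assumption that may differ from the one used in this study.

```python
import statistics


def lagged_design(prices, n_lags=5):
    """Rows of [p_{t-1}, ..., p_{t-n_lags}] paired with targets p_t."""
    X = [
        [prices[t - k] for k in range(1, n_lags + 1)]
        for t in range(n_lags, len(prices))
    ]
    y = list(prices[n_lags:])
    return X, y


def random_walk_forecast(X):
    """RW benchmark: the previous week's price is the forecast."""
    return [row[0] for row in X]


def svr_heuristics(y):
    """Box constraint = IQR / 1.349 and epsilon = IQR / 13.49 for target y."""
    # Assumption: 'inclusive' quartiles; other conventions shift the IQR slightly.
    q1, _, q3 = statistics.quantiles(y, n=4, method="inclusive")
    iqr = q3 - q1
    return {"box_constraint": iqr / 1.349, "epsilon": iqr / 13.49}


# Made-up weekly index values, for illustration only.
prices = [100.0, 101.5, 102.0, 101.0, 103.5, 104.0, 103.0, 105.5, 106.0, 104.5]
X, y = lagged_design(prices, n_lags=5)
rw = random_walk_forecast(X)
settings = svr_heuristics(y)
print("first predictor row:", X[0], "target:", y[0])
print("RW forecasts:", rw)
print("SVR settings:", settings)
```

With these rules, half the width of the ϵ-insensitive band is one tenth of the box constraint, since both are scaled from the same interquartile range.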
Figure 7 presents performance comparisons of the setting #67 and the six benchmark models based upon the RRMSE. Table 3 presents performance comparisons based upon the MDM test for the testing phase. From these results, we observe that the RW model performs relatively close to the setting #67 based upon the RRMSE, but the MDM test suggests that the setting #67 performs significantly better than the RW model for the testing phase at the 5% level. While the AR-GARCH model improves upon the AR model, neither model is as accurate as the setting #67, and the corresponding MDM tests both lead to p-values below 0.001. Although the SVR and RT models do not beat the setting #67, their performance in terms of the RRMSE is not far from that of the setting #67. MDM tests suggest that the setting #67 performs significantly better than the SVR and RT models at the 5% level. The LSTM model slightly improves upon the setting #67, but the magnitude of the improvement in terms of the RRMSE is rather limited, and the MDM test indicates that the difference is not statistically significant.
MDM test results of benchmark analysis
| Comparison | p − value of the MDM test |
|---|---|
| #67 vs RW | 0.042 |
| #67 vs AR | <0.001 |
| #67 vs AR-GARCH | <0.001 |
| #67 vs SVR | 0.029 |
| #67 vs RT | 0.018 |
| #67 vs LSTM | 0.135 |
Source(s): Elaborated by the authors
8. Conclusion
For many kinds of agricultural market participants, constructing price forecasts for different agricultural commodities has always been an important task. In the present work, we carry out a forecast exercise focusing on weekly prices of wholesale yellow corn in the Chinese market from January 1, 2010 to January 10, 2020. For this purpose, we adopt the nonlinear auto-regressive neural network model and consider different model settings, covering training algorithms, hidden neurons, delays and how the data are segmented. The analysis yields a relatively simple model whose performance is rather accurate and stable. More specifically, the Levenberg–Marquardt algorithm (Levenberg, 1944; Marquardt, 1963) is applied for constructing the model, with the price series segmented into the training, validation and testing phases following the ratio of 60%–20%–20%. The model is based upon 5 delays and 5 hidden neurons. It leads to relative root mean square errors of 1.05%, 1.08% and 1.03% for the training, validation and testing phases, respectively, and a relative root mean square error of 1.05% for the overall data. Forecast results here could be utilized as part of technical analysis and/or combined with other fundamental forecasts as part of policy analysis. The forecast framework utilized here should be rather straightforward to implement, which is an essential consideration for policy makers and a significant number of market participants (Brandt & Bessler, 1983). Such a forecast framework could be applied to related forecast problems for many other commodity price series from different economic segments.
For future work, one potentially interesting avenue would be forecasting commodity prices by combining graph theory and time series models (Bessler & Wang, 2012; Kano et al., 2003; Shimizu, Hoyer, Hyvärinen, Kerminen, & Jordan, 2006, 2011; Shimizu & Kano, 2008; Xu, 2014; Xu & Zhang, 2022). Another worthwhile path would be examining the economic significance of price forecasts based upon different machine learning models (Wang & Yang, 2010; Yang et al., 2010, 2008). For example, one study (Colino & Irwin, 2010) found that a root mean square error reduction of 1% would translate to $11,500 for a risk-averse hog producer in the agricultural sector with production of 10,000 head per year who utilizes price forecast information as part of decision-making.
Competing Interests
Competing interests: The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or nonfinancial interests to disclose.







