Predictions of the trading volumes of financial indices are important to policymakers and financial market participants. This study addresses such a prediction problem for the CSI300 nearby futures, using high-frequency data recorded each minute from the launch date of the futures to roughly two years after all constituent stocks of the futures became shortable, a period that witnessed significantly increased trading activity.
This study adopts a neural network to model the irregular trading volume series of the CSI300 nearby futures and to answer the following questions: can the lags of the trading volume series be used to make predictions; if so, how far can the predictions go and how accurate can they be; can predictive information from the trading volumes of the CSI300 spot and first distant futures improve prediction accuracy, and by what magnitude; how sophisticated is the model; and how robust are its predictions?
The results show that a simple neural network model with 10 hidden neurons can robustly predict the trading volume of the CSI300 nearby futures using the trading volumes of the preceding 1–20 min. The model yields a root mean square error of about 955 contracts. Additional predictive information from the trading volumes of the CSI300 spot and first distant futures further improves prediction accuracy by about 1–2%. This benefit is particularly significant when the trading volume of the CSI300 nearby futures is close to zero. Another benefit, at the cost of a slightly more sophisticated model with more hidden neurons, is that predictions can draw on the trading volumes of the preceding 1–30 min.
The results of this study could be used for multiple purposes, including designing financial index trading systems and platforms, monitoring systematic financial risks and building financial index price forecasts.
1. Introduction
Predictions of the trading volumes of financial indices are important to policymakers and financial market participants. This is because such predictions carry significant market implications for financial index prices and their movements (Wang et al., 2013, 2019; Hou and Li, 2014; Sohn and Zhang, 2017; Susheng and Zhen, 2014; Yan and Hongbing, 2018; Ausloos et al., 2020), which are an essential part of various decision-making processes aimed at generating trading profits and preventing trading losses. Given the importance of monitoring ever-changing financial markets, which ultimately determine the safety and soundness of the financial and economic environment of different countries and regions, regulators and traders have a particular interest in understanding financial trading in the high-frequency domain (Xu and Zhang, 2023a). To this end, the literature has devoted a great amount of effort to constructing different types of models for making predictions. These modeling techniques include traditional regression types of (time-series) econometric models and machine learning models.
1.1 Traditional regression and time series models
Chen et al. (2011) have proposed a hierarchical model that has two different components, which combine an intraday approach and a daily approach, to predict the trading volumes of 30 DJIA stocks. Chen et al. (2011) have found that their proposed hierarchical method leads to higher prediction accuracy than any of the two individual approaches. Joseph et al. (2011) have used a simple linear regression model for predictions of abnormal activities of trading for 470 S&P 500 companies. Joseph et al. (2011) have determined that online search activities offer useful predictive information for the prediction horizon of one week. Brownlees et al. (2011) have compared a rolling average method and a multiplicative error model for prediction purposes of different exchange-traded fund volumes. Brownlees et al. (2011) have found that the multiplicative error model results in higher accuracy. Gharehchopogh et al. (2013) have applied a simple linear regression model for predicting the trading volume of the S&P 500 index by using predictive information from the price of the index. Ye et al. (2014) have compared static and dynamic versions of a volume-weighted approach for predicting SSE 50 stocks' intra-daily volumes. Ye et al. (2014) have suggested that the dynamic version leads to better predictions. Satish et al. (2014) have proposed combining an ARIMA model and a rolling average approach for predictions of trading volumes of 30 DJIA stocks. Bordino et al. (2014) have found that predictive information from Yahoo Finance could benefit predictions of trading volumes of NYSE and Nasdaq stocks on both daily and hourly frequency. Nasir et al. (2019) have used a hybrid framework of vector auto-regressive models and non-parametric methods for predictions of trading volumes of Bitcoin on a weekly basis through predictive information from Google searching activities. Nasir et al. (2019) have found that more searching activities are associated with higher trading volumes of Bitcoin. 
Kao et al. (2020) have combined a vector auto-regressive model with a smoothing approach for the purpose of assessing causality between trading volumes and financial returns.
1.2 Modern methods
Chen et al. (2016) have compared a state-space model based upon Kalman filtering with a rolling average approach and a multiplicative error method for predictions of trading volumes of many stocks from different exchanges. Chen et al. (2016) have determined that the state-space technique leads to higher accuracy for intraday predictions. Ma and Li (2021) have proposed multi-state Kalman filtering for the purpose of generating higher prediction accuracy as compared to two-state filtering for trading volumes of nearly a thousand stocks in the USA.
1.3 Machine learning and deep learning techniques
Kaastra and Boyd (1995) have explored the comparison between a neural network and an ARIMA model for predictions of trading volumes of different agricultural commodities' futures contracts. Kaastra and Boyd (1995) have determined that the neural network generates higher prediction accuracy. Alvim et al. (2010) have examined comparisons among a partial least squares, a support vector machine and a naive no-change model for predicting the trading volumes of nine Bovespa stocks. Alvim et al. (2010) have found that the naive model leads to the lowest prediction accuracy. Oliveira et al. (2017) have evaluated the usefulness of a support vector machine for predicting trading volumes of Dow Jones and S&P 500 on a daily basis by using predictive information from micro-blogging. Lu et al. (2020b) have combined feature extracting functions of a CNN model and predicting functions of an LSTM model for predicting prices of stocks by using predictive information from trading volumes and historical prices. Yan and Yang (2021) have utilized the same predictive information set as that of Lu et al. (2020b) in predicting prices of stocks through encoder/decoder LSTM models. Zhao et al. (2021) have considered different machine learning models that include a support vector machine, a random forest and an LSTM, and a graph-based method for predicting trading volumes' movement patterns by using predictive information from prices of stocks. Zhao et al. (2021) have determined that the graph-based method leads to higher prediction accuracy. Separate from stock markets, Shen et al. (2021) have illustrated that an LSTM could be useful for predicting trading volumes of foreign business in different countries and regions. Zhang (2020) has demonstrated that a Levenberg–Marquardt trained neural network could be effectively utilized for predicting trading volumes of exports and imports.
1.4 Time series decomposition approaches
Lu et al. (2020a) have explored various machine-learning techniques to predict trading volumes and prices of carbon emission rights. Lu et al. (2020a) have paid special attention to the use of ensemble mode decomposition and data smoothing methods. Xie et al. (2020) have investigated the usefulness of decomposition approaches for predictions of trading volumes of electricity. For financial indices, Liu et al. (2022) have shed light on how to decompose trading activities of stocks into short- and long-run components in assessing potential extreme trading information. Chacón et al. (2020) have incorporated ensemble mode decomposition and data smoothing techniques into an LSTM for predictions of prices of stocks. Chacón et al. (2020) have determined that such a framework could improve prediction accuracy.
Regarding the case of the CSI300, recent work has generally focused on predictions of prices through the use of different time-series techniques (e.g. Wang and Chen, 2013; Xu, 2017, 2018, 2019b; Zhang and Sun, 2017; Huang et al., 2018; Zhou et al., 2019a) and machine learning approaches (e.g. Sun et al., 2015; Yang and Cheng, 2015; Wang et al., 2016; Lu and Li, 2017; Yao et al., 2018; Ning, 2020; Long et al., 2019; Zhou et al., 2019b). Our present work therefore aims to fill the research gap of trading volume predictions for the CSI300. Specifically, we address such a prediction problem for the CSI300 nearby futures by using high-frequency data recorded each minute from the launch date of the futures to roughly two years after all constituent stocks of the futures became shortable, a period that witnessed significantly increased trading activity.
The stock market was established in China in the early 1990s. Since then, it has undergone dramatic development alongside economic growth. However, until March 2005, there was no financial index designed to reflect the overall market status. This situation was resolved on 04/08/2005 when the CSI300 was launched. This financial index includes 300 stocks traded on the Shanghai and Shenzhen exchanges and reflects 70% of the total market capitalization. As a further financial development, futures on the CSI300 index were launched on 04/16/2010. Since then, they have turned out to be the most actively traded financial contract in China. Two pilot programs by the China Securities Regulatory Commission are worth noting. First, on 12/05/2011, the number of the CSI300's underlying stocks available for short selling increased from 90 to 260. Second, on 01/31/2013, the number of the CSI300's underlying stocks available for short selling increased from 260 to 300. Such programs have contributed to elevated trading volumes of the CSI300, and the nearby futures have attracted the most trading activity (Xu, 2019b). For more institutional background on the CSI300, one could refer to Yang et al. (2012), Hou and Li (2013), Xu (2017, 2018, 2019b) and Xu and Zhang (2021c, 2022b).
To perform our prediction exercise, we adopt the neural network (denoted as NN) for modeling the irregular trading volume series of the CSI300 nearby futures. The NN has been found in the literature to have great prediction potential for financial and economic applications in terms of time-series data (e.g. Yang et al., 2008, 2010; Wang and Yang, 2010; Cabrera et al., 2011; Zhang and Pan, 2014; Yang and Cheng, 2015; Kong and Zhu, 2018; Xu and Zhang, 2021b, 2022d). We concentrate on answering the following research questions: are we able to utilize the lags of the trading volume series to make predictions; if this is the case, how far can the predictions go and how accurate can the predictions be; can we use predictive information from trading volumes of the CSI300 spot and first distant futures for improving prediction accuracy and what is the corresponding magnitude; how sophisticated is the model; and how robust are its predictions? Our results show that we could construct a rather simple neural network model with 10 hidden neurons to robustly predict the trading volume of the CSI300 nearby futures using the trading volumes of the preceding 1–20 min. The model yields a root mean square error of about 955 contracts. Utilizing additional predictive information from trading volumes of the CSI300 spot and first distant futures could further improve prediction accuracy, and the magnitude of improvement is about 1–2%. This benefit is particularly significant when the trading volume of the CSI300 nearby futures is close to zero. Another benefit, at the cost of the model becoming slightly more sophisticated with more hidden neurons, is that predictions could draw on the trading volumes of the preceding 1–30 min.
Our contributions to the literature are as follows. First, to our knowledge, the present work is the first one on predictions of trading volumes of the CSI300 nearby futures. Our results fill the research gap in understanding trading volume predictions for an important financial index. These would have practical implications for many economic agents, including policymakers, traders, investors and regulatory agencies. Specifically, the results would benefit the monitoring of ever-changing trading activities for ensuring the safety and soundness of financial systems. Second, the current study is the first one that employs a powerful machine learning approach for predicting the trading volume of the CSI300 nearby futures. Under the special and unique market structure, including a domestic individual investor having a high barrier to enter trading of the futures, as well as a foreign institutional investor with qualifications and relatively low participation ratios of institutional traders as compared to individual investors (Ng and Wu, 2007; Xu, 2017), we successfully illustrate that the NN could make rather accurate and robust predictions of the trading volume, which shows the great potential of the NN under different market structures. Such results should be of practical use for financial indices traded in different countries that share a similar market structure with the CSI300, probably for a particular time period. Third, our analysis is the first one that adopts high-frequency data recorded on a minute basis for financial trading volume predictions, although previous studies have investigated the issue for different financial indices using intraday trading data (Xu, 2018).
A good understanding of high-frequency trading volume predictions could greatly benefit investors and policymakers in risk management and financial price index predictions as part of market evaluations.
2. Data
Our data are sourced from Wind Information Co., Ltd and include trading volumes of the CSI300 spot and CSI300 futures [1]. The trading volumes are recorded on a minute basis, and for each trading day, the data span 9:16 a.m.–11:30 a.m. and 1:01 p.m.–3:15 p.m. The time period analyzed here ranges from 04/16/2010 (the date on which the futures was launched) to 11/14/2014. Thus, there are 299,970 observed trading volumes for each series. The data recorded on the 1-min basis not only reflect more trading activities than the data recorded on the 5- or 10-min basis but also maintain sufficient economic significance so that investigations of the one-minute data carry necessary importance to traders and policymakers (Xu, 2018). Visualization of the trading volumes of the CSI300 spot, CSI300 nearby futures and CSI300 first distant futures [2] is provided in Figure 1. We could see from Figure 1 that these three trading volume series show obviously chaotic and noisy patterns. Figure 1 also reveals that the nearby futures contract is the most actively traded and that its trading volumes have been expanding during the time period considered here. Summary statistics of the three trading volume series are provided in Table 1. We could observe that the trading volumes are leptokurtic and positively skewed.
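As a quick consistency check on the sample size, the two daily sessions can be counted in one-minute bars and compared with the reported number of observations. The sketch below is illustrative only: the helper name `minutes_inclusive` is ours, and the implied number of trading days is derived from the figures quoted above rather than reported by the data provider.

```python
from datetime import datetime

def minutes_inclusive(start: str, end: str) -> int:
    """Number of one-minute bars stamped from start to end, both inclusive."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60) + 1

morning = minutes_inclusive("09:16", "11:30")    # morning session
afternoon = minutes_inclusive("13:01", "15:15")  # afternoon session
bars_per_day = morning + afternoon

total_obs = 299_970  # observations per series, as stated in the text
implied_days, remainder = divmod(total_obs, bars_per_day)
print(bars_per_day, implied_days, remainder)
```

Under these assumptions, each trading day contributes 270 bars, and the reported 299,970 observations divide evenly into 1,111 trading days.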
3. Models
Two different types of nonlinear autoregressive neural network (denoted as ANN) models have been considered in this work. The first model is named a pure ANN or the CSI300 nearby futures own-lag only model. The model is denoted as follows:
$$y(t) = f\big(y(t-1), \ldots, y(t-d)\big),$$
where y is employed to reflect the trading volume of the CSI300 nearby futures, t is employed to reflect the time, d is employed to reflect the number of delays used by the model and f is employed to reflect the function form of the model. It is worth noting that the function form f is unknown in advance and the model estimated could be denoted as follows:
$$y(t) = \alpha_0 + \sum_{j=1}^{k} \alpha_j \, \phi\!\left( \sum_{i=1}^{d} \beta_{ij}\, y(t-i) + \beta_{0j} \right) + \varepsilon_t,$$
where k is employed to reflect the number of hidden units used by the model with the transfer function being ϕ, βij is employed to reflect the parameter that is associated with the connection's weight between the input unit i and the hidden unit j, αj is employed to reflect the connection's weight between the hidden unit j and the output unit, β0j and α0 are employed to reflect the constants that are associated with, respectively, the hidden unit j and the output unit and ɛ is employed to reflect the error term. The second model is an ANN that has exogenous inputs (ANN–X). The model is denoted as follows:
$$y(t) = f\big(y(t-1), \ldots, y(t-d),\; x(t-1), \ldots, x(t-d)\big),$$
where x is employed to reflect the trading volume of the CSI300 spot alone, or the trading volumes of the CSI300 spot and CSI300 first distant futures together. In the estimated network, an additional set of parameters is associated with the connection's weight between the exogenous input unit i and the hidden unit j. The ANN–X model includes more predictive information than the ANN model and thus makes it possible to explore the potential usefulness of the additional predictive information for improving prediction accuracy (Xu, 2019a, 2020).
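For concreteness, one way to write the estimated ANN–X form, extending the estimated ANN form with weights on the lagged exogenous inputs, is sketched below. The symbol γij for the exogenous-input weights is our assumed notation, not necessarily the one used in the original display; when x contains both the spot and first distant futures volumes, x(t−i) and γij are read as vectors of matching dimension.

```latex
y(t) = \alpha_0 + \sum_{j=1}^{k} \alpha_j \,
       \phi\!\left( \sum_{i=1}^{d} \beta_{ij}\, y(t-i)
                  + \sum_{i=1}^{d} \gamma_{ij}^{\top} x(t-i)
                  + \beta_{0j} \right) + \varepsilon_t
```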
We make use of the ANN models based upon a two-layer feed-forward network whose hidden layer adopts a transfer function in the form of a logistic sigmoid function as follows:
$$\phi(z) = \frac{1}{1 + e^{-z}},$$
and whose output layer adopts a linear function. Although the output y(t) is fed back to the inputs of the neural network through the delays, training is carried out in open-loop form for efficiency. In an open loop, the true output is used as the input instead of the estimated output. This makes the inputs to the network more accurate and gives the resulting network a purely feed-forward architecture.
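The open-loop setup described above can be sketched in a few lines of NumPy: observed lags form the input matrix, a logistic-sigmoid hidden layer feeds a linear output unit, and no predicted value is ever fed back. This is a minimal illustration under our own assumptions (synthetic gamma-distributed data as a stand-in for trading volumes, random untrained weights, rescaling by the maximum); it is not the authors' implementation.

```python
import numpy as np

def make_lagged(y: np.ndarray, d: int):
    """Open-loop design matrix: each row holds the observed y(t-1), ..., y(t-d)."""
    X = np.column_stack([y[d - i : len(y) - i] for i in range(1, d + 1)])
    return X, y[d:]

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, B, b0, a, a0):
    """Logistic-sigmoid hidden layer followed by a linear output unit."""
    return logistic(X @ B + b0) @ a + a0

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=500.0, size=1_000)  # synthetic stand-in series
y_scaled = y / y.max()                             # rescaling is our choice

d, k = 20, 10                                      # delays and hidden neurons
X, target = make_lagged(y_scaled, d)

B = rng.normal(scale=0.1, size=(d, k))             # input-to-hidden weights
b0 = np.zeros(k)                                   # hidden-unit constants
a = rng.normal(scale=0.1, size=k)                  # hidden-to-output weights
a0 = 0.0                                           # output constant

pred = forward(X, B, b0, a, a0)                    # one prediction per row of X
```

Because every input in `X` is an observed value, the network is evaluated as an ordinary feed-forward map, which is what makes open-loop training straightforward.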
In terms of numbers of hidden neurons, we have tested 5, 10, 15, 25, 35 and 50. Regarding the delays, we have examined 1, 2, 5, 10, 20 and 30. As a result, 36 testing pairs are considered. For model estimations, we have segmented the trading volume data by using 70% for training, 15% for validation and 15% for testing. For the consideration of robustness analysis, we have also examined the following alternative data segmentation ratios by reserving 15% of the trading volume series for testing: 60% for training and 25% for validation, 65% for training and 20% for validation, 75% for training and 10% for validation and 80% for training and 5% for validation.
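The search grid and the segmentation ratios above can be enumerated as follows. This is only a sketch: the function name `split_counts` is ours, the block sizes ignore the first d observations consumed by the lags, and the text does not state how observations are assigned to each block, so only the counts are computed.

```python
from itertools import product

hidden_neurons = [5, 10, 15, 25, 35, 50]
delays = [1, 2, 5, 10, 20, 30]
grid = list(product(hidden_neurons, delays))  # all (neurons, delays) pairs

def split_counts(n: int, train_pct: int, val_pct: int):
    """Sizes of the training, validation and testing blocks for n observations."""
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return n_train, n_val, n - n_train - n_val  # testing takes the remainder

n_obs = 299_970
ratios = [(60, 25), (65, 20), (70, 15), (75, 10), (80, 5)]
for train_pct, val_pct in ratios:
    print(train_pct, val_pct, split_counts(n_obs, train_pct, val_pct))
```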
One could employ different algorithms for training a machine learning model. For our case, we explored the following three algorithms: the Levenberg–Marquardt (Levenberg, 1944; Marquardt, 1963) algorithm, the scaled conjugate gradient (Møller, 1993) algorithm and the Bayesian regularization (MacKay, 1992; Foresee and Hagan, 1997) algorithm. These three algorithms have been demonstrated by previous studies in terms of their success in achieving relatively good accuracy under various circumstances (e.g. Doan and Liong, 2004; Kayri, 2016; Khan et al., 2019; Selvamuthu et al., 2019; Xu and Zhang, 2021a, d, 2022a, c, 2023b). Baghirli (2015) and Al Bataineh and Kaur (2018) have carried out targeted studies comparing these three algorithms. Table 2 contains the specification of each ANN and ANN–X model setting examined in the present work. Figure 2 visualizes the architecture of the final neural networks constructed in this study.
3.1 Levenberg–Marquardt algorithm
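The Levenberg–Marquardt algorithm is designed to approach second-order training speed without computing the Hessian matrix H, which accelerates training (Paluszek and Thomas, 2020). For this algorithm, using a system with weights denoted as w1 and w2 as an example, the approximation is as follows:
$$H \approx J^{T} J,$$
where
$$J = \begin{bmatrix} \dfrac{\partial E}{\partial w_1} & \dfrac{\partial E}{\partial w_2} \end{bmatrix}$$
for a non-linear function denoted as E(⋅), which includes information on the sum square error and whose
$$H = \begin{bmatrix} \dfrac{\partial^2 E}{\partial w_1^2} & \dfrac{\partial^2 E}{\partial w_1 \partial w_2} \\[2ex] \dfrac{\partial^2 E}{\partial w_2 \partial w_1} & \dfrac{\partial^2 E}{\partial w_2^2} \end{bmatrix}.$$
The gradient would be reflected as follows:
$$g = J^{T} e,$$
where e is employed to denote a vector of errors. In order to make updates to the weights and the biases, the rule as follows is to be adopted:
$$w_{k+1} = w_k - \left[ J^{T} J + \mu I \right]^{-1} J^{T} e_k,$$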
where w is employed to denote the weight vector, k is employed to denote the index of the iteration during the model training process, I is employed to denote the identity matrix and μ is employed to denote the combination coefficient (noting that μ is always positive). In the limiting case of μ = 0, the update reduces to Newton's method with the approximate Hessian matrix, that is, the Gauss–Newton step. When μ is large, the update becomes gradient descent with small step sizes. After each successful step, μ is decreased because there is less need for gradient descent. The Levenberg–Marquardt algorithm inherits good properties of steepest-descent types of algorithms and Gauss–Newton types of techniques while avoiding some of their limitations. To be more specific, it efficiently addresses the concern of slow convergence (Hagan and Menhaj, 1994).
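To illustrate the update rule and the role of μ, the sketch below applies Levenberg–Marquardt iterations to a toy two-parameter curve fit. The toy problem, the function name `lm_fit`, the starting value of μ and the multiply-or-divide-by-10 adjustment are all our assumptions for illustration; here J denotes the Jacobian of the error vector with respect to the weights.

```python
import numpy as np

def lm_fit(residual, jacobian, w, mu=1e-2, n_iter=50):
    """Levenberg-Marquardt iterations with a simple factor-of-10 rule for mu."""
    e = residual(w)
    sse = e @ e
    for _ in range(n_iter):
        J = jacobian(w)
        # Damped Gauss-Newton step: solve (J'J + mu*I) step = J'e
        step = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
        w_new = w - step
        e_new = residual(w_new)
        sse_new = e_new @ e_new
        if sse_new < sse:
            # Successful step: accept it and shrink mu (closer to Gauss-Newton)
            w, e, sse, mu = w_new, e_new, sse_new, mu / 10.0
        else:
            # Failed step: reject it and grow mu (closer to gradient descent)
            mu *= 10.0
    return w

# Toy problem: fit y = w1 * exp(w2 * t) to noise-free data
t = np.linspace(0.0, 1.0, 30)
w_true = np.array([2.0, -1.5])
y = w_true[0] * np.exp(w_true[1] * t)

def residual(w):
    return w[0] * np.exp(w[1] * t) - y

def jacobian(w):
    return np.column_stack([np.exp(w[1] * t), w[0] * t * np.exp(w[1] * t)])

w_hat = lm_fit(residual, jacobian, np.array([1.0, 0.0]))
print(w_hat)
```

Solving the damped linear system instead of forming an explicit inverse is a common numerical choice and does not change the update itself.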
3.2 Scaled conjugate gradient algorithm
In a basic backpropagation algorithm, weights are adjusted along the steepest descent direction because a given performance function decreases most rapidly in that direction. But this does not always lead to the fastest convergence and model training. A conjugate gradient algorithm searches along conjugate directions, which generally results in quicker convergence than the steepest descent method. A learning rate is often adopted by different algorithms to decide the length of the weight update step. In a conjugate gradient algorithm, the step size is modified at each iteration: a search is carried out along the conjugate gradient direction to determine the step size that reduces the performance function. Because this line search could be time-consuming, the scaled conjugate gradient algorithm is adopted to improve the training speed. The scaled conjugate gradient algorithm is generally faster as compared to a Levenberg–Marquardt backpropagation-based algorithm.
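The difference between steepest descent and conjugate directions is easiest to see on a small quadratic. The sketch below uses a plain conjugate gradient method with an exact line search (Fletcher–Reeves form), not Møller's scaled variant, which avoids the line search altogether; the matrix, vector and function names are illustrative assumptions.

```python
import numpy as np

def steepest_descent(A, b, w, n_iter):
    """Minimize 0.5*w'Aw - b'w by steepest descent with exact line search."""
    for _ in range(n_iter):
        g = A @ w - b                      # gradient at the current point
        if g @ g < 1e-30:
            break
        alpha = (g @ g) / (g @ A @ g)      # exact step length along -g
        w = w - alpha * g
    return w

def conjugate_gradient(A, b, w, n_iter):
    """Minimize the same quadratic along mutually conjugate directions."""
    g = A @ w - b
    p = -g                                 # first direction: steepest descent
    for _ in range(n_iter):
        if g @ g < 1e-30:
            break
        alpha = (g @ g) / (p @ A @ p)      # exact step length along p
        w = w + alpha * p
        g_new = A @ w - b
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
        p = -g_new + beta * p              # next conjugate direction
        g = g_new
    return w

A = np.array([[10.0, 2.0], [2.0, 1.0]])    # symmetric positive definite
b = np.array([1.0, 1.0])
w_star = np.linalg.solve(A, b)             # exact minimizer
w0 = np.zeros(2)

err_sd = np.linalg.norm(steepest_descent(A, b, w0, 2) - w_star)
err_cg = np.linalg.norm(conjugate_gradient(A, b, w0, 2) - w_star)
print(err_sd, err_cg)
```

On this two-dimensional quadratic, two conjugate gradient steps reach the minimizer up to rounding error, while two steepest descent steps do not.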
3.3 Bayesian regularization algorithm
Cross-validation could be lengthy, but it is not required by a Bayesian regularized NN. Essentially, Bayesian regularization converts a nonlinear regression into a statistical problem in the manner of a ridge regression, and the algorithm explores the probabilistic nature of the weights in relation to the underlying data under investigation. The chance of overfitting increases dramatically when additional hidden layers of neurons are used. Thus, in a Bayesian regularization algorithm, unreasonably sophisticated models are penalized, with the corresponding linkage weights pushed toward 0. As a result, the NN concentrates on non-trivial weights. Naturally, some parameters converge to constant values as the NN grows. A Bayesian regularized NN would generally be more parsimonious than a basic backpropagation network. It would also help reduce the probability of model overfitting due to noise in the underlying data.
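The ridge-type interpretation can be made explicit through the regularized objective that Bayesian regularization minimizes, in the framework of MacKay (1992) and Foresee and Hagan (1997). The sketch below uses our own symbols λD and λW for the two hyperparameters, to avoid a clash with the network weights α and β defined earlier; ŷ(t) denotes the network prediction and wm the m-th of M network weights.

```latex
F(w) = \lambda_D \, E_D + \lambda_W \, E_W,
\qquad
E_D = \sum_{t} \big( y(t) - \hat{y}(t) \big)^2,
\qquad
E_W = \sum_{m=1}^{M} w_m^2
```

For fixed hyperparameters, the second term is a ridge-type penalty that shrinks weights toward 0; the Bayesian treatment estimates the hyperparameters from the training data, which is why a separate cross-validation step is not needed.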
4. Results
4.1 ANN
We show, in Figure 3, root mean square errors (RMSEs) generated by the ANN models that are trained through the Levenberg–Marquardt algorithm and based upon the trading volume's own lags of the CSI300 nearby futures contract. To balance model prediction accuracy against the stability of model performance across the three phases of training, validation and testing, we select the ANN model that uses 10 hidden neurons and 20 delays. We denote this model as ANN-1 and the corresponding performance in terms of RMSEs is 955.67 for training, 957.94 for validation and 955.32 for testing. The summary of the ANN-1 model is included in Table 3. A similar analysis has been conducted for ANN models trained via the other two algorithms (results available upon request). Combining all analysis results, we present, in Figure 4, the prediction performance of the top three candidate models and determine that the ANN-1 model is still the optimal choice for balancing model prediction accuracy and stability of model performance. Please note that in Figure 4, RMSEs are 956.17 for training, 975.75 for validation and 922.75 for testing for the ANN model trained via the Levenberg–Marquardt algorithm, which is based upon 15 hidden neurons and 30 delays. In addition, RMSEs are 951.77 for training and 949.91 for testing for the ANN model trained via the Bayesian regularization algorithm, which consumes much more time than the ANN-1 model. For the remainder of this work, we focus on the Levenberg–Marquardt algorithm for model training.
We present, in Figure 5, four variations of the ANN-1 model associated with different data segmentation ratios utilized during model building. Specifically, the ANN-1 model uses 70% of the data for training, 15% of the data for validation and 15% of the data for testing. The four variations maintain 15% of the data for testing but consider the following ratios for training and validation purposes: 60% for training and 25% for validation, 65% for training and 20% for validation, 75% for training and 10% for validation and 80% for training and 5% for validation. According to the results shown in Figure 5, the ratio of 70% for training, 15% for validation and 15% for testing results in the most stable model performance across the three phases. Thus, for the remainder of this work, we focus on this data segmentation ratio. While reserving 15% of the trading volume data for testing, we have also conducted additional analysis by executing the model one hundred times for each of the five segmentation ratios for training and validation and comparing the means of the validation results. It turns out that the ratios of 60% for training and 25% for validation, 65% for training and 20% for validation, 70% for training and 15% for validation, 75% for training and 10% for validation and 80% for training and 5% for validation lead to mean validation RMSEs of 958.59, 953.67, 952.82, 970.23 and 998.45, respectively, which supports the choice of the ratio of 70% for training, 15% for validation and 15% for testing.
Visualization of ANN-1's predictions is shown in Figure 6 and visualization of ANN-1's prediction errors is shown in Figure 7. In Figure 6, the dark solid 45-degree line labeled “perfect prediction” marks the points at which there is no prediction error. Because ANN-1's prediction performance is rather stable, Figures 6 and 7 combine the results for the training, validation and testing data; plotting the phases separately produces almost identical visualizations for the three sub-samples. For the remainder of this work, we follow this practice for ANN–X models as well.
Based upon the analysis of the ANN-1 model, the trading volume of the CSI300 nearby futures contract can be predicted from its own trading volumes over the preceding 1–20 min, given that the ANN-1 model is based upon 20 delays and is built on 1-min data. The model is of relatively low complexity, with 10 hidden neurons [3], and delivers rather robust prediction performance, with the RMSE being about 955 contracts across the three phases. Prediction errors are concentrated around 0 with a high peak in frequency, as visualized in Figure 7. However, several trading volumes of the CSI300 nearby futures, including the largest observed trading volume, receive predictions of 0, as visualized in Figure 6. This problem is partly remedied by further including the trading volumes of the CSI300 spot and CSI300 first distant futures in the ANN–X models that we turn to next.
4.2 ANN–X
Similar to the analysis based on the ANN models, Figure 8 reports RMSEs for the ANN–X models when the trading volume of the CSI300 spot is further included as part of model training. We test the trading volume of the CSI300 spot prior to the trading volume of the CSI300 first distant futures due to the closer relation between the CSI300 nearby futures and CSI300 spot as compared to that between the CSI300 nearby futures and CSI300 first distant futures (Xu, 2019b). Balancing model prediction accuracy and model performance stabilities across different phases, we make the selection of the ANN–X model that has 15 hidden neurons and 30 delays, denoted as ANN–X-1. This model results in RMSEs of 949.37 for training, 946.50 for validation and 935.85 for testing. The summary of the ANN–X-1 model is included in Table 3. Predictions from the ANN–X-1 model are reported in Figure 9 and corresponding prediction errors are reported in Figure 10.
We finally include the trading volumes of both the CSI300 spot and CSI300 first distant futures for model training and report RMSEs for ANN–X models in Figure 11. Again, balancing model prediction accuracy and model performance stabilities across the three phases, we make the selection of the ANN–X model that has 35 hidden neurons and 30 delays, denoted as ANN–X-2. It results in RMSEs of 942.44 for training, 934.04 for validation and 940.20 for testing. The summary of the ANN–X-2 model is included in Table 3. Predictions from the ANN–X-2 model are visualized in Figure 12 and corresponding prediction errors are visualized in Figure 13.
In Figure 14, we compare the performance of the ANN-1, ANN–X-1 and ANN–X-2 models. Including the trading volumes of the CSI300 spot and CSI300 first distant futures improves prediction accuracy by a robust but modest magnitude of about 1–2%. Even so, it substantially corrects some of the near-zero predictions made by the own-lag-only model, i.e. the ANN-1 model, as can be observed by comparing the prediction results shown in Figures 6, 9 and 12. To be more specific, Figure 6 contains a number of data points whose predicted trading volumes are zero or near zero while their observed trading volumes are not zero. These data points can be located visually in Figure 6: their vertical axis values (i.e. the predicted trading volume of the CSI300 nearby futures contract) are zero or near zero while the corresponding horizontal axis values (i.e. the observed trading volume of the CSI300 nearby futures contract) are not zero. In Figures 9 and 12, such data points have been largely eliminated, particularly in Figure 12. Another benefit is that, with the additional series incorporated, the prediction of the trading volume of the CSI300 nearby futures can draw on the trading volumes of the preceding 1–30 min, given that the ANN–X-1 and ANN–X-2 models are based upon 30 delays and are built on 1-min data, although the complexity of the models increases slightly because additional hidden neurons are required.
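The size of the improvement can be checked directly from the RMSEs reported above for the three selected models. The short sketch below computes the percentage reduction in RMSE relative to ANN-1 for each phase; the dictionary layout and the function name `pct_reduction` are ours.

```python
# RMSEs as reported in the text for the three selected models
rmse = {
    "ANN-1":   {"training": 955.67, "validation": 957.94, "testing": 955.32},
    "ANN-X-1": {"training": 949.37, "validation": 946.50, "testing": 935.85},
    "ANN-X-2": {"training": 942.44, "validation": 934.04, "testing": 940.20},
}

def pct_reduction(base: float, new: float) -> float:
    """Percentage reduction of new relative to base."""
    return 100.0 * (base - new) / base

improvement = {
    model: {
        phase: round(pct_reduction(rmse["ANN-1"][phase], value), 2)
        for phase, value in phases.items()
    }
    for model, phases in rmse.items()
    if model != "ANN-1"
}

for model, phases in improvement.items():
    print(model, phases)
```

Across models and phases, the reductions range from roughly 0.7% to 2.5%, which is broadly in line with the magnitude of about 1–2% noted above.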
4.3 Subperiod analysis
To test whether prediction accuracy is affected by the number of shortable stocks in the CSI300, we run the ANN-1, ANN–X-1 and ANN–X-2 models on three subperiods with different numbers of shortable stocks and compare the results. The first subperiod is April 16, 2010–December 4, 2011, during which 90 stocks in the CSI300 are shortable. The second subperiod is December 5, 2011–January 30, 2013, during which 260 stocks are shortable. The third subperiod is January 31, 2013–November 14, 2014, during which all 300 stocks are shortable. The results are presented in Figure 15, together with those for the whole sample from April 16, 2010, to November 14, 2014. Figure 15 shows that the trading volume of the CSI300 nearby futures is predicted most accurately in the first subperiod, followed by the second and then the third. This result is intuitive: as more stocks become shortable, trading becomes more volatile and harder to predict. Figure 15 also shows that incorporating the spot and first distant futures still improves predictions in all three subperiods, as can be seen by comparing the result of ANN-1 with those of ANN–X-1 and ANN–X-2 for a given subperiod and subsample.
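The subperiod boundaries above can be encoded directly, which is one way to split minute-level observations before re-running the models. This is a minimal sketch: the names `SUBPERIODS` and `subperiod_of` are hypothetical, and only the dates and stock counts come from the text.

```python
from datetime import date
from typing import Optional

# (label, first day, last day, number of shortable CSI300 stocks), from the text.
SUBPERIODS = [
    ("subperiod 1", date(2010, 4, 16), date(2011, 12, 4), 90),
    ("subperiod 2", date(2011, 12, 5), date(2013, 1, 30), 260),
    ("subperiod 3", date(2013, 1, 31), date(2014, 11, 14), 300),
]


def subperiod_of(day: date) -> Optional[str]:
    """Return the label of the subperiod containing `day`, or None."""
    for label, start, end, _shortable in SUBPERIODS:
        if start <= day <= end:
            return label
    return None


print(subperiod_of(date(2011, 12, 4)))
print(subperiod_of(date(2011, 12, 5)))
print(subperiod_of(date(2014, 11, 15)))
```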
4.4 Benchmark analysis
We perform a benchmark analysis by comparing the ANN-1 model with the linear autoregressive (AR) model and the linear autoregressive integrated moving average (ARIMA) model. Because the ANN–X-1 and ANN–X-2 models use additional predictive information relative to the AR and ARIMA models, and because they have been shown to outperform the ANN-1 model, we do not benchmark them against the AR or ARIMA model here. The lag of the AR model and the structure of the ARIMA model are both determined by the Bayesian information criterion (Schwarz, 1978). To compare model performance, we apply the modified (Harvey et al., 1997) Diebold-Mariano test (Diebold and Mariano, 2002). The modification mitigates some shortcomings of the original test, particularly its tendency to be oversized. The modified test is based upon $d_t$ as follows:
$$d_t = \left(e_t^{M_1}\right)^2 - \left(e_t^{M_2}\right)^2,$$
where $e_t^{M_1}$ and $e_t^{M_2}$ denote the two error terms at time t generated by model M1 and model M2, respectively. Here, AR or ARIMA is model M1 and ANN-1 is model M2. The test statistic for comparing model performance, denoted as MDM, is as follows:
$$\mathrm{MDM} = \left[\frac{T + 1 - 2h + T^{-1}h(h-1)}{T}\right]^{1/2}\left[T^{-1}\left(\hat{\gamma}_0 + 2\sum_{k=1}^{h-1}\hat{\gamma}_k\right)\right]^{-1/2}\bar{d},$$
where T denotes the length of the testing phase, h denotes the prediction horizon (h = 1 for our application), $\bar{d} = T^{-1}\sum_{t=1}^{T} d_t$ denotes the sample average of $d_t$, $\hat{\gamma}_0 = T^{-1}\sum_{t=1}^{T}(d_t - \bar{d})^2$ denotes the variance of $d_t$, and $\hat{\gamma}_k = T^{-1}\sum_{t=k+1}^{T}(d_t - \bar{d})(d_{t-k} - \bar{d})$ denotes the kth auto-covariance of $d_t$ for k = 1, …, h − 1 and h ≥ 2. Under the null that the two models being compared have equal mean squared errors, the MDM statistic follows a t-distribution with T − 1 degrees of freedom. We report the RMSEs of the AR model and the ARIMA model in Table 3. The p values of the MDM tests are below 0.001, which suggests that the prediction accuracy of the ANN-1 model is statistically significantly better than that of the AR model and the ARIMA model. As a robustness check, we also compare performance using the superior predictive ability (SPA) test (Hansen, 2005), which likewise finds that the ANN-1 model performs statistically significantly better than the AR model and the ARIMA model. Using the Akaike information criterion (Akaike, 1974) to determine the lag of the AR model and the structure of the ARIMA model does not affect this conclusion.
It should be mentioned that a model performing worse than another model is not necessarily unable to contribute to prediction results. Many previous studies on prediction combinations have aimed at constructing different weights for different models' predictions in order to improve prediction accuracy. One interesting area of prediction combinations is combining linear and nonlinear models, for which Stock and Watson (1998) and Blake and Kapetanios (1999) offer good examples. Hansen et al. (2011) have introduced the concept of the model confidence set (MCS), which is a useful technique for selecting optimal models at a given level of confidence.
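The MDM statistic used in this subsection can be sketched in a few lines. The code below follows the standard form of the Harvey et al. (1997) correction under a squared-error loss, which matches the null of equal mean squared errors. It is an illustration under that assumption and not the code used in the study. The function name `mdm_statistic` and the toy error values are hypothetical.

```python
import math
from typing import Sequence


def mdm_statistic(
    errors_m1: Sequence[float], errors_m2: Sequence[float], h: int = 1
) -> float:
    """Modified Diebold-Mariano statistic for squared-error loss.

    d_t = e1_t**2 - e2_t**2; a positive value means model M1 has the
    larger squared errors on average.
    """
    if len(errors_m1) != len(errors_m2):
        raise ValueError("error series must have the same length")
    T = len(errors_m1)
    if h < 1 or T <= h:
        raise ValueError("need h >= 1 and more than h observations")

    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_m1, errors_m2)]
    d_bar = sum(d) / T

    def autocov(k: int) -> float:
        # k-th sample auto-covariance of d_t (k = 0 gives the variance).
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate")

    # Small-sample correction factor of the modified test.
    correction = math.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    return correction * d_bar / math.sqrt(long_run_var / T)


# Toy prediction errors (made-up numbers, not results from the study).
benchmark_errors = [1.0, 2.0, 3.0, 4.0]  # plays the role of model M1
network_errors = [1.0, 1.0, 2.0, 2.0]    # plays the role of model M2

stat = mdm_statistic(benchmark_errors, network_errors, h=1)
print(f"MDM statistic: {stat:.4f}")
```

The statistic would then be compared with a t-distribution with T − 1 degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is available.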
Following the same idea, we also compare the performance of the ANN–X-1 model with that of the AR–X-1 and ARIMA–X-1 models, where X refers to the trading volume of the CSI300 spot, and the performance of the ANN–X-2 model with that of the AR–X-2 and ARIMA–X-2 models, where X refers to the trading volumes of the CSI300 spot and first distant futures. The RMSEs of the AR–X-1, ARIMA–X-1, AR–X-2 and ARIMA–X-2 models are reported in Table 3. The p values of the MDM tests are below 0.001 for comparisons between the ANN–X-1 model and the AR–X-1 and ARIMA–X-1 models, which suggests that ANN–X-1 performs statistically significantly better than AR–X-1 and ARIMA–X-1. Similarly, the p values of the MDM tests are below 0.001 for comparisons between the ANN–X-2 model and the AR–X-2 and ARIMA–X-2 models, which suggests that ANN–X-2 performs statistically significantly better than AR–X-2 and ARIMA–X-2. As a robustness check, we also compare performance using the SPA test (Hansen, 2005) and find that both conclusions continue to hold.
5. Conclusion
For policymakers and participants of financial markets, predicting the trading volumes of financial indices is an important issue. In the present work, we address such a prediction problem for the CSI300 nearby futures using high-frequency data recorded on a minute basis, which has not been explored in previous studies. We adopt the neural network for modeling the irregular trading volume series and obtain the following key empirical findings. A rather simple neural network model with 10 hidden neurons, trained via the Levenberg–Marquardt (Levenberg, 1944; Marquardt, 1963) algorithm, can robustly predict the trading volume of the CSI300 nearby futures using 1–20 min ahead trading volume data. The model yields a root mean square error of about 955 contracts consistently across the three phases of training, validation and testing. Utilizing additional predictive information from the trading volumes of the CSI300 spot and first distant futures further improves prediction accuracy by about 1–2%. This benefit is particularly significant when the trading volume of the CSI300 nearby futures is close to zero. Another benefit, at the cost of a slightly more sophisticated model with more hidden neurons, is that predictions can be generated from 1–30 min ahead trading volume data, as the corresponding neural networks use 30 delays and 15 or 35 hidden neurons and are likewise trained via the Levenberg–Marquardt algorithm.
Our results could be used for multiple purposes. First, they could inform the design of financial index trading systems and platforms, through ongoing evaluation of system/platform limits for processing trading activities. Second, they could support the monitoring of systematic financial risks, through ongoing detection of possibly abnormal trading activities. Third, they could help build financial index price forecasts, as the literature suggests that predictive information from the trading volume may help improve the prediction accuracy of financial index prices. Our results would also be useful to policymakers in different countries when designing a new financial index or reforming an existing one. More specifically, a good understanding of trends in trading volumes would help in planning the development of a relatively new financial index from a thin market into a liquid and mature market. Although the present work focuses on relatively fundamental neural network models, developments in machine learning offer more advanced models, such as the convolutional neural network and the long short-term memory neural network, which have been used in the literature for financial predictions. Exploring such models should be a worthwhile avenue for future studies on predicting financial trading volumes, including that of the CSI300 futures.
Funding: No funds, grants or other support were received.
Human participants or animal participants: This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest: The authors have no relevant financial or non-financial interests to disclose.
Notes
It is possible that different platforms could generate slightly different trading volume data for each minute. It is worth noting that trading volumes of the CSI300 spot are always 0 from 9:16 a.m. to 9:29 a.m. on a trading day.
When different futures contracts are being investigated, the contract that has the closest settlement date is named the nearby contract. The first distant contract is the contract which settles right after the nearby contract.
We state “a relatively low complex model” based on our empirical judgment that the ANN-1 model with 10 hidden neurons is not particularly complex for our case.