Purpose

To overcome the structural limitations arising from subjectively preset adjustable parameters in the NGBM(1,1) model family and significantly enhance its predictive capability for complex nonlinear systems, this study develops a novel self-adaptive nonlinear discrete grey model (SANDGMO(1,1)).

Design/methodology/approach

The proposed SANDGMO(1,1) model is developed by integrating non-homogeneous discrete grey modeling principles into the traditional NGBM(1,1) model to eliminate jumping errors. Parameterization functions are introduced to optimize all adjustable parameters, and a multi-parameter collaborative optimization strategy based on the firefly algorithm is designed to solve for the optimal hyperparameters, thereby overcoming the limitations of subjectively preset adjustable parameters used as fixed values in the existing NGBM(1,1) model family. Finally, the Sobol method analyzes the global sensitivity of hyperparameters to elucidate their mechanism and provides a theoretical foundation for multi-parameter simultaneous optimization.

Findings

The validity of the SANDGMO(1,1) model was verified using two classical fluctuating time series and a real-world case of predicting China's marine transportation industry output value. Results demonstrate its effectiveness in predicting complex nonlinear systems. Projections indicate China's marine transportation output value will reach 958.6436 billion CNY by 2027.

Practical implications

This study reveals the evolutionary trajectory of China's marine transportation industry and provides empirical evidence for policy formulation, facilitating the green, high-quality development of the industry.

Originality/value

This study constructs a novel SANDGMO(1,1) model by integrating the non-homogeneous discrete grey model with the NGBM(1,1) model. The proposed model employs parameterization functions to optimize all adjustable parameters and establishes a unified optimization framework based on the firefly algorithm, resolving the limitations of subjectively preset adjustable parameters used as fixed values in the existing NGBM(1,1) model family. The Sobol method quantifies the interactions among hyperparameters, establishing a theoretical foundation for multi-parameter simultaneous optimization algorithms and addressing limitations in existing research.

Accurate prediction plays a crucial role in economic planning and policy-making, especially in the marine economic industry, which is characterized by complex nonlinear features and scarce data. As a critical component of the marine economy, the marine transportation industry accounts for a substantial share of total marine industry output. According to the China Marine Economic Statistical Bulletin (2024), the gross output value of China's marine transportation industry reached 819.4 billion CNY in 2024, accounting for 18.74% of the total output value of the marine industry. This data visually demonstrates its important status in the marine economy. Influenced by economic policies, international relations, and external uncertainty shocks, the development trend of the marine transportation industry exhibits a complex nonlinear growth pattern. Moreover, due to the inherent complexity and high volatility of the marine economic environment, obtaining large volumes of high-quality data for the marine transportation industry often poses significant challenges (Xu and Li, 2023). Therefore, the marine transportation industry is a typical uncertain system characterized by “small datasets, poor information”. This restricts the predictive performance of forecasting methods with high data requirements, such as ARIMA models and machine learning approaches, as the prediction accuracy of these methods all depends on high-quality datasets (De Gooijer and Hyndman, 2006).

The grey prediction model is an essential part of grey system theory, first proposed by Professor Deng in 1982 (Deng, 1982). It takes “small sample and poor information” uncertain systems as its research objects and provides a promising alternative for predicting the marine economic industry. However, the traditional GM(1,1) model and its nonlinear variants suffer from structural rigidity limitations due to their predefined adjustable parameters (Tang et al., 2025), which hinder their prediction accuracy in forecasting the development trend of the marine transportation industry. Therefore, the purpose of this study is to develop a tailor-designed self-adaptive nonlinear grey prediction model for the marine transportation industry, which accurately elucidates its inherent evolutionary dynamics and provides data-driven decision support for formulating industrial development strategies.

1.2.1 Research on grey prediction models

The grey prediction model, as a core branch of grey system theory, was pioneered by Professor Deng in 1982 (Deng, 1982). Its theoretical innovation lies in breaking the dependence of traditional statistical methods on large-sample datasets, enabling dynamic prediction of uncertain systems through “small-sample modeling” and “poor-information mining”. Its main merit is that it can effectively mine the inherent development law of uncertain systems from limited data. Over the past several decades, the grey prediction method has been successfully applied in various fields, such as tourism demand forecasting (Sun et al., 2016; Tang et al., 2022), energy demand forecasting (Zeng and Liu, 2017; Zhou et al., 2016), stock market forecasting (Chen et al., 2010; Hsu et al., 2009), traffic flow forecasting (Lu et al., 2016; Xiao et al., 2017), and marine economy forecasting (Li et al., 2023; Xu and Li, 2023). The first-order grey model with one variable (GM(1,1)) is one of the most popular grey forecasting models. However, the performance of the original GM(1,1) model is unstable in some cases, and many improvements have been published in the literature. These mainly focus on optimizing the adjustable parameters of the GM(1,1) model, such as the initial value, the background value, the accumulated generating operator, and the grey action quantity (Cui et al., 2013; Dang et al., 2004; Ma et al., 2019; Wang et al., 2010; Wu et al., 2013). These improvements enhanced the prediction accuracy of the GM(1,1) model to some extent. However, the above models are essentially linear prediction models, which cannot effectively deal with complex nonlinear prediction systems. To further improve the capability of the grey prediction model, Chen (2008) and Chen et al. (2008) constructed a novel nonlinear grey Bernoulli model (NGBM(1,1)) by introducing the Bernoulli differential equation into the grey prediction model. Empirical results confirmed that the NGBM(1,1) model significantly improved prediction accuracy and expanded applicability compared to conventional grey models. However, a key limitation of the NGBM(1,1) model has emerged with the proliferation of its applications: the structural rigidity caused by subjectively presetting its adjustable parameters to fixed values limits the model's robustness in predicting complex uncertain systems. To address this deficiency, scholars have introduced an adaptive grey prediction modeling technique that optimizes these adjustable parameters through parameterized functions, thereby spawning numerous enhanced variants. Table 1 systematically summarizes representative achievements in this field.

Table 1

Summary of the literature on adjustable parameter optimization for NGBM(1,1) model

| Author (year) | Model | Contributions | Method |
|---|---|---|---|
| Chen et al. (2008) | NGBM(1,1) | Power exponent | A simple computer program |
| Chen (2008) | NGBM(1,1) | Power exponent | A simple computer program |
| Zhou et al. (2009) | Optimal p NGBM(1,1) | Power exponent, Background value | Particle swarm optimization algorithm |
| Chen et al. (2010) | NNGBM(1,1) | Power exponent, Background value | Nash equilibrium theory and simple computer iterative program |
| Hsu (2010) | GANGBM(1,1) | Power exponent | Genetic algorithm |
| Wang et al. (2011) | ONGBM(1,1) | Power exponent, Background value | LINGO |
| Pao et al. (2012) | NGBM-OP | Power exponent | Numerical iterative method |
| Wang (2013) | Optimized Nash NGBM(1,1) | Power exponent, Background value, Initial condition | LINGO |
| Zhang et al. (2014) | PSO–NNGBM(1,1) | Power exponent, Background value | Particle swarm optimization algorithm |
| An et al. (2015) | | Power exponent, Background value, Initial condition | Particle swarm optimization algorithm |
| Hsin and Chen (2015) | Two-stage NNGBM | Power exponent, Background value | Nash equilibrium theory and Minmax principle |
| Li et al. (2016) | | Power exponent, Background value | Particle swarm optimization algorithm |
| Lu et al. (2016) | ONGBM(1,1) | Power exponent, Background value, Initial condition | Optimization approach |
| Chen et al. (2017) | Adaptive NGBM(1,1) | Power exponent, Background value, Initial condition | Genetic algorithm |
| Guo et al. (2019) | SA-NGBM | Power exponent, Background value | LINGO |
| Wu et al. (2019) | FANGBM(1,1) | Power exponent, Accumulated generation operator | Particle swarm optimization algorithm |
| Liu and Xie (2019) | WBGM(1,1) | Power exponent | Genetic algorithm |
| Xiao et al. (2020) | BC-NGBM* | Power exponent | Quantum adiabatic evolution |
| Zheng et al. (2021a, b) | Unbiased NGBM(1,1) | Power exponent | Particle swarm optimization algorithm |
| Wu et al. (2022) | FDNGBM(1,1) | Power exponent, Accumulated generation operator | Whale optimization algorithm |
| Wu et al. (2024) | CFNGBMO(1,1) | Power exponent, Accumulated generation operator | Whale optimization algorithm |
| Xu et al. (2025) | TFFGBM(α,1,r,γ) | Power exponent, Accumulated generation operator, Grey action quantity | Particle swarm optimization algorithm |
Source(s): Compiled by authors

As shown in Table 1, studies on NGBM(1,1) model optimization mainly focus on its adjustable parameters. Early research concentrated on solving for the power exponent; for example, Chen (2008) and other researchers used a simple computer program to estimate the power exponent of the NGBM(1,1) model (Pao et al., 2012). However, these methods are sometimes ineffective. With the rapid development of artificial intelligence, a growing number of intelligent algorithms have been used to solve for the power exponent of the NGBM(1,1) model, such as the particle swarm optimization algorithm (Wu et al., 2019, 2020; Zheng et al., 2021a; Zhou et al., 2009), the genetic algorithm (Hsu, 2010; Liu and Xie, 2019; Şahin, 2021), and the flower pollination algorithm (Xiao et al., 2020). Kong and Ma (2018) systematically studied the performance of four mainstream intelligent optimizers (genetic algorithm, particle swarm optimizer, grey wolf optimizer, and ant lion optimizer) in solving for the optimal power exponent of the NGBM(1,1) model under different circumstances, and the results showed that no single algorithm is universally superior. These research achievements have greatly improved the accuracy of power exponent estimation.

To further improve the prediction accuracy of the NGBM(1,1) model, some scholars have proposed the idea of simultaneous optimization of multiple adjustable parameters of the NGBM(1,1) model, such as power exponent and background value (Chen et al., 2010; Hsin and Chen, 2015; Li et al., 2016; Wang et al., 2011; Zhou et al., 2009), power exponent, background value, and initial condition (Lu et al., 2016), power exponent and accumulated generating operator (Wu et al., 2019, 2022), etc. All these studies significantly improved the prediction performance of the NGBM(1,1) model. Therefore, each adjustable parameter has an important impact on the prediction accuracy of the NGBM(1,1) model.

Although the traditional NGBM(1,1) model demonstrates superior nonlinear prediction capabilities, it still fails to fundamentally address the issue of jumping errors between parameter estimation and actual prediction inherent in conventional grey forecasting models, thereby compromising the robustness of the NGBM(1,1) model. To address this limitation, scholars have made improvements from various perspectives, such as directly deriving structural parameters from differential equations (Zheng et al., 2021b), integrating discrete modeling concepts into the traditional NGBM(1,1) model (Wu et al., 2022), and optimizing model linear parameters through mathematical identical equations (Wu et al., 2024). These advancements effectively resolve the jumping error issue in existing NGBM(1,1) models, significantly enhancing prediction accuracy and stability.

In recent years, addressing the structural inflexibility of existing NGBM(1,1) models in forecasting specific complex dynamic systems, the academic community has proposed a spectrum of domain-adapted enhanced models based on a comprehensive analysis of domain-specific evolutionary characteristics. For instance, Ma et al. (2025) proposed a novel time-delayed fractional-order grey Bernoulli model with independent fractional-order parameters for forecasting fossil fuel demand, demonstrating superior predictive performance. Similarly, Yan et al. (2025) introduced a seasonal factor to develop a new structural adaptive seasonal grey Bernoulli model for natural gas demand forecasting, with experimental results showing significant outperformance over competing methods in predicting natural gas production.

In summary, while the aforementioned studies have effectively enhanced the predictive performance of the NGBM(1,1) model, they predominantly focus on partial parameter optimization rather than establishing a unified optimization framework encompassing all adjustable parameters. Therefore, they have not fundamentally solved the limitations of subjectively preset adjustable parameters. Furthermore, although numerous scholars have proposed multi-parameter collaborative optimization algorithms and employed bio-inspired metaheuristics to solve for the hyperparameters of the NGBM(1,1) model, there is insufficient attention to the mechanism of the hyperparameters of the NGBM(1,1) model.

1.2.2 Research on marine industries forecast

Marine industries constitute the fundamental units of the marine economy. Accurate forecasting of industrial dynamics is crucial for formulating scientific development plans and optimizing resource allocation, thereby attracting significant academic attention. Existing forecasting methodologies for marine industries primarily fall into three categories: time-series forecasting, artificial intelligence-based forecasting, and grey forecasting.

Time series forecasting methods are widely adopted in marine industry forecasting due to their solid mathematical foundation and ease of operation. For example, Benavides et al. (2022) used the SARIMA model to predict the landings of six marine fish species in the Colombian Pacific (CPO) from 2020 to 2024, achieving ideal prediction accuracy. Similarly, Batchelor et al. (2007) evaluated the performance of popular time series models in forecasting spot and forward freight rates on major shipping routes. The results revealed that the VECM achieved the best in-sample fit; however, for forward freight rate predictions, ARIMA or VAR models produced superior forecasts. Nevertheless, the limitations of time series forecasting methods are also pronounced, as their predictive accuracy heavily relies on the statistical asymptotic validity under large-sample conditions and requires strict adherence of the modeling data to the distributional assumptions predefined by the model.

In recent years, artificial intelligence prediction methods have gradually been applied in the field of marine industry prediction due to their exceptional nonlinear fitting capabilities. For example, Shi (2019) used a backpropagation neural network to construct a prediction index system for the marine regional economy and predicted the development trend of the regional marine economy, and the empirical results show that the model has high fit and prediction accuracy. Hirata and Matsuda (2022) compared the predictive performance of the LSTM model and the SARIMA model in forecasting the composite and route-based Shanghai containerized freight index (SCFI). The research findings indicate that the LSTM model outperforms the SARIMA model across most datasets. However, the limitations of artificial intelligence prediction methods are also obvious, as their forecasting accuracy highly depends on large-scale datasets, and these models inherently possess a black-box nature.

In summary, both time series forecasting methods and artificial intelligence prediction methods share a fundamental limitation: their predictive accuracy is critically contingent upon high-quality modeling data. It is well-established that China's marine economic system represents a typical uncertain system characterized by “small sample, poor information” (Yin et al., 2022), which fundamentally constrains the predictive efficacy of these models in this domain.

In contrast, grey prediction models specialize in “small samples, poor information” uncertainty systems, offering a robust solution for marine industry forecasting. Recent applications demonstrate their efficacy: For example, Li et al. (2023) utilized the improved IMGM(1,m) model to predict the output values of emerging marine industries, including the marine power industry, marine biopharmaceutical industry, and marine chemical industry, achieving satisfactory prediction accuracy. Xu and Li (2023) proposed an enhanced AWBO-MGM(1,m) model to systematically analyze and forecast the internal interaction mechanism among marine scientific research and education, industrial structure upgrading, and marine economic growth in China, demonstrating that these factors mutually promote each other and validating the model's superior predictive accuracy under external shocks like COVID-19 through comparative analysis with traditional grey and machine learning models. Zeng et al. (2024) proposed a novel multivariable discrete grey prediction model (WFTDGM) that integrates the weakening buffer operator (WBO) and the grey wolf optimization algorithm (GWO) to forecast global shipping carbon emissions. The empirical results indicate that the proposed model significantly outperforms competitive models such as LSTM, BPNN, and ARIMA in terms of prediction performance.

Although the aforementioned grey prediction models have demonstrated satisfactory performance in forecasting the marine economy, they still fail to fundamentally overcome the limitation of subjectively presetting some adjustable parameters to fixed values, which restricts their practical applicability in marine economic forecasting.

To address the aforementioned limitations of grey prediction models, we propose a novel self-adaptive SANDGM(1,1) model by combining the non-homogeneous discrete grey model with the NGBM(1,1) model, termed as the SANDGMO(1,1) model. The main novelties and innovations of this paper can be summarized as follows:

  1. A parametric function is introduced to optimize the adjustable parameters of the SANDGMO(1,1) model, and on this basis, a multi-parameter collaborative optimization algorithm based on the firefly algorithm is designed. This approach dynamically determines the optimal combination of hyperparameters through a data-driven algorithm, significantly reducing the empirical reliance on preset parameters in the traditional NGBM(1,1) model family and enhancing the adaptability of grey prediction models in dealing with complex marine economic sequences.

  2. The concept of the non-homogeneous discrete grey model is introduced into the NGBM(1,1) model to eliminate the inherent discretization error of the NGBM(1,1) model.

  3. The Sobol method is applied to investigate the global sensitivity of adjustable parameters of the SANDGMO(1,1) model, enhancing the understanding of the mechanism of adjustable parameters and providing a theoretical basis for the multi-parameter simultaneous optimization algorithm proposed in this paper.

  4. The new model is used to predict the trend of the gross output value of the marine transportation industry, providing data-based support for decision-making by governments, enterprises, investors, and relevant research institutions.

The remainder of this paper is arranged as follows: Section 2 introduces the preliminaries. In Section 3, a novel self-adaptive SANDGMO(1,1) model is proposed. The effectiveness of the proposed method is verified by two classic benchmark sequences in Section 4. The real case for predicting gross output value of the marine transportation industry is presented in Section 5. Section 6 concludes this paper.

The modeling steps of the NGBM(1,1) model are as follows:

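  • Step 1: Let $X^{(0)}$ be a non-negative raw data sequence with $m$ ($m \geq 4$) elements,

$$X^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(m)\right). \tag{1}$$

  • Step 2: Construct a new data sequence $X^{(1)}$ by applying the first-order accumulated generating operation (1-AGO) to the raw data sequence $X^{(0)}$,

$$X^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(m)\right), \tag{2}$$

where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, $k = 1, 2, \ldots, m$.

  • Step 3: The whitening differential equation of the NGBM(1,1) model is

$$\frac{dx^{(1)}(t)}{dt} + a\,x^{(1)}(t) = b\left(x^{(1)}(t)\right)^{\gamma}, \quad \gamma \neq 1. \tag{3}$$

The discrete form of Eq.(3) is given by

$$x^{(0)}(k) + a\,z^{(1)}(k) = b\left(z^{(1)}(k)\right)^{\gamma}, \quad k = 2, 3, \ldots, m, \tag{4}$$

where $z^{(1)}(k)$ is called the background value, whose calculation formula is

$$z^{(1)}(k) = 0.5\left(x^{(1)}(k) + x^{(1)}(k-1)\right). \tag{5}$$

  • Step 4: The structural parameters $a$ and $b$ can be estimated by the least squares method:

$$[\hat{a}, \hat{b}]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y, \tag{6}$$

where

$$B = \begin{bmatrix} -z^{(1)}(2) & \left(z^{(1)}(2)\right)^{\gamma} \\ -z^{(1)}(3) & \left(z^{(1)}(3)\right)^{\gamma} \\ \vdots & \vdots \\ -z^{(1)}(m) & \left(z^{(1)}(m)\right)^{\gamma} \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(m) \end{bmatrix}.$$

  • Step 5: Assuming $\hat{x}^{(1)}(1) = x^{(0)}(1)$, the particular solution of Eq.(3) is

$$\hat{x}^{(1)}(k) = \left[\left(\left(x^{(0)}(1)\right)^{1-\gamma} - \frac{\hat{b}}{\hat{a}}\right)e^{-\hat{a}(1-\gamma)(k-1)} + \frac{\hat{b}}{\hat{a}}\right]^{\frac{1}{1-\gamma}}, \quad k = 1, 2, \ldots \tag{7}$$

  • Step 6: The predicted values can be obtained by performing a first-order inverse accumulated generating operation (1-IAGO) on $\hat{X}^{(1)}$,

$$\hat{X}^{(0)} = \left(\hat{x}^{(0)}(1), \hat{x}^{(0)}(2), \ldots, \hat{x}^{(0)}(m)\right), \tag{8}$$

where

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1), \quad k = 2, 3, \ldots, m, \qquad \hat{x}^{(0)}(1) = x^{(0)}(1). \tag{9}$$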
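The modeling steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard NGBM(1,1) formulation (whitening equation dx/dt + a·x = b·x^γ with γ ≠ 1, mean background values, least squares estimation of a and b), and the function name `ngbm_fit_predict` and its arguments are hypothetical.

```python
import numpy as np

def ngbm_fit_predict(x0, gamma, horizon=0):
    """Fit NGBM(1,1) for a fixed power exponent gamma (gamma != 1).

    Returns the estimates (a, b) and the fitted values of the original
    series followed by `horizon` out-of-sample forecasts.
    """
    x0 = np.asarray(x0, dtype=float)
    m = len(x0)
    x1 = np.cumsum(x0)                         # Step 2: 1-AGO
    z1 = 0.5 * (x1[1:] + x1[:-1])              # background values z(2..m)
    B = np.column_stack([-z1, z1 ** gamma])    # Step 4: design matrix
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]
    # Step 5: time response function (assumes a != 0 and a positive bracket;
    # a negative bracket with a fractional exponent would produce NaN).
    k = np.arange(1, m + horizon + 1)
    inner = (x0[0] ** (1 - gamma) - b / a) * np.exp(-a * (1 - gamma) * (k - 1)) + b / a
    x1_hat = inner ** (1.0 / (1 - gamma))
    # Step 6: 1-IAGO restores the original-scale series
    x0_hat = np.concatenate([[x1_hat[0]], np.diff(x1_hat)])
    return a, b, x0_hat

# Illustrative data only: with gamma = 0 the model reduces to GM(1,1).
series = 2.0 * 1.1 ** np.arange(8)
a_hat, b_hat, fitted = ngbm_fit_predict(series, gamma=0.0, horizon=2)
print(a_hat, b_hat, fitted)
```

Note that the parameters are estimated from the discrete equation while the forecasts come from the continuous time response function. This mismatch is the source of the jumping error discussed in this paper.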

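The non-homogeneous discrete grey model (NDGM), first proposed by Xie et al. (2013), is one of the most popular discrete grey prediction models and has attracted wide attention for its excellent performance in modeling sequences that follow a non-homogeneous exponential law. The modeling process of the non-homogeneous discrete grey model is as follows:

Let $X^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(m)\right)$ be an original sequence and $X^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(m)\right)$ be the first-order accumulated generation sequence of $X^{(0)}$, where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, $k = 1, 2, \ldots, m$.

The equation

$$\hat{x}^{(1)}(k+1) = \beta_1 \hat{x}^{(1)}(k) + \beta_2 k + \beta_3 \tag{10}$$

is called the non-homogeneous discrete grey model (NDGM). The structural parameter vector $[\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3]^{T}$ can be calculated by the least squares method, which is

$$[\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y, \tag{11}$$

where

$$B = \begin{bmatrix} x^{(1)}(1) & 1 & 1 \\ x^{(1)}(2) & 2 & 1 \\ \vdots & \vdots & \vdots \\ x^{(1)}(m-1) & m-1 & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(1)}(2) \\ x^{(1)}(3) \\ \vdots \\ x^{(1)}(m) \end{bmatrix}.$$

Assuming the initial condition $\hat{x}^{(1)}(1) = x^{(0)}(1)$, the time response function of the NDGM(1,1) model is given by

$$\hat{x}^{(1)}(k+1) = \hat{\beta}_1^{k}\,x^{(0)}(1) + \hat{\beta}_2 \sum_{j=1}^{k} j\,\hat{\beta}_1^{k-j} + \hat{\beta}_3 \sum_{j=0}^{k-1} \hat{\beta}_1^{j}, \quad k = 1, 2, \ldots \tag{12}$$

The restored values $\hat{X}^{(0)}$ can be obtained by the first-order inverse accumulated generating operation (1-IAGO):

$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), \quad k = 1, 2, \ldots \tag{13}$$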
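A minimal Python sketch of the NDGM procedure is given below. It assumes the standard NDGM recursion x̂⁽¹⁾(k+1) = β₁x̂⁽¹⁾(k) + β₂k + β₃ with initial condition x̂⁽¹⁾(1) = x⁽⁰⁾(1), and evaluates the closed-form solution of that recursion. The function name `ndgm_fit_predict` and the test series are hypothetical illustrations.

```python
import numpy as np

def ndgm_fit_predict(x0, horizon=0):
    """Least squares NDGM fit; returns beta and fitted/forecast values."""
    x0 = np.asarray(x0, dtype=float)
    m = len(x0)
    x1 = np.cumsum(x0)                                   # 1-AGO
    # Regress x1(k+1) on [x1(k), k, 1] for k = 1..m-1
    B = np.column_stack([x1[:-1], np.arange(1, m), np.ones(m - 1)])
    beta = np.linalg.lstsq(B, x1[1:], rcond=None)[0]
    b1, b2, b3 = beta
    # Closed-form solution of the recursion, started at x1_hat(1) = x0(1)
    x1_hat = [x0[0]]
    for k in range(1, m + horizon):
        j = np.arange(1, k + 1)
        x1_hat.append(b1 ** k * x0[0]
                      + b2 * np.sum(j * b1 ** (k - j))
                      + b3 * np.sum(b1 ** np.arange(k)))
    x1_hat = np.array(x1_hat)
    x0_hat = np.concatenate([[x1_hat[0]], np.diff(x1_hat)])  # 1-IAGO
    return beta, x0_hat

# Illustrative non-homogeneous exponential series x0(k) = 3 * 1.2**(k-1) + 5
k = np.arange(1, 7)
series = 3.0 * 1.2 ** (k - 1) + 5.0
beta_hat, fitted = ndgm_fit_predict(series, horizon=2)
print(beta_hat, fitted)
```

Because estimation and prediction both rest on the same discrete equation, a series that exactly follows a non-homogeneous exponential law is reproduced without error, up to floating-point rounding.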

Sensitivity analysis helps identify the key parameters of the model and provides a basis for model optimization, parameter calibration, and uncertainty analysis. Generally, sensitivity analysis can be broadly divided into local sensitivity analysis and global sensitivity analysis.

Local sensitivity analysis measures the influence of a change in a single parameter on the model output while the other parameters remain unchanged, and is usually calculated through the partial derivative. Its main drawback is that interactions between parameters cannot be captured. In contrast, global sensitivity analysis overcomes this drawback and has therefore received extensive attention. Among the various global sensitivity analysis techniques in the published literature, the Sobol method has been widely used over the past few years (Cheng et al., 2020).

The Sobol method is a global sensitivity analysis algorithm, which was first proposed by Sobol (1993). The core idea of Sobol indices is to decompose the total variance of the objective function into the sum of the variance of a single parameter and the variance of the interaction between multiple parameters. Over the past few years, the Sobol method has been successfully used in various fields due to its easy implementation and high efficiency (Nariman et al., 2019; Nossent et al., 2011; Zhang et al., 2014).

Suppose $Y = f(x_1, x_2, \ldots, x_m)$ is an objective function, where $x_k$ ($k = 1, 2, \ldots, m$) are the input variables of the objective function and $Y$ is a scalar output. The total variance of $Y = f(x_1, x_2, \ldots, x_m)$ can be decomposed as follows:

$$V(Y) = \sum_{i=1}^{m} V_i + \sum_{1 \leq i < j \leq m} V_{ij} + \cdots + V_{1,2,\ldots,m}, \tag{14}$$

where $V(Y)$ is the total variance of the objective function $Y = f(x_1, x_2, \ldots, x_m)$, $V_i$ represents the partial variance contribution of the input variable $x_i$, $V_{ij}$ represents the interaction variance between the input variables $x_i$ and $x_j$, and $V_{1,2,\ldots,m}$ represents the variance of the interaction among all input variables $x_i$ ($i = 1, 2, \ldots, m$). The Sobol sensitivity indices can then be calculated by the following formulas.

The first-order sensitivity index is defined as

$$S_i = \frac{V_i}{V(Y)}. \tag{15}$$

The second-order sensitivity index is defined as

$$S_{ij} = \frac{V_{ij}}{V(Y)}. \tag{16}$$

The total sensitivity index of the parameter $x_i$ is defined as

$$S_{T_i} = S_i + \sum_{j \neq i} S_{ij} + \cdots + S_{1,2,\ldots,m}. \tag{17}$$
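In practice the partial variances are estimated by Monte Carlo sampling. The sketch below is one common way to do this and is not necessarily the estimator used in this paper: it assumes independent, uniformly distributed inputs, uses the Saltelli et al. (2010) estimator for the first-order index and the Jansen estimator for the total index, and the function name `sobol_indices` and the test function are hypothetical.

```python
import numpy as np

def sobol_indices(f, bounds, n=20000, seed=0):
    """Monte Carlo estimates of first-order and total Sobol indices.

    f      : vectorized function mapping an (n, d) array to n outputs
    bounds : sequence of (low, high) pairs, one per input variable
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    d = len(bounds)
    lo, hi = bounds[:, 0], bounds[:, 1]
    A = lo + (hi - lo) * rng.random((n, d))    # two independent sample matrices
    B = lo + (hi - lo) * rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))     # estimate of V(Y)
    S1, ST = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                    # A with column i taken from B
        fABi = f(ABi)
        S1[i] = np.mean(fB * (fABi - fA)) / var        # first-order index
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var  # total index
    return S1, ST

# Illustrative additive function: x3 has no influence, and there are no interactions
def toy(X):
    return X[:, 0] + 2.0 * X[:, 1]

S1, ST = sobol_indices(toy, [(-1, 1)] * 3, n=50000, seed=1)
print(S1, ST)
```

For an additive function such as `toy`, first-order and total indices should agree up to sampling noise; a gap between them for some parameter signals interaction effects, which is what motivates optimizing the hyperparameters jointly rather than one at a time.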
Definition 1.

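Let $X^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(m)\right)$ be a non-negative time series, $X^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(m)\right)$ be the first-order accumulated generation series of $X^{(0)}$, and $Y^{(1)} = \left(y^{(1)}(1), y^{(1)}(2), \ldots, y^{(1)}(m)\right)$ be the $(1-\gamma)$ power generation series of $X^{(1)}$, where $y^{(1)}(k) = \left(x^{(1)}(k)\right)^{1-\gamma}$, $k = 1, 2, \ldots, m$.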

Definition 2.

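Assume $Y^{(1)}$ is as defined in Definition 1. The equation

$$y^{(1)}(k+1) = \beta_1 y^{(1)}(k) + \beta_2 k + \beta_3 \tag{18}$$

is called the self-adaptive nonlinear discrete grey model (SANDGM(1,1)). The structural parameter vector $\hat{\beta} = [\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3]^{T}$ can be calculated by the least squares method:

$$\hat{\beta} = \left(B^{T}B\right)^{-1}B^{T}Y, \tag{19}$$

where

$$B = \begin{bmatrix} y^{(1)}(1) & 1 & 1 \\ y^{(1)}(2) & 2 & 1 \\ \vdots & \vdots & \vdots \\ y^{(1)}(m-1) & m-1 & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} y^{(1)}(2) \\ y^{(1)}(3) \\ \vdots \\ y^{(1)}(m) \end{bmatrix}.$$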

Theorem 1.

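Assume $X^{(0)}$, $X^{(1)}$ and $Y^{(1)}$ are as defined in Definition 1 and $\hat{\beta} = [\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y$. Then the following hold.

  1. Setting $\hat{x}^{(1)}(1) = x^{(0)}(1)$, the solution of Eq.(18) is

$$\hat{x}^{(1)}(k+1) = \left[\hat{\beta}_1^{k}\left(x^{(0)}(1)\right)^{1-\gamma} + \hat{\beta}_2 \sum_{j=1}^{k} j\,\hat{\beta}_1^{k-j} + \hat{\beta}_3 \sum_{j=0}^{k-1} \hat{\beta}_1^{j}\right]^{\frac{1}{1-\gamma}}, \quad k = 1, 2, \ldots \tag{20}$$

  2. The restored values of $\hat{X}^{(0)}$ are given by

$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), \quad k = 1, 2, \ldots \tag{21}$$

Proof

  1. Setting $\hat{x}^{(1)}(1) = x^{(0)}(1)$ gives $\hat{y}^{(1)}(1) = \left(x^{(0)}(1)\right)^{1-\gamma}$. Iterating Eq.(18) from this starting value and restoring $\hat{x}^{(1)}(k+1) = \left(\hat{y}^{(1)}(k+1)\right)^{\frac{1}{1-\gamma}}$ yields the solution of the SANDGM(1,1) model in Eq.(20).

  2. $\hat{x}^{(0)}(k+1) = \sum_{i=1}^{k+1} \hat{x}^{(0)}(i) - \sum_{i=1}^{k} \hat{x}^{(0)}(i) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k)$, $k = 1, 2, \ldots, m-1$.

This completes the proof.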

To sum up, it can be seen that the structural parameter estimator and the time response equation of the SANDGM(1,1) model are based on its discrete equation (Eq.(18)), effectively solving the jumping error between parameter estimation and actual prediction in the traditional NGBM(1,1) model. Therefore, the SANDGM(1,1) model is an unbiased grey prediction model.
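The unbiasedness property can be illustrated numerically. The following sketch assumes that the discrete equation of the SANDGM(1,1) model has the NDGM-like form y⁽¹⁾(k+1) = β₁y⁽¹⁾(k) + β₂k + β₃ with y⁽¹⁾(k) = (x⁽¹⁾(k))^(1−γ), and it treats γ as given rather than optimized. The function name `sandgm_fit_predict` and the synthetic parameters are hypothetical.

```python
import numpy as np

def sandgm_fit_predict(x0, gamma, horizon=0):
    """Fit the assumed SANDGM(1,1) recursion for a fixed gamma (gamma != 1)."""
    x0 = np.asarray(x0, dtype=float)
    m = len(x0)
    x1 = np.cumsum(x0)                         # 1-AGO
    y1 = x1 ** (1.0 - gamma)                   # (1 - gamma) power generation series
    B = np.column_stack([y1[:-1], np.arange(1, m), np.ones(m - 1)])
    beta = np.linalg.lstsq(B, y1[1:], rcond=None)[0]
    # Iterate the same discrete equation that was used for estimation
    y_hat = [y1[0]]
    for k in range(1, m + horizon):
        y_hat.append(beta[0] * y_hat[-1] + beta[1] * k + beta[2])
    # Restore to the accumulated scale (a negative y_hat with a fractional
    # exponent would give NaN), then apply 1-IAGO
    x1_hat = np.array(y_hat) ** (1.0 / (1.0 - gamma))
    x0_hat = np.concatenate([[x1_hat[0]], np.diff(x1_hat)])
    return beta, x0_hat

# Synthetic series generated exactly by the assumed model with gamma = 0.5
gamma = 0.5
y = [2.0]
for k in range(1, 8):
    y.append(1.1 * y[-1] + 0.05 * k + 0.3)
x1 = np.array(y) ** (1.0 / (1.0 - gamma))
series = np.concatenate([[x1[0]], np.diff(x1)])

beta_hat, fitted = sandgm_fit_predict(series, gamma)
print(beta_hat)
print(np.max(np.abs(fitted - series)))
```

When the data follow the model exactly, the fitted values coincide with the data up to rounding error, because no continuous-to-discrete conversion takes place between estimation and prediction.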

Theorem 2.

The SANDGM(1,1) model is suitable for modeling monotonic series and fluctuating series with no more than one extreme point.

Proof.

According to Eq.(20) and Eq.(21), the predicted values of the SANDGM(1,1) model for the original series are calculated as

(22)

Applying the Lagrange mean value theorem to Eq.(22) gives

where $\varepsilon \in (t-1, t)$, $t \geq 2$.

Hence, Eq.(22) can be further expressed as

(23)

where $\varepsilon \in (k-1, k)$, $k = 2, 3, \ldots, n$. In Eq.(23), $\varepsilon$ increases as $k$ increases.

For conciseness, denote

Then, the expression for x^(0)(k) can be simplified as

To discuss the number of extreme points for x^(0)(k), let

(24)
  1. If $\gamma = 0$, Eq.(24) degenerates to

Obviously, $\hat{x}^{(0)}(t)$ is monotonic, so $\hat{x}^{(0)}(k)$ is a monotonic sequence.

  2. If $\gamma \neq 0$, setting the derivative of Eq.(24) equal to zero gives

(25)

Obviously, Eq.(25) has at most one root, so $\hat{x}^{(0)}(t)$ is a function with at most one extreme point; therefore, $\hat{x}^{(0)}(k)$ is a sequence with at most one extreme point.

This completes the proof.

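Setting $y^{(1)}(t) = \left(x^{(1)}(t)\right)^{1-\gamma}$, the NGBM(1,1) model can be converted to the linear model

$$\frac{dy^{(1)}(t)}{dt} + a(1-\gamma)\,y^{(1)}(t) = b(1-\gamma). \tag{26}$$

Integrating Eq.(26) over the interval $[k, k+1]$ and applying the two-point trapezoidal formula gives $y^{(1)}(k+1) - y^{(1)}(k) + 0.5a(1-\gamma)\left(y^{(1)}(k+1) + y^{(1)}(k)\right) = b(1-\gamma)$, from which one obtains the discrete form of the NGBM(1,1) model:

$$y^{(1)}(k+1) = \frac{1 - 0.5a(1-\gamma)}{1 + 0.5a(1-\gamma)}\,y^{(1)}(k) + \frac{b(1-\gamma)}{1 + 0.5a(1-\gamma)}. \tag{27}$$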

When β2=0, the SANDGM(1,1) model degenerates to y(1)(k+1)=β1y(1)(k)+β3, and its time response equation is given below

(28)

The form of Eq.(27) is identical to Eq.(28), so it is assumed that there is a relationship.

Then

The time response equation of the NGBM(1,1) model (Eq.(7)) can be rewritten as

(29)

Expanding $e^{-\hat{a}(1-\gamma)}$ and $\frac{1 - 0.5a(1-\gamma)}{1 + 0.5a(1-\gamma)}$ using the Maclaurin formula, one can get

From the Maclaurin expansion, it can be concluded that when $|a(1-\gamma)|$ is very small, the influence of higher-order terms is negligible, and if only the first four terms are selected,

Then

Therefore, when $|a(1-\gamma)|$ is very small, $\beta_1 \approx e^{-\hat{a}(1-\gamma)}$. Substituting $\beta_1$ for $e^{-\hat{a}(1-\gamma)}$, Eq.(29) becomes

(30)

The form of Eq.(30) is identical to Eq.(28). Therefore, when $\beta_2 = 0$, the proposed SANDGM(1,1) model and the NGBM(1,1) model can be considered as different expressions of the same model. In particular, when $|a(1-\gamma)|$ is very small, the two models can be substituted for each other.
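The quality of the approximation β₁ ≈ e^(−a(1−γ)) can be checked numerically. The sketch below assumes that β₁ corresponds to the ratio (1 − 0.5a(1−γ))/(1 + 0.5a(1−γ)) named in the text; the helper `beta1_from_a` and the sample values of a and γ are hypothetical.

```python
import numpy as np

def beta1_from_a(a, gamma):
    """Trapezoid-rule coefficient assumed to correspond to beta_1."""
    u = a * (1.0 - gamma)
    return (1.0 - 0.5 * u) / (1.0 + 0.5 * u)

gamma = 0.5
for a in (0.02, 0.2, 1.0, 2.0):
    u = a * (1.0 - gamma)
    gap = abs(beta1_from_a(a, gamma) - np.exp(-u))
    print(f"a(1-gamma) = {u:.2f}   |beta1 - exp(-a(1-gamma))| = {gap:.2e}")
```

The two Maclaurin series agree up to the quadratic term, so the gap shrinks rapidly as |a(1−γ)| approaches zero and grows when |a(1−γ)| is of order one, which is the regime where the two models should no longer be treated as interchangeable.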

This section proposes a novel self-adaptive SANDGM(1,1) model, termed as SANDGMO(1,1) model. The flowchart of the proposed SANDGMO(1,1) model is illustrated in Figure 1.

Figure 1
A flow chart illustrates a three-stage process.The three stages enclosed in a box from left to right are “Stage 1 (Parameter Input),” “Stage 2 (Training Process),” and “Stage 3 (Result Output).” Each stage has a downward arrow connected to a dashed box showing the steps within that stage. Stage 1: Parameter Input: Three input parameters (gamma, xi, and beta subscript 4 t) are fed into the “S A N D G M O(1,1)” model in Stage 2. Stage 2: Training Process (Main Loop): The “S A N D G M O(1,1)” model output proceeds to “Sensitivity analysis.” “Observed data” is also input into the “Sensitivity analysis.” The analysis output is fed into “Training Parameters with firefly algorithm.” A decision step follows: “Is the maximum iteration reached?” (represented by a yellow diamond). If “NO,” the process loops back to the “Training Parameters with firefly algorithm.” If “YES,” the process moves to Stage 3. Stage 3: Result Output: The finalized training parameters are used to “Make predictions.”

The flow chart of the proposed SANDGMO(1,1) optimization method. Source(s): Authors' own elaboration


3.3.1 Optimization scheme of the SANDGM(1,1) model adjustable parameters

This section first investigates the mechanism of adjustable parameters of the SANDGM(1,1) model and then proposes their optimization scheme. The details are as follows.

  1. Power exponent

It can be seen from Eq.(20) that the capability of the SANDGM(1,1) model to deal with complex nonlinear time series mainly depends on the value of its power exponent. In practice, however, choosing an appropriate power exponent is a challenging task that has attracted wide attention. During the past decade, many efforts have been devoted to optimizing the power exponent of the NGBM(1,1) model. For example, Chen et al. (2008) proposed a trial-and-error method to solve for the power exponent, and Pao et al. (2012) proposed a numerical iterative method to determine it. However, these methods are sometimes not very effective. With the rapid development of artificial intelligence, a growing number of intelligent optimization algorithms have been used to solve for the power exponent, such as the genetic algorithm (Chen et al., 2017; Liu and Xie, 2019), particle swarm optimization (Wu et al., 2019; Zheng et al., 2021b), the flower pollination algorithm (Xiao et al., 2020), and the whale optimization algorithm (Wu et al., 2022), which greatly improve the accuracy and efficiency of NGBM(1,1) parameter estimation. In this paper, the firefly algorithm, an emerging bionic swarm intelligence optimization algorithm, is used to estimate the power exponent γ.

  2. Initial condition

It can be seen from Eq. (20) that the time response equation of the SANDGM(1,1) model assumes that the initial condition equals x(0)(1). However, according to the principle of least squares, the best-fit curve of the SANDGM(1,1) model does not necessarily pass through the point (1, x(0)(1)). This assumption therefore has no theoretical basis and can reduce the prediction accuracy of the model. For this reason, the boundary value correction method is introduced to optimize the initial condition of the SANDGM(1,1) model: the initial condition becomes x(0)(1)+β4, where β4 is a correction term, and the optimized time response equation of the SANDGM(1,1) model is given below.

(31)

  3. Accumulated generating operator

The accumulated generating operation is an important feature that distinguishes grey system theory from other theories. Through accumulated generating operations, grey system theory can effectively reveal the hidden integral property in random data and improve the accuracy of grey modeling. The traditional NGBM(1,1) model assumes that the order of the accumulated generating operator is equal to one, which violates the new information priority principle of grey system theory. To further improve the prediction accuracy of the NGBM(1,1) model, Wu et al. (2019) first proposed an NGBM(1,1) model with fractional-order accumulation (FANGBM(1,1)). Xie et al. (2021) proposed the CCFNGBM(1,1) model by incorporating conformable fractional accumulation into the traditional NGBM(1,1) model. Zheng et al. (2021a, b) proposed a conformable fractional non-homogeneous grey Bernoulli model. Şahin (2021) designed an optimized fractional nonlinear grey Bernoulli model to forecast renewable energy consumption. These studies indicate that fractional accumulation is a key technique for improving the prediction accuracy of the NGBM(1,1) model. Accordingly, a SANDGM(1,1) model with fractional-order accumulation is proposed in this paper, where the order of the fractional-order accumulated generating operator is an unknown parameter that needs to be determined according to the characteristics of the modeling data.

3.3.2 The modeling framework of the SANDGMO(1,1) model

Following the optimization scheme for the adjustable parameters described above, an optimized SANDGM(1,1) model, termed the SANDGMO(1,1) model, can be constructed. Its modeling framework is as follows:

Definition 3.

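Suppose $X^{(0)}=\left(x^{(0)}(1),x^{(0)}(2),\ldots,x^{(0)}(m)\right)$, $m\geq 4$, is an original sequence and $\xi$ is the order of the accumulated generating operator. Then $X^{(\xi)}=\left(x^{(\xi)}(1),x^{(\xi)}(2),\ldots,x^{(\xi)}(m)\right)$ can be obtained by performing the $\xi$-order accumulated generating operation ($\xi$-AGO) on $X^{(0)}$, where $x^{(\xi)}(k)=\sum_{i=1}^{k}\binom{k-i+\xi-1}{k-i}x^{(0)}(i)$, $k=1,2,\ldots,m$.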
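The accumulation in Definition 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `frac_coeffs`, `frac_ago` and `frac_iago` are hypothetical, the generalized binomial coefficients are built with the recurrence $c_0=1$, $c_j=c_{j-1}(j+\xi-1)/j$, and the inverse operation is assumed to be accumulation of order $-\xi$, which is the usual convention for fractional grey models but is not spelled out in the text that survives here.

```python
def frac_coeffs(order, n):
    """Return c_0..c_{n-1} with c_j = binom(j + order - 1, j), built by recurrence."""
    coeffs = [1.0]
    for j in range(1, n):
        coeffs.append(coeffs[-1] * (j + order - 1) / j)
    return coeffs


def frac_ago(x, xi):
    """xi-order accumulated generating operation (xi-AGO) of the sequence x."""
    c = frac_coeffs(xi, len(x))
    # x_xi(k) = sum_{i=1..k} c_{k-i} * x(i), written here with 0-based indices
    return [sum(c[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]


def frac_iago(x_xi, xi):
    """Inverse operation, ASSUMED here to be accumulation of order -xi."""
    return frac_ago(x_xi, -xi)


x0 = [1.0, 2.0, 1.5, 3.0]                  # Case 1 sequence used later in the paper
print(frac_ago(x0, 1.0))                   # xi = 1 reduces to the ordinary cumulative sum
print(frac_ago(x0, 0.5))                   # a fractional order weights older data less
print(frac_iago(frac_ago(x0, 0.5), 0.5))   # round trip recovers x0 up to rounding
```

With $\xi=1$ every coefficient equals one, so the operator reduces to the first-order accumulation used by the traditional NGBM(1,1) model.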

Definition 4.

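Suppose $Y^{(\xi)}=\left(y^{(\xi)}(1),y^{(\xi)}(2),\ldots,y^{(\xi)}(m)\right)$ is the $(1-\gamma)$ power generation series of $X^{(\xi)}$, where $y^{(\xi)}(k)=\left(x^{(\xi)}(k)\right)^{1-\gamma}$, $\gamma\neq 1$, $k=1,2,\ldots,m$. Then Eq. (23)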

(23)

is called SANDGMO(1,1) model.

The structural parameters β1, β2, and β3 can be solved by the least-squares estimation method, whose calculation formula is as follows:

(24)

where

Assuming the initial condition $\hat{x}^{(\xi)}(1)=x^{(0)}(1)+\beta_4$, the time response function of the SANDGMO(1,1) model is given below.

(25)

The restored values $\hat{X}^{(0)}$ can be obtained by performing the $\xi$-order inverse accumulated generating operation ($\xi$-IAGO) on $\hat{X}^{(\xi)}$.

(26)

where

3.3.3 Compatibility analysis of the SANDGMO(1,1) model

This section focuses on the compatibility of the SANDGMO(1,1) model. According to the above modeling analysis of the SANDGMO(1,1) model, it can be concluded that when the adjustable parameters in the SANDGMO(1,1) model take different values, the model will degenerate to some existing grey prediction models.

  1. When γ=0,ξ=1,β2=0,β4=0, the proposed SANDGMO(1,1) model degenerates to the DGM(1,1) model, which was proposed by Xie and Liu (2009).

  2. When γ=0,ξ=1,β4=0, the proposed SANDGMO(1,1) model degenerates to the traditional NDGM(1,1) model, which was proposed by Xie et al. (2013).

  3. When γ=0,β4=0, the proposed SANDGMO(1,1) model degenerates to NDGMr(1,1) model, which was proposed by Wu et al. (2014).

  4. When γ=0, the proposed SANDGMO(1,1) model degenerates to FANDGMr(1,1) model, which was proposed by Tang et al. (2022).

  5. When β2=0,β4=0, the proposed SANDGMO(1,1) model degenerates to FDNGBM(1,1) model, which was proposed by Wu et al. (2022).

  6. When ξ=1,β4=0, the proposed SANDGMO(1,1) model degenerates to the SANDGM(1,1) model proposed in Section 3.1 of this paper.

Based on the above analysis, the proposed SANDGMO(1,1) model is a general form of the DGM(1,1), NDGM(1,1), NDGMr(1,1), FANDGMr(1,1), FDNGBM(1,1), and SANDGM(1,1) models. Therefore, compared with traditional grey prediction models, the SANDGMO(1,1) model has a more flexible structure and stronger adaptability.

3.3.4 Structure selection for SANDGMO(1,1) model

Drawing on ideas from machine learning, this study constructs a new multi-parameter simultaneous optimization algorithm using the three emerging parameters of the SANDGMO(1,1) model as input variables. The training mechanism of emerging parameters of the SANDGMO(1,1) model is shown in Eq.(36).

It can be seen that the optimization mechanism of emerging parameters of the SANDGMO(1,1) model is essentially a nonlinear optimization model, which is difficult to solve by the ordinary optimization method. To address this issue, the firefly algorithm is introduced to seek the optimal emerging parameters for the SANDGMO(1,1) model in the current paper.

(36)

The firefly algorithm is an emerging bionic swarm intelligence optimization algorithm, which was first proposed by Yang (2010). The construction of the firefly algorithm is based on the following three assumptions.

  1. All fireflies are gender-neutral, so the attraction between fireflies depends on the brightness of the individual, regardless of their gender.

  2. For any two fireflies, the less bright one moves toward the brighter one, and the brightest firefly in the population moves randomly.

  3. The brightness of a firefly is related to the objective function to be optimized.

In the current paper, the objective function is constructed by minimizing the mean absolute percentage error (MAPE). The detailed calculation procedure for seeking the optimal emerging parameters γ, ξ, and β4 using the firefly algorithm is shown in Algorithm 1.

Algorithm 1.

Solution to optimal emerging parameters γ,ξ, and β4.

  • Input: Original observations.

  • Output: emerging parameters γ,ξ, and β4.

  • 1: Initialize parameters in the firefly algorithm: population size N, maximum generation T, maximum attractiveness β0, step factor α, light absorption coefficient γ (a setting of the firefly algorithm, distinct from the power exponent γ of the model).

  • 2:  while t < T do

  • 3:   For i = 1 to N do

  • 4:    For j = 1 to N do

  • 5:     If $I_j > I_i$ then

  • 6:     Update the position of firefly $i$: $x_i(t+1)=x_i(t)+\beta_0 e^{-\gamma r_{ij}^{2}}\left(x_j(t)-x_i(t)\right)+\alpha\left(\mathrm{rand}-\tfrac{1}{2}\right)$, where $r_{ij}$ is the distance between the $i$th firefly and the $j$th firefly, and rand is a random number between 0 and 1.

  • 7:     End if

  • 8:     Evaluate new solutions and update the light intensity

  • 9:    End for

  • 10:    End for

  • 11:   Rank fireflies and find the current best firefly

  • 12:   t=t+1

  • 13:  End while

  • 14: Obtain the optimal emerging parameters γ,ξ and β4.
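The loop in Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `firefly_minimize`, the box bounds, the clipping of positions to those bounds, and the toy objective are all hypothetical, and in this sketch the brightest firefly simply stays in place instead of moving randomly. Brightness is taken as the negative of the objective, so a lower error means a brighter firefly; in the paper the objective would be the MAPE of the SANDGMO(1,1) model as a function of (γ, ξ, β4).

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=50, n_iter=50,
                     beta0=1.0, alpha=0.01, absorb=1.0, seed=0):
    """Minimize objective(params) over box bounds with a basic firefly search."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]          # lower cost = brighter firefly
    best = min(range(n_fireflies), key=cost.__getitem__)
    best_pos, best_cost = list(pos[best]), cost[best]
    history = [best_cost]                       # best-so-far cost after each iteration

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:           # firefly j is brighter than firefly i
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    attract = beta0 * math.exp(-absorb * r2)
                    moved = []
                    for d, (lo, hi) in enumerate(bounds):
                        step = (pos[i][d]
                                + attract * (pos[j][d] - pos[i][d])
                                + alpha * (rng.random() - 0.5))
                        moved.append(min(max(step, lo), hi))  # clip to the box (assumption)
                    pos[i], cost[i] = moved, objective(moved)
                    if cost[i] < best_cost:
                        best_pos, best_cost = list(moved), cost[i]
        history.append(best_cost)
    return best_pos, best_cost, history


def toy_objective(p):
    """Hypothetical stand-in for the model's MAPE."""
    return (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2


best_pos, best_cost, history = firefly_minimize(toy_objective, [(-1.0, 1.0), (-1.0, 1.0)])
print(best_pos, best_cost)
```

The defaults mirror the settings the paper reports for the firefly algorithm (β0 = 1, α = 0.01, light absorption coefficient 1, population size 50, 50 iterations). Tracking the best-so-far solution separately keeps the reported error non-increasing even when an individual move makes a firefly worse.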

In this section, the prediction accuracy of the proposed SANDGMO(1,1) model is validated by two classical fluctuating data sequences from published literature, and the results are compared with the existing mainstream nonlinear grey prediction models. For a fair comparison, the predicted results of competing models are all from published literature. The parameters in the proposed SANDGMO(1,1) model are solved by the firefly algorithm, and the relevant parameter settings of the firefly algorithm are shown in Table 2.

Table 2

Parameter settings for the firefly algorithm

| Parameter | Value |
| --- | --- |
| β0 | 1 |
| α | 0.01 |
| γ | 1 |
| Population size (max) | 50 |
| Number of iterations (max) | 50 |

Source(s): The parameters β0, α, and γ are set with reference to Yang (2010), while the population size and number of iterations were determined by the problem scale in this study

To better verify the efficiency of the firefly algorithm in solving the adjustable parameters of the SANDGMO(1,1) model, two classical intelligent optimization algorithms, the genetic algorithm (Goldberg, 1989) and the particle swarm algorithm (Kennedy and Eberhart, 1995), were employed as alternative optimizers. The genetic algorithm was run with population size = 50, number of iterations = 50, mutation rate = 0.1, and crossover rate = 0.5. The particle swarm algorithm was run with population size = 50, number of iterations = 50, learning factors c1 = c2 = 1.5, and inertia factor ω varying within [0.4, 0.8].

Three commonly used evaluation indicators, namely the mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE), are employed to evaluate the prediction accuracy of the proposed SANDGMO(1,1) model.

MAPE is one of the most widely used error measures for evaluating the accuracy of forecasting models. MAE measures how close the predictions are to the actual values and reflects the overall reliability of the prediction model. RMSE measures the standard deviation of the differences between predicted and actual values and is sensitive to extreme errors. Therefore, the RMSE indicator is a useful complement to the MAPE indicator in the evaluation of model errors. These metrics are mathematically defined as follows:

(37)
(38)
(39)

where m is the number of observed samples, x^(0)(k) and x(0)(k) represent the simulated value and actual value at time k, respectively. From the definition of the above measures, it can be seen that the smaller the value of these measures, the higher the prediction accuracy of the model is.
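The three measures can be sketched in Python as below. The function names are illustrative, and the averaging convention is an assumption: the values reported for the NGBM(1,1) column of Table 4 are reproduced when the first point (k = 1), which every model fits exactly, is left out of the average.

```python
import math


def mae(actual, fitted):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, fitted)) / len(actual)


def rmse(actual, fitted):
    """Root mean square error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def mape(actual, fitted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((f - a) / a) for a, f in zip(actual, fitted)) / len(actual)


# Case 2 data and the NGBM(1,1) fitted values as listed in Table 4.
actual = [5.0, 6.0, 4.0, 7.0]
fitted = [5.0, 6.4998, 4.9215, 6.9861]

# Assumption: the first point (k = 1) is excluded from the averages.
a, f = actual[1:], fitted[1:]
print(round(mae(a, f), 4), round(rmse(a, f), 4), round(mape(a, f), 4))
```

Under that convention the sketch gives MAE 0.4784, RMSE 0.6053 and MAPE 10.5220%, matching the NGBM(1,1) column of Table 4.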

Case 1. The first example considers the fluctuating data sequence x(0)=(1,2,1.5,3), which has been widely used in literature (Chen et al., 2010, 2017; Wang, 2013; Wu et al., 2019). Sobol sensitivity indices are applied to investigate the global sensitivity of the SANDGMO(1,1) model adjustable parameters, and the calculation results are presented in Figure 2. The results indicate that the initial value correction factor β4 has the highest first-order sensitivity index and total order sensitivity index, followed by the accumulated generating operator order ξ and power exponent γ. In addition, the total-order sensitivity indices of the SANDGMO(1,1) model adjustable parameters are significantly higher than their first-order sensitivity indices, indicating that there is a significant interaction between the parameters of the SANDGMO(1,1) model. Therefore, a multi-parameter simultaneous optimization scheme is key to improving the prediction accuracy of the SANDGMO(1,1) model. Based on this, the firefly algorithm is used to solve all adjustable parameters of the SANDGMO(1,1) model simultaneously, and the calculation results are shown in Table 3.
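To illustrate how first-order and total-order Sobol indices expose parameter interactions, the sketch below estimates both with Jansen-type Monte Carlo estimators on a toy two-parameter function. Everything here is an assumption made for illustration: the paper does not state which estimator, sample size, or parameter ranges were used, and `sobol_indices` and `toy_model` are hypothetical names. In the paper's setting the model output would be an error measure of the SANDGMO(1,1) model as a function of (β4, ξ, γ).

```python
import random


def sobol_indices(model, bounds, n=20000, seed=0):
    """Estimate first-order and total-order Sobol indices (Jansen-type estimators)."""
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    A = [draw() for _ in range(n)]              # two independent sample matrices
    B = [draw() for _ in range(n)]
    fA = [model(p) for p in A]
    fB = [model(p) for p in B]
    mean = sum(fA + fB) / (2 * n)
    var = sum((v - mean) ** 2 for v in fA + fB) / (2 * n)

    first, total = [], []
    for i in range(len(bounds)):
        # AB_i: rows of A with column i replaced by the same column of B
        fAB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        first.append(1.0 - sum((y - z) ** 2 for y, z in zip(fB, fAB)) / (2 * n * var))
        total.append(sum((y - z) ** 2 for y, z in zip(fA, fAB)) / (2 * n * var))
    return first, total


def toy_model(p):
    """Pure interaction: the effect of each input depends on the other."""
    return p[0] * p[1]


first, total = sobol_indices(toy_model, [(0.0, 1.0), (0.0, 1.0)])
print(first, total)
```

For this toy function the exact indices are 3/7 (first-order) and 4/7 (total-order) for each input. The gap between the two is the share of variance due to interaction, which is the same reading applied to the SANDGMO(1,1) parameters in this section.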

Figure 2
A grouped bar chart shows Sobol sensitivity indices of “SANDGMO(1,1)” model parameters in Case 1. The vertical axis is labeled “Total (First) Order Sensitivity Indices” and ranges from 0 to about 0.7 in increments of 0.1 units. The horizontal axis lists the three “Parameter” variables: beta subscript 4, xi, and gamma. Two types of bars are grouped together as indicated in the legend: “First-order sensitivity” and “Total order sensitivity.” The comparison for each parameter is as follows: Beta subscript 4: First-order sensitivity: 0.48. Total order sensitivity: 0.73. xi: First-order sensitivity: 0.14. Total order sensitivity: 0.45. gamma: First-order sensitivity: 0.06. Total order sensitivity: 0.22.

Sobol sensitivity indices of SANDGMO(1,1) model parameters in Case 1. Source(s): Authors' own creation

Table 3

The simulation results comparison of different models in Case 1

| k | Original value | NGBM | Nash NGBM | Optimized Nash NGBM | Full NGBM | FANGBM | SANDGMO |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Parameters | | n = −1.5, p = 0.5 | n = −1.7, p = 0.54 | n = −1.202, p = 0.103 | n = −0.898, p = 0.250, m = 1, s = −0.998 | n = −0.3313, γ = 1.9588 | β4 = 0.000998, ξ = 0.81801067, γ = 2.119156 |
| | x(0)(k) | x̂(0)(k) | x̂(0)(k) | x̂(0)(k) | x̂(0)(k) | x̂(0)(k) | x̂(0)(k) |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2.00 | 2.01 | 2 | 1.8556 | 2 | 2.0000 |
| 3 | 1.5 | 2.0682 | 2.07 | 1.9245 | 1.5205 | 1.5 | 1.5000 |
| 4 | 3 | 2.9138 | 2.99 | 2.4646 | 2.8954 | 2.7843 | 3.0000 |
| MAPE(avg)% | | 13.62% | 13.05% | 15.38% | 4.02% | 2.3968% | 0.0000% |

Source(s): Authors' own creation

As Table 3 shows, the proposed SANDGMO(1,1) model fits the small-sample nonlinear time series x(0)=(1,2,1.5,3) exactly (MAPE of 0.0000%), significantly outperforming the competing NGBM(1,1) models. This indicates that the SANDGMO(1,1) model is suitable for modeling small-sample nonlinear fluctuating data sequences.

Figure 3 shows the optimization process of the firefly algorithm, genetic algorithm, and particle swarm algorithm. The firefly algorithm needs only a few iterations to reach the minimum MAPE in Case 1, clearly fewer than the genetic algorithm and particle swarm algorithm.

Figure 3
A line graph shows convergence curves of the “SANDGMO(1,1)” model using three optimization algorithms in Case 1. The vertical axis is labeled “MAPE (percent)” and ranges from 0.00 to 0.07 in increments of 0.01 percent. The horizontal axis is labeled “Number of iterations” and ranges from 0 to 50 in increments of 10 units. The graph shows three lines as indicated in the legend: “Firefly algorithm” (blue line), “Particle swarm algorithm” (orange line), and “Genetic algorithm” (green line). The blue line starts with an initial “MAPE” around 0.01 percent, drops immediately to near 0 percent, and converges to a final error of 0 percent within the first 3 iterations, demonstrating the fastest convergence. The orange line starts with an initial “MAPE” around 0.075 percent, drops to about 0.03 percent by iteration 1, remains at this level until iteration 5, then drops to 0 percent by iteration 7. The green line starts with an initial “MAPE” around 0.036 percent, drops to about 0.016 percent by iteration 1, remains at this level until iteration 23, then drops to 0 percent by iteration 25. Note: All numerical values are approximated.

Convergence curves of the SANDGMO(1,1) model using three optimization algorithms in Case 1. Source(s): Authors' own creation


Case 2. The fluctuating data sequence x(0)=(5,6,4,7) from the literature (Chen et al., 2008; Wu et al., 2019; Zheng et al., 2021b) is used as an example to demonstrate the effectiveness of the proposed SANDGMO(1,1) model. To identify the key parameters of the proposed SANDGMO(1,1) model in fitting the data sequence x(0), Sobol sensitivity indices are applied to investigate the global sensitivity of the SANDGMO(1,1) model adjustable parameters, and the calculation results are presented in Figure 4.

Figure 4
A grouped bar chart shows Sobol sensitivity indices of “SANDGMO(1,1)” model parameters in Case 2. The vertical axis is labeled “Total (First) Order Sensitivity Indices” and ranges from 0 to about 0.8 in increments of 0.1 units. The horizontal axis lists the three “Parameter” variables: Beta subscript 4, xi, and gamma. Two types of bars are grouped together as indicated in the legend: “First-order sensitivity” and “Total order sensitivity.” The comparison for each parameter is as follows: Beta subscript 4: First-order sensitivity: 0.79. Total order sensitivity: 0.84. xi: First-order sensitivity: 0.11. Total order sensitivity: 0.17. gamma: First-order sensitivity: 0.03. Total order sensitivity: 0.06.

Sobol sensitivity indices of SANDGMO(1,1) model parameters in Case 2. Source(s): Authors' own creation


The results indicate that the initial value correction parameter β4 is the key parameter affecting the prediction performance of the SANDGMO(1,1) model, followed by the accumulated generating operator order ξ and the power exponent γ. Figure 4 shows that the total-order sensitivity indices of the SANDGMO(1,1) model adjustable parameters are higher than their first-order sensitivity indices, indicating that there is an interaction between the parameters of the SANDGMO(1,1) model. Therefore, the multi-parameter simultaneous optimization scheme is an important way to improve the accuracy of the model parameter estimation. Based on this, the firefly algorithm is used to solve all the adjustable parameters of the SANDGMO(1,1) model, and the optimal emerging parameters are presented in Table 4.

Table 4

The simulation results comparison of different models in Case 2

| k | Original value | NGBM(1,1), Chen et al. (2008) | FANGBM(1,1), Wu et al. (2019) | Unbiased NGBM(1,1), Zheng et al. (2021a, b) | SANDGMO(1,1) |
| --- | --- | --- | --- | --- | --- |
| Parameters | | γ = −10 | γ = −0.2573, r = 1.9789 | γ = −99.9654 | β4 = −0.00001142, ξ = 0.43200654, γ = −0.00001504 |
| | x(0)(k) | x̂(0)(k) | x̂(0)(k) | x̂(0)(k) | x̂(0)(k) |
| 1 | 5 | 5.0000 | 5.0000 | 5.0000 | 5.0000 |
| 2 | 6 | 6.4998 | 6.0000 | 5.9266 | 6.0000 |
| 3 | 4 | 4.9215 | 4.0000 | 4.0734 | 4.0000 |
| 4 | 7 | 6.9861 | 6.6482 | 7.0000 | 7.0000 |
| MAE | | 0.4784 | 0.1173 | 0.0489 | 0.0000 |
| RMSE | | 0.6053 | 0.2031 | 0.0599 | 0.0000 |
| MAPE(avg)% | | 10.5220 | 1.6752 | 1.0199 | 0.0000 |

Source(s): Authors' own creation

Figure 5 shows that the firefly algorithm, genetic algorithm, and particle swarm optimization algorithm all take only a few steps to reach the minimum MAPE in Case 2, among which the convergence speed of the firefly algorithm is slightly faster than that of the genetic algorithm and particle swarm algorithm. It can be seen from Table 4 that the proposed SANDGMO(1,1) model can fit the fluctuating data sequence in Case 2 without bias, and its predictive performance is significantly better than other comparative models.

Figure 5
A line graph shows convergence curves of the “SANDGMO(1,1)” model using three optimization algorithms in Case 2. The vertical axis is labeled “MAPE (percent)” and ranges from 0.00 to 0.08 in increments of 0.01 percent. The horizontal axis is labeled “Number of iterations” and ranges from 0 to 50 in increments of 10 units. The graph shows three lines as indicated in the legend: “Firefly algorithm” (blue line), “Particle swarm algorithm” (orange line), and “Genetic algorithm” (green line). The blue line starts with an initial “MAPE” around 0.08 percent, drops rapidly, and converges close to 0 percent within the first 5 to 10 iterations. The orange line starts with the highest initial “MAPE” around 0.05 percent, drops very steeply to below 0.01 percent in the first few iterations, and then converges to the same near-zero value as the other algorithms by about 5-10 iterations. The green line starts with a low initial “MAPE” (around 0.003 percent), and converges to the near-zero value very quickly, stabilizing within the first 5 iterations. Note: All numerical values are approximated.

Convergence curves of the SANDGMO(1,1) model using three optimization algorithms in Case 2. Source(s): Authors' own creation


To summarize, for small sample nonlinear fluctuation time series forecasting, the prediction performance of the SANDGMO(1,1) model is significantly better than competitive models. Based on this, the SANDGMO(1,1) model is employed to predict the gross output value of the marine transportation industry.

In this section, the newly proposed SANDGMO(1,1) model is employed to predict the gross output value of the marine transportation industry. To verify its validity, the predictions are compared with those of competing models, including the traditional NGBM(1,1) model (Chen et al., 2008), the fractional NGBM(1,1) model (FANGBM(1,1)) (Wu et al., 2019), and the unbiased NGBM(1,1) model (Zheng et al., 2021b). For a fair comparison, the firefly algorithm is used to solve the adjustable parameters of the SANDGMO(1,1) model and of the competing models. Finally, the SANDGMO(1,1) model is used to forecast the gross output value of the marine transportation industry from 2025 to 2027.

With the accelerating process of global economic integration, the ocean has emerged as a critical channel for international trade and resource transportation, demonstrating increasingly prominent strategic importance. China possesses vast maritime territories and an extensive coastline, endowed with abundant marine resources that provide inherent advantages for developing its marine transportation industry. The progressive implementation of the “Maritime Power” strategy and the “Belt and Road” initiative has propelled China's marine transportation sector into a phase of rapid development as a pillar industry of the coastal economy. According to data from the China Marine Economic Statistical Bulletin, the total output value of China's marine transportation industry reached 819.4 billion CNY in 2024, representing a year-on-year growth of 7.4% and accounting for 7.77% of China's total marine production value.

However, influenced by factors such as fluctuations in the international shipping market, intensifying port competition, and green low-carbon policy constraints, the total output value of China's marine transportation industry exhibits pronounced nonlinearity and high volatility characteristics, as shown in Figure 6. In this context, accurate prediction of the marine transportation industry's total output value is crucial for formulating scientific development plans in the maritime transport industry. Due to the relatively late establishment of China's marine economic statistical system, insufficient data granularity, and frequent external disturbances, the availability of effective data reflecting the industry's latest development trends remains limited. These data characteristics significantly constrain the application effectiveness of traditional time series models (e.g., ARIMA) and machine learning approaches. To address these challenges, this study employs the newly proposed SANDGMO(1,1) model to predict the total output value of China's marine transportation industry, aiming to achieve high-precision forecasting under small-sample conditions.

Figure 6
A bar and line chart displays the marine transportation industry output value and its growth rate. The left vertical axis is labeled “Marine transportation industry output value (100 million yuan)” and ranges from 0 to 9000 in increments of 1000 units. The right vertical axis is labeled “Growth Rate (percent)” and ranges from negative 15 to 30 in increments of 5 units. The horizontal axis is labeled “Year” and ranges from 2018 to 2024 in increments of 1 year. The data from the bars are as follows: 2018: 6522. 2019: 6427. 2020: 5711. 2021: 6980. 2022: 7528. 2023: 7623. 2024: 8194. The growth rates for each year relative to the previous year are as follows: 2019: negative 1.5 percent. 2020: negative 11.1 percent. 2021: 22.2 percent. 2022: 7.9 percent. 2023: 1.3 percent. 2024: 7.5 percent. The line shows a sharp trough in growth rate in 2020, followed by a dramatic peak in 2021, after which the growth rate moderated.

Annual total output value and growth rate of China's marine transportation industry (2018–2024). Source(s): Authors' own creation


The data from 2018 to 2023 are used as a training set for model construction, whereas the data for 2024 are used as a validation set to assess the effectiveness of the model. To determine the key adjustable parameters and improve the accuracy of parameter calibration, the Sobol method is used to investigate the global sensitivity of the adjustable parameters of the SANDGMO(1,1) model, and the calculation results are presented in Figure 7. The results indicate that the power exponent γ is the key parameter affecting the prediction performance of the SANDGMO(1,1) model, followed by the accumulated generating operator order ξ and the initial value correction parameter β4. Figure 7 shows that the first-order sensitivity indices of the SANDGMO(1,1) model adjustable parameters are less than their total-order sensitivity indices, especially for parameters γ and ξ, indicating that there is an interaction between the parameters of the SANDGMO(1,1) model. Based on this, a multi-parameter simultaneous optimization scheme based on the firefly algorithm is applied to solve the optimal emerging parameters of the SANDGMO(1,1) model. The convergence trajectories of the firefly algorithm, genetic algorithm, and particle swarm algorithm are plotted in Figure 8. Figure 8 shows that the three algorithms need only about 50 steps to reach the minimum MAPE on China's marine transportation industry output value dataset, and that the parameter estimation accuracy of the firefly algorithm and particle swarm algorithm is better than that of the genetic algorithm. The optimal emerging parameters, prediction results, and error comparison of the SANDGMO(1,1) model and the competing models are presented in Table 5.

Figure 7
A grouped bar chart shows Sobol sensitivity indices of “SANDGMO(1,1)” model parameters. The vertical axis is labeled “Total (First) Order Sensitivity Indices” and ranges from 0 to about 1.0 in increments of 0.1 units. The horizontal axis lists the three “Parameter” variables: Beta subscript 4, xi, and gamma. Two types of bars are grouped together as indicated in the legend: “First-order sensitivity” and “Total order sensitivity.” The comparison for each parameter is as follows: Beta subscript 4: First-order sensitivity: 0.00000010. Total order sensitivity: 0.00000015. xi: First-order sensitivity: 0.00877630. Total order sensitivity: 0.0326547. gamma: First-order sensitivity: 0.96756490. Total order sensitivity: 0.99127030.

Sobol sensitivity indices of SANDGMO(1,1) model parameters. Source(s): Authors' own creation

Table 5

Performance comparison of various models

| Year | Data | NGBM(1,1) (γ = −0.5028) | FANGBM(1,1) (γ = 3.6052, r = 0.0807) | Unbiased NGBM(1,1) (γ = −0.5226) | SANDGMO(1,1) (β4 = −0.0032, ξ = 0.9999, γ = 0.2628) |
| --- | --- | --- | --- | --- | --- |
| 2018 | 6,522 | | | | |
| 2019 | 6,427 | 6426.7688 | 6427.4246 | 6426.9684 | 6400.3324 |
| 2020 | 5,711 | 6221.4161 | 6610.9883 | 6255.5001 | 5739.8815 |
| 2021 | 6,980 | 6560.9238 | 6893.8533 | 6553.9566 | 7109.0826 |
| 2022 | 7,528 | 7163.6831 | 7237.2603 | 7124.4849 | 7224.1300 |
| 2023 | 7,623 | 7970.0554 | 7627.8155 | 7899.0851 | 7895.7355 |
| MAE | | 328.2192 | 256.4230 | 330.0351 | 152.2474 |
| RMSE | | 371.3014 | 424.7242 | 378.6927 | 192.3171 |
| MAPE (avg, %) | | 4.8674 | 4.1850 | 4.9241 | 2.0769 |
| 2024 | 8,194 | 8970.3468 | 8060.1734 | 8864.5618 | 8203.9107 |
| MAE | | 776.3468 | 133.8266 | 670.5618 | 9.9107 |
| RMSE | | 776.3468 | 133.8266 | 670.5618 | 9.9107 |
| MAPE (avg, %) | | 9.4746 | 1.6332 | 8.1836 | 0.1210 |

Source(s): Authors' own creation
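
The error rows of Table 5 can be recomputed from the fitted values. The sketch below assumes the standard definitions of MAE, RMSE, and MAPE, and assumes the simulation-phase metrics are taken over 2019–2023, since 2018 has no fitted value in the table. It checks only the SANDGMO(1,1) column; the function and variable names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Table 5, fitted years 2019-2023 (2018 has no fitted value).
actual_fit = [6427, 5711, 6980, 7528, 7623]
sandgmo_fit = [6400.3324, 5739.8815, 7109.0826, 7224.1300, 7895.7355]

# Table 5, held-out year 2024.
actual_test = [8194]
sandgmo_test = [8203.9107]

for label, a, p in [("fit", actual_fit, sandgmo_fit), ("test", actual_test, sandgmo_test)]:
    print(f"{label}: MAE={mae(a, p):.4f}  RMSE={rmse(a, p):.4f}  MAPE={mape(a, p):.4f}%")
```

With a single test observation, MAE and RMSE coincide, which is why the two rows under 2024 are identical in Table 5.
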
Figure 8
A line graph shows convergence curves of the SANDGMO(1,1) model using three optimization algorithms. The vertical axis is labeled “MAPE (percent)” and ranges from 0.020 to 0.050 in increments of 0.005. The horizontal axis is labeled “Number of iterations” and ranges from 0 to 50 in increments of 10. The graph shows three lines, as indicated in the legend: “Firefly algorithm” (blue line), “Particle swarm algorithm” (orange line), and “Genetic algorithm” (green line). The blue line starts with an initial MAPE of around 0.034 percent, decreases rapidly, and converges to the lowest final error of 0 percent by 34 iterations. The orange line starts with an initial MAPE of around 0.036 percent, consistently remains slightly above the blue line, and converges to a final error of about 0 percent by 35 iterations. The green line starts with the highest initial MAPE of around 0.037 percent, stays above the other two lines throughout, and converges to a final error of about 0 percent by 37 iterations. Note: All numerical values are approximated.

Convergence curves of the SANDGMO(1,1) model using three optimization algorithms on marine transportation industry output value dataset. Source(s): Authors' own creation
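
A convergence curve of this kind comes from tracking the best objective value after each iteration. The sketch below is a generic firefly algorithm in its textbook form, not the authors' implementation. The objective is a hypothetical quadratic stand-in for the model's MAPE, and the bounds, swarm size, and the names `beta0`, `absorption`, and `step` are assumptions chosen for illustration.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, absorption=1.0, step=0.2, seed=0):
    """Minimize `objective` over box `bounds`; return best point, best cost, history."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = swarm[best_i][:], cost[best_i]
    history = [best_cost]  # best-so-far cost, one entry per iteration

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower cost)
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    # Attractiveness decays with squared distance.
                    beta = beta0 * math.exp(-absorption * r2)
                    for d, (lo, hi) in enumerate(bounds):
                        move = beta * (swarm[j][d] - swarm[i][d])
                        noise = step * (hi - lo) * (rng.random() - 0.5)
                        swarm[i][d] = min(hi, max(lo, swarm[i][d] + move + noise))
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = swarm[i][:], cost[i]
        step *= 0.97  # shrink the random walk as the search settles
        history.append(best_cost)
    return best_x, best_cost, history


def toy_objective(x):
    """Hypothetical stand-in for MAPE as a function of three parameters."""
    target = (0.3, 0.7, 0.2)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


bounds = [(-1.0, 1.0), (0.0, 1.0), (-1.0, 1.0)]  # illustrative search box
best_x, best_cost, history = firefly_minimize(toy_objective, bounds)
print(f"best cost after {len(history) - 1} iterations: {best_cost:.6f}")
```

Because all three parameters move together at every step, interactions between them are handled by the search itself and no parameter has to be fixed in advance. In an actual application, `toy_objective` would be replaced by a function that fits the grey model for a given parameter vector and returns its MAPE.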


To assess the effectiveness of the proposed model, three evaluation indices (MAPE, RMSE, and MAE) are used to measure the predictive power of all models. As Table 5 shows, in the simulation phase the MAPEs of all four models are below 5%, indicating high simulation accuracy. The SANDGMO(1,1) model outperforms its competitors on every evaluation metric in both the simulation stage and the prediction stage: its simulation MAPE is 2.0769% against 4.1850% to 4.9241% for the competitors, and its 2024 prediction MAPE is 0.1210% against 1.6332% to 9.4746%. This is consistent with the simulation results in Section 4, so the SANDGMO(1,1) model can be considered the best of the compared models for predicting the gross output value of the marine transportation industry.

Data-driven prediction models can only approximate the sample data, and the resulting model usually carries some uncertainty in application. To further validate the reliability of the SANDGMO(1,1) model, Monte Carlo simulations were used to create three noise scenarios: low, medium, and high intensity. The SANDGMO(1,1) model and the competing models were each applied to the noise-augmented data, producing a series of repeated predictions. Based on these results, the robustness of the new model and the competing models was analyzed and compared.

To evaluate the predictive performance of the grey models under different noise intensities, we added Gaussian noise of varying strength to the original modeling data to simulate input data uncertainty. This procedure was implemented through Monte Carlo experiments in which the mean of the Gaussian noise was set to zero and its standard deviation was defined as the product of the noise intensity parameter α and the standard deviation of the original modeling data sequence. The parameter α was set to 0.005 (low), 0.01 (medium), and 0.05 (high) to represent distinct noise levels. Through these simulations, we generated 1,000 datasets to assess the predictive robustness of the SANDGMO(1,1) model against the competing models.
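
The noise-injection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the sample standard deviation is meant, that 1,000 replicates are drawn per noise level, and that independent noise is added to every training observation. The function names and the seed are illustrative.

```python
import random
import statistics


def noise_sigma(series, alpha):
    """Noise standard deviation: alpha times the std of the original series."""
    return alpha * statistics.stdev(series)  # assumption: sample standard deviation


def noisy_replicates(series, alpha, n_rep=1000, seed=0):
    """Return n_rep copies of `series`, each with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    sigma = noise_sigma(series, alpha)
    return [[x + rng.gauss(0.0, sigma) for x in series] for _ in range(n_rep)]


# Training data for 2018-2023 from Table 5.
train = [6522, 6427, 5711, 6980, 7528, 7623]

for label, alpha in [("low", 0.005), ("medium", 0.01), ("high", 0.05)]:
    reps = noisy_replicates(train, alpha)
    print(f"{label:>6}: alpha={alpha}, sigma={noise_sigma(train, alpha):.2f}, "
          f"replicates={len(reps)}")
```

Each replicate would then be passed to every model in turn, and the resulting training and test MAPEs averaged over the replicates.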

Table 6 presents the Monte Carlo experimental results, showing the mean absolute percentage error (MAPE) of the compared models on both the training and test sets under different noise conditions. The SANDGMO(1,1) model performs best in every tested scenario: its test-set MAPE stays between 0.12% and 0.23%, whereas the lowest test-set MAPE among the competing models is above 3%, and its training-set MAPE stays below 1.9% against more than 4% for the competitors. This consistent robustness indicates that the SANDGMO(1,1) model delivers high-precision predictions under uncertain conditions.

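Table 6

Comparison of the MAPE (%) of various grey models under different noise levels

| Model | Low (α = 0.005), training set | Low (α = 0.005), test set | Medium (α = 0.01), training set | Medium (α = 0.01), test set | High (α = 0.05), training set | High (α = 0.05), test set |
| --- | --- | --- | --- | --- | --- | --- |
| NGBM | 4.0672 | 9.4358 | 4.0800 | 9.4331 | 4.1842 | 9.4334 |
| FANGBM | 5.5528 | 3.0567 | 5.5677 | 3.0490 | 5.7180 | 3.1622 |
| Unbiased NGBM | 4.1156 | 8.1849 | 4.1286 | 8.1821 | 4.2342 | 8.1788 |
| SANDGMO | 1.7369 | 0.1203 | 1.7436 | 0.1198 | 1.8414 | 0.2281 |

Source(s): Authors' own creation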

To further evaluate the prediction robustness of the model under uncertain conditions, this study introduces probability density analysis (PDA) and confidence interval (CI) methods.

First, uncertainty is quantified by performing PDA on the repeated prediction results that each model generates through Monte Carlo simulations under different noise intensities, as shown in Figure 9. Second, the CIs of each model on both the training and test sets under high-noise conditions are calculated using quantile estimation, which visualizes each model's performance and robustness under uncertainty, as shown in Figure 10.
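
Both steps operate on the list of repeated predictions that one model produces for one year. The sketch below shows one way to carry them out with the standard library only. It assumes a Gaussian kernel with a normal-reference rule-of-thumb bandwidth and linear interpolation between order statistics, because the paper states neither choice. The predictions are synthetic stand-ins rather than results from the paper.

```python
import math
import random
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a density function estimated from `samples` with a Gaussian kernel."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumption; the paper's bandwidth is not stated).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def quantile_interval(samples, level=0.95):
    """Equal-tailed interval from empirical quantiles (linear interpolation)."""
    ordered = sorted(samples)

    def q(p):
        pos = p * (len(ordered) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    tail = (1 - level) / 2
    return q(tail), q(1 - tail)


# Synthetic stand-in for 1,000 repeated predictions of a single year.
rng = random.Random(0)
predictions = [rng.gauss(100.0, 5.0) for _ in range(1000)]

density = gaussian_kde(predictions)
low, high = quantile_interval(predictions, level=0.95)
print(f"95% interval: [{low:.2f}, {high:.2f}]  density at 100: {density(100.0):.4f}")
```

A narrow interval that contains the true value corresponds to a tall, narrow density peak located at that value, which is the pattern the two figures are meant to reveal.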

Figure 9
Four plots show the density distribution of the predicted “Gross output value of the marine transportation industry”. Each plot has “Predicted Value” on the horizontal axis and “Density” on the vertical axis. Each plot shows one vertical line and three solid curves, as indicated in the legend: a red vertical dashed line represents “True Value,” a solid purple curve represents α = 0.005, a solid blue curve represents α = 0.01, and a yellow curve represents α = 0.05. In each plot, vertical dashed lines mark the 95 percent confidence intervals for the three noise levels (α = 0.005, α = 0.01, and α = 0.05); the confidence interval for each α is the region between its two vertical lines.
NGBM (top left): The vertical axis ranges from 0.00 to 0.10 in increments of 0.02. The horizontal axis ranges from 8200 to 9600 in increments of 200. The red dashed vertical line is drawn at 8200. The mean predicted value is around 9000. The density curve is narrower. The blue curve has the lowest peak of about 0.01, and the purple curve has the highest peak of over 0.10. The “True Value” falls outside the 95 percent confidence interval, indicating poor predictive accuracy.
FANGBM (top right): The vertical axis ranges from 0.000 to 0.025 in increments of 0.005. The horizontal axis ranges from 7750 to 9500 in increments of 250. The red dashed vertical line is drawn at 8155. The mean predicted value is around 8400. The density curve is relatively broader and has a wider confidence interval. The “True Value” is located near the edge or slightly outside the 95 percent confidence interval.
Unbiased NGBM (bottom left): The vertical axis ranges from 0.00 to 0.10 in increments of 0.02. The horizontal axis ranges from 8200 to 9400 in increments of 200. The red dashed vertical line is drawn at 8200. The mean predicted value is around 8900. The density curve is extremely narrow and peaked (high density), indicating very low predictive uncertainty. The “True Value” falls outside the narrow 95 percent confidence interval.
SANDGMO (bottom right): The vertical axis ranges from 0.000 to 0.175 in increments of 0.025. The horizontal axis ranges from 7800 to 8600 in increments of 200. The red dashed vertical line is drawn at 8200. The mean predicted value is around 8200. The density curve is extremely narrow and peaked (high density), similar to that of the Unbiased NGBM. The mean prediction is the closest to the “True Value,” and the “True Value” falls within the very narrow 95 percent confidence interval. Note: All numerical values are approximated.

KDE curves of various grey models under different noise scenarios. Source(s): Authors' own creation

Figure 10
A time series plot shows the prediction confidence intervals of various grey models under high noise. The vertical axis is labeled “Gross output value of the marine transportation industry” and ranges from 6000 to 9000 in increments of 500. The horizontal axis is labeled “Year” and ranges from 2015 to 2021 in increments of 1 year. The data is divided by a vertical dashed line into a “Training Set” (2015 to mid-2020) and a “Test Set” (mid-2020 to 2021). The legend at the top left indicates the following: a black line represents “True Values,” a blue line represents “NGBM,” a blue shaded area represents “NGBM 95 percent CI,” an orange line represents “FANGBM,” an orange shaded area represents “FANGBM 95 percent CI,” a green line represents “Unbiased NGBM,” a green shaded area represents “Unbiased NGBM 95 percent CI,” a red line represents “SANDGM,” and a red shaded area represents “SANDGM 95 percent CI.” The “True Values” line starts at approximately 6500 in 2015, decreases sharply to about 5700 in 2017, and then increases steadily to around 8200 in 2021. The “FANGBM” line overestimates the output value across the entire period, showing the highest predicted values and the widest confidence interval. The “NGBM,” “Unbiased NGBM,” and “SANDGM” lines closely follow the “True Values” line, particularly around the 2017 dip. By 2021, “NGBM” and “Unbiased NGBM” reach the highest output value of around 8900, followed by “FANGBM” at about 8400, and “SANDGM” at approximately 8200. Note: All numerical values are approximated.

The plot of the prediction confidence intervals of various grey models under high noise. Source(s): Authors' own creation


Figure 9 displays the kernel density estimation (KDE) curves of the SANDGMO(1,1) model and the competing models under varying noise intensities. The prediction outcomes of all grey prediction models follow a unimodal distribution under every noise condition, which indicates stable prediction behavior. The KDE curves widen markedly as the noise level α rises, showing that the predictions of the grey models become more dispersed, and therefore less accurate, as noise intensifies. Among the compared models, the SANDGMO(1,1) model exhibits narrower KDE curves across all noise levels, and its peak lies closest to the true value, which highlights its reliability and precision in forecasting. The competing models also show a unimodal distribution, but their peaks generally deviate substantially from the true value, which further supports the superior prediction accuracy and robustness of the SANDGMO(1,1) model under noisy conditions.

Figure 10 illustrates the 95% CIs of the SANDGMO(1,1) model and the competing models on both the training and test sets under high-noise conditions. The SANDGMO(1,1) model exhibits a compact overall CI range and aligns closely with the true values on both the training and test sets, indicating high prediction fidelity and low uncertainty. Although its CI widens moderately on the test set under high noise, it remains relatively narrow and tightly centered around the true value, which shows reliable generalization capability in prediction.

In comparison, although the CIs of the NGBM(1,1) and Unbiased NGBM(1,1) models are also narrow, these models fail to capture the complex dynamic characteristics of the data. Their predictions deviate systematically from the actual data on both the training and test sets, which indicates insufficient model expressiveness. The FANGBM(1,1) model exhibits much wider CIs on both the training and test sets. This reflects a high level of uncertainty and hinders its ability to capture the overall development trend of the gross output value of the marine transportation industry.

In summary, despite varying noise intensities and limited samples, the SANDGMO(1,1) model accurately tracks the actual development trajectory. This consistently stable performance demonstrates its robustness and its capability to handle uncertain prediction environments.

Given its superior predictive power relative to the competing models, the SANDGMO(1,1) model is further applied to predict the gross output value of the marine transportation industry from 2025 to 2027. The data from 2018 to 2024 are used as the training set for model construction, and the firefly algorithm is applied to find the optimal adjustable parameters (β4 = 40.6810, ξ = 0.6639, γ = 0.1182). The prediction results are presented in Table 7.

Table 7

The gross output value of the marine transportation industry in the next three years (unit: billion CNY)

| Year | 2025 | 2026 | 2027 |
| --- | --- | --- | --- |
| Predicted value | 876.3243 | 915.8751 | 958.6436 |

Source(s): Authors' own creation

As Table 7 shows, China's marine transportation industry is projected to maintain sustained growth over the next three years, with its total output value expected to reach 958.6436 billion CNY by 2027. Based on this forecast, the following policy recommendations are proposed:

First, strengthen the intelligent upgrading of port infrastructure to enhance hub capacity and meet growing transport demand. Second, establish a green shipping subsidy mechanism to accelerate the large-scale adoption of new-energy vessels and drive the low-carbon transformation of the marine transportation sector. Third, optimize the marine talent development system to support the digitalization and internationalization of the marine transportation industry. The industry's rapid expansion requires a large number of interdisciplinary professionals (e.g. in intelligent shipping system operation and maintenance, international maritime law, and green technology management), yet a structural gap exists between current talent cultivation models and industrial demand. Closing this gap calls for accelerated talent cultivation through deep industry-education integration and global talent recruitment initiatives.

This paper presents a novel self-adaptive nonlinear discrete grey model, termed the SANDGMO(1,1) model, by combining the non-homogeneous discrete grey model with the NGBM(1,1) model, and investigates its properties and applicability. The proposed SANDGMO(1,1) model inherits the advantages of both the NGBM(1,1) model and the non-homogeneous discrete grey model. Two classical fluctuating time series and a real-world case of predicting China's marine transportation industry output value were used to verify the effectiveness of the proposed SANDGMO(1,1) model in nonlinear time series prediction. Empirical results (Tables 3–6) show that the prediction robustness and accuracy of the proposed SANDGMO(1,1) model are better than those of the competing models. As shown in Table 6, the MAPE values of the SANDGMO(1,1) model for both the training and test sets under different noise intensities are lower than those of the competing models, indicating the superior robustness of the proposed model. The main conclusions of this paper are as follows:

  1. The SANDGMO(1,1) model is a generalization of the existing NGBM(1,1) model family. It retains the advantages of that family, eliminates the jumping error between parameter estimation and actual prediction in the traditional NGBM(1,1) model, and is particularly suited to nonlinear time series prediction with small samples.

  2. Empirical results (Figures 2, 4, and 7) show that there is a significant interaction between the emerging parameters of the SANDGMO(1,1) model, which provides a theoretical basis for the multi-parameter simultaneous optimization scheme proposed in this paper.

  3. The SANDGMO(1,1) model removes the unreasonable modeling assumption of the existing NGBM(1,1) model family that some adjustable parameters are known in advance, remedies the main source of forecast error in previous studies, and provides an effective solution for nonlinear time series forecasting with small samples. Empirical results (Tables 3–6) support this conclusion.

  4. Property analysis of the SANDGMO(1,1) model shows that the model has higher fitting accuracy and wider applicability than the existing NGBM(1,1) model family. The SANDGMO(1,1) model is suitable for modeling small data sequences whether they are monotonic or fluctuating, and it achieves high fitting and prediction accuracy especially when the data sequence is a random unimodal sequence. The SANDGMO(1,1) model is therefore a more efficient model for small data sequences with an unknown distribution.

The advantages of the SANDGMO(1,1) model over its competitors have been set out above. However, some issues still deserve further research.

  1. Although the SANDGMO(1,1) model has achieved very high prediction accuracy in solving nonlinear dynamic system prediction modeling, it is essentially a univariate grey prediction model that does not take into account the influence of external factors. Therefore, the study of a self-adaptive nonlinear multivariate grey prediction model is an interesting direction.

  2. Although the SANDGMO(1,1) model demonstrates superior robustness over benchmark models across varying Gaussian noise levels, its robustness still decays with increasing noise intensity. Given that the noise in real-world systems often exhibits non-Gaussian characteristics, developing a noise-level-driven dynamic parameter tuning mechanism for the SANDGMO(1,1) model to strengthen its noise immunity constitutes an important future research direction.

”,
Grey Systems: Theory and Application
, Vol.
12
No.
2
, pp.
357
-
375
, doi: .
Wu
,
W.
,
Ma
,
X.
,
Zeng
,
B.
and
Zhang
,
Y.
(
2024
), “
A conformable fractional-order grey Bernoulli model with optimized parameters and its application in forecasting Chongqing's energy consumption
”,
Expert Systems with Applications
, Vol.
255
, 124534, doi: .
Xiao
,
X.
,
Yang
,
J.
,
Mao
,
S.
and
Wen
,
J.
(
2017
), “
An improved seasonal rolling grey forecasting model using a cycle truncation accumulated generating operation for traffic flow
”,
Applied Mathematical Modelling
, Vol.
51
, pp.
386
-
404
, doi: .
Xiao
,
Q.
,
Gao
,
M.
,
Xiao
,
X.
and
Goh
,
M.
(
2020
), “
A novel grey Riccati–Bernoulli model and its application for the clean energy consumption prediction
”,
Engineering Applications of Artificial Intelligence
, Vol.
95
, 103863, doi: .
Xie
,
N.M.
and
Liu
,
S.
(
2009
), “
Discrete grey forecasting model and its optimization
”,
Applied Mathematical Modelling
, Vol.
33
No.
2
, pp.
1173
-
1186
, doi: .
Xie
,
N.-M.
,
Liu
,
S.-F.
,
Yang
,
Y.-J.
and
Yuan
,
C.-Q.
(
2013
), “
On novel grey forecasting model based on non-homogeneous index sequence
”,
Applied Mathematical Modelling
, Vol.
37
No.
7
, pp.
5059
-
5068
, doi: .
Xie
,
W.
,
Wu
,
W.-Z.
,
Liu
,
C.
,
Zhang
,
T.
and
Dong
,
Z.
(
2021
), “
Forecasting fuel combustion-related CO2 emissions by a novel continuous fractional nonlinear grey Bernoulli model with grey wolf optimizer
”,
Environmental Science and Pollution Research
, Vol.
28
, pp.
38128
-
38144
, doi: .
Xu
,
H.
and
Li
,
N.
(
2023
), “
Forecasting China's marine scientific research and education, marine industrial structure upgrading and marine economy growth based on the AWBO-MGM(1,m) model
”,
Marine Economics and Management
, Vol.
6
No.
1
, pp.
1
-
22
, doi: .
Xu
,
X.
,
Wu
,
Y.
and
Zeng
,
B.
(
2025
), “
Forecasting short-term energy consumption in Chongqing using a novel grey Bernoulli model
”,
Grey Systems: Theory and Application
, Vol.
15
No.
1
, pp.
21
-
44
, doi: .
Yan
,
S.
,
Peng
,
M.
,
Wu
,
L.
and
Xiong
,
P.
(
2025
), “
A novel structural adaptive seasonal grey Bernoulli model in natural gas production forecasting
”,
Engineering Applications of Artificial Intelligence
, Vol.
148
, 110407, doi: .
Yang
,
X.-S.
(
2010
),
Nature-Inspired Metaheuristic Algorithms
, Vol.
2
,
Luniver Press
,
Frome
.
Yin
,
K.
,
Liu
,
Z.
,
Zhang
,
C.
,
Huang
,
S.
,
Li
,
J.
,
Lv
,
L.
,
Su
,
X.
and
Zhang
,
R.
(
2022
), “
Analysis and forecast of marine economy development in China
”,
Marine Economics and Management
, Vol.
5
No.
1
, pp.
1
-
33
, doi: .
Zeng
,
B.
and
Liu
,
S.
(
2017
), “A self-adaptive intelligence gray prediction model with the optimal fractional order accumulating operator and its application”,
edited by Cuevas, C
.,
Mathematical Methods in the Applied Sciences
, Vol.
40
No.
18
, pp.
7843
-
7857
, doi: .
Zeng
,
Z.
,
Xu
,
J.
,
Zhou
,
S.
,
Zhao
,
Y.
and
Shi
,
Y.
(
2024
), “
Forecasting the potential of global marine shipping carbon emission under artificial intelligence based on a novel multivariate discrete grey model
”,
Marine Economics and Management
, Vol.
7
No.
1
, pp.
42
-
66
, doi: .
Zhang
,
L.
,
Zheng
,
Y.
,
Wang
,
K.
,
Zhang
,
X.
and
Zheng
,
Y.
(
2014
), “
An optimized Nash nonlinear grey Bernoulli model based on particle swarm optimization and its application in prediction for the incidence of Hepatitis B in Xinjiang, China
”,
Computers in Biology and Medicine
, Vol.
49
, pp.
67
73
, doi: .
Zheng
,
C.
,
Wen-Ze
,
W.-Z.
,
Xie
,
W.
and
Li
,
Q.
(
2021a
), “
A MFO-based conformable fractional nonhomogeneous grey Bernoulli model for natural gas production and consumption forecasting
”,
Applied Soft Computing
, Vol.
99
No.
1
, 106891, doi: .
Zheng
,
C.
,
Wu
,
W.-Z.
,
Xie
,
W.
,
Li
,
Q.
and
Zhang
,
T.
(
2021b
), “
Forecasting the hydroelectricity consumption of China by using a novel unbiased nonlinear grey Bernoulli model
”,
Journal of Cleaner Production
, Vol.
278
, 123903, doi: .
Zhou
,
J.
,
Fang
,
R.
,
Li
,
Y.
,
Zhang
,
Y.
and
Peng
,
B.
(
2009
), “
Parameter optimization of nonlinear grey Bernoulli model using particle swarm optimization
”,
Applied Mathematics and Computation
, Vol.
207
No.
2
, pp.
292
-
299
, doi: .
Zhou
,
D.
,
Yu
,
Z.
,
Zhang
,
H.
and
Weng
,
S.
(
2016
), “
A novel grey prognostic model based on Markov process and grey incidence analysis for energy conversion equipment degradation
”,
Energy
, Vol.
109
, pp.
420
-
429
, doi: .
Published in Marine Economics and Management. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms are set out in the CC BY 4.0 licence.
