Surrogate models are frequently used to alleviate the computational cost associated with reliability analysis in geotechnical engineering; however, efficiently training high-accuracy surrogate models remains a significant challenge. This paper aims to introduce AEGPR-MCS, a robust framework that integrates the ensemble model, active learning strategy and Monte Carlo simulation (MCS) to perform the reliability analysis for various geotechnical systems.
AEGPR-MCS leverages an ensemble model as a substitute for complex, computationally intensive and implicit geotechnical mechanical models. It incorporates active learning to iteratively choose the most important training samples, ensuring that the surrogate model is trained with both precision and efficiency. The added new training samples are strategically positioned near the limit state surface, which significantly improves model efficiency by focusing computational resources on the most critical regions. Moreover, the adaptive ensemble model dynamically adjusts the weights of its components to enhance predictive robustness across various geotechnical systems.
Validation through three geotechnical examples demonstrates significant improvements in both computational efficiency and prediction accuracy over other methods. The AEGPR-MCS method provides reliable and precise reliability assessments for different geotechnical engineering problems while significantly reducing computational resource requirements, making it a well-suited and versatile approach for addressing a broad spectrum of geotechnical reliability challenges.
This study presents a robust methodology where the active learning strategy plays a pivotal role in identifying the most informative training samples, primarily those located near the limit state surface. This strategic approach allows the surrogate model to be developed iteratively, promoting both precision and resource efficiency. Furthermore, AEGPR-MCS’s adaptive ensemble model dynamically adjusts the weights of its components, thus bolstering its robustness across diverse geotechnical systems.
1. Introduction
Reliability analysis in geotechnical engineering often involves complex and inherently implicit mechanical models, which are computationally intensive and often inefficient (Wan et al., 2025). To address these challenges and enhance computational efficiency, surrogate models (e.g. polynomial regression, Kriging and neural networks) are commonly used as simplified and computationally efficient alternatives to these intricate geotechnical models. By doing so, the computational burden can be significantly reduced (Li et al., 2016; Yang and Prayogo, 2022; Phoon and Zhang, 2023; Ma et al., 2024). Recent advancements in artificial intelligence have greatly accelerated the progress of surrogate models in geotechnical engineering applications (Metya et al., 2017; Yang and Prayogo, 2022; Shen et al., 2022; Phoon and Zhang, 2023; Singh et al., 2023; Liu et al., 2024a). However, despite their clear advantages in improving computational efficiency, reliability methods that rely on surrogate models do not always guarantee high predictive accuracy. Inevitably, approximating the original mechanical model through surrogate models introduces additional uncertainties and errors (Li et al., 2016; Liu et al., 2022a; Liu et al., 2025a). The computational challenges associated with geotechnical reliability analysis using surrogate models can be categorized into two key aspects: the suitability of the selected surrogate model for capturing the underlying behavior of the system and the effectiveness of the training process, which dictates how accurately the surrogate model represents the original complex model. Addressing these factors is essential for ensuring both the efficiency and precision of reliability assessments in geotechnical engineering.
Regarding the first aspect, surrogate model selection, different geotechnical structures (e.g. tunnels, excavations, slopes, retaining walls and footings) have significant differences in terms of physical assumptions, material properties and loading conditions. Thus, selecting a surrogate model should depend on the specific engineering problem being addressed. Different surrogate models offer distinct advantages regarding complexity and nonlinear fitting capability (Teixeira et al., 2021; Singh et al., 2023). While some surrogate models can perform well for specific geotechnical structures, they may not be as effective for other types of geotechnical systems (Singh et al., 2023). Consequently, a detailed comparison and evaluation of different surrogate models is necessary before conducting a reliability analysis to ensure the selection of the most appropriate model. However, this process may not be effective in practice because of the time and resources required. Fortunately, the ensemble model, which combines multiple learning models, can achieve better predictive performance than a single model when applied to various reliability problems (Opitz and Maclin, 1999). In theory, ensemble models can be used to construct robust models that are suitable for different geotechnical structures (Cheng and Lu, 2020; Liu et al., 2022b). However, the use of ensemble models in reliability analysis for geotechnical structures is still relatively unexplored. As a result, there is an urgent need to develop a robust and efficient surrogate model that can be applied to a broad range of geotechnical structures.
Regarding the second aspect, the surrogate model must be properly trained to accurately replicate the original mechanical model. In general, increasing the number of training samples reduces prediction error. However, more training samples also necessitate additional evaluations of the original performance function (PF), resulting in numerous deterministic numerical simulations for geotechnical engineering, such as finite element analyses that can be extremely time-consuming (Li et al., 2016; Jiang et al., 2022; Ren et al., 2022; Yang and Prayogo, 2022; Liu et al., 2022a, 2022b; Liu et al., 2023; Liu et al., 2024b). This, in turn, increases the computational burden, which runs counter to the original purpose of using a surrogate model. At present, the training samples’ number is typically determined through empirical judgment, which makes it challenging to identify the optimal sample size. This further complicates the balance between accuracy and efficiency in surrogate modeling. Recently, active learning strategies have gained significant attention (Pan and Dias, 2017; Al-Bittar et al., 2018; Liu et al., 2022b). The active learning strategy facilitates the dynamic selection of the most informative samples, enabling iterative updates to the surrogate model. This approach not only enhances prediction accuracy but also reduces the number of training samples needed. This strategy shows great potential for addressing the challenges associated with training surrogate models efficiently while maintaining high accuracy.
This research aims to present a robust and efficient methodological framework that combines ensemble modeling, active learning approaches and Monte Carlo simulation (MCS) to address current challenges in geotechnical reliability assessment. The framework focuses on two key aspects: enhancing computational performance while maintaining analytical stability across diverse geotechnical applications. The integration of the ensemble learning model strengthens the surrogate model’s adaptability, enabling reliable analysis across various geotechnical scenarios. To optimize computational resources, the active learning strategy is incorporated that dynamically adjusts the analysis process. The study is structured as follows. The theoretical foundations of the ensemble modeling approach are first introduced in Section 2, followed by a detailed methodology of the proposed framework in Section 3. To validate the proposed framework, Section 4 examines three geotechnical structures that showcase the framework’s precision, computational efficiency and robustness. The key findings and conclusions are summarized in Section 5.
2. Ensemble model construction
2.1 Foundations of the ensemble model
The predictive power of a single learner model is usually optimized for a specific problem, and its performance is typically limited when applied to other engineering problems. Ensemble learning addresses this limitation by combining different base learners to create an ensemble model through weighted averaging, as illustrated in Figure 1 (Cheng and Lu, 2020; Liu et al., 2022b):
The diagram illustrates an ensemble learning architecture in which an Input is provided simultaneously to Learner 1, Learner 2, and Learner m. Each Learner processes the Input independently and sends its result forward through weighted connections labelled g 1, g 2, and g m. These weighted outputs converge into a single Ensemble model block, which aggregates the individual Learner contributions. The Ensemble model then generates a final Output, showing how multiple models are combined to improve overall predictive performance.Fundamentals of ensemble model
The diagram illustrates an ensemble learning architecture in which an Input is provided simultaneously to Learner 1, Learner 2, and Learner m. Each Learner processes the Input independently and sends its result forward through weighted connections labelled g 1, g 2, and g m. These weighted outputs converge into a single Ensemble model block, which aggregates the individual Learner contributions. The Ensemble model then generates a final Output, showing how multiple models are combined to improve overall predictive performance.Fundamentals of ensemble model
where denotes the predicted PF value by the ensemble model; x is the input random variable; m represents the number of component learners; and and gi(x) are the predicted output response and weights of the i-th learner, respectively. Ensemble learning operates on the principle of assigning distinct weights to each learner throughout the model training process. By reducing the weights of poorly performing learners and increasing the weights of those with stronger predictive capabilities, the ensemble model effectively filters out less effective learners while retaining the most accurate ones. Consequently, the ensemble model’s robustness and overall predictive performance are substantially improved.
In this study, the weights of each learner within the ensemble model are determined using the following heuristic algorithm (Goel et al., 2007; Liu et al., 2022b):
where α and β are the importance coefficients of the average predicted response and the component model response, respectively (Goel et al., 2007). Smaller values of α and β result in larger weights being assigned to the component models. In this study, α and β are set to 0.05 and -1 (Goel et al., 2007), respectively. Ei represents the generalized mean squared cross-validation error of the i-th component learner, which can be evaluated as follows:
where yi(j) denotes the true values of the j-th training sample, while represents the predictions of the i-th component learner for the j-th training sample; k represents the number of subsample partitions used for cross-validation, usually set to match the total number of training samples.
It is widely recognized that increasing the number of learners in an ensemble model, while promoting greater diversity in their predictive capabilities, typically improves the overall performance of the ensemble model. Given that Gaussian process regression (GPR) has significant advantages in fitting complex and nonlinear problems in geotechnical engineering, three types of GPRs, namely, the squared exponential kernel, Matérn kernel and automatic relevance determination (ARD) kernel – are chosen as component learners for constructing the ensemble model in this study (Williams and Rasmussen, 2006; Pal and Deswal, 2010). The detailed description of this approach is provided in the following section.
2.2 Gaussian process regression
GPR models the input data as samples drawn from a multivariate Gaussian distribution, using a kernel function to describe the similarity between input features. GPR not only fits a function to the input data but also quantifies the uncertainty of its predictions, making it particularly effective in scenarios where the data set is limited. Given the input variable x and the target output f(x), GPR assumes that for any set of inputs {x1, x2,…, xn}, the corresponding outputs {f(x1), f(x2),…, f(xn)} follow a multivariate Gaussian distribution (Williams and Rasmussen, 2006; Liu et al., 2022a):
where m(.) is the mean values; is the kernel function, which defines the correlation between inputs.
Let the input data be x = [x1, x2,…, xn]T and the corresponding response be y = [y1, y2,…, yn]T. This data set is used to build the GPR model, which is then applied to predict the response value y* for a new input x*. In GPR, the relationship between the input x and the response y can be written as (Williams and Rasmussen, 2006; Liu et al., 2022a):
where f(xi) is the latent underlying function, and represents the observational noise. Then, the kernel function is used to construct the covariance matrix K for the training data:
The covariance matrix of the response data y can thus be represented as , where Ie is the identity matrix, and denotes the noise variance. The joint distribution of output y* and the training responses y is:
where represents the covariance between the new input x* and each training sample x, and denotes the covariance of the new point with itself. Using the properties of conditional distributions in multivariate Gaussian distributions, the posterior distribution can be obtained (Williams and Rasmussen, 2006; Pal and Deswal, 2010):
where the predictive mean µ* and predictive variance σ*2 are given by:
Using equations (7)–(12), GPR can generate both the predicted mean and variance for any new input x*, with the variance indicating the level of uncertainty in the prediction.
A crucial aspect of the GPR prediction process is selecting the appropriate kernel function. The kernel function defines the covariance between pairs of input points and plays a key role in shaping the properties of the function that the GPR model seeks to learn. It influences how data points are correlated, ultimately shaping the predictions of the model. The careful selection of a kernel function can greatly improve the performance of GPR. The squared exponential kernel, Matérn kernel and ARD kernel are selected as component learners because of their unique characteristics. The Squared Exponential kernel is well-suited for scenarios involving smooth and continuously varying data. The Matérn kernel, on the other hand, is better suited for irregular or less smooth data; its flexibility makes it ideal for real-world situations that exhibit abrupt changes or non-smooth characteristics. The ARD kernel is often used for automatic feature selection, particularly in high-dimensional data sets where the importance of features varies significantly. The detailed expressions of these three kernels (Williams and Rasmussen, 2006) are provided in Table 1.
3. Reliability analysis using the adaptive ensemble Gaussian process regression model
This section introduces a novel computational framework, termed AEGPR-MCS, which integrates adaptive ensemble techniques with MCS for geotechnical reliability analysis. The framework uses active learning principles to dynamically identify and select optimal training samples that maximize information gain. After achieving sufficient training accuracy, this optimized ensemble model enables efficient estimation of failure probability (pf) through the following equation (Yang and Prayogo, 2022; Jiang et al., 2024; Liu et al., 2024a; Liu et al., 2025b; Li et al., 2025a):
where G(x) is the PF value of the geotechnical systems; is the predicted PF value of the ensemble GPR model; NMCS is the number of MCS simulations; I(.) is an indicator function, if is less than zero, I(.) equals 1; otherwise, I(.) equals 0. It is important to note that once the ensemble model is well established, the estimation of pf can be directly obtained from this model without the need for repeated deterministic analyses of the geotechnical system, thereby significantly improving computational efficiency.
3.1 Detailed implementation procedure
The step-by-step procedure for implementing the proposed AEGPR-MCS method is as follows:
Define the profile of the geotechnical structure and establish the corresponding geotechnical analysis model. Collect statistics for the soil uncertainty variables, including means, standard deviations and probability distributions (Baecher and Christian, 2005).
Use the MCS method to generate NMCS random samples that follow the statistical properties of the soil uncertainty variables. These samples constitute the candidate sample pool S.
Construct the ensemble GPR model for geotechnical engineering using the method described in Section 2. Then, use equation (13) to estimate the pf for the geotechnical structure.
Determine whether the iterative computation should terminate. If the stopping criterion is met, proceed to step (5). Otherwise, use the active learning strategy to select the optimal training sample xc from the sample pool S. Calculate the PF value for the sample xc by using the geotechnical analysis model, and then add xc and its response value to the training data set T. Return to step (3) and continue the process.
Terminate the iterative computation and output the final pf evaluated by step (4).
3.2 An active learning strategy
The active learning strategy is an adaptive approach for sample selection. It intelligently identifies the most valuable samples for the better construction of the ensemble GPR model and improves the construction efficiency of the modeling process (Liu et al., 2022a). To determine the optimal training samples, this study proposes an active learning strategy that selects sample points from the Monte Carlo sample pool S that contribute most significantly to the evaluation of pf. In this study, the optimal samples selected through the active learning strategy are assumed to meet the following two conditions: the new sample should be placed as close as possible to the limit state surface (LSS), where G(x) = 0, and it should also be positioned far from the current training samples. For a newly added sample, it should satisfy the following equation:
Next, the optimal sample xc is then selected as the new training sample using the following formulas:
where represents the coordinates of the candidate samples extracted using equation (14), and represents the coordinates of current training samples. Through equations (14)–(16), the optimal training sample xc can be identified from the sample pool.
3.3 Stopping criterion
To determine when to terminate the iterative process, this section implemented a stopping criterion that evaluates the model’s training adequacy through probability of failure stability. Following the approach suggested by Liu et al. (2022b), the termination condition is formulated as:
where pf,iter(5) represents the pf obtained from the ensemble model in the last sequence of five iterations. The active learning process stops when the ratio r falls below a predefined threshold rt, indicating that the pf has stabilized. Depending on the problem being analyzed, rt is set as 0.3% in this study.
4. Illustrative examples
In this section, three different engineering systems, including a numerical system, a slope and a footing, are considered as examples to illustrate the effectiveness of the proposed AEGPR-MCS method. Example 1 involves an explicit expression, while Examples 2 and 3 involve implicit expressions that require finite element software to solve for the PF. The pf estimated using the direct MCS method serves as the benchmark for validating the accuracy of the AEGPR-MCS method. In addition, ten independent analyses are conducted to minimize uncertainty in the stochastic analysis. The mean results, along with the 95% confidence intervals, are used to verify the robustness of the AEGPR-MCS method. The mean evaluations (Neval) of the original PF are used to quantify computational efficiency. This metric is then compared with other available methods to validate the efficiency of the proposed approach.
4.1 Example 1: the numerical system
To illustrate the proposed method more effectively, a numerical example (Teixeira et al., 2021; Pan et al., 2021; Li et al., 2021) involving a function of two random variables is considered here, denoted as x = [x1, x2]T. Both variables are independent and are assumed to follow the standard normal distributions. The PF of this system is defined as follows:
First, setting G(x) = 0 yields the theoretical LSS, as shown in Figure 2. Then, a candidate sample pool S is formed by generating 1 × 106 samples using the MCS. According to Li et al. (2021), 16 initial sample points are uniformly selected within the interval [−4, 4]. Based on these initial points and their corresponding PF values, an ensemble prediction model is first constructed. As shown in Figure 3(a), it is evident that the predicted LSS from the ensemble model deviates from the theoretical LSS. To address this discrepancy, the proposed active learning strategy is then used to find the optimal added sample xc. As illustrated in Figure 3(a), the sample xc is located near the current predicted LSS and is relatively far from the existing training sample points, indicating that the training value for the ensemble model at this location is significant. The sample xc, along with its corresponding PF value, is then added to update the ensemble model and enhance the prediction of the LSS. As shown in Figure 3(b), the updated LSS aligns more closely with the theoretical LSS, demonstrating improved accuracy. As the computational process progresses, the ensemble model is continuously refined, resulting in the predicted LSS gradually converging toward the theoretical LSS. This convergence is illustrated in Figure 3(a)–(d). The iterative process stops when 20 additional training samples have been added, and at this point, the predicted LSS closely matches the theoretical LSS, except in the corner regions. Note that the majority of the newly selected training samples are concentrated around the theoretical LSS. These new samples play a vital role in accurately approximating the theoretical LSS while effectively reducing redundant samples in other regions, thus significantly enhancing the efficiency of subsequent reliability analysis.
The scatter plot presents x 1 on the horizontal axis and x 2 on the vertical axis, ranging approximately from negative 6 to positive 8 on both axes. A dense cloud of Monte Carlo simulation samples fills the central region. Discrete initial samples are distributed around and within this cloud at regular intervals. A closed, irregular boundary labelled theoretical G of x equals 0 encloses a central feasible region with curved and straight segments. The boundary intersects near x 1 around negative 1 and positive 2 at x 2 around 5, and near x 2 around negative 5 at x 1 around negative 1. The figure illustrates the relationship between sampled points and the theoretical limit state surface.The theoretical LSS and MCS samples in Example 1
The scatter plot presents x 1 on the horizontal axis and x 2 on the vertical axis, ranging approximately from negative 6 to positive 8 on both axes. A dense cloud of Monte Carlo simulation samples fills the central region. Discrete initial samples are distributed around and within this cloud at regular intervals. A closed, irregular boundary labelled theoretical G of x equals 0 encloses a central feasible region with curved and straight segments. The boundary intersects near x 1 around negative 1 and positive 2 at x 2 around 5, and near x 2 around negative 5 at x 1 around negative 1. The figure illustrates the relationship between sampled points and the theoretical limit state surface.The theoretical LSS and MCS samples in Example 1
The image contains four panels labelled a to d, each plotting x 1 on the horizontal axis and x 2 on the vertical axis, with values from about negative 6 to positive 6. In all panels, a closed irregular boundary labelled theoretical G of x equals 0 defines a central region. A dashed curve labelled ensemble model approximates this boundary. Circular markers show initial samples distributed across the space. Square markers show selected samples added progressively. Panel a shows one selected sample near the lower left boundary. Panel b shows several selected samples along the upper and right boundary. Panel c adds more selected samples tracing most boundary segments. Panel d shows dense selected samples closely following the entire theoretical boundary.The active learning process of reliability analysis in Example 1
Note(s): (a) 1 selected sample, (b) 4 selected samples, (c) 10 selected samples and (d) 20 selected samples
The image contains four panels labelled a to d, each plotting x 1 on the horizontal axis and x 2 on the vertical axis, with values from about negative 6 to positive 6. In all panels, a closed irregular boundary labelled theoretical G of x equals 0 defines a central region. A dashed curve labelled ensemble model approximates this boundary. Circular markers show initial samples distributed across the space. Square markers show selected samples added progressively. Panel a shows one selected sample near the lower left boundary. Panel b shows several selected samples along the upper and right boundary. Panel c adds more selected samples tracing most boundary segments. Panel d shows dense selected samples closely following the entire theoretical boundary.The active learning process of reliability analysis in Example 1
Note(s): (a) 1 selected sample, (b) 4 selected samples, (c) 10 selected samples and (d) 20 selected samples
Figure 4 illustrates the iterative calculation process in Example 1, where the MCS results, evaluated using 1 × 106 samples, serve as the benchmark. As illustrated in Figure 4(a), with the increasing of the number of training samples, the calculated pf progressively approaches the benchmark value. Figure 4(b) shows the variation in weights for each learner in the ensemble GPR model. It can be observed that the weight assigned to the Matérn GPR is higher than those of the Squared Exponential RBF and ARD RBF, indicating that, in this example, the Matérn component plays a more dominant role in the ensemble model.
The image has two panels labelled a and b. In panel a, the horizontal axis shows the number of added training samples from 1 to 20, and the vertical axis shows p f scaled by 10 to the power negative 3. A dashed horizontal line marks the M C S result near 4.4. A red line with circular markers shows the ensemble model estimate rising rapidly from about 2.6, then gradually converging and stabilising close to the M C S value after roughly 10 samples. In panel b, the horizontal axis again shows the number of added training samples, and the vertical axis shows weight factors from 0 to 1. Three curves are plotted: squared exponential decreases steadily after an initial peak, Matern drops sharply early then increases and becomes dominant, and A R D remains relatively stable with moderate variation.Iterative calculation results in Example 1
Note(s): (a) Variation in probability of failure, and (b) variation in the weight factors of each component GPR model
The image has two panels labelled a and b. In panel a, the horizontal axis shows the number of added training samples from 1 to 20, and the vertical axis shows p f scaled by 10 to the power negative 3. A dashed horizontal line marks the M C S result near 4.4. A red line with circular markers shows the ensemble model estimate rising rapidly from about 2.6, then gradually converging and stabilising close to the M C S value after roughly 10 samples. In panel b, the horizontal axis again shows the number of added training samples, and the vertical axis shows weight factors from 0 to 1. Three curves are plotted: squared exponential decreases steadily after an initial peak, Matern drops sharply early then increases and becomes dominant, and A R D remains relatively stable with moderate variation.Iterative calculation results in Example 1
Note(s): (a) Variation in probability of failure, and (b) variation in the weight factors of each component GPR model
To evaluate the performance of AEGPR-MCS, this study conducted a comprehensive comparison against several established methodologies, as detailed in Table 2. These include advanced surface approximation techniques such as K-RSM (Li et al., 2021) and PCK-ARBIS (Pan et al., 2021), machine learning-based approaches such as AK-MCS (Echard et al., 2011), ASVM-MCS (Pan and Dias, 2017) and RVM-MCS (Li et al., 2021), as well as neural network and ensemble methods including NN-IS (Schueremans and Van Gemert, 2005) and AERBF-MCS (Liu et al., 2022a). The comparative analysis reveals that AEGPR-MCS achieves comparable accuracy with the benchmark solution in failure probability estimation, matching the performance of other methodologies except for NN-IS. Notably, the proposed AEGPR-MCS method requires only 37.7 PF evaluations on average, representing a significant reduction in computational cost compared to alternative approaches. The method’s reliability is further validated by the tight 95% confidence bounds observed in the PF estimates, indicating strong statistical consistency.
Reliability results for different methods in Example 1
| Method | Neval | Probability of failure | Reference |
|---|---|---|---|
| MCS | 1.0 × 106 | 4.395 × 10–3 (COVpf = 1.51%) | Benchmark |
| K-RSM | 238 | 4.443 × 10–3 | Li and Yang (2019) |
| AK-MCS | 126 | 4.420 × 10–3 | Echard et al. (2011) |
| ASVM-MCS | 99 | 4.460 × 10–3 | Pan and Dias (2017) |
| RVM-MCS | 76 | 4.430 × 10–3 | Li et al. (2021) |
| PCK-ARBIS | 59 | 4.450 × 10–3 | Pan et al. (2021) |
| NN-IS | 52 | 5.700 × 10–3 | Schueremans and Van Gemert (2005) |
| AERBF-MCS | 49.2 | 4.398 × 10–3 | Liu et al. (2022a) |
| AEGPR-MCSa | 37.7 | 4.423 × 10–3 | This study |
| AEGPR-MCSb | [35.18, 40.22] | [4.4386 × 10–3, 4.461 × 10–3] | This study |
| Method | Neval | Probability of failure | Reference |
|---|---|---|---|
| 1.0 × 106 | 4.395 × 10–3 ( | Benchmark | |
| K-RSM | 238 | 4.443 × 10–3 | |
| AK-MCS | 126 | 4.420 × 10–3 | |
| ASVM-MCS | 99 | 4.460 × 10–3 | |
| RVM-MCS | 76 | 4.430 × 10–3 | |
| PCK-ARBIS | 59 | 4.450 × 10–3 | |
| NN-IS | 52 | 5.700 × 10–3 | |
| AERBF-MCS | 49.2 | 4.398 × 10–3 | |
| AEGPR-MCSa | 37.7 | 4.423 × 10–3 | This study |
| AEGPR-MCSb | [35.18, 40.22] | [4.4386 × 10–3, 4.461 × 10–3] | This study |
aThe mean results from ten independent runs; b95% confidence intervals
4.2 Example 2: a soil slope
To validate the proposed methodology, this section analyzes a c-ϕ soil slope case study originally investigated by Zeng et al. (2022). The geometry of the slope is illustrated in Figure 5, featuring a 5 m high slope inclined at 26.6° with a soil unit weight of 17.64 kN/m³. Two key geotechnical parameters are treated as stochastic variables: soil cohesion c and internal friction angle ϕ, both characterized by lognormal distributions. The probabilistic model assigns mean values of 9.8 kPa and 10° to c and ϕ, respectively, with corresponding standard deviations of 3 kPa and 2°. Slope stability assessment uses the simplified Bishop method to compute the factor of safety (FS). The deterministic analysis using mean parameter values yields an FS of 1.343, closely aligning with the previously reported value of 1.34 by Zeng et al. (2022). The associated critical slip surface is visualized in Figure 5.
The image illustrates a two-dimensional slope cross-section with distance in metres on the x-axis from 0 to 30 and elevation in metres on the y-axis from 0 to 15. The ground surface is shown as a black polyline with a horizontal upper bench at about 10 metres elevation, a downward sloping face to about 5 metres elevation near 20 metres distance, and a lower horizontal bench extending to about 25 metres distance. A red dashed curved line within the slope depicts the critical deterministic slip surface. An arrow points to this curve with the label Critical deterministic slip surface, F S equals 1.343, indicating the factor of safety associated with the failure surface.Profile of the slope in Example 2
The image illustrates a two-dimensional slope cross-section with distance in metres on the x-axis from 0 to 30 and elevation in metres on the y-axis from 0 to 15. The ground surface is shown as a black polyline with a horizontal upper bench at about 10 metres elevation, a downward sloping face to about 5 metres elevation near 20 metres distance, and a lower horizontal bench extending to about 25 metres distance. A red dashed curved line within the slope depicts the critical deterministic slip surface. An arrow points to this curve with the label Critical deterministic slip surface, F S equals 1.343, indicating the factor of safety associated with the failure surface.Profile of the slope in Example 2
Assuming that the uncertainty parameters c and ϕ are represented by two independent log-normal random variables, the MCS sample pool for c and ϕ is first generated, consisting of 1 × 106 samples, as shown in Figure 6. Given that this slope reliability problem is not overly complex, only four initial samples are selected to build the initial ensemble model. These initial samples, along with their corresponding PF values, are used to create the training data set T, with the locations of these sample points illustrated in Figure 6.
The plot presents shear strength parameters with cohesion c in kilopascal on the horizontal axis from 0 to 30 and friction angle phi in degrees on the vertical axis from 0 to 25. A dense cloud of Monte Carlo simulation samples fills the central region, representing the joint distribution of c and phi. Several initial samples appear as distinct marked points within this cloud. A solid curved boundary labelled theoretical G x equals 0 runs from high phi at low c toward zero phi near c equals 15, separating safe and failure regions in the parameter space.The theoretical LSS and MCS samples in Example 2
The plot presents shear strength parameters with cohesion c in kilopascal on the horizontal axis from 0 to 30 and friction angle phi in degrees on the vertical axis from 0 to 25. A dense cloud of Monte Carlo simulation samples fills the central region, representing the joint distribution of c and phi. Several initial samples appear as distinct marked points within this cloud. A solid curved boundary labelled theoretical G x equals 0 runs from high phi at low c toward zero phi near c equals 15, separating safe and failure regions in the parameter space.The theoretical LSS and MCS samples in Example 2
Then, as illustrated in Figure 7(a), the initial predicted LSS produced by the ensemble model shows significant deviation from the actual LSS, highlighting the need for further model updating. Subsequently, candidate samples are obtained using equation (14), and the optimal sample xc is chosen based on equations (15) and (16). As shown in Figure 7, the selected optimal samples xc are significantly distant from the existing training sample points. Once the sample point xc and its corresponding PF value are added to the training data set T, the ensemble model is retrained. As shown in Figure 7(b), the predicted LSS of the retrained ensemble model closely aligns with the theoretical LSS, indicating a significant improvement in prediction accuracy near the theoretical LSS. Finally, after seven consecutive iterations, the stopping condition for the computation is met, as illustrated in Figure 7(a)–(d).
The image contains four panels labelled a, b, c, and d. Each panel shows a two-dimensional plot with c in kilopascals on the horizontal axis, ranging from 0 to 30, and phi in degrees on the vertical axis, ranging from 0 to 25. A solid black curve labelled theoretical G x equals 0 slopes downward from high phi at low c to phi equals 0 at c around 15. A red dashed curve labelled ensemble model closely follows the black curve in each panel. Green open circles labelled initial samples appear at discrete c and phi positions above the curves. Blue square markers labelled selected samples appear progressively closer to the black and red curves from panel a to panel d, increasing in number across panels.Iterative calculation process of reliability analysis in Example 2
Note(s): (a) After selecting one sample, (b) after selecting two samples, (c) after four selected samples and (d) after seven selected samples
The image contains four panels labelled a, b, c, and d. Each panel shows a two-dimensional plot with c in kilopascals on the horizontal axis, ranging from 0 to 30, and phi in degrees on the vertical axis, ranging from 0 to 25. A solid black curve labelled theoretical G x equals 0 slopes downward from high phi at low c to phi equals 0 at c around 15. A red dashed curve labelled ensemble model closely follows the black curve in each panel. Green open circles labelled initial samples appear at discrete c and phi positions above the curves. Blue square markers labelled selected samples appear progressively closer to the black and red curves from panel a to panel d, increasing in number across panels.Iterative calculation process of reliability analysis in Example 2
Note(s): (a) After selecting one sample, (b) after selecting two samples, (c) after four selected samples and (d) after seven selected samples
Figure 8(a) depicts the results of the iterative calculations for Example 2. As the quantity of additional training sample points rises, the computed pf progressively approaches the benchmark established by the MCS method. Figure 8(b) depicts the fluctuations in the weights of each learner for the ensemble GPR model throughout the active learning process. The weights assigned to each GPR learner range from approximately 30% to 40%, indicating that each component model makes a similar contribution to the overall ensemble model.
The image contains two panels labelled a and b. Panel a plots p f on the vertical axis from about 0.06 to 0.085 against the number of added training samples from 1 to 7 on the horizontal axis. A black dashed line labelled M C S result remains near 0.08 across all samples. A red line with open circle markers labelled ensemble model increases from about 0.071 at 1 sample to about 0.08 at 3 samples and then stays close to 0.08 through 7 samples. Panel b plots weight factors on the vertical axis from 0.1 to 0.5 against the number of added training samples from 1 to 7. Three lines are shown: a red solid line labelled squared exponential, a blue dashed line labelled Matern, and a magenta dashed line labelled A R D. The three lines vary across samples, with Matern generally higher after 3 samples, squared exponential fluctuating around mid values, and A R D starting lower, peaking near sample 2, then increasing again toward sample 7.Iterative calculation results in Example 2
Note(s): (a) Variation in probability of failure, and (b) variation in the weight factors of each component GPR model
The image contains two panels labelled a and b. Panel a plots p f on the vertical axis from about 0.06 to 0.085 against the number of added training samples from 1 to 7 on the horizontal axis. A black dashed line labelled M C S result remains near 0.08 across all samples. A red line with open circle markers labelled ensemble model increases from about 0.071 at 1 sample to about 0.08 at 3 samples and then stays close to 0.08 through 7 samples. Panel b plots weight factors on the vertical axis from 0.1 to 0.5 against the number of added training samples from 1 to 7. Three lines are shown: a red solid line labelled squared exponential, a blue dashed line labelled Matern, and a magenta dashed line labelled A R D. The three lines vary across samples, with Matern generally higher after 3 samples, squared exponential fluctuating around mid values, and A R D starting lower, peaking near sample 2, then increasing again toward sample 7.Iterative calculation results in Example 2
Note(s): (a) Variation in probability of failure, and (b) variation in the weight factors of each component GPR model
Table 3 summarizes the reliability results of the AEGPR-MCS method and other methods reported in the literature. It is clearly seen that the AEGPR-MCS method provides an accurate estimate of pf when compared to the exact solution, while requiring fewer evaluations. This demonstrates its superior computational efficiency relative to other methods. Additionally, the narrow 95% confidence intervals for Neval and pf further emphasize the simulation exhibits lower uncertainty, reinforcing the robustness of the proposed AEGPR-MCS method.
Reliability results for different methods in Example 2
| Method | Neval | Probability of failure | Reference |
|---|---|---|---|
| MCS | 100,000 | 0.0798 (COVpf = 1.07%) | Benchmark |
| LHS | 10,000 | 0.0775 | Zeng et al. (2022) |
| BCM | 79 | 0.0784 | Zeng et al. (2022) |
| ASVM-MCS | 65 | 0.0748 | Zhang et al. (2015) |
| AK-MCS | 49 | 0.0736 | Zhang et al. (2015) |
| Modified QRSM | 15 | 0.0694 | Zhang et al. (2015) |
| AEGPR-MCSa | 12.6 | 0.0796 | This study |
| AEGPR-MCSb | [11.58, 13.62] | [0.0790, 0.0801] | This study |
| Method | Neval | Probability of failure | Reference |
|---|---|---|---|
| 100,000 | 0.0798 ( | Benchmark | |
| 10,000 | 0.0775 | ||
| 79 | 0.0784 | ||
| ASVM-MCS | 65 | 0.0748 | |
| AK-MCS | 49 | 0.0736 | |
| Modified | 15 | 0.0694 | |
| AEGPR-MCSa | 12.6 | 0.0796 | This study |
| AEGPR-MCSb | [11.58, 13.62] | [0.0790, 0.0801] | This study |
LHS = Latin hypercube sampling; BCM = binary classification method; Modified QRSM represents the modified quadratic response surface method; aThe mean results from 10 independent runs; b95% confidence intervals
4.3 Example 3: a strip footing on the three-layered soil
This example involves a strip footing on three-layered soil, as presented by Liu et al. (2022b) and Pan and Dias (2017), and is used to further validate the AEGPR-MCS method, as shown in Figure 9. The width of the footing is 4.0 m, and it is subjected to a uniformly distributed vertical load of q = 600 kPa. The thicknesses of soil layers are h1 = 2 m, h2 = 3 m and h3 = 20 m, respectively. As suggested by the related research (Bauer and Puła, 2000), the soil layers are modeled as the linear elastic material to estimate the maximum settlement of the footing. The elastic modulus E and Poisson’s ratio υ, are treated as random variables, x = [E1, υ1, E2, υ2, E3, υ3]T. The statistics (Bauer and Puła, 2000) of these uncertain parameters are presented in Table 4, where the elastic modulus follows the lognormal distribution and the Poisson’s ratio follows the beta distribution.
The figure has two panels labelled a and b. Panel a shows a rectangular soil domain that is 38 metres wide and 25 metres deep. A uniform surface load is applied at the top and labelled q equals 600 kilopascal. Three horizontal layer boundaries are shown with labels h 1 equals 2 metres, h 2 equals 3 metres, and h 3 equals 20 metres. Panel b shows the calculated settlement field S in millimetres over the same domain. Settlement values range from 0 to 100 millimetres according to the scale. The highest settlement occurs near the upper left boundary below the applied load and gradually decreases with depth and horizontal distance across the domain.The strip footing on multi-layered subsoil
Note(s): (a) Finite element model, and (b) settlement contour of deterministic analysis
The figure has two panels labelled a and b. Panel a shows a rectangular soil domain that is 38 metres wide and 25 metres deep. A uniform surface load is applied at the top and labelled q equals 600 kilopascal. Three horizontal layer boundaries are shown with labels h 1 equals 2 metres, h 2 equals 3 metres, and h 3 equals 20 metres. Panel b shows the calculated settlement field S in millimetres over the same domain. Settlement values range from 0 to 100 millimetres according to the scale. The highest settlement occurs near the upper left boundary below the applied load and gradually decreases with depth and horizontal distance across the domain.The strip footing on multi-layered subsoil
Note(s): (a) Finite element model, and (b) settlement contour of deterministic analysis
The two-panel chart illustrates how an ensemble model responds to increasing training data, with panels labelled A and B. In panel A, the x-axis shows the number of added training samples, and the y-axis shows p sub i. A red line with circular markers represents the Ensemble model, rising rapidly from near 0 to around 0.028 by about 10 samples, then fluctuating and stabilising close to a dashed horizontal line labelled M C S result at approximately 0.022 as samples increase toward 30. In panel B, the x-axis again shows the number of added training samples, and the y-axis shows Weight factors. Three lines depict kernel contributions: Squared Exponential as a red solid line remaining low and gradually declining, Matern as a blue dashed line remaining dominant around 0.7 to 0.9, and A R D as a pink dashed line fluctuating between about 0.1 and 0.35, indicating how ensemble weighting adapts as more training samples are added.Iterative calculation results in Example 2
Note(s): (a) Variation in probability of failure, and (b) variation in the weight factors of each component GPR model
The two-panel chart illustrates how an ensemble model responds to increasing training data, with panels labelled A and B. In panel A, the x-axis shows the number of added training samples, and the y-axis shows p sub i. A red line with circular markers represents the Ensemble model, rising rapidly from near 0 to around 0.028 by about 10 samples, then fluctuating and stabilising close to a dashed horizontal line labelled M C S result at approximately 0.022 as samples increase toward 30. In panel B, the x-axis again shows the number of added training samples, and the y-axis shows Weight factors. Three lines depict kernel contributions: Squared Exponential as a red solid line remaining low and gradually declining, Matern as a blue dashed line remaining dominant around 0.7 to 0.9, and A R D as a pink dashed line fluctuating between about 0.1 and 0.35, indicating how ensemble weighting adapts as more training samples are added.Iterative calculation results in Example 2
Note(s): (a) Variation in probability of failure, and (b) variation in the weight factors of each component GPR model
Statistics of the involved uncertainty parameters in Example 3
| Soil | Uncertain parameters | Mean | SD | Lower bound | Upper bound | Distribution |
|---|---|---|---|---|---|---|
| Soil 1 | E1 | 20 MPa | 3 MPa | – | – | Lognormal |
| υ1 | 0.4 | 0.02 | 0.34 | 0.46 | Beta | |
| Soil 2 | E2 | 40 MPa | 6 MPa | – | – | Lognormal |
| υ2 | 0.35 | 0.0175 | 0.30 | 0.40 | Beta | |
| Soil 3 | E3 | 60 MPa | 9 MPa | – | – | Lognormal |
| υ3 | 0.28 | 0.014 | 0.24 | 0.32 | Beta |
| Soil | Uncertain parameters | Mean | Lower bound | Upper bound | Distribution | |
|---|---|---|---|---|---|---|
| Soil 1 | E1 | 20 MPa | 3 MPa | – | – | Lognormal |
| υ1 | 0.4 | 0.02 | 0.34 | 0.46 | Beta | |
| Soil 2 | E2 | 40 MPa | 6 MPa | – | – | Lognormal |
| υ2 | 0.35 | 0.0175 | 0.30 | 0.40 | Beta | |
| Soil 3 | E3 | 60 MPa | 9 MPa | – | – | Lognormal |
| υ3 | 0.28 | 0.014 | 0.24 | 0.32 | Beta |
Considering the geometric symmetry, a half-domain finite element model is constructed as depicted in Figure 9(a). The boundary conditions are specified as follows: the vertical axis of symmetry permits only vertical deformation, while the base and lateral boundaries are fully restrained. The deterministic displacement field, computed using mean parameter values, is visualized through settlement contours in Figure 9(b). For serviceability assessment, the footing’s maximum settlement (S) is evaluated as a function of the random variable vector x. The PF characterizing the serviceability limit state is formulated based on the criterion that S must remain below an allowable threshold value, yielding the following mathematical expression:
where Δ represents the allowable value of the maximum settlement of the footing. According to the recommendations by Bauer and Puła (2000), Δ is set as 120 mm.
To capture the uncertainties in soil properties, the MCS with 100,000 samples is performed to estimate the benchmark using the finite element model. The AEGPR-MCS method is initially constructed using 10 initial samples, and 27 iterations are required to meet the stopping criterion for one independent run. Therefore, this method only needed 37 analyses of the finite element model to achieve a convergent reliability result, as shown in the iterative calculation process in Figure 10(a). From Figure 11(b), it can be observed that the Matérn GPR contributes the most, accounting for nearly 80% of the total weight. This suggests that the data in this footing problem exhibit multi-scale characteristics with irregular variations, rather than relatively smooth trends. Nevertheless, the results estimated from the AEGPR-MCS closely match the benchmark solution. This consistency is attributed to the method’s capability to reduce the negative impact of data structure through the application of adaptive weight coefficients.
In addition, Figure 11 shows the PF values for the newly added training samples. It can be observed that these values are all close to zero, indicating proximity to the theoretical LSS. The inclusion of these additional training samples improves the ensemble model’s prediction accuracy close to the theoretical LSS, thereby providing indirect evidence of the effectiveness of the active learning strategy. Table 5 presents the results for different methods in Example 3. The proposed AEGPR-MCS method requires the fewest evaluations, and the AERBF-MCS method exhibits both high computational efficiency and strong robustness.
The scatter plot illustrates the relationship between the number of added training samples on the x-axis and the value of P F on the y-axis. A dashed horizontal line marks zero on the P F scale. At low sample counts around 1 to 3, several points lie far below zero, reaching approximately negative 18 and negative 10. As the number of added training samples increases from about 5 onward, most points move closer to zero, fluctuating between about negative 3 and positive 3. Beyond roughly 10 samples, the points cluster tightly around zero, indicating stabilisation of P F values as more training data are added.The values of performance function for the added new samples in Example 3
The scatter plot illustrates the relationship between the number of added training samples on the x-axis and the value of P F on the y-axis. A dashed horizontal line marks zero on the P F scale. At low sample counts around 1 to 3, several points lie far below zero, reaching approximately negative 18 and negative 10. As the number of added training samples increases from about 5 onward, most points move closer to zero, fluctuating between about negative 3 and positive 3. Beyond roughly 10 samples, the points cluster tightly around zero, indicating stabilisation of P F values as more training data are added.The values of performance function for the added new samples in Example 3
Reliability analysis results for different methods in Example 3
| Method | Neval | Probability of failure | Reference |
|---|---|---|---|
| MCS | 1.0 × 105 | 0.0229 (COVpf = 2.07%) | Benchmark |
| AK-MCS | >110 | 0.0221 | Liu et al. (2022a) |
| AEGPR-MCSa | 41.57 | 0.0232 | This study |
| AEGPR-MCSb | [35.80, 48.34] | [0.0218, 0.0243] | This study |
| Method | Neval | Probability of failure | Reference |
|---|---|---|---|
| 1.0 × 105 | 0.0229 ( | Benchmark | |
| AK-MCS | >110 | 0.0221 | |
| AEGPR-MCSa | 41.57 | 0.0232 | This study |
| AEGPR-MCSb | [35.80, 48.34] | [0.0218, 0.0243] | This study |
aThe mean results from 10 independent runs; b95% confidence intervals
5. Conclusions
This study addresses the challenge in geotechnical reliability assessment: the efficient selection and implementation of surrogate models. This study developed AEGPR-MCS, an integrated computational framework that combines adaptive ensemble learning with MCS to enhance both accuracy and computational efficiency in the geotechnical reliability analysis. The methodology’s core innovation lies in its utilization of a multi-component ensemble model to approximate complex geotechnical PFs, coupled with an active learning strategy for optimal sample selection and model refinement. Through comprehensive validation using three geotechnical examples, the robust performance and computational advantages of the framework are demonstrated. The main conclusions are as follows:
The active learning strategy dynamically identifies the most relevant training samples, iteratively updating and refining the ensemble model and ensuring that the chosen samples are located near the LSS. By leveraging a small set of training samples, the ensemble model is effectively constructed to closely approximate the theoretical LSS, making this approach more efficient than other alternative methods.
The AEGPR-MCS method automatically eliminates underperforming component learners by adjusting the weights of the component models, thereby proving its robustness in addressing a range of geotechnical reliability problems. The AEGPR-MCS method performs well for different geotechnical structures and consistently provides relatively accurate results.
The AEGPR-MCS method uses only three different types of GPRs to construct the ensemble model and perform reliability analysis. Future research could incorporate other advanced machine learning methods into the proposed framework to enhance the construction of ensemble model.
Author contributions
Conceptualization, Xian Liu and Yadong Liu; Methodology, Xian Liu and Yadong Liu; Software, Bin Liu, Xian Liu, Jiashen Liu and Ancheng Wang; Validation, Xian Liu and Yadong Liu; Formal analysis, Xian Liu and Yadong Liu; Investigation, Bin Liu, Yadong Liu and Jiashen Liu; Resources, Bin Liu and Ancheng Wang; data curation, Bin Liu and Xian Liu; Writing – original draft preparation, Xian Liu; Writing – review and editing, Bin Liu, Yadong Liu and Jiashen Liu; Visualization, Bin Liu, Yadong Liu and Jiashen Liu; Supervision, Yadong Liu; Project administration, Yadong Liu; Funding acquisition, Yadong Liu. All authors have read and agreed to the published version of the manuscript.

