The use of hybrid surrogate modelling techniques for the life-cycle management of anchored quay walls was investigated. A synthetic dataset was generated using a simplified structural model based on classical earth pressure theory, representing a range of geometrical and hydraulic boundary conditions. Two neural network (NN) architectures were developed and compared: (a) a baseline feedforward neural network (FNN) using static input features and (b) a hybrid model combining bidirectional long short-term memory (BiLSTM) layers with dense layers (BiLSTM–FNN), which incorporates sequential displacement data. Both models were tuned across multiple trials with varying architectures, activation functions and learning rates. The final architectures were deployed in supervised learning to train surrogate models. The BiLSTM–FNN model outperformed the FNN, achieving significantly lower validation loss and superior predictive accuracy, but at a higher computational cost. This modelling approach provides an effective tool for estimating internal structural forces such as maximum bending moments, thereby supporting predictive maintenance and optimised design. The results demonstrate the potential of hybrid NN architectures within digital twin frameworks for port infrastructure, contributing to enhanced resilience and more efficient resource use.
Notation
- E
Young’s modulus of steel
- EI
bending stiffness of sheet pile profile
- H
total height of quay wall
- I
moment of inertia
- Mmax
maximum bending moment of quay wall
- N
number of data samples
- R2
coefficient of determination
- t
embedment depth of quay wall
- ux,y
horizontal displacement along quay wall
- Wa
water level on excavated side
sequential input array for recurrent layers
static input feature for dense layers
sequential input variable in
mean of actual values
actual value of output variable
predicted value of output variable
- za
anchorage level of quay wall
predicted subset
testing subset
training subset
validation subset
1. Introduction
The importance of sustainability in the civil engineering industry has grown significantly in recent years. Holistic approaches that consider environmental impacts across the entire project life cycle (Kendall et al., 2018; Samuelsson et al., 2024), based on standardised life-cycle assessment (LCA) methodologies (ISO, 2006a, 2006b), are now integrated from the early stages of geotechnical site investigation (Purdy et al., 2022), through the design and construction phases, often leveraging building information modelling approaches (Sanfilippo et al., 2025; Wan et al., 2025), to the maintenance and rehabilitation of existing infrastructure (Duffy et al., 2024).
A key component contributing to the LCA of geotechnical structures is the development of digital twins, which are virtual representations of physical assets maintained across their operational lifespan (Jiang et al., 2021). Developing accurate models that reflect real-world structures remains a significant challenge in geotechnical engineering, largely due to the complex and non-linear behaviour of soils (Atkinson, 2000). While real-time back-analyses using numerical simulations (Jaquès et al., 2024; Tian et al., 2024) and probabilistic methods (Contreras and Brown, 2019) can serve as digital twins, they are often computationally intensive and time-consuming, an important limitation in contexts where rapid decision-making is required to ensure structural safety. Consequently, surrogate modelling has emerged as a promising and efficient alternative for digital twin development in geotechnical applications.
Surrogate modelling is built upon two fundamental principles – the development of computationally efficient models and the preservation of sufficient predictive accuracy (Forrester et al., 2008). The aim is to identify simplified yet representative mappings of complex systems by learning input–output relationships (Queipo et al., 2005). A range of surrogate modelling techniques has been applied in geotechnical engineering. The response surface method has been used for the reliability assessment of reinforced earth structures (Ghazavi and Valinezhad-Torghabeh, 2024; Sarkar and Hegde, 2025; Yu and Bathurst, 2017). Kriging, based on Gaussian process regression, has supported offshore risk evaluations (Vazirizade and Haldar, 2021). Radial basis functions have been used in tunnel modelling (Khaledi et al., 2014) and polynomial chaos expansion has proven useful in assessing the reliability of geosynthetic-reinforced retaining walls under seismic loading (Alhajj Chehade et al., 2023).
Among these, surrogate models based on artificial neural networks (NNs) have demonstrated strong predictive power and versatility, which has driven their growing adoption in civil engineering, despite challenges such as the need for large datasets and the inherent complexity of model interpretability. Their applications include tunnel stability in karst formations (Kovačević et al., 2021), prediction of soil strength parameters (Heshmati et al., 2012), liquefaction analysis around submarine pipelines (Kutanaei and Choobbasti, 2019), liquefaction-induced lateral spreading (Demir and Sahin, 2023), the uplift capacity of strip anchors (Duong et al., 2025), tunnel heading stability (Keawsawasvong et al., 2025), long-term subgrade performance (Deng et al., 2025), settlement prediction above subsurface cavities (Shubham et al., 2024), prediction of deep excavation responses (Tao et al., 2022), embankment consolidation (Tian et al., 2024), mapping displacement of soil nail walls (Liu et al., 2021) and applying long short-term memory (LSTM) networks in slope displacement prediction (Tang et al., 2021). Tao et al. (2024) employed NNs for time-dependent surrogate modelling of braced excavations, utilising the concept of transfer learning. Ruiz López et al. (2024) developed a surrogate model of vertical shafts in London clay. Similarly, Ferrero et al. (2023) investigated surrogate modelling of braced excavations for real-time back-analysis. El-Sekelly et al. (2025) applied explainable artificial intelligence for predicting the strength of bio-cemented sands utilising various machine learning models.
This paper presents a novel hybrid NN-based surrogate modelling framework that integrates static input parameters (e.g. embedment depth, wall geometry and material properties) with sequential data representing a structure’s horizontal displacements. The model architecture leverages a combination of bidirectional long short-term memory (BiLSTM) layers and fully connected dense layers to capture complex dependencies in the input space. The BiLSTM layers can process entire sequences of ordered data simultaneously, making them well suited for spatially dependent sequences such as horizontal displacements along a retaining wall. Unlike feedforward neural network (FNNs) with dense neurons, BiLSTMs can capture dependencies between positions in the sequence, analogous to temporal dependencies in time series, which allows the model to exploit spatial correlations even in long and short sequences. This approach allows the accurate prediction of maximum bending moments in quay walls, facilitating real-time structural assessment. This aligns with the UN Sustainable Development Goals (SDGs) (UN, 2015) (SDG 9: Industry, innovation and infrastructure, SDG 11: Sustainable cities and communities and SDG 12: Responsible consumption and production) by providing a scalable and computationally efficient tool that enhances infrastructure resilience, supports digitalisation in geotechnical engineering and promotes life-cycle extension through data-driven monitoring.
The aim of this work was to evaluate the capabilities and limitations of hybrid NN-based surrogate models using BiLSTMs in accurately predicting the maximum bending moment of a quay wall cross-section. The models use a combination of static input features and sequential horizontal displacement data to reduce the time and cost associated with traditional numerically based back-analysis, which requires extensive model calibration (e.g. soil parameters) to achieve an accurate system response consistent with field data. The proposed approach bypasses these intermediate calibration steps by directly linking measurable quantities to the desired analytical output. The performance of the hybrid models, integrating both dense and recurrent NNs, was compared against a baseline FNN model that relies solely on static inputs to demonstrate the advantages of incorporating sequential data into surrogate modelling.
The theoretical foundations and methodological framework are outlined in Sections 2 and 3, respectively. The results derived from the conducted analyses are discussed in Section 4. Section 5 concludes the paper by summarising key findings, acknowledging limitations and proposing directions for future research and practical implementation.
2. Theory
This section presents the theoretical background, including a description of the quay walls for which surrogate models are trained, the data sampling procedure required for the supervised learning process and the types of NNs.
2.1 Quay wall and structural system
Quay walls are essential infrastructural components in maritime settings, designed to retain soil and withstand forces resulting from earth pressure, hydrostatic loads and operational activities such as mooring and berthing impacts. These structures are generally categorised into gravity walls, cantilevered sheet pile walls and anchored systems, with the selection of a specific type largely governed by local ground conditions, design requirements and economic factors.
Anchored sheet pile quay walls are particularly prevalent due to their structural efficiency and cost-effectiveness. By employing tie rods or ground anchors, these systems are capable of significantly reducing bending moments and horizontal displacements, allowing the wall to sustain higher loads with a slender structural profile (HTG and DGGT, 2024).
For the purpose of surrogate model development, a simplified representation of an anchored quay wall was employed in this study, as shown in Figure 1. The structure was modelled as a horizontally supported beam with Dirichlet boundary conditions applied at the anchor point and the toe of the wall. The depth of the lower support was incrementally adjusted increasing the embedment depth (t) until the shear force at this point approached zero, indicating a state of equilibrium between external loads and passive earth resistance. Lateral soil pressures were computed using Blum’s method for anchored retaining walls (Blum, 1950) and hydrostatic forces were applied to both faces of the wall to reflect variable water levels. A uniform surface surcharge load of 30.0 kN/m2 was also introduced to account for operational loading. The soil parameters of the coarse-grained, cohesionless material used to calculate horizontal earth pressure were kept constant for all combinations, comprising an internal friction angle of 30°, a saturated unit weight of 18.0 kN/m3 and an unsaturated unit weight of 10.0 kN/m3. This idealised structural setup enabled consistent and reproducible simulation of internal forces and wall displacements, forming the basis for a comprehensive dataset used in the training and validation of the surrogate models. The computational analysis of the beam was performed using the anaStruct library for Python (Vink, 2016) with internal forces and horizontal displacements calculated and stored at discrete points along the sheet pile profile.
The schematic diagram shows a vertical sheet pile wall retaining soil with water on the excavated side. Labels include Discretised horizontal displacements u x,y, Wall height H, Water level on excavated side W a, Embedment depth t, Anchor level z a, Sheet pile profile bending stiffness E I, and Surcharge load. The wall extends from ground surface to embedment depth t below the excavation level. A horizontal water surface is shown at W a on the excavated side. A surcharge load is applied at the retained ground surface. An anchor is located at level z a. The wall height is labelled H and discretised horizontal displacements are indicated along the wall length.Simplified quay wall cross-section used for data sampling
The schematic diagram shows a vertical sheet pile wall retaining soil with water on the excavated side. Labels include Discretised horizontal displacements u x,y, Wall height H, Water level on excavated side W a, Embedment depth t, Anchor level z a, Sheet pile profile bending stiffness E I, and Surcharge load. The wall extends from ground surface to embedment depth t below the excavation level. A horizontal water surface is shown at W a on the excavated side. A surcharge load is applied at the retained ground surface. An anchor is located at level z a. The wall height is labelled H and discretised horizontal displacements are indicated along the wall length.Simplified quay wall cross-section used for data sampling
The aim of this study was to demonstrate the application of hybrid NN-based surrogate modelling in geotechnical engineering. To this end, a simplified analytical model was selected to illustrate the approach. While soil non-linearity and measurement noise were not present in the dataset, the simplified model provided a controlled environment to test the capabilities of the hybrid architecture. Importantly, even though the chosen model was computationally light, the study focuses on validating the surrogate methodology itself, which can later be applied to more complex and computationally intensive models where real-time analysis or optimisation would make surrogate modelling essential.
2.2 Data sampling
Developing reliable surrogate models requires a carefully chosen data sampling approach to ensure accurate mapping between input variables and their corresponding outputs (Blondet et al., 2018). Given the relatively low computational expense associated with data generation in this study, a structured grid-based sampling technique was selected. This method discretised the input domain into fixed intervals, creating a uniform and comprehensive distribution of samples across the entire parameter space. The static input features were defined as
embedment depth of quay wall (t): result of iterative calculation (Section 2.1)
anchorage level of quay wall (za): from 0.5 m to 2.0 m in 0.1 m steps
water level on the excavated side (Wa): from 2.5 m to 4.5 m in 0.1 m steps
quay wall’s total height (H): from 5.0 m to 10.0 m in 0.5 m steps
Sheet pile Z-profile: selected from {AZ 12–700, AZ 14–700, AZ 18–700, AZ 20–700, AZ 24–700} (ArcelorMittal, 2022).
To incorporate the sheet pile profile into the regression surrogate model, each profile was numerically encoded using its bending stiffness (EI), calculated as the product of the Young’s modulus of steel (E = 210 GPa) and the corresponding moment of inertia (I) of the profile.
The sampling scheme generated a dataset comprising n = 18 480 individual entries, each representing a unique combination of input variables (), where k = 1, …, 5 corresponds to the parameters t, za, Wa, H, EI and i = 1, …, N indicated the data sample. Recorded horizontal displacements (ux,y) along the sheet pile were treated as sequential input arrays () for the BiLSTM components of the network, where each consisted of sequential input variables () representing individual displacement points along the sheet pile. The output feature (i.e. the maximum bending moment Mmax) was defined as and the model’s accuracy was evaluated by comparing the values of to the predicted outputs () stored in the predicted subset . The full dataset, comprising N data samples, was subsequently divided during preprocessing into training, validation and test subsets (, and , respectively) to support model development and performance assessment. Data sampling was performed using known dependencies and leveraging Blum’s method (Blum, 1950). Therefore, an extensive analysis of multi-collinearity and feature importance (Raja et al., 2023) was omitted in this study. However, for more complex datasets including soil non-linearity and field measurements, it is strongly recommended to thoroughly examine the composition and dependencies within the data.
2.3 FNNs
FNNs represent one of the most basic yet foundational architectures within the broader domain of artificial NNs. The conceptual basis for artificial neurons dates back to the work of McCulloch and Pitts (1943), while the first trainable network (the perceptron) was introduced by Rosenblatt (1958). Although this single-layer design demonstrated early promise, it was later shown to be incapable of modelling non-linear relationships (Minsky and Papert, 1988). This limitation was overcome with the introduction of the back-propagation algorithm (Rumelhart et al., 1986), which enabled the training of multi-layer networks, thereby significantly expanding their functional capabilities.
FNNs operate by transmitting information in a single direction, starting from the input layer, passing through one or more hidden layers and finally reaching the output layer. Unlike recurrent architectures, FNNs do not incorporate cycles or feedback, which simplifies their behaviour and training. Each neuron in the network applies a weighted sum to its inputs, transforms the result using an activation function and passes the output forward to the next layer. This architecture is particularly well suited to tasks involving supervised learning, such as classification and regression, where a known target value is associated with each input.
A standard FNN is comprised of three distinct types of layers. The input layer consists of nodes corresponding to the features of the dataset and serves solely as a conduit for data into the network. The hidden layers are responsible for non-linear transformation and abstraction of the input space. Here, each neuron performs a linear combination of inputs followed by a non-linear activation, enabling the network to capture intricate dependencies within the data. Finally, the output layer generates the network’s prediction. In regression tasks, this layer typically contains a single neuron with a linear activation function, allowing it to output a continuous real-valued prediction (Hecht-Nielsen, 1993).
The training of FNNs is an iterative process comprising forward and backward passes. During forward propagation, the input vector is passed through the network to compute a prediction. The loss function then quantifies the deviation from the true value. In the back-propagation step, gradients of the loss with respect to each model parameter are calculated using the chain rule. These gradients are then used to update weights, typically by stochastic gradient descent or one of its variants.
To ensure robust learning and fair evaluation, the dataset is divided into training and testing sets. The model is fitted using the training data, where it learns to associate input features with their corresponding outputs. The testing data, which remains unseen during training, is used to assess the model’s generalisation capability. This split is critical for identifying whether the network suffers from overfitting (where it learns noise rather than patterns) or underfitting (where the model lacks the complexity or training necessary to capture relevant trends in the data). FNNs were used in this work for surrogate modelling of quay walls using non-sequential static input features ().
2.4 BiLSTM NN
LSTM NNs, introduced by Hochreiter and Schmidhuber (1997), are a specialised form of recurrent NNs designed to model sequential data and capture long-range dependencies effectively. Their architecture incorporates memory cells regulated by gating mechanisms (input, forget and output gates) that control information flow, enabling the network to retain relevant information over extended sequences. In LSTM, an activation function controls signal propagation to the next layer, while a recurrent activation function is used within the gating mechanisms, regulating information flow across data steps. This structure addresses the vanishing gradient and exploding gradient problem inherent in conventional recurrent NNs, facilitating stable training and improved learning of temporal correlations. A gradient is the partial derivative of the loss function with respect to the model parameters, which guides updates during training. The gradient applied to each parameter is scaled by a value held in the model’s weight matrices. These problems arise when the back-propagated gradients either grow exponentially large (exploding gradient) or diminish to near zero (vanishing gradient), making it difficult for the model to learn effectively (Staudemeyer and Morris, 2019).
BiLSTM NNs extend the LSTM framework by introducing a second, parallel LSTM layer that processes the input sequence in reverse chronological order (Schuster and Paliwal, 1997). This bidirectional architecture allows the model to access both past and future contexts at each time step, enhancing its capacity to capture temporal dependencies when predictions depend on the entire sequence. The outputs from forward and backward passes are usually concatenated before subsequent processing. Although BiLSTMs increase computational complexity and parameter count, they have demonstrated superior performance in applications such as time-series forecasting, sequence labelling, structural behaviour prediction under temporally varying inputs (Graves and Schmidhuber, 2005) and geotechnical predictions of laboratory test results (Cerek et al., 2024, 2025a) or LSTM-based parameter calibration (Cerek et al., 2025b). This study leveraged BiLSTMs to process sequential input data () comprising horizontal displacements along the quay wall (ux,y) while merging their outputs with FNN-based components.
2.5 Activation functions
The selection of a suitable activation function for hidden layers plays a critical role in developing an accurate and efficient NN. Non-linear activation functions are employed within neurons to enable the mapping of complex relationships between input and output data. The final output of an NN is ultimately determined by the weighted sum of the activations produced by individual neurons.
In this study, a range of activation functions was investigated as part of the model architecture tuning process, with the aim of identifying optimal choices for both the FNN and the hybrid BiLSTM–FNN architecture. Notably, in the case of the recurrent BiLSTM model, certain activation functions were found to prevent convergence during training, indicating their unsuitability for this type of network. The activation functions evaluated were
rectified linear unit (ReLU) (Parhi and Nowak, 2020)
exponential linear unit (ELU) (Clevert et al., 2015)
sigmoid (Dubey et al., 2022)
hyperbolic tangent (tanh)
softplus (Dugas et al., 2000).
3. Methodology
This section outlines the methodology adopted in this study, encompassing data preprocessing, architecture tuning, the design of final NN configurations and the evaluation metrics used. A synthetically generated dataset served as the basis for extracting input–output pairs required for the supervised training of surrogate models. Two NN architectures were considered: (a) an FNN comprising only dense layers and (b) a hybrid model integrating BiLSTM layers with dense layers (BiLSTM–FNN model).
The training process was iterative, using a randomly selected training subset () to optimise the model parameters, while a separate validation subset () was used to monitor performance after each epoch. Once trained, the models were used to predict output values () for a dedicated prediction subset (). These predicted output values were compared against ground-truth values () drawn from an isolated test set () not involved in the training process.
The primary objective of this study was to assess whether incorporating sequential input data by way of BiLSTM layers enhanced surrogate model performance compared with a purely feedforward architecture relying solely on static input features. To ensure a fair comparison, both models underwent automated architecture tuning, optimising hyperparameters such as the number and size of layers, activation functions, the learning rate of the optimiser and batch size. Final training was carried out until no further improvements in predictive accuracy were observed. The aim of the evaluation procedure was to quantify performance through a combination of qualitative and quantitative metrics.
3.1 Preprocessing
The dataset was imported from an external file into a structured data frame, with specific columns designated as
static input features (): embedment depth (t), wall length (H), bending stiffness of sheet pile profile (EI), anchor level (za) and water level on the excavated side (Wa)
sequential input array (): horizontal displacements along the sheet pile (ux,y)
output variable (): maximum bending moment (Mmax).
The dataset was randomly divided into training, validation and testing (, and ) using a 60/20/20 split, with each subset containing a corresponding proportion of the total N data samples. To ensure reproducibility, a fixed random seed of 1993 was applied consistently across all major libraries. For the output variable , the absolute value of the maximum bending moment |Mmax| was used during training. As the sign of the maximum bending moment was consistent throughout the dataset, the use of absolute values did not affect the learning process.
Although data scaling is known to improve training efficiency (Sinsomboonthong, 2022), it was intentionally omitted in this study. Scaling would have introduced unnecessary complications when applying the model to out-of-range datasets, due to the need for extrapolating scaled values. Additionally, scaling should be based only on training data, since normalisation on test data as well would leak information and inflate the predictive accuracy (Kumar, 2025). This extrapolation can lead to inaccuracies and operational limitations, particularly when the magnitudes of horizontal displacements (ux,y) differed significantly from those observed in the training data. The dense neurons in the surrogate models used the ReLU-based activation function (linear for positive values), so no saturation could occur. The exploding of weighted sum due to large magnitudes of input variables was mitigated through weight compensation within the training, enhanced by the small learning rates of the optimiser.
3.2 Architecture
For the architecture tuning of hyperparameters in both the FNN and BiLSTM-FNN surrogate models, the KerasTuner library (O’Malley et al., 2019) was employed, utilising the Hyperband tuner (Li et al., 2018) owing to its efficiency in combining random search with early stopping criteria. Tables 1 and 2 summarise the hyperparameters adjusted during the tuning process, along with the corresponding search ranges.
Search ranges and hyperparameters explored during architecture tuning of FNN-based surrogate model
| Hyperparameter | Search range/options | Description |
|---|---|---|
| Activation function | ReLU, tanh, ELU, softplus | Activation function for hidden dense layers |
| Number of dense layers | 1–3 | Number of dense hidden layers |
| Dense neurons per layer | 16–256 (step 16) | Number of neurons in each dense layer |
| Learning rate | 1 × 10−5 to 1 × 10−1 | Learning rate for Adam optimiser |
| Batch size | {16, 32, 64, 128} | Number of samples processed together in one training step |
| Hyperparameter | Search range/options | Description |
|---|---|---|
| Activation function | ReLU, tanh, ELU, softplus | Activation function for hidden dense layers |
| Number of dense layers | 1–3 | Number of dense hidden layers |
| Dense neurons per layer | 16–256 (step 16) | Number of neurons in each dense layer |
| Learning rate | 1 × 10−5 to 1 × 10−1 | Learning rate for Adam optimiser |
| Batch size | {16, 32, 64, 128} | Number of samples processed together in one training step |
Search ranges and hyperparameters explored during architecture tuning of hybrid BiLSTM-FNN-based surrogate model
| Hyperparameter | Search range/options | Description |
|---|---|---|
| Dense activation function | ReLU, tanh, ELU, softplus | Activation function for hidden dense layers |
| Number of dense layers | 1–3 | Number of dense hidden layers |
| Dense neurons per layer | 16–256 (step 16) | Number of neurons in each dense layer |
| Activation function for static branch | ReLU, tanh, ELU, softplus | Activation function for static input in the layer merging recurrent and static branches |
| BiLSTM activation function | tanh, ReLU, ELU, sigmoid, softplus | Activation function for shaping unit output |
| BiLSTM recurrent activation function | tanh, ELU, sigmoid, softplus | Activation function used in gating mechanism |
| Number of BiLSTM layers | 1–3 | Number of BiLSTM layers |
| BILSTM units per layer | 16–256 (step 16) | Number of units in each BiLSTM layer |
| Learning rate | 1 × 10−5 to 1 × 10−1 | Learning rate for Adam optimiser |
| Batch size | {16, 32, 64, 128} | Number of samples processed together in one training step |
| Hyperparameter | Search range/options | Description |
|---|---|---|
| Dense activation function | ReLU, tanh, ELU, softplus | Activation function for hidden dense layers |
| Number of dense layers | 1–3 | Number of dense hidden layers |
| Dense neurons per layer | 16–256 (step 16) | Number of neurons in each dense layer |
| Activation function for static branch | ReLU, tanh, ELU, softplus | Activation function for static input in the layer merging recurrent and static branches |
| BiLSTM activation function | tanh, ReLU, ELU, sigmoid, softplus | Activation function for shaping unit output |
| BiLSTM recurrent activation function | tanh, ELU, sigmoid, softplus | Activation function used in gating mechanism |
| Number of BiLSTM layers | 1–3 | Number of BiLSTM layers |
| 16–256 (step 16) | Number of units in each BiLSTM layer | |
| Learning rate | 1 × 10−5 to 1 × 10−1 | Learning rate for Adam optimiser |
| Batch size | {16, 32, 64, 128} | Number of samples processed together in one training step |
An early stopping strategy was implemented to monitor the validation loss, quantified by the mean squared error (MSE) (Chicco et al., 2021). Training was halted if no improvement greater than 1.0 was observed over ten consecutive epochs. The maximum number of training epochs during tuning was set to 50.
Figure 2 shows the evolution of the validation loss function across all trials conducted for the FNN-based surrogate model. The tuning process comprised 90 trials, with the final trial yielding the best performance with a validation loss function of 3567.7. The optimised architecture consists of a single dense layer with 32 neurons, employing the softplus activation function, a learning rate of 0.000593, and a batch size of 128. This configuration results in only 225 trainable parameters, classifying the network as a relatively small and computationally efficient model.
The graph plots validation loss function versus trial. The vertical axis shows Validation loss function from 0 to 60000 and the horizontal axis shows Trial from 0 to 100. The legend includes Validation loss function, Best trial, Minimum loss, and Total number of trials 90. The curve fluctuates between approximately 5000 and 40000 across trials. The best trial is marked at Trial 89 with Validation loss 3567.7. The annotation lists Learning rate 0.000593, Batch size 128, and Dense layer 1 32 neurons softplus.Validation loss function values across trials for various combinations of hyperparameters in the preliminary study of the architecture for FNN-based surrogate model
The graph plots validation loss function versus trial. The vertical axis shows Validation loss function from 0 to 60000 and the horizontal axis shows Trial from 0 to 100. The legend includes Validation loss function, Best trial, Minimum loss, and Total number of trials 90. The curve fluctuates between approximately 5000 and 40000 across trials. The best trial is marked at Trial 89 with Validation loss 3567.7. The annotation lists Learning rate 0.000593, Batch size 128, and Dense layer 1 32 neurons softplus.Validation loss function values across trials for various combinations of hyperparameters in the preliminary study of the architecture for FNN-based surrogate model
Figure 3 shows the results of the tuning process for the hybrid surrogate model combining LSTM and FNN layers. A total of 64 trials were conducted to determine the optimal network architecture. However, 22 of these trials produced NaN (not-a-number) values, indicating a failure to converge, and were consequently excluded from the evaluation. The non-convergence was primarily attributed to the use of unsuitable activation functions, which can result in issues such as exploding or vanishing gradients during training. Trial 54 achieved the lowest validation loss, with a value of 314.8, which was approximately 11 times lower than that of the optimised FNN model. The resulting architecture comprised three BiLSTM layers with 88, 72 and 88 neurons, respectively. Each BiLSTM layer used the sigmoid activation function for signal propagation and tanh as the recurrent activation for updating the cell state (gating mechanism). The static input branch employed the softplus activation function across its 64 neurons. The outputs from the BiLSTM layers and the first dense layer were concatenated and passed to two subsequent dense layers with 168 and 216 neurons, using ELU and ReLU activation functions, respectively. The optimal model configuration included a learning rate of 0.000447 and a batch size of 64. This model required 448 409 trainable parameters, significantly more than FNN model.
The graph plots validation loss function versus trial. The vertical axis shows Validation loss function from 0 to 90000 and the horizontal axis shows Trial from 0 to 70. The legend includes Validation loss function, Best trial, Minimum loss, Total number of trials 64, and Number of dropped trials N a N 22. The curve fluctuates between approximately 1000 and 40000 across trials. The best trial is marked at Trial 54 with Validation loss 314.8. The annotation lists Learning rate 0.000447, Batch size 64, L S T M activation sigmoid, L S T M recurrent activation tanh, L S T M layer 1 88 neurons, L S T M layer 2 72 neurons, L S T M layer 3 88 neurons, Activation static branch softplus, Dense layer 1 168 neurons E L U, and Dense layer 2 216 neurons R e L U.Validation loss function values across trials for various combinations of hyperparameters in the preliminary study of the architecture for BiLSTM-FNN-based surrogate model
The graph plots validation loss function versus trial. The vertical axis shows Validation loss function from 0 to 90000 and the horizontal axis shows Trial from 0 to 70. The legend includes Validation loss function, Best trial, Minimum loss, Total number of trials 64, and Number of dropped trials N a N 22. The curve fluctuates between approximately 1000 and 40000 across trials. The best trial is marked at Trial 54 with Validation loss 314.8. The annotation lists Learning rate 0.000447, Batch size 64, L S T M activation sigmoid, L S T M recurrent activation tanh, L S T M layer 1 88 neurons, L S T M layer 2 72 neurons, L S T M layer 3 88 neurons, Activation static branch softplus, Dense layer 1 168 neurons E L U, and Dense layer 2 216 neurons R e L U.Validation loss function values across trials for various combinations of hyperparameters in the preliminary study of the architecture for BiLSTM-FNN-based surrogate model
The final architectures shown in Figures 4 and 5 were used to train the final surrogate models based on the FNN and the BiLSTM–FNN, respectively. For model compilation, the Adam optimiser (Kingma and Ba, 2014) was employed. No regularisation techniques were incorporated in the model architectures, as no signs of overfitting were observed during training or validation (e.g. performance divergence between the training and validation sets). The MSE was used as the evaluation metric for both the training and validation subsets.
The neural network diagram shows Static input features, Input neuron, Dense neuron, Output feature, and Output dense neuron. The input layer includes variables t, z a, W a, H, and E I connected to input neurons labelled x 1,i, x 2,i, x 4,i, and x 5,i. The hidden layer uses softplus activation and contains neurons labelled h 1,1, h 1,2, h 1,31, and h 1,32 with additional intermediate nodes indicated by ellipsis. The output layer contains neuron y hat 1,i connected to output feature M max.Architecture of FNN-based surrogate model
The neural network diagram shows Static input features, Input neuron, Dense neuron, Output feature, and Output dense neuron. The input layer includes variables t, z a, W a, H, and E I connected to input neurons labelled x 1,i, x 2,i, x 4,i, and x 5,i. The hidden layer uses softplus activation and contains neurons labelled h 1,1, h 1,2, h 1,31, and h 1,32 with additional intermediate nodes indicated by ellipsis. The output layer contains neuron y hat 1,i connected to output feature M max.Architecture of FNN-based surrogate model
The schematic shows a hybrid neural network architecture divided into Input layers, Hidden layers, and Output layer. The upper input branch receives Horizontal displacements u x, 1 through u x, y as sequential input neurons labelled X 1, i with intermediate nodes x R 1, i, x R 2, i, up to x R y, i enclosed in a dashed boundary. These sequential inputs pass through three stacked B i L S T M layers with sigmoid slash tanh activations, containing nodes h 1, 1 to h 1, 88, h 2, 1 to h 2, 72, and h 3, 1 to h 3, 88. The lower input branch receives Static input features t, z a, W a, H, E I into fully connected neurons x 1, i to x 5, i followed by a dense layer with softplus activation and nodes h 1, 1 to h 1, 64. Outputs from the B i L S T M branch and the static dense branch merge at concatenate B i L S T M plus F N N and feed into two fully connected layers labelled E L U and R e L U with nodes h 4, 1 to h 4, 168 and h 5, 1 to h 5, 216. The final output neuron y hat 1, i connects to the Output feature M max. A legend identifies Static input features, Horizontal displacements, Dense neuron, B i L S T M unit, Output feature, Input neuron, Sequential input neurons, and Output dense neuron.Architecture of BiLSTM-FNN-based surrogate model
The schematic shows a hybrid neural network architecture divided into Input layers, Hidden layers, and Output layer. The upper input branch receives Horizontal displacements u x, 1 through u x, y as sequential input neurons labelled X 1, i with intermediate nodes x R 1, i, x R 2, i, up to x R y, i enclosed in a dashed boundary. These sequential inputs pass through three stacked B i L S T M layers with sigmoid slash tanh activations, containing nodes h 1, 1 to h 1, 88, h 2, 1 to h 2, 72, and h 3, 1 to h 3, 88. The lower input branch receives Static input features t, z a, W a, H, E I into fully connected neurons x 1, i to x 5, i followed by a dense layer with softplus activation and nodes h 1, 1 to h 1, 64. Outputs from the B i L S T M branch and the static dense branch merge at concatenate B i L S T M plus F N N and feed into two fully connected layers labelled E L U and R e L U with nodes h 4, 1 to h 4, 168 and h 5, 1 to h 5, 216. The final output neuron y hat 1, i connects to the Output feature M max. A legend identifies Static input features, Horizontal displacements, Dense neuron, B i L S T M unit, Output feature, Input neuron, Sequential input neurons, and Output dense neuron.Architecture of BiLSTM-FNN-based surrogate model
The supervised learning process was limited to a maximum of 10 000 training epochs, with an early stopping criterion implemented to mitigate overfitting. Training was terminated if the validation loss failed to improve by more than 1.0 over 50 consecutive epochs, in accordance with the recommendations of Prechelt (2002).
The NNs were implemented and executed using Python (Van Rossum and Drake, 2009), utilising the TensorFlow framework (Abadi et al., 2015) and the Keras API (Chollet, 2015). All computational tasks were performed on a workstation equipped with an AMD Ryzen Threadripper PRO 7975WX 32-core CPU at 4.00 GHz, 512 GB RAM and SSD storage. Although the system included an NVIDIA RTX A6000 GPU with 48 GB VRAM, GPU acceleration was not used for training in this study, since the training durations remained within acceptable computational limits.
3.3 Postprocessing
Throughout all training epochs, the MSE for both training and validation subsets was continuously monitored and recorded. The model’s predicted outputs were saved in a structured data frame, alongside their corresponding actual values from the test subset. Evaluation of the performance of the surrogate models was conducted using both qualitative and quantitative metrics, including
the progression of training and validation loss curves over the epochs
a scatter plot comparing the predicted and actual values of the maximum bending moment, accompanied by the coefficient of determination (R2).
The coefficient of determination is a fundamental scale-independent metric used to assess the overall accuracy of regression models, including NN-based surrogate models. As defined in Equation 1, R2 quantifies the proportion of variance in the dependent variable that is explained by the model. In Equation 1, denotes the actual value, is the predicted value and is the mean of the actual values. R2 values range from 0 to 1, where 1 indicates a perfect prediction and 0 implies that the model performs no better than simply predicting the mean. Within the context of NNs, R2 is widely adopted to evaluate how well a model captures the underlying data patterns (Chicco et al., 2021). Additionally, the mean absolute error (MAE) over the test subset was calculated to provide an engineering critical metric for model predictive capabilities.
4. Results
The results of this study include the training of the FNN-based surrogate model and a hybrid model combining a BiLSTM layer with dense layers. These results demonstrate the superior predictive accuracy of the hybrid architecture. Figure 6 shows the training and validation loss curves recorded during the training of the FNN model, which reached a minimum validation loss of 1130.4 at epoch 867. Figure 7 shows the corresponding loss curves for the BiLSTM–FNN model. Although the training process appears somewhat unstable, with visible fluctuations, the best-performing model achieved a minimum validation loss of 14.2 at epoch 198. The training trajectory of the FNN model appears more stable than the hybrid model, characterised by a smooth and gradual decline in both training and validation loss curves. This contrast reflects the increased complexity and dynamic learning behaviour associated with the recurrent layers in the hybrid architecture. Regarding computational efficiency, the BiLSTM–FNN model required 2606.41 s to complete training, whereas the FNN model finished in 153.75 s. The purely FNN architecture was about 17 times quicker in supervised learning process, which might be explained by the much lower number of trainable parameters.
Loss function curve and restored best weight for the training and validation dataset across epochs for FNN-based surrogate model
Loss function curve and restored best weight for the training and validation dataset across epochs for FNN-based surrogate model
The graph plots Loss function on the vertical axis from 0 to 3000 and Epoch on the horizontal axis from 0 to 250, with Training loss and Validation loss fluctuating at higher values before epoch 100 and decreasing towards values below 200 after epoch 150, and a marked point at Epoch 198 with Validation loss 14.2 labelled Restored best weight.Loss function curve and the restored best weight for the training and validation dataset across epochs for BiLSTM-FNN-based surrogate model
The graph plots Loss function on the vertical axis from 0 to 3000 and Epoch on the horizontal axis from 0 to 250, with Training loss and Validation loss fluctuating at higher values before epoch 100 and decreasing towards values below 200 after epoch 150, and a marked point at Epoch 198 with Validation loss 14.2 labelled Restored best weight.Loss function curve and the restored best weight for the training and validation dataset across epochs for BiLSTM-FNN-based surrogate model
The FNN model achieved reasonable accuracy, with R2 = 0.885, as illustrated in Figure 8, which presents a scatter plot comparing the predicted values of the maximum bending moment () with the actual values () from the test subset . However, the broad scatter of predicted values around the perfect-fit line highlights the limitations of predictive accuracy of the FNN surrogate model. The MAE of the FNN model was 26.8 kN.m/m. The enhanced performance of the BiLSTM–FNN surrogate model is demonstrated in Figure 9. A near-perfect alignment of the data points along the ideal prediction line indicates excellent predictive capability. The model achieved R2 = 0.999 and MAE = 2.7 kN.m/m across 3696 unseen test samples, confirming its high accuracy and strong generalisation performance. The hybrid model was about ten times more accurate than the pure FNN model based on the MAE. Table 3 provides additional error statistics for both models.
The graph plots Predicted output y hat sub i in kilonewton metre per metre on the vertical axis from 0 to 500 against Actual output y sub i in kilonewton metre per metre on the horizontal axis from 0 to 500. The legend includes Comparison of predicted and actual values, Perfect prediction y hat sub i equals y sub i, and Number of data points 3696. A dashed line represents perfect prediction. The data points form an increasing relationship as actual output increases from about 20 to 450, with predicted output increasing from about 0 to 430. The coefficient of determination R squared equals 0.885.Comparison of predicted () and actual () maximum bending moments and coefficient of determination (R2) for the test subset using the trained FNN-based surrogate model
The graph plots Predicted output y hat sub i in kilonewton metre per metre on the vertical axis from 0 to 500 against Actual output y sub i in kilonewton metre per metre on the horizontal axis from 0 to 500. The legend includes Comparison of predicted and actual values, Perfect prediction y hat sub i equals y sub i, and Number of data points 3696. A dashed line represents perfect prediction. The data points form an increasing relationship as actual output increases from about 20 to 450, with predicted output increasing from about 0 to 430. The coefficient of determination R squared equals 0.885.Comparison of predicted () and actual () maximum bending moments and coefficient of determination (R2) for the test subset using the trained FNN-based surrogate model
The graph plots Predicted output y hat sub i in kilonewton metre per metre on the vertical axis from 0 to 500 against Actual output y sub i in kilonewton metre per metre on the horizontal axis from 0 to 500. The legend includes Comparison of predicted and actual values, Perfect prediction y hat sub i equals y sub i, and Number of data points 3696. A dashed line represents perfect prediction. The data points align closely along the increasing diagonal as actual output increases from about 30 to 450, with predicted output increasing from about 30 to 440. The coefficient of determination R squared equals 0.999.Comparison of predicted () and actual () maximum bending moments and coefficient of determination (R2) for the test subset using the trained BiLSTM-FNN-based surrogate model
The graph plots Predicted output y hat sub i in kilonewton metre per metre on the vertical axis from 0 to 500 against Actual output y sub i in kilonewton metre per metre on the horizontal axis from 0 to 500. The legend includes Comparison of predicted and actual values, Perfect prediction y hat sub i equals y sub i, and Number of data points 3696. A dashed line represents perfect prediction. The data points align closely along the increasing diagonal as actual output increases from about 30 to 450, with predicted output increasing from about 30 to 440. The coefficient of determination R squared equals 0.999.Comparison of predicted () and actual () maximum bending moments and coefficient of determination (R2) for the test subset using the trained BiLSTM-FNN-based surrogate model
Error statistics od FNN and BiLSTM-FNN models
| Surrogate model | Mean error: kNm/m | Error variance: kNm/m2 | Minimum error: kNm/m | Maximum error: kNm/m | Skewness | RMSE: kNm/m |
|---|---|---|---|---|---|---|
| FNN | −0.28 | 1119.3 | −74.2 | 125.8 | 0.67 | 33.5 |
| BiLSTM-FNN | 0.06 | 13.5 | −16.7 | 22.8 | 0.89 | 3.7 |
| Surrogate model | Mean error: kNm/m | Error variance: kNm/m2 | Minimum error: kNm/m | Maximum error: kNm/m | Skewness | RMSE: kNm/m |
|---|---|---|---|---|---|---|
| −0.28 | 1119.3 | −74.2 | 125.8 | 0.67 | 33.5 | |
| BiLSTM-FNN | 0.06 | 13.5 | −16.7 | 22.8 | 0.89 | 3.7 |
The improved performance of the BiLSTM–FNN model is attributed to its ability to effectively capture dependencies within the horizontal displacement sequences, which the purely FNN architecture was unable to model. By processing ordered spatial data through BiLSTM layers, the hybrid architecture yielded a more accurate surrogate model for the maximum bending moment.
The high accuracy of the BiLSTM–FNN hybrid model was critically evaluated. To prevent test data leakage, the dataset was split prior to training, and testing was conducted using the final trained model; therefore, the results were not compromised by prior exposure to test data. The activation functions used within the NN were non-linear mappings from input features to output labels, and no physical equations directly linking inputs to outputs were implemented.
5. Conclusion
This study demonstrates the added value of a hybrid NN-based surrogate model that integrated BiLSTM layers with dense layers to effectively process both static input features and sequential data, such as horizontal displacements measured along a quay wall. The hybrid BiLSTM–FNN model was systematically compared with a conventional FNN model composed solely of dense layers. Through architecture tuning, the optimal configurations of both models were identified. The trained models successfully predicted the maximum bending moment in quay wall structures, with the hybrid model outperforming the FNN model. Specifically, the BiLSTM–FNN surrogate achieved a coefficient of determination (R2) of 0.999, compared with R2 = 0.885 for the FNN model, clearly demonstrating its superior predictive accuracy. However, a trade-off between inference time and accuracy should be considered when choosing a surrogate modelling strategy. The purely FNN model was 17 times faster to train but approximately ten times less accurate (based on the MAE) than the hybrid BiLSTM–FNN model.
5.1 Limitations
While the proposed BiLSTM–FNN surrogate model demonstrated high predictive accuracy and computational efficiency, several limitations should be acknowledged.
Firstly, the model was developed and validated using a synthetic dataset generated under controlled boundary conditions. As such, its generalisation to other geotechnical configurations, soil types or real-world field monitoring data remains untested. Additional validation using diverse datasets, including field measurements, would be necessary to confirm the model’s robustness.
Secondly, no uncertainty quantification or sensitivity analysis was performed in this study. While the input variables were selected based on engineering judgement and physical relevance, the absence of formal feature importance analysis limits insights into how individual features influence the model’s output.
5.2 Outlook
Building on the promising results of this study, several directions for future research are proposed. Further validation of the BiLSTM–FNN surrogate modelling approach using field monitoring data from instrumented quay walls for training is crucial to assess its reliability under varying loading and soil conditions. The incorporation of uncertainty quantification is recommended to define confidence bounds for model predictions, thereby establishing a framework for using the proposed approach as a decision support tool through probabilistic assessment. Additionally, feature importance analysis could provide deeper insights into the relative contribution of input variables, offering guidance on potential model simplification or expansion without compromising predictive accuracy. The integration of physical relationships or constraints into the hybrid model may further enhance its robustness and reduce the amount of training data required, which is especially valuable for sites with limited monitoring records.
5.3 Potential applications
The results of this study underscore the importance of incorporating spatial-dependent displacement data alongside static features in geotechnical modelling. The proposed surrogate modelling approach offers a computationally efficient and scalable solution for real-time structural assessment and supervision of quay walls, as well as other geotechnical structures such as embankments, deep excavations and pile foundations. By enabling rapid predictions without the need for repeated numerical simulations, the method supports life-cycle management and contributes to infrastructure sustainability through optimised monitoring and informed decision making. Future research should focus on the integration of such models into digital platforms and life-cycle management systems to enable cost-effective, automated and data-driven geotechnical asset management.

