Implementing predictive maintenance with AI for Moroccan railway freight networks: a case study on rails and critical components

El Moussaoui, Taoufiq; El Moussaoui, Alaa Eddine

doi:10.1108/RS-03-2026-0016

Purpose

This study develops and evaluates an artificial intelligence–driven predictive maintenance framework tailored to the operational challenges of Moroccan railway freight networks. These challenges include heterogeneous infrastructure conditions, coexistence of legacy and modern systems, limited real-time monitoring capabilities and increasing freight demand along strategic logistics corridors. The objective is to enhance infrastructure reliability, improve safety, and reduce maintenance costs by anticipating failures in rails, switches, and critical components before severe degradation occurs.

Design/methodology/approach

A multimodal hybrid predictive architecture was developed by integrating gradient boosting models (XGBoost) for structured operational and maintenance data, long short-term memory (LSTM) networks for temporal degradation analysis and convolutional neural networks (CNN) for automated defect detection from rail imagery. Model outputs were combined using a calibrated probabilistic fusion mechanism to generate reliable failure risk estimates. The framework was trained using a five-year dataset from 127 high-traffic Moroccan rail segments, incorporating maintenance logs, IoT sensor streams, environmental indicators, operational loads and annotated inspection images.

Findings

The proposed multimodal ensemble achieved strong predictive performance, reaching an AUC-ROC of 0.963 and recall of 0.921, outperforming baseline statistical and single-modality machine learning models. Deployment simulations demonstrated measurable operational benefits, including a 31% reduction in unscheduled downtime, a 28% decrease in emergency repairs, and a 38% increase in mean time between failures. Economic analysis further indicated an 18% reduction in total maintenance costs. The system maintained stable performance under increased sensor noise and traffic loads, confirming robustness under realistic railway operating conditions.

Originality/value

This study presents a context-adapted predictive maintenance framework specifically designed for emerging railway environments characterized by infrastructure variability and partial digitalization. The originality lies in integrating XGBoost, LSTM and CNN models within a probabilistically calibrated multimodal fusion architecture capable of producing reliable risk estimates for safety-critical operations. By explicitly addressing Moroccan railway constraints and demonstrating quantifiable operational and economic improvements, the research provides both methodological innovation and practical guidance for implementing AI-driven maintenance strategies in developing freight rail systems.

1. Introduction

Rail freight transport plays a pivotal role in Morocco's economic and logistical development, serving as the backbone for the movement of industrial goods, agricultural products, and raw materials across key corridors connecting major ports such as Tanger, Casablanca, and Nador. As the Moroccan railway freight network expands and traffic density increases, the operational management of this system faces significant challenges, particularly in maintaining critical infrastructure components such as rails, switches, and signaling systems (Abu-Aisha, Audy, & Ouhimmou, 2024). Unexpected failures not only lead to substantial financial losses due to service interruptions and emergency repairs but also pose serious safety risks to personnel and cargo. Consequently, ensuring the reliability and continuous operation of railway assets has become a strategic priority for transport authorities and logistics operators (Rodríguez-Hernández, Crespo-Márquez, Sánchez-Herguedas, & González-Prida, 2025).

In the Moroccan context, these challenges are further intensified by the heterogeneity of infrastructure conditions, the coexistence of legacy systems with modern rail segments, and the increasing pressure induced by strategic logistics projects such as port expansions and industrial corridor development (El Moussaoui et al., 2025a, b, c). Despite ongoing modernization efforts, maintenance practices in many segments of the network remain constrained by limited real-time monitoring capabilities and fragmented data ecosystems. This situation creates a critical gap between the growing operational complexity of freight rail systems and the capacity of traditional maintenance approaches to ensure reliability and safety. Addressing this gap requires not only technological advancement but also the development of integrated, context-aware maintenance frameworks specifically adapted to the structural and operational characteristics of Moroccan railway networks.

Traditional maintenance strategies, often based on fixed schedules or reactive interventions, are increasingly inadequate in addressing these challenges (Lidén, 2015). Fixed-interval maintenance may result in unnecessary downtime and inflated costs, while reactive approaches frequently lead to unplanned service disruptions. In this context, predictive maintenance, powered by artificial intelligence (AI), emerges as a promising solution by enabling the anticipation of failures before they occur (Bris-Peñalver et al., 2026). Leveraging historical data, sensor measurements, and real-time information from IoT-enabled devices, AI-driven systems can identify patterns and signals indicative of impending component degradation, allowing for timely interventions and optimized resource allocation (Bhadouria, Mishra, Jalan, & Andrade, 2025; El Moussaoui et al., 2025a, b, c).

However, existing predictive maintenance studies in railway systems often exhibit three major limitations. First, many approaches rely on single-modality data, such as sensor signals or historical maintenance records, thereby limiting their ability to capture the complex, multi-dimensional nature of railway asset degradation. Second, the integration of heterogeneous data sources—particularly the fusion of structured data, time-series signals, and visual inspection data—remains insufficiently explored in the railway freight context. Third, the calibration and reliability of predictive outputs, especially in safety-critical environments, are often overlooked, leading to potential misinterpretations of model confidence and suboptimal decision-making. These limitations highlight the need for more comprehensive and robust predictive maintenance frameworks capable of addressing the multidimensional and uncertain nature of railway system failures.

This study contributes to the body of knowledge by developing a comprehensive framework for implementing predictive maintenance in Moroccan railway freight networks, focusing on rails and other critical components. The research addresses the following key question: How can AI-based predictive maintenance be effectively applied to enhance operational reliability, reduce maintenance costs, and improve safety in Moroccan freight railways? To answer this, the study integrates multiple approaches including machine learning models such as XGBoost and recurrent neural networks (RNNs), computer vision techniques for automated rail inspections, and embedded real-time alert systems to facilitate proactive maintenance actions.

More specifically, this research introduces a multimodal predictive maintenance architecture that combines gradient boosting techniques for structured data analysis, recurrent neural networks for temporal pattern extraction, and convolutional neural networks for visual defect detection. Unlike conventional approaches, the proposed framework incorporates a data fusion mechanism that integrates the outputs of these heterogeneous models into a unified predictive engine, thereby enhancing both prediction accuracy and robustness. Furthermore, the study emphasizes probability calibration techniques to improve the interpretability and reliability of model outputs, which is particularly critical in safety-sensitive railway operations. By explicitly addressing the identified research gaps, this work advances the state of the art in railway predictive maintenance through both methodological integration and contextual adaptation to the Moroccan freight network.

The primary objectives of this research are threefold. First, to design a data-driven predictive maintenance framework tailored to the Moroccan context, incorporating historical, sensor, and IoT data. Second, to evaluate the performance of various AI models in anticipating failures and scheduling interventions efficiently. Third, to assess the operational and managerial impact of predictive maintenance in terms of cost reduction, service continuity, and safety improvements. By addressing these objectives, the study aims to provide actionable insights for railway operators, policy makers, and logistics planners seeking to modernize maintenance practices and enhance network resilience.

In addition to these objectives, the study explicitly seeks to benchmark the proposed framework against existing predictive maintenance approaches, including both traditional statistical models and recent AI-based methods. This comparative perspective enables the identification of the specific performance gains achieved through multimodal integration and provides a clearer demonstration of the framework's added value. Moreover, particular attention is given to the complementarity of the different modeling components, highlighting how each contributes to the detection of specific categories of defects and how their integration leads to improved overall system performance.

The remainder of the article is structured as follows. Section 2 presents a review of the literature on predictive maintenance, AI applications in rail transport, and related case studies. Section 3 details the methodology, including data collection, preprocessing, modeling approaches, and system architecture. Section 4 discusses the results by analyzing model performance, predictive accuracy, and potential operational benefits, while Section 5 concludes the study with a summary of the main contributions, implications, limitations, and directions for future research.

2. Literature review

2.1 Predictive maintenance in railway systems

Predictive maintenance has emerged as a transformative approach in the management of complex industrial and transportation infrastructures. Unlike traditional maintenance strategies, which rely on fixed schedules or reactive interventions after failures occur, predictive maintenance leverages data-driven insights to anticipate potential faults and schedule timely interventions (Sivakumar, Maranco, & Krishnaraj, 2024). In railway systems, this approach is particularly critical due to the high operational and safety risks associated with component failures. Rails, switches, and signaling systems constitute the backbone of freight railway operations, and their degradation can result in service interruptions, costly repairs, and safety hazards (Bešinović, 2020; El Moussaoui and El Moussaoui, 2026a, b, c, d). The increasing traffic density on key freight corridors amplifies the consequences of unplanned failures, making predictive maintenance an indispensable strategy for operators seeking to maintain high service reliability while minimizing operational costs.

Recent advancements in predictive maintenance have progressively moved toward more integrated modeling paradigms that combine statistical inference, machine learning techniques, and domain-specific engineering knowledge. In railway applications, this evolution reflects the need to better capture the complex interactions between operational loads, environmental conditions, and infrastructure aging processes. In addition, increasing attention has been given to the reliability and interpretability of predictive outputs, particularly in safety-critical systems where maintenance decisions must be supported by robust and trustworthy information. These developments indicate a shift from purely predictive accuracy toward more comprehensive decision-support capabilities (El Moussaoui and El Moussaoui, 2026a, b, c, d).

Early implementations of predictive maintenance in rail networks relied primarily on statistical models and historical failure records. Techniques such as regression analysis, survival analysis, and probabilistic modeling enabled operators to identify patterns of component wear and predict likely failure intervals (Zhao, Tian, Liang, & Xie, 2018). While these methods provided foundational insights, they often failed to capture the complex, nonlinear interactions between environmental factors, operational loads, and component aging. For instance, variations in temperature, humidity, train weight, and speed can affect wear rates in ways that linear models cannot fully capture (Shi, Wang, Wu, Song, & Teng, 2018; Zhu et al., 2019). As a result, the predictive accuracy of traditional methods was limited, particularly in large networks where operational conditions vary significantly across segments. Nevertheless, these early approaches highlighted the importance of proactive maintenance planning and demonstrated measurable improvements in asset management (Huang, Wen, Fu, Peng, & Tang, 2020).

To address these limitations, more recent studies have incorporated advanced machine learning and deep learning techniques capable of modeling nonlinear relationships and temporal dependencies. Ensemble methods such as gradient boosting have shown strong performance in handling structured operational data, while recurrent neural networks have been widely applied to capture degradation trends over time. Despite these improvements, existing approaches often rely on single data sources, which may restrict their ability to represent the multidimensional characteristics of railway systems. Moreover, increasing attention is being directed toward the need for reliable confidence estimation in predictive outputs, as inaccurate probability assessments may affect maintenance prioritization and operational decision-making.

International applications of predictive maintenance in railways have shown significant operational benefits. In European and Asian networks, the introduction of predictive analytics frameworks has reduced unscheduled downtime by up to 30–40%, improved safety standards, and enabled more efficient allocation of maintenance personnel and resources (Azanaz, 2025). Predictive maintenance also allows for the optimization of spare parts inventories, ensuring that critical components are available when needed while avoiding unnecessary stockpiling (Wang et al., 2025). By focusing maintenance efforts on components identified as at-risk, operators can achieve cost reductions, minimize service disruptions, and extend the lifespan of infrastructure assets. Despite these advancements, predictive maintenance remains underutilized in emerging economies, where rail systems face unique operational, financial, and technological constraints.

In such contexts, the implementation of predictive maintenance requires careful adaptation to local conditions, including variability in infrastructure quality, data availability, and technological readiness. These contextual factors influence both the design of predictive models and their practical deployment, emphasizing the importance of developing solutions that balance analytical sophistication with operational feasibility.

2.2 AI and IoT applications for rail monitoring

The integration of artificial intelligence (AI) and Internet of Things (IoT) technologies has revolutionized the predictive maintenance landscape, particularly for complex systems such as railway networks (Sarp, Kuzlu, Jovanovic, Polat, & Guler, 2024; El Moussaoui and El Moussaoui, 2026a, b, c, d). Machine learning algorithms, including ensemble methods like XGBoost and sequential models such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, have been widely applied to forecast component failures (Waheed & Xu, 2025). These models can process large volumes of operational and sensor data, capturing intricate nonlinear relationships and temporal dependencies that traditional statistical techniques struggle to represent. For instance, vibration data from rail joints or switches, combined with train speed and load patterns, can be analyzed using RNNs to predict degradation trends over time with high accuracy, allowing operators to schedule maintenance interventions before a failure occurs (Shah, Mittal, & Lotwin, 2025).

In parallel, recent developments in deep learning have introduced more advanced architectures, such as Transformer-based models, which enable the modeling of long-range temporal dependencies and complex sequential patterns (El Moussaoui et al., 2025a, b, c). In the domain of visual inspection, real-time object detection frameworks such as YOLO have demonstrated high performance in detecting infrastructure defects under dynamic conditions. However, these approaches often require large annotated datasets and substantial computational resources, which may pose challenges for deployment in environments with limited data availability or constrained infrastructure. Consequently, there is increasing interest in designing hybrid modeling strategies that achieve a balance between predictive performance, computational efficiency, and deployment feasibility.

Beyond predictive modeling, computer vision has emerged as a critical technology for automated rail inspection. High-resolution cameras, mounted on inspection vehicles or drones, combined with deep learning models such as convolutional neural networks (CNNs), can detect cracks, corrosion, misalignments, and other surface defects in real time (Kim et al., 2024). Vision-based monitoring reduces reliance on manual inspections, which are labor-intensive, costly, and prone to human error. Moreover, computer vision systems can operate continuously, providing large-scale coverage of extensive rail networks and generating high-frequency data streams (Zhu, Chen, Wang, Yu, & Tang, 2023; Olivier, Guo, Qian, & Connolly, 2025; El Moussaoui and El Moussaoui, 2026a, b, c, d). When integrated with AI predictive models, these systems enable a risk-based maintenance strategy, where interventions are prioritized according to defect severity, operational impact, and safety risk, significantly improving decision-making efficiency (Bris-Peñalver, Verdecia-Peña, & Alonso, 2026).

Recent research has also explored enhancements to CNN-based approaches through attention mechanisms and lightweight architectures aimed at improving detection accuracy while reducing computational cost. Despite these advancements, CNNs remain widely used due to their robustness and adaptability across varying environmental conditions, particularly in industrial inspection scenarios. Furthermore, the effective integration of visual inspection outputs with other data sources continues to represent an important direction for advancing predictive maintenance systems.

IoT-enabled sensor networks complement these AI and vision approaches by providing real-time monitoring of environmental and operational conditions (Pandey, Chaudhary, & Tóth, 2025). Sensors installed along tracks or embedded within components can measure temperature, vibration, stress, displacement, and other relevant parameters. These data streams, transmitted continuously to centralized processing units, feed AI models for real-time assessment of asset health. The integration of IoT, AI, and computer vision allows for dynamic, automated, and data-driven maintenance scheduling, reducing unplanned downtime and enhancing operational safety (Abdulhai, 2025). While high-income countries have demonstrated the effectiveness of such integrated systems, their application in emerging economies, such as Morocco, is still limited. Challenges include constrained infrastructure, costs of sensor deployment, data quality issues, and the need for localized model adaptation to account for specific operational and climatic conditions. Addressing these challenges is essential for developing practical, scalable, and context-specific predictive maintenance solutions.

In addition, recent studies in other industrial domains have highlighted the importance of multimodal data integration strategies, including feature-level and decision-level fusion techniques, to improve predictive performance. These approaches aim to combine heterogeneous data sources into a coherent analytical framework, thereby enhancing the ability of predictive systems to capture complex system behaviors. At the same time, ensuring the reliability and interpretability of model outputs remains a critical consideration, particularly when predictive results are used to support operational decision-making.
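To make the distinction concrete, the following minimal Python sketch shows one common decision-level scheme: each single-modality model outputs a failure probability, and the probabilities are averaged in log-odds space with fixed weights. Feature-level fusion would instead concatenate the inputs before a single model is trained. The model names, weights, and example probabilities are hypothetical, and this is not presented as the fusion rule used in this study.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clipped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse_decisions(probs: dict, weights: dict) -> float:
    """Decision-level fusion: weighted mean of per-model log-odds, mapped back to [0, 1]."""
    total = sum(weights[name] for name in probs)
    z = sum(weights[name] * logit(p) for name, p in probs.items()) / total
    return sigmoid(z)


# Hypothetical outputs of three single-modality models for one rail segment.
example = fuse_decisions(
    {"tabular": 0.80, "temporal": 0.60, "visual": 0.30},
    {"tabular": 1.0, "temporal": 1.0, "visual": 1.0},
)
```

With equal weights the example yields roughly 0.58. Averaging log-odds rather than raw probabilities lets a confident model move the fused estimate more than an uncertain one, and the weights themselves could be learned on held-out data.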

2.3 Research gaps

Despite the rapid advancements in predictive maintenance technologies, several critical research gaps remain, particularly in relation to emerging economies and Moroccan freight rail networks.

First, there is a lack of context-specific studies addressing the unique operational and infrastructural characteristics of Moroccan railway systems. Existing research is predominantly conducted in highly automated environments with extensive sensor networks, which limits its applicability to contexts characterized by heterogeneous infrastructure, partial digitalization, and constrained resources. This gap highlights the need for predictive maintenance frameworks that are explicitly adapted to local operational conditions and data constraints.

Second, while AI, computer vision, and IoT-based approaches have demonstrated significant potential, the integration of these technologies into a unified predictive maintenance framework remains limited. Many studies focus on either machine learning modeling, vision-based inspection, or sensor networks in isolation, without combining these methods to leverage their complementary strengths. The absence of fully integrated frameworks reduces predictive accuracy and operational effectiveness, particularly for complex networks with interdependent components and cascading failure risks. Moreover, most models are trained and validated on high-quality datasets from mature networks, which may not reflect the noise, variability, and incompleteness of data encountered in Moroccan rail systems.

Third, there is insufficient attention to the reliability, interpretability, and practical deployment of predictive models in real-world railway environments. Existing approaches often prioritize predictive accuracy without adequately addressing probability calibration, model robustness across varying operational conditions, or the feasibility of real-time implementation within constrained technological infrastructures. These limitations restrict the practical applicability of predictive maintenance solutions and highlight the need for models that are both technically robust and operationally viable.

In response to these gaps, this study makes three key contributions. First, it proposes a context-adapted predictive maintenance framework tailored to Moroccan railway freight networks, taking into account infrastructure variability and data limitations. Second, it develops a multimodal architecture that integrates structured, temporal, and visual data through a unified data fusion mechanism, thereby enhancing predictive performance and system robustness. Third, it incorporates probability calibration and emphasizes model complementarity to ensure that predictive outputs are reliable, interpretable, and actionable for maintenance decision-making.

Addressing these gaps forms the primary motivation for the present study. By developing a context-specific predictive maintenance framework for Moroccan railway freight networks that integrates machine learning, computer vision, and IoT-enabled data streams, this research seeks to overcome the limitations identified in the literature. The study aims to provide a scalable, practical, and actionable solution for anticipating failures, optimizing maintenance schedules, and improving safety and operational reliability. The framework, data sources, and modeling approaches through which these objectives are pursued are presented systematically in the next section, Research methodology.

3. Research methodology

3.1 Research design and data architecture

The present study adopts a quantitative, longitudinal, and system-oriented research design aimed at constructing an integrated predictive maintenance framework for Moroccan railway freight infrastructure. The methodological approach is structured around the fusion of heterogeneous data streams in order to capture mechanical degradation dynamics, operational stress exposure, and environmental variability within a unified analytical model. The study focuses on 127 high-traffic freight rail segments distributed across major industrial corridors in Morocco, selected based on freight density, historical maintenance frequency, and strategic economic relevance. The observation window spans five consecutive years (2019–2024), allowing the capture of both short-term anomalies and long-term structural deterioration patterns.

From a design perspective, the study follows a data-centric architecture in which the predictive maintenance problem is formalized as a probabilistic risk estimation task under uncertainty. This formulation allows the integration of heterogeneous data sources while maintaining a consistent predictive objective, namely the estimation of failure likelihood over defined temporal horizons. The longitudinal nature of the dataset further enables the modeling of temporal causality and degradation trajectories, which are essential for distinguishing between transient anomalies and persistent structural deterioration.
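One way to state this formulation explicitly is sketched below. It assumes that each monitored component i has a (possibly censored) failure time T_i, that F_i(t) denotes all maintenance, sensor, environmental, and inspection information recorded for component i up to time t, and that H is the prediction horizon; the horizon length itself is not fixed here.

```latex
p_i(t) = \Pr\!\left( t < T_i \le t + H \,\middle|\, \mathcal{F}_i(t) \right),
\qquad
y_i(t) = \mathbf{1}\!\left[\, t < T_i \le t + H \,\right]
```

Under this framing, a model's estimate of p_i(t) may be computed only from F_i(t), which is the formal counterpart of the leakage constraint, and the estimate is well calibrated if, among all cases assigned a value close to q, the observed frequency of y_i(t) = 1 is also close to q.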

The dataset was designed to ensure statistical representativeness and robustness of predictive modeling. After preprocessing and synchronization, the final analytical dataset consists of 48,672 structured observations, derived from the aggregation of raw high-frequency sensor streams and maintenance event records. The large sample size enhances variance stability, supports model generalization, and reduces sensitivity to rare-event bias—an essential consideration in infrastructure failure modeling, where catastrophic failures occur infrequently but carry high operational consequences.

To further address the class imbalance inherent in failure prediction tasks, additional balancing strategies were applied during model training, including weighted loss functions and controlled sampling techniques. These strategies help ensure that minority failure events are adequately represented while limiting distortion of the underlying data distribution.
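A weighted loss of the kind mentioned above can be sketched as follows. This minimal Python illustration assumes binary labels (1 = failure within the prediction horizon) and inverse-frequency class weights; the weighting formula is a common convention, not necessarily the exact scheme used in this study, and the function names are illustrative.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weight w_c = N / (K * n_c): rarer classes receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_bce(labels, probs, weights, eps=1e-7):
    """Mean binary cross-entropy where each term is scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(labels)
```

With nine non-failures and one failure, the weights are about 0.56 and 5.0, so a missed failure costs roughly nine times as much as a false alarm of the same confidence. That ratio of negatives to positives is also the usual starting value for XGBoost's `scale_pos_weight` parameter.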

Data acquisition followed a multi-layered architecture combining four principal sources: (1) maintenance and failure logs, (2) IoT sensor measurements, (3) environmental exposure indicators, and (4) high-resolution rail imagery (Table 1). Maintenance records were standardized into a defect taxonomy including crack propagation, misalignment, material fatigue, signal malfunction, and switch wear. IoT sensors embedded along rail tracks and switches continuously recorded vibration amplitude, strain fluctuations, displacement variations, and temperature gradients at 10-minute intervals. Raw time-series signals were aggregated into statistical descriptors such as rolling variance, frequency-domain energy density, kurtosis, and skewness, which serve as interpretable degradation indicators.
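The aggregation step can be sketched as follows. This minimal NumPy example assumes a raw vibration window sampled at a known rate `fs` (not stated in the text) and computes variance, skewness, and excess kurtosis, plus, as one possible reading of "frequency-domain energy density", the fraction of spectral energy falling in a chosen frequency band. The band limits and function names are illustrative; rolling variance would follow from applying `window_descriptors` to successive overlapping windows.

```python
import numpy as np


def window_descriptors(x):
    """Variance, skewness and excess kurtosis of one raw-signal window."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    var = np.mean(centered ** 2)  # population variance
    if var == 0:
        # Shape statistics are undefined for a flat window; reported as 0.0 here by convention.
        return {"variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    std = np.sqrt(var)
    return {
        "variance": var,
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / var ** 2 - 3.0,  # excess kurtosis
    }


def band_energy_fraction(x, fs, f_lo, f_hi):
    """Share of one-sided spectral energy of the mean-removed window in [f_lo, f_hi) Hz."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = power.sum()
    if total == 0:
        return 0.0
    band = (freqs >= f_lo) & (freqs < f_hi)
    return float(power[band].sum() / total)
```

A growing share of energy in a higher frequency band, or rising kurtosis (more impulsive peaks), is the kind of shift such descriptors are meant to expose as degradation indicators.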

Table 1

Structure of the integrated dataset

Data Source	Variables extracted	Temporal resolution	Role in the predictive model
Maintenance logs	Failure type, repair duration, maintenance cost, severity level	Event-based	Modeling historical degradation patterns
IoT sensors	Vibration amplitude, strain, displacement, temperature gradients	10-minute intervals (aggregated)	Early anomaly detection and degradation monitoring
Environmental data	Temperature cycles, humidity levels, precipitation/rainfall exposure	Daily	Modeling external climatic stress factors
Rail imagery	Crack type, corrosion level, surface wear severity	Inspection intervals	Visual defect detection and severity assessment

Source(s): Authors’ own work

The preprocessing pipeline was specifically tailored to each data modality. For sensor data, noise filtering was performed using moving average smoothing and outlier removal based on interquartile range thresholds. Time-series normalization was applied to ensure comparability across segments. For maintenance logs, categorical encoding techniques were used to transform qualitative defect descriptions into structured variables. Environmental data were aligned temporally using interpolation methods to match sensor timestamps. For image data, preprocessing included resizing, normalization, and augmentation techniques such as rotation and contrast adjustment to enhance model robustness under varying inspection conditions.
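A minimal NumPy sketch of the sensor preprocessing steps listed above follows. It assumes one uniformly indexed signal per segment and replaces IQR outliers by linear interpolation rather than deleting them, a choice made here to keep the series regular, not one the text specifies. The environmental alignment uses `numpy.interp`; the window width and the threshold multiplier `k` are illustrative.

```python
import numpy as np


def replace_outliers_iqr(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] and re-fill them by linear interpolation."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    inlier = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
    idx = np.arange(len(x))
    return np.interp(idx, idx[inlier], x[inlier])


def moving_average(x, width):
    """Simple moving average over `width` samples (mode='valid': output is width - 1 shorter)."""
    kernel = np.ones(width) / width
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")


def zscore(x):
    """Standardize one segment's series to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return np.zeros_like(x) if sd == 0 else (x - x.mean()) / sd


def align_environment(sensor_ts, env_ts, env_values):
    """Linearly interpolate environmental readings onto sensor timestamps (env_ts must increase)."""
    return np.interp(sensor_ts, env_ts, env_values)
```

Outliers are handled before smoothing so that a single spike is not spread across neighbouring samples by the moving average, and standardization is done per segment so that segments with different baseline vibration levels become comparable.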

Environmental data (temperature cycles, humidity levels, and precipitation exposure) were incorporated to separate exogenous climatic stressors from endogenous structural fatigue mechanisms. In addition, 18,450 rail surface images were collected using inspection vehicles equipped with industrial cameras and annotated by domain experts, who classified surface defects into severity categories. This multimodal dataset ensures that both subsurface mechanical stress and observable surface anomalies are represented within the analytical framework.

To ensure data consistency, all sources were integrated into a centralized relational data warehouse. Each observation was indexed using unique component identifiers and synchronized timestamps, enabling cross-referencing between mechanical signals, historical failures, and visual inspections. This architecture reduces the fragmentation commonly observed in infrastructure datasets and establishes a coherent learning matrix for artificial intelligence modeling.
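The cross-referencing described above can be illustrated with Python's built-in `sqlite3` module. The table names, columns, and sample rows below are hypothetical. The point is that every row of the learning matrix is keyed by component identifier and timestamp, and the correlated subqueries look only at maintenance and inspection records strictly earlier than the sensor observation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sensor_obs  (component_id TEXT, ts INTEGER, vib_rms REAL);
CREATE TABLE maintenance (component_id TEXT, ts INTEGER, defect_type TEXT);
CREATE TABLE inspection  (component_id TEXT, ts INTEGER, severity INTEGER);
"""

# One row per sensor observation, enriched only with records strictly earlier in time.
LEARNING_MATRIX_SQL = """
SELECT s.component_id, s.ts, s.vib_rms,
       (SELECT COUNT(*) FROM maintenance m
         WHERE m.component_id = s.component_id AND m.ts < s.ts) AS prior_events,
       (SELECT i.severity FROM inspection i
         WHERE i.component_id = s.component_id AND i.ts < s.ts
         ORDER BY i.ts DESC LIMIT 1) AS last_severity
FROM sensor_obs s
ORDER BY s.component_id, s.ts
"""


def demo_connection():
    """In-memory database holding a few hypothetical records for one switch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sensor_obs VALUES (?, ?, ?)",
                     [("SW-01", 100, 0.2), ("SW-01", 200, 0.5)])
    conn.execute("INSERT INTO maintenance VALUES ('SW-01', 150, 'switch wear')")
    conn.executemany("INSERT INTO inspection VALUES (?, ?, ?)",
                     [("SW-01", 50, 1), ("SW-01", 180, 3)])
    return conn


def build_learning_matrix(conn):
    """Return (component_id, ts, vib_rms, prior_events, last_severity) rows."""
    return conn.execute(LEARNING_MATRIX_SQL).fetchall()
```

Because each output row carries its component identifier and timestamp, any prediction made from it can be traced back to the exact sensor, maintenance, and inspection records that produced its features.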

Such an integrated data architecture also facilitates traceability and reproducibility, as each prediction can be linked back to its underlying data sources. This is particularly important in safety-critical systems where model decisions must be auditable and interpretable.

3.2 Predictive modeling strategy and system framework

Given the heterogeneous and partially sequential nature of the dataset, the predictive modeling strategy was designed around a hybrid artificial intelligence architecture capable of handling structured tabular data, temporal sequences, and visual information simultaneously. The modeling framework operates in three complementary layers: structured degradation modeling, temporal progression forecasting, and visual anomaly detection.

The selection of this hybrid architecture is motivated by the complementary strengths of each modeling paradigm. Gradient boosting models are particularly effective for structured data with complex feature interactions, recurrent neural networks capture temporal dependencies in sequential data, and convolutional neural networks excel in extracting spatial features from images. This combination allows the system to capture different dimensions of infrastructure degradation, thereby improving overall predictive performance.

For structured data (maintenance history, aggregated sensor indicators, environmental and operational variables), gradient boosting techniques were selected due to their capacity to capture nonlinear relationships and interaction effects among heterogeneous predictors. Feature engineering included normalization, lag variable construction, cumulative load computation, and rolling-window statistical extraction. Particular attention was devoted to preventing data leakage by ensuring that only past information was used to predict future failure probability. Class imbalance—common in infrastructure failure datasets—was addressed using stratified sampling and cost-sensitive learning mechanisms.
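
Below is a minimal sketch of the leakage-safe feature construction described above. It is written in plain Python so it has no dependencies. The inputs (`loads`, `vibrations`), the one-period lag and the window length are hypothetical. What it illustrates is the ordering: the features for period t are computed before period t is added to the running history, so only past information reaches the model. Normalization is omitted. In the same spirit, its statistics would be fitted on the training period only.

```python
from collections import deque
from statistics import mean, pstdev


def build_features(loads, vibrations, lag=1, window=3):
    """Return one feature dict per period t, built only from periods before t.

    loads      -- axle load per observation period (hypothetical units)
    vibrations -- vibration amplitude per period (hypothetical units)
    """
    features = []
    cumulative_load = 0.0
    recent = deque(maxlen=window)  # rolling window of past vibration values
    for t in range(len(loads)):
        features.append({
            # cumulative load up to, but excluding, period t
            "cum_load": cumulative_load,
            # lagged vibration value; None while no history exists yet
            "vib_lag": vibrations[t - lag] if t - lag >= 0 else None,
            # rolling statistics over the last `window` past periods
            "vib_roll_mean": mean(recent) if recent else None,
            # spread needs at least two past values to be informative
            "vib_roll_std": pstdev(recent) if len(recent) > 1 else None,
        })
        # only now fold period t into the history used for period t + 1
        cumulative_load += loads[t]
        recent.append(vibrations[t])
    return features
```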

The gradient boosting model was implemented using XGBoost, with hyperparameters optimized through grid search. Key parameters included a maximum tree depth of 6, learning rate of 0.1, and 200 estimators. Regularization parameters were tuned to prevent overfitting and enhance generalization. Feature importance analysis was conducted to identify the most influential predictors, supporting interpretability and domain validation.
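
The grid search described above could be organised as follows. This is a sketch under stated assumptions. The grid values surrounding the reported configuration are hypothetical. `scale_pos_weight` (the negative-to-positive ratio) is shown as one common way to make XGBoost cost-sensitive, but the text does not state which mechanism the authors used. `XGBClassifier` and its `max_depth`, `learning_rate`, `n_estimators`, `scale_pos_weight` and `eval_metric` parameters exist in the XGBoost Python package. The import is lazy so the rest of the sketch runs without it.

```python
from itertools import product

# Centre values match the reported configuration (max_depth=6,
# learning_rate=0.1, n_estimators=200); neighbouring values are hypothetical.
PARAM_GRID = {
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 300],
}


def iter_grid(grid):
    """Yield every combination of the grid as a keyword-argument dict."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def scale_pos_weight(labels):
    """Negative-to-positive ratio, a common cost-sensitive weight for XGBoost."""
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("no failure events in labels")
    return (len(labels) - positives) / positives


def make_classifier(params, labels):
    """Build an XGBoost classifier for one grid point (requires xgboost)."""
    from xgboost import XGBClassifier  # optional dependency, imported lazily
    return XGBClassifier(**params, scale_pos_weight=scale_pos_weight(labels),
                         eval_metric="auc")
```

Each candidate from `iter_grid` would be trained on the training split and scored on the validation split. The configuration with the best validation AUC or recall would then be kept.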

To model progressive degradation and temporal dependency, a recurrent neural network architecture based on Long Short-Term Memory (LSTM) units was incorporated. Sequential input windows were constructed to represent rolling 72-hour structural behavior patterns. This design enables the detection of subtle degradation acceleration trends that may not be observable in static models. Dropout regularization and early stopping criteria were integrated to mitigate overfitting and enhance generalization capacity.
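
The rolling 72-hour input windows can be sketched as below. The sketch assumes hourly sampling, so one window holds 72 readings. The series, labels and one-step horizon are hypothetical. Each target lies strictly after its input window, which mirrors the leakage constraint stated for the structured model.

```python
def make_windows(series, labels, window=72, horizon=1):
    """Split an hourly sensor series into past-only input windows.

    Sample t uses readings t-window .. t-1 as input and the label at
    t + horizon - 1 as target, so no future reading enters the input.
    """
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    X, y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t - window:t])
        y.append(labels[t + horizon - 1])
    return X, y
```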

The LSTM architecture consists of two hidden layers with 64 and 32 units respectively, followed by a dense output layer. The model was trained using the Adam optimizer with a learning rate of 0.001 and batch size of 64. Sequence padding and normalization ensured consistency across input windows. This configuration was selected as a trade-off between model complexity and computational efficiency, particularly given the constraints of real-time deployment in railway systems.
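
To make the complexity trade-off concrete, the trainable parameters of the described stack can be counted directly. The sketch makes two assumptions the text does not state: 12 input features per time step (a hypothetical figure) and a single-unit output for a binary failure probability. The LSTM formula matches the Keras LSTM layer with its default bias: four gates, each with input weights, recurrent weights and a bias vector. The count does not depend on the 72-step window length, because weights are shared across time steps.

```python
def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return input_dim * units + units


N_FEATURES = 12  # hypothetical number of sensor channels per time step

layer_params = {
    "lstm_64": lstm_params(N_FEATURES, 64),
    "lstm_32": lstm_params(64, 32),
    "dense_out": dense_params(32, 1),
}
total = sum(layer_params.values())
print(layer_params, total)
# {'lstm_64': 19712, 'lstm_32': 12416, 'dense_out': 33} 32161
```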

For visual inspection, a convolutional neural network (CNN) was implemented to classify rail surface images into defect severity categories. Transfer learning was adopted to improve convergence stability and reduce training data requirements. Instead of treating visual output as an isolated classifier, the severity score was incorporated into a composite risk index alongside structured and temporal risk estimates. This multimodal fusion ensures complementary detection capability between subsurface stress indicators and visible surface damage.

The CNN model is based on a pre-trained ResNet-50 architecture, fine-tuned on the rail image dataset. The backbone network was initialized using ImageNet pre-trained weights to accelerate feature extraction and improve generalization performance. Following the final convolutional block, a Global Average Pooling layer was applied to reduce feature dimensionality while preserving discriminative information. The extracted features were then passed through a fully connected dense layer comprising 512 hidden units with Rectified Linear Unit (ReLU) activation. To reduce overfitting and improve model generalization, a dropout layer with a rate of 0.5 was incorporated before the final classification stage. The output layer consisted of four neurons corresponding to the defect categories identified in the dataset (Micro-Cracks, Corrosion, Surface Wear, and Switch Damage), with a Softmax activation function used to generate multi-class probability distributions.
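
The classification head described above adds a modest number of trainable weights on top of the backbone. The sketch below counts them, using the fact that ResNet-50's last convolutional block outputs 2048 channels. It also shows the softmax that turns four logits into a probability distribution over the defect categories. The logits in the usage line are made up for illustration; in practice they would come from the final dense layer.

```python
import math

CLASSES = ["Micro-Cracks", "Corrosion", "Surface Wear", "Switch Damage"]
BACKBONE_CHANNELS = 2048  # ResNet-50 output channels after the last conv block


def dense_params(input_dim, units):
    return input_dim * units + units


# Global Average Pooling and Dropout have no weights: GAP reduces each
# channel map to one value, giving a 2048-dimensional vector.
head_params = dense_params(BACKBONE_CHANNELS, 512) + dense_params(512, len(CLASSES))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


print(head_params)  # 1051140
probs = softmax([2.0, 1.0, 0.5, 0.1])  # hypothetical logits
print(CLASSES[probs.index(max(probs))])  # Micro-Cracks
```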

The final layers were adapted to a multi-class classification task corresponding to defect severity levels. Data augmentation techniques were applied to improve robustness under varying lighting and environmental conditions. The choice of an image-classification CNN over more complex object-detection architectures such as YOLO is justified by the need for stable classification performance and lower computational overhead in continuous monitoring scenarios.

The CNN was trained using the Adam optimizer with an initial learning rate of 0.0001 and categorical cross-entropy loss. Early stopping was implemented to prevent overfitting and improve convergence stability. The model converged after approximately 28 epochs out of a maximum training limit of 35 epochs, providing an effective balance between predictive accuracy and computational efficiency. This architecture was selected because it offers strong feature extraction capability while remaining suitable for continuous railway infrastructure monitoring and deployment within operational freight railway environments.
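
The early-stopping rule can be expressed as a small function over per-epoch validation losses. The `patience` and `min_delta` values are hypothetical, since the text does not report them. The logic is similar in spirit to `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`. That callback likewise stops once the monitored loss has failed to improve for `patience` consecutive epochs, and it can restore the weights from the best epoch.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a validation-loss curve."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0  # consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)  # ran to the epoch limit
```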

The overall predictive maintenance architecture is illustrated in Figure 1.

Figure 1

A diagram of a hybrid AI-based predictive maintenance framework. The diagram illustrates the flow of data from various sources into a centralized data warehouse. The sources include maintenance logs, IoT sensors, environmental data, and rail imagery. The data is then processed through feature engineering and data synchronization. This processed data is fed into three models: a structured model using gradient boosting, a temporal model using an LSTM network, and a visual model using a CNN classifier. The outputs of these models are combined in a risk fusion engine, which generates alerts and interfaces with a maintenance scheduling system.

Hybrid AI-based predictive maintenance framework. Source: Authors’ own work

The risk fusion engine aggregates outputs using a weighted probabilistic framework calibrated during validation. This architecture ensures modularity, allowing individual components to be retrained or upgraded independently without disrupting the entire system. Such modular design is particularly relevant for gradual infrastructure digitization contexts such as Moroccan freight networks.

More specifically, the fusion mechanism operates at the decision level by combining the predicted probabilities from each model using a weighted averaging scheme. The weights were determined based on validation performance and calibrated using reliability diagrams to ensure consistency between predicted probabilities and observed outcomes. This probabilistic calibration step enhances the interpretability and reliability of the final risk score, making it suitable for operational decision-making in safety-critical environments.
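
Below is a minimal sketch of decision-level weighted averaging. It assumes each component's output has already been mapped to a failure probability in [0, 1]; the text does not specify how the CNN's class probabilities become a single failure probability. Setting the weights proportional to each model's validation AUC is a hypothetical rule chosen for illustration. A weighted average of individually calibrated probabilities is not guaranteed to be calibrated, so the fused score would still pass through the calibration step described above.

```python
def fusion_weights(val_scores):
    """Normalise per-model validation scores (e.g. AUC) so the weights sum to 1."""
    total = sum(val_scores.values())
    return {name: score / total for name, score in val_scores.items()}


def fuse(probs, weights):
    """Decision-level fusion: weighted average of per-model failure probabilities."""
    missing = set(weights) - set(probs)
    if missing:
        raise KeyError(f"missing model outputs: {sorted(missing)}")
    return sum(weights[name] * probs[name] for name in weights)


# Hypothetical validation AUCs and per-segment outputs, for illustration only
weights = fusion_weights({"structured": 0.91, "temporal": 0.94, "visual": 0.93})
risk = fuse({"structured": 0.6, "temporal": 0.8, "visual": 0.7}, weights)
print(round(risk, 3))  # 0.701
```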

3.3 Validation protocol and implementation framework

The validation strategy was designed to ensure methodological rigor and operational relevance; empirical results are reported separately in Section 4. The dataset was partitioned into training (70%), validation (15%), and testing (15%) subsets using stratified sampling to preserve the distribution of failure events across segments. Temporal ordering was strictly maintained to avoid forward-looking bias. The overall validation and implementation logic follows a sequential analytical workflow, beginning with data acquisition and culminating in deployment simulation, as illustrated in Figure 2.

Figure 2

The diagram illustrates a methodological workflow of sequential steps. It begins with data acquisition, represented by a database icon. This is followed by preprocessing and cleaning, indicated by a funnel icon. The next step is feature engineering, shown with a gear icon. Model training follows, depicted with a brain icon. This is succeeded by validation and calibration, represented by a checkmark icon. The subsequent step is risk aggregation, illustrated with a warning icon. The final step is deployment simulation, shown with a computer monitor icon. Arrows connect each step, indicating the flow from one stage to the next.

Methodological workflow from data acquisition to deployment. Source: Authors’ own work
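
The partitioning described in this section could be implemented as below. The sketch assumes each record carries hypothetical `segment_id` and `timestamp` fields. It interprets the stratification as applying the 70/15/15 chronological cut inside every segment, so each segment contributes to all three subsets. Within a segment, every training record precedes every validation record, which in turn precedes every test record. Other interpretations of stratified temporal splitting are possible.

```python
from collections import defaultdict


def chronological_split(records, train=0.70, val=0.15):
    """Cut time-ordered records into train/validation/test without shuffling."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    n = len(ordered)
    n_train = round(n * train)
    n_val = round(n * val)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_val],
            ordered[n_train + n_val:])


def split_per_segment(records, train=0.70, val=0.15):
    """Apply the chronological split inside each segment (assumed stratification)."""
    by_segment = defaultdict(list)
    for r in records:
        by_segment[r["segment_id"]].append(r)
    parts = ([], [], [])
    for seg_records in by_segment.values():
        for bucket, chunk in zip(parts, chronological_split(seg_records, train, val)):
            bucket.extend(chunk)
    return parts
```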

All models were implemented using Python-based frameworks, including Scikit-learn for structured modeling, TensorFlow/Keras for deep learning components, and OpenCV for image preprocessing. The computational environment consisted of a workstation equipped with an Intel i7 processor, 32 GB RAM, and an NVIDIA GPU to accelerate deep learning training.

More specifically, the experimental environment was configured with an NVIDIA GeForce RTX 3080 GPU (10 GB VRAM), CUDA 11.8, and cuDNN 8.6. The software stack included Python 3.10, TensorFlow 2.13, XGBoost 1.7.6, Scikit-learn 1.3.0, and OpenCV 4.8.0. These software and hardware configurations ensured compatibility across the different modeling components while providing sufficient computational capacity for multimodal learning tasks involving structured, sequential, and image-based data.

The XGBoost model was trained using the optimized hyperparameter configuration described in Section 3.2 and required approximately 18 minutes for complete training and validation. The LSTM model was trained for a maximum of 50 epochs using early stopping criteria, converging after approximately 41 epochs with a total training time of nearly 1.3 hours. The CNN model based on the ResNet-50 architecture was trained for a maximum of 35 epochs and converged after approximately 28 epochs, requiring around 2.1 hours of training time. These convergence thresholds were automatically determined using validation-loss monitoring to prevent overfitting while preserving predictive generalization capability.

These specifications support reproducibility and provide a reference for future implementations.

Performance evaluation metrics were selected according to the safety-critical nature of railway systems. As summarized in Table 2, the evaluation framework differentiates between structured, temporal, visual, and ensemble modeling components to ensure comprehensive assessment of predictive reliability. Classification metrics include precision, recall, F1-score, and Area Under the ROC Curve (AUC), with particular emphasis placed on recall due to the critical need to minimize false negatives in railway infrastructure monitoring. For temporal forecasting components, error-based metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were defined to quantify degradation prediction accuracy over time horizons.

Table 2

Evaluation metrics framework

Model component	Evaluation metrics	Purpose
Structured model	Accuracy, Precision, Recall, F1, AUC	Failure probability classification
Temporal model	MAE, RMSE, Recall	Degradation forecasting accuracy
Visual model	Accuracy, Precision, F1	Defect severity classification
Ensemble output	AUC, Recall, Calibration Error	Global risk estimation reliability

Source(s): Authors’ own work
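
The classification and error metrics in Table 2 reduce to short formulas. A dependency-free sketch follows, in which `tp`, `fp`, `fn` and `tn` are confusion-matrix counts for the failure class. As a consistency check, `f1_from(0.874, 0.861)` evaluates to about 0.867. That matches the F1-score reported in Table 3 for the structured model's precision and recall.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # share of actual failures that were detected
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```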

In addition to standard metrics, calibration error was evaluated using Brier Score and reliability curves to assess the consistency between predicted probabilities and actual outcomes. This is particularly important for ensuring that risk estimates can be directly used in maintenance prioritization decisions.
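
The Brier score is the mean squared difference between predicted probabilities and the observed 0/1 outcomes. Lower is better, and 0 is perfect. A constant prediction of 0.5 scores exactly 0.25 regardless of the labels, which gives a rough reference point for the ensemble's reported value of 0.061 in Table 7. A minimal sketch:

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(p_pred):
        raise ValueError("length mismatch")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)
```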

To assess scalability and deployment feasibility, the system architecture was tested under simulated high-frequency data inflow conditions replicating freight traffic intensity scenarios. Latency measurement protocols were defined to evaluate inference time and alert generation responsiveness within realistic operational constraints. In parallel, robustness analyses were incorporated to verify predictive stability under varying traffic loads and environmental fluctuations.
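
One way to implement the latency protocol is to time each inference call and report percentiles rather than a single mean, since the slowest responses are what threaten alert deadlines. In the sketch, `predict` is a hypothetical callable standing in for the fused model. The warm-up count is arbitrary, and percentiles use the nearest-rank method.

```python
import math
import time


def measure_latency(predict, inputs, warmup=5):
    """Time each call to `predict` and return p50/p95/max latency in milliseconds."""
    for x in inputs[:warmup]:  # warm-up calls are excluded from the statistics
        predict(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()

    def percentile(q):
        # nearest-rank percentile on the sorted samples
        rank = max(1, math.ceil(q / 100 * len(samples)))
        return samples[rank - 1]

    return {"p50_ms": percentile(50), "p95_ms": percentile(95), "max_ms": samples[-1]}
```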

Additionally, cybersecurity and data governance considerations were embedded within the implementation framework, ensuring secure transmission between IoT devices and centralized servers. The methodological design thus extends beyond algorithmic modeling to include infrastructural compatibility, data integrity, and operational resilience.

Finally, an economic evaluation framework was defined to compare reactive maintenance scenarios with AI-driven predictive scheduling. Rather than presenting cost outcomes in this section, the methodology specifies cost categories—including emergency repair premiums, downtime penalties, preventive intervention expenses, and asset replacement costs—which will be quantitatively analyzed in the subsequent Results section. This structured validation and deployment protocol ensures coherence between predictive modeling and real-world railway maintenance decision-making.

4. Results and discussion

4.1 Performance of the structured predictive model

The structured gradient boosting model demonstrates strong discriminative capability in identifying failure-prone railway segments within freight-intensive corridors. As reported in Table 3, the model achieves an AUC-ROC of 0.912, an overall accuracy of 0.892, and a recall of 0.861. These metrics indicate that the integration of cumulative axle load indicators, vibration dispersion measures, maintenance history variables, and environmental stress factors provides a statistically robust representation of infrastructure degradation risk. The relatively high recall value is particularly significant in safety-critical systems, where missed detections can lead to cascading operational and economic consequences. Unlike threshold-based monitoring systems, which react only when critical values are exceeded, the structured learning model captures nonlinear interactions between variables, allowing risk to be quantified before structural tolerance levels are breached.

Table 3

Performance metrics – structured model

Metric	Value
Accuracy	0.892
Precision	0.874
Recall	0.861
F1-Score	0.867
AUC-ROC	0.912

Source(s): Authors’ own work

The ROC curve presented in Figure 3 confirms stable separation between failure and non-failure classes across probability thresholds. The steep ascent of the curve toward the upper-left quadrant indicates that sensitivity increases rapidly without a proportional rise in false positives, which is essential in operational environments where excessive false alarms can inflate inspection costs and reduce confidence in digital monitoring tools. The model's performance suggests that degradation patterns within freight railway systems exhibit structured statistical regularities rather than purely stochastic behavior. This finding aligns with fatigue accumulation theory, which posits that repeated mechanical loading generates progressive microstructural damage detectable through combined stress indicators.

Figure 3

A line graph showing the ROC curve for a structured model. The x axis represents the false positive rate ranging from 0.0 to 1.0. The y axis represents the true positive rate ranging from 0.0 to 1.0. The ROC curve is depicted in blue and shows a high true positive rate across various false positive rates. A dashed line represents random guess with an area under the curve of 0.5. The solid black line indicates the area under the curve value of 0.912 for the structured model. All values are approximated.

ROC curve – structured model. Source: Authors’ own work

The results demonstrate that structured heterogeneous data, when properly engineered and modeled, provide sufficient explanatory power to significantly outperform reactive maintenance heuristics. However, while discriminative performance is strong, static modeling remains limited in its ability to represent degradation acceleration over time, necessitating sequential modeling approaches.

4.2 Temporal modeling and degradation acceleration dynamics

The LSTM-based temporal model enhances predictive sensitivity by explicitly capturing sequential dependencies embedded in longitudinal sensor data. Unlike static structured models that rely on cross-sectional representations of cumulative indicators, the recurrent architecture retains historical state information, enabling the identification of progressive degradation patterns over rolling monitoring windows. This capacity is particularly relevant in railway infrastructure systems, where structural fatigue develops gradually under repeated mechanical loading rather than through abrupt isolated events.

As reported in Table 4, the temporal architecture achieves an AUC-ROC of 0.941 and a recall of 0.903, demonstrating superior early-detection capability compared to the structured gradient boosting model. The low Mean Absolute Error (MAE = 0.074) and Root Mean Squared Error (RMSE = 0.112) further confirm the model's ability to accurately track degradation probability trajectories over time. The elevated recall is especially critical in safety-sensitive railway environments, where minimizing false negatives is operationally more important than marginal reductions in false positives.

Table 4

Temporal model performance

Metric	Value
Recall	0.903
MAE	0.074
RMSE	0.112
AUC-ROC	0.941

Source(s): Authors’ own work

When compared to alternative sequential architectures, including basic RNNs and recent Transformer-based models, the LSTM configuration demonstrates a favorable trade-off between predictive performance and computational efficiency. While Transformer architectures have shown strong capabilities in modeling long-range dependencies, their application in this context is constrained by higher computational requirements and sensitivity to limited dataset size. The LSTM model, by contrast, provides stable performance with lower training complexity, making it more suitable for real-time deployment in railway monitoring systems. This result supports the methodological choice outlined in Section 3.2 and confirms the practical relevance of LSTM-based temporal modeling in infrastructure applications.

The predicted versus observed degradation trajectories illustrated in Figure 4 reveal a nonlinear progression pattern characterized by three distinct phases. During the initial accumulation stage, degradation risk increases gradually, reflecting progressive stress exposure under stable operational and environmental conditions. This early phase corresponds to subcritical fatigue development, during which microstructural damage accumulates but remains below structural tolerance thresholds.

Figure 4

A line graph showing degradation risk probability over time. The x axis represents time in days ranging from 0 to 150. The y axis represents degradation risk probability ranging from 0 to 1. The graph includes two lines: a solid blue line representing the predicted trajectory using an LSTM model and a dashed orange line representing observed data. Key phases are marked, including an early warning phase around 60 days and a rapid degradation acceleration phase around 90 days. The graph also highlights a critical risk zone starting around 120 days. All values are approximated.

Predicted vs. observed degradation trajectories. Source: Authors’ own work

As cumulative axle loads and vibration dispersion intensify, the degradation curve transitions into an acceleration phase marked by a sharp increase in predicted failure probability. The LSTM model detects this transition slightly earlier than the observed trajectory, demonstrating its ability to internalize temporal stress history and recognize subtle inflection points in degradation dynamics. This anticipatory detection window is operationally significant, as it allows maintenance planning to shift from reactive emergency intervention toward proactive scheduling under controlled traffic conditions.

Finally, both trajectories converge toward a high-risk plateau, representing the critical deterioration zone in which structural tolerance limits are approached. The close alignment between predicted and observed curves throughout the progression confirms the temporal model's robustness in tracking nonlinear degradation behavior. The model's capacity to anticipate acceleration phases supports the hypothesis that infrastructure deterioration is governed not solely by instantaneous stress levels, but by cumulative temporal exposure and sequential interaction effects among mechanical, operational, and environmental variables.

These results also highlight the specific contribution of the temporal module within the overall architecture. While the structured model identifies failure-prone segments based on aggregated indicators, the LSTM model specializes in detecting early-stage degradation acceleration. This complementarity reinforces the need for integrating both approaches within a unified predictive framework.

4.3 Multimodal visual model (CNN) performance

The convolutional neural network (CNN) was developed to extract spatial degradation patterns from high-resolution rail surface imagery and classify observable defects according to severity categories. Unlike the structured and temporal components—which infer deterioration indirectly from mechanical load indicators and sequential operational signals—the visual subsystem captures directly manifested surface anomalies. This complementary perspective strengthens the predictive maintenance framework by introducing an observable validation layer of material fatigue.

As reported in Table 5, the CNN achieves consistently strong classification performance across all defect categories. Precision and recall exceed 0.90 for most classes, with balanced F1-scores confirming stable performance without bias toward dominant categories. Detection of safety-critical anomalies, particularly switch damage (recall = 0.94), demonstrates the model's suitability for operational deployment in high-risk railway environments.

Table 5

Visual model classification performance

Defect category	Precision	Recall	F1-score
Micro-cracks	0.93	0.91	0.92
Corrosion	0.92	0.94	0.93
Surface wear	0.90	0.89	0.89
Switch damage	0.95	0.94	0.94

Source(s): Authors’ own work

The confusion matrix in Table 6 indicates that misclassifications mainly occur between visually contiguous degradation states, such as early micro-cracks and moderate surface wear. These overlaps reflect gradual physical transitions in deterioration processes rather than structural model instability. Importantly, severe structural defects remain accurately detected, reinforcing the reliability of the visual subsystem.

Table 6

Confusion matrix – CNN model

Predicted \ observed	Micro-cracks	Corrosion	Surface wear	Switch damage
Micro-cracks	91	2	5	2
Corrosion	3	94	2	1
Surface wear	4	2	89	5
Switch damage	1	1	3	94

Source(s): Authors’ own work
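
Per-class precision and recall can be read from a confusion matrix, but the denominators depend on its orientation. With rows as predicted classes (as labeled in Table 6), precision divides the diagonal entry by the row total and recall divides it by the column total. The roles swap if rows hold the observed classes. The sketch below encodes Table 6 and exposes the orientation as a parameter.

```python
CLASSES = ["Micro-cracks", "Corrosion", "Surface wear", "Switch damage"]

# Table 6 as labeled: rows = predicted class, columns = observed class
CONFUSION = [
    [91, 2, 5, 2],
    [3, 94, 2, 1],
    [4, 2, 89, 5],
    [1, 1, 3, 94],
]


def per_class_metrics(matrix, rows_are_predicted=True):
    """Per-class precision and recall from a square confusion matrix."""
    k = len(matrix)
    results = {}
    for i in range(k):
        row_total = sum(matrix[i])
        col_total = sum(matrix[r][i] for r in range(k))
        predicted_total, observed_total = (
            (row_total, col_total) if rows_are_predicted else (col_total, row_total)
        )
        results[CLASSES[i]] = {
            "precision": matrix[i][i] / predicted_total,
            "recall": matrix[i][i] / observed_total,
        }
    return results
```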

4.4 Multimodal fusion and ensemble prediction

While the CNN demonstrates strong standalone capability, its primary contribution emerges within the multimodal fusion architecture. Railway degradation is inherently multidimensional: cumulative mechanical stress (structured data), temporal acceleration dynamics (LSTM modeling), and externally visible material fatigue (imagery) evolve concurrently. To avoid partial observability, outputs from the structured, temporal, and visual components were aggregated using a weighted probabilistic fusion mechanism optimized during validation.

A detailed analysis of model complementarity reveals that each component contributes to detecting specific defect categories. The structured model is particularly effective in identifying fatigue-related failures linked to cumulative load and environmental stress. The temporal model excels in capturing early-stage degradation acceleration and transition phases. The visual model provides high-precision detection of surface anomalies such as cracks and corrosion. The fusion of these complementary signals reduces uncertainty and improves detection across all defect categories, particularly in borderline cases where individual models exhibit ambiguity.

The resulting ensemble substantially improves predictive performance. As shown in Table 7, the integrated architecture achieves an AUC-ROC of 0.963 and a recall of 0.921, exceeding the AUC-ROC of each individual modality.

Table 7

Ensemble model global performance

Metric	Value
Accuracy	0.941
Recall	0.921
F1-Score	0.928
AUC-ROC	0.963
Brier Score	0.061

Source(s): Authors’ own work

4.5 Comparative evaluation and probabilistic calibration

To further validate the effectiveness of the proposed framework, a comparative evaluation was conducted against both baseline statistical models and advanced AI-based approaches, as presented in Table 8. Traditional models such as logistic regression and decision trees perform markedly worse, particularly in recall, which is critical in safety-sensitive railway environments. Individual advanced models (gradient boosting, LSTM, and CNN) perform strongly within their respective domains but remain limited by their single-modality perspective. In contrast, the proposed multimodal ensemble achieves the highest AUC-ROC of all compared models (0.963), with a recall of 0.921. This improvement indicates that integrating structured, temporal, and visual information yields a more comprehensive representation of infrastructure degradation processes, thereby enhancing both predictive accuracy and reliability.

Table 8

Comparative performance with baseline and state-of-the-art models

Model	AUC-ROC	Recall	Type	Characteristics
Logistic regression	0.781	0.724	Statistical	Linear baseline
Decision tree	0.814	0.756	ML	Rule-based
Random forest	0.861	0.812	Ensemble ML	Bagging
LightGBM	0.903	0.845	SOTA	Efficient boosting
CatBoost	0.915	0.858	SOTA	Categorical handling
XGBoost	0.912	0.861	ML	Structured learning
LSTM	0.941	0.903	DL	Temporal modeling
CNN	0.934	0.940	DL	Visual detection
Transformer (TFT)	0.949	0.907	SOTA DL	Long-range modeling
Proposed ensemble	0.963	0.921	Hybrid AI	Multimodal fusion + calibration

Source(s): Authors’ own work

Beyond discriminative power, calibration quality was explicitly evaluated to ensure consistency between predicted probabilities and empirical failure frequencies. The probabilistic alignment across risk intervals is presented in Table 9. Mean predicted probabilities closely match observed failure rates across all bins, with absolute deviations remaining minimal (≤ 0.02). This stability translates into an Expected Calibration Error (ECE) of 0.014 and a Maximum Calibration Error (MCE) of 0.020, confirming reliable probabilistic estimation.

Table 9

Probabilistic calibration assessment – ensemble model

Probability bin	Mean predicted probability	Observed failure rate	Bin size (n)	Absolute error
0.0 – 0.1	0.05	0.04	214	0.01
0.1 – 0.2	0.15	0.14	226	0.01
0.2 – 0.3	0.25	0.27	198	0.02
0.3 – 0.4	0.35	0.36	205	0.01
0.4 – 0.5	0.45	0.47	221	0.02
0.5 – 0.6	0.55	0.53	247	0.02
0.6 – 0.7	0.65	0.66	203	0.01
0.7 – 0.8	0.75	0.77	214	0.02
0.8 – 0.9	0.85	0.86	221	0.01
0.9 – 1.0	0.95	0.94	210	0.01

Source(s): Authors’ own work
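
The calibration summary can be reproduced from Table 9 alone. The sketch uses the usual definitions: ECE is the bin-size-weighted mean of the absolute gap between mean predicted probability and observed failure rate, and MCE is the largest such gap. Applied to the tabulated bins, it returns the values reported in the text.

```python
# (mean predicted probability, observed failure rate, bin size), from Table 9
BINS = [
    (0.05, 0.04, 214), (0.15, 0.14, 226), (0.25, 0.27, 198), (0.35, 0.36, 205),
    (0.45, 0.47, 221), (0.55, 0.53, 247), (0.65, 0.66, 203), (0.75, 0.77, 214),
    (0.85, 0.86, 221), (0.95, 0.94, 210),
]


def calibration_errors(bins):
    """Return (ECE, MCE) from per-bin summaries."""
    total = sum(n for _, _, n in bins)
    gaps = [abs(pred - obs) for pred, obs, _ in bins]
    ece = sum(gap * n for gap, (_, _, n) in zip(gaps, bins)) / total  # size-weighted mean gap
    mce = max(gaps)  # worst single bin
    return ece, mce


ece, mce = calibration_errors(BINS)
print(f"ECE = {ece:.3f}, MCE = {mce:.3f}")  # ECE = 0.014, MCE = 0.020
```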

4.6 Operational efficiency, economic impact, and system robustness

The operational relevance of the proposed multimodal predictive maintenance framework is evaluated through a 24-month simulation comparing a traditional reactive maintenance strategy with the AI-driven predictive approach under identical operational conditions. While previous results demonstrated strong predictive performance, this section focuses on system-level outcomes, particularly operational efficiency, economic impact, and robustness under realistic railway operating constraints.

The results show a significant improvement in operational reliability, as summarized in Table 10. The predictive maintenance strategy reduces unscheduled downtime from 1,240 to 856 hours per year, corresponding to a 31% reduction, while emergency repair interventions decrease by 28%. At the same time, the mean time between failures increases from 42 to 58 days, reflecting a 38% improvement in operational continuity. These gains indicate that failures are increasingly anticipated and mitigated before reaching critical thresholds, confirming the effectiveness of combining temporal degradation modeling with multimodal data fusion for proactive maintenance decision-making.

Table 10

Operational impact indicators

Indicator	Reactive	Predictive	Change
Unscheduled downtime (hours/year)	1,240	856	−31%
Emergency repairs	312	224	−28%
Mean time between failures (days)	42	58	+38%

Source(s): Authors’ own work

From an economic perspective, the proposed framework reduces total maintenance costs by 18% over the 24-month simulation, from 13.2 million USD under the reactive strategy to 10.8 million USD under the predictive approach, as detailed in Table 11. This reduction is accompanied by a structural reallocation of costs: emergency repair costs decrease by 27% and downtime-related penalties by 31%, while preventive maintenance spending increases by 35%. This shift reflects a transition from unpredictable failure-driven expenditures to planned, data-driven interventions, improving both cost efficiency and control over maintenance planning.

Table 11

Maintenance cost comparison (24 months)

Cost category	Reactive (USD)	Predictive (USD)	Variation
Emergency repairs	4.8M	3.5M	−27%
Downtime penalties	6.1M	4.2M	−31%
Preventive maintenance	2.3M	3.1M	+35%
Total cost	13.2M	10.8M	−18%

Source(s): Authors’ own work
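
The percentage changes in Tables 10 and 11 follow from (predictive − reactive) / reactive × 100. The short check below recomputes them from the tabulated values. The cost components in Table 11 also sum to the reported totals (4.8 + 6.1 + 2.3 = 13.2 and 3.5 + 4.2 + 3.1 = 10.8).

```python
def relative_change(reactive, predictive):
    """Percentage change from the reactive baseline to the predictive scenario."""
    return 100.0 * (predictive - reactive) / reactive


ROWS = {
    "Unscheduled downtime (h/yr)": (1240, 856),
    "Emergency repairs": (312, 224),
    "MTBF (days)": (42, 58),
    "Emergency repair cost (M USD)": (4.8, 3.5),
    "Downtime penalties (M USD)": (6.1, 4.2),
    "Preventive maintenance (M USD)": (2.3, 3.1),
    "Total cost (M USD)": (13.2, 10.8),
}

for name, (reactive, predictive) in ROWS.items():
    print(f"{name}: {relative_change(reactive, predictive):+.0f}%")
# prints -31%, -28%, +38%, -27%, -31%, +35%, -18% for the rows above
```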

Beyond total cost reduction, the predictive framework improves financial stability over time. As illustrated in Table 12, reactive maintenance exhibits high cost volatility, with irregular spikes caused by unexpected failures that make long-term budgeting difficult. In contrast, the predictive approach produces a smoother cost trajectory with smaller month-to-month fluctuations, driven primarily by scheduled preventive interventions. This reduction in financial variability enhances budget predictability, strengthens resource allocation efficiency, and improves strategic planning in railway infrastructure management, where financial stability is as important as cost minimization.

Table 12

Maintenance cost evolution over 24 months

Month	Reactive maintenance cost (M USD)	Predictive maintenance cost (M USD)
1	0.50	0.26
2	0.62	0.28
3	0.54	0.30
4	0.86	0.22
5	0.48	0.24
6	0.71	0.35
7	0.88	0.44
8	0.63	0.31
9	0.69	0.39
10	0.56	0.33
11	0.89	0.54
12	0.65	0.52
13	0.72	0.50
14	0.68	0.53
15	0.70	0.55
16	0.76	0.56
17	0.68	0.54
18	0.79	0.57
19	0.82	0.59
20	0.67	0.55
21	0.70	0.60
22	0.96	0.61
23	0.80	0.57
24	0.68	0.55

Source(s): Authors’ own work
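
"Smoother" can be quantified directly from Table 12. The sketch uses the mean absolute month-over-month change as a volatility measure. This is one choice among several, and an assumption here. A plain standard deviation would also reflect the gradual upward drift visible in the predictive column, which is why a change-based measure is used.

```python
from statistics import mean

# Monthly costs in M USD, transcribed from Table 12
REACTIVE = [0.50, 0.62, 0.54, 0.86, 0.48, 0.71, 0.88, 0.63, 0.69, 0.56, 0.89, 0.65,
            0.72, 0.68, 0.70, 0.76, 0.68, 0.79, 0.82, 0.67, 0.70, 0.96, 0.80, 0.68]
PREDICTIVE = [0.26, 0.28, 0.30, 0.22, 0.24, 0.35, 0.44, 0.31, 0.39, 0.33, 0.54, 0.52,
              0.50, 0.53, 0.55, 0.56, 0.54, 0.57, 0.59, 0.55, 0.60, 0.61, 0.57, 0.55]


def mean_abs_monthly_change(costs):
    """Average absolute month-over-month change (M USD), a simple volatility proxy."""
    return mean(abs(b - a) for a, b in zip(costs, costs[1:]))


print(f"reactive:   {mean_abs_monthly_change(REACTIVE):.3f}")    # 0.150
print(f"predictive: {mean_abs_monthly_change(PREDICTIVE):.3f}")  # 0.050
```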

Finally, the robustness of the proposed system is validated under stress scenarios involving increased sensor noise and higher traffic load, as reported in Table 13. The results show only marginal performance degradation, with AUC-ROC decreasing from 0.963 to 0.948 and recall remaining consistently above 0.90. This stability confirms the resilience of the model under realistic operational uncertainties. The robustness is primarily attributed to the complementary design of the architecture, where structured models ensure statistical consistency, LSTM captures temporal dependencies, and CNN provides independent visual confirmation, while the probabilistic fusion layer ensures stable and reliable final risk estimation.

Table 13

Robustness under stress conditions

Scenario	AUC-ROC	Recall	AUC-ROC variation	Recall variation
Baseline	0.963	0.921	–	–
+10% sensor noise	0.952	0.908	−0.011	−0.013
+15% traffic load	0.948	0.901	−0.015	−0.020

Source(s): Authors’ own work
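The variation columns of Table 13 follow directly from the baseline row. The short Python sketch below recomputes them and adds the relative change, which the table does not report; the relative percentages are derived here for illustration only, and the 0.90 recall floor is the threshold quoted in the robustness discussion.

```python
# AUC-ROC and recall values from Table 13.
BASELINE = {"auc_roc": 0.963, "recall": 0.921}
SCENARIOS = {
    "+10% sensor noise": {"auc_roc": 0.952, "recall": 0.908},
    "+15% traffic load": {"auc_roc": 0.948, "recall": 0.901},
}
RECALL_FLOOR = 0.90  # threshold quoted in the text


def variation(scenario, baseline=BASELINE):
    """Absolute change (rounded to the table's 3 decimals) and relative change in %."""
    result = {}
    for metric, base in baseline.items():
        delta = scenario[metric] - base
        result[metric] = (round(delta, 3), 100.0 * delta / base)
    return result


for name, metrics in SCENARIOS.items():
    v = variation(metrics)
    print(f"{name}: AUC-ROC {v['auc_roc'][0]:+.3f} ({v['auc_roc'][1]:+.2f}%), "
          f"recall {v['recall'][0]:+.3f} ({v['recall'][1]:+.2f}%), "
          f"recall above floor: {metrics['recall'] > RECALL_FLOOR}")
```

For the Table 13 values, this gives relative AUC-ROC drops of about 1.14% and 1.56% and relative recall drops of about 1.41% and 2.17% for the noise and traffic scenarios, respectively.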

5. Conclusion, implications, limitations and future research

This study investigated the implementation of AI-driven predictive maintenance for Moroccan railway freight networks, focusing on rails and critical infrastructure components such as switches and signaling systems. By integrating structured historical data, temporal degradation dynamics, and high-resolution visual inspection, the research developed a hybrid predictive framework capable of capturing the multidimensional nature of infrastructure deterioration. The empirical results show that the proposed multimodal architecture, which combines gradient boosting, LSTM-based temporal modeling, and CNN-driven visual analysis, achieves high predictive performance, with an AUC-ROC of 0.963 and a recall of 0.921. On this dataset, multimodal learning outperforms both single-modality approaches and conventional statistical methods in early failure detection and probabilistic risk estimation.
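For reference, the two headline metrics can be computed from predicted failure probabilities and observed outcomes as in the minimal Python sketch below. It uses the rank-based (Mann–Whitney) definition of AUC-ROC and a 0.5 decision threshold for recall; the threshold and the toy labels and scores are hypothetical and do not reflect the authors' operating point.

```python
def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen failure is scored
    above a randomly chosen non-failure (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(y_true, scores, threshold=0.5):
    """Share of actual failures whose score reaches the decision threshold."""
    flagged = [s >= threshold for y, s in zip(y_true, scores) if y == 1]
    if not flagged:
        raise ValueError("need at least one positive example")
    return sum(flagged) / len(flagged)


# Hypothetical toy data: 1 = segment failed within the prediction horizon.
y = [1, 1, 1, 0, 0, 0, 0, 1]
p = [0.91, 0.75, 0.40, 0.30, 0.55, 0.10, 0.20, 0.85]
print(auc_roc(y, p), recall(y, p))
```

On the toy data, 15 of the 16 failure/non-failure pairs are ranked correctly (AUC-ROC = 0.9375) and 3 of the 4 failures are flagged (recall = 0.75). The pairwise loop is quadratic and meant for readability; a sort-based implementation is preferable for large evaluation sets.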

Beyond predictive accuracy, the findings highlight the operational and managerial value of the proposed framework. By predicting component degradation and quantifying failure probabilities, the system enables a transition from reactive maintenance to anticipatory, data-driven decision-making. The simulated reductions of 31% in unscheduled downtime and 28% in emergency repairs suggest that AI-based risk prioritization can substantially improve operational continuity. Maintenance activities can be scheduled dynamically according to asset criticality, allowing managers to allocate human and material resources more efficiently while minimizing unnecessary inspections and interventions. Furthermore, integrating temporal forecasts with IoT sensor data makes it possible to plan maintenance within traffic-controlled windows, reducing service disruptions and limiting the propagation of delays across freight logistics chains.
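Criticality-based scheduling can be sketched as a ranking problem: weight each segment's predicted failure probability by its criticality and fill the available maintenance window in descending order of weighted risk. The Python sketch below is a hypothetical illustration; the `Segment` fields, segment identifiers, criticality weights, crew-hour estimates and the greedy rule are all assumptions, not the authors' scheduling method.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str     # hypothetical segment identifier
    p_failure: float    # calibrated failure probability from the predictive model
    criticality: float  # hypothetical weight (e.g. freight volume or safety class)
    crew_hours: float   # estimated duration of the intervention

    @property
    def weighted_risk(self) -> float:
        return self.p_failure * self.criticality


def plan_window(segments, capacity_hours):
    """Greedily fill one maintenance window in descending order of weighted risk,
    skipping any segment whose intervention no longer fits the remaining capacity."""
    plan, used = [], 0.0
    for seg in sorted(segments, key=lambda s: s.weighted_risk, reverse=True):
        if used + seg.crew_hours <= capacity_hours:
            plan.append(seg.segment_id)
            used += seg.crew_hours
    return plan, used


segments = [
    Segment("S-017", p_failure=0.82, criticality=1.0, crew_hours=6.0),
    Segment("S-042", p_failure=0.35, criticality=3.0, crew_hours=4.0),
    Segment("S-088", p_failure=0.60, criticality=1.5, crew_hours=5.0),
    Segment("S-103", p_failure=0.10, criticality=2.0, crew_hours=2.0),
]
plan, used = plan_window(segments, capacity_hours=10.0)
print(plan, used)
```

Here S-017 has the highest failure probability but is deferred, because the two segments with higher weighted risk already consume 9 of the 10 available crew-hours. A greedy pass is a simple heuristic; an exact knapsack or mixed-integer formulation would be needed to guarantee the best use of window capacity.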

From a financial and strategic perspective, the framework supports lifecycle management and investment planning. Combining probabilistic risk assessment with cost simulations enables more informed decisions on preventive maintenance, component replacement, and infrastructure renewal. Over a 24-month simulation period, total maintenance costs fell by 18% despite a 35% increase in preventive interventions, highlighting the economic efficiency of proactive maintenance relative to reactive approaches. This shift stabilizes maintenance expenditures, improves budget predictability, and facilitates the allocation of capital toward infrastructure modernization and technological upgrades.
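The link between risk estimates and cost decisions can be illustrated with a standard expected-cost rule: intervene preventively when the expected cost of waiting, p × C_f, exceeds the preventive cost C_p, that is, when p > C_p / C_f. The Python sketch below assumes, for simplicity, that an intervention fully removes the failure risk over the planning horizon; the unit costs are hypothetical and are not taken from the study's cost simulation.

```python
def break_even_probability(cost_preventive, cost_failure):
    """Failure probability above which preventive action is cheaper in expectation:
    p * C_f > C_p  <=>  p > C_p / C_f."""
    if cost_failure <= 0 or cost_preventive < 0:
        raise ValueError("cost_failure must be positive and cost_preventive non-negative")
    return cost_preventive / cost_failure


def expected_costs(p_failure, cost_preventive, cost_failure):
    """Expected cost of each option over the horizon, assuming (for simplicity)
    that a preventive intervention removes the failure risk entirely."""
    return {"preventive": cost_preventive, "wait": p_failure * cost_failure}


# Hypothetical unit costs for one switch, in thousand USD.
C_P, C_F = 12.0, 80.0
p_star = break_even_probability(C_P, C_F)
for p in (0.05, 0.30):
    costs = expected_costs(p, C_P, C_F)
    decision = "intervene" if p > p_star else "monitor"
    print(f"p = {p:.2f}: expected costs {costs}, decision: {decision}")
```

With these assumed costs the break-even probability is 0.15, so a switch at p = 0.05 is monitored while one at p = 0.30 is scheduled for intervention.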

The study also underscores the critical role of predictive maintenance in enhancing safety and system resilience. By combining structured, temporal, and visual data, the framework provides a comprehensive view of infrastructure health, enabling early detection of hidden structural fatigue and emerging surface defects. The reliability of probabilistic predictions, supported by low calibration error, allows decision-makers to confidently prioritize high-risk segments and implement targeted interventions. This risk-based approach strengthens preventive safety management, reduces the likelihood of critical failures, and supports more effective emergency preparedness and resource deployment.
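Calibration error of the kind mentioned above is commonly summarized by the expected calibration error (ECE): predictions are grouped into probability bins, and the gap between the mean predicted failure probability and the observed failure frequency in each bin is averaged, weighted by bin size. The sketch below is a minimal Python version with ten equal-width bins; the bin count is an assumption and may differ from the study's setup.

```python
def expected_calibration_error(y_true, probs, n_bins=10):
    """ECE for binary failure predictions using equal-width probability bins.

    y_true: observed outcomes (1 = failure, 0 = no failure)
    probs:  predicted failure probabilities in [0, 1]
    """
    if not probs or len(y_true) != len(probs):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((y, p))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            ece += (len(members) / n) * abs(mean_pred - observed)
    return ece


print(expected_calibration_error([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9]))
print(expected_calibration_error([1, 0, 0, 0], [0.25, 0.25, 0.25, 0.25]))
```

For example, two non-failures predicted at 0.1 and two failures predicted at 0.9 give an ECE of about 0.1, whereas four segments predicted at 0.25 of which exactly one fails give an ECE of 0.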

From a theoretical standpoint, the findings contribute to the literature on intelligent infrastructure management by supporting the view that railway degradation is a multidimensional process driven by the interaction of mechanical stress accumulation, temporal evolution, and observable material fatigue. The proposed multimodal architecture addresses this complexity by integrating complementary AI paradigms: gradient boosting captures nonlinear relationships in structured data, LSTM networks model temporal degradation patterns and early warning signals, and CNN-based models detect localized surface anomalies. Their probabilistic fusion not only enhances predictive performance but also improves calibration, bringing predicted risks closer to observed failure frequencies.
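One common way to fuse per-modality probabilities, shown here purely as an illustration, is weighted log-odds pooling: each model's failure probability is mapped to its logit, the logits are combined linearly, and the result is mapped back through the sigmoid. The Python sketch below assumes this scheme; the modality names and weights are hypothetical, and in practice the weights and bias would be fitted on held-out data (for example, by a logistic-regression stacker) so that the fused output is calibrated. It is not necessarily the fusion mechanism used in the study.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 to keep the value finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse(probs, weights, bias=0.0):
    """Weighted log-odds pooling of per-modality failure probabilities."""
    if probs.keys() != weights.keys():
        raise ValueError("probs and weights must cover the same modalities")
    z = bias + sum(weights[m] * logit(probs[m]) for m in probs)
    return sigmoid(z)


# Hypothetical weights; in practice they (and the bias) would be fitted on
# held-out data, e.g. with a logistic-regression stacker, to keep the output calibrated.
WEIGHTS = {"xgboost": 0.5, "lstm": 0.3, "cnn": 0.2}

print(fuse({"xgboost": 0.70, "lstm": 0.80, "cnn": 0.40}, WEIGHTS))
```

When the weights sum to one and all modalities report the same probability, the fused estimate equals that probability; when they disagree, each modality pulls the result toward its own estimate in proportion to its weight on the log-odds scale.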

In addition, this study addresses several key gaps in existing research. It proposes a context-specific predictive maintenance framework adapted to the Moroccan railway system, introduces a fully integrated multimodal architecture that overcomes the fragmentation of prior approaches, and bridges the gap between theoretical modeling and practical deployment through the incorporation of calibration techniques, modular design, and real-time applicability considerations. The framework thus provides both methodological and practical contributions, supporting the development of scalable and deployment-ready predictive maintenance solutions.

However, several limitations should be acknowledged. The empirical analysis is based on a dataset covering 127 high-traffic rail segments over a five-year period, which may limit the generalizability of the findings to other contexts. Additional factors such as extreme climatic conditions or variations in freight composition may introduce complexities not fully captured in the current model. Moreover, the full-scale implementation of the framework depends on the availability and reliability of IoT sensor networks and imaging systems. The absence of benchmarking against recent deep learning architectures, such as Transformer-based models and advanced computer vision approaches, also represents a limitation that should be addressed in future research.

Future research should expand both the empirical and methodological scope of the proposed framework. Increasing dataset diversity and extending the temporal horizon would enhance robustness and generalizability. The integration of advanced sensing technologies, such as acoustic monitoring and thermal imaging, could further improve early defect detection. Additionally, combining Transformer-based temporal modeling with real-time object detection systems offers promising avenues for developing more adaptive and scalable predictive maintenance solutions. Integrating predictive analytics with prescriptive optimization models would also enable the transition from failure prediction to automated maintenance planning, thereby maximizing system-level efficiency.

In conclusion, this study demonstrates that the integration of multimodal artificial intelligence within a unified predictive maintenance framework provides a robust, scalable, and operationally relevant solution for improving the reliability, safety, and sustainability of railway freight infrastructure. By linking predictive performance with tangible managerial benefits and strategic decision-making capabilities, the proposed approach supports the transition toward intelligent, data-driven infrastructure management systems, particularly in emerging railway environments.

Ethics statement

This research was conducted in accordance with ethical standards applicable to social science research. Participation in the survey was voluntary, and respondents were informed of the purpose of the study. Anonymity and confidentiality of the data were strictly ensured, and no personally identifiable information was collected.

References

Abdulhai, B. (2025). AI-driven decision support systems for managing rail traffic flow and safety. International Journal of Emerging Research in Engineering and Technology, 2(1), 61–71.

Abu-Aisha, T., Audy, J. F., & Ouhimmou, M. (2024). Toward an efficient sea-rail intermodal transportation system: A systematic literature review. Journal of Shipping and Trade, 9(1), 23. doi: https://doi.org/10.1186/s41072-024-00182-z

Azanaw, G. M. (2025). Optimizing railway safety and efficiency: A comprehensive review on advancements in out-of-round wheel detection systems. Margin, 10, 15.

Bešinović, N. (2020). Resilience in railway transport systems: A literature review and research agenda. Transport Reviews, 40(4), 457–478. doi: https://doi.org/10.1080/01441647.2020.1728419

Bhadouria, A. S., Mishra, R. P., Jalan, A. K., & Andrade, A. R. (2025). Advances in railway condition monitoring and fault diagnosis: Transition from industry 4.0 automation to industry 5.0 human-centric maintenance. In Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit. doi: https://doi.org/10.1177/09544097261417814

Bris-Peñalver, F. J., Verdecia-Peña, R., & Alonso, J. I. (2026). A survey of AI-enabled predictive maintenance for railway infrastructure: Models, data sources, and research challenges. Sensors, 26(3), 906. doi: https://doi.org/10.3390/s26030906

El Moussaoui, T., & El Moussaoui, A. E. (2026a). Artificial intelligence for integrating railway freight into multi-actor supply chains: Insights from machine learning, deep learning and neural networks. Railway Sciences, 5(2), 204–224. doi: https://doi.org/10.1108/rs-01-2026-0002

El Moussaoui, T., & El Moussaoui, A. E. (2026b). AI-driven dynamic timetable optimization for the Casablanca-Mohammedia-Rabat commuter railway. Railway Sciences, 1–16. doi: https://doi.org/10.1108/rs-02-2026-0011

El Moussaoui, A. E., & El Moussaoui, T. (2026c). Rail freight development and modal shift from road transport: An empirical analysis of economic, energy and environmental perceptions in Morocco. Railway Sciences, 5(2), 225–244. doi: https://doi.org/10.1108/rs-01-2026-0001

El Moussaoui, A. E., & El Moussaoui, T. (2026d). Forecasting freight railway traffic in Morocco: A data-driven machine learning approach for efficient logistics planning. Railway Sciences, 1–21. doi: https://doi.org/10.1108/rs-03-2026-0015

El Moussaoui, A. E., El Moussaoui, T., Benbba, B., Chakir, L., Jaegler, A., & El Andaloussi, Z. (2025a). Sustainable effects of information sharing between distribution logistics actors: A qualitative case study. Environment, Development and Sustainability, 27(5), 12305–12323. doi: https://doi.org/10.1007/s10668-023-04387-3

El Moussaoui, T., Loqman, C., & El Moussaoui, A. E. (2025b). Adoption of artificial intelligence in smart cities: Literature review and bibliometric analysis. Generative AI for Cybersecurity and Privacy, 103–125.

El Moussaoui, T., Chakir, L., & El Moussaoui, A. E. (2025c). Exploring information technology (IT) acceptance among supply chain stakeholders in Morocco's pharmaceutical industry. In Ecological and Human Dimensions of AI-Based Supply Chain (pp. 245–270). IGI Global Scientific Publishing.

Huang, P., Wen, C., Fu, L., Peng, Q., & Tang, Y. (2020). A deep learning approach for multi-attribute data: A study of train delay prediction in railway systems. Information Sciences, 516, 234–253. doi: https://doi.org/10.1016/j.ins.2019.12.053

Kim, B., Natarajan, Y., Preethaa, K. S., Song, S., An, J., & Mohan, S. (2024). Real-time assessment of surface cracks in concrete structures using integrated deep neural networks with autonomous unmanned aerial vehicle. Engineering Applications of Artificial Intelligence, 129, 107537. doi: https://doi.org/10.1016/j.engappai.2023.107537

Lidén, T. (2015). Railway infrastructure maintenance - a survey of planning problems and conducted research. Transportation Research Procedia, 10, 574–583. doi: https://doi.org/10.1016/j.trpro.2015.09.011

Olivier, B., Guo, F., Qian, Y., & Connolly, D. P. (2025). A review of computer vision for railways. IEEE Transactions on Intelligent Transportation Systems, 26(7), 11034–11065.

Pandey, S., Chaudhary, M., & Tóth, Z. (2025). An investigation on real-time insights: Enhancing process control with IoT-enabled sensor networks. Discover Internet of Things, 5(1), 29. doi: https://doi.org/10.1007/s43926-025-00124-6

Rodríguez-Hernández, M., Crespo-Márquez, A., Sánchez-Herguedas, A., & González-Prida, V. (2025). Digitalization as an enabler in railway maintenance: A review from ‘the international union of railways asset management framework’ perspective. Infrastructures, 10(4), 96. doi: https://doi.org/10.3390/infrastructures10040096

Sarp, S., Kuzlu, M., Jovanovic, V., Polat, Z., & Guler, O. (2024). Digitalization of railway transportation through AI-powered services: Digital twin trains. European Transport Research Review, 16(1), 58. doi: https://doi.org/10.1186/s12544-024-00679-5

Shah, R., Mittal, V., & Lotwin, M. (2025). Recent advances in vibration analysis for predictive maintenance of modern automotive powertrains. Vibration, 8(4), 68. doi: https://doi.org/10.3390/vibration8040068

Shi, H., Wang, J., Wu, P., Song, C., & Teng, W. (2018). Field measurements of the evolution of wheel wear and vehicle dynamics for high-speed trains. Vehicle System Dynamics, 56(8), 1187–1206. doi: https://doi.org/10.1080/00423114.2017.1406963

Sivakumar, M., Maranco, M., & Krishnaraj, N. (2024). Data analytics and artificial intelligence for predictive maintenance in manufacturing. In Data Analytics and Artificial Intelligence for Predictive Maintenance in Smart Manufacturing (pp. 29–55). CRC Press.

Waheed, W., & Xu, Q. (2025). Data driven long short-term load prediction: LSTM-RNN, XG-boost and conventional models in comparative analysis. Computational Intelligence, 41(3), e70084. doi: https://doi.org/10.1111/coin.70084

Wang, D., Sun, Y., Yu, L., Shen, K., Li, J., & Wu, X. (2025). Research on the optimization method of inventory management of important spare parts of intercity railway. PLoS One, 20(7), e0327852. doi: https://doi.org/10.1371/journal.pone.0327852

Zhao, F., Tian, Z., Liang, X., & Xie, M. (2018). An integrated prognostics method for failure time prediction of gears subject to the surface wear failure mode. IEEE Transactions on Reliability, 67(1), 316–327. doi: https://doi.org/10.1109/tr.2017.2781147

Zhu, Y., Wang, W., Lewis, R., Yan, W., Lewis, S. R., & Ding, H. (2019). A review on wear between railway wheels and rails under environmental conditions. Journal of Tribology, 141(12), 120801. doi: https://doi.org/10.1115/1.4044464

Zhu, L., Chen, C., Wang, H., Yu, F. R., & Tang, T. (2023). Machine learning in urban rail transit systems: A survey. IEEE Transactions on Intelligent Transportation Systems, 25(3), 2182–2207. doi: https://doi.org/10.1109/tits.2023.3319135

© 2026 Taoufiq El Moussaoui and Alaa Eddine El Moussaoui

Published in Railway Sciences. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at https://creativecommons.org/licenses/by/4.0/legalcode.
