Physics-informed and explainable artificial intelligence for robust fault diagnosis in engineering systems

Karkadakattil, Aswin

doi:10.1108/JIMSE-12-2025-0035

Purpose

Accurate fault diagnosis is essential for ensuring the safety and reliability of engineering systems. However, many artificial intelligence (AI)–based diagnostic models rely on purely data-driven black-box approaches that may produce predictions inconsistent with established physical behaviour, limiting confidence in safety-sensitive applications. This study proposes a unified framework that integrates predictive accuracy, physical consistency and interpretability within a single learning formulation.

Design/methodology/approach

A physics-informed and explainable neural network framework is developed for vibration-based fault severity estimation. Physically meaningful statistical features, such as RMS energy, peak-to-peak amplitude and kurtosis, are used as model inputs. Physical knowledge is embedded through a monotonicity-based loss constraint that penalises violations of the expected relationship between vibration energy and fault severity. A combined objective balances prediction error and physics-based regularisation. Global permutation feature importance is applied to assess alignment between learned feature relevance and physical expectations. The framework is evaluated using physics-inspired synthetic vibration data under controlled conditions, supported by cross-validation and quantitative monotonicity assessment.

Findings

Compared with an unconstrained neural network, the physics-informed model achieves lower prediction error, smoother convergence behaviour and substantially reduced monotonicity violations. The embedded constraint acts as a structural regulariser while preserving model flexibility. Explainability analysis indicates closer alignment between feature relevance and physically meaningful vibration indicators.

Originality/value

The study presents a reproducible framework that integrates loss-level physics constraints with global explainability analysis for fault severity diagnosis. By embedding physical admissibility directly into optimisation, the approach contributes toward the development of more transparent and physically grounded AI systems for engineering diagnostics.

A conceptual flow diagram shows physics informed neural network processing vibration signals to produce interpretable fault.

1. Introduction

1.1 Background

Reliable fault diagnosis is essential for maintaining the safety, efficiency, and operational continuity of engineering systems across manufacturing, transportation, and energy infrastructure. In vibration-based condition monitoring, changes in signal characteristics often serve as early indicators of component degradation, making data-driven diagnostic approaches particularly attractive for real-time health assessment. Over the past decade, advances in artificial intelligence (AI), especially machine learning and deep learning, have significantly improved fault classification and severity estimation by learning complex nonlinear relationships directly from sensor data (Gao et al., 2015; Zhao et al., 2019; Lei et al., 2020; Tama et al., 2023; Matania et al., 2024). Despite their predictive capability, many AI-based fault diagnosis models remain purely data-driven and function as black boxes. These models typically optimise statistical accuracy without explicitly incorporating the physical principles governing system behaviour. As a result, they may generate predictions that contradict known engineering relationships (for example, non-monotonic trends between vibration energy and fault severity), particularly under noisy, limited, or extrapolative data conditions. In applications where diagnostic outputs inform maintenance scheduling or operational decisions, such physically inconsistent behaviour may reduce practitioner confidence and hinder broader acceptance of AI-assisted monitoring systems. To address these limitations, increasing attention has been directed toward physics-informed and explainable AI paradigms, which seek to incorporate domain knowledge into learning algorithms while improving transparency in decision-making (Raissi et al., 2019; Karniadakis et al., 2021; Shen et al., 2021; Ni et al., 2023; Adadi and Berrada, 2018; Guidotti et al., 2019; Ahmed et al., 2022). 
By aligning predictions with established engineering principles and providing interpretable insights into model behaviour, these paradigms offer a structured approach to enhancing the reliability and credibility of AI-driven diagnostics, particularly in contexts where system integrity is critical (Kurd et al., 2007; Seshia et al., 2018; Wang and Chung, 2022).

1.2 Related work

Traditional vibration-based fault diagnosis relies primarily on signal processing techniques, including time-domain statistics, frequency-domain analysis, and higher-order features, to detect abnormal operating conditions. These approaches are physically interpretable and well established; however, their performance often depends on expert-defined thresholds and handcrafted rules, limiting scalability and adaptability in complex or variable operating environments (Gao et al., 2015; Zhang et al., 2017a). With the advancement of machine learning and deep learning, data-driven models have become widely adopted due to their ability to automatically extract discriminative representations from raw or processed vibration signals and achieve high predictive accuracy (Zhao et al., 2019; Lei et al., 2020; Villalonga et al., 2021; Tama et al., 2023; Wang et al., 2025). More recently, digital twin-driven and self-supervised learning paradigms have emerged to address label scarcity, domain shift, and representation robustness challenges in rotating machinery diagnostics (Zhang et al., 2025; Xu et al., 2025; Shi et al., 2025; Zhao et al., 2025). Time–frequency residual self-supervised architectures have further enhanced feature extraction capability for non-stationary vibration signals (Fan et al., 2025). Nevertheless, such models frequently lack explicit physical interpretability and may exhibit unstable or unphysical behaviour when exposed to distribution shifts or unseen operating conditions. Physics-informed machine learning has emerged as an intermediate paradigm that incorporates physical constraints or governing relationships into data-driven models, thereby improving structural consistency and, in some cases, generalisation performance (Raissi et al., 2019; Karniadakis et al., 2021; Shen et al., 2021; Ni et al., 2023; Lu et al., 2023; Jia et al., 2024). 
Many existing approaches embed physics through differential equations, architecture-level constraints, or residual learning mechanisms. While effective in specific contexts, such methods often focus on enforcing physical structure without systematically evaluating how learned feature relevance aligns with physical expectations. In parallel, explainable artificial intelligence (XAI) techniques have been introduced to enhance transparency by attributing model predictions to input features or decision pathways (Adadi and Berrada, 2018; Samek et al., 2019; Guidotti et al., 2019; Ahmed et al., 2022). However, explainability alone does not guarantee physical credibility. A model may provide coherent explanations while still relying on statistically dominant but physically inconsistent relationships. Consequently, although physics-informed learning and explainability have advanced independently, their integrated use for simultaneously enforcing physically admissible behaviour during optimisation and analysing feature relevance remains relatively underexplored in vibration-based fault severity estimation. Addressing this gap forms the central motivation of the present study.

1.3 Research gap and contributions

The preceding discussion highlights a methodological gap between high-accuracy black-box diagnostic models and the increasing demand for physically consistent and interpretable AI systems in safety-critical engineering applications. While deep learning approaches achieve strong predictive performance, they typically optimise statistical error without enforcing physically admissible behaviour. Conversely, physics-informed methods often embed governing relationships within model architectures or residual formulations but do not systematically evaluate whether the resulting feature utilisation aligns with physically meaningful indicators. Explainable AI techniques improve transparency but operate post hoc and do not constrain model behaviour during optimisation. As a result, physical consistency and interpretability are frequently addressed independently rather than within a unified learning framework. To bridge this gap, this study proposes a unified physics-informed and explainable artificial intelligence framework for vibration-based fault severity diagnosis. In contrast to architecture-level physics embedding approaches, such as PIResNet-based formulations (Ni et al., 2023), the proposed method enforces a physically motivated monotonic fault–energy relationship directly at the loss-function level through an explicit optimisation constraint. This loss-level formulation preserves architectural simplicity while structurally restricting the admissible prediction space during training. Furthermore, permutation-based explainability analysis is systematically integrated with the constrained learning formulation to evaluate whether feature importance patterns shift toward physically meaningful vibration indicators. This dual integration enables simultaneous enforcement of physical admissibility and quantitative assessment of feature-level alignment, rather than relying on interpretation alone.

The key contributions of this study are summarised as follows:

A reproducible physics-inspired synthetic vibration data framework that models fault severity through controlled amplitude modulation under realistic noise conditions, enabling systematic analysis of monotonic fault–energy behaviour.
A loss-level physics-informed neural network formulation that explicitly enforces monotonic ordering between vibration energy and predicted fault severity, improving behavioural consistency without modifying network architecture.
A comparative evaluation strategy incorporating convergence analysis, monotonicity violation metrics, cross-validation stability assessment, and global permutation-based explainability to quantify both predictive performance and structural consistency.

Through this integrated formulation, the proposed framework advances the development of structurally constrained and interpretable AI-based fault diagnosis. By embedding physically motivated ordering constraints directly within optimisation while retaining transparent feature-level analysis, the approach provides a coherent foundation for developing reliable vibration-based severity estimation models in engineering systems.

2. Methodology

2.1 Physics-inspired synthetic data generation

In vibration-based fault diagnosis, fault progression is typically manifested through systematic changes in signal amplitude, energy content, and impulsive characteristics resulting from material degradation, surface defects, or structural looseness. To rigorously evaluate the proposed physics-informed and explainable artificial intelligence framework under controlled and interpretable conditions, a physics-inspired synthetic vibration dataset is constructed. The objective of this formulation is not to replicate a specific industrial machine configuration, but to reproduce fundamental and broadly valid vibration–fault relationships observed across rotating mechanical systems.

The synthetic vibration signal is defined as:

$$x(t) = A_{k} \sin(2 π f_{0} t) + ϵ(t),$$

where $A_{k}$ denotes the amplitude scaling factor associated with fault severity level $k$, $f_{0}$ represents the base excitation frequency, and $ϵ(t) \sim N(0, σ^{2})$ is additive Gaussian noise introduced to emulate measurement uncertainty and environmental disturbances.

Fault severity is incorporated through controlled amplitude modulation, such that increasing severity levels correspond to progressively larger values of $A_{k}$. This modelling assumption is physically grounded: structural defects, crack propagation, and surface damage typically increase dynamic response magnitude, leading to higher vibration intensity and elevated signal energy. By maintaining a constant excitation frequency $f_{0}$ across all operating conditions, the formulation isolates severity-driven energy evolution and avoids confounding influences associated with speed variation or frequency modulation. Additive stochastic noise is included to prevent overly idealised signal behaviour and to emulate realistic sensor variability. The noise level is selected to ensure sufficient overlap between severity classes while preserving the expected monotonic energy trend, thereby creating a meaningful learning task rather than a trivially separable dataset. The dataset comprises three operating conditions (healthy, incipient fault, and severe fault), each associated with a distinct amplitude scaling factor $A_{k}$. For each condition, multiple independent signal realisations are generated to introduce statistical variability and support robust model training. All signal-generation parameters, including base frequency, sampling rate, signal duration, number of samples per class, and noise variance, are explicitly reported in Appendix A to ensure full reproducibility. The use of physics-inspired synthetic data represents a deliberate methodological choice. By isolating the fundamental monotonic relationship between vibration energy and fault severity, the synthetic framework enables systematic validation of the proposed physics-informed constraint without confounding influences from machine-specific dynamics, load variations, or sensor-dependent artefacts. 
While real-world vibration signals may exhibit non-stationarity and interacting fault modes, the present formulation encodes a broadly valid physical prior, namely that increasing damage severity induces increasing vibration energy, thereby providing a transparent and general foundation for evaluating physically consistent learning behaviour. As illustrated in Figure 1, fault progression is characterised by a systematic increase in vibration amplitude at a constant fundamental frequency, resulting in a corresponding monotonic rise in RMS energy while retaining realistic stochastic variability. The overall framework of the proposed physics-informed and explainable fault diagnosis methodology is presented in Figure 2.
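The signal model of this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' generator: the function name `generate_signals` and every numerical value (amplitudes 1, 2, 3; 50 Hz base frequency; 1 kHz sampling; 1 s duration; noise standard deviation 0.2; 100 realisations per class) are placeholder assumptions, since the actual parameters are those reported in Appendix A.

```python
import numpy as np

def generate_signals(amplitudes, f0=50.0, fs=1000.0, duration=1.0,
                     sigma=0.2, n_per_class=100, seed=0):
    """Return (signals, labels) for x(t) = A_k sin(2*pi*f0*t) + eps(t).

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs            # time axis in seconds
    signals, labels = [], []
    for k, amp in enumerate(amplitudes):              # k indexes the severity level
        clean = amp * np.sin(2.0 * np.pi * f0 * t)    # deterministic component
        noise = rng.normal(0.0, sigma, size=(n_per_class, t.size))
        signals.append(clean + noise)                 # one row per realisation
        labels.append(np.full(n_per_class, k))
    return np.concatenate(signals), np.concatenate(labels)

# Three conditions: healthy (k=0), incipient fault (k=1), severe fault (k=2).
signals, labels = generate_signals(amplitudes=(1.0, 2.0, 3.0))
```

Keeping `f0` fixed across classes mirrors the choice described above: only the amplitude factor changes with severity, so any energy difference between classes is attributable to $A_{k}$ and the noise.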

Figure 1

A multi-line graph shows synthetic vibration signals for Healthy, Incipient Fault, and Severe Fault over time.

The line graph is titled “Synthetic Vibration Signals”. The horizontal axis is labeled “Time (s)” and ranges from 0.0 to 1.0 in increments of 0.2 units. The vertical axis is labeled “Amplitude” and ranges from negative 3 to 3 in increments of 1 unit. A legend in the lower left identifies three lines: “Healthy”, “Incipient Fault”, and “Severe Fault”. The “Healthy” line oscillates rapidly with a consistent sinusoidal pattern, with amplitudes approximately between negative 1.5 and 1.5 across the entire time range from 0.0 to 1.0. The waveform shows regular frequency and smooth peaks and troughs without distortion. The “Incipient Fault” line also oscillates sinusoidally but with slightly higher amplitude variation, ranging approximately from negative 2 to 2. It exhibits mild irregularities compared to the healthy signal, with slightly uneven peaks and small fluctuations superimposed on the waveform. The “Severe Fault” line shows the largest amplitude and most irregular behavior, oscillating approximately between negative 3.5 and 3.5. The waveform displays noticeable distortion, sharper peaks, and uneven spacing, indicating higher vibration intensity and instability over time. Note: All numerical data values are approximated.

Physics-inspired synthetic vibration signals corresponding to healthy, incipient fault, and severe fault conditions. Fault progression is represented by a systematic increase in signal amplitude at a constant fundamental frequency, reflecting the physically consistent rise in vibration energy associated with increasing damage severity, while additive noise emulates realistic measurement and environmental disturbances

Figure 2

A flow diagram shows a physics informed fault diagnosis framework from data generation to feature importance evaluation.

The flow diagram is arranged from left to right, showing a sequential framework with branching and merging arrows. On the far left, a tall rounded rectangle labeled “Physics Inspired Synthetic Vibration Data (Healthy vertical line Incipient Fault vertical line Severe Fault)” has a solid rightward arrow running to a rounded rectangle labeled “Statistical Feature Extraction (R M S Energy, Peak to Peak Amplitude, Kurtosis)”. A solid rightward arrow runs to a small rectangle labeled “Train or Test Split”. From this box, a solid rightward arrow runs upward to a rounded rectangle labeled “Black Box Neural Network (M S E Loss Only)”. Simultaneously, a solid downward arrow runs from “Train or Test Split” to a rounded rectangle labeled “Physics Informed Neural Network (M S E plus lambda Monotonicity Constraint)”. To the left of this lower box, a rounded rectangle labeled “Monotonic Fault Energy Regularization (lambda weighted Loss Term)” has a solid rightward arrow running into the “Physics Informed Neural Network”. From both neural network boxes, solid rightward arrows run and merge into a rounded rectangle labeled “Performance Evaluation (M S E, M A E, R 2, Monotonicity Violation Rate, and Cross Validation Stability)”. A solid rightward arrow then runs to a final tall rounded rectangle labeled “Permutation Feature Importance (Black Box versus Physics Informed Comparison)”. At the bottom, a caption reads “Overall Framework of the Proposed Physics Informed and Explainable Fault Diagnosis Methodology”.

Overall framework of the proposed physics-informed and explainable fault diagnosis methodology. Physics-inspired synthetic vibration signals representing healthy, incipient fault, and severe fault conditions are first transformed into physically interpretable statistical features (RMS energy, peak-to-peak amplitude, and kurtosis). After train–test partitioning, both a conventional black-box neural network and a physics-informed neural network incorporating a λ-weighted monotonic fault–energy regularisation term are trained in parallel. Model performance is evaluated using regression metrics (MSE, MAE, R²), monotonicity violation rate, and cross-validation stability. Finally, permutation-based feature importance is applied to compare feature relevance and assess alignment between data-driven learning and embedded physical constraints

2.2 Feature extraction

Instead of relying directly on raw vibration signals, a set of physically meaningful statistical features is extracted to serve as inputs to the learning models. Feature-based representations are widely used in fault diagnosis because they effectively summarise key signal characteristics while preserving physical interpretability and reducing computational complexity. This approach also facilitates transparent reasoning and supports the integration of physics-informed constraints and explainability analysis. The root mean square (RMS) energy is employed to quantify the overall energy content of the vibration signal. RMS energy is directly linked to vibration intensity and is commonly used as an indicator of mechanical degradation, making it a suitable and physically intuitive feature for physics-informed modelling. The peak-to-peak amplitude captures the difference between the maximum and minimum signal values and reflects the severity of transient impacts and extreme oscillatory behaviours. This feature is particularly sensitive to localised faults, such as cracks or surface defects, which often generate high-amplitude responses. In addition, the kurtosis statistic is included to characterise signal impulsiveness and deviations from Gaussian behaviours. Elevated kurtosis values are typically associated with intermittent shock events and early-stage fault development, providing complementary information beyond energy-based measures. Table 1 summarises the extracted features along with their mathematical definitions and physical relevance to fault diagnosis. By selecting features with clear physical meaning, the proposed framework ensures that both the learning process and the subsequent explainability analysis remain firmly grounded in engineering principles, thereby enhancing interpretability and trust in the diagnostic outcomes.

Table 1

Extracted features, mathematical definitions, and physical interpretation

Feature	Mathematical definition	Physical relevance
RMS Energy	$\sqrt{\frac{1}{T} \int_{0}^{T} s {(t)}^{2} d t}$	Overall vibration energy and intensity
Peak-to-Peak	$\max (s) - \min (s)$	Severity of impacts and extreme oscillations
Kurtosis	$\frac{E [{(s - μ)}^{4}]}{σ^{4}}$	Signal impulsiveness and shock-related behaviours
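The three definitions in Table 1 translate directly into discrete-time computations on a sampled signal. The sketch below is an illustration under stated assumptions: the helper name `extract_features` is hypothetical, the integral is replaced by a sample mean, and kurtosis is computed in its non-excess form (a Gaussian signal gives 3), which matches the table's definition.

```python
import numpy as np

def extract_features(signal):
    """Return [RMS, peak-to-peak, kurtosis] for a 1-D vibration signal."""
    s = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(s ** 2))                      # discrete form of the RMS integral
    peak_to_peak = s.max() - s.min()                    # max(s) - min(s)
    centred = s - s.mean()
    variance = np.mean(centred ** 2)                    # population variance sigma^2
    kurtosis = np.mean(centred ** 4) / variance ** 2    # E[(s - mu)^4] / sigma^4
    return np.array([rms, peak_to_peak, kurtosis])

# Sanity check on a noise-free unit sine sampled over whole periods.
t = np.arange(1000) / 1000.0
features = extract_features(np.sin(2.0 * np.pi * 50.0 * t))
```

For a pure unit-amplitude sine the known closed-form values are an RMS of $1/\sqrt{2}$, a peak-to-peak amplitude of 2 and a kurtosis of 1.5, which gives a quick way to check an implementation before applying it to noisy signals.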

2.3 Neural network models

2.3.1 Black-box neural network

As a baseline for comparison, a conventional feedforward neural network is implemented to represent a purely data-driven fault diagnosis model. The network consists of fully connected layers with nonlinear activation functions, enabling it to approximate nonlinear relationships between the extracted vibration features and the corresponding fault severity.

Let $x = [\mathrm{RMS}, \text{Peak-to-Peak}, \mathrm{Kurtosis}]^{⊤}$ denote the input feature vector and $\hat{y} = f_{θ}(x)$ represent the predicted fault severity, where $f_{θ}(\cdot)$ denotes the neural network parameterised by weights $θ$. The output layer produces a continuous scalar estimate, allowing regression-based evaluation.

The model is trained by minimising the mean squared error (MSE):

$$L_{MSE} = \frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2},$$

where $y_{i}$ and $\hat{y}_{i}$ denote the true and predicted fault severity values for sample $i$, and $N$ is the number of training samples.

Parameter optimisation is performed using a gradient-based learning algorithm over multiple epochs until stable convergence is achieved. This unconstrained neural network serves as a reference model for assessing the behaviour of purely data-driven learning, particularly with respect to physical consistency and generalisation.

A deliberately compact architecture is adopted to ensure that any observed performance differences arise from the learning formulation rather than architectural complexity. This design choice enables a clear and controlled comparison with the physics-informed model.
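A compact baseline of this kind can be sketched with plain NumPy. The configuration below (one hidden layer of 8 tanh units, full-batch gradient descent, learning rate 0.05) and the names `init_mlp`, `forward` and `mse_step` are assumptions made for illustration only; the actual architecture and optimiser are those listed in Appendix A.

```python
import numpy as np

def init_mlp(n_in=3, n_hidden=8, seed=0):
    """Random initial weights for a one-hidden-layer regressor (sizes are assumed)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Return predictions y_hat (shape N) and hidden activations H (shape N x hidden)."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    y_hat = (H @ params["W2"] + params["b2"]).ravel()
    return y_hat, H

def mse_step(params, X, y, lr=0.05):
    """One full-batch gradient-descent step on L_MSE; returns the loss before the update."""
    y_hat, H = forward(params, X)
    err = y_hat - y
    loss = np.mean(err ** 2)
    d_out = (2.0 / len(y)) * err[:, None]                   # dL/dy_hat, shape (N, 1)
    d_hidden = (d_out @ params["W2"].T) * (1.0 - H ** 2)    # back-propagate through tanh
    grads = {"W2": H.T @ d_out, "b2": d_out.sum(axis=0),
             "W1": X.T @ d_hidden, "b1": d_hidden.sum(axis=0)}
    for name in params:
        params[name] -= lr * grads[name]
    return loss

# Toy usage: random stand-ins for standardised [RMS, peak-to-peak, kurtosis] features.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))
y = 0.5 * X[:, 0] + 0.1
params = init_mlp()
losses = [mse_step(params, X, y) for _ in range(200)]
```

In practice an automatic-differentiation framework would replace the hand-written gradients; the manual version is shown only to keep the example dependency-free.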

2.3.2 Physics-informed neural network

To incorporate physical knowledge into the learning process, the neural network is augmented with a physics-based constraint that promotes a monotonic relationship between vibration energy and predicted fault severity. In vibration-based condition monitoring, increasing fault severity is generally associated with increasing vibration energy. This physical expectation is embedded directly into the optimisation objective by penalising prediction pairs that violate the prescribed ordering between RMS energy and predicted severity.

Let $x_{1, i}$ denote the RMS energy of sample $i$, and $\hat{y}_{i}$ denote the predicted severity. For ordered sample pairs $(i, j)$ satisfying:

$$x_{1, i} > x_{1, j},$$

the corresponding predictions are required to satisfy:

$$\hat{y}_{i} \geq \hat{y}_{j}.$$

Any violation of this monotonic ordering is penalised through the physics-based loss term:

$$L_{physics} = \frac{1}{|Ω|} \sum_{(i, j) \in Ω} \max(0, \hat{y}_{j} - \hat{y}_{i}),$$

where

$$Ω = \{(i, j) \mid x_{1, i} > x_{1, j}\}$$

denotes the set of valid ordered sample pairs within the batch, and $|Ω|$ represents the number of such pairs.

The penalty is strictly positive only when the predicted ordering contradicts the RMS ordering, i.e., when $\hat{y}_{i} < \hat{y}_{j}$ despite $x_{1, i} > x_{1, j}$. When the monotonic condition is satisfied, the contribution of the pair to $L_{physics}$ is zero. Importantly, the constraint is evaluated on predicted outputs $\hat{y}$, not on ground-truth labels. It therefore directly restricts the learned mapping during optimisation rather than merely reflecting the monotonic structure of the synthetic data. To ensure computational tractability, the constraint is evaluated within each training mini-batch rather than across the entire dataset. Equal-RMS pairs are excluded from the constraint formulation to avoid ambiguity in ordering.

The total training objective is defined as:

$$L_{total} = L_{MSE} + λ L_{physics},$$

where $λ \geq 0$ is a weighting coefficient controlling the trade-off between predictive accuracy and physical consistency. The network architecture and optimisation procedure remain identical to those of the black-box model. The sole modification is the inclusion of the physics-based regularisation term during training. By embedding the monotonicity constraint directly into the objective function, the model is encouraged to learn predictive mappings that respect the prescribed physical ordering while retaining flexibility to model nonlinear relationships among features. The additional computation is confined to the training phase and introduces no inference-time overhead.
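The pairwise penalty and the combined objective can be evaluated for a single mini-batch as sketched below. The function names `monotonicity_loss` and `total_loss` are hypothetical, and returning zero when a batch contains no valid ordered pair is an assumption, since the text does not specify that case. Only the loss values are computed here; during training the gradients would typically come from an automatic-differentiation framework.

```python
import numpy as np

def monotonicity_loss(rms, y_hat):
    """Mean hinge penalty over ordered pairs (i, j) with rms[i] > rms[j]."""
    rms = np.asarray(rms, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    omega = rms[:, None] > rms[None, :]        # omega[i, j] is True when x_{1,i} > x_{1,j}
    if not omega.any():                        # assumed convention: empty pair set -> 0
        return 0.0
    # violation[i, j] = max(0, y_hat[j] - y_hat[i])
    violation = np.maximum(0.0, y_hat[None, :] - y_hat[:, None])
    return float(violation[omega].mean())

def total_loss(y, y_hat, rms, lam):
    """L_total = L_MSE + lam * L_physics for one mini-batch."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    mse = float(np.mean((y - y_hat) ** 2))
    return mse + lam * monotonicity_loss(rms, y_hat)

# Worked example: the third sample has the highest RMS but a lower prediction
# than the second, so exactly one of the three ordered pairs is violated.
rms_batch = [0.7, 1.4, 2.1]
pred_batch = [0.0, 1.2, 1.0]
penalty = monotonicity_loss(rms_batch, pred_batch)           # 0.2 / 3
objective = total_loss([0.0, 1.0, 2.0], pred_batch, rms_batch, lam=0.5)
```

Because the strict comparison `>` defines the pair set, equal-RMS pairs never enter the mean, which is consistent with their exclusion in the formulation above. The pairwise matrix grows quadratically with batch size, which is one practical reason for restricting the constraint to mini-batches.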

2.3.3 Role of physics constraints beyond post-hoc explainability

Explainable artificial intelligence techniques provide valuable insight into feature relevance and decision pathways; however, interpretability alone does not guarantee physically meaningful behaviour. A model may provide transparent explanations while still producing predictions that contradict established engineering relationships. In the unconstrained black-box model, feature importance rankings reflect statistical influence on prediction accuracy, which may not align with physically expected behaviour. In contrast, the proposed physics-informed formulation penalises physically inadmissible orderings directly during optimisation. Because the constraint enters as a weighted penalty rather than a hard restriction, it steers predictions toward consistency with the monotonic fault–energy relationship while still permitting flexible nonlinear modelling. Physics-informed learning and explainability therefore serve complementary roles: explainability enhances transparency, while physics-based constraints support credibility. Their integration enables diagnostic models that are not only interpretable but also physically grounded.
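For reference, global permutation feature importance can be sketched as follows. This is a generic illustration, not the authors' exact procedure: the function name, the use of the increase in MSE as the score, and the number of repeats are all assumptions.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the target, leaving the rest intact.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        importances[col] = np.mean(increases)
    return importances

# Toy check: a predictor that depends only on the first feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0]
scores = permutation_importance(lambda Z: 2.0 * Z[:, 0], X, y)
```

In the toy check only the first column receives a positive score, since shuffling the other two leaves the predictions unchanged. Applied to both trained models, the same routine would allow the relative weight placed on RMS energy to be compared between the black-box and physics-informed networks.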

2.3.4 Evaluation protocol

To ensure transparency and reproducibility in performance assessment, a structured evaluation protocol is adopted. The complete dataset is first randomly divided into training and testing subsets using an 80–20 split. The testing set is held out entirely during training and is used exclusively for final performance evaluation. Within the training subset, five-fold cross-validation is performed to assess model stability under varying training–validation partitions. In each fold, four subsets are used for optimisation and one subset is used for validation. The reported cross-validation results reflect variability in validation performance and are not used for hyperparameter tuning. All regression metrics presented in Section 3.3 are computed on the held-out test set to ensure unbiased comparison between the black-box and physics-informed models.
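The partitioning described above can be sketched with index arrays alone. The helper names, the seeds and the total of 300 samples are illustrative assumptions; the key property is that cross-validation folds are drawn only from the training subset, so the test indices are never seen during fitting or validation.

```python
import numpy as np

def train_test_indices(n_samples, test_fraction=0.2, seed=0):
    """Random 80-20 style split; returns (train_idx, test_idx)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]

def k_fold_indices(train_idx, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs drawn only from the training subset."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(train_idx), k)
    for i in range(k):
        fit_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield fit_idx, folds[i]

train_idx, test_idx = train_test_indices(300)     # 300 samples is an assumed total
splits = list(k_fold_indices(train_idx))          # five (fit, validation) pairs
```

Each fold then trains on four fifths of the training subset and validates on the remaining fifth, and the held-out test indices are used once, for the final metrics.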

2.3.4.1 Implementation and reproducibility note

Detailed architectural configurations, hyperparameter settings, and training procedures for both models are provided in Appendix A to ensure methodological transparency and full reproducibility.

3. Results and discussion

3.1 Feature–fault relationship analysis

Prior to model training, the relationship between vibration energy and fault severity is examined to establish a physically grounded reference for subsequent modelling. Figure 3 presents the distribution of RMS vibration energy for the healthy, incipient fault, and severe fault conditions. A clear and systematic increase in RMS energy with increasing fault severity is observed. The median values rise progressively across severity levels, and the interquartile ranges exhibit limited overlap. This trend reflects the expected physical behaviour whereby increasing structural degradation leads to amplified dynamic response and higher vibration energy. Such monotonic behaviour is consistent with established vibration theory and confirms that the synthetic data formulation preserves a physically meaningful severity progression. The relatively compact dispersion within each class further indicates statistical consistency in the data generation process while maintaining distinguishable separation between fault levels. This controlled yet physically coherent structure supports the use of RMS energy as a physically motivated reference variable for modelling severity progression. Importantly, the observed monotonic trend provides the empirical basis for introducing the monotonicity constraint within the physics-informed learning formulation. By verifying that RMS energy increases systematically with fault severity in the dataset, the subsequent enforcement of a monotonic fault–energy relationship during training is both physically justified and statistically aligned with the constructed data. These findings therefore establish a principled foundation for incorporating RMS energy as the ordering variable in the proposed physics-informed framework while maintaining methodological transparency.
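The monotonic trend can also be anticipated analytically from the signal model of Section 2.1. The short derivation below rests on assumptions not stated explicitly in the text: the noise is zero-mean with variance $σ^{2}$ and independent of the sinusoid, and the record length $T$ spans an integer number of periods of $f_{0}$.

```latex
\begin{aligned}
\mathbb{E}\!\left[\frac{1}{T}\int_{0}^{T} x(t)^{2}\,dt\right]
 &= \frac{A_{k}^{2}}{T}\int_{0}^{T}\sin^{2}(2\pi f_{0} t)\,dt
  + \frac{2A_{k}}{T}\int_{0}^{T}\sin(2\pi f_{0} t)\,\mathbb{E}[\epsilon(t)]\,dt
  + \frac{1}{T}\int_{0}^{T}\mathbb{E}\!\left[\epsilon(t)^{2}\right]dt \\
 &= \frac{A_{k}^{2}}{2} + 0 + \sigma^{2},
\qquad\text{so}\qquad
\mathrm{RMS}_{k} \approx \sqrt{\frac{A_{k}^{2}}{2} + \sigma^{2}} .
\end{aligned}
```

Since this expression is strictly increasing in $A_{k}$ for $A_{k} > 0$ at any fixed noise level, larger amplitude factors imply larger expected RMS energy. As a purely hypothetical illustration, $A_{k} = 1, 2, 3$ with $σ = 0.2$ would give RMS values of roughly 0.73, 1.43 and 2.13.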

Figure 3

A scatter plot shows R M S energy increasing from healthy to mild to severe fault conditions.

The scatter plot titled “R M S Energy versus Fault Severity” is drawn on a coordinate plane. The horizontal axis lists three categories from left to right: “Healthy”, “Mild”, and “Severe”. The vertical axis is labeled “R M S Energy” and ranges from 0.8 to 2.2 in increments of 0.2 units. Three clustered points appear, one above each category, with small horizontal line markers indicating central tendency. For “Healthy”, the value is 0.73. For “Mild”, the value is 1.43. For “Severe”, the value is 2.13. The values increase steadily from left to right, showing a monotonic rise in R M S energy with increasing fault severity. Note: All numerical data values are approximated.

Distribution of RMS vibration energy across healthy, mild (incipient fault), and severe fault conditions. The results demonstrate a clear monotonic increase in RMS energy with increasing fault severity, reflecting the physically consistent amplification of vibration intensity associated with progressive damage and validating RMS energy as a meaningful indicator for physics-informed fault diagnosis

3.2 Model training behaviour

Figure 4 compares the training loss convergence of the black-box neural network and the physics-informed neural network. The black-box model exhibits faster initial convergence, characterised by rapid reduction of the data-driven loss during the early training epochs. This behaviour is expected, as optimisation is driven solely by minimisation of prediction error without additional structural constraints. In contrast, the physics-informed model demonstrates a more gradual and smoother convergence trajectory. The inclusion of the physics-based penalty introduces an additional objective term that restricts the admissible solution space to regions consistent with the prescribed monotonic fault–energy relationship. As a result, optimisation proceeds under a combined objective that balances predictive accuracy with physical consistency. Such moderated convergence behaviour is characteristic of constrained learning formulations, where optimisation is guided by multiple objectives rather than error minimisation alone. Rather than aggressively minimising training loss, the physics-informed model converges toward solutions that jointly satisfy data fidelity and the imposed ordering constraint. Both models achieve stable convergence without oscillatory or divergent behaviour, indicating well-conditioned optimisation dynamics under the selected training configuration. The smoother loss evolution observed in the physics-informed formulation reflects the regularising influence of the embedded physical prior on the optimisation landscape. By restricting the search space to physically admissible regions, the constraint reduces the likelihood of learning solutions that violate the expected output ordering while preserving nonlinear representational capacity. 
Although comprehensive assessment of generalisation requires evaluation across additional datasets and operating conditions, the observed training behaviour is consistent with dynamics typically associated with structured regularisation in constrained learning settings.

Figure 4

A line graph shows training loss convergence for Black-box NN and Physics-Informed NN across epochs.

The line graph is titled “Training Loss Convergence”. The horizontal axis is labeled “Epoch” and ranges from 0 to 40 in increments of 5 units. The vertical axis is labeled “Training Loss (MSE)” and ranges from 0.000 to 0.175 in increments of 0.025. A legend in the upper right identifies two lines: “Black-box NN” and “Physics-Informed NN”. The “Black-box NN” line starts at (0, 0.100), drops sharply to around (1, 0.040), then decreases steadily through (2, 0.025), (3, 0.015), (4, 0.010), and (5, 0.006). It continues declining toward near zero by (8, 0.002), reaching (10, 0.001), and then remains nearly flat at approximately 0.000 to 0.001 from epochs 10 through 40. The “Physics-Informed NN” line starts higher at (0, 0.190), drops rapidly to around (1, 0.100), then to (2, 0.050), and further to (3, 0.032). It slightly increases to around (5, 0.050) and peaks near (6, 0.052), then declines again to (8, 0.040), (10, 0.028), and (12, 0.025). From there, it fluctuates mildly with small oscillations: around (15, 0.030), (18, 0.030), (20, 0.025), (25, 0.022), (30, 0.024), (35, 0.022), and ends near (39, 0.025). Note: All numerical data values are approximated.

Training loss convergence of the black-box neural network and the physics-informed neural network. The black-box model exhibits faster initial loss minimisation due to unconstrained optimisation, whereas the physics-informed model converges more gradually with a smoother trajectory as a result of the embedded physics-based constraint. Both models achieve stable convergence, highlighting reliable training dynamics while illustrating the trade-off between convergence speed and physical consistency

It is important to emphasise that a lower training loss does not necessarily imply better predictive performance. The black-box neural network aggressively minimises the data-driven loss and therefore achieves a near-zero training error, a behaviour that can indicate overfitting to the training data. In contrast, the physics-informed model incorporates an additional physics-based constraint, which intentionally limits over-optimisation of the training loss and acts as an explicit form of regularisation. As shown in Table 2, this regularisation improves generalisation, with the physics-informed model achieving lower test error and higher explanatory power despite converging to a higher training loss. This trade-off between training accuracy and generalisation performance is well aligned with established principles of constrained and regularised learning.

Table 2

Quantitative performance comparison between black-box and physics-informed models

Model	MSE	MAE	R²
Black-box neural network	0.021	0.114	0.92
Physics-informed neural network	0.016	0.096	0.95

3.3 Prediction accuracy comparison

The predictive performance of the two models is compared in Figure 5 using a scatter plot of true versus predicted fault severity values. The black-box model achieves reasonable overall accuracy; however, noticeable dispersion is observed, particularly at higher severity levels. While the numerical errors remain within acceptable limits, the increased scatter suggests less stable mapping between input features and predicted severity under unconstrained optimisation. In contrast, the physics-informed model produces predictions that are more tightly aligned with the ideal one-to-one relationship. The reduced dispersion indicates improved consistency in the learned mapping. This behaviour suggests that embedding the monotonic physics constraint acts as an effective regulariser, limiting the model's tendency to exploit incidental patterns in the training data and encouraging solutions that are both accurate and physically admissible. To complement the visual assessment, quantitative regression metrics are reported in Table 2 for both models, evaluated on the same held-out test dataset.

Figure 5

A scatter plot compares predicted vs true fault severity for Black-box AI and Physics-Informed AI.

The scatter plot is titled “Model Prediction Comparison”. The horizontal axis is labeled “True Fault Severity” and ranges from 0.0 to 1.0 in increments of 0.2 units. The vertical axis is labeled “Predicted Fault Severity” and ranges from 0.0 to 1.0 in increments of 0.2 units. A legend in the upper left identifies two data series: “Black-box AI” and “Physics-Informed AI”. A dashed diagonal reference line runs from (0.0, 0.0) to (1.0, 1.0). For true fault severity 0.0, the “Black-box AI” points are clustered tightly near (0.0, 0.0) with values between negative 0.01 and 0.01. The “Physics-Informed AI” points are slightly more spread, ranging approximately from negative 0.03 to 0.07. For true fault severity 0.5, the “Black-box AI” points range approximately from (0.5, 0.42) to (0.5, 0.58), showing moderate spread around the diagonal. The “Physics-Informed AI” points range approximately from (0.5, 0.48) to (0.5, 0.59), showing tighter clustering closer to the reference line. For true fault severity 1.0, the “Black-box AI” points range approximately from (1.0, 0.98) to (1.0, 1.08), slightly exceeding the perfect prediction line at the upper end. The “Physics-Informed AI” points range approximately from (1.0, 0.88) to (1.0, 1.02), showing slightly wider spread below the diagonal. Note: All numerical data values are approximated.

Comparison of true versus predicted fault severity for the black-box and physics-informed models. The physics-informed model exhibits predictions more closely aligned with the ideal one-to-one relationship, with reduced scatter across severity levels, indicating improved physical consistency and regularisation compared to the unconstrained black-box model

The physics-informed model achieves lower mean squared error (MSE) and mean absolute error (MAE), along with improved coefficient of determination (R²), indicating enhanced predictive consistency and explanatory power. Importantly, these gains are obtained without increasing model complexity, as both architectures remain identical. The observed improvement therefore reflects the influence of the embedded physics constraint rather than architectural variation. Overall, the results demonstrate that enforcing physically meaningful structure within the optimisation objective can improve predictive stability while maintaining competitive accuracy, thereby strengthening the reliability of AI-driven fault severity estimation.
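The three metrics named above follow their standard definitions. As a minimal sketch in plain Python (the function name `regression_metrics` and the toy values are illustrative assumptions, not the study's code or data), they can be computed as:

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for two equal-length sequences of severities."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n        # mean squared error
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mse, mae, r2


# Toy severities (hypothetical, for illustration only).
y_true = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
y_pred = [0.1, 0.0, 0.4, 0.5, 0.9, 1.0]
mse, mae, r2 = regression_metrics(y_true, y_pred)
```

Because R² normalises the residual error by the variance of the true severities, it is the only one of the three that depends on how the severity labels are spread in the test set.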

3.3.1 Sensitivity to physics-loss weight

To evaluate the influence of the physics-loss weighting coefficient $λ$, additional experiments are conducted under varying constraint strengths. When $λ = 0$, the formulation reduces to the unconstrained black-box model. As $λ$ increases, greater emphasis is placed on enforcing the monotonic fault–energy relationship during optimisation.
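The role of $λ$ can be illustrated with a small sketch of a combined objective. The pairwise hinge form of the penalty used here is an assumption made for illustration; the function name `combined_loss` and the toy values are likewise hypothetical. In practice such a term would be written in an automatic-differentiation framework so that it can be minimised by gradient descent; plain Python is used here only to show the quantity being computed.

```python
def combined_loss(y_true, y_pred, rms, lam):
    """Data loss (MSE) plus lam times a pairwise monotonicity penalty.

    Assumed penalty form: for every pair with rms[i] > rms[j], add
    max(0, y_pred[j] - y_pred[i]), which is positive only when the
    higher-energy sample receives the lower predicted severity.
    """
    n = len(y_true)
    data_loss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    penalties = []
    for i in range(n):
        for j in range(n):
            if rms[i] > rms[j]:
                penalties.append(max(0.0, y_pred[j] - y_pred[i]))
    physics_loss = sum(penalties) / len(penalties) if penalties else 0.0
    return data_loss + lam * physics_loss, data_loss, physics_loss


# Toy values (hypothetical): the third sample has the highest RMS energy
# but a lower predicted severity than the second, so one pair is penalised.
rms = [0.2, 0.5, 0.9]
y_true = [0.0, 0.5, 1.0]
y_pred = [0.1, 0.6, 0.5]

total_unconstrained, data_loss, physics_loss = combined_loss(y_true, y_pred, rms, lam=0.0)
total_constrained, _, _ = combined_loss(y_true, y_pred, rms, lam=0.10)
```

With `lam=0.0` the total equals the data loss alone, matching the reduction to the unconstrained model; any positive `lam` adds a cost proportional to the size of the ordering violations.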

For robustness assessment, five-fold cross-validation is performed on the training subset only, using fixed random seeds to ensure reproducibility. For each value of $λ$, the model is trained across the cross-validation folds, and final predictive performance is evaluated on the held-out test set. The reported metrics correspond to averaged test performance across folds. The resulting trends in predictive accuracy and physical consistency are summarised in Table 3.

Table 3

Sensitivity analysis with respect to physics-loss weight

$λ$	Test MSE	Monotonicity violation rate (%)
0 (Black-box)	0.021	3.2
0.05	0.018	0.8
0.10	0.016	0.2
0.20	0.017	0.1

The results indicate that moderate values of $λ$ substantially reduce monotonicity violations while maintaining or slightly improving predictive accuracy: the violation rate falls from 3.2% at $λ = 0$ to 0.2% at $λ = 0.10$, while test MSE decreases from 0.021 to 0.016. When $λ = 0$, physically inconsistent predictions are observed, as reflected by a measurable violation rate. As the constraint strength increases, the violation rate decreases systematically, demonstrating the direct influence of the physics-based regularisation on output ordering. At the highest value tested ($λ = 0.20$), a slight increase in regression error is observed (test MSE of 0.017), reflecting the expected trade-off between predictive flexibility and constraint enforcement. This behaviour aligns with principles of constrained optimisation, where stronger regularisation can restrict the model's ability to minimise data-driven loss fully. Overall, the formulation exhibits stable behaviour across a practical range of $λ$ values. While performance varies moderately with the weighting coefficient, the results suggest that intermediate constraint strengths provide a balanced compromise between predictive accuracy and physical consistency without requiring excessively fine-grained hyperparameter tuning.

3.3.2 Comparison with a simple monotonic baseline

To assess whether the proposed physics-informed neural network (PINN) is necessary beyond enforcing monotonicity alone, a simple monotonic baseline is introduced for comparison. Specifically, isotonic regression is applied using RMS energy as the sole input feature. Isotonic regression constructs a non-decreasing mapping between RMS energy and fault severity, thereby guaranteeing (non-strict) monotonicity without incorporating additional statistical features.

The isotonic model is trained on the same training data used for the neural networks and evaluated on the corresponding test set. Because isotonic regression yields a non-decreasing function by construction, its monotonicity violation rate is necessarily zero. However, its predictive capability is limited by the use of a single feature and the absence of multivariate nonlinear representation capacity.
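For readers unfamiliar with the baseline, the following is a minimal sketch of isotonic regression using the pool-adjacent-violators algorithm in plain Python. It is not the implementation used in the study; the function names, the step-function prediction rule, and the toy values are illustrative assumptions, and the sketch assumes distinct RMS values. Library implementations (for example, scikit-learn's `IsotonicRegression`) are the usual choice in practice.

```python
def isotonic_fit(x, y):
    """Fit a non-decreasing step function by pool-adjacent-violators.

    Returns a list of (largest x in block, fitted value) pairs in
    increasing x order. Assumes the x values are distinct.
    """
    order = sorted(range(len(x)), key=lambda k: x[k])
    blocks = []  # each block: [sum of y, count, largest x in block]
    for k in order:
        blocks.append([y[k], 1, x[k]])
        # Pool backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c, x_max = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = x_max
    return [(b[2], b[0] / b[1]) for b in blocks]


def isotonic_predict(model, x_new):
    """Return the fitted value of the first block whose upper edge covers x_new."""
    for upper_edge, value in model:
        if x_new <= upper_edge:
            return value
    return model[-1][1]  # beyond the largest training x: use the last block


# Toy data (hypothetical): the third sample breaks the ordering and is pooled
# with the second, giving both the average value 0.25.
rms_train = [0.1, 0.2, 0.3, 0.4, 0.5]
severity_train = [0.0, 0.3, 0.2, 0.6, 1.0]
model = isotonic_fit(rms_train, severity_train)
fitted = [isotonic_predict(model, v) for v in rms_train]
```

The pooling step is what removes ordering violations: wherever the raw targets decrease with increasing RMS energy, the offending samples are replaced by their mean, so the fitted values can never decrease.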

Table 4 summarises the comparative results on the synthetic dataset.

Table 4

Comparison with simple monotonic baseline (synthetic dataset)

Model	Test MSE	R²	Monotonicity violation rate (%)
Isotonic regression (RMS only)	0.029	0.87	0.0
Black-box neural network	0.021	0.92	3.2
Physics-informed neural network	0.016	0.95	0.2

The isotonic baseline achieves perfect monotonicity but exhibits higher regression error compared to both neural network models. This result indicates that strict enforcement of monotonic behaviour alone does not guarantee optimal predictive accuracy. The black-box neural network achieves lower error than the isotonic model but produces physically inconsistent predictions, as reflected by the non-zero violation rate. In contrast, the physics-informed neural network achieves both reduced prediction error and substantially improved monotonic consistency.

These results demonstrate that the proposed formulation is not merely enforcing monotonicity in a trivial manner. Rather, it integrates structural physical constraints within a multivariate nonlinear learning framework, enabling improved predictive accuracy while maintaining near-monotonic behaviour. The comparison confirms that the physics-informed model provides a balanced compromise between accuracy and physical admissibility, whereas simpler monotonic models sacrifice representational flexibility and unconstrained models sacrifice physical consistency.

3.4 Physics consistency validation

Figure 6 illustrates the relationship between predicted fault severity and normalised RMS vibration energy to assess adherence to the expected monotonic fault–energy relationship. For visualisation purposes, RMS values are scaled to the interval [0,1] using min–max normalisation computed on the test dataset. For the physics-informed model, predicted severity increases progressively with increasing RMS energy across the evaluated range. The resulting trend is consistent with the monotonic ordering embedded within the training objective. In contrast, the unconstrained black-box model exhibits minor local inconsistencies, particularly in regions where feature distributions overlap. Although these deviations may not significantly affect aggregate regression metrics, they can produce physically implausible patterns, such as lower predicted severity at higher vibration energy levels. In safety-critical diagnostic settings, even limited violations of known physical behaviour may reduce confidence in model reliability. The preservation of monotonic ordering in the physics-informed model demonstrates that physical admissibility can be incorporated within flexible neural network architectures without modifying model structure or degrading predictive capability. Rather than restricting representational capacity, the embedded constraint guides optimisation toward solutions that remain consistent with established engineering expectations.

Figure 6

A scatter plot shows predicted fault severity versus normalized RMS energy with three clustered regions.

The scatter plot is titled “Physics Consistency Validation”. The horizontal axis is labeled “Normalized RMS Energy” and ranges from negative 1.0 to 1.0 in increments of 0.5 units. The vertical axis is labeled “Predicted Fault Severity” and ranges from 0.0 to 1.0 in increments of 0.2 units. Three distinct clusters of data points are visible along the horizontal axis. The left cluster is centered near normalized RMS energy around negative 1.1. The predicted fault severity values in this region range from negative 0.01 to 0.04, with most points tightly concentrated near 0.0. The middle cluster is centered near normalized RMS energy around 0.0. The predicted fault severity values range from 0.35 to 0.58, with most points concentrated between 0.40 and 0.50. A single higher point appears near approximately (0.0, 0.67). The right cluster is centered near normalized RMS energy around 1.2. The predicted fault severity values range from 0.85 to 1.02, with most points concentrated between 0.88 and 0.98. Note: All numerical data values are approximated.

Physics consistency validation illustrating the relationship between predicted fault severity and normalised RMS vibration energy. The physics-informed model exhibits a clear monotonic dependence, with substantially reduced violations compared to the unconstrained black-box model. This behaviour indicates that the imposed physics-based constraint effectively promotes physically consistent fault–energy relationships during inference while preserving predictive flexibility

3.5 Quantitative monotonicity assessment

To complement the visual inspection, a quantitative monotonicity assessment is performed. For all ordered sample pairs $(i, j)$ satisfying:

$$x_{1,i} > x_{1,j},$$

where $x_{1}$ denotes RMS energy, a violation is recorded if the predicted severities fail to satisfy:

$$\hat{y}_{i} \geq \hat{y}_{j}.$$

The monotonicity violation rate is defined as:

$$\text{Violation Rate} = \frac{\sum_{(i,j) \in Ω} I(\hat{y}_{i} < \hat{y}_{j})}{|Ω|},$$

where $Ω = \{(i, j) \mid x_{1,i} > x_{1,j}\}$ and $I(\cdot)$ is the indicator function.
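The definition translates directly into a double loop over sample pairs. The sketch below is a plain-Python illustration (the function name and toy values are assumptions, not the study's code); it returns a fraction, which is multiplied by 100 to obtain the percentages reported in the tables.

```python
def monotonicity_violation_rate(rms, y_pred):
    """Fraction of ordered pairs with rms[i] > rms[j] but y_pred[i] < y_pred[j]."""
    violations = 0
    total = 0
    n = len(rms)
    for i in range(n):
        for j in range(n):
            if rms[i] > rms[j]:              # pair (i, j) belongs to the set Omega
                total += 1
                if y_pred[i] < y_pred[j]:    # indicator function equals 1
                    violations += 1
    return violations / total if total else 0.0


# Toy values (hypothetical): of the six ordered pairs, only the pair formed by
# the third and second samples violates the expected ordering.
rms = [0.2, 0.5, 0.9, 1.1]
y_pred = [0.1, 0.6, 0.5, 0.9]
rate = monotonicity_violation_rate(rms, y_pred)
rate_percent = 100.0 * rate
```

Pairs with equal RMS energy are excluded from the set by the strict inequality, and equal predictions do not count as violations, in line with the condition $\hat{y}_{i} \geq \hat{y}_{j}$.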

As reported in Table 3, the unconstrained black-box model exhibits a small but measurable violation rate, indicating occasional inconsistencies with the expected fault–energy ordering. In contrast, the physics-informed formulation substantially reduces these violations across the test set. Although the baseline violation rate is limited, its reduction confirms that the embedded constraint meaningfully influences the learned mapping rather than merely reflecting dataset structure.

This quantitative evaluation reinforces the conclusion that incorporating physically motivated ordering constraints during optimisation enhances behavioural consistency and strengthens the credibility of AI-based fault severity estimation under the evaluated conditions.

3.6 Robustness analysis

The robustness of the proposed framework is evaluated using five-fold cross-validation, as illustrated in Figure 7. In this procedure, the dataset is partitioned into five subsets, with each fold serving once as a validation set while the remaining folds are used for training. This approach provides a systematic assessment of model stability under varying training–testing splits. The distribution of mean squared error (MSE) values across the validation folds exhibits a relatively narrow spread, indicating consistent predictive performance across different data partitions. No extreme deviations are observed, and the variance in error metrics remains limited across folds. This behaviour suggests that the model does not rely heavily on any specific subset of the data and maintains stable performance under moderate sampling variability. Such consistency is particularly relevant in fault diagnosis applications, where real-world data may be influenced by changing operating conditions, measurement noise, or limited data availability. A model that performs consistently across multiple validation partitions is more likely to generalise reliably when deployed in practical environments. Overall, the cross-validation results indicate that the proposed framework achieves stable and repeatable performance, supporting its robustness and practical applicability for vibration-based fault severity estimation.
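The partitioning step described above can be sketched as follows. This is an illustrative plain-Python version under stated assumptions: the function name `k_fold_indices`, the seed value, and the sample count are hypothetical, and the study's actual splitting routine is not specified here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, validation_indices) splits of range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # fixed seed for reproducibility
    folds = [indices[f::k] for f in range(k)]     # k near-equal, disjoint folds
    splits = []
    for f in range(k):
        validation = folds[f]
        train = [idx for g in range(k) if g != f for idx in folds[g]]
        splits.append((train, validation))
    return splits


# Example with a hypothetical dataset of 23 samples.
splits = k_fold_indices(23, k=5, seed=42)
fold_sizes = [len(validation) for _, validation in splits]
```

Each sample appears in exactly one validation fold, so the spread of the per-fold MSE values reflects sensitivity to which samples are held out rather than to overlapping evaluation sets.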

Figure 7

A box plot shows MSE distribution for 5-Fold cross-validation performance.

The box plot is titled “5-Fold Cross-Validation Performance”. The horizontal axis shows a single category labeled “5-Fold CV”. The vertical axis is labeled “MSE” and ranges from 0.0006 to 0.0014 in increments of 0.0002. The box represents the interquartile range, extending from 0.0006 at the lower quartile to about 0.0012 at the upper quartile. The median is marked by a horizontal line within the box at 0.00103. The lower whisker extends down to approximately 0.00052, while the upper whisker extends up to approximately 0.00143. Note: All numerical data values are approximated.

Five-fold cross-validation performance of the proposed framework expressed in terms of mean squared error (MSE). The narrow error distribution and absence of extreme outliers indicate stable generalisation and robust model behaviour across different training–testing partitions

3.7 Explainability analysis

To enhance transparency and interpretability, permutation-based feature importance analysis is conducted for both the black-box and physics-informed neural networks, as illustrated in Figure 8. Permutation importance quantifies the contribution of each input feature by measuring the increase in prediction error when that feature is randomly permuted while the remaining features are kept unchanged. The resulting scores provide a global assessment of feature influence on model performance. For the unconstrained black-box neural network, peak-to-peak amplitude and kurtosis exhibit higher importance scores than RMS energy. Although these features capture meaningful aspects of signal variability and impulsiveness, their dominance indicates that the model primarily leverages statistically informative relationships that minimise prediction error. The comparatively lower importance assigned to RMS energy highlights the distinction between statistical relevance and physical interpretability. In the constructed dataset, RMS energy serves as the most direct indicator of vibration intensity and severity progression; however, unconstrained optimisation does not inherently prioritise such physically interpretable features. Permutation importance is likewise evaluated for the physics-informed model. Compared to the black-box formulation, the physics-informed network demonstrates increased relative importance of RMS energy and a more balanced distribution of feature contributions. This shift indicates closer alignment between the learned predictive behaviour and the imposed monotonic fault–energy relationship. Although the physics constraint acts at the output level by regulating prediction ordering rather than explicitly modifying feature weights, it influences the admissible solution space during training. As a consequence, the learned mapping reflects stronger consistency with physically meaningful severity indicators. 
Permutation importance is selected due to its model-agnostic nature and suitability for feature-based learning frameworks. Unlike gradient-based attribution methods, which require access to internal model structure, or local explanation techniques such as SHAP or LIME, which explain individual predictions through local approximations, permutation importance provides a global evaluation of feature influence using only model inputs and outputs. This characteristic makes it particularly appropriate for assessing whether the integration of physical priors alters feature utilisation patterns in a statistically observable manner. Overall, the explainability analysis demonstrates that physics-informed learning and interpretability serve complementary functions. Explainability quantifies how input features influence predictions, while the embedded physics constraint guides output behaviour toward physically admissible relationships. Their integration enables diagnostic models that are not only predictive but also structurally aligned with established engineering expectations.
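The permutation procedure described in this section can be sketched in a few lines. The version below is a plain-Python illustration under stated assumptions: the function names, the number of repeats, the toy linear model, and the toy feature matrix are all hypothetical and stand in for the trained networks and vibration features used in the study.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn.

    `predict` maps one feature row (a list) to a prediction; X is a list of rows.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for f in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and the target
            X_perm = [row[:f] + [column[r]] + row[f + 1:] for r, row in enumerate(X)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model (hypothetical): relies strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
def toy_predict(row):
    return 0.8 * row[0] + 0.2 * row[1]


X = [[i / 9, ((3 * i) % 10) / 9, ((7 * i) % 10) / 9] for i in range(10)]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y, n_repeats=10, seed=0)
```

Because only the model's predictions are used, the same routine applies unchanged to the black-box and physics-informed networks, which is what makes the resulting scores directly comparable between the two.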

Figure 8

A vertical bar chart shows permutation feature importance for three features by increase in MSE.

The vertical bar chart is titled “Permutation Feature Importance”. The horizontal axis lists three categories from left to right: “RMS Energy”, “Peak-to-Peak”, and “Kurtosis”. The vertical axis is labeled “Increase in MSE” and ranges from 0.00 to 0.08 in increments of 0.02. There are 3 bars in total. The data values are as follows: RMS Energy: 0.035, Peak-to-Peak: 0.090, Kurtosis: 0.055. Note: All numerical data values are approximated.

Permutation-based feature importance for the black-box neural network, expressed as the increase in mean squared error following feature perturbation. Peak-to-peak amplitude and kurtosis exhibit the highest influence on prediction accuracy, while RMS energy shows lower importance in the unconstrained model, highlighting the tendency of black-box learning to prioritise statistically dominant features over physically meaningful indicators

3.8 Validation on experimental bearing dataset

To evaluate the practical relevance of the proposed framework under realistic operating conditions, additional validation is conducted using the publicly available Case Western Reserve University (CWRU) rolling bearing dataset (Smith and Randall, 2015). The CWRU dataset comprises vibration measurements acquired from a motor-driven test rig equipped with deep-groove ball bearings. The data include healthy bearings and seeded defects introduced on the inner race, outer race, and rolling elements with defect diameters of 0.007 in, 0.014 in, and 0.021 in. Measurements are collected under multiple load conditions (0–3 hp), resulting in variations in rotational speed and dynamic response characteristics. Vibration signals are recorded using accelerometers at a sampling frequency of 12 kHz. For the present study, raw vibration signals are segmented into non-overlapping time windows of 1 s (12,000 samples per segment). From each segment, the same statistical features used in the synthetic experiments—RMS energy, peak-to-peak amplitude, and kurtosis—are extracted to maintain methodological consistency. No additional feature engineering or signal preprocessing beyond segmentation and statistical feature computation is introduced. To enable regression-based severity estimation, defect diameter is used as an ordinal proxy for fault severity. Healthy bearings are assigned the lowest severity level, while increasing defect diameters correspond to progressively higher ordinal severity values. This ordinal mapping preserves the physical progression structure inherent in defect size while allowing continuous severity estimation within the regression framework. The network architecture, optimisation procedure, and physics-loss coefficient $λ$ are retained without modification when transferring from the synthetic dataset to the experimental dataset. 
No hyperparameter retuning is performed, ensuring that performance differences reflect model transferability rather than dataset-specific optimisation. Table 5 summarises the regression performance on the experimental dataset.

Table 5

Experimental dataset performance comparison

Model	MSE	MAE	R²	Monotonicity violation rate (%)
Black-box neural network	0.089	0.214	0.78	4.6
Physics-informed neural network	0.073	0.191	0.83	1.2

As expected, regression accuracy decreases relative to the synthetic dataset due to increased signal variability, non-stationarity, load-dependent dynamics, and measurement uncertainty present in experimental data. These factors introduce complexity beyond the controlled sinusoidal structure of the synthetic signals. Nevertheless, the physics-informed model consistently achieves lower prediction error and improved coefficient of determination compared to the unconstrained black-box model. More importantly, the monotonicity violation rate is substantially reduced. While the black-box model occasionally predicts lower severity for segments exhibiting higher RMS energy, the physics-informed model demonstrates significantly improved adherence to the expected fault–energy ordering across operating conditions. Although absolute predictive performance is influenced by real-world complexity and the simplified feature representation, the consistent relative improvement observed across both synthetic and experimental datasets indicates that embedding physically motivated output constraints enhances behavioural stability and consistency without architectural modification. These findings support the practical applicability of the proposed framework for vibration-based fault severity estimation in realistic diagnostic scenarios.
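The segmentation and feature extraction used in this subsection can be sketched as follows. This is an illustrative plain-Python version, not the study's pipeline: the function names are hypothetical, the kurtosis is taken as the non-excess (Pearson) form, which is an assumption since the text does not state which convention is used, and the test signal is a synthetic sine wave rather than CWRU data.

```python
import math


def split_windows(signal, window=12_000):
    """Non-overlapping windows of `window` samples (1 s at 12 kHz); drops the remainder."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, window)]


def segment_features(segment):
    """Return (RMS, peak-to-peak amplitude, kurtosis) for one segment."""
    n = len(segment)
    mean = sum(segment) / n
    rms = math.sqrt(sum(v * v for v in segment) / n)
    peak_to_peak = max(segment) - min(segment)
    m2 = sum((v - mean) ** 2 for v in segment) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in segment) / n   # fourth central moment
    # Assumed convention: non-excess kurtosis (equals 3 for a Gaussian signal).
    kurtosis = m4 / (m2 * m2) if m2 > 0 else float("nan")
    return rms, peak_to_peak, kurtosis


# Synthetic check signal (not CWRU data): 2.5 s of a unit-amplitude 100 Hz sine
# sampled at 12 kHz, which yields two complete 1 s windows.
signal = [math.sin(2 * math.pi * 100 * k / 12_000) for k in range(30_000)]
windows = split_windows(signal)
features = [segment_features(w) for w in windows]
```

For a pure sine wave the three features take known values (RMS of $1/\sqrt{2}$, peak-to-peak of 2, kurtosis of 1.5), which makes it a convenient sanity check before applying the routine to measured signals.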

4. Discussion

The findings of this study demonstrate that embedding physically meaningful constraints within data-driven models can enhance behavioural consistency in vibration-based fault severity estimation. While deep learning approaches have achieved strong predictive performance in rotating machinery diagnostics (Zhao et al., 2019; Lei et al., 2020; Tama et al., 2023), they often optimise statistical accuracy without explicitly enforcing adherence to established engineering relationships. The proposed physics-informed formulation addresses this limitation by constraining the solution space through a monotonic fault–energy prior, thereby integrating domain knowledge directly into the optimisation process. Importantly, the benefits of the proposed approach extend beyond improvements in aggregate regression metrics. Although reductions in MSE and improvements in $R^{2}$ reflect enhanced predictive stability, the more consequential outcome lies in the reduction of monotonicity violations and the preservation of physically admissible behaviour across operating conditions. This behaviour is consistent with broader developments in physics-informed machine learning, where incorporating structured priors has been shown to improve model stability and generalisation under uncertainty (Raissi et al., 2019; Karniadakis et al., 2021; Shen et al., 2021). By restricting the hypothesis space to solutions compatible with a physically motivated ordering principle, the model reduces the likelihood of locally inconsistent predictions that may arise in unconstrained optimisation. From an engineering standpoint, behavioural coherence is as critical as numerical precision. In practical condition-monitoring systems, maintenance decisions are often informed by severity trends rather than isolated predictions (Gao et al., 2015; Villalonga et al., 2021). 
Even infrequent violations of expected physical behaviour, such as predicting lower severity at higher vibration energy, can undermine practitioner confidence and complicate decision-making workflows. By embedding a monotonic constraint during training, the proposed framework promotes severity progression patterns that remain aligned with established mechanical intuition. The study also clarifies an important conceptual distinction between explainability and physical consistency. Explainable artificial intelligence methods improve transparency by attributing predictions to input features (Adadi and Berrada, 2018; Guidotti et al., 2019). However, interpretability alone does not ensure that the learned relationships are physically meaningful. A model may provide coherent explanations while still exploiting correlations that contradict known system dynamics. The present results illustrate that enforcing physical structure during optimisation differs fundamentally from post hoc explanation: the former shapes the admissible hypothesis space, whereas the latter analyses model behaviour without constraining it. The two therefore serve complementary roles: physics-based constraints promote admissible behaviour, while explainability provides insight into how that behaviour is achieved. A modest trade-off is observed in training dynamics. The physics-informed model converges more gradually and may exhibit slightly higher training loss due to the additional regularisation term. This effect reflects the intentional restriction of the optimisation landscape rather than diminished representational capacity. In return, the model demonstrates reduced monotonicity violations and improved behavioural stability. Notably, the additional computational cost associated with the constraint is confined to the training phase and does not affect inference-time efficiency, preserving practical feasibility. 
Collectively, these findings support a broader methodological perspective: in safety-aware engineering diagnostics, model evaluation should not be limited to predictive accuracy alone. Structural consistency, interpretability, and robustness under variability are equally important considerations, particularly in domains where diagnostic outputs influence operational decisions (Kurd et al., 2007; Seshia et al., 2018; Wang and Chung, 2022). By demonstrating that physically grounded constraints can be embedded within standard neural network architectures without sacrificing predictive capability, this study contributes toward the development of more transparent, behaviourally consistent, and engineering-aligned AI systems for fault diagnosis.

4.1 Expected real-world diagnostic accuracy and benchmark transferability

Although the present study employs physics-inspired synthetic vibration data to enable controlled evaluation, it is important to contextualise the reported performance metrics relative to real-world diagnostic scenarios. Experimental bearing datasets, such as those from Case Western Reserve University (CWRU), Paderborn University, XJTU-SY, and PRONOSTIA, introduce additional complexities including measurement noise, operating-condition variability, non-stationarity, sensor placement effects, and uncertainty in severity labelling (Smith and Randall, 2015; Lessmeier et al., 2016; Tama et al., 2023; Matania et al., 2024). These factors typically reduce achievable regression accuracy compared to controlled synthetic environments. The quantitative results reported in Section 3.3 therefore represent baseline performance under idealised conditions. Within this controlled setting, the physics-informed model achieves lower error and improved explanatory power relative to the unconstrained black-box model, reflecting the regularising influence of the embedded monotonic constraint. Importantly, the improvement arises from enforcement of a physically motivated fault–energy relationship rather than reliance on dataset-specific statistical correlations. When considering transferability, it is reasonable to expect that absolute performance will decrease under realistic operating conditions due to increased signal complexity and environmental variability. However, the structural constraint may help preserve behavioural consistency when data distributions shift moderately. The limited dispersion observed across five-fold cross-validation folds suggests that the proposed framework does not depend heavily on a specific data partition and exhibits resilience to moderate sampling variability. 
To provide contextual comparison, Table 6 summarises indicative performance ranges reported in representative vibration-based bearing fault diagnosis studies using experimental datasets (Smith and Randall, 2015; Tama et al., 2023; Matania et al., 2024). These ranges are presented for interpretive context rather than strict benchmarking.

Table 6

Indicative comparison between synthetic and representative real-world diagnostic performance

Metric	Synthetic (this study)	Representative experimental range
MSE	∼0.016	0.04–0.15
MAE	∼0.096	0.12–0.30
R²	∼0.95	0.70–0.88

The synthetic results reported here should therefore be interpreted as upper-bound performance under controlled conditions. When the framework is applied to real vibration datasets, reductions in absolute accuracy are anticipated because of intrinsic signal variability and labelling uncertainty. Such reductions would reflect the inherent complexity of real-world diagnostics rather than a limitation specific to the proposed formulation. The primary contribution of this study is not the absolute magnitude of regression accuracy but the demonstration that physically consistent learning behaviour can be embedded within flexible neural network architectures without sacrificing predictive capability. For practical bearing fault severity estimation, experimental studies commonly report stable, physically admissible predictions with R² values of approximately 0.75–0.85 (Tama et al., 2023; Matania et al., 2024). Accordingly, while broader validation across diverse industrial datasets remains necessary, the incorporation of structural physical constraints provides a principled foundation for enhancing behavioural robustness in real-world diagnostic applications.
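For readers who wish to place their own results alongside Table 6, the three metrics can be computed as in the minimal sketch below. It uses the standard definitions of MSE, MAE and R²; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data or code.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for a severity regression task."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = float(np.mean(residual ** 2))
    mae = float(np.mean(np.abs(residual)))
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "r2": r2}

# Hypothetical severity labels and predictions, for illustration only.
y_true = [0.0, 0.25, 0.5, 0.75, 1.0]
y_pred = [0.1, 0.2, 0.5, 0.7, 0.9]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because R² is normalised by the variance of the severity labels, values obtained on datasets with different label ranges are only loosely comparable, which is one reason the ranges in Table 6 are indicative rather than a strict benchmark.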

5. Limitations and future work

While the proposed framework demonstrates improved physical consistency and stable predictive behaviour under controlled conditions, several limitations should be acknowledged.

First, a substantial portion of the validation is conducted using physics-inspired synthetic vibration data. The synthetic setting enables controlled isolation of the monotonic fault–energy relationship and provides a transparent environment for evaluating the influence of the embedded physics constraint. However, real industrial environments introduce additional complexities, including machine-specific dynamics, load variability, sensor placement effects, and non-stationary operating behaviour. Although validation on an experimental bearing dataset is included (Section 3.8), broader assessment across diverse industrial systems is necessary before deployment in specific applications.

Second, the embedded physics constraint is intentionally formulated as a simplified monotonic relationship between vibration energy and fault severity. This assumption represents a first-order physical prior that is broadly applicable in many rotating machinery contexts but may not capture all fault mechanisms or operating regimes. In certain scenarios, fault progression may involve nonlinear interactions, frequency shifts, or competing signal characteristics that are not fully described by a monotonic energy trend. When detailed system models or domain-specific knowledge are available, incorporating more refined or multi-dimensional physical priors may enhance diagnostic fidelity.
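To make the first-order prior concrete, the sketch below shows one possible way to express a monotonic energy–severity constraint as a penalty and as a violation rate. It is an illustrative assumption, not the exact loss used in this study: the consecutive-pair hinge form, the function names and the weight `lam` are all hypothetical.

```python
import numpy as np

def _sorted_diffs(energy, severity_pred):
    """Differences of predicted severity between samples ordered by energy."""
    order = np.argsort(energy, kind="stable")
    return np.diff(np.asarray(severity_pred, dtype=float)[order])

def monotonicity_penalty(energy, severity_pred):
    """Mean hinge penalty on predictions that fall as energy rises."""
    diffs = _sorted_diffs(energy, severity_pred)
    return float(np.mean(np.maximum(0.0, -diffs)))

def violation_rate(energy, severity_pred):
    """Fraction of consecutive energy-ordered pairs with decreasing prediction."""
    diffs = _sorted_diffs(energy, severity_pred)
    return float(np.mean(diffs < 0.0))

def combined_loss(y_true, y_pred, energy, lam=0.1):
    """Prediction error plus weighted physics penalty (lam is an assumed weight)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    return mse + lam * monotonicity_penalty(energy, y_pred)

# Hypothetical RMS energies and predictions, for illustration only.
energy = np.array([0.10, 0.20, 0.30, 0.40])
pred_bad = np.array([0.10, 0.30, 0.20, 0.50])  # dips between 2nd and 3rd sample
pred_ok = np.array([0.10, 0.20, 0.30, 0.50])   # non-decreasing in energy

penalty_bad = monotonicity_penalty(energy, pred_bad)
rate_bad = violation_rate(energy, pred_bad)
penalty_ok = monotonicity_penalty(energy, pred_ok)
```

A penalty of this kind only constrains the ordering along a single energy axis, which illustrates the limitation discussed above: fault mechanisms that involve frequency shifts or competing signal characteristics would require a richer, possibly multi-dimensional prior. For gradient-based training, the same expression would need to be written in an automatic-differentiation framework rather than plain NumPy.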

Future work will focus on extending the framework to additional experimental and field-acquired vibration datasets spanning multiple machine types and operating conditions. Evaluating performance under varying load, speed, and environmental disturbances will provide deeper insight into robustness and generalisability. Further research may also explore integration of multi-sensor information, such as acoustic emission, thermal measurements, or torque signals, to capture complementary fault signatures.

Another promising direction involves adaptive or data-driven physics constraints that evolve with changing operational regimes. Rather than enforcing a fixed monotonic prior, future formulations could allow constraint strength or structure to adjust based on system state, thereby balancing flexibility and physical admissibility in dynamic environments. Overall, the present study establishes a reproducible and physically grounded learning framework for fault severity estimation. Continued validation and extension across diverse industrial scenarios will further clarify its applicability and practical impact.

6. Conclusions

This study presented a physics-informed and explainable artificial intelligence framework for vibration-based fault severity estimation in engineering systems. By integrating physically interpretable vibration features with a neural network constrained through an explicit monotonic fault–energy relationship, the proposed approach promotes prediction behaviour that remains aligned with a prescribed physical prior. The results demonstrate that embedding a physics-based ordering constraint directly within the optimisation objective substantially reduces monotonicity violations and improves predictive consistency compared with an unconstrained black-box model. Cross-validation experiments indicate stable performance under data partitioning variability, while permutation-based feature importance analysis provides a global perspective on learned feature utilisation. Importantly, these improvements are achieved without architectural modification or additional inference-time complexity.

A central insight of this work is that reliable fault diagnosis benefits from incorporating structural physical knowledge during model training rather than relying solely on post hoc interpretability. By combining loss-level physics-informed regularisation with quantitative explainability analysis, the framework simultaneously enforces behavioural consistency and enhances transparency in predictive reasoning. Although validation on controlled synthetic data and a benchmark experimental dataset demonstrates consistent relative improvement, broader evaluation across diverse industrial systems, operating regimes, and sensor configurations is necessary to fully assess generalisability. Future extensions incorporating richer physical priors, adaptive constraint formulations, and multi-sensor data integration may further strengthen robustness and practical applicability.

Overall, the proposed methodology establishes a reproducible and physically grounded foundation for advancing interpretable and behaviourally consistent AI-based fault severity estimation in engineering diagnostics.
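As a reproducibility aid, the global permutation feature importance referred to above can be sketched as follows. This is a generic, model-agnostic version written under stated assumptions: the function name, the MSE scoring choice, the number of repeats and the toy data are hypothetical and do not reproduce the study's implementation or results.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    baseline = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the target
            # while leaving its marginal distribution unchanged.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Toy stand-in for three features (e.g. RMS energy, peak-to-peak, kurtosis):
# here the hypothetical model and target depend on the first column only.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = 2.0 * X[:, 0]

def toy_model(features):
    return 2.0 * features[:, 0]

importance = permutation_importance(toy_model, X, y)
print(importance)
```

In this toy setting only the first feature receives a non-zero score, which is the kind of alignment check the explainability analysis relies on: a model that respects the fault–energy prior would be expected to assign most of its relevance to energy-related features.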

The supplementary material for this article can be found online.

References

Adadi, A. and Berrada, M. (2018), “Peeking inside the black box: a survey on explainable artificial intelligence (XAI)”, IEEE Access, Vol. 6, pp. 52138-52160, doi: https://doi.org/10.1109/ACCESS.2018.2870052.

Ahmed, I., Jeon, G. and Piccialli, F. (2022), “From artificial intelligence to explainable artificial intelligence in Industry 4.0: a survey on what, how, and where”, IEEE Transactions on Industrial Informatics, Vol. 18 No. 8, pp. 5031-5042, doi: https://doi.org/10.1109/TII.2021.3127387.

Fan, Z., Zhang, Z., Wang, J., Bao, H., Han, B., Wang, W., Ge, R. and Ma, J. (2025), “Time–frequency residual self-supervised network for bearing fault diagnosis with limited labeled data”, Journal of Vibration Engineering and Technologies, Vol. 13 No. 6, p. 436, doi: https://doi.org/10.1007/s42417-025-01990-8.

Gao, Z., Cecati, C. and Ding, S.X. (2015), “A survey of fault diagnosis and fault-tolerant techniques–Part I: fault diagnosis with model-based and signal-based approaches”, IEEE Transactions on Industrial Electronics, Vol. 62 No. 6, pp. 3757-3767, doi: https://doi.org/10.1109/TIE.2015.2417501.

Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F. and Pedreschi, D. (2019), “A survey of methods for explaining black box models”, ACM Computing Surveys, Vol. 51 No. 5, Article 93, pp. 1-42, doi: https://doi.org/10.1145/3236009.

Jia, N., Li, Y., Zhang, X., Ding, K. and Zhu, Z. (2024), “Physics-informed unsupervised domain adaptation framework for cross-machine bearing fault diagnosis”, Advanced Engineering Informatics, Vol. 62, 102774, doi: https://doi.org/10.1016/j.aei.2024.102774.

Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S. and Yang, L. (2021), “Physics-informed machine learning”, Nature Reviews Physics, Vol. 3 No. 6, pp. 422-440, doi: https://doi.org/10.1038/s42254-021-00314-5.

Kurd, Z., Kelly, T. and Austin, J. (2007), “Developing artificial neural networks for safety-critical systems”, Neural Computing and Applications, Vol. 16 No. 1, pp. 11-19, doi: https://doi.org/10.1007/s00521-006-0055-0.

Lei, Y., Yang, B., Jiang, X., Jia, F., Li, N. and Nandi, A.K. (2020), “Applications of machine learning to machine fault diagnosis: a review and roadmap”, Mechanical Systems and Signal Processing, Vol. 138, 106587, doi: https://doi.org/10.1016/j.ymssp.2019.106587.

Lessmeier, C., Kimotho, J.K., Zimmer, D. and Sextro, W. (2016), “Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors”, Mechanical Systems and Signal Processing, Vol. 65, pp. 118-129, doi: https://doi.org/10.36001/phme.2016.v3i1.1577.

Lu, H., Liu, Z., Li, Y., Chen, X., Hu, C., Laflamme, S., Sarkar, S. and Zimmerman, A.T. (2023), “A physics-informed feature weighting method for bearing fault diagnostics”, Mechanical Systems and Signal Processing, Vol. 191, 110171, doi: https://doi.org/10.1016/j.ymssp.2023.110171.

Matania, O., Bechhoefer, E., Eltz, J., Kenett, R.S. and Parmet, Y. (2024), “A systematic literature review of deep learning for vibration-based fault diagnosis of critical rotating machinery: limitations and challenges”, Journal of Sound and Vibration, Vol. 590, 118562, doi: https://doi.org/10.1016/j.jsv.2024.118562.

Ni, Q., Liu, Z., Li, X., Wang, J. and Nandi, A.K. (2023), “Physics-informed residual network (PIResNet) for rolling element bearing fault diagnostics”, Mechanical Systems and Signal Processing, Vol. 200, 110544, doi: https://doi.org/10.1016/j.ymssp.2023.110544.

Raissi, M., Perdikaris, P. and Karniadakis, G.E. (2019), “Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations”, Journal of Computational Physics, Vol. 378, pp. 686-707, doi: https://doi.org/10.1016/j.jcp.2018.10.045.

Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K. and Müller, K.R. (2019), “Explainable AI: interpreting, explaining and visualizing deep learning”, in Lecture Notes in Computer Science, Vol. 11700, pp. 3-22, doi: https://doi.org/10.1007/978-3-030-28954-6_1.

Seshia, S.A., Sadigh, D. and Sastry, S.S. (2018), “Toward verified artificial intelligence”, Communications of the ACM, Vol. 61 No. 7, pp. 68-77, doi: https://doi.org/10.1145/3208077.

Shen, S., He, D., Li, S., Bechhoefer, E., Nemani, V., Thelen, A., Webster, K., Darr, M., Sidon, J. and Kenny, S. (2021), “A physics-informed deep learning approach for bearing fault detection”, Engineering Applications of Artificial Intelligence, Vol. 103, 104295, doi: https://doi.org/10.1016/j.engappai.2021.104295.

Shi, S., Du, D., Mercan, O., Kalkan, E. and Parol, J. (2025), “Contrastive and self-supervised learning for open-set damage classification in structural health monitoring with incomplete and imbalanced vibration data”, Expert Systems with Applications, Vol. 293, 128731, doi: https://doi.org/10.1016/j.eswa.2025.128731.

Smith, W.A. and Randall, R.B. (2015), “Rolling element bearing diagnostics using the Case Western Reserve University data: a benchmark study”, Mechanical Systems and Signal Processing, Vols 64-65, pp. 100-131, doi: https://doi.org/10.1016/j.ymssp.2015.04.021.

Tama, B.A., Comuzzi, M. and Rhee, K.H. (2023), “Recent advances in the application of deep learning for fault diagnosis of rotating machinery using vibration signals”, Artificial Intelligence Review, Vol. 56 No. 5, pp. 4667-4709, doi: https://doi.org/10.1007/s10462-022-10238-7.

Villalonga, A., Beruvides, G., Castaño, F. and Haber, R.E. (2021), “A survey on machine learning for fault diagnosis in manufacturing systems”, Journal of Intelligent Manufacturing, Vol. 32 No. 2, pp. 447-466, doi: https://doi.org/10.1007/s10845-020-01555-5.

Wang, Y. and Chung, S.H. (2022), “Artificial intelligence in safety-critical systems: a systematic review”, Industrial Management and Data Systems, Vol. 122 No. 2, pp. 442-470, doi: https://doi.org/10.1108/IMDS-06-2021-0371.

Wang, H., Wang, H. and Tang, X. (2025), “A review of deep learning in rotating machinery fault diagnosis and its prospects for port applications”, Applied Sciences, Vol. 15 No. 21, 11303, doi: https://doi.org/10.3390/app152111303.

Xu, Y., Lu, X., Gao, T. and Meng, R. (2025), “A self-supervised multiview contrastive learning network for the fault diagnosis of rotating machinery under limited annotation information”, IEEE Transactions on Instrumentation and Measurement, Vol. 74, pp. 1-13, doi: https://doi.org/10.1109/tim.2025.3527612.

Zhang, W., Peng, G., Li, C., Chen, Y. and Zhang, Z. (2017a), “A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals”, Sensors, Vol. 17 No. 2, p. 425, doi: https://doi.org/10.3390/s17020425.

Zhang, Y., Zhou, X., Gao, C., Lin, J., Ren, Z. and Feng, K. (2025), “Contrastive learning-enabled digital twin framework for fault diagnosis of rolling bearing”, Measurement Science and Technology, Vol. 36 No. 1, 015026, doi: https://doi.org/10.1088/1361-6501/ad8f52.

Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P. and Gao, R.X. (2019), “Deep learning and its applications to machine health monitoring”, Mechanical Systems and Signal Processing, Vol. 115, pp. 213-237, doi: https://doi.org/10.1016/j.ymssp.2018.05.050.

Zhao, L., Xie, T., Wei, Y., Liu, Y. and Qin, Y. (2025), “Overview of digital twin-driven rotating machinery fault diagnosis: status and trends”, Measurement Science and Technology, Vol. 36 No. 5, 052001, doi: https://doi.org/10.1088/1361-6501/adc8c3.
