Intelligent fault diagnosis for elevator door system based on discriminative multi-scale residual network

Li, Jiefeng; Ren, He; Song, Qiuyu; Ye, Liang; Li, Gongning

doi:10.1108/JIMSE-11-2025-0027

Purpose

This study addresses the challenge of extracting effective state features for intelligent fault diagnosis in elevator door systems, aiming to enhance model representational capability for reliable decision-making and accident prevention in real-world engineering.

Design/methodology/approach

The proposed discriminative multi-scale residual network (DMSRN) uses a one-dimensional multi-scale architecture to capture rich and hierarchical features, combined with a novel feature discriminative enhancement strategy that enhances intra-class compactness and inter-class separability, thereby improving feature discriminability.

Findings

Comprehensive experimental results validate the superiority of the proposed approach. The DMSRN achieves an average diagnostic accuracy that exceeds the best baseline method by more than 18.16%, demonstrating a substantial competitive advantage in fault diagnosis for elevator door systems.

Originality/value

The key contribution of this research lies in the novel integration of multi-scale feature learning with a specialized feature discriminative enhancement mechanism. This combination offers significant practical value by improving diagnostic reliability in safety-critical elevator applications.

1. Introduction

The proliferation of high-rise structures in contemporary urban settings has established elevators as a critical component of vertical transit systems (Ren et al., 2024). Nevertheless, failures in these systems, including irregular door movement, passenger entrapment, operational noises and sudden impacts, may lead to serious outcomes, compromising both passenger safety and operational integrity. Statistics indicate that door system faults account for over 80% of elevator-related accidents (Wang et al., 2018), highlighting the pressing demand for reliable and precise fault diagnosis techniques to facilitate timely maintenance of elevator door mechanisms.

Traditional fault diagnosis approaches, such as rule-based expert systems, rely heavily on domain knowledge, requiring experts to meticulously define fault signatures and develop corresponding diagnostic procedures. Moreover, periodic shutdown maintenance often entails substantial time, labor and financial resources. In recent years, data-driven diagnostic techniques have gained increasing attention in the field of prognostics and health management for their potential to enhance asset reliability (Lu et al., 2025a). For instance, Bai et al. (2021) developed a fault prediction model based on an optimized particle swarm optimization–back propagation algorithm. Chen et al. (2019) provided a comprehensive review of various artificial intelligence techniques applied in elevator fault diagnosis, including back propagation neural networks, radial basis function networks, K-means clustering and support vector machines (SVMs). A key limitation of these approaches is their inadequate capability in learning discriminative features, for which deep learning presents a promising alternative.

In contrast to traditional machine learning methods, deep learning-based diagnosis has developed rapidly owing to its capability to autonomously extract hierarchical features in a cost-effective manner. For example, Kim et al. (2023) utilized a variational autoencoder with a restricted latent space and Bayesian optimization to construct a margin-maximized hyperspace for accurate elevator door fault classification. Similarly, Wang et al. (2021) implemented a distributed deep learning model that incorporates geolocation information by processing features through hidden parameters of the network. Chen et al. (2025) introduced a principal component analysis–long short-term memory method for elevator fault diagnosis. Pan et al. (2024) studied fault prediction for elevator door systems using techniques including graph neural networks and long short-term memory networks. Wan et al. (2025) combined variational mode decomposition with a multi-scale convolutional neural network (CNN) and applied the combination to intelligent fault diagnosis of a center-opening elevator door system. Lu et al. (2025b) proposed a multi-scale convolution capsule network with data augmentation and attention mechanisms and applied it to elevator fault diagnosis with positive results.

Nevertheless, traditional convolutional neural networks often exhibit limited feature representation capacity, owing to their single-scale receptive fields and susceptibility to gradient vanishing, which can compromise the reliability of intelligent diagnostic systems. Moreover, the discriminative feature extraction for elevator door systems remains an understudied area. To address these limitations, this article proposes a discriminative multi-scale residual network (DMSRN), which integrates a one-dimensional multi-scale residual architecture with metric learning. This design optimizes the structure of the feature space, enabling the extracted deep features to be not only separable but also semantically discriminative. As a result, the proposed model enhances the reliability of intelligent fault diagnosis. The main contributions of this work are summarized as follows.

A feature extractor with a one-dimensional multi-scale residual mechanism is designed to enhance the quality of the extracted features. By integrating multi-scale parallel processing with identity shortcut mapping, it addresses the core weaknesses of traditional CNNs in terms of depth, multi-scale processing and training stability.
A feature discriminative enhancement strategy is introduced that applies metric learning to optimize the feature space, explicitly improving semantic discriminability through enforced intra-class compactness and inter-class separation.
A novel and simple framework called DMSRN is proposed for automatically and accurately diagnosing faults of elevator door systems. Experimental findings obtained from the laboratory datasets, including accuracy, confusion matrices and feature visualization, demonstrate the superior diagnosis performance of the DMSRN method.

The remainder of this article is structured as follows. Section 2 details the proposed DMSRN methodology. Section 3 presents and discusses the experimental setup and results. Finally, Section 4 concludes the study and Section 5 outlines potential directions for future research.

2. Related work

2.1 Residual networks

Residual networks are widely applied in various fields, including rolling element bearing fault diagnosis (Ni et al., 2023), bird song recognition (Hu et al., 2023) and classification of power quality disturbances (Pan et al., 2025). The fundamental advantage of residual networks over conventional networks lies in their ability to overcome the degradation problem, where increasing the depth of a neural network leads to higher training error rather than better performance. By introducing skip connections, residual networks allow the input to bypass one or more layers, so that the network learns a residual function rather than the full underlying mapping. This architecture mitigates the vanishing gradient problem by providing a gradient highway that enables smoother backpropagation to earlier layers. Consequently, residual networks can be scaled to hundreds or even thousands of layers, achieving significantly higher accuracy and faster convergence while remaining easier to optimize than their single-path counterparts.
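
The gradient-highway argument can be made explicit with a short derivation. Writing a residual block as $r = g(z) + z$ (the notation of Eq. (1) in Section 3.3) and letting $\mathcal{L}$ denote a generic training loss, a symbol introduced here only for illustration, the chain rule gives

$$\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial r}\left(I + \frac{\partial g}{\partial z}\right) = \frac{\partial \mathcal{L}}{\partial r} + \frac{\partial \mathcal{L}}{\partial r}\,\frac{\partial g}{\partial z}.$$

The first term passes the upstream gradient to earlier layers unchanged, so the signal reaching those layers does not vanish even when $\partial g / \partial z$ is small.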

2.2 Multi-scale convolutional architectures

Compared to single-scale designs, the key advantage of multi-scale convolutional architectures lies in their superior ability to handle scale variation and capture richer hierarchical features. While single-scale models employ a fixed kernel size and thus a static receptive field at each layer, multi-scale architectures process input data through multiple parallel pathways or hierarchical levels with varying spatial resolutions and kernel sizes. This enables the network to simultaneously extract fine-grained local details such as edges and textures, as well as broad global context, including object shapes and spatial relationships. By integrating these complementary perspectives, multi-scale models become more robust in detecting objects of vastly different sizes, ranging from small distant instances to large foreground objects. Moreover, this approach often enhances computational efficiency; rather than simply increasing network depth to enlarge the receptive field, multi-scale designs decompose convolutions into factored operations, achieving stronger representational capacity with fewer parameters. As a result, they consistently deliver improved accuracy in challenging tasks such as medical image segmentation (Yin et al., 2023) and electroencephalogram-based motor imagery decoding (Liu et al., 2022).

2.3 Metric learning

Metric learning is widely utilized in tasks such as facial recognition and image retrieval, as it enables models to learn an effective embedding space where semantically similar samples are mapped to adjacent positions (Li et al., 2023). Depending on the learning paradigm, current mainstream metric learning methods are generally categorized into comparison-based (Wu et al., 2025), proxy-based (Chan et al., 2023), and center-based losses (Dong and Lam, 2024). Among these, center-based loss is increasingly favored for its superior optimization stability and rapid convergence. Specifically, center loss functions by simultaneously learning a centroid for each class and penalizing the distances between individual samples and their corresponding centers. While the traditional Softmax loss ensures inter-class separability, it does not explicitly encourage intra-class compactness. Center loss addresses this by directly minimizing intra-class variation, thereby producing tighter feature clusters. Furthermore, unlike triplet loss (Zhou et al., 2024), which requires computationally expensive hard-sample mining, center loss is highly efficient as it only involves updating class centers. Consequently, center loss is typically employed as a joint supervision signal with Softmax, i.e. the latter ensures global separability while the former enhances local compactness. This dual-objective approach results in more stable training and faster convergence compared to purely pair-based methods.
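
The joint supervision described above can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the function names, the averaging over the batch and the weighting factor `lam` are assumptions made for illustration, and the class centers are passed in as fixed values rather than learned during training.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logit vectors."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum - row[y]
    return total / len(logits)


def center_loss(features, labels, centers):
    """Mean of 0.5 * squared Euclidean distance from each feature to its class center."""
    total = 0.0
    for f, y in zip(features, labels):
        total += 0.5 * sum((fi - ci) ** 2 for fi, ci in zip(f, centers[y]))
    return total / len(features)


def joint_loss(logits, features, labels, centers, lam=0.1):
    """Classification loss plus a weighted center loss (lam is a hypothetical weight)."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


# Two classes with (hypothetical) centers at (0, 0) and (4, 4).
centers = [[0.0, 0.0], [4.0, 4.0]]
labels = [0, 1]
loose = [[1.0, 1.0], [3.0, 3.0]]  # features far from their class centers
tight = [[0.1, 0.1], [3.9, 3.9]]  # features close to their class centers

print(center_loss(loose, labels, centers))
print(center_loss(tight, labels, centers))
```

The tighter features yield the smaller center loss, which is the intra-class compactness term; the cross-entropy term is unaffected by how tightly the features cluster as long as the logits are unchanged.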

3. Methodology

This section presents the proposed framework for enhancing the accuracy and reliability of intelligent fault diagnosis in elevator door systems. The framework integrates a one-dimensional multi-scale residual network (MSRN) with a metric learning-based feature discriminative enhancement strategy, enabling effective extraction of discriminative state features and accurate fault identification.

3.1 Problem definition

We first introduce some notation. The training dataset $D_{\mathrm{train}} = \{(x_{i}, y_{i})\}_{i = 1}^{n_{\mathrm{train}}}$ contains $n_{\mathrm{train}}$ labeled samples $x$, and the test dataset $D_{\mathrm{test}} = \{x_{i}\}_{i = 1}^{n_{\mathrm{test}}}$ contains $n_{\mathrm{test}}$ unlabeled samples, where $y$ denotes the label over the $M$ health states of the elevator door system. The goal is to learn a powerful fault diagnosis model from the labeled fault data. That is, the category prediction function $f_s(\cdot)$ learned from the training dataset should generalize well to test data that did not participate in model training.

3.2 Overview of the proposed DMSRN method

Figure 1 displays the overview of the proposed DMSRN. Its framework includes a feature extractor G based on MSRN and a classifier C based on fully connected layers. By integrating the multi-scale one-dimensional residual architecture with metric learning, DMSRN forms an effective fault diagnosis framework for elevator door systems.

Figure 1

A workflow diagram illustrates data preparation, feature extraction and fault classification for a lift door system.

The illustration presents a three-stage workflow diagram for a fault diagnosis system applied to a lift door system. The process is divided into three labeled sections arranged from left to right: (1) Data preparation, (2) Feature extraction, and (3) Fault Classification. The left panel, labeled “(1) Data preparation”, shows the initial data acquisition and preprocessing stage. At the top, a label reads “Failed lift door system”. Below it, an image depicts a lift door mechanism connected to monitoring equipment, including a computer screen and a data acquisition device. An arrow points downward to the next step labeled “Data and Segmentation”. This section shows three time-series signals in different colors representing vibration or sensor measurements. An ellipsis is shown between the second and third series. Red dashed vertical boxes at the left and right on each series highlight segmented portions of the signal, indicating that the raw sensor data are divided into smaller samples for analysis. The segmented signals are labeled “Training data”. Two thick arrows from this panel point rightward to the central panel. The center panel is titled “(2) Feature extraction”. The panel is divided into two parts: the upper section shows neural network feature extraction layers, and the lower section illustrates metric learning in the feature space. At the top, several stacked neural network layers are displayed horizontally from left to right. Each layer is represented by a vertical rounded rectangle containing circular nodes. The first layers contain light yellow and light blue nodes, indicating intermediate feature maps. Solid arrows pointing right show the forward propagation of data from one layer to the next. Between some layers, dotted arrows pointing left indicate back propagation during training. 
The sequence of layers gradually transforms the input representation until the final layer on the far right, which contains blue circular nodes representing extracted features. Below the neural network layers, a label “Metric learning” marks the next stage. This section is enclosed by a red dashed rectangular boundary, representing the feature embedding space used to separate different health conditions. Within this feature space, several colored shapes represent samples from different health states: orange circles, blue squares, and green triangles. Each class forms clusters around center points, represented by larger outlined symbols such as a circled circle, a square outline, or a triangle outline. Gray arrows point toward these centers. Red dashed curved lines divide the feature space into regions. In the upper portion of the metric learning space, three clusters of different shapes are partially overlapping but being separated by the boundaries. In the lower portion, the clusters become more compact and clearly separated, illustrating how the metric learning process improves class separation. Two thick arrows from this central panel point rightward to the third panel. The right panel is labeled “(3) Fault Classification”. The panel depicts a simplified neural network used to classify system conditions based on extracted features. On the left side, a vertical column of green circular nodes. The circles are stacked vertically inside a rounded rectangular container, with a vertical ellipsis between them indicating additional nodes. From these input nodes, several connection lines extend to a second vertical column of rectangular output nodes, representing the classification layer. The output column is labeled “Predicted Labels” above it. The rectangles represent the predicted categories corresponding to different health states of the system. 
To the right of the output layer, a green rectangular box contains the symbols “L subscript m” and “L subscript c”, representing the loss functions used during training, typically metric loss and classification loss. A solid arrow pointing right connects the predicted label layer to this loss block. A dotted arrow pointing left indicates the back propagation of gradients from the loss functions back through the network. Below the classification diagram is a legend explaining the graphical symbols used in the overall framework: A solid black arrow represents forward propagation. A dotted arrow pointing left represents back propagation. Outlined symbols—a circle, square, and triangle—represent the centers of different health states. Filled symbols—blue squares, orange circles, and green triangles—represent samples belonging to different health states. A red dashed line represents a decision boundary. A gray arrow indicates distance reduction.

Structure of the proposed DMSRN method. Source(s): Created by authors

3.3 One-dimensional MSRN

The one-dimensional MSRN represents an advanced architecture tailored for processing sequential one-dimensional data, such as time-series signals, through the incorporation of residual learning across multiple scales. Its fundamental concept aims to mitigate the vanishing gradient problem while improving feature propagation via residual blocks equipped with skip connections, thereby facilitating more stable training and enhanced hierarchical feature extraction.

The basic residual block in a one-dimensional residual network can be written as Eq. (1):

$$r = g(z, \{W\}) + z, \tag{1}$$

where $z$ represents the input to the residual block, $g(z, \{W\})$ is the residual function (e.g. a series of one-dimensional convolutional layers with weights $W$), and $r$ is the output of the block. The skip connection (the addition of $z$) allows the network to learn residual mappings, which are often easier to optimize than the original mappings.
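
Eq. (1) can be illustrated with a minimal plain-Python sketch. It is an illustration under simplifying assumptions rather than the authors' implementation: the signal has a single channel, the residual function is taken to be two convolutions with a ReLU in between, batch normalization is omitted, and the helper names `conv1d_same` and `residual_block` are hypothetical.

```python
def conv1d_same(z, w):
    """Single-channel 1-D cross-correlation, zero-padded so the output keeps
    the input length (assumes an odd kernel length)."""
    pad = len(w) // 2
    padded = [0.0] * pad + list(z) + [0.0] * pad
    return [
        sum(w[k] * padded[i + k] for k in range(len(w)))
        for i in range(len(z))
    ]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def residual_block(z, w1, w2):
    """Compute r = g(z, {W}) + z with g = conv -> ReLU -> conv."""
    g = conv1d_same(relu(conv1d_same(z, w1)), w2)
    return [gi + zi for gi, zi in zip(g, z)]


z = [1.0, -2.0, 3.0]
zero_kernel = [0.0, 0.0, 0.0]
# With all-zero weights the residual function vanishes and the block
# reduces to the identity mapping r = z.
print(residual_block(z, zero_kernel, zero_kernel))
```

The all-zero case shows why residual mappings are easy to optimize: a block only has to drive its weights toward zero to pass its input through unchanged.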

In a multi-scale configuration, the network processes input data across varying scales or resolutions. This is accomplished through parallel convolutional layers with diverse kernel sizes or dilation rates, enabling the extraction of features at distinct temporal granularities. The outputs from these parallel pathways are subsequently fused, typically via concatenation or summation, to construct a comprehensive feature representation.

As illustrated in Figure 2, a typical building block of the one-dimensional MSRN comprises three parallel convolutional layers with kernel sizes of 3, 5 and 7. The outputs from these layers are integrated and processed through an activation function before being propagated to subsequent residual blocks or layers.

Figure 2

A neural network architecture diagram with three convolution branches and concatenated feature vectors.

The image shows a neural network architecture diagram arranged vertically. At the top, a rectangular box labeled “Conv-1 D 64 at 1 cross 7” appears, followed by another box labeled “Max Pooling 1 cross 3”. From this point, the structure splits into three parallel columns. In the left column, the first block contains a text box labeled “Conv-1 D 64 at 1 cross 3” followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 64 at 1 cross 3” followed by the arrow “B N plus Re L U”, which leads to a circular plus symbol. Below it, a text box labeled “Conv-1 D 128 at 1 cross 3” appears, followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 128 at 1 cross 3” followed by the arrow “B N plus Re L U”, leading to another circular plus symbol. Further below, a text box labeled “Conv-1 D 256 at 1 cross 3” appears, followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 256 at 1 cross 3” followed by the arrow “B N plus Re L U”, leading to a circular plus symbol. The downward arrow emerges from the arrow connecting “Max Pooling 1 cross 3” to “Conv-1 D 64 at 1 cross 3” and points to the first circular plus symbol. Another downward arrow emerges from the first circular plus symbol and points to the second circular plus symbol. Another downward arrow emerges from the second circular plus symbol and points to the third circular plus symbol. At the bottom of the column, a text box labeled “Average Pooling” appears, followed by another text box labeled “Feature Vector (256 cross 1)”. In the middle column, the first block contains a text box labeled “Conv-1 D 64 at 1 cross 5” followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 64 at 1 cross 5” followed by the arrow “B N plus Re L U”, leading to a circular plus symbol. 
Below it, a text box labeled “Conv-1 D 128 at 1 cross 5” appears, followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 128 at 1 cross 5” followed by the arrow “B N plus Re L U”, leading to a circular plus symbol. Further below, a text box labeled “Conv-1 D 256 at 1 cross 5” appears, followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 256 at 1 cross 5” followed by the arrow “B N plus Re L U”, leading to a circular plus symbol. The downward arrow emerges from the arrow connecting “Max Pooling 1 cross 3” to “Conv-1 D 64 at 1 cross 3” and points to the first circular plus symbol. Another downward arrow emerges from the first circular plus symbol and points to the second circular plus symbol. Another downward arrow emerges from the second circular plus symbol and points to the third circular plus symbol. At the bottom of the column, a text box labeled “Average Pooling” appears, followed by another text box labeled “Feature Vector (256 cross 1)”. In the right column, the first block contains a text box labeled “Conv-1 D 64 at 1 cross 7” followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 64 at 1 cross 7” followed by the arrow “B N plus Re L U”, leading to a circular plus symbol. Below it, a text box labeled “Conv-1 D 128 at 1 cross 7” appears, followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 128 at 1 cross 7” followed by the arrow “B N plus Re L U”, leading to a circular plus symbol. Further below, a text box labeled “Conv-1 D 256 at 1 cross 7” appears, followed by the arrow “B N plus Re L U”, then another text box labeled “Conv-1 D 256 at 1 cross 7” followed by the arrow “B N plus Re L U”, leading to a circular plus symbol. The downward arrow emerges from the arrow connecting “Max Pooling 1 cross 3” to “Conv-1 D 64 at 1 cross 3” and points to the first circular plus symbol. 
Another downward arrow emerges from the first circular plus symbol and points to the second circular plus symbol. Another downward arrow emerges from the second circular plus symbol and points to the third circular plus symbol. At the bottom of the column, a text box labeled “Average Pooling” appears, followed by another text box labeled “Feature Vector (256 cross 1)”. The three “Feature Vector (256 cross 1)” outputs connect to a text box labeled “Concatenate”, which leads to the final text box labeled “Feature Vector (768 cross 1)”.

Details of the one-dimensional MSRN. Source(s): Created by authors

The overall architecture generally consists of multiple such residual blocks stacked in sequence, with potential downsampling operations interspersed between blocks to reduce temporal dimensionality and expand the receptive field. This hierarchical design facilitates effective learning of multi-level representations, capturing both localized patterns and broader contextual features within the input data.
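
The fusion step at the end of Figure 2 can be illustrated with a small sketch. It assumes each branch ends with 256 channels that are averaged over time and then joined end to end; the function names, the time length of 16 and the constant placeholder values are hypothetical and are used only to show how the 768-dimensional vector arises.

```python
def global_average_pool(feature_map):
    """Average each channel over time: (channels x time) -> (channels,)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def concatenate(vectors):
    """Join several feature vectors end to end."""
    return [value for vector in vectors for value in vector]


# Hypothetical branch outputs: 256 channels each, as in Figure 2. The time
# length (16) and the constant values are placeholders for illustration.
branch_outputs = {
    kernel_size: [[float(kernel_size)] * 16 for _ in range(256)]
    for kernel_size in (3, 5, 7)
}

pooled = [global_average_pool(fm) for fm in branch_outputs.values()]
fused = concatenate(pooled)
print(len(pooled[0]), len(fused))  # 256 768
```

Because pooling removes the time axis, the three branches can be fused by concatenation whatever temporal length each one produces.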

3.4 Feature discriminative enhancement strategy

The model incorporates metric learning to improve feature discrimination by increasing separability between classes. To this end, the center loss function is utilized, which learns a unique class center for each category and constrains the distance between feature representations and their respective centers. This approach enhances feature representation by jointly enforcing intra-class compactness and inter-class separability, as illustrated in Figure 3.

Figure 3

A diagram shows three health state clusters and dashed boundaries before and after separation.

The image shows two panels connected by a rightward-pointing arrow in the center. At the bottom, a legend shows a blue circular point labeled “Health state 1”, a green triangular point labeled “Health state 2”, an orange square point labeled “Health state 3”, and a dashed line labeled “Boundary”. In the left panel, orange square points appear mainly in the upper right area, blue circular points appear in the upper left area, and green triangular points appear near the bottom area. Curved dashed boundary lines appear between the groups of points. Inside each group, a cross mark appears with arrows pointing inward from several nearby points toward the cross mark. In the green triangular group near the bottom, three triangular points show arrows pointing inward toward the cross mark. Similar inward arrows appear for the blue circular group and the orange square group toward their respective cross marks. In the right panel, the points appear arranged into three clearer clusters where blue circular points gather in the upper left area, orange square points gather in the upper right area, and green triangular points gather in the lower area. Each cluster contains a cross mark with arrows pointing inward from surrounding points toward the cross mark. Curved dashed boundary lines separate the clusters in both panels. The visible spaces appear between the dashed boundary lines around the clusters in the right panel.

Schematic diagram of center loss. Source(s): Created by authors

The center loss involves two processes: (1) learning the class centers of the deep features; and (2) using the learned class centers to constrain the features belonging to each class. For the training set $\{(x_{i}, y_{i})\}_{i = 1}^{n_{train}}$, the center loss $L_{center}$ is defined as:

L_{center} = \frac{1}{2} \sum_{i = 1}^{n_{train}} \| x_{i} - c_{y_{i}} \|_{2}^{2}

(2)

where $c_{y_{i}}$ is the center of class $y_{i}$ in the feature space.
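
As a minimal sketch of Eq. (2), the loss can be evaluated directly from a list of feature vectors, their labels and one center per class. The function name, the dictionary layout of the centers and the two-dimensional values below are illustrative assumptions, not taken from the paper.

```python
def center_loss(features, labels, centers):
    """Eq. (2): half the summed squared Euclidean distance between each
    feature vector and the center of its own class."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total


# Hypothetical 2-D features for two classes (illustrative values only).
features = [[1.0, 2.0], [3.0, 2.0], [0.0, 0.0]]
labels = [0, 0, 1]
centers = {0: [2.0, 2.0], 1: [0.0, 1.0]}

loss = center_loss(features, labels, centers)
print(loss)  # 1.5: each sample lies at squared distance 1 from its center
```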

Since $c_{y_{i}}$ cannot be obtained directly, it is first randomly initialized and then gradually optimized during model training. The network parameters are updated on each mini-batch of training samples, and the update of $c_{y_{i}}$ is computed at the same time to adjust the position of the center. The gradient of $L_{center}$ with respect to $x_{i}$ and the update of the centers are computed as follows:

\frac{\partial L_{center}}{\partial x_{i}} = x_{i} - c_{y_{i}},

(3)

Δ c_{j} = \frac{\sum_{i = 1}^{m} δ (y_{i} = j) (c_{j} - x_{i})}{1 + \sum_{i = 1}^{m} δ (y_{i} = j)},

(4)

where $m$ is the number of samples in the mini-batch, and $δ (y_{i} = j) = 1$ if $y_{i} = j$ and $δ (y_{i} = j) = 0$ otherwise. The center $c_{j}$ of class $j$ is then updated as follows:

c_{j}^{t + 1} = c_{j}^{t} - α Δ c_{j}^{t},

(5)

where $c_{j}^{t}$ denotes the center of class $j$ at iteration $t$, which is updated continuously during model training, and $α$ is a scalar that controls the learning rate of the centers to limit the disturbance caused by mislabeled samples. In this article, the default value of $α$ is 0.5.
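
The update rule in Eqs. (4) and (5) can be sketched as follows for a single mini-batch. The helper names and the toy values are hypothetical; only the formulas and the default $α = 0.5$ come from the text.

```python
def center_deltas(features, labels, centers):
    """Eq. (4): per-class update direction computed on one mini-batch."""
    deltas = {}
    for j, c in centers.items():
        members = [x for x, y in zip(features, labels) if y == j]
        count = len(members)
        deltas[j] = [
            sum(c[d] - x[d] for x in members) / (1 + count)
            for d in range(len(c))
        ]
    return deltas


def update_centers(centers, deltas, alpha=0.5):
    """Eq. (5): move each center against its update direction."""
    return {
        j: [cd - alpha * dd for cd, dd in zip(c, deltas[j])]
        for j, c in centers.items()
    }


# Hypothetical mini-batch: two samples of class 0, none of class 1.
batch_features = [[1.0, 2.0], [3.0, 4.0]]
batch_labels = [0, 0]
old_centers = {0: [0.0, 0.0], 1: [5.0, 5.0]}

new_centers = update_centers(
    old_centers, center_deltas(batch_features, batch_labels, old_centers)
)
```

The `1 +` term in the denominator keeps the update defined for classes that are absent from the mini-batch: their delta is zero, so the center of class 1 stays where it was in this example.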

The center loss promotes the formation of tight clusters around class centers in the feature space, which significantly clarifies inter-class boundaries and enhances feature discriminability.

3.5 Objective function optimization

In conjunction with the center loss, the standard cross-entropy loss is employed to jointly train the feature extractor $G$ and the fault classifier $C$ for the classification task:

L_{c} = - \sum_{i = 1}^{n_{train}} y_{i} \log C (G (x_{i}))

(6)

The overall training objective of DMSRN can be summarized as follows:

L_{all} = L_{c} + λ L_{center}

(7)

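where $λ$ is the weighting coefficient that balances the two losses. The model is trained with the Adam optimizer; written in simplified gradient-descent form, the parameter updates at each training epoch $q$ are given in Eqs. (8) and (9).

θ_{G}^{(q)} \leftarrow θ_{G}^{(q)} - μ \left( \frac{\partial L_{c}}{\partial θ_{G}^{(q)}} + λ \frac{\partial L_{center}}{\partial θ_{G}^{(q)}} \right),

(8)

θ_{C}^{(q)} \leftarrow θ_{C}^{(q)} - μ \frac{\partial L_{c}}{\partial θ_{C}^{(q)}}

(9)

where $θ_{G}$ and $θ_{C}$ are the parameters of the feature extractor and the fault classifier, respectively, and $μ$ denotes the learning rate.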
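
A minimal sketch of the joint objective in Eq. (7) is given below, assuming the classifier outputs raw logits that are turned into probabilities by a softmax. The function names and toy inputs are hypothetical. The losses are summed over samples as in Eqs. (2) and (6), whereas deep learning frameworks often average over the mini-batch instead.

```python
import math


def cross_entropy(logits, labels):
    """Eq. (6) for one-hot labels: summed negative log-probability of the
    true class, with a softmax applied to the logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[y]
    return total


def center_loss(features, labels, centers):
    """Eq. (2): half the summed squared distance to the class centers."""
    return 0.5 * sum(
        sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
        for x, y in zip(features, labels)
    )


def total_loss(logits, features, labels, centers, lam=1.0):
    """Eq. (7): cross-entropy plus the weighted center loss."""
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


# Hypothetical single sample of class 0 in a four-class problem.
logits = [[0.0, 0.0, 0.0, 0.0]]
features = [[1.0, 0.0]]
labels = [0]
centers = {0: [0.0, 0.0]}

loss_full = total_loss(logits, features, labels, centers, lam=1.0)
loss_weak = total_loss(logits, features, labels, centers, lam=0.1)
```

With uniform logits the cross-entropy term equals log 4, so changing `lam` only rescales the center-loss contribution of 0.5.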

3.6 Diagnosis process using proposed DMSRN method

Figure 4 outlines, step by step, the diagnosis process of the proposed DMSRN method.

Figure 4

A flowchart shows training phase and test phase steps for a D M S R N fault diagnosis process.

The image shows a flowchart divided into two horizontal sections labeled “Training phase” and “Test phase”. In the “Training phase”, four rectangular boxes appear in sequence, connected by right-pointing arrows. The first box reads “Step 1: Collecting vibration data”. The arrow points to the second box labeled “Step 2: Splitting data into training samples”. Another arrow points to the third box labeled “Step 3: Inputting training samples into the proposed D M S R N for training”. A final arrow points to the fourth box labeled “Step 4: Outputting the trained D M S R N”. In the “Test phase”, four rectangular boxes also appear in sequence, connected by right-pointing arrows. The first box reads “Step 1: Collecting target vibration data”. The arrow points to the second box labeled “Step 2: Splitting data into test samples”. Another arrow points to the third box labeled “Step 3: Inputting test samples into the trained D M S R N for test”. A final arrow points to the fourth box labeled “Step 4: Outputting fault diagnosis result”.

Diagnosis process using proposed DMSRN method. Source(s): Created by authors

4. Experimental verification

This section presents a comprehensive evaluation of the proposed DMSRN's diagnostic performance and practical utility, conducted on a custom-built laboratory dataset for elevator door systems.

4.1 Datasets description

Figure 5 illustrates the test bench used for evaluating the elevator door system in a laboratory setting. A vibration sensor was mounted on the inner surface of the car door, near its outer edge, to collect real-time operational data at a sampling frequency of 10 Hz. The system was operated under four health conditions: normal operation (labeled as 0), missing one threshold slide block (labeled as 1), missing two threshold slide blocks (labeled as 2) and debris in the threshold (labeled as 3). From the collected data, samples with a length of 1,024 points were extracted, resulting in 720 samples per health condition.
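
The sample extraction can be sketched as a sliding window over a one-dimensional record. The text does not state whether consecutive samples overlap, so the `step` parameter and the function name below are assumptions; `step=1024` gives non-overlapping windows and a smaller value gives overlapping ones.

```python
def segment_signal(signal, length=1024, step=1024):
    """Cut a 1-D record into windows of `length` points, advancing by `step`.
    Trailing points that do not fill a whole window are discarded."""
    return [
        signal[start:start + length]
        for start in range(0, len(signal) - length + 1, step)
    ]


# Hypothetical record of 5,000 points standing in for one vibration recording.
record = list(range(5000))

non_overlapping = segment_signal(record)
half_overlapping = segment_signal(record, step=512)
print(len(non_overlapping), len(half_overlapping))  # 4 8
```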

Figure 5

A labeled image showing elevator car door components including drive belt, door vane assembly, and car door panel.

The image contains two side by side panels showing an elevator car door system. The left panel shows a photograph of a vertical sliding elevator door structure with a metal frame, rails, belts, and door panels inside an industrial setting. The right panel shows a labeled schematic diagram of the elevator car door components. Several arrows point to labeled parts. The labels read “Drive belt”, “Door vane assembly”, “Door operator assembly”, “Car door guide rail”, “Permanent magnet motor”, “Car door panel”, “Door slider”, and “Sill”. The diagram shows the vertical door panels connected to a mechanical system at the top and a horizontal base labeled “Sill”.

Test bench of elevator door system. Source(s): Created by authors

4.2 Implementation details

To evaluate the effectiveness and superiority of the proposed DMSRN, comparative experiments are conducted against a one-dimensional CNN, SVM, MSRN and the intra-domain adversarial network (IDAN) (Huang et al., 2025). The implementation details of these comparison methods are as follows. The architecture of the CNN is given in Table 1. The architecture of MSRN is the same as that of DMSRN presented in Figure 2. The fault classifiers of CNN, MSRN and DMSRN each consist of two fully connected layers, and the three models share the same experimental settings. The Adam optimizer with betas of (0.9, 0.99) is adopted for model training, with the learning rate $μ$ empirically set to 0.001. IDAN is a state-of-the-art fault diagnosis method that leverages multi-scale branches, an improved adversarial learning mechanism and a perspective sharing strategy to extract generalized fault representations; its hyperparameters and architecture are adopted from Huang et al. (2025). To ensure the robustness and stability of the evaluation, a sixfold cross-validation strategy is employed: the total dataset is partitioned into six disjoint subsets, and in each iteration five folds are used for training while the remaining fold serves as the test set. Training runs for 100 epochs with a batch size of 10. The SVM employs a linear kernel, with the regularization coefficient traversed over the set {0.01, 0.1, 1, 10, 100, 1,000} and the result recorded for each value.
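
The sixfold protocol can be sketched as follows. The paper does not say whether the samples are shuffled or stratified by class before splitting, so the seeded shuffle and the function name here are assumptions; the total of 2,880 samples follows from four health conditions with 720 samples each.

```python
import random


def k_fold_indices(n_samples, k=6, seed=0):
    """Yield (train, test) index lists for k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffled split
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(4 * 720, k=6))
for round_id, (train, test) in enumerate(splits, start=1):
    print(round_id, len(train), len(test))
```

Each sample is used for testing exactly once across the six rounds, which is what allows the six accuracies to be averaged into a single figure.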

Table 1

Architecture of CNN in this experiment

Layer	Operation and parameters
1-Conv	Kernel 8–15 × 1, stride 2, BN, ReLU
2-Conv	Kernel 16–15 × 1, stride 4, BN, ReLU
3-Conv	Kernel 32–15 × 1, stride 4, BN, ReLU
4-Conv	Kernel 64–15 × 1, stride 4, BN, ReLU
5-AveragePool	Kernel 256–8 × 1, stride 1
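
As a rough check of Table 1, the temporal length after each layer can be traced with the standard output-length formula for one-dimensional convolution and pooling. Table 1 does not give the padding, so a padding of 7 points on each side (half the kernel size) is assumed here, and the last row is read as an average pooling kernel of size 8; both readings are assumptions.

```python
def output_length(length, kernel, stride, padding):
    """Standard 1-D convolution/pooling output-length formula."""
    return (length + 2 * padding - kernel) // stride + 1


# (name, kernel size, stride) taken from Table 1.
conv_layers = [
    ("1-Conv", 15, 2),
    ("2-Conv", 15, 4),
    ("3-Conv", 15, 4),
    ("4-Conv", 15, 4),
]

length = 1024  # sample length stated in Section 4.1
lengths = [length]
for _name, kernel, stride in conv_layers:
    # Assumption: padding of kernel // 2 = 7 on both sides (not in Table 1).
    length = output_length(length, kernel, stride, padding=kernel // 2)
    lengths.append(length)

# 5-AveragePool read as kernel 8, stride 1, no padding.
pooled_length = output_length(length, kernel=8, stride=1, padding=0)
print(lengths, pooled_length)  # [1024, 512, 128, 32, 8] 1
```

Under these assumptions the four strided convolutions reduce a 1,024-point sample to 8 points, which the pooling layer then collapses to a single value per channel.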

4.3 Results and analysis

Figure 6 shows the training accuracy and training loss curves of the proposed DMSRN over the whole training phase; both curves flatten toward the end of training, indicating that the optimization has converged.

Figure 6

A line graph shows training accuracy and training loss across epochs.

The image shows a line graph with the horizontal axis labeled “Epoch” and the vertical axes labeled “Training accuracy” on the left and “Training loss” on the right. The horizontal axis displays epoch values from 1 to 100 in increments of 11 units. The left vertical axis shows percentages from 0.00 percent to 100.00 percent in increments of 20.00 percent, while the right vertical axis shows values from 0.00 to 8.00 in increments of 1.00 units. Two lines appear on the graph. A legend at the bottom identifies the orange line as “Training accuracy” and the blue line as “Training loss”. The orange “Training accuracy” line starts near the coordinate (1, 35.00 percent), rises sharply during the early epochs passing through (12, 81.00 percent), and gradually increases until it reaches near the coordinate (100, 100.00 percent), where the line becomes almost flat. The blue “Training loss” line starts near the coordinate (1, 7.00), decreases steadily during the early epochs passing through (12, 5.50), and gradually declines toward the coordinate (100, 4.00), where the line becomes nearly stable.

Iterative curves during training phase. Source(s): Created by authors

Regarding the test results under the sixfold cross-validation strategy, the fault diagnostic accuracies of the different methods on the elevator door system are reported in Table 2 and Figure 7. In average accuracy, the proposed DMSRN outperforms SVM, CNN, MSRN and IDAN by 58.85, 43.06, 18.16 and 10.60 percentage points, respectively. For a more detailed and intuitive comparison, the confusion matrices of the different methods are presented in Figure 8. Compared with the proposed DMSRN, SVM, CNN, MSRN and IDAN are less accurate in identifying the health states of the elevator door system. The results verify the advantage of the proposed DMSRN over the comparison methods.

Table 2

Diagnostic accuracy of different methods

Methods	1st time	2nd time	3rd time	4th time	5th time	6th time	Average	Standard deviation
SVM	29.67	30.72	28.3	27.5	30.00	27.5	28.95	1.37
CNN	44.47	44.93	46.97	44.21	43.16	44.67	44.74	1.25
MSRN	69.54	70.26	69.67	70.79	70.2	67.37	69.64	1.20
IDAN	76.72	77.76	76.13	78.37	76.42	77.78	77.20	0.89
DMSRN	88.82	85.07	88.55	87.04	88.62	88.68	87.80	1.49
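
The summary columns of Table 2 can be recomputed from the per-fold values with a few lines of Python. The sketch below uses the sample standard deviation (n - 1 in the denominator), which agrees with the reported values to within rounding; the variable names are illustrative.

```python
from statistics import mean, stdev

# Per-fold test accuracies (%) copied from Table 2.
accuracy = {
    "SVM": [29.67, 30.72, 28.3, 27.5, 30.00, 27.5],
    "CNN": [44.47, 44.93, 46.97, 44.21, 43.16, 44.67],
    "MSRN": [69.54, 70.26, 69.67, 70.79, 70.2, 67.37],
    "IDAN": [76.72, 77.76, 76.13, 78.37, 76.42, 77.78],
    "DMSRN": [88.82, 85.07, 88.55, 87.04, 88.62, 88.68],
}

averages = {name: mean(folds) for name, folds in accuracy.items()}
# stdev() is the sample standard deviation (n - 1 in the denominator).
deviations = {name: stdev(folds) for name, folds in accuracy.items()}
# Margin of DMSRN over each comparison method, in percentage points.
margins = {
    name: averages["DMSRN"] - avg
    for name, avg in averages.items()
    if name != "DMSRN"
}

for name in accuracy:
    print(f"{name:5s} mean={averages[name]:.2f} std={deviations[name]:.2f}")
for name, gap in margins.items():
    print(f"DMSRN - {name}: {gap:.2f} percentage points")
```

The margins are differences between average accuracies, so they are expressed in percentage points, not as relative improvements.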

Figure 7

A box plot compares test accuracy percent for S V M, C N N, M S R N, I D A N, and D M S R N methods.

The image shows a box plot that compares test accuracy (percent) for different methods. The horizontal axis is labeled “Different methods”, and the vertical axis is labeled “Test accuracy (percent)”. The vertical axis ranges from 0 to 100 percent in increments of 10 percent. Five box plots appear along the horizontal axis labeled “S V M”, “C N N”, “M S R N”, “I D A N”, and “D M S R N”. The “S V M” box plot appears around 28 to 30 percent with a central mark at about 29 percent. The “C N N” box plot appears around 44 to 46 percent with a central mark at about 45 percent. The “M S R N” box plot appears around 69 to 71 percent with a central mark near about 70 percent. The “I D A N” box plot appears around 76 to 78 percent with a central mark at about 77 percent. The “D M S R N” box plot appears around 86 to 89 percent, with a central mark at about 88 percent. Each method shows a rectangular box with whisker lines extending above and below the box.

Box plot of diagnostic accuracy and standard deviation of different methods. Source(s): Created by authors

Figure 8

Five confusion matrix heatmaps compare predicted and truth labels for five different models.

The image shows five confusion matrix heatmaps arranged in two rows. Each heatmap has the horizontal axis labeled “Predicted label”, with class labels 0, 1, 2, and 3 from left to right, and the vertical axis labeled “Truth label”, with class labels 0, 1, 2, and 3 from top to bottom. Each matrix contains numerical values inside shaded squares and a vertical color scale bar on the right. The first heatmap, labeled “(a)”, shows the following matrix data. The vertical scale bar shows the lightest blue as 60 and the darkest blue as 120. The table contains four rows and four columns. The row-wise entries in the table are as follows: Row 1: Truth label: 0; Predicted label 0: 138; Predicted label 1: 87; Predicted label 2: 98; Predicted label 3: 57. Row 2: Truth label: 1; Predicted label 0: 67; Predicted label 1: 114; Predicted label 2: 83; Predicted label 3: 116. Row 3: Truth label: 2; Predicted label 0: 126; Predicted label 1: 81; Predicted label 2: 93; Predicted label 3: 80. Row 4: Truth label: 3; Predicted label 0: 71; Predicted label 1: 116; Predicted label 2: 87; Predicted label 3: 106. The second heatmap, labeled “(b)”, shows the following matrix data. The vertical scale bar shows the lightest blue as 50 and the darkest blue as 200. The table contains four rows and four columns. The row-wise entries in the table are as follows: Row 1: Truth label: 0; Predicted label 0: 203; Predicted label 1: 26; Predicted label 2: 138; Predicted label 3: 13. Row 2: Truth label: 1; Predicted label 0: 24; Predicted label 1: 163; Predicted label 2: 48; Predicted label 3: 145. Row 3: Truth label: 2; Predicted label 0: 136; Predicted label 1: 46; Predicted label 2: 146; Predicted label 3: 52. Row 4: Truth label: 3; Predicted label 0: 18; Predicted label 1: 147; Predicted label 2: 48; Predicted label 3: 167. The third heatmap, labeled “(c)”, shows the following matrix data. The vertical scale bar shows the lightest blue as 0 and the darkest blue as 300. 
The table contains four rows and four columns. The row-wise entries in the table are as follows: Row 1: Truth label: 0; Predicted label 0: 367; Predicted label 1: 0; Predicted label 2: 13; Predicted label 3: 0. Row 2: Truth label: 1; Predicted label 0: 9; Predicted label 1: 179; Predicted label 2: 6; Predicted label 3: 186. Row 3: Truth label: 2; Predicted label 0: 137; Predicted label 1: 3; Predicted label 2: 233; Predicted label 3: 7. Row 4: Truth label: 3; Predicted label 0: 4; Predicted label 1: 124; Predicted label 2: 7; Predicted label 3: 245. The fourth heatmap, labeled “(d)”, shows the following matrix data. The vertical scale bar shows the lightest blue as 0 and the darkest blue as 300. The table contains four rows and four columns. The row-wise entries in the table are as follows: Row 1: Truth label: 0; Predicted label 0: 354; Predicted label 1: 0; Predicted label 2: 26; Predicted label 3: 0. Row 2: Truth label: 1; Predicted label 0: 4; Predicted label 1: 277; Predicted label 2: 5; Predicted label 3: 94. Row 3: Truth label: 2; Predicted label 0: 49; Predicted label 1: 4; Predicted label 2: 321; Predicted label 3: 6. Row 4: Truth label: 3; Predicted label 0: 2; Predicted label 1: 144; Predicted label 2: 14; Predicted label 3: 220. The fifth heatmap, labeled “(e)”, shows the following matrix data. The vertical scale bar shows the lightest blue as 0 and the darkest blue as 300. The table contains four rows and four columns. The row-wise entries in the table are as follows: Row 1: Truth label: 0; Predicted label 0: 369; Predicted label 1: 0; Predicted label 2: 11; Predicted label 3: 0. Row 2: Truth label: 1; Predicted label 0: 0; Predicted label 1: 293; Predicted label 2: 1; Predicted label 3: 86. Row 3: Truth label: 2; Predicted label 0: 12; Predicted label 1: 1; Predicted label 2: 366; Predicted label 3: 1. Row 4: Truth label: 3; Predicted label 0: 0; Predicted label 1: 59; Predicted label 2: 1; Predicted label 3: 320.

Confusion matrices of different methods: (a) SVM, (b) CNN, (c) MSRN, (d) IDAN and (e) DMSRN. Source(s): Created by authors
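
Overall accuracy and per-class recall can be read off a confusion matrix as sketched below, using the values described for panel (e) of Figure 8. Rows are true labels and columns are predicted labels; the function names are illustrative.

```python
# Confusion matrix of DMSRN, panel (e) of Figure 8 (rows: truth, columns: predicted).
dmsrn = [
    [369, 0, 11, 0],
    [0, 293, 1, 86],
    [12, 1, 366, 1],
    [0, 59, 1, 320],
]


def accuracy(cm):
    """Fraction of samples on the main diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


def recall_per_class(cm):
    """Correct predictions of each class divided by its number of true samples."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


overall = accuracy(dmsrn)
recalls = recall_per_class(dmsrn)
print(f"{100 * overall:.2f}")  # 88.68
```

The overall value matches the sixth-fold accuracy of DMSRN in Table 2. The lowest recalls belong to classes 1 and 3, which are mostly confused with each other (86 and 59 samples, respectively).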

To visually assess the discriminability of the extracted features, t-distributed stochastic neighbor embedding (t-SNE) is applied to project the high-dimensional features of the elevator door system dataset into a two-dimensional space. As shown in Figure 9, the features obtained with SVM, CNN, MSRN and IDAN form less distinct clusters, indicating limited discriminative power. In contrast, those extracted by the proposed DMSRN form better-separated clusters corresponding to the different health states, demonstrating qualitatively stronger feature discrimination. Taken together, the experimental results attest to the superior diagnostic capability of DMSRN over all comparison methods in intelligent fault diagnosis for elevator door systems.

Figure 9

A multi-panel scatter plot compares feature visualization results for S V M, C N N, M S R N, I D A N, and D M S R N.

The image shows five scatter plot panels arranged in two rows inside a dashed rectangular boundary. The panels are labeled “(a)”, “(b)”, and “(c)” in the top row and “(d)” and “(e)” in the bottom row. A legend on the right shows four classes represented by different markers: the orange circle labeled “0”, the pink star labeled “1”, the green triangle labeled “2”, and the blue square labeled “3”. In panel “(a)”, the symbols form a flower-shaped cluster with curved loops, and the square markers appear prominently across the structure, which correspond to the blue square markers. In the center of the shape, the blue square markers are also prominent and densely present. In panel “(b)”, a large oval-shaped cluster is shown. The left side of the oval is mainly covered by green triangular markers and orange circular markers, while the right side of the oval is mainly occupied by blue square markers and pink star markers. In panel “(c)”, a large cluster appears on the left side, formed mainly by blue square markers and pink star markers. In the center, a green cluster formed by triangular markers is visible, and on the right side, a separate cluster formed by orange circular markers appears. In panel “(d)”, two large clusters appear, one on the left and one on the right, and both clusters contain markers of all four types, which makes the clusters appear multicolored. In panel “(e)”, an inverted C-shaped curve appears on the bottom left side. The upper tip of this curve contains mainly pink star markers, and as the curve moves toward the right side, it gradually changes to blue square markers. Another curved line appears on the right side, which contains green triangular markers toward the upper portion and orange circular markers toward the lower portion.

Feature visualization of different methods: (a) SVM, (b) CNN, (c) MSRN, (d) IDAN and (e) DMSRN. Source(s): Created by authors

4.4 Ablation experiments on multi-scale branches

To evaluate the effectiveness of the proposed multi-scale residual architecture in DMSRN, two sets of ablation experiments were conducted. As shown in Table 3, the multi-scale configuration (87.76%) outperforms every single-scale branch, the best of which reaches 83.75%. In addition, the influence of the number of residual blocks (depth) and the channel multiplier (width) on diagnostic accuracy is investigated, and the results are summarized in Table 4. With network complexity kept as low as possible, (depth, width) = (3, 3), which yields the best result (87.71%), is adopted as the network setting.

Table 3

Diagnostic accuracy of different branches

Methods	Branch of kernel size 3	Branch of kernel size 5	Branch of kernel size 7	Multi-scale branches
Accuracy	83.75	83.22	79.61	87.76

Table 4

Diagnostic accuracy of different network settings

(Depth, width)	(1,1)	(1,2)	(1,3)	(2,1)	(2,2)	(2,3)	(3,1)	(3,2)	(3,3)
Accuracy	43.20	45.78	56.80	54.30	68.41	70.64	69.60	77.21	87.71

4.5 Discussion on computational burden

To assess the computational complexity of the proposed DMSRN, Table 5 summarizes the floating-point operations (FLOPs), number of parameters, storage demands, and training and test times of each method. While the multi-scale branches and metric learning module introduce additional computational cost, this is offset by the enhanced feature extraction capacity of DMSRN. Moreover, as model training is performed offline, the overall computational burden remains acceptable. During testing, DMSRN takes 1.3 s, which is comparable to the 1.0 to 1.5 s required by the comparison methods. These findings underscore the model's potential for real-time fault diagnosis in practical engineering scenarios.

Table 5

The computational burden of different methods

Evaluation criterion	SVM	CNN	MSRN	IDAN	DMSRN
FLOPs/(M)	0.02	2.29	5.97	10.2	7.89
Parameters Number/(M)	0.01	0.13	0.37	0.51	0.42
Storage Occupancy/(MB)	0.01	0.67	1.98	5.11	3.20
Training Time/(s)	1.5	66	99.5	123	102.3
Test Time/(s)	1.0	1.3	1.4	1.5	1.3

4.6 Sensitivity analysis for weighting coefficient

For the sensitivity analysis, a grid search was performed over the weighting coefficient $λ$ of the center loss, with values drawn from the set {0.01, 0.1, 0.5, 1}. As reported in Table 6, diagnostic accuracy increases monotonically with $λ$ over this range, from 70.02% at $λ = 0.01$ to 87.14% at $λ = 1$, because $λ$ determines the contribution of the center loss to feature learning. The value $λ = 1$, which gives the highest accuracy, is therefore adopted in this study.

Table 6

Diagnostic accuracy of different values for weighting coefficient

Value	0.01	0.1	0.5	1
Accuracy	70.02	72.39	82.65	87.14

4.7 Analysis of diagnostic performance with different noise in test conditions

In real-world scenarios, the test data may be contaminated by noise. To investigate how the diagnostic performance of the already-trained models changes with the level of additive stochastic noise in the test samples, the models were used to diagnose test samples with different signal-to-noise ratios (SNRs). Noise is added to the test samples according to Eq. (10); no noise is added to the training samples. Table 7 lists the accuracies of the different methods at each test SNR. A smaller SNR corresponds to stronger noise, which generally results in lower classification accuracy. Although noise substantially degrades the performance of all models, the proposed DMSRN remains more accurate than the comparison methods at every tested SNR, even though at −10 dB all methods fall close to the 25% chance level of a four-class problem.

Table 7

Accuracy (%) with different testing SNRs by different methods

SNR/(dB)	20	10	0	−10
SVM	30.07	29.87	26.91	26.25
CNN	42.96	37.04	25.99	25.13
MSRN	65.39	44.74	27.30	25.66
IDAN	69.11	48.53	32.64	26.87
DMSRN	78.29	61.64	50.20	27.79

\tilde{x} = x + \mathrm{torch.randn\_like}(x) \cdot \sqrt{\frac{\mathrm{mean}(x^{2})}{10^{SNR / 10}}}

(10)

where $\tilde{x}$ and $x$ represent the noisy and original signals, respectively; $SNR$ represents the signal-to-noise ratio in dB; $\mathrm{torch.randn\_like}$ is the PyTorch function that generates Gaussian white noise with the same shape as its argument; and $\mathrm{mean}(\cdot)$ denotes the mean operation.
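
Eq. (10) can be sketched without PyTorch by drawing standard normal values from the Python standard library, which plays the role of `torch.randn_like` here. The function names, the sine test signal and the seed are illustrative assumptions.

```python
import math
import random


def add_noise(x, snr_db, rng):
    """Eq. (10): add white Gaussian noise scaled to a target SNR in dB."""
    signal_power = sum(v * v for v in x) / len(x)
    scale = math.sqrt(signal_power / 10 ** (snr_db / 10))
    # rng.gauss(0, 1) stands in for torch.randn_like (zero mean, unit variance).
    return [v + rng.gauss(0.0, 1.0) * scale for v in x]


def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy signal relative to its clean version."""
    signal_energy = sum(c * c for c in clean)
    noise_energy = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_energy / noise_energy)


# Hypothetical clean signal: a sine wave standing in for a vibration sample.
clean = [math.sin(2 * math.pi * k / 50) for k in range(20000)]
noisy = add_noise(clean, snr_db=10, rng=random.Random(0))
achieved = measured_snr_db(clean, noisy)
print(round(achieved, 1))
```

Because the noise is random, the achieved SNR only approximates the target, and the approximation tightens as the signal gets longer.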

5. Conclusion

The intelligent fault diagnosis of elevator door systems is critical to operational safety, yet remains challenging due to the difficulty of extracting discriminative state features via deep learning. To overcome this limitation, a DMSRN method is proposed for enhanced intelligent fault diagnosis of elevator door systems. The model leverages a one-dimensional multi-scale architecture to extract representative features, combined with a metric learning-based feature discriminative enhancement strategy that improves feature discriminability by promoting intra-class compactness and inter-class separation. Experimental validation shows that DMSRN achieves a significant performance advantage, improving average diagnostic accuracy by 10.60 percentage points over the strongest baseline (IDAN) and by 18.16 percentage points over MSRN, which shares the same architecture, thereby substantially increasing the reliability of fault identification in elevator door systems.

6. Future work

Future work will focus on the following aspects: (1) optimizing the model architecture to further improve fault recognition accuracy and inference efficiency; (2) enhancing the model's generalization capability to enable reliable diagnosis under diverse operating conditions and environments; and (3) investigating the relationship between the physical mechanisms underlying typical elevator door faults and the representations learned by the artificial intelligence model.

References

Bai, D., An, Z., Wang, N., Liu, S. and Yu, X. (2021), “The prediction of the elevator fault based on improved PSO-BP algorithm”, Journal of Physics: Conference Series, Vol. 1906 No. 1, 012017, doi: https://doi.org/10.1088/1742-6596/1906/1/012017.

Chan, P., Li, S., Deng, J., Yeung, S. and Daniel (2023), “Multi-proxy based deep metric learning”, Information Sciences, Vol. 643, 119120, doi: https://doi.org/10.1016/j.ins.2023.119120.

Chen

,

L.

,

Lan

,

S.

and

Jiang

,

S.

(

2019

), “

Elevators Fault diagnosis based on artificial intelligence

”,

Journal of Physics: Conference Series

, Vol.

1345

, 042024, doi:

https://doi.org/10.1088/1742-6596/1345/4/042024

.

Google Scholar

Crossref

Chen

,

C.

,

Ren

,

X.

and

Cheng

,

G.

(

2025

), “

Research on distributed fault diagnosis model of elevator based on PCA-LSTM

”,

Algorithms

, Vol.

17

No.

6

, 250, doi:

https://doi.org/10.3390/a17060250

.

Google Scholar

Crossref

Dong

,

R.

and

Lam

,

K.

(

2024

), “

Bi-center loss for compound facial expression recognition

”,

IEEE Signal Processing Letters

, Vol.

31

, pp.

641

-

645

, doi:

https://doi.org/10.1109/LSP.2024.3364055

.

Google Scholar

Crossref

Hu

,

S.

,

Chu

,

Y.

,

Tang

,

L.

,

Zhou

,

G.

,

Chen

,

A.

and

Sun

,

Y.

(

2023

), “

A lightweight multi-sensory field-based dual-feature fusion residual network for bird song recognition

”,

Applied Soft Computing

, Vol.

146

, 110678, doi:

https://doi.org/10.1016/j.asoc.2023.110678

.

Google Scholar

Crossref

Huang

,

K.

,

Ren

,

Z.

,

Zhu

,

L.

,

Lin

,

T.

,

Zhu

,

Y.

,

Zeng

,

L.

and

Wan

,

J.

(

2025

), “

Intra-domain self generalization network for intelligent fault diagnosis of bearings under unseen working conditions

”,

Advanced Engineering Informatics

, Vol.

64

, 102997, doi:

https://doi.org/10.1016/j.aei.2024.102997

.

Google Scholar

Crossref

Kim

,

M.

,

Son

,

S.

and

Oh

,

K.-Y.

(

2023

), “

Margin-maximized hyperspace for fault detection and prediction: a case study with an elevator door

”,

IEEE Access

, Vol.

11

, pp.

128580

-

128595

, doi:

https://doi.org/10.1109/ACCESS.2023.3330137

.

Google Scholar

Crossref

Li

,

X.

,

Yang

,

X.

,

Ma

,

Z.

and

Xue

,

J.

(

2023

), “

Deep metric learning for few-shot image classification: a review of recent developments

”,

Pattern Recognition

, Vol.

138

, 109381, doi:

https://doi.org/10.1016/j.patcog.2023.109381

.

Google Scholar

Crossref

Liu

,

K.

,

Yang

,

M.

,

Yu

,

Z.

,

Wang

,

G.

and

Wu

,

W.

(

2022

), “

FBMSNet: a filter-bank multi-scale convolutional neural network for EEG-based motor imagery decoding

”,

IEEE Transactions on Biomedical Engineering

, Vol.

70

No.

2

, pp.

436

-

445

, doi:

https://doi.org/10.1109/TBME.2022.3193277

.

Google Scholar

Crossref

Lu

,

J.

,

Chen

,

H.

,

Chen

,

J.

,

Xiao

,

Z.

,

Li

,

R.

,

X

,

G.

and

Wang

,

Q.

, (

2025a

), “

Temporal knowledge graph fusion with neural ordinary differential equations for the predictive maintenance of electromechanical equipment

”,

Knowledge-Based Systems

, Vol.

317

, 113450, doi:

https://doi.org/10.1016/j.knosys.2025.113450

.

Google Scholar

Crossref

Lu

,

J.

,

Zhang

,

W.

,

Lu

,

C.

,

Xiao

,

G.

and

Wang

,

Q.

(

2025b

), “

A multi-scale convolution capsule network with data augmentation and attention mechanisms for elevator fault diagnosis

”,

ISA Transactions

, Vol.

167

, pp.

1873

-

1887

, doi:

https://doi.org/10.1016/j.isatra.2025.09.041

.

Google Scholar

Crossref

Ni

,

Q.

,

Ji

,

J.

,

Halkon

,

B.

,

Feng

,

K.

and

Nandi

,

A.K.

(

2023

), “

Physics-Informed Residual Network (PIResNet) for rolling element bearing fault diagnostics

”,

Mechanical Systems and Signal Processing

, Vol.

200

, 110544, doi:

https://doi.org/10.1016/j.ymssp.2023.110544

.

Google Scholar

Crossref

Pan

,

J.

,

Shao

,

C.

,

Dai

,

Y.

,

Wei

,

Y.

,

Chen

,

W.

and

Lin

,

Z.

(

2024

), “

Research on fault prediction method of elevator door system based on transfer learning

”,

Sensors

, Vol.

24

No.

7

, 2135, doi:

https://doi.org/10.3390/s24072135

.

Google Scholar

Crossref

PubMed

Pan

,

S.

,

Nie

,

X.

,

Zhai

,

X.

,

He

,

C.

and

Ding

,

Z.

(

2025

), “

Classification of power quality disturbances using residual networks with channel attention mechanism

”,

Engineering Applications of Artificial Intelligence

, Vol.

151

, 110641, doi:

https://doi.org/10.1016/j.engappai.2025.110641

.

Google Scholar

Crossref

Ren

,

Y.

,

Li

,

R.

,

Ru

,

X.

and

Niu

,

Y.

(

2024

), “

Suppression of horizontal vibrations in high-speed elevators using active shock absorber to assist traditional damping systems

”,

Journal of Intelligent Manufacturing and Special Equipment

, Vol.

5

No.

1

, pp.

170

-

189

, doi:

https://doi.org/10.1108/JIMSE-09-2023-0006

.

Google Scholar

Crossref

Wan

,

A.

,

Tong

,

X.

,

AL-Bukhaiti

,

K.

,

Zhou

,

Z.

,

Su

,

Y.

and

Cheng

,

X.

(

2025

), “

Intelligent fault diagnosis for elevator door systems using variational mode decomposition and multi-scale convolutional networks

”,

Journal of the Brazilian Society of Mechanical Sciences and Engineering

, Vol.

47

No.

10

, 508, doi:

https://doi.org/10.1007/s40430-025-05816-2

.

Google Scholar

Crossref

Wang

,

Q.

,

Leng

,

Y.

,

Li

,

D.

,

Zhang

,

X.

,

Li

,

R.

,

Zhu

,

H.

and

Zhang

,

H.

(

2018

), “

MCU system-based intelligent high-speed elevator door operator fault analysis and research

”,

IOP Conference Series: Materials Science and Engineering

, Vol.

428

No.

1

, 012028, doi:

https://doi.org/10.1088/1757-899X/428/1/012028

.

Google Scholar

Wang

,

H.

,

Liu

,

C.

,

Jiang

,

D.

and

Jiang

,

Z.

(

2021

), “

Collaborative deep learning framework for fault diagnosis in distributed complex systems

”,

Mechanical Systems and Signal Processing

, Vol.

156

, 107650, doi:

https://doi.org/10.1016/j.ymssp.2021.107650

.

Google Scholar

Crossref

Wu

,

C.

,

Cheng

,

X.

and

Wang

,

H.

(

2025

), “

A contrastive clustering loss function increases class-balanced in time series classification

”,

Expert Systems with Applications

, Vol.

283

, 127493, doi:

https://doi.org/10.1016/j.eswa.2025.127493

.

Google Scholar

Crossref

Yin

,

Y.

,

Han

,

Z.

,

Jian

,

M.

,

Wang

,

G.

,

Chen

,

L.

and

Wang

,

R.

(

2023

), “

AMSUnet: a neural network using atrous multi-scale convolution for medical image segmentation

”,

Computers in Biology and Medicine

, Vol.

162

, 107120, doi:

https://doi.org/10.1016/j.compbiomed.2023.107120

.

Google Scholar

Crossref

PubMed

Zhou

,

H.

,

Qin

,

Q.

,

Hou

,

J.

,

Dai

,

J.

,

Huang

,

L.

and

Zhang

,

W.

(

2024

), “

Deep global semantic structure-preserving hashing via corrective triplet loss for remote sensing image retrieval

”,

Expert Systems with Applications

, Vol.

238

, 122105, doi:

https://doi.org/10.1016/j.eswa.2023.122105

.

Google Scholar

Crossref

© 2026 Jiefeng Li, He Ren, Qiuyu Song, Liang Ye and Gongning Li

Published in Journal of Intelligent Manufacturing and Special Equipment. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms are set out in the CC BY 4.0 licence.
