Purpose

This paper aims to study the reliability of the high-speed train operation control system in the Chinese Train Control System Level 3 (CTCS-3) operating mode.

Design/methodology/approach

Dynamic fault tree and Bayesian network methods are adopted to analyze the reliability and the weak points of the CTCS-3 system.

Findings

First, a physical architecture and a data flow diagram of the CTCS-3 system are established according to its typical structure and functions. Second, the dynamic fault tree of the CTCS-3 system is constructed. Considering the prior probability of the bottom events and the existence of dynamic redundancy, the dynamic fault tree is transformed into a Bayesian network. The reliability of the CTCS-3 system is evaluated from the prior probabilities, and the weak points that affect system reliability are identified from the posterior probabilities computed by the Bayesian network. Finally, it is found that the impact of the onboard subsystem on the reliability of the CTCS-3 system is generally greater than that of the ground subsystem. The two weakest modules in the onboard subsystem are the driver-machine interface (DMI) and the balise transmission module (BTM), and the weakest one in the ground subsystem is the Balise. The analysis results are generally consistent with the malfunctions observed in the field operation of China’s high-speed railway.

Originality/value

(1) Through forward reasoning, the reliability of the train operation control system in the CTCS-3 operating mode is shown to meet the standard requirements.

(2) Through backward reasoning, it is found that the failure of the onboard subsystem leads to a greater probability of failure of the train control system.

(3) The DMI, BTM and automatic train protection computer unit (ATPCU) modules are weak components in the onboard subsystem. The vital digital input/output (VDX), train interface unit (TIU) and train security gateway (TSG) modules are rarely covered in previous research; the results of this paper show that these three modules are also weak components in the subsystem and require attention.

CTCS-3 is the Chinese Train Control System Level 3, designed for operating speeds over 300 km/h. It comprises an onboard subsystem and a ground subsystem. The onboard subsystem includes a CTCS-3 level control unit and a CTCS-2 level control unit. When the train operates at CTCS-3 level, the vehicle equipment mainly uses the Global System for Mobile Communications-Railway (GSM-R) network to receive monitoring information from the radio block center (RBC). Only when the RBC or the GSM-R network fails is the onboard equipment downgraded to use the information provided by the CTCS-2 level ground equipment to monitor train operation. This study assumes that high-speed trains are all operated in CTCS-3 mode, so it is necessary to analyze the reliability of the train control system in this mode.

To analyze the reliability of the train control system, we first need to understand its working principle. For this purpose we use the fault tree, which expresses fault modes clearly. Fault tree analysis (FTA) is a logical and graphical method for assessing the likelihood that a combination of fault events causes an accident, and it has been widely used to analyze the reliability of complex systems. However, typical FTA assumes that faults are independent of each other and does not consider the dynamic logic of the system. For a high-speed train control system with redundant components and a complicated structure, traditional FTA therefore cannot meet the requirements of reliability analysis. In recent years, the dynamic fault tree (DFT) has been widely used in industry. It adds new dynamic logic gates, such as the priority-AND gate, the sequence enforcing (SEQ) gate and the SPARE gate, and thereby overcomes limitations of the fault tree (FT) such as the requirement that basic events be independent. There are many studies on DFT analysis, and the techniques developed can be grouped into three types. The first is the Markov chain-based method, which has proved to be a valid tool for analyzing unrepaired systems with exponential time-to-failure. However, the number of states in the Markov chain grows dramatically with the number of system components, leading to state space explosion (Portinale and Bobbio, 2013). Second, with the development of computer technology, Monte Carlo simulation has been widely used to analyze DFT, and it accommodates any time-to-failure distribution. However, the evaluation accuracy is determined by the number of simulations. When the system and its fault tree are complex, the simulation consumes a lot of time and computing resources (Yevkin, 2016).
The third method converts the DFT into an equivalent Bayesian network (BN), which expresses the dependencies between nodes and supports both forward and backward reasoning. Qin et al. (2016) proposed several methods to analyze the reliability of the train system, one of which is the reliability network model. Based on the functional relationship network model of the high-speed train system, they analyzed the importance of the components in the reliability network and the connectivity of the components in the network. BN not only avoids combinatorial state space explosion, but also yields the importance of the top node and the intermediate nodes through its bidirectional reasoning. Przytula and Thompson (2000) comprehensively introduced the process of Bayesian network construction and successfully applied the model to the system diagnosis of diesel locomotives, satellite communication systems and satellite testing equipment, using the bidirectional reasoning function of BN. Khakzad et al. (2011) compared the similarities and differences between fault trees and BN. Because BN supports both forward and backward reasoning, it is applicable to a wider range of problems. BN is especially flexible when multi-modal faults and common cause failures are considered (Khakzad et al., 2011). Su and Che (2013) used FT and BN to analyze the reliability of the CTCS-3 train control system, but ignored several important components and paid no attention to the redundant structure of the components. Flammini et al. (2006) first analyzed the reliability of the lineside, onboard and trackside subsystems using FT and then used BN to analyze the reliability of the entire European Train Control System. They concluded that although each part met the reliability requirements, some components might not need such high reliability, which would save expense.
Pai and Joanne (2001) used BN to analyze the reliability of a network and, through backward reasoning, obtained the impact of the software’s framework and reliability on the network. Kabir et al. (2014) converted the fault tree of a ship’s fuel distribution system into a BN and analyzed the reliability of the system under the assumption that each component can exist in one of three states.

In this paper, BN is combined with DFT. The logical relationships between the system and its components are captured in the DFT, and the BN is used to assess reliability and to identify the combined failure modes that lead to train control system failure, which can help to improve the reliability of the train control system under certain circumstances.

The remainder of the paper is organized as follows: after a brief overview of the CTCS-3 train control system, two models are developed in Section 2. Section 3 describes how the reliability assessment is developed, including building DFT and BN. Section 4 presents the conclusion and hints for future work.

When the train operates in CTCS-3 mode, GSM-R carries the information exchanged between the onboard and ground systems. The system monitors train running speed and running interval in real time, provides overspeed protection and supervises the safe operation of the train using the target-distance continuous speed control mode and the brake override system of the equipment. As shown in Figure 1, the content in the red box is the scope of this paper. The modules of the vehicle system include the vital computer (VC), train interface unit (TIU), driver-machine interface (DMI), juridical recorder unit, speed and distance unit (SDU), safe transmission unit (STU-V), balise transmission module (BTM), compact antenna unit (CAU), GSM-R, radar and speed sensor (Ss). In addition, fault data statistics point to three further important nodes: vital digital input/output (VDX), train security gateway (TSG) and TIU. The ground system includes the Balise, lineside electronic unit (LEU), train control center (TCC), RBC and temporary speed restriction system (TSRS). The hardware of the CTCS-3 300H train control equipment adopts a distributed structure design, and the function of each module is relatively independent. To improve the reliability and security of the train control system, the system adopts a redundant configuration. In the vehicle subsystem, the speed distance process (SDP), SDU, VDX, GSM-R, STU-V, radar and Ss are in hot standby redundancy, so the system can still operate normally if any single unit fails. VC, BTM, CAU, DMI and TIU are in cold standby redundancy (Di et al., 2010), so if the active unit fails, it takes some time to start the standby module. TSG is a single system. In the ground subsystem, TCC is in cold standby redundancy, and TSRS and LEU are in hot standby redundancy; Balise and RBC are treated as single systems when building the model.

Figure 1.

Physical architecture of CTCS-3


2.1.1 Vehicle equipment

  • VC: The kernel of the CTCS-3 onboard system. When the train operates at the CTCS-3 level, it accepts the route description and movement authority (MA) transmitted by the RBC and calculates the mode control curve in combination with the train position determined by the ground Balise. The actual speed and position of the train are monitored against the mode curve, and the VC intervenes when the train is overspeeding.

  • SDP: Processes speed and distance data.

  • SDU: Comprises SDU1 and SDU2, each of which is connected to an axle Ss and a Doppler radar as power. When the train is running, the SDU receives the pulse signals collected by the Ss and the radar, converts them into digital data and sends the data to the SDP for processing through the multifunction vehicle bus.

  • VDX: A fail-safe unit, comprising VDX1 and VDX2, used for outputting emergency braking and collecting brake feedback. VDX1 and VDX2 work in the form of guard collection. Only when both the output and the recovery are correct can the VDX work normally. Otherwise, the system outputs the emergency brake unconditionally.

  • GSM-R: Provides the transmission channel between the RBC and the onboard system.

  • CAU: The antenna of the BTM, which receives telegrams from the Balise.

  • STU-V: A secure wireless transmission unit that is responsible for encrypting and securely transmitting wireless data between onboard and ground equipment.

  • BTM: Receives balise information through the CAU, then verifies and decodes the received message and sends it to the VC.

  • TSG: Mainly handles data transmission for the important core modules of the onboard equipment and realizes data exchange between modules.

  • R&Ss: Connected to the SDU, to which the collected speed pulse signals are transmitted.

  • DMI: Displays information for the driver, allows the driver to input relevant data and raises alarms in specific situations.

  • TIU: Connects the VC to the train.

2.1.2 Ground equipment

  • TCC: Realizes the track circuit coding function and transmits train occupancy information to the RBC.

  • RBC: Generates information such as the MA and line descriptions, based on the information provided by other ground equipment and on its interaction with onboard equipment, and transmits it via GSM-R to the onboard equipment of the trains within its control range.

  • Balise: Transfers information such as positioning, level conversion and over-phase area to the onboard equipment. The balise transmits the same information as the GSM-R transmission.

  • TSRS: Manages temporary speed restrictions (TSR) and delivers temporary speed limit information to the RBC and the TCC, respectively.

  • LEU: A data acquisition and processing unit that, when the data change, forms a message from the changed data and sends it to the balise for transmission.

The real-time data flow between the modules in CTCS-3 mode is shown in Figure 2. The onboard subsystem receives track occupancy, train positioning, line information, speed limit information and other data from the ground subsystem in real time. After processing by the VC, the train speed monitoring profile and operation mode are generated to realize the safety protection of the train. Meanwhile, the onboard subsystem sends data such as location and train operation status to the RBC through GSM-R, and the RBC then forwards the data to the TCC and CTC. The specific information transmitted by each module is shown in Figure 2.

Figure 2.

Data flow diagram of CTCS-3 train control system


DFT adds the priority-AND gate, the SEQ gate, the standby or spare (SPARE) gate and the functional dependency gate to the traditional FT. Based on the actual situation of the system, this paper mainly uses two kinds of dynamic gates, namely, HotSpare and ColdSpare. SPARE gates model one or more principal components that can be substituted by one or more spares with the same functionality. For HotSpare, multiple identical components run at the same time. If one component fails, the system still runs normally; only when all components fail does the system fail. For ColdSpare, only the master component is running, and the standby component is not running temporarily. When the master component fails, the standby component needs to be started to recover the system. According to the physical architecture and data flow diagram in Section 2, the DFT of Figure 3 is constructed, with A for the intermediate node, B for the bottom event and C for the fault phenomenon. Table 1 explains the meaning of each node.
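The difference between the two gates can be illustrated on component failure times. The sketch below is a minimal illustration and not the model used in this paper: it assumes exponentially distributed lifetimes, an instantaneous switch-over and a dormant cold spare that cannot fail before it is started. All function names are hypothetical.

```python
import random


def hot_spare_failure_time(component_times):
    """All units run together, so the gate fails when the last unit fails."""
    return max(component_times)


def cold_spare_failure_time(component_times):
    """Units run one after another (assumed: the dormant spare does not fail
    and switching is instantaneous), so the lifetimes add up."""
    return sum(component_times)


def mean_failure_time(gate, rate, n_units=2, n_samples=50_000, seed=0):
    """Monte Carlo estimate of the mean time to failure of a spare gate
    whose units have exponential lifetimes with the given failure rate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        times = [rng.expovariate(rate) for _ in range(n_units)]
        total += gate(times)
    return total / n_samples


if __name__ == "__main__":
    rate = 1.0  # hypothetical failure rate, per unit time
    print("hot spare :", mean_failure_time(hot_spare_failure_time, rate))
    print("cold spare:", mean_failure_time(cold_spare_failure_time, rate))
```

Under these assumptions a two-unit cold spare has a mean lifetime of 2/λ, compared with 1.5/λ for a two-unit hot spare, because the cold standby unit does not age while it waits.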

Figure 3.

Dynamic fault tree

Table 1.

Meaning of each node

Node | Description | Node | Description | Node | Description
C1 | Interruption with Balise | B1 | Master BTM failure | B20 | Standby Ss failure
C2 | VC failure | B2 | Master CAU failure | B21 | Standby Radar failure
C3 | Interruption with RBC | B3 | Standby BTM failure | B22 | Standby SDP failure
C4 | DMI failure | B4 | Standby CAU failure | B23 | Master VDX failure
C5 | Speed and distance unit failure | B5 | Master ATPCU failure | B24 | Standby VDX failure
C6 | VDX failure | B6 | Standby ATPCU failure | B25 | TSG failure
C7 | TIU failure | B7 | Master STU-V failure | B26 | Standby TIU failure
C8 | TSRS failure | B8 | Standby STU-V failure | B27 | Master TIU failure
C9 | LEU failure | B9 | Master GSM-R radio failure | B28 | TSRS1 failure
C10 | TCC failure |  |  | B29 | TSRS2 failure
A1 | Master BTM unit failure | B10 | Master GSM-R antenna failure | B30 | TSRS3 failure
A2 | Standby BTM unit failure | B11 | Standby GSM-R radio failure | B31 | TSRS4 failure
A3 | STU-V failure | B12 | Standby GSM-R antenna failure | B32 | Master LEU failure
A4 | GSM-R failure | B13 | Master DMI failure | B33 | Standby LEU failure
A5 | Master GSM-R failure | B14 | Standby DMI failure | B34 | RBC failure
A6 | Standby GSM-R failure | B15 | Master SDU failure | B35 | Balise failure
A7 | Master Speed and distance unit failure | B16 | Master Ss failure | B36 | Master TCC failure
A8 | Standby Speed and distance unit failure | B17 | Master Radar failure | B37 | Standby TCC failure
A9 | Master TSRS unit failure | B18 | Master SDP failure |  |
A10 | Master TSRS unit failure | B19 | Standby SDU failure |  |

A BN is a directed acyclic graph composed of nodes and arcs. Nodes represent variables, arcs represent causal relationships between nodes, and a conditional probability table attached to each node represents the quantitative relationships between nodes. In this paper, each node represents the failure of a module, and the conditional probabilities describe how module failures lead to system failure. The input of the BN is the failure rate λ of each component.

BN is a mathematical model for probabilistic reasoning with a robust foundation in probability theory. The joint probability distribution gives the probability of every possible combination of states of the random variables $X_1, \ldots, X_n$. By the chain rule, $P(x_1, x_2, \ldots, x_n) = P(x_n \mid x_1, \ldots, x_{n-1}) \cdots P(x_2 \mid x_1) P(x_1) = \prod_{k=1}^{n} P(x_k \mid x_1, \ldots, x_{k-1})$. The conditional probability is the probability that B occurs given that A has occurred: $P(B \mid A) = \frac{P(AB)}{P(A)} = \frac{P(B) P(A \mid B)}{P(A)}$, with $P(A) > 0$, where $P(A)$ and $P(B)$ are prior probabilities and $P(B \mid A)$ is the posterior probability. BN therefore supports both forward reasoning and backward reasoning. Forward reasoning calculates the probability of a result given the prior probabilities of its causes. Backward reasoning calculates the probability of each cause given that the top event has happened. In the reliability evaluation of the CTCS-3 train control system, forward reasoning is used to calculate the reliability of the train control system in a specific scenario, and backward reasoning is used to calculate the probability of each cause leading to system failure.
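Both directions of reasoning can be illustrated by brute-force enumeration on a toy network. The sketch below assumes two independent root causes feeding an OR-type top event; the node names and prior probabilities are hypothetical and are not taken from the CTCS-3 model, which the paper evaluates with dedicated BN software.

```python
from itertools import product

# Hypothetical prior failure probabilities of two independent root nodes.
PRIORS = {"B1": 0.01, "B2": 0.03}


def top_fails(states):
    """Deterministic CPT of an OR gate: the top event occurs if any cause occurs."""
    return any(states.values())


def enumerate_joint(priors):
    """Yield (states, probability) for every combination of root-node states."""
    names = list(priors)
    for values in product([0, 1], repeat=len(names)):
        states = dict(zip(names, values))
        prob = 1.0
        for name, value in states.items():
            prob *= priors[name] if value else 1.0 - priors[name]
        yield states, prob


def forward(priors):
    """Forward reasoning: P(top = 1), summed over all states where the top event occurs."""
    return sum(p for s, p in enumerate_joint(priors) if top_fails(s))


def backward(priors, cause):
    """Backward reasoning: P(cause = 1 | top = 1) by Bayes' rule."""
    p_top = forward(priors)
    p_both = sum(p for s, p in enumerate_joint(priors) if top_fails(s) and s[cause])
    return p_both / p_top


if __name__ == "__main__":
    print("P(top fails)      =", forward(PRIORS))
    print("P(B1 | top fails) =", backward(PRIORS, "B1"))
    print("P(B2 | top fails) =", backward(PRIORS, "B2"))
```

Because several causes can be failed at the same time, the posterior probabilities of the individual causes need not sum to one.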

Mapping DFT to BN includes graphical mapping and numerical mapping. In the graphical mapping, the bottom event, the intermediate event and the top event correspond to the root node, the intermediate node and the leaf node of the BN, respectively. In the numerical mapping, the conditional probability table represents the logical relationship between the child node and the parent node. In the reliability analysis of this paper, after the corresponding BN is constructed, the input of the BN is the failure rate of each component. Each type of DFT gate maps to a different conditional probability table; the mapping rules are shown in Figure 4. In the conditional probability table of subgraph (a), 1 indicates a fault and 0 indicates normal operation. C is a hot standby structure, in which B1 is the main component and B2 is the standby component. Only when both B1 and B2 fail does the state of C become 1. For a cold standby structure, such as subgraph (b), B1 is the main component, so B2 is not started. When B1 fails, the time needed to start B2 is very short, so the switching time is ignored, and the cold standby structure is regarded as a single system whose failure rate is half of that of a single component. For an AND gate, such as subgraph (c), C fails only when both B1 and B2 have failed.
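The numerical mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions: the parent nodes are independent, failure probabilities follow the exponential law, the hot standby (or AND) gate uses a deterministic conditional probability table, and the cold standby pair is replaced by one equivalent unit with half the failure rate, as in the paper's simplification. The function names and the mission time are hypothetical; the example rate is the STU-V value from Table 2.

```python
import math

# CPT of a hot standby (or AND) gate: (b1, b2) -> P(C = 1 | b1, b2),
# where 1 denotes a fault and 0 denotes normal operation.
HOT_SPARE_CPT = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


def failure_probability(rate, hours):
    """P(unit has failed by time t) for an exponential lifetime."""
    return 1.0 - math.exp(-rate * hours)


def marginal_failure(cpt, p_b1, p_b2):
    """P(C = 1), obtained by summing the CPT over independent parent states."""
    total = 0.0
    for (b1, b2), p_c in cpt.items():
        w1 = p_b1 if b1 else 1.0 - p_b1
        w2 = p_b2 if b2 else 1.0 - p_b2
        total += p_c * w1 * w2
    return total


def cold_spare_failure(rate, hours):
    """Paper's simplification: one equivalent unit with half the failure rate."""
    return failure_probability(rate / 2.0, hours)


if __name__ == "__main__":
    rate = 1.80e-5   # per hour, STU-V failure rate from Table 2
    hours = 1000.0   # hypothetical mission time
    p_unit = failure_probability(rate, hours)
    print("hot standby pair :", marginal_failure(HOT_SPARE_CPT, p_unit, p_unit))
    print("cold standby pair:", cold_spare_failure(rate, hours))
```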

Figure 4.

The rule of mapping DFT to BN


The DFT of CTCS-3 is converted to a BN following the steps in Section 3.3, as shown in Figure 5.

Figure 5.

Bayesian net

The train control system is a repairable system, and the time to failure of each component follows an exponential distribution. The failure rates of the bottom events and intermediate events are taken as the input of the BN, as shown in Table 2; the data are from Su and Che (2013) and Di et al. (2010). Among them, C1, C2, C4 and C7 are the parent nodes of cold standby structures, and their input is half of the failure rate of the child nodes (Wang and Ding, 2017).

Table 2.

Failure rate of each node

Node | Description | Failure rate (/h) | Node | Description | Failure rate (/h)
B7 | Master STU-V failure | 1.80 × 10^-5 | B22 | Standby SDP failure | 3.19 × 10^-5
B8 | Standby STU-V failure | 1.80 × 10^-5 | B23 | Master VDX failure | 1.02 × 10^-7
B9 | Master GSM-R radio failure | 1.20 × 10^-5 | B24 | Standby VDX failure | 1.02 × 10^-7
B10 | Master GSM-R antenna failure | 1.45 × 10^-8 | B25 | TSG failure | 1.03 × 10^-7
B11 | Standby GSM-R radio failure | 1.20 × 10^-5 | B28-31 | TSRS failure | 3.20 × 10^-6
B12 | Standby GSM-R antenna failure | 1.45 × 10^-8 | B34 | RBC failure | 5.00 × 10^-8
B15 | Master SDU failure | 2.50 × 10^-9 | B35 | Balise failure | 2.90 × 10^-6
B16 | Master Ss failure | 5.50 × 10^-8 | B36-37 | TCC failure | 2.50 × 10^-8
B17 | Master Radar failure | 1.80 × 10^-8 | C1 | Interruption with Balise | 1.04 × 10^-6
B18 | Master SDP failure | 3.19 × 10^-5 | C2 | VC failure | 7.45 × 10^-7
B19 | Standby SDU failure | 2.50 × 10^-9 | C4 | DMI failure | 2.50 × 10^-6
B20 | Standby Ss failure | 5.50 × 10^-8 | C7 | TIU failure | 1.05 × 10^-7
B21 | Standby Radar failure | 1.80 × 10^-8 | C8 | LEU failure | 2.02 × 10^-6

3.4.1 Forward reasoning.

According to the principle of forward reasoning of BN, the failure rate of CTCS-3 is calculated by using BN software HUGIN.

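Finally, when the train runs in CTCS-3 mode, its failure rate is $\lambda = 0.987 \times 10^{-6}$/h, so that $\mathrm{MTBF} = \int_0^{\infty} t f(t)\,dt = \frac{1}{\lambda}\Gamma(2) = 1.013 \times 10^{6}$ h, where $f(t) = \lambda e^{-\lambda t}$ is the exponential failure density. According to the standard, the mean time between failures for high-speed trains must satisfy $\mathrm{MTBF} \geq 10^{5}$ h, so the requirement is met. When the high-speed train has operated for $t = 10^{5}$ h in CTCS-3 mode, the reliability is $R = \int_{10^{5}}^{\infty} f(t)\,dt = e^{-\lambda t} = 0.906$.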
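The two quantities follow directly from the exponential law and can be checked numerically. The sketch below is a minimal check, assuming a constant failure rate; the function and constant names are hypothetical, and the failure rate is the forward-reasoning result quoted above.

```python
import math

FAILURE_RATE = 0.987e-6  # per hour, system failure rate from forward reasoning
MISSION_TIME = 1.0e5     # hours, operating time at which reliability is evaluated


def mtbf(rate):
    """Mean time between failures of an exponential lifetime: 1 / lambda."""
    return 1.0 / rate


def reliability(rate, hours):
    """Probability of surviving beyond the given time: exp(-lambda * t)."""
    return math.exp(-rate * hours)


if __name__ == "__main__":
    print(f"MTBF = {mtbf(FAILURE_RATE):.4g} h")
    print(f"R(t) = {reliability(FAILURE_RATE, MISSION_TIME):.3f}")
```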

3.4.2 Backward reasoning.

According to the principle of backward reasoning of BN, given that the CTCS-3 system has failed, the probabilities that the fault was caused by the ground subsystem and by the onboard subsystem are 0.4012 and 0.5988, respectively. That is to say, the onboard subsystem is more likely to cause the train control system to fail.

For the ground subsystem, backward reasoning is also used to calculate the probability of failure of each component given that the ground subsystem has failed, as shown in Table 3. The Balise provides a large amount of fixed and variable information to the vehicle equipment, and it has the highest probability of being the cause when the ground subsystem fails. It is therefore the weakest component of the ground subsystem.

Table 3.

Probability of each component failure when ground system fails

Ground system component | Balise | LEU | RBC | TSRS | TCC
Probability | 0.6614 | 0.3225 | 0.1608 | 1.31 × 10^-5 | 1.99 × 10^-10

Given that the onboard subsystem has failed, backward reasoning is likewise used to calculate the probability of failure of each component of the onboard subsystem, as shown in Table 4. As can be seen from the table, the DMI is the component most likely to be faulty, followed by the BTM and the automatic train protection computer unit (ATPCU). Combined with Table 3, it is found that the Balise, LEU and BTM all have a high probability of failure, so the channel through which the Balise transmits information deserves particular attention. In addition, VDX, TIU and TSG are rarely mentioned in the current literature, but their probability of failure in this paper is relatively large, so they also deserve attention.

Table 4.

Probability of each component failure when onboard system fails

Component | Probability | Component | Probability
DMI | 0.5327 | SDP | 1.996 × 10^-4
BTM | 0.2205 | STU-V | 6.90 × 10^-5
ATPCU | 0.1587 | GSM-R antenna | 3.99 × 10^-5
VDX | 0.0435 | GSM-R radio | 3.07 × 10^-5
TIU | 0.0224 | Ss | 3.40 × 10^-7
TSG | 0.0219 | Radar | 1.20 × 10^-7
CAU | 2.99 × 10^-4 | SDU | 1.56 × 10^-8

This paper studies the reliability of the train operation control system in the CTCS-3 operating mode. By constructing physical architecture and data flow diagram, the information transmission of the train in CTCS-3 operating mode and functions of the various components involved are described. Based on the above two models, the DFT is established and converted into BN according to the corresponding principle. Using the forward reasoning and backward reasoning functions of BN, the reliability and weak modules of the train control system are analyzed. The research conclusions are as follows:

  • Through forward reasoning, the reliability of the train operation control system in the CTCS-3 operating mode is shown to meet the standard requirements.

  • Through backward reasoning, it is found that the failure of the onboard subsystem leads to a greater probability of failure of the train control system.

  • The DMI, BTM and ATPCU modules are weak components in the onboard subsystem. VDX, TIU and TSG are rarely covered in previous research; the results of this paper show that these three modules are also weak components in the subsystem and require attention.

The authors would like to acknowledge the support of the research program of Comprehensive Support Technology for Railway Network Operation (2018YFB1201403), which is a subproject of the Advanced Railway Transportation Special Project of the 13th Five-Year National Key Research and Development Plan, funded by the Ministry of Science and Technology of China.

Di, L.Q., Yuan, X.E. and Wang, Y.N. (2010), “Research on the evaluation method for the RAM goals of CTCS-3”, China Railway Science, Vol. 31 No. 6, pp. 92-97.
Flammini, F., Marrone, S., Mazzocca, N. and Vittorini, V. (2006), “Modeling system reliability aspects of ERTMS/ETCS by fault trees and Bayesian networks”, European Safety and Reliability Conference, pp. 18-22.
Kabir, S., Walker, M. and Papadopoulos, Y. (2014), Reliability Analysis of Dynamic Systems by Translating Temporal Fault Trees into Bayesian Networks. Model-Based Safety and Assessment, Springer International Publishing.
Khakzad, N., Khan, F. and Amyotte, P. (2011), “Safety analysis in process facilities: comparison of fault tree and Bayesian network approaches”, Reliability Engineering and System Safety, Vol. 96 No. 8, pp. 925-932.
Pai, G. and Joanne, B.D. (2001), “Enhancing software reliability estimation using bayesian networks and fault trees”, Conference: International Symposium on Software Reliability Engineering (ISSRE).
Portinale, L. and Bobbio, A. (2013), “Bayesian networks for dependability analysis: an application to digital control reliability”, Computer Science, pp. 551-558.
Przytula, K.W. and Thompson, D. (2000), “Construction of Bayesian networks for diagnostics”, Proceedings of IEEE aerospace conference, Vol. 5, pp. 193-200.
Qin, Y., Lin, S., Wantong, L.I., Yong, F.U. and Jia, L. (2016), “Research on safety reliability analysis and evaluation method of high-speed train system”, Electric Drive for Locomotives.
Su, H. and Che, Y. (2013), “Reliability assessment on CTCS-3 train control system using faults trees and Bayesian networks”, International Journal of Control and Automation, Vol. 6 No. 4, pp. 271-292.
Wang, J. and Ding, B-Z. (2017), “The reliability prediction of the standby system composed of two component”, Journal of CAEIT, Vol. 12 No. 4, pp. 428-431.
Yevkin, O. (2016), “An efficient approximate Markov chain method in dynamic fault tree analysis”, Quality and Reliability Engineering International, Vol. 32 No. 4, pp. 1509-1520.
CTCS-3 System Requirements Specification (SRS) (2008), Ministry of Railways, Science and Technology Division, Beijing, p. 30.
Published in Smart and Resilient Transportation. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
