Cardiomegaly can be determined from the cardiothoracic ratio (CTR), which can be measured in a chest x-ray image. The ratio is calculated from the relationship between the size of the heart and the transverse dimension of the chest. Cardiomegaly is identified when the ratio is larger than a cut-off threshold. This paper proposes a solution that calculates the ratio in order to classify cardiomegaly in chest x-ray images.
The proposed method begins with constructing lung and heart segmentation models based on the U-Net architecture, using publicly available datasets with groundtruth heart and lung masks. The ratio is then calculated from the sizes of the segmented lung and heart areas. In addition, Progressive Growing of GANs (ProGAN) is adopted for constructing a new dataset containing chest x-ray images of three classes: male normal, female normal and cardiomegaly. This dataset is then used for evaluating the proposed solution. Conversely, the proposed solution is used to evaluate the quality of the chest x-ray images generated by ProGAN.
In the experiments, the trained models are applied to segment the heart and lung regions in chest x-ray images of the self-collected dataset. The calculated CTR values are compared with the values manually measured by human experts. The average error is 3.08%. The models are then applied to segment the heart and lung regions for the CTR calculation on the dataset generated by ProGAN. Cardiomegaly is then determined using several different cut-off threshold values. With the standard cut-off at 0.50, the proposed method achieves 94.61% accuracy, 88.31% sensitivity and 94.20% specificity.
The proposed solution is demonstrated to be robust across unseen datasets for the segmentation, CTR calculation and cardiomegaly classification, including the dataset generated by ProGAN. The cut-off value can be set lower than 0.50 to increase the sensitivity. For example, a sensitivity of 97.04% can be achieved at the cut-off of 0.45. However, the specificity then decreases from 94.20% to 79.78%.
1. Introduction
Cardiomegaly, a condition in which the heart is enlarged, is a sign of underlying medical conditions [1]. It can be diagnosed using chest x-ray images, and it could lead to a stroke or heart attack [2]. Cardiomegaly is determined using the value of the cardiothoracic ratio (CTR) with a cut-off value of 0.5 [3]. The CTR is calculated as the ratio between the heart’s size and the chest’s transverse dimension.
Therefore, to calculate CTR, it is necessary to segment the heart and lungs in a chest x-ray image. In the literature we reviewed, heart segmentation models were based on convolutional neural networks. Rdzhabov and Kovalev [4] adopted the regular U-Net architecture for the segmentation, while Maga [5] adopted a modification named attention BCDU-Net (Bi-directional ConvLSTM U-Net with Densely connected convolutions), in which an attention gate was added. Similarly, for lung segmentation, training U-Net on the Japanese Society of Radiological Technology (JSRT) dataset was reported to be state-of-the-art [6–8]. In addition, generative adversarial networks (GAN) were recently trained to generate segmented masks of chest x-ray images [9]. The generator was trained to compute a segmented lung area, while the discriminator was used to distinguish the segmented lung from the corresponding groundtruth. However, this approach did not outperform the U-Net based solutions.
The next step is to perform the CTR calculation for identifying cardiomegaly. Most existing works calculated CTR values from the heart and lung regions produced by segmentation models [10–14]. Interestingly, one study tried four different customized encoders: Visual Geometry Group (VGG)-11 U-Net, VGG-16 U-Net, SegNet and AlbuNet [10]. AlbuNet was seen to provide the most consistent and smooth segmentation results. In another work, a postprocessing step based on dilation and erosion was applied to the segmented results before calculating CTR [11]. Then, the connected components of the prediction masks were set to be the segmented lungs and heart.
In contrast, Singh et al. [15] identified cardiomegaly using critical points, which were obtained using a Convolutional Neural Network (CNN)-based regressor. Deep reinforcement learning was then applied to the points to generate the regression points for the CTR measurement.
Recently, Ajmera et al. also deployed U-Net for segmenting the lung and heart areas [16]. The CTR was calculated using the widths of the chest and heart, and a CTR cut-off of 0.55 was used to determine cardiomegaly. In addition, Lee et al. used U-Net for lung and heart segmentation [17]. The segmentation performance varied depending on thoracic conditions. The automatic CTR calculation was then performed based on the segmentation results and used as an index of cardiac enlargement. The cut-off for cardiomegaly was set at a CTR of 0.50, as in regular diagnostic practice.
In this paper, the developed solution relies on U-Net based models for heart and lung segmentation. The trained models and the CTR calculation method are also validated in a cross-dataset scheme: the models are trained on publicly available datasets and tested on the self-collected dataset. Also, a large dataset of 30,000 x-ray images is constructed using progressive growing of GAN (ProGAN) [18]. This dataset is used for validating the classification of cardiomegaly. Other GAN variants [19–21] could also be chosen for generating more data samples. In this paper, as an example, it is sufficient to deploy U-Net as the main segmentation technique and ProGAN as the main image synthesis technique for constructing chest x-ray images of both cardiomegaly and normal cases.
The proposed solution is then evaluated from several perspectives, including heart and lung segmentation, CTR calculation and cardiomegaly classification. The experiments are conducted on publicly available datasets, a self-collected dataset and a ProGAN-generated dataset. The rest of this paper is organized as follows. The methodology is proposed in section 2. The experiments and results are explained in section 3. Then, the conclusion is drawn in section 4.
2. Proposed method
The proposed solution for CTR measurement in this paper contains three main steps: (1) lung segmentation, (2) heart segmentation and (3) ratio calculation. It is evaluated on three kinds of data: publicly available datasets, a self-collected dataset and chest Postero-Anterior (PA) x-ray images created using the GAN technique. The results are explained and discussed in the next section. In addition, supplementary materials with additional figures are located at https://github.com/worapanda/ACI_CTR.
As shown in Figure 1, CTR is the ratio between the widest area of the heart (i.e. orange line) and the widest area of the thoracic cavity (i.e. blue line) in the chest PA x-ray image [22]. Therefore, estimating CTR must begin with segmenting the areas of the heart and lungs. Then, the lengths of the widest area of the heart and the widest area of the thoracic cavity are measured on the segmented areas for calculating CTR.
Figure 1. Sample visualizations of CTR measurements, where an orange line represents the widest area of the heart, a blue line represents the widest area of the thoracic cavity, and CTR is the ratio between the length of the orange line and the length of the blue line
Three types of inputs are needed to train and evaluate the segmentation: chest PA x-ray images, lung masks and heart masks. In the training process, the masks mark the corresponding areas in the x-ray images so that the models can learn the characteristics of the lungs and heart. In the testing process, they are used to measure how accurate the segmented results of the proposed method are. The quality of the segmentation directly influences the quality of the CTR calculation.
The lung and heart segmentation are implemented based on the U-Net architecture [23–25]. It is a popular type of convolutional neural network that was first developed for biomedical image segmentation. The architecture contains two main modules: an encoder and a decoder. The encoder module (i.e. the left side of the U shape of U-Net), or the contraction part, contains two 3 × 3 convolution layers, a rectified linear unit (ReLU) and down-sampling with a 2 × 2 max-pooling for capturing principal contexts in input images. At each down-sampling step, the number of feature channels is doubled. For example, the first layer of the encoder module starts with 32 filters. The number is then doubled to 64, 128, 256 and 512 filters in layers 2, 3, 4 and 5, respectively.
In the proposed U-Net based architecture, at each layer of the decoder module, features are computed by concatenating features from three parts: (1) the encoder module of the same layer, (2) the decoder module of the layer below and (3) the encoder module of the layer below.
In addition, the decoder module (i.e. the right side of the U shape of U-Net), or the expansive part, contains up-sampling of the feature space, which uses a 2 × 2 convolution and a concatenation with the corresponding cropped feature space from the contraction part. It then uses 3 × 3 convolution layers, each followed by a ReLU. At the final layer, it uses a 1 × 1 convolution layer to generate the final segmentation result. The sample output is shown in Figure 2.
Figure 2. Sample outputs of lung and heart segmentation
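As a rough illustration of the encoder described above, the following sketch tabulates the number of filters and the feature-map size at each of the five encoder levels. It assumes a 512 × 512 input, padded convolutions that preserve the spatial size, and max-pooling applied only between consecutive levels; these choices and the function name `encoder_levels` are assumptions made for illustration and are not stated in the paper.

```python
def encoder_levels(input_size=512, base_filters=32, num_levels=5):
    """Return (level, filters, feature-map side) for each encoder level.

    Assumption: convolutions are padded, so only the 2 x 2 max-pooling
    between consecutive levels changes the spatial size.
    """
    levels = []
    size, filters = input_size, base_filters
    for level in range(1, num_levels + 1):
        levels.append((level, filters, size))
        filters *= 2  # the number of filters doubles at each down-sampling step
        size //= 2    # 2 x 2 max-pooling halves the height and width
    return levels


for level, filters, size in encoder_levels():
    print(f"level {level}: {filters} filters, {size} x {size} feature map")
```

Under these assumptions, the filter counts follow the 32, 64, 128, 256 and 512 progression given in the text.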
The next step is to automatically allocate bounding rectangles fitting the contours of the segmented lungs and heart [26, 27], as shown in Figure 3. Then, the widest area of the thoracic cavity (L) is measured by subtracting the rightmost position of the right lung from the leftmost position of the left lung, while the widest area of the heart (H) is given by the width of the fitted bounding rectangle. Finally, CTR is the ratio of H to L, i.e. $\mathrm{CTR} = H / L$. If CTR is larger than a preset threshold, the image is classified as cardiomegaly. The threshold is usually set to 0.5 [28, 29].
Figure 3. Sample bounding rectangles fitting segmented contours of lungs and heart
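The ratio calculation can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it assumes binary masks stored as nested lists, a single lung mask containing both lungs, and widths counted in pixels including both end columns. The helper names `column_extent` and `cardiothoracic_ratio` are hypothetical. The horizontal extent of a mask equals the width of its bounding rectangle, so H is the heart extent and L is the span between the outermost lung columns.

```python
def column_extent(mask):
    """Return (leftmost, rightmost) column indices that hold a non-zero pixel."""
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not cols:
        raise ValueError("mask is empty")
    return min(cols), max(cols)


def cardiothoracic_ratio(lung_mask, heart_mask):
    """CTR = H / L, measured from the horizontal extents of binary masks."""
    lung_left, lung_right = column_extent(lung_mask)
    heart_left, heart_right = column_extent(heart_mask)
    thoracic_width = lung_right - lung_left + 1  # L, in pixels (assumed inclusive)
    heart_width = heart_right - heart_left + 1   # H, in pixels (assumed inclusive)
    return heart_width / thoracic_width


# Toy masks (illustrative only): lungs span 10 columns, heart spans 5 columns.
lungs = [[1, 1, 1, 1, 0, 0, 1, 1, 1, 1]] * 3
heart = [[0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 3
ctr = cardiothoracic_ratio(lungs, heart)
print(ctr, "cardiomegaly" if ctr > 0.5 else "normal")
```

In this toy case the ratio is exactly 0.5, which is not larger than the usual threshold, so the image would be classified as normal.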
Another main point of this paper is to investigate the use of GAN [18, 30, 31] for generating more testing data samples with predicted groundtruth labels. Conversely, this paper also presents a way to evaluate outputs from GAN based on the proposed CTR calculation method, which is first evaluated on the datasets with actual groundtruth labels. The progressive growing of GAN (ProGAN) is adopted in our implementation [18]. ProGAN introduced a new training strategy for GAN to generate large images of unprecedented quality. The training process begins with a low resolution (e.g. 4 × 4 pixels) and then progressively grows both the generator and discriminator modules of the network. This adds increasingly fine details and produces a large (e.g. 1024 × 1024 pixels) image as an output.
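The progressive schedule can be illustrated with a short sketch. It assumes that the resolution doubles at each growth stage, as in the original ProGAN training scheme, and that the start and final resolutions are the example values given above; the function name `progan_resolutions` is hypothetical, and the resolutions actually used for the chest x-ray images are not stated here.

```python
def progan_resolutions(start=4, final=1024):
    """List the square image resolutions visited while the networks grow."""
    resolutions = []
    size = start
    while size <= final:
        resolutions.append(size)
        size *= 2  # each growth stage doubles the height and width
    return resolutions


stages = progan_resolutions()
print(len(stages), "stages:", stages)
```

With these example values, the networks pass through nine resolutions on the way from 4 × 4 to 1024 × 1024 pixels.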
3. Experiments and discussions
Our experiments are divided into three main parts, as below.
Experiment 1: This experiment has three objectives: (1) constructing models of lung and heart segmentation, (2) evaluating the performance of the segmentation and (3) evaluating the performance of the CTR calculation. The experiment is performed on publicly available datasets with groundtruth masks of lungs and hearts in chest PA x-ray images. The segmentation models developed in this experiment are reused to segment lungs and hearts in experiments 2 and 3, because the datasets used in those experiments do not contain groundtruth masks and therefore cannot be used to train segmentation models.
Experiment 2: This experiment aims to evaluate the CTR calculation on a more extensive and unseen dataset, a self-collected dataset, based on the trained segmentation models from experiment 1. This schema could validate cross-datasets performances, where the models are trained on one dataset and tested on another. The self-collected dataset comes with the groundtruth labels of CTR manually measured by three human experts.
Experiment 3: This experiment is performed on chest PA x-ray images constructed using ProGAN. ProGAN is trained to create 10,000 chest PA x-ray images in each of three classes: (1) male normal, (2) female normal and (3) cardiomegaly. The segmentation models trained in experiment 1 are then used to segment lungs and hearts in all images before calculating CTR accordingly. The cut-off threshold on CTR for cardiomegaly is varied to examine the performance in detail.
3.1 Experiment 1
Two publicly available datasets are used in this experiment. The first dataset is used for lung segmentation [32, 33]. It contains 704 chest x-ray images with corresponding lung mask images. The resolution of each image is 512 × 512 pixels. The second dataset, Japanese Society of Radiological Technology (JSRT) [34], consists of three sets, including 247 chest x-ray images, 247 lung mask images and 247 heart mask images [35]. Each chest x-ray image has a size of 2,048 × 2,048 pixels. However, both lung-mask and heart-mask images have resolutions of 512 × 512 pixels. This second dataset is used for both lung and heart segmentation. Each dataset is split into training, validating and testing sets in a ratio of 0.6:0.2:0.2, respectively. The training and validating sets are used for constructing the segmentation models. Then, the testing set is used for evaluating the CTR calculation.
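The 0.6:0.2:0.2 split can be sketched as follows. The shuffling, the fixed seed and the rounding rule (any remainder goes to the testing set) are assumptions for illustration, since the paper does not describe how the split was drawn; `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items and split them into training, validating and testing sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumed: random split with a fixed seed
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # assumed: remainder goes to testing
    return train, val, test


# Example with 704 image indices, the size of the first dataset.
train, val, test = split_dataset(range(704))
print(len(train), len(val), len(test))
```

Under this rounding rule, 704 images yield 422 training, 140 validating and 142 testing images; a different rounding rule would shift these counts by one or two images.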
Using the first dataset alone, the training and validating results (i.e. pixel-based accuracy) of the lung segmentation are 0.97 and 0.96, respectively. The Dice scores are 0.97 and 0.96 for the training and validating sets, respectively. Combining the two datasets improves the lung segmentation performance: the training and validating accuracies are both 0.98, with Dice scores of 0.98 as well. Using the second dataset alone, the trained heart segmentation model achieves accuracies of 0.99 and 0.98 on the training and validating sets, respectively, while the Dice scores are 0.98 and 0.96 for the training and validating sets, respectively.
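For reference, the two segmentation metrics reported above can be computed as in the sketch below. It uses the standard definitions (pixel accuracy as the fraction of matching pixels, Dice as twice the overlap divided by the sum of the two mask sizes) on binary masks stored as nested lists; the function names and the toy masks are illustrative only.

```python
def _pixel_pairs(pred, truth):
    """Flatten two equally sized masks into (predicted, groundtruth) pixel pairs."""
    return [(bool(p), bool(t)) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the groundtruth."""
    pairs = _pixel_pairs(pred, truth)
    return sum(1 for p, t in pairs if p == t) / len(pairs)


def dice_score(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|)."""
    pairs = _pixel_pairs(pred, truth)
    overlap = sum(1 for p, t in pairs if p and t)
    total = sum(1 for p, _ in pairs if p) + sum(1 for _, t in pairs if t)
    return 1.0 if total == 0 else 2 * overlap / total


# Toy 2 x 4 masks (illustrative only).
pred = [[1, 1, 0, 0], [1, 1, 0, 0]]
truth = [[1, 1, 1, 0], [1, 0, 0, 0]]
print(pixel_accuracy(pred, truth), dice_score(pred, truth))
```

Pixel accuracy counts background pixels as correct matches, so it tends to be high when the structure is small relative to the image; the Dice score only rewards overlap with the foreground.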
Figure 4 shows rectangle fitting examples on segmented lungs and hearts for two cases. The widest area of the thoracic cavity (L) is calculated by subtracting the rightmost position of the rectangle covering the right lung from the leftmost position of the rectangle covering the left lung. The widest area of the heart (H) is measured as the width of the rectangle covering the heart. The next step of the evaluation is to validate the performance of the CTR calculation. However, no CTR groundtruth is available for these two datasets. Therefore, the CTR values are manually measured on the lung and heart mask images to create the CTR groundtruth. The performances are measured in percentage errors, as shown in Table 1. The error of the CTR calculation is only 2.94%. The majority of this error, about 2.27%, comes from the heart segmentation results. As shown in Figure 4, heart segmentation is more complicated than lung segmentation, since the heart area is located behind the thoracic vertebrae and has similar characteristics to surrounding traces of disease, such as alveolar opacity, including ground glass opacity and consolidation.
Figure 4. Rectangle fitting examples on segmented lungs and hearts
3.2 Experiment 2
This experiment is performed on the self-collected dataset containing 7,523 chest x-ray images. Each image is saved at a resolution of 512 × 512 pixels. Three human experts were asked to manually label the CTR of each image. Since this dataset does not contain groundtruth lung and heart masks, the segmentation models are adopted from experiment 1. This provides an opportunity to validate the models across datasets, where the models are trained on the datasets in experiment 1 and used for segmenting unseen data from our self-collected dataset. The percentage errors of the CTR calculation compared with the manual measurements of the three experts are shown in Table 2.
Table 2. Performances of CTR calculation on the self-collected dataset
| | Expert 1 | Expert 2 | Expert 3 |
|---|---|---|---|
| Percentage error (%) | 3.73 | 3.29 | 3.15 |
Source(s): Authors' own work
The errors are reported as 3.73%, 3.29% and 3.15% when compared with the manual measurements from each expert, respectively. In addition, the calculated CTR values are compared with the averages of 3 CTR values by the three experts. For each case, the 3 CTR values are averaged before being compared with the one computed by the proposed method. The error is reported as 3.08%, slightly higher than the error reported in experiment 1, where the test and train sets are from the same dataset. These results could demonstrate the validity of the segmentation models across the datasets.
In practical use, the proposed automatic CTR calculation could be provided as an initial guideline for a diagnosis of cardiomegaly. The measurements of the widest area of the thoracic cavity and the widest area of the heart could also be presented in an interactive tool, where users could edit them further. The roughly acceptable error, below which a case would not need any additional manual editing, is 1.80%. In our experiment, 44.27% of all cases have a CTR calculation error below 1.80%.
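The comparison against the experts can be sketched as follows. The paper does not give the exact error formula, so this sketch assumes the percentage error is the absolute difference relative to the reference value; the CTR values are made up for illustration, and `percentage_error` and `evaluate_against_experts` are hypothetical names.

```python
from statistics import mean


def percentage_error(predicted, reference):
    """Assumed definition: absolute difference relative to the reference, in %."""
    return abs(predicted - reference) / reference * 100.0


def evaluate_against_experts(auto_ctrs, expert_ctrs, tolerance=1.80):
    """Compare each automatic CTR with the average of the expert CTRs.

    Returns the mean percentage error and the percentage of cases whose
    error is below the tolerance (1.80% in the text above).
    """
    errors = [percentage_error(a, mean(e)) for a, e in zip(auto_ctrs, expert_ctrs)]
    share_within = 100.0 * sum(1 for e in errors if e < tolerance) / len(errors)
    return mean(errors), share_within


# Made-up values for three cases, each labelled by three experts.
auto = [0.50, 0.46, 0.55]
experts = [(0.49, 0.50, 0.51), (0.44, 0.45, 0.46), (0.54, 0.55, 0.56)]
mean_error, share_within = evaluate_against_experts(auto, experts)
print(f"mean error {mean_error:.2f}%, {share_within:.2f}% of cases within tolerance")
```

In this toy example only the second case exceeds the 1.80% tolerance, so it is the only one that would be flagged for manual editing.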
3.3 Experiment 3
In this experiment, a dataset of 30,000 chest PA x-ray images is constructed using ProGAN. It contains 10,000 images in each of the three classes: male normal, female normal and cardiomegaly. The generated images yield Fréchet inception distance (FID) [36] values of 15.75, 15.23 and 16.71 for the male normal, female normal and cardiomegaly classes, respectively. This dataset contains no groundtruth lung and heart masks. However, each image is labelled as cardiomegaly or noncardiomegaly. Sample images of the three classes generated by ProGAN are shown in https://github.com/worapanda/ACI_CTR.
In addition, these images are segmented using the trained lung and heart segmentation models computed in experiment 1. Then, CTR values are calculated accordingly. The values are compared with a threshold T. If the value is larger than T, it is classified as the cardiomegaly class. Otherwise, it is classified as the normal class. The T value varies from 0.45 to 0.55 (with a step difference of 0.01) to demonstrate the classification accuracy at the different threshold values, as shown in Table 3.
Table 3. Performances of CTR calculation on the dataset constructed by ProGAN
| Threshold | Classification accuracy, male normal (%) | Classification accuracy, female normal (%) | Classification accuracy, cardiomegaly (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| 0.45 | 51.07 | 39.43 | 99.20 | 97.04 | 79.78 |
| 0.46 | 62.05 | 49.18 | 98.91 | 96.14 | 83.61 |
| 0.47 | 72.57 | 58.23 | 98.58 | 94.79 | 86.79 |
| 0.48 | 80.98 | 66.81 | 97.84 | 92.81 | 89.81 |
| 0.49 | 87.83 | 74.82 | 96.65 | 90.09 | 92.66 |
| 0.50 | 93.09 | 81.86 | 94.61 | 88.31 | 94.20 |
| 0.51 | 94.67 | 84.89 | 92.89 | 83.81 | 96.04 |
| 0.52 | 96.57 | 89.57 | 89.44 | 78.94 | 97.16 |
| 0.53 | 97.60 | 93.01 | 85.49 | 74.02 | 97.88 |
| 0.54 | 98.35 | 95.15 | 81.03 | 69.03 | 98.08 |
| 0.55 | 98.73 | 96.68 | 75.23 | 64.44 | 98.26 |
Source(s): Authors' own work
Table 3 shows that with the cardiomegaly cut-off at T = 0.50, the sensitivity and specificity are 88.31% and 94.20%, respectively. To obtain a higher sensitivity of 97.04%, the threshold T must be lowered to 0.45, but the specificity is then reduced to 79.78%.
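The threshold sweep behind this trade-off can be sketched as below. The code uses the standard definitions of sensitivity and specificity with cardiomegaly as the positive class, and classifies an image as cardiomegaly when its CTR is larger than T. The CTR values and labels are made up for illustration and are not taken from the ProGAN dataset; the function names are hypothetical.

```python
def sensitivity_specificity(ctrs, labels, threshold):
    """Return (sensitivity %, specificity %) for the rule 'CTR > threshold'."""
    pairs = list(zip(ctrs, labels))
    tp = sum(1 for c, y in pairs if y and c > threshold)
    fn = sum(1 for c, y in pairs if y and c <= threshold)
    tn = sum(1 for c, y in pairs if not y and c <= threshold)
    fp = sum(1 for c, y in pairs if not y and c > threshold)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


def sweep_thresholds(ctrs, labels, start=0.45, stop=0.55, step=0.01):
    """Evaluate every threshold from start to stop (inclusive)."""
    count = round((stop - start) / step)
    thresholds = [round(start + i * step, 2) for i in range(count + 1)]
    return {t: sensitivity_specificity(ctrs, labels, t) for t in thresholds}


# Made-up CTR values; label 1 = cardiomegaly, 0 = normal.
ctrs = [0.42, 0.47, 0.52, 0.48, 0.53, 0.58]
labels = [0, 0, 0, 1, 1, 1]
for t, (sens, spec) in sweep_thresholds(ctrs, labels).items():
    print(f"T={t:.2f}  sensitivity={sens:.2f}%  specificity={spec:.2f}%")
```

Even on this toy data, lowering T from 0.50 to 0.45 raises the sensitivity while lowering the specificity, which is the same direction of change as in Table 3.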
The performance is compared with existing methods from two perspectives. The first perspective is the performance of the CTR measurement. The methods by Singh et al. [15] and Dallal et al. [12] reported errors of 5.4% and 7.9%, respectively, while the proposed method had a lower error of 2.94%. The second perspective is the performance of the cardiomegaly classification. The method proposed by Chamveha et al. [11] was evaluated on a dataset containing 491 x-ray images of the cardiomegaly class and 531 x-ray images of the noncardiomegaly class, with reported accuracy, sensitivity and specificity values of 67%, 81% and 69%, respectively. In comparison, our results on the dataset of 30,000 x-ray images are an accuracy, sensitivity and specificity of 94.61%, 88.31% and 94.20%, respectively (i.e. using the cut-off of 0.50).
Moreover, the method by Gupte et al. [13] reported sensitivity and specificity of 96% and 81%, respectively, while our method achieves a comparable sensitivity and specificity of 96.14% and 83.61%, respectively (i.e. using the cut-off of 0.46). If a higher sensitivity of 97.04% is preferred, the cut-off can be set to 0.45. On the other hand, if a higher specificity is preferred, the cut-off can be set higher than 0.50, as shown in Table 3.
To further validate the performance of the proposed method, 10-fold cross validation is performed using the cut-off value of 0.50. The sensitivity and specificity of each fold are reported in Table 4. The averages and standard deviations are stated in the final row of the table. The proposed solution is shown to be stable, with the standard deviations of both sensitivity and specificity less than 2%.
Table 4. Ten-fold cross-validation performances of CTR calculation on the dataset constructed by ProGAN
| Fold | Sensitivity (%) | Specificity (%) |
|---|---|---|
| 1 | 89.21 | 93.56 |
| 2 | 90.21 | 91.53 |
| 3 | 88.65 | 94.12 |
| 4 | 86.21 | 95.76 |
| 5 | 87.89 | 95.13 |
| 6 | 91.12 | 90.63 |
| 7 | 88.21 | 94.47 |
| 8 | 88.31 | 94.20 |
| 9 | 89.75 | 93.98 |
| 10 | 87.21 | 95.27 |
| Average, Standard deviation | 88.68, 1.38 | 93.87, 1.54 |
Source(s): Authors' own work
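The summary row of the ten-fold results can be reproduced from the per-fold values with the short check below. It uses the population standard deviation (`statistics.pstdev`), which is an inference from the numbers rather than something the paper states: the population formula matches the reported 1.38 and 1.54, whereas the sample formula would give slightly larger values.

```python
from statistics import mean, pstdev

# Per-fold values copied from the ten-fold cross-validation table.
sensitivity = [89.21, 90.21, 88.65, 86.21, 87.89, 91.12, 88.21, 88.31, 89.75, 87.21]
specificity = [93.56, 91.53, 94.12, 95.76, 95.13, 90.63, 94.47, 94.20, 93.98, 95.27]

for name, values in (("sensitivity", sensitivity), ("specificity", specificity)):
    # pstdev divides by the number of folds (population standard deviation).
    print(f"{name}: average {mean(values):.3f}, standard deviation {pstdev(values):.3f}")
```

Both standard deviations come out below 2%, which is the stability criterion used in the text.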
4. Conclusion
This paper developed a solution containing three steps: lung and heart segmentation, CTR calculation and cardiomegaly classification. The segmentation models, based on the U-Net architecture, were trained and validated on publicly available datasets with groundtruth masks. The models were then used for segmenting the lung and heart areas in chest x-ray images of unseen datasets, including the self-collected dataset and the new dataset generated by ProGAN. The segmented areas were used to compute CTR values, which were then compared with the cut-off threshold to identify cardiomegaly. The solution achieved an average error of 3.08% for the CTR calculation on the self-collected dataset. It also achieved an accuracy of 94.61%, a sensitivity of 88.31% and a specificity of 94.20% for cardiomegaly classification on the dataset constructed by ProGAN.
In future work, the developed CTR calculation program could be validated in a real-world scenario. For this purpose, an interactive program built around the CTR calculation should be developed. It would be needed when the segmented lungs and heart are not accurate, so that users could edit the inaccurate parts as necessary before calculating the CTR of each case. Also, the images generated by ProGAN could be used in the training step to enhance the segmentation performance. However, this requires an additional step of mask labeling to create the segmentation groundtruth.
