Introduction
Nasopharyngeal carcinoma (NPC) is the most common otolaryngological cancer; it grows on the walls of the nasopharyngeal cavity. Its distribution varies across geographical regions, with the highest prevalence observed in Southeast Asia and moderate prevalence in South Asia and North Africa [1]. As the tumor grows and its stage advances to T4, it gradually spreads to the skeletal structure of the skull and may even extend intracranially [2].
The tumor's location in the head and neck region, its proximity to vital organs, and its high sensitivity to radiation are the reasons radiotherapy is the treatment of choice [3]. The most important step in the treatment planning system (TPS), performed by an experienced radiation oncologist, is contouring of the tumor target (planning target volume, PTV) and the organs at risk (OARs) before treatment begins; this can involve computed tomography (CT), magnetic resonance imaging (MRI), or both [4].
Magnetic resonance imaging provides the best soft tissue contrast for NPC; it is a painless, non-invasive method that does not use ionizing radiation, which makes it possible to repeat imaging and acquire different sequences (such as T1W, T2W, and T1C), and it depicts the shape and location of the lesion well [5]. On the other hand, because of the location of NPC and its spread to bony tissue, CT has its own advantages, including better contrast in bony regions and high acquisition speed, which often make it the modality of choice. In addition, CT scans take less time than MRI and are cost-effective and widely available. Centers may use either modality according to the patient's condition [6, 7].
Image segmentation is a time-consuming, operator-dependent task that relies on the contouring skill of the oncologist; performing it correctly therefore creates a large workload, and even the smallest segmentation error affects the treatment plan [8]. In addition, NPC tumors are more difficult to segment than other tumors because of their greater diversity and heterogeneous intensity. Another challenge of NPC segmentation is the tumor's changing shape, so each stage of treatment may require re-segmentation. For these reasons, an automatic and accurate segmentation method would be of great help [2, 9].
Among the alternative methods tested in recent years is the use of artificial intelligence to perform all parts of the TPS automatically and accurately for various tumors [10]. In recent studies, convolutional neural networks (CNNs) have been rapidly adopted for automatic image segmentation [11–13]. Therefore, in this study, we comprehensively analyzed the available literature on the ability of CNNs to automatically segment NPC tumors in CT and MRI modalities.
Material and methods
We conducted a comprehensive and systematic search of reliable sources to determine whether CNNs are capable of accurate segmentation. The study was registered at the conceptualization stage in PROSPERO, the international open-access Prospective Register of Systematic Reviews (CRD42022379228).
Search strategy
We searched electronic databases, including MEDLINE (through PubMed) and the Cochrane Library. In addition, Google Scholar was searched for gray literature, and publications in the arXiv database were also included. There were no language restrictions. Because CNNs have been investigated only in recent years, no time limit was set for the search (conducted in 2022). The search terms were (“Nasopharyngeal carcinoma”) AND (“Segmentation” OR “U-Net” OR “U-Res-Net” OR “Res-UNet”) AND (“Computed tomography” OR “CT” OR “Magnetic resonance imaging” OR “MRI”). In PubMed, all terms were restricted to the [Title/Abstract] fields; no field restriction was applied in Google Scholar.
After the search, EndNote software was used to collect the articles, and duplicates were excluded first. Screening was carried out in three steps: title, abstract, and full text. The search and screening were performed by two researchers; their assessments agreed in 95% of cases, and the remaining disagreements were resolved by reference to the eligibility criteria.
Study eligibility criteria
All selected studies investigated the ability of CNNs to segment the NPC tumor; studies examining only OAR segmentation were excluded. Regarding the reported indicators, only studies that reported the Dice similarity coefficient (DSC) were included. Studies with fewer than 15 training samples and studies that combined positron emission tomography (PET) images with CT or MRI were excluded from this analysis, as were reviews, case reports, editorials, and letters.
Data extraction
The results were classified into two subgroups: CT and MRI modalities. The data extracted from each study included the name of the first author, country and year of publication, network architecture, sample size and its split into training, external validation, and test sets, tumor staging, number of epochs, learning rate, batch size, type of dataset, network dimension, CT contrast type, MRI sequence, feature extraction software, and processor characteristics.
Furthermore, the network performance indices extracted from the studies were the DSC and the Hausdorff distance (HD). The meta-analysis was reported according to the PRISMA 2020 criteria, and the study protocol was written accordingly (Supplementary Tab. S1).
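For reference, both metrics have standard definitions based on the predicted segmentation P and the ground-truth contour G (individual studies may report minor variants, such as the 95th-percentile HD):

\[ \mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}, \qquad \mathrm{HD}(P, G) = \max\Big\{ \sup_{p \in P} \inf_{g \in G} d(p, g),\ \sup_{g \in G} \inf_{p \in P} d(p, g) \Big\} \]

A DSC of 1 indicates perfect overlap, whereas a smaller HD indicates that the predicted and reference contours lie closer together.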
Quality assessment (risk of bias)
The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to evaluate the quality of the included studies and the risk of bias. This tool assesses diagnostic studies in four key domains: (1) patient selection (random sampling), (2) index test (assessment blinded to and independent of the reference test), (3) reference standard (valid reference test, assessed independently of the index test), and (4) flow and timing (sufficient time between index and reference tests, all data points included in the analysis). The questions in each domain were scored on three levels: “yes” (1 point), “no” (0 points), and “unclear” (0 points). This step was performed by two reviewers.
Statistical analysis
Stata software (version 17.0; StataCorp, College Station, TX, USA) was used for all statistical calculations, and Excel (Microsoft 2016) was used to extract primary information from the articles and perform basic calculations. The DSC, one of the most important indices for evaluating CNN segmentation results, was used as the effect size. Heterogeneity was assessed with a random-effects model using the I² and τ² statistics, and an I² above 70% was considered an indicator of substantial heterogeneity. Meta-regression was used to investigate whether study-level variables explained changes in the results, and possible publication bias was evaluated using a funnel plot.
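The pooling itself was performed in Stata; purely as an illustration of the random-effects computation described above, a minimal Python sketch of a DerSimonian-Laird estimator (with hypothetical DSC values and standard errors, not data from this review) might look like this:

```python
# Minimal DerSimonian-Laird random-effects pooling sketch (illustrative only;
# the actual analysis was performed in Stata 17). The DSC values and standard
# errors below are hypothetical placeholders.
import numpy as np

def random_effects_pool(y, se):
    """Pool effect sizes y with standard errors se; return estimate, 95% CI, I^2 (%), tau^2."""
    y, v = np.asarray(y, float), np.asarray(se, float) ** 2
    w = 1.0 / v                                   # fixed-effect weights
    y_fe = np.sum(w * y) / np.sum(w)              # fixed-effect pooled mean
    q = np.sum(w * (y - y_fe) ** 2)               # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    se_pooled = np.sqrt(1.0 / np.sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, i2, tau2

# Hypothetical example: per-study mean DSC and standard error
dsc = [0.74, 0.84, 0.73, 0.63, 0.60]
se = [0.02, 0.03, 0.04, 0.05, 0.03]
print(random_effects_pool(dsc, se))
```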
Results
Study selection
Among the 3625 studies retrieved from the PubMed, Google Scholar, and Cochrane Library searches, 20 studies met the eligibility criteria. A flow diagram of the study selection process is shown in Figure 1.
Study characteristics and quality assessment
The reviewed studies on both modalities were conducted in China between 2018 and 2022. The evaluated CNNs included 2D, 2.5D, and 3D UNet [14–26], modified UNet [17], 3D Res-UNet [17, 27, 28], modified 3D Res-UNet [14, 17], a combination of 2D and 3D Res-UNet [29, 30], 3D VNet [15], 3D SI-UNet [18], 3D Nested UNet [14, 19, 31], 3D AttR2-UNet [14, 21, 31], 3D LW-UNet [32], and 3D DE-UNet [33].
Magnetic resonance imaging studies used hospital data [21–24, 26, 28, 29, 31–35], whereas CT studies often used the 2019 MICCAI StructSeg data [14, 15, 17, 19] as well as hospital data [16, 18, 27, 30]. Two articles were conference papers [22, 30], and one was from the arXiv database [26].
In the studies analyzing CT images, the image types included non-contrast CT and contrast-enhanced CT (CE-CT). The MRI studies used different sequences: T1-weighted (T1W), T2-weighted (T2W), contrast-enhanced T1-weighted (T1C), and multi-sequence (MS) images. Most studies of both modalities did not report full training details; where reported, the number of epochs ranged from 40 to 600, the batch size from 1 to 16, and the learning rate from 0.0001 to 0.05.
Result of risk of bias evaluation
The quality of the articles in the CT and MRI modality groups was evaluated using the QUADAS-2 tool, as presented in Figure 2.
Result of meta-analysis
The descriptive characteristics and selected performance results of NPC segmentation studies using the CT and MRI modalities are listed in Tables 1 [15–19, 27, 30, 36] and 2 [21–24, 26, 28, 29, 31–33, 35, 37], respectively.
Table 1. NPC segmentation studies using the CT modality (NM = not mentioned)

| First author (year; country) | Sample size | Training (n) | External validation | Testing (n) | Dataset | Tumor staging | Image type | Architecture | Epochs | Batch size | Learning rate | DSC (mean) | HD (mean) | Processor specifications |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Li et al. [16] (2019; China) | 502 | 302 | 100 | 100 | West China Hospital | T1–T4 | CT | 2D UNet | 40 | NM | 0.01 | 74 | 32.10 | Dual Intel Xeon |
| Xue et al. [18] (2020; China) | 150 | 120 | 15 | 15 | Hospital of USTC | T1–T4 | CT | 3D UNet / 3D SI-UNet | 200 | NM | 0.0001 | 84 / 74 | 9.7 / 8.7 | One Intel Xeon E5-2695 CPU and one NVIDIA Tesla P100 GPU |
| Wang et al. [27] (2020; China) | 205 | NM | NM | NM | NM | T1–T4 | CE-CT, CT | 3D Res-UNet | NM | 1 | 0.03 | 73 | 4.96 | One NVIDIA GeForce RTX 2080Ti with 11 GB GPU memory |
| Bai et al. [17] (2021; China) | 60 | 50 | No | 10 | StructSeg 2019 Challenge | T1–T4 | CT | 2D UNet / 2D PUNet / 3D UNet / 3D PUNet / 3D Res-UNet / 3D PRes-UNet | NM | 8 | 0.0005 | 57.01 / 60.59 / 59.71 / 59.8 / 58.97 / 62.88 | 8.12 / 6.75 / 14.52 / 11.94 / 7.41 / 6.07 | One NVIDIA RTX 2080Ti GPU and 32 GB memory |
| Liu et al. [19] (2021; China) | 140 | 60 | No | 40 | 2019 MICCAI StructSeg + Sichuan Provincial Cancer Hospital | T1–T4 | CT | 3D UNet / 3D Nested-UNet | 300 | 4 | 0.001 | 33.9 / 25.6 | 13.2 / 13.7 | NM |
| Mei et al. [15] (2021; China) | 50 | 40 | No | 10 | 2019 MICCAI StructSeg | T1–T4 | CT | 3D UNet / 2.5D UNet / 3D VNet | NM | 16 | 0.0001 | 59.91 / 62.16 / 61.02 | NM | Two NVIDIA GTX 1080 Ti GPUs |
| Jin et al. [30] (2021; China) | 90 | 63 | 18 | 9 | Sichuan Cancer Hospital | T1–T4 | CT | 3D PUNet / 3D ResSE-UNet | 200 | 8 | 0.0001 | 75 / 79 | 8.59 / 7.64 | NM |
| Yang et al. [36] (2022; China) | 257 | 205 | No | 52 | 2019 MICCAI StructSeg | T1–T4 | CE-CT, CT | 3D UNet / 3D PRes-UNet / 3D AttR2-UNet / 3D Nested-UNet | 120 | 2 | 0.01 | 73.67 / 74.49 / 73.54 / 73.87 | 6.32 / 5.06 / 6.74 / 5.17 | One NVIDIA GeForce RTX 2080Ti with 11 GB GPU memory |
Table 2. NPC segmentation studies using the MRI modality (NM = not mentioned)

| First author (year; country) | Sample size | Training (n) | External validation | Testing (n) | Dataset | Tumor staging | MRI sequence | Architecture | Epochs | Batch size | Learning rate | DSC | HD | Processor specifications |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| He et al. [22] (2018; China) | 19 | 18 | No | 1 | NM | T1–T4 | T1W | 3D UNet | NM | NM | 0.0001 | 74.8 | NM | Ubuntu 14.04, Tesla K80 at 3.6 GHz with 11.18 GB GPU memory |
| Wang et al. [23] (2018; China) | 15 | 11 | No | 4 | West China Hospital | T1–T4 | T1W | 3D UNet | NM | NM | NM | 79 | NM | NM |
| Chen et al. [26] (2019; China) | 149 | NM | NM | NM | Shandong Cancer Hospital | T1–T4 | T1W, T2W, T1C, MS | 2D UNet / 3D UNet | 100 | 8 | 0.001 | 57.97 / 64.33 | 84.66 / 21.02 | NVIDIA Titan Xp GPU with 12 GB GPU memory |
| Lin et al. [28] (2019; China) | 1021 | 715 | 103 | 203 | Sun Yat-sen University Cancer Center | T1–T4 | T1W, T2W, T1C | 3D Res-UNet | NM | NM | NM | 79 | NM | NM |
| Ye et al. [33] (2020; China) | 44 | NM | NM | NM | Panyu Central Hospital | T1–T4 | T2W, T1W | 2D DE-UNet | 200 | 1 | 0.0001 | 66.1 | NM | NVIDIA GeForce GTX 1080 Ti with 11 GB GPU memory |
| Guo et al. [24] (2020; China) | 120 | 96 | 14 | 10 | NM | T1–T4 | MRI | 3D UNet | 500 | 1 | 0.0001 | 73.7 | NM | NM |
| Wang et al. [29] (2021; China) | 45 | NM | NM | NM | West China Hospital | T1–T4 | T1W, T2W, T1C, MS | 2D+3D Res-UNet | 50 | 5 | 0.05 | 89.6 | 5.07 | NM |
| Wong et al. [37] (2021; China) | 201 | 136 | No | 65 | NM | T1–T4 | T2W, T1W | 2D UNet | 75 | 4 | 0.005 | 71 | NM | NM |
| Cai et al. [21] (2021; China) | 251 | 241 | No | 10 | Shanghai Cancer Center | T1–T4 | T1W, T2W, T1C | 3D UNet / 3D AttR2-UNet | 600 | NM | 0.0001 | 81.1 / 81.5 | NM | Two NVIDIA GeForce GTX 1080 Ti GPUs |
| Qi et al. [35] (2021; China) | 130 | NM | NM | NM | Shandong Cancer Hospital | T1–T4 | T1W, T2W, T1C | 3D UNet | NM | NM | NM | 88.2 | NM | NM |
| Zhang et al. [31] (2022; China) | 93 | 73 | 10 | 10 | NM | T1–T4 | T1W, T1C | 2D AttR2-UNet / 2D Nested-UNet / 2D SE-UNet | 100 | 3 | 0.001 | 73.8 / 79 / 78.7 | NM | NM |
| Liu et al. [32] (2022; China) | 92 | 72 | 10 | 10 | NM | T1–T4 | T1W, T1C | 2D LW-UNet | NM | 1 | NM | 81.3 | NM | NM |
NPC CT scan segmentation evaluation
The meta-analysis results of NPC segmentation studies using the CT modality are presented as a forest plot in Figure 3. The pooled DSC for CT segmentation was 0.67 (95% CI: 0.62 to 0.72; I² = 88.07%, τ² = 0.011; p < 0.01).
NPC MRI scan segmentation evaluation
The meta-analysis of NPC segmentation using the MRI modality showed a pooled DSC of 0.76 (95% CI: 0.72 to 0.80; I² = 81.42%; p = 0.01); the forest plot is presented in Figure 4.
Subgroup analysis
The type of networks and their dimensions were evaluated in the following subgroups:
- CT scan: the networks were divided into 12 subgroups by type; six of these were represented by a single study and are therefore reported without meta-analysis. The DSC for 2.5D UNet, 2D UNet, 3D UNet, 2D P-UNet, 3D P-UNet, 3D AttR2-UNet, 3D Nested UNet, 3D Res-UNet, 3D P-Res-UNet, 3D ResSE-UNet, 3D SI-UNet, and 3D VNet was 0.62 (0.49 to 0.76), 0.67 (95% CI 0.50 to 0.83; I² = 84.52%), 0.62 (95% CI 0.46 to 0.79; I² = 95.64%), 0.61 (95% CI 0.48 to 0.73), 0.68 (95% CI 0.53 to 0.83; I² = 74.68%), 0.74 (0.68 to 0.79), 0.64 (95% CI 0.25 to 1.02; I² = 41.30%), 0.67 (95% CI 0.53 to 0.81; I² = 74.63%), 0.70 (95% CI 0.59 to 0.81; I² = 74.68%), 0.79 (0.70 to 0.88), 0.74 (0.67 to 0.81), and 0.61 (0.48 to 0.75), respectively.
Furthermore, the pooled DSC values for the network dimensions 2D, 2.5D, and 3D were 0.65 (95% CI 0.54 to 0.76; I² = 75.96%), 0.62 (0.49 to 0.76), and 0.68 (95% CI 0.62 to 0.74; I² = 89.20%), respectively;
- MRI scan: in this modality, the networks were divided into ten subgroups by type; nine of these were represented by a single study and are therefore reported without meta-analysis. The DSC for 2D UNet, 3D UNet, 2D AttR2-UNet, 3D AttR2-UNet, 2D Nested-UNet, 2D SE-UNet, 2D+3D Res-UNet, 3D Res-UNet, 3D DE-UNet, and 3D LW-UNet was 0.64 (0.57 to 0.72), 0.76 (95% CI 0.68 to 0.84; I² = 87.20%), 0.78 (0.70 to 0.87), 0.81 (0.77 to 0.86), 0.79 (0.71 to 0.87), 0.79 (0.70 to 0.87), 0.79 (0.67 to 0.91), 0.79 (0.77 to 0.81), 0.66 (0.52 to 0.80), and 0.81 (0.73 to 0.89), respectively.
The pooled DSC values for the network dimension subgroups 2D, 2D + 3D, and 3D were 0.75 (95% CI 0.68 to 0.82; I² = 67.92%), 0.79 (0.67 to 0.91), and 0.77 (95% CI 0.71 to 0.82; I² = 86.87%), respectively.
Evaluation of possible causes of heterogeneity
In the meta-regression, the variables associated with heterogeneity in the CT studies were the number of training samples (coefficient 0.00073, p = 0.014), external validation (–0.13648, p = 0.008), and the number of epochs (–0.00109, p = 0.041); in the MRI studies, only batch size was significant (–0.02199, p = 0.010).
Publication bias
We used a funnel plot to evaluate the publication bias in the studies that evaluated CNNs in image segmentation of both CT and MRI modalities (Fig. 5).
Discussion
An automatic system for the segmentation of heterogeneous NPC tumors is very valuable because it reduces the workload and speeds up diagnosis and treatment. It is necessary to know how successful deep learning networks have been so far; the results of this study should therefore be helpful in decision-making. The DSC value was selected as the effect size, and a meta-analysis was performed using its standard error (SE).
Convolutional neural networks, a subgroup of deep learning methods, were first tested for NPC segmentation in the included studies in 2018, and after the introduction of innovative 3D networks, more studies have been devoted to them (Tab. 1, 2). However, 3D networks require a higher volume of calculations and more complex hardware for processing [39]. Recently, expanding the network layers to improve performance has also been explored; AttR2-UNet and Nested-UNet are examples of such networks [40, 41].
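As a point of reference for readers less familiar with these architectures, a minimal 2D U-Net encoder-decoder with skip connections could be sketched in PyTorch as follows (illustrative only; the reviewed networks differ in depth, normalization, attention modules, and loss functions, and their 3D counterparts replace the 2D operations with 3D ones):

```python
# Minimal 2D U-Net sketch (illustrative only; not the exact architecture of any reviewed study).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by batch normalization and ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class UNet2D(nn.Module):
    def __init__(self, in_ch=1, n_classes=1, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 4, base * 8)
        self.up3 = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)
        self.dec3 = conv_block(base * 8, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))   # skip connection
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)                                  # per-pixel tumor logits

if __name__ == "__main__":
    net = UNet2D()
    # A single-channel 128x128 slice yields per-pixel logits of the same spatial size.
    print(net(torch.randn(1, 1, 128, 128)).shape)  # torch.Size([1, 1, 128, 128])
```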
Overall, considering the classification of the DSC into three levels [good (0.8 ≤ DSC ≤ 1), medium (0.6 ≤ DSC < 0.8), and poor (0 ≤ DSC < 0.6)] [38], the segmentation networks achieved medium results for both MRI (0.76) and CT (0.67), with the MRI studies performing better than the CT studies. However, because of the different characteristics of the networks and the heterogeneous distribution of studies between the two groups, definitive conclusions cannot be drawn. The included studies were all performed within the past five years, which indicates that this field is very new and is likely to achieve more with further investigation.
In addition, the pooled DSC values of both CT and MRI modalities for the different network dimensions (2D, 2.5D, and 3D) were similar, differing by only a few hundredths. In detail, the highest DSC for the CT modality was observed for 3D ResSE-UNet (0.79), and for MRI, for AttR2-UNet and LW-UNet (both 0.81).
A limitation of the analysis was the variation in network details and performance, such as the loss function and number of epochs, even among similar architectures. In addition, there was heterogeneity in how the networks were trained, using CT (with and without contrast) and MRI in different sequences. Because deep learning depends strongly on the dataset, the heterogeneous distribution of patients and the small number of patients in some geographical areas may also have affected the results. Almost half of the CT segmentation studies used the same dataset, presented in the 2019 MICCAI StructSeg challenge, which reduces the impact of data type on the results and makes their comparison more valid. Notably, external validation was not performed in more than half of the studies in either modality. Overall, we were able to reduce heterogeneity by analyzing subgroups based on network dimension and type.
Notably, all eligible studies were conducted in China, where the highest prevalence of NPC has been reported (~80%) [42]. The greater availability of suitable datasets compared with other countries and the prioritization of this cancer in research probably facilitated the implementation of these studies.
Since it is not easy to determine the margins of small tumors on magnetic resonance (MR) images, this may limit network performance [43]; therefore, strengthening networks for MR image segmentation should receive more attention in future studies. Because contrast agents cannot be used in patients with renal failure and may cause long-term complications [44], contrast-enhanced images are likely to be used less in the future, so it is better to enable networks to work with non-contrast images.
Conclusions
CNNs showed a medium level of capability in both CT and MRI modalities, with better performance in MRI segmentation. Further improvement of CNNs could make their clinical application more practical.
Article Information and Declarations
Ethics statement
This article does not involve any studies with human participants or animals performed by any of the authors.
Author contributions
I.A. and M.Z. were equally involved in the design, literature review, and analysis of the study.
Funding
None.
Acknowledgments
None.
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary material
Supplementary Table S1.