Introduction
In radiotherapy, radiation doses are delivered to a target (cancer cells) while causing minimal damage to the surrounding healthy organs [1]. Patient-specific quality assurance (PSQA) ensures the accuracy and safety of the treatment process [2]. PSQA is divided into two categories: pretreatment verification and during-treatment verification. Current recommendations on PSQA focus on pretreatment verification, in which the patient's plan is delivered to a phantom or in air, and the absolute dose is measured using ion chambers, films, or an electronic portal imaging device (EPID).
Varian (Varian Medical Systems, Inc., Palo Alto, CA, United States) released the Halcyon, which has a two-layer multi-leaf collimator (MLC) system that enables rapid beam modulation and substantially reduces leakage between the MLC leaves, thereby increasing workflow efficiency in radiotherapy. Because the machine has no collimator jaws, the MLC is its only beam-shaping component. Therefore, accurate MLC positioning and optimization are critical to ensuring accurate dose delivery [2–4].
To ensure the consistency of the MLC and beam angle, an EPID detector is permanently attached inside the Halcyon. EPID images can be extracted to provide dose distribution information, serving as a dosimetry method for verification before and during therapy [5–14]. One type of EPID dosimetry is non-transit dosimetry, which can predict the dose inside the patient for pretreatment verification by measuring doses at a fixed distance from the source without an attenuator in the beam. While many algorithms based on mathematical formulas have been developed [6], this paper presents a novel method for non-transit dosimetry using deep learning models. Five models were developed, and the results were compared and validated using the gamma index with a criterion of 3%/3 mm.
Deep learning works by combining large amounts of data with fast, iterative processing algorithms, allowing a system to learn automatically from patterns or features in the data [15]. Researchers have used deep learning as a correction tool in EPID dosimetry to improve the agreement between measured and planned dose distributions [16–21]. However, no previous study has used deep learning to reconstruct the dose distributions inside patients from EPID images for non-transit dosimetry.
Materials and methods
The experiment was performed at Cipto Mangunkusumo General Hospital using a Varian Halcyon (Varian Medical Systems, Palo Alto, CA, United States) equipped with an a-Si 1200 EPID. After irradiation, a single image was extracted, and the pixel values were automatically converted into calibrated units. The source-to-detector distance (SDD) was fixed at 154 cm, as shown in Figure 1. The a-Si 1200 EPID has a maximum irradiation area of 43 × 43 cm² and a resolution of 1,280 × 1,280 pixels, giving a pixel pitch of approximately 0.336 mm/pixel. The term "maximum irradiation area" denotes the largest field that can be imaged at the detector plane.
Furthermore, the EPID images were compared to those of DICOM RT, which were generated from the Eclipse treatment planning system (TPS) using an anisotropic analytical algorithm (AAA). The flowchart of this study is shown in Figure 2 and is explained below.
Breast cancer was considered in this study because it has the highest age-standardized cancer mortality rate, at around 15.3 per 100,000 population [22].
The maximum field size that can be captured by the detector was calculated using the similar-triangle theorem: the source–axis distance (SAD) of 100 cm divided by the source-to-detector distance (SDD) of 154 cm, multiplied by the maximum irradiated area of 43 × 43 cm². The maximum field was therefore 27.92 × 27.92 cm².
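For illustration, this geometric scaling can be reproduced in a few lines of Python (a minimal sketch; the variable names are ours):

```python
# Maximum field size at the isocenter plane that the EPID can capture,
# obtained by back-projecting the detector area with similar triangles.
SAD = 100.0           # source-axis distance (cm)
SDD = 154.0           # source-to-detector distance (cm)
detector_side = 43.0  # maximum irradiation area side at the detector (cm)

max_field_side = detector_side * SAD / SDD
print(f"Maximum field size: {max_field_side:.2f} x {max_field_side:.2f} cm^2")
# -> Maximum field size: 27.92 x 27.92 cm^2
```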
Forty-seven unique fields from breast cancer cases had field sizes below this maximum. All fields were also recalculated at a gantry angle of 0°, bringing the total to 94 fields. The planned dose distributions were exported from the TPS in DICOM format and used as ground truth.
Augmentation
The original dataset was divided randomly into two equal parts: a training–test dataset and a validation dataset. This step, performed before augmentation, prevents mixing between the training–test and validation datasets.
Deep neural networks require large amounts of training data to obtain good results and prevent overfitting. Augmentation was therefore performed by rotating and flipping the dataset, a common image augmentation technique in deep neural network research [22, 24]. This produced more than 1,000 image pairs, which were divided into 70% training data and 30% testing data.
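A minimal sketch of this augmentation in Python with NumPy is shown below; the exact set of transforms used in this study is not published, so the eight combinations of 90° rotations and horizontal flips shown here are an assumption:

```python
import numpy as np

def augment_pair(epid_img, dose_img):
    """Generate geometrically transformed copies of an (EPID, planned-dose)
    image pair. Both images receive the same transform so that the
    input-target correspondence is preserved."""
    pairs = []
    for flip in (False, True):
        e = np.fliplr(epid_img) if flip else epid_img
        d = np.fliplr(dose_img) if flip else dose_img
        for k in range(4):  # 0, 90, 180, 270 degree rotations
            pairs.append((np.rot90(e, k), np.rot90(d, k)))
    return pairs  # 8 augmented pairs per original pair
```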
Deep learning models
Five deep learning models were developed using the Python programming language and run on an NVIDIA K80 GPU with 12 GB of memory. The inputs to the deep learning models were the EPID images, whereas the ground truths (targets) were the planned dose images. Each model was optimized using the Adam optimization algorithm. Although the number of epochs was set to 300, training could be stopped earlier if the results no longer improved. The learning rate was 0.000001, and the loss was calculated using the mean squared error (MSE). MSE is the average of the squared differences between the predicted and actual values and has been shown to be an accurate measure of the similarity between two images. The details of each model are explained below.
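A training loop under these settings can be sketched with Keras as follows (hypothetical code, not the study's actual script; the early-stopping patience is an assumption):

```python
import tensorflow as tf

def train(model, x_train, y_train, x_test, y_test):
    # Adam optimizer with the learning rate reported above (1e-6);
    # MSE loss between predicted and planned dose images.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6),
        loss="mse",
    )
    # Train for up to 300 epochs, stopping early when the test loss
    # stops improving.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, restore_best_weights=True
    )
    return model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),
        epochs=300,
        callbacks=[early_stop],
    )
```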
A-Model
This model is a convolutional neural network (CNN) with five layers; the architecture is presented in Supplementary File — Table S1. Each layer has 64 hidden neurons, giving 113,155 parameters that were optimized during training.
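A plausible Keras realization is sketched below; the exact configuration is given in Supplementary Table S1, so the kernel size, padding, and activation here are assumptions, and the parameter count of this sketch will not match 113,155 exactly:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_a_model(input_shape=(1280, 1280, 1)):
    # Five convolutional layers with 64 hidden neurons (filters) each,
    # mapping an EPID image to a planned-dose image of the same size.
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(4):
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 3, padding="same")(x)  # fifth (output) layer
    return keras.Model(inputs, outputs)
```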
B-Model
This model is a CNN with six layers, and the architecture is presented in Supplementary File — Table S2. The number of hidden neurons for each additional layer doubled, i.e., 16, 32, 64, 128, 256, and the last layer (sixth layer) was the output layer. Since this model had more layers, the total number of parameters increased to 1,097,285.
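In the same hypothetical style, the doubling filter scheme might look like this (kernel sizes and activations are again assumptions; see Supplementary Table S2 for the actual configuration):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_b_model(input_shape=(1280, 1280, 1)):
    # Six layers: the filter count doubles at each hidden layer
    # (16 -> 256), and the sixth layer is the single-channel output.
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (16, 32, 64, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 3, padding="same")(x)  # output layer
    return keras.Model(inputs, outputs)
```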
C-Model
This model is a convolutional autoencoder; the architecture is presented in Supplementary File — Table S3. This type of model has been shown to remove noise from images while retaining spatial and temporal information [25]. The model comprises two major parts: an encoder and a decoder. In the encoder, each added layer increases the number of hidden neurons while reducing the image dimensions; in the decoder, each added layer decreases the number of hidden neurons while increasing the image dimensions. The input has the same dimensions as the output, and the total number of parameters was 1,018,817.
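A minimal Keras sketch of such an encoder–decoder is shown below; the filter counts, kernel sizes, and number of down-sampling steps are our assumptions, so the parameter count of this sketch will not match 1,018,817:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_c_model(input_shape=(1280, 1280, 1)):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    # Encoder: filters increase while spatial dimensions shrink.
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    # Decoder: filters decrease while spatial dimensions grow back
    # to the input size (1280 -> 160 -> 1280).
    for filters in (128, 64, 32):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
    outputs = layers.Conv2D(1, 3, padding="same")(x)
    return keras.Model(inputs, outputs)
```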
D-Model
This model is also a convolutional autoencoder; the architecture is presented in Supplementary File — Table S4. The difference between the D- and C-models is that the number of hidden neurons in the encoder of the D-model was reduced from 256 to 128. Reducing the number of hidden neurons decreases the number of trainable parameters, so comparing these two models allows the effect of the total number of parameters to be analyzed. The total number of parameters for this model was 427,713, which is 58.02% less than that of the C-model.
E-Model
This model is a U-Net; the architecture is presented in Supplementary File — Table S5. U-Net was originally developed for biomedical image segmentation. Its architecture can be broadly viewed as an encoder network followed by a decoder network (i.e., an autoencoder). Previous studies have shown that this model can correct reconstructed absolute dose distributions in EPID dosimetry [16]. In this study, we adopted this model for non-transit EPID dosimetry. The network has a depth of four, and each down-sampling block comprises two units, each containing a convolution layer followed by batch normalization and a rectified linear unit (ReLU) activation function. This model had the highest number of parameters (7,759,521) used for training.
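The down-sampling block described above (two convolution–batch-normalization–ReLU units) can be sketched as follows; this is hypothetical Keras code, and the full U-Net with its decoder and skip connections is specified in Supplementary Table S5:

```python
from tensorflow import keras
from tensorflow.keras import layers

def down_block(x, filters):
    # One U-Net down-sampling block: two (conv -> batch norm -> ReLU)
    # units followed by 2x2 max pooling.
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    skip = x                       # kept for the decoder's skip connection
    x = layers.MaxPooling2D(2)(x)  # halve the spatial dimensions
    return x, skip
```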
Validation
Validation was performed using the gamma index (γ) method, which is frequently used to verify complex modulated radiotherapy [26–29]. Several researchers have used it to measure the similarity between two images in 2D and 3D [30]. In this study, all validation EPID images were first compared with the planned dose (this comparison is referred to as the origin); the same EPID images were then enhanced with each deep learning model and compared with the planned dose again. All comparisons used a gamma criterion of 3%/3 mm.
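For reference, a 2D gamma comparison of this kind can be computed with the open-source pymedphys package (a sketch assuming both dose planes share the same grid; the paper does not state which gamma implementation was used):

```python
import numpy as np
import pymedphys

def gamma_pass_rate(planned_dose, epid_dose, pixel_mm=0.336):
    """Return the 3%/3 mm gamma pass rate (%) between two 2D dose planes."""
    axes = (
        np.arange(planned_dose.shape[0]) * pixel_mm,
        np.arange(planned_dose.shape[1]) * pixel_mm,
    )
    gamma = pymedphys.gamma(
        axes, planned_dose,   # reference: planned dose from the TPS
        axes, epid_dose,      # evaluation: (model-corrected) EPID dose
        dose_percent_threshold=3,
        distance_mm_threshold=3,
    )
    valid = ~np.isnan(gamma)  # points below the dose cutoff are NaN
    return 100.0 * np.mean(gamma[valid] <= 1)
```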
Results
The validation dataset comprised 47 image pairs, each of shape 1,280 × 1,280 × 1 pixels. Every model was trained, tested, and validated with the same data. Table 1 shows that deep learning can be used to build non-transit dosimetry on the Halcyon. The EPID images improved by the A-, C-, D-, and E-models had higher average gamma pass rates than the origin. Surprisingly, the B-model had a lower average gamma pass rate than the origin, meaning this model is unsuitable or needs improvement for non-transit dosimetry. Furthermore, although the A-model had the fewest parameters, it had the highest average gamma pass rate. Therefore, in addition to the number of parameters, the accuracy of a model is determined by its constituent components; if the model is not suited to the problem, the output will be inaccurate even with a large number of parameters.
Table 1. Gamma pass rates (3%/3 mm) for the original EPID images (origin) and for the images corrected by each model.

| | Origin | A Model | B Model | C Model | D Model | E Model |
| --- | --- | --- | --- | --- | --- | --- |
| Average γ-pass rate (%) | 79.29 ± 6.27 | 90.07 ± 4.96 | 77.42 ± 7.18 | 79.60 ± 6.56 | 80.21 ± 5.88 | 80.47 ± 5.98 |
| Max γ-pass rate (%) | 87.74 | 94.70 | 88.22 | 89.66 | 89.68 | 89.69 |
| Min γ-pass rate (%) | 61.96 | 75.51 | 51.77 | 62.00 | 62.05 | 62.07 |
Table 2 compares the processing time and the average gamma pass rate improvement over the origin for each model. The processing time did not have a linear relationship with the number of parameters; more parameters do not always lead to longer processing times. The A-model had the shortest processing time because it had the fewest parameters, whereas the B-model had the longest processing time despite not having the most parameters. The average processing time was less than 2 s, indicating that deep learning is fast enough for clinical use. From fastest to slowest, the models were A, D, C, E, and B. Moreover, the average gamma pass rate improvement over the origin revealed that the A-model improved the most, with an average gain of 14.10 ± 8.79% over the origin.
Table 2. Average processing time and average gamma pass rate improvement over the origin for each model.

| Model | Average processing time (s) | Average gamma pass rate improvement from origin (%) |
| --- | --- | --- |
| A | 0.414 ± 0.113 | 14.10 ± 8.79 |
| B | 1.512 ± 0.199 | –0.38 ± 9.99 |
| C | 0.788 ± 0.022 | 0.68 ± 8.22 |
| D | 0.651 ± 0.052 | 1.42 ± 6.94 |
| E | 1.187 ± 0.017 | 1.72 ± 6.88 |
The distribution of gamma pass rates obtained from the validation of the deep learning models and the original data is presented in Figure 3. This categorization was performed to identify successful image patterns in each model. The A-model exhibited the most favorable results, with 30 of 47 images falling in the 91–95% gamma pass rate group; none of the other models reached this group.
Figure 4 compares the gamma pass rate maps for Model A cases from the 91–95% and 76–80% groups. Previous research on deep learning models for EPID-based dosimetry has highlighted the penumbra area as a particular concern for accurate correction [16]. The penumbra area in an EPID image is the region at the boundary of the irradiated object that experiences shadowing or attenuation of the radiation; this occurs because of the physical properties of radiation, which can scatter and create shadows on the surface of the EPID detector. In this case, however, no penumbra-related pattern was identified that affected the similarity to the target image, as shown in Figure 4B, where the brighter color is absent only at the boundary of the irradiated object.
Discussion
The development of deep learning models depends significantly on the problem to be solved. The absence of a gold standard for determining the layout of each hidden layer, the number of parameters, and the minimum amount of data required for accurate results has become a major topic in deep learning research [31–33].
In this study, it was shown that the number of parameters is not directly proportional to the quality of the output image. Table 2 reveals that the A-model, which had the fewest parameters, achieved the highest gamma pass rate of the models compared. We assume that adding more parameters to a model requires more training data. Our results also support studies comparing performance across different numbers of hidden layers or parameters, which show that when more data are added for training and validation, a model with more hidden layers achieves better accuracy than one with fewer hidden layers [34]; however, further research should be conducted to validate this assumption.
The best gamma index results from Models B, C, D, and E remained below 90%. Unexpectedly, some of the gamma index results for Model B were lower than those of the original images without artificial intelligence (AI). The larger parameter count of Model B may require a more meticulous tuning process and a more extensive training dataset. These findings strongly suggest that model inaccuracies can have a detrimental impact on the results, potentially yielding performance inferior to the original image without AI. For this reason, many deep learning studies incorporate a confusion matrix to analyze the likelihood of dangerous false positives and false negatives, which could have significant implications in the healthcare domain. False positives are instances where the AI system incorrectly identifies a condition or event that is not present; false negatives occur when the system fails to detect a condition or event that is actually present.
In Figure 4, an extensive analysis was undertaken with the aim of identifying discernible patterns within the gamma maps generated by model A under varying levels of accuracy. Specifically, we examined the gamma maps corresponding to the model’s highest accuracy (Fig. 4A) and those associated with the lowest accuracy (Fig. 4B). Our analysis was focused on two key aspects: field size and the intensity of coloration within the penumbra region.
Despite meticulous examination and a comprehensive comparative approach, it is noteworthy that our findings did not reveal any prominent or readily identifiable patterns in either of these aspects. This outcome suggests that, within the scope of our study, there were no stark differences in field size or significant alterations in the brightness levels within the penumbra region that could be unequivocally attributed to variations in model accuracy. This lack of discernible patterns underscores the complexity of the relationship between model accuracy and the resulting gamma maps.
This study constructed models A, B, C, D, and E, derived from three basic architectures: CNN, autoencoder, and U-Net. With equal amounts of training data, the CNN demonstrated superiority based on the gamma pass rate. However, deep learning involves more than just datasets; improper preprocessing techniques and misconfigured hyperparameters can lead to ineffective performance and reduced accuracy. Further research is necessary to ensure that each model is appropriately tuned to achieve optimal results.
In clinical implementation, deep learning will greatly accelerate pretreatment verification for patient-specific quality assurance: the image obtained from the EPID is directly converted into a dose and projected onto the patient's target organ by the deep learning model, so that it can be compared directly with the dose calculated by the TPS. This study produced five deep learning models that each run in under 2 s. It also adds to the evidence that deep learning can build accurate dosimetry from EPID images [16, 19, 34]. The results should nevertheless be validated on more cases and could be improved by replacing the MSE loss with a loss based directly on the gamma index; however, faster processors and more memory would be required to train such a model.
Creating a deep learning (DL) model requires comprehensive data in large quantities. For example, to make a DL model capable of auto-contouring organs at risk in CT images, a large number of CT images is needed, both before and after contouring, to serve as inputs and targets; such data are very hard to find as open-source datasets. The lack of open datasets in radiotherapy means that newly created deep learning models cannot be compared directly with other research. An open collection of EPID and TPS data spanning many cases and centers is needed so that researchers can focus on developing their deep learning models and tuning their hyperparameters without worrying about data availability. In future research, multiple distinct models could be integrated with the goal of enhancing the precision of AI dosimetry.
Conclusion
In this study, a deep learning-based method for building non-transit dosimetry was proposed to improve the similarity between EPID images and planned dose images from the TPS at the isocenter plane. Five deep learning models were compared, and the A-model performed best, with a gamma pass rate 14.10 ± 8.79% higher than that of the EPID images not subjected to the deep learning models, followed by the E-, D-, C-, and B-models. In addition, the A-model had the fastest processing time of 0.414 ± 0.113 s, followed by the D-, C-, E-, and B-models.
Conflict of interest
There is no conflict of interest in this study.
Funding
This study was supported by a Universitas Indonesia PUTI research grant (contract number: NKB-1665/UN2.RST/HKP.05/00/2020).
Acknowledgement
The authors thank the management and all staff members of the Department of Radiotherapy, Cipto Mangunkusumo Hospital, and the Department of Physics, Universitas Indonesia.