WHAT’S NEW This article reports a convolutional neural network (CNN) model that distinguishes aortic dissection (AD) electrocardiograms (ECGs) from non-AD ECGs with high accuracy. The resulting AI-Aortic-Dissection-ECG (AADE) score correlates with key clinical parameters and with AD-associated mortality risk. Compared with traditional diagnostic modalities, the AADE score shows a stronger association with the D-dimer distribution and improves diagnostic accuracy, making it a promising tool for comprehensive AD risk evaluation. The study also addresses the model’s interpretability by identifying the ECG signals most relevant to AD and to aortic risk levels. This approach could improve AD diagnostic protocols and support clinical decision-making.
INTRODUCTION
Although relatively rare, aortic dissection (AD) is a serious cardiovascular disorder [1–3]. Emergency physicians frequently face the challenging task of diagnosing and managing AD, because the condition can deteriorate rapidly and become life-threatening. Prompt and accurate diagnosis and intervention are therefore crucial, particularly in the emergency department (ED) [1–3]. Between 14% and 39% of AD cases are initially misdiagnosed, and the resulting incorrect treatment can have severe or fatal consequences [4].
Non-invasive imaging techniques such as computed tomography angiography (CTA) and magnetic resonance angiography (MRA) can diagnose AD accurately, but they are time-consuming, impractical at the bedside, and unsuitable for frail elderly patients or those with renal insufficiency, which restricts their utility [4–6]. Chest X-ray and echocardiography are valuable for bedside AD assessment, especially in hemodynamically unstable patients, but they are limited in diagnostic precision and technical applicability [5, 7]. Several blood biomarkers, such as D-dimer, a product of thrombus formation and fibrinolysis, have been proposed for AD diagnosis, yet they also have diagnostic limitations [8].
AD patients often show electrocardiogram (ECG) abnormalities during the course of the disease [2, 9]. The ECG is a non-invasive, widely available examination that provides immediate results, and it is therefore used at medical institutions of all levels for rapid, accessible disease evaluation. Because it records cardiac electrical activity and reflects physiological and pathological changes, the ECG is pivotal in diagnosing many cardiovascular diseases [10]. For AD, however, its diagnostic value is relatively weak [11].
Over the past decade, deep learning (DL), a branch of artificial intelligence (AI), has advanced substantially and brought innovation to disease diagnosis [12, 13]. Unlike traditional machine learning, DL models (DLMs) extract complex features automatically, which has improved the detection of conditions including atrial fibrillation [14, 15], hypertrophic cardiomyopathy [16], left ventricular systolic dysfunction [17, 18], and aortic stenosis [19]. In identifying arrhythmias, DL models have achieved higher accuracy than internal medicine physicians [12]. These results suggest that DL has a promising clinical future in ECG interpretation [20].
In this study, we aimed to identify AD patients accurately with a DLM based on a convolutional neural network (CNN) trained on 12-lead ECGs, and to generate an AI-Aortic-Dissection-ECG (AADE) score that correlates with disease severity. By applying deep learning to ECG diagnosis, we aimed to develop a new, simpler method that improves the accuracy of AD diagnosis, reduces misdiagnosis rates, and thereby helps provide patients with more precise treatment.
METHODS
AD was confirmed when CTA showed either an intimal flap separating the true and false lumens of the aorta or an intramural hematoma. The dissection could involve the ascending aorta (type A) or the aortic arch or descending aorta (type B). Penetrating atherosclerotic ulcers and intramural hematomas were also classified as AD because their treatment and prognosis are similar to those of typical AD [21]. This study was approved by the Review Committee of Tongji Hospital, affiliated with Tongji Medical College of Huazhong University of Science and Technology (TJ-IRB20230647).
Study population
This retrospective study, conducted in the Emergency Department of Tongji Hospital from January 2018 to July 2022, included 1878 patients: 313 patients diagnosed with AD, 1252 general emergency patients without AD, and a separate control group of 313 patients with chest pain. All patients hospitalized with a diagnosis of AD during the study period were included. Patients were excluded if they lacked adequate electrocardiographic data or had been diagnosed by CTA or angiography but had no follow-up data.
Data collection and analysis
Data collection focused on the ECG features, demographics, biochemical indices, and medical histories of the AD group (Supplementary material, Appendix 1). We compiled the same data for a control group of 1252 non-AD patients matched 1:4 to the AD group by age and sex. An additional control group of 313 non-AD patients with chest pain, confirmed by emergency and ward physicians, was also included (Supplementary material, Figure S1). AD patients were followed up by telephone. For all participants, follow-up was calculated from the date of diagnosis to the date of death or the end of the study; the median follow-up was 21.3 months (interquartile range [IQR], 11.6–27.1 months, a span of approximately 15.51 months). Missing follow-up data were handled with Multiple Imputation by Chained Equations (MICE) [22]. Death was defined as the endpoint event.
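The study reports 1:4 matching on age and sex but does not describe the matching algorithm. The sketch below assumes greedy nearest-neighbor matching without replacement, exact on sex and nearest on age within a caliper; the `Patient` record, the `match_controls` helper, and the 2-year caliper are hypothetical choices for illustration, not the study’s documented procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal record used for matching."""
    id: str
    age: float
    sex: str  # "F" or "M"


def match_controls(cases, pool, ratio=4, caliper=2.0):
    """Greedy 1:ratio matching, exact on sex and nearest on age.

    Assumptions (not from the study): controls are drawn without
    replacement, cases are processed in the order given, and a control
    is eligible only if its age is within `caliper` years of the case.
    Returns {case_id: [control_id, ...]}.
    """
    available = list(pool)
    matches = {}
    for case in cases:
        # Same sex and age difference within the caliper.
        candidates = [
            c for c in available
            if c.sex == case.sex and abs(c.age - case.age) <= caliper
        ]
        # Closest age first; ties broken by id so results are reproducible.
        candidates.sort(key=lambda c: (abs(c.age - case.age), c.id))
        chosen = candidates[:ratio]
        for c in chosen:
            available.remove(c)  # without replacement
        matches[case.id] = [c.id for c in chosen]
    return matches


cases = [Patient("AD1", 60, "M"), Patient("AD2", 60, "M")]
pool = [Patient(f"C{i}", 55 + i, "M") for i in range(10)]
pool.append(Patient("CF", 60, "F"))
print(match_controls(cases, pool))
# {'AD1': ['C5', 'C4', 'C6', 'C3'], 'AD2': ['C7']}
```

Greedy matching depends on the order in which cases are processed; in this toy example the second case receives only one control because the eligible pool is used up by the first.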
ECG data
For each patient, we analyzed the first pre-treatment ECG, recorded at 500 Hz with a Philips PW TC10 and stored in XML format. ECGs were interpreted manually by experienced cardiologists. The dataset comprised 1878 ECGs from AD and non-AD patients. The samples were divided into non-overlapping training and validation sets at a 7:3 ratio.
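Because each ECG is a standard 10-second recording sampled at 500 Hz, every lead contributes 10 × 500 = 5000 samples, so one 12-lead ECG can be held as a 12 × 5000 array. The sketch below shows one way to obtain a non-overlapping 7:3 split at the patient level. Splitting separately within the AD and non-AD classes, the seed, and the `split_patients` helper are assumptions for illustration.

```python
import random

SAMPLING_RATE_HZ = 500
DURATION_S = 10          # standard 10-second resting ECG
N_LEADS = 12
SAMPLES_PER_LEAD = SAMPLING_RATE_HZ * DURATION_S  # 5000 samples per lead


def split_patients(labels, train_frac=0.7, seed=42):
    """Split patient IDs into disjoint training and validation lists.

    `labels` maps patient ID -> 1 (AD) or 0 (non-AD). Each class is
    shuffled and split on its own (stratification), which is an
    assumption rather than a documented step of the study.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for cls in (0, 1):
        ids = sorted(pid for pid, y in labels.items() if y == cls)
        rng.shuffle(ids)
        n_train = round(len(ids) * train_frac)
        train.extend(ids[:n_train])
        valid.extend(ids[n_train:])
    return train, valid


labels = {f"AD{i}": 1 for i in range(313)}
labels.update({f"N{i}": 0 for i in range(1252)})
train, valid = split_patients(labels)
print(len(train), len(valid), (N_LEADS, SAMPLES_PER_LEAD))
# 1095 470 (12, 5000)
```

Splitting by patient ID rather than by ECG guarantees that no patient contributes recordings to both sets.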
Model development and performance evaluation
We used a CNN as the primary architecture of our deep learning model to extract features from the 12-lead ECGs (details in Supplementary material, Appendix 2 and Figure S2). The training process was optimized and tuned to improve the accuracy of AD diagnosis, and 10-fold cross-validation was used to assess the model’s robustness. To evaluate how well the model identified AD, we compared its predictions with the actual diagnoses. We also tested the model on 313 patients with chest pain and 313 patients with AD to evaluate its performance in these two specific groups. Performance was evaluated with accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve. In addition to the original 1:4 matched sample set, we tested 1:1, 1:2, and 1:3 case-to-control ratios to assess whether the model’s accuracy and stability changed with different sample sizes and patient proportions.
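The 10-fold cross-validation and the 1:1 to 1:4 ratio experiments can be sketched as follows. Stratifying folds by class, dealing indices round-robin into folds, and drawing controls at random for each ratio are assumptions; the study does not describe how folds or ratio subsets were formed, and the helper names are hypothetical.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) for k-fold cross-validation.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every fold has a similar AD/non-AD mix and every index lands in
    exactly one test fold.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def subsample_controls(case_ids, control_ids, ratio, seed=0):
    """Keep every case and draw `ratio` controls per case without replacement."""
    rng = random.Random(seed)
    n = min(len(control_ids), ratio * len(case_ids))
    return list(case_ids), rng.sample(list(control_ids), n)


labels = [1] * 313 + [0] * 1252
folds = list(stratified_kfold(labels, k=10))
print([sum(labels[i] for i in test) for _, test in folds])
# [32, 32, 32, 31, 31, 31, 31, 31, 31, 31]
```

For the 1:2 experiment, for example, calling `subsample_controls` with the 313 AD IDs, the control IDs, and `ratio=2` keeps all AD patients and draws 626 controls.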
AADE score and patient characteristics
The model produced an AADE score reflecting the likelihood of AD on each ECG. We examined correlations between AADE scores and patient characteristics, including demographics, biochemical indicators, and D-dimer levels. D-dimer concentrations were divided into four quartile-based groups: low (<0.840 µg/ml), medium (0.840–1.450 µg/ml), medium-high (1.451–5.700 µg/ml), and high (>5.700 µg/ml). The relationship between D-dimer group and AADE score was visualized with box plots and compared statistically with the Kruskal-Wallis test.
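The grouping and test described above can be sketched in plain Python. The cutoffs are the reported ones; treating values between 1.450 and 1.451 µg/ml as medium, the helper names, and the closed-form chi-squared tail for integer degrees of freedom are illustrative choices. The H statistic uses average ranks for ties with the standard tie correction, and its P value comes from the chi-squared distribution with (number of groups − 1) degrees of freedom, i.e. 3 for four D-dimer groups.

```python
import math
from collections import Counter


def d_dimer_group(value_ug_ml):
    """Assign the reported D-dimer groups (µg/ml)."""
    if value_ug_ml < 0.840:
        return "low"
    if value_ug_ml <= 1.450:
        return "medium"
    if value_ug_ml <= 5.700:
        return "medium-high"
    return "high"


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, acc = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        acc += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * acc


def kruskal_wallis(groups):
    """Return (H, P) for the Kruskal-Wallis test with tie correction."""
    groups = [list(g) for g in groups if g]
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:  # all values identical
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Toy AADE scores for four hypothetical D-dimer groups (not study data).
h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(h, p)
```

In practice, AADE scores would be bucketed with `d_dimer_group` and the four lists passed to `kruskal_wallis`; pairwise comparisons such as low versus high would need a separate post hoc test.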
Statistical analysis
Normally distributed continuous variables are presented as means (standard deviations), and non-normally distributed continuous variables as medians with interquartile ranges (first quartile [Q1] to third quartile [Q3]). Continuous variables were compared between two groups with Student’s t-test. Categorical data are presented as frequencies and percentages and were compared with the chi-squared (χ2) test. Correlations were assessed with Spearman correlation coefficients for continuous variables and point-biserial correlation coefficients for binary variables. The Kruskal-Wallis test was used to compare distributions across more than two groups, and survival was analyzed with Kaplan-Meier curves. Histograms were used to display the distribution of ECG features. The generated models were evaluated with accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (ROC-AUC). Statistics and plots were produced with R packages (ggplot2, pROC, survminer). All tests were two-sided, and P <0.05 was considered statistically significant.
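The two correlation measures named above can be sketched without external libraries. Spearman’s coefficient is Pearson’s r computed on ranks (average ranks for ties), and the point-biserial coefficient is Pearson’s r with the binary variable coded 0/1. The helper names and toy data are hypothetical, and P values are omitted for brevity.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def point_biserial(binary, continuous):
    """Point-biserial r: Pearson correlation with a 0/1-coded variable."""
    return pearson([1.0 if b else 0.0 for b in binary], continuous)


# Toy data (not from the study): AADE scores for four ECGs.
aade = [0.10, 0.35, 0.80, 0.95]
d_dimer = [0.5, 2.0, 1.2, 9.0]
type_a = [0, 0, 1, 1]
print(spearman(aade, d_dimer), point_biserial(type_a, aade))
```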
RESULTS
Study population characteristics
We compared the clinical characteristics of the 313 patients with AD and the 1252 non-AD patients (Table 1).
Table 1. Clinical characteristics of AD and non-AD patients
Variable | AD (n = 313) | Non-AD (n = 1252) | P-value
Age, years, mean (SD) | 59.1 (12.9) | 59.1 (13.2) | 0.9
Female, n (%) | 58 (18.5) | 233 (18.6) | 0.87
BMI, kg/m², mean (SD) | 25.6 (3.9) | 24.9 (4.2) | 0.21
Body temperature, °C, mean (SD) | 37.2 (0.8) | 37.1 (0.9) | 0.82
SBP, mm Hg, mean (SD) | 142.2 (25.9) | 131.6 (27.5) | <0.001
DBP, mm Hg, mean (SD) | 82.1 (17.2) | 72 (16.3) | <0.001
Hypertension, n (%) | 269 (85.9) | 642 (51.3) | <0.001
Diabetes, n (%) | 176 (56.4) | 760 (60.7) | 0.18
Hyperlipidemia, n (%) | 266 (85.2) | 806 (64.4) | <0.001
Renal insufficiency, n (%) | 101 (32.2) | 81 (6.5) | <0.001
Coronary artery disease, n (%) | 83 (26.8) | 60 (4.8) | <0.001
Cerebrovascular disease, n (%) | 9 (2.8) | 20 (1.6) | <0.001
Respiratory system diseases, n (%) | 24 (7.6) | 99 (7.9) | 0.74
Digestive system diseases, n (%) | 11 (3.5) | 102 (8.2) | <0.001
Trauma or injury, n (%) | 2 (0.6) | 114 (9.1) | <0.001
Smoking, n (%) | 220 (70.3) | 895 (71.5) | 0.36
Alcohol, n (%) | 223 (71.2) | 881 (70.4) | 0.41
Hospital death, n (%) | 17 (5.4) | 18 (1.4) | <0.001
Abbreviations: BMI, body mass index; DBP, diastolic blood pressure; SBP, systolic blood pressure
Average age (59.1 years) and sex distribution (approximately 18.5% female) did not differ between the groups (P = 0.9 and P = 0.87, respectively). However, several key health indicators differed. Compared with the non-AD group, the AD group had higher mean systolic blood pressure (142.2 mm Hg vs. 131.6 mm Hg) and mean diastolic blood pressure (82.1 mm Hg vs. 72 mm Hg) and a higher prevalence of hypertension (85.9% vs. 51.3%), hyperlipidemia (85.2% vs. 64.4%), and renal insufficiency (32.2% vs. 6.5%) (all P <0.001). Coronary artery disease and cerebrovascular disease were also more prevalent in the AD group (P <0.001). Digestive system diseases and trauma or injury were more common in the non-AD group (P <0.001), which may reflect the diversity of emergency department patients. There were no differences between the groups in the prevalence of diabetes or respiratory system diseases or in smoking and drinking habits (P = 0.18, 0.74, 0.36, and 0.41, respectively). The ECG characteristics of the 313 AD patients are shown in Supplementary material, Figure S3.
These findings indicate that, although AD patients and the emergency department controls were similar in some baseline characteristics, they differed significantly in blood pressure, in the prevalence of chronic conditions and specific diseases, and in hospital mortality (5.4% vs. 1.4%).
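As an illustration of the chi-squared comparisons in Table 1, the sketch below computes Pearson’s statistic for a 2 × 2 table and its P value from the chi-squared distribution with 1 degree of freedom, using the hypertension counts from Table 1. It omits the Yates continuity correction; R’s `chisq.test` applies that correction to 2 × 2 tables by default, and the paper does not state which variant was used, so P values from this sketch may differ slightly from those reported. The `chi2_2x2` helper is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test without continuity correction.

    Table layout (rows = groups, columns = condition present/absent):
        [[a, b],
         [c, d]]
    Returns (statistic, P value with 1 degree of freedom).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypertension in Table 1: AD 269 of 313, non-AD 642 of 1252.
stat, p = chi2_2x2(269, 313 - 269, 642, 1252 - 642)
print(stat, p < 0.001)
```

With these counts the statistic is about 124 and P <0.001, consistent with Table 1.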
Model performance
The CNN performed well in the validation group. We selected a control group (n = 1252) matched by age and sex to the AD patients (n = 313) and used the full standard 10-second 12-lead ECG as model input. We tested the model at case-to-control ratios of 1:4, 1:3, 1:2, and 1:1, with an AADE score of 0.5 as the optimal threshold for diagnosing AD. All four models performed well. We compared them by accuracy, sensitivity, specificity, F1 score, and AUC (Figure 2). The 1:1 model achieved an accuracy of 0.93, sensitivity of 0.914, specificity of 0.946, F1 score of 0.93, and AUC of 0.97, all higher than at the other ratios, indicating that the AADE score discriminated best at a 1:1 ratio. The CNN generated the AADE score as a continuous value between 0 and 1, indicating the estimated likelihood of AD on each ECG. In the additional control group of patients with chest pain (n = 313; Supplementary material, Figure S4), introduced to test the model’s applicability to clinical scenarios, performance remained satisfactory, albeit slightly lower: accuracy 0.871, sensitivity 0.837, specificity 0.903, F1 score 0.867, and AUC 0.92 (Supplementary material, Figure S5).
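The metrics above can be computed from AADE scores at the 0.5 threshold as sketched below; calling a score of exactly 0.5 AD and the helper names are assumptions. The AUC is computed as the probability that a randomly chosen AD ECG scores higher than a randomly chosen non-AD ECG (ties count one half), which equals the area under the empirical ROC curve. As a consistency check, with equal numbers of AD and non-AD ECGs, accuracy is the mean of sensitivity and specificity: (0.914 + 0.946) / 2 = 0.93, matching the reported 1:1 result.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and F1 at a score threshold.

    y_true: 1 = AD, 0 = non-AD. A score >= threshold is called AD
    (an assumption about how the 0.5 cut-off is applied).
    """
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random AD ECG > score of a random non-AD ECG)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (not study data).
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(classification_metrics(y, s), roc_auc(y, s))
```

The pairwise AUC is quadratic in the number of ECGs, which is fine at this study’s scale; a rank-based formula gives the same value faster for large sets.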
Comparison of accuracy indicators for AADE groups and distribution of D-dimer
We collected the demographic characteristics (Supplementary material, Table S1) and ECG features (Supplementary material, Figure S6) of the validation group. When we assessed the correlation coefficients (r) between the AADE score and clinical and laboratory markers of AD severity (Table 2), the AADE score correlated positively with AD type (r = 0.242; P = 0.02) and, among the laboratory tests, with D-dimer (r = 0.322; P = 0.002). A positive coefficient means that higher values of a variable tend to accompany higher AADE scores. Other variables, including sex, age, smoking, alcohol, height, weight, and blood pressure, showed no statistically significant correlation with the AADE score.
Table 2. Correlations between the AADE score and clinical and laboratory variables
Variable | Correlation coefficient | P-value
Sex | –0.076 | 0.47
Age | –0.003 | 0.97
Smoking | –0.14 | 0.18
Alcohol | –0.155 | 0.14
Height | –0.005 | 0.96
Weight | –0.087 | 0.41
SBP | –0.015 | 0.89
DBP | –0.073 | 0.48
Hospital death | 0.084 | 0.42
Hypertension | 0.059 | 0.57
Diabetes | –0.059 | 0.57
Renal insufficiency | 0.074 | 0.48
Hyperlipidemia | –0.099 | 0.35
AD type | 0.242 | 0.02
Creatinine | –0.082 | 0.44
eGFR | 0.079 | 0.45
CRP | –0.078 | 0.46
CTn | 0.095 | 0.37
PT | 0.085 | 0.42
Fbg | 0.057 | 0.59
APTT | 0.094 | 0.37
TT | –0.048 | 0.65
D-dimer | 0.322 | 0.002
Abbreviations: APTT, activated partial thromboplastin time; CRP, C-reactive protein; CTn, cardiac troponin; DBP, diastolic blood pressure; eGFR, estimated glomerular filtration rate; Fbg, fibrinogen; PT, prothrombin time; SBP, systolic blood pressure; TT, thrombin time
Figure 3A shows the distribution of AADE scores across D-dimer groups. The median score increased from the low (<0.840 µg/ml) to the medium (0.840–1.450 µg/ml) D-dimer group (P = 0.04), remained stable between the medium and medium-high (1.451–5.700 µg/ml) groups, and rose again in the high group (>5.700 µg/ml; median 0.983), which differed significantly from both the low (P = 0.005) and medium-high (P = 0.003) groups. In addition, the type A dissection group had a higher median AADE score than the type B group (0.985 vs. 0.823; P = 0.002) (Figure 3B).
Significant ECG features in the CNN model
To improve the interpretability of the CNN model, we examined correlations between the AADE score and ECG features (Figure 4). An abnormal ECG was the feature most strongly correlated with the model’s predictions (r = 0.384; P <0.001), underscoring the role of overall ECG abnormality in the model’s decisions. ST-segment abnormalities (r = 0.302; P = 0.003) and ST-segment depression (r = 0.302; P = 0.003) were also positively correlated, suggesting their importance for AD diagnosis. Other ECG features, including anterior and anterolateral wall ST-segment depression, sinus tachycardia, sinus bradycardia, and left ventricular hypertrophy, showed weaker positive correlations (r = 0.219 to 0.263; P <0.05), indicating their relevance to the model’s analysis. A correlation heatmap visualizes these relationships and offers an intuitive view of how the CNN model interprets ECG data.
AD risk prediction according to the AADE score
Kaplan-Meier survival analysis (Figure 5) showed an association between the AADE score and survival time (P = 0.02): survival in the high-AADE-score group was substantially lower than in the low-score group throughout follow-up, indicating that deaths accrued more rapidly among high-score patients. At one year, mortality was 31.25% in the high-score group and 10.34% in the low-score group. At two years, mortality in the high-score group rose slightly to 34.38%, whereas in the low-score group it remained unchanged. The risk of death was therefore higher in the high-score group.
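The Kaplan-Meier curves rest on the product-limit estimator: at each time with deaths, survival is multiplied by (1 − deaths / patients still at risk), and censored patients leave the risk set without lowering the curve. A minimal sketch, assuming follow-up in months and a 0/1 death indicator; the helper names and toy cohort are hypothetical, and the study itself used R for its analysis.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up in months; events: 1 = death, 0 = censored.
    Returns [(time, survival)] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def survival_at(curve, t):
    """Read the step function: survival at time t (1.0 before the first death)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Toy cohort (not study data): months of follow-up and death indicator.
times = [2, 3, 3, 5, 8, 10]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
print(curve, 1 - survival_at(curve, 12))
```

Here `1 - survival_at(curve, 12)` gives the Kaplan-Meier estimate of one-year mortality for the toy cohort.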
DISCUSSION
In this study, we developed a CNN model based on ECG data (see the graphical abstract) that distinguishes AD from non-AD ECGs with high accuracy. Results in the special control group further confirm that the model can differentiate AD from non-AD patients even when the non-AD patients present with chest pain. The model provides an AADE score that estimates the likelihood of AD on each ECG, and this score correlates with certain laboratory tests and with the type of dissection. Compared with existing DL research on AD [23], our study has several significant advantages. First, our model included more patients with AD. Second, we identified ECG features correlated with the model’s output, improving the explanatory power of the AI model. Most importantly, we used follow-up data to evaluate patients’ mortality risk with the AADE score, which could give clinicians a more accurate and comprehensive risk assessment tool.
D-dimer has proven effective in distinguishing AD from other diseases, and its levels correlate positively with AD mortality risk [24]. However, some AD patients have negative D-dimer results [25, 26], which underlines the need to combine diagnostic methods for accurate AD detection. Combining the aortic dissection detection risk score with D-dimer currently offers higher diagnostic accuracy than either indicator alone [27], but its sensitivity and specificity for diagnosing acute aortic syndrome are lower than those of our AADE score model [28]. We found a positive relationship between AADE scores and the D-dimer distribution; because D-dimer levels are associated with AD mortality risk [24], this suggests that higher AADE scores may also correspond to increased mortality risk. This adds another dimension to diagnostic precision, particularly in identifying more severe cases of AD. The AADE score may also help distinguish type A from type B AD, which differ in anatomical and clinical characteristics that affect treatment strategies [29, 30]. Type A AD generally scored higher on the AADE scale, suggesting that the model could help predict AD type and thereby guide treatment decisions.
CNN models are often criticized as “black boxes” that lack interpretability. Our research addresses this by identifying the ECG signals most strongly associated with AD and quantifying them in relation to aortic risk levels, which improves model interpretability and may inform future clinical practice and research. We found positive correlations between the AADE score and several ECG features, including abnormal ECG, sinus tachycardia, sinus bradycardia, and ST-segment depression. Interestingly, although T-wave inversion was common, it was not correlated with the AADE score, whereas ST-segment depression was among the most closely linked features. This suggests that ST-segment depression, particularly in the anterior and anterolateral walls, could be a critical feature for the AI model in identifying AD on the ECG. However, the mechanism linking ST-segment changes to AD remains unclear [9], and further research is needed to explain these potential connections.
In conclusion, our research shows that a CNN-based AI model can effectively distinguish patients with AD from those without AD using 12-lead ECG data. The model also performed stably in patients with chest pain, and its performance was maintained when trained at different case-to-control ratios. The model produces an AADE score that reflects the probability of AD and can be used to assess disease risk. Moreover, the AADE score correlated with ECG features, D-dimer, and AD type. During the one- and two-year follow-up periods, mortality in the high-score group was significantly higher than in the low-score group, indicating an association between the AADE score and patient survival. The AADE score may therefore be an important prognostic marker in AD. These findings may help clinicians assess the risk and severity of AD and diagnose and manage AD patients more accurately.
Early detection and timely treatment of AD are crucial for improving survival [31]. Our model could help lower-level hospitals identify suspected dissection rapidly when CTA is not available, facilitating swift transfer to higher-level hospitals for further diagnosis and treatment and helping to avoid the severe consequences of incorrect treatment. In patients with contraindications to CTA, such as renal insufficiency, the algorithm may also help identify those who require special attention. In the future, the model could be built into wearable devices to identify AD patients outside the hospital and speed their triage.
Limitations
Our study has several limitations. The small sample size may affect the reliability and generalizability of the results, so larger studies are needed. The single-center design and the exclusion of certain patients limited its scope, highlighting the need for multi-center validation and more inclusive patient selection. In addition, the study was retrospective; future prospective studies are needed to confirm the efficacy of this AI model.
Supplementary material
Supplementary material is available at https://journals.viamedica.pl/kardiologia_polska.
Article information
Acknowledgments: We thank all those who supported and encouraged this research. Although they provided no direct financial assistance, their valuable insights and suggestions improved the quality and depth of this study.
Conflict of interest: None declared.
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Open access: This article is available in open access under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license, which allows downloading and sharing articles with others as long as they credit the authors and the publisher, but without permission to change them in any way or use them commercially. For commercial use, please contact the journal office at kardiologiapolska@ptkardio.pl