Despite significant efforts to investigate the pathophysiology of atrial fibrillation (AF) and improve its treatment, AF remains the most common cardiac arrhythmia. To date, few studies have assessed trends in all-cause mortality among contemporary AF patients. The available data show a small increase in age-standardized all-cause mortality in AF patients (2.0% between 1990 and 2019) [1] and no significant difference in 1-year all-cause mortality between AF patients recruited in 2007 and those recruited in 2016 (8.0% vs. 7.8%) [2]. Such trends highlight that managing mortality risk in patients with AF remains a concern.
With the development of computational technologies, machine learning (ML) approaches are increasingly being applied in AF-related fields. Compared to traditional regression models, ML models have the ability to handle a large number of variables, even if there is an intrinsic correlation between these variables [3]. This enables ML models to identify some non-traditional or previously unidentified risk factors and accurately assess their relative importance in predicting outcomes. However, despite the growing interest in ML, there is a notable lack of models tailored to 1-year all-cause mortality in AF patients.
In this issue of the Polish Heart Journal, Wang et al. [4] developed a risk-scoring system to predict 1-year all-cause mortality in AF patients, the CRAMB score, using the eXtreme Gradient Boosting (XGBoost) model. The study included 26 365 AF patients from the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 2.2) database as a derivation cohort, which was randomly split into training and test cohorts in a 3:1 ratio. In the training cohort, features that were important for the endpoint were pre-screened using the XGBoost model. Five features were ultimately used to construct the XGBoost model: Charlson Comorbidity Index (CCI), readmission status, age, metastatic solid tumor, and maximum value of blood urea nitrogen. The performance of this model was then evaluated in the test cohort and in an external validation cohort (from the Chinese intensive care unit database), and the contribution of each feature to predicting the target endpoint was assessed. The CRAMB score had an area under the curve (AUC) of 0.765 (95% confidence interval [CI], 0.753–0.776) in the test cohort, which was significantly better than that of the CCI (AUC 0.733; 95% CI, 0.720–0.746) and CHA2DS2-VASc (AUC 0.617; 95% CI, 0.603–0.631). In the external validation cohort, the CRAMB score had an AUC of 0.582 (95% CI, 0.502–0.657), which was also superior to that of CHA2DS2-VASc. In addition, decision curve analysis showed that the net benefit of the CRAMB score exceeded that of CHA2DS2-VASc. This suggests that the CRAMB score may assess the 1-year risk of all-cause mortality in AF patients more accurately than the CHA2DS2-VASc score in clinical applications.
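For readers less familiar with these metrics, the following minimal sketch shows how an AUC, a confidence interval for it, and the net benefit used in decision curve analysis can be computed. It is an illustration under stated assumptions, not the authors' code: the function names and the toy data are hypothetical, and the percentile bootstrap is only one common way to obtain a CI (the method used in the original study is not stated here).

```python
import random

def auc(labels, scores):
    """Rank-based AUC: probability that a random positive case scores
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def net_benefit(labels, risks, threshold):
    """Net benefit at a threshold probability pt:
    TP/n - FP/n * pt / (1 - pt), treating risk >= pt as 'high risk'."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Toy data invented for illustration (not from the study): 1 = died within 1 year.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
risks = [0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.40, 0.90]

print(auc(labels, risks))               # 0.875
print(bootstrap_ci(labels, risks))      # percentile 95% CI; depends on the seed
print(net_benefit(labels, risks, 0.5))  # 0.375
```

A decision curve is obtained by plotting the net benefit across a range of threshold probabilities and comparing it with the "treat all" and "treat none" strategies; a score adds clinical value at thresholds where its curve lies above both.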
While traditional risk assessment scores have helped clinicians identify the long-term risk of all-cause mortality in AF patients, the article by Wang et al. provides a good opportunity to discuss whether ML models, such as the CRAMB score, are more effective than conventional clinical risk scores. It may not be entirely fair to compare it with the most widely used clinical scores, CHADS2 and CHA2DS2-VASc, which were originally developed to predict thromboembolic risk in AF patients.
However, other proposed scores that combine CHADS2 and CHA2DS2-VASc with estimated glomerular filtration rate and creatinine clearance have shown some promise, although the best-performing one had an AUC of only 0.734 in predicting 1-year all-cause mortality in AF patients [5].
In addition, the GARFIELD-AF risk score had an AUC of 0.77 for identifying 1-year all-cause mortality [6]. However, GARFIELD-AF is complex and requires more comprehensive data inputs, which may limit its application in real clinical settings and may also lead to instability in model predictions, especially when confronted with unbalanced data. This highlights the need for simpler yet effective predictive tools that are easily implemented in routine practice to enhance clinical decision-making.
In contrast, the CRAMB score offers a straightforward approach with only five key features. Its simplicity reduces the burden of data collection and potentially makes it more practical for clinical applications. Notably, the CRAMB score includes predictors such as CCI and readmission status, which are not part of CHA2DS2-VASc, providing additional insights that could guide future model enhancements. The CRAMB score also demonstrated strong performance in the test cohort, achieving an AUC of 0.765, which surpassed that of CHA2DS2-VASc. This underscores the potential of ML models to not only simplify prediction but also deliver predictive accuracy on par with, or even exceeding, traditional clinical scores.
Although the CRAMB score shows potential, its use in clinical practice still has a long way to go. One of the main issues is its very poor performance in external validation in a cohort of different ethnicity, with an AUC of only 0.582. This suggests that the model may not generalize well across populations, particularly those whose ethnic background differs from that of the derivation cohort.
Ethnicity is now generally recognized as an important factor in the epidemiology and management of AF, with recognized ethnic differences in AF-related complications [7, 8]. There has also been increasing focus on sex differences in risk, with implications for clinical risk stratification [9–11]. These issues highlight the possible need for ethnicity-specific ML models for AF mortality to ensure their reliability and utility in real-world settings [12].
Another limitation is that the CRAMB score relies on the XGBoost algorithm for both feature selection and model construction. While XGBoost is a powerful ML algorithm known for its high accuracy and efficiency, relying on it alone may bias feature selection and forgoes the opportunity to compare performance with other ML algorithms (e.g., neural networks, random forests) that may have complementary strengths or offer different insights [13]. The reliance on metastatic solid tumor status and maximum blood urea nitrogen, which may not be easy to obtain, may also limit the usability of the CRAMB score. Finally, risk is not static but dynamic, changing with age and incident comorbidities, and ML models need to account for these changing risk factors [14].
Overall, the CRAMB score presents a simplified ML model for predicting 1-year all-cause mortality in AF patients with only five key features. However, its poor external validation performance raises concerns about its generalizability, especially in ethnically diverse populations (as we previously reported [15]). Future studies should aim to validate such models in ethnically diverse cohorts and explore a wider range of ML algorithms, including recent foundation models, to improve predictive accuracy and clinical relevance in AF management.
Article information
Conflict of interest: None declared.
Funding: None.
Open access: This article is available in open access under Creative Commons Attribution-Non-Commercial-No Derivatives 4.0 International (CC BY-NC-ND 4.0) license, which allows downloading and sharing articles with others as long as they credit the authors and the publisher, but without permission to change them in any way or use them commercially. For commercial use, please contact the journal office at polishheartjournal@ptkardio.pl