Introduction
The assessment of osteoporotic patients should include several elements. The most important measurement for diagnosing osteoporosis is bone densitometry by dual-energy X-ray absorptiometry (DXA), regarded as the gold standard [1]. Data on clinical fracture risk factors should also be gathered, and DXA together with clinical assessment should be regarded as the basis of fracture risk assessment. A fracture is an essential event in the osteoporotic population and one of the most important risk factors for a subsequent fracture. Therefore, the crucial aims are to predict fractures and to prevent the first one. Several methods have been developed to predict future fracture(s) [2–5]. Regardless of these tools, in daily practice medical professionals should not neglect the physical examination, of which height measurement is one of the most important elements. The risk established by medical examination, including body height measurement, gives a more complete picture of the patient’s clinical condition. In several studies, height loss (HL), defined as the difference between the former maximal height and current height, was studied and described as a sign of risk for future fractures [6–24]. HL is not included as a risk factor in the available fracture risk calculators, and thus this important clinical information is omitted from widely used fracture risk assessment tools.
The aim of the current prospective study was to verify the hypothesis that height loss may predict fracture incidence.
Material and methods
The current analysis is based on the epidemiological female sample from the RAC-OST-POL study. The baseline observation was performed in 2010, and epidemiological data were presented earlier [25]. At baseline, data were collected from 978 postmenopausal women at a mean age of 66.48 ± 7.6 years. The study group was larger than the 625 subjects randomly invited to participate [25]: other women who also presented did not differ from them in their main characteristics (age, place of residence, educational level, and marital status), and they were therefore also included in the baseline group for longitudinal observation. Body height was measured at baseline using a wall-mounted stadiometer (Seca, Germany), installed according to the manufacturer’s recommendations.
The measured body height was compared with the maximum body height in early adulthood, as reported by the patient. All the body height measurements were performed by one DXA technician.
Bone densitometry at the proximal femur was performed using a Lunar DPX device (GE Healthcare, Madison, WI, USA). The coefficient of variation (CV%) was 1.6% for the femoral neck and 0.82% for the total hip.
Afterwards, all subjects were contacted annually by phone and asked to report any new fractures. At the 10-year follow-up, 640 women at a mean age of 75.04 ± 6.95 years remained under observation. Only low-trauma fractures were recorded. All interviews were performed by one experienced investigator (WP).
The study was approved by the Ethics Committee of the Medical University of Silesia (KNW/0022/KB1/132/10). At baseline, in 2010, all participants gave their written informed consent.
Statistics
Statistical analysis was performed using Statistica software (StatSoft, Tulsa, OK, United States) and PQStat v. 1.8.2.226 (PQStat Software, https://pqstat.pl/). Mean values and standard deviations were used for descriptive statistics of continuous variables. The normality of data distribution was verified by the Shapiro-Wilk test. Student’s t-test or the Mann-Whitney U test for data with and without a normal distribution, respectively, were applied for comparative analyses. The significance of the results in all the statistical analyses was assumed at p < 0.05.
In the presented study, 5-fold stratified cross-validation was applied: the dataset was divided into 5 disjoint subsets, each containing roughly the same class proportions as the whole dataset. In each split, four-fifths of the data were used as the training set and the remaining one-fifth as the test set. To assess the predictive accuracy of the analysed regression models, receiver operating characteristic (ROC) curves were studied and the area under the curve (AUC) was calculated using the DeLong method. Potential confounding variables were also investigated using the 10% rule [26]: if the odds ratio (OR) changes by 10% or more when a candidate confounder is included, the confounder is controlled for by keeping it in the model; if the OR changes by less than 10%, the variable can be removed from the model.
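The stratified splitting described above can be sketched in a few lines of Python. This is a minimal illustration assuming binary labels coded 0/1; the function `stratified_k_fold` and its `seed` parameter are hypothetical names, and the sketch is not the implementation used by the statistical software in this study. Indices of each class are shuffled and dealt round-robin into the folds, so that each test fold keeps roughly the class proportions of the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split indices into k folds that preserve class proportions.

    Returns a list of k (train_indices, test_indices) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class's indices round-robin, continuing where the
        # previous class stopped so that fold sizes stay balanced.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        test_set = set(test)
        train = [idx for idx in range(len(labels)) if idx not in test_set]
        splits.append((train, test))
    return splits


labels = [0] * 602 + [1] * 38  # 0 = Subgroup_10, 1 = Subgroup_Multi
for train, test in stratified_k_fold(labels, k=5):
    print(len(train), len(test), sum(labels[i] for i in test))
```

With the subgroup sizes from Table 1 (602 patients in Subgroup_10 and 38 in Subgroup_Multi), each of the 5 test folds in this sketch contains 128 patients, 7 or 8 of whom have multiple fractures.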
Medical datasets are very often imbalanced, i.e., cases describing healthy people greatly outnumber cases of people with a specific disease. Machine learning (ML) from such class-imbalanced data is a major challenge. The skewed distribution of training examples biases standard classifiers towards the majority class and can make the detection of rare instances practically impossible. Because the minority class (e.g., sick patients) is usually the more interesting and important one, this problem has to be addressed, either by undersampling the majority class or by oversampling the minority class. In the presented research, various resampling methods were tested on the analysed data, including the synthetic minority over-sampling technique (SMOTE), random undersampling (RU), and methods combining RU with the k-nearest neighbours (kNN) algorithm [27–29].
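The two basic resampling ideas named above can be illustrated with a short, standard-library-only Python sketch. It assumes numeric feature vectors stored as tuples and at least 2 minority examples; `smote` and `random_undersample` are hypothetical helper names. The SMOTE step creates each synthetic point by interpolating between a randomly chosen minority example and one of its k nearest minority neighbours, whereas random undersampling simply keeps a random subset of the majority class. The kNN-based undersampling variants that were also tested in the study are not reproduced here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority examples (SMOTE-style interpolation).

    `minority` is a list of numeric tuples with at least 2 elements.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority examples (sampling without replacement)."""
    return random.Random(seed).sample(majority, n_keep)


minority = [(float(i), float(i)) for i in range(10)]
majority = [(float(i), 0.0) for i in range(100)]
balanced_minority = minority + smote(minority, n_new=20, k=3)
reduced_majority = random_undersample(majority, n_keep=40)
print(len(balanced_minority), len(reduced_majority))
```

Because synthetic points lie on segments between existing minority examples, SMOTE adds new but plausible cases rather than exact duplicates; how much to oversample and undersample remains a tuning decision.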
Results
During follow-up, 190 osteoporotic fractures occurred in 129 women at the following skeletal sites: forearm 81, spine 30, ankle 25, hip 15, arm 13, rib 9, foot 7, clavicle 7, and pelvis 3. Ninety-one women had one fracture, at the following sites: forearm 42, spine 8, ankle 16, hip 7, arm 5, rib 6, foot 4, clavicle 2, and pelvis 1. Multiple fractures occurred in 38 patients, with a total of 99 fractures in this subgroup: forearm 39, spine 22, ankle 9, hip 8, arm 8, rib 3, foot 3, clavicle 5, and pelvis 2. Two fractures occurred in 24 subjects, 3 in 7 subjects, 4 in 5 subjects, and 5 in 2 subjects.
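Because the fracture counts above are reported at several levels (by skeletal site, by woman, and by number of fractures per woman), they can be cross-checked with simple arithmetic. The short Python sketch below sums the reported counts; the dictionary names are illustrative only.

```python
# Fractures by skeletal site in women with exactly one fracture
single = {"forearm": 42, "spine": 8, "ankle": 16, "hip": 7, "arm": 5,
          "rib": 6, "foot": 4, "clavicle": 2, "pelvis": 1}
# Fractures by skeletal site in women with multiple fractures
multiple = {"forearm": 39, "spine": 22, "ankle": 9, "hip": 8, "arm": 8,
            "rib": 3, "foot": 3, "clavicle": 5, "pelvis": 2}
# Number of women (value) with a given number of fractures (key)
women_by_count = {2: 24, 3: 7, 4: 5, 5: 2}

total_by_site = {site: single[site] + multiple[site] for site in single}
n_women_multi = sum(women_by_count.values())
n_fractures_multi = sum(n * w for n, w in women_by_count.items())

print("total fractures:", sum(total_by_site.values()))
print("women with any fracture:", sum(single.values()) + n_women_multi)
print("fractures in multiple-fracture subgroup:", n_fractures_multi)
```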
The analysis focused on the influence of HL, femoral neck (FN) bone mineral density (BMD) expressed as a T-score, and age on the occurrence of low-trauma fractures. Histograms showing the distributions of these factors in all examined women are presented in Figure 1. The mean values of age, HL, and FN T-score in each patient subgroup defined by the number of fractures are shown in Table 1.
Table 1. Age, height loss (HL), and femoral neck (FN) T-score (mean ± SD) in the analysed subgroups

| Group label | Group description | Group size | Age at enrolment [years] | HL [cm] | FN T-score |
|---|---|---|---|---|---|
| Whole group | All patients | 640 | 65.04 ± 6.95 | 4.93 ± 3.53 | –1.24 ± 0.92 |
| Subgroup_0 | Patients without fracture | 511 | 64.64 ± 6.83 | 4.8 ± 3.58 | –1.18 ± 0.92 |
| Subgroup_1 | Patients with one fracture | 91 | 65.98 ± 7.17 | 4.8 ± 2.66 | –1.41 ± 0.90 |
| Subgroup_10 | Patients with one fracture or without fracture(s) | 602 | 64.84 ± 6.89 | 4.8 ± 3.45 | –1.22 ± 0.92 |
| Subgroup_Any | Patients with (any) fracture(s) | 129 | 66.61 ± 7.22 | 5.46 ± 3.28 | –1.46 ± 0.91 |
| Subgroup_Multi | Patients with multiple (n > 1) fractures | 38 | 68.12 ± 7.2 | 7.03 ± 4.06 | –1.58 ± 0.93 |
The Shapiro-Wilk test showed that age and HL were normally distributed only in patients with multiple fractures; in the other subgroups and in the whole sample, HL was not normally distributed. In contrast, the FN T-score was normally distributed in the subgroups of patients with one fracture and with any fracture(s).
Depending on the results of the normality test, Student’s t-test or the Mann-Whitney U test was used for comparative analyses, which identified the following significant differences:
- Subgroup_Any vs. Subgroup_0: patients with (any) fracture were significantly older (p < 0.01) than subjects without fractures
- Subgroup_Multi vs. Subgroup_10: patients with multiple fractures were significantly older (p < 0.01) than subjects without fractures or with one fracture
- Subgroup_Any vs. Subgroup_0: HL in patients with (any) fracture was significantly greater (p < 0.05) than in subjects without fractures
- Subgroup_Multi vs. Subgroup_0: HL in patients with multiple fractures was significantly greater (p < 0.01) than in subjects without fractures
- Subgroup_Multi vs. Subgroup_10: HL in patients with multiple fractures was significantly greater (p < 0.01) than in subjects with one fracture or without any fracture
- Subgroup_Any vs. Subgroup_0: the FN BMD T-score in patients with (any) fracture was significantly lower (p < 0.05) than in subjects without fractures
- Subgroup_Multi vs. Subgroup_10: the FN BMD T-score in patients with multiple fractures was significantly lower (p < 0.05) than in subjects without fractures or with one fracture
- Subgroup_1 vs. Subgroup_0: the FN BMD T-score in patients with one fracture was significantly lower (p < 0.05) than in subjects without fractures
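For variables that were not normally distributed, the comparisons above relied on the Mann-Whitney U test. A standard-library-only Python sketch of that test is shown below, using average ranks for ties and the large-sample normal approximation with tie correction and without continuity correction. These choices are assumptions: the exact variant computed by Statistica or PQStat (e.g., continuity correction or exact p-values for small samples) is not stated in the text, and `mann_whitney_u` is a hypothetical name.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, z statistic, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign average (1-based) ranks to tied values and accumulate the
    # tie-correction term sum(t^3 - t) over groups of tied values.
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


u, z, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, round(z, 3), round(p, 4))
```

The test compares ranks rather than raw values, which is why it was suitable for HL, whose distribution was not normal in most subgroups.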
Butterfly charts were used to better illustrate the HL differences between the compared groups. Graphs for Subgroup_Any and Subgroup_0 as well as for Subgroup_Multi and Subgroup_10 are presented in Figure 2.
The HL distributions differ more between subjects with multiple fractures and those without fractures or with a single fracture (Subgroup_Multi vs. Subgroup_10, Fig. 2B) than between the fractured and unfractured subgroups (Subgroup_Any vs. Subgroup_0, Fig. 2A). This suggests that HL may be a good predictor of multiple fractures in particular. Logistic regression was performed to verify this presumption and to estimate the relationship between HL and multiple fractures.
First, 3 simple logistic regression models were created, using HL, FN T-score, and age, in turn, as the independent variable and multiple fractures as the binary dependent variable (coded 1 for multiple fractures and 0 otherwise). This preliminary assessment of potential risk factors for multiple fractures showed a statistically significant relationship (p < 0.05) in each of these models.
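A simple logistic model of this kind estimates logit P(y = 1) = b0 + b1·x by maximum likelihood, and exp(b1) is the odds ratio per one-unit increase in the predictor (e.g., per centimetre of HL). The standard-library-only Python sketch below fits such a one-predictor model by Newton-Raphson. It assumes there is no perfect separation between the classes and uses a fixed number of iterations; `fit_simple_logistic` is a hypothetical helper, not the routine used by the statistical software in the study, and the toy data are invented for illustration.

```python
import math


def fit_simple_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0           # score vector (gradient of log-likelihood)
        h00 = h01 = h11 = 0.0   # observed information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data: 2/10 events when x = 0 and 5/10 events when x = 1
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5
b0, b1 = fit_simple_logistic(x, y)
print("odds ratio per unit of x:", math.exp(b1))
```

For a single binary predictor, the fitted OR equals the cross-product ratio of the 2×2 table, here (5/5)/(2/8) = 4, which gives a simple check of the sketch.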
Logistic regression assumes that the independent variables are not strongly correlated with each other (no multicollinearity); therefore, pairwise correlations among the predictors were determined before the multivariable regression models were created. The correlation coefficients were 0.392 for age and HL, –0.302 for age and FN T-score, and –0.181 for HL and FN T-score.
In multivariable regression models based on these 3 predictors, only HL remained statistically significant.
Age was correlated with both the independent variable (HL) and the dependent variable (multiple fractures), so it met 2 conditions of a confounding variable. Both correlations were positive, so omitting age from the model could be expected to introduce a positive bias, i.e., to overestimate the strength of the effect of HL on the outcome.
According to the model without age (model 1), the odds of multiple fractures increased by a factor of 1.147 (i.e., by 14.7%) with each additional centimetre of height loss. After adjustment for age (model 2), the OR was 1.117, which confirmed that the HL coefficient in the first model was overestimated. However, the difference between the ORs of the 2 models was small (less than 3%). According to the 10% rule, age therefore did not need to be included in the model unless there were other reasons to do so.
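The 10% rule and the per-centimetre interpretation of the OR reduce to simple arithmetic, sketched below in Python with the 2 ORs reported above. Two assumptions are made: the relative change is computed on the OR scale with the age-adjusted OR as the reference (conventions differ, but using the crude OR as the reference gives a similar value here), and compounding the per-centimetre OR over several centimetres assumes a linear relationship between HL and the log-odds. The helper name `odds_multiplier` is hypothetical.

```python
or_model1 = 1.147  # per cm of HL, model without age
or_model2 = 1.117  # per cm of HL, model adjusted for age

# Relative change in the OR when the candidate confounder (age) is added
relative_change = abs(or_model1 - or_model2) / or_model2
keep_age = relative_change >= 0.10  # 10% rule


def odds_multiplier(or_per_cm, cm):
    """Factor by which the odds change over `cm` centimetres of HL,
    assuming the log-odds are linear in HL."""
    return or_per_cm ** cm


print(f"relative change in OR: {relative_change:.1%}, keep age: {keep_age}")
print(f"odds multiplier for 6 cm (model 1): {odds_multiplier(or_model1, 6):.2f}")
```

With these values, the relative change is about 2.7%, well below the 10% threshold.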
For the model without age, the prediction accuracy expressed as the area under the ROC curve (AUC) was 0.669 (95% CI: 0.579–0.760). This value was compared with the AUC of the model based exclusively on the FN T-score, i.e., on the DXA measurement regarded as the gold standard for the diagnosis of osteoporosis, which was only 0.596 (95% CI: 0.497–0.695). Thus, in this cohort, the model including HL discriminated multiple fractures better than the model based on the currently accepted standard.
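The AUC and its confidence interval can be computed directly from the predicted scores of cases and non-cases. The Python sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties count one half), together with the DeLong variance estimate and a Wald-type 95% CI on the AUC scale, clipped to [0, 1]. The CI construction used by PQStat is not stated in the text, so this is an assumption; `auc_delong` is a hypothetical name, at least 2 cases and 2 non-cases are required, and the paired DeLong test for comparing 2 correlated AUCs is not shown.

```python
import math


def auc_delong(scores_pos, scores_neg, z=1.959964):
    """AUC with DeLong variance and a Wald-type 95% CI clipped to [0, 1].

    scores_pos: predicted scores of cases (e.g., multiple fractures)
    scores_neg: predicted scores of non-cases
    """
    m, n = len(scores_pos), len(scores_neg)

    def psi(a, b):
        # 1 if the case is ranked above the non-case, 0.5 for a tie
        return 1.0 if a > b else 0.5 if a == b else 0.0

    # DeLong structural components
    v10 = [sum(psi(a, b) for b in scores_neg) / n for a in scores_pos]
    v01 = [sum(psi(a, b) for a in scores_pos) / m for b in scores_neg]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    return auc, (max(0.0, auc - z * se), min(1.0, auc + z * se))


cases = [0.9, 0.8, 0.7, 0.4]      # toy predicted probabilities, cases
non_cases = [0.6, 0.5, 0.3, 0.2]  # toy predicted probabilities, non-cases
auc, (lo, hi) = auc_delong(cases, non_cases)
print(f"AUC = {auc:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

With very small samples, as in this toy example, the Wald interval can extend beyond 1, which is why the sketch clips it; a logit-scale interval is a common alternative.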
ROC curves for the compared models are presented in Figure 3.
Analysing the ROC curve for the HL model and determining the point at which sensitivity and specificity are equal indicated 6 cm as the cut-off point (Fig. 3). Thus, a height loss of at least 6 cm should be treated as a predictor of multiple fractures.
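Locating the point on the ROC curve at which sensitivity equals specificity amounts to scanning candidate thresholds. The Python sketch below does this for a single continuous predictor, classifying a value at or above the threshold as positive; since sensitivity and specificity are rarely exactly equal in finite samples, it returns the threshold with the smallest absolute difference. The helper name and the toy data are hypothetical and do not reproduce the study data.

```python
def equal_sens_spec_cutoff(values, labels):
    """Return (cutoff, sensitivity, specificity) for the observed value at
    which |sensitivity - specificity| is smallest.

    A subject is classified as positive when value >= cutoff; labels are 1
    for cases (e.g., multiple fractures) and 0 otherwise.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    best = None
    for c in sorted(set(values)):
        sens = sum(v >= c for v in pos) / len(pos)
        spec = sum(v < c for v in neg) / len(neg)
        if best is None or abs(sens - spec) < abs(best[1] - best[2]):
            best = (c, sens, spec)
    return best


hl_toy = [1, 2, 3, 4, 5, 6, 7, 8]      # hypothetical HL values [cm]
multi_toy = [0, 0, 0, 1, 0, 1, 1, 1]   # hypothetical outcome labels
print(equal_sens_spec_cutoff(hl_toy, multi_toy))
```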
An analysis of OR profiles was also carried out to verify the cut-off value, by studying the shape of the relationship between HL and the predicted variable (multiple fractures) (Fig. 4).
This allowed us to check whether the relationship was close to linear, in which case a single unit OR would be sufficient, or whether it would be more advantageous to divide the predictor into categories, i.e., to discretise it.
After analysing the unit changes in OR, the OR profile, and the distribution of HL, HL was split into 2 categories with a threshold of 6 cm. The discretised HL yielded an OR of 3.059 (95% CI: 1.562–5.991), meaning that individuals with HL ≥ 6 cm had about 3 times the odds of multiple fractures (an increase of approximately 206%) compared with those with smaller HL.
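For a single binary predictor (here HL ≥ 6 cm vs. HL < 6 cm), the logistic-regression OR equals the cross-product ratio of the corresponding 2×2 table, and its Wald 95% CI equals Woolf’s log-scale interval. The Python sketch below shows that computation and the conversion of an OR into a percentage change in odds. The cell counts of the study’s 2×2 table are not reported in the text, so the counts in the example are hypothetical and do not reproduce the reported OR; the function names are also hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf 95% CI for the 2x2 table

                    cases   non-cases
        exposed       a         b
        unexposed     c         d

    All cells must be positive.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, (lower, upper)


def percent_change_in_odds(odds_ratio):
    """An OR of 3 corresponds to odds 200% higher than in the reference group."""
    return (odds_ratio - 1.0) * 100.0


# Hypothetical counts, for illustration only
or_, (lower, upper) = odds_ratio_woolf(a=10, b=40, c=10, d=120)
print(f"OR = {or_:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
print(f"reported OR 3.059 -> {percent_change_in_odds(3.059):.1f}% higher odds")
```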
The analysed dataset was highly imbalanced. The imbalance ratio (IR), calculated as the ratio of the number of patients in Subgroup_10 to the number of patients in Subgroup_Multi, was 15.84, i.e., patients with multiple fractures constituted only 5.94% of all analysed cases.
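The imbalance measures quoted above follow directly from the subgroup sizes in Table 1, as the short Python sketch below shows; the variable names are illustrative.

```python
n_subgroup_10 = 602    # patients without fracture or with one fracture
n_subgroup_multi = 38  # patients with multiple fractures

imbalance_ratio = n_subgroup_10 / n_subgroup_multi
minority_share = n_subgroup_multi / (n_subgroup_10 + n_subgroup_multi)

print(f"IR = {imbalance_ratio:.2f}")
print(f"minority share = {minority_share:.2%}")
```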
The overall performance of ML models built on imbalanced datasets is known to be limited by their ability to predict rare (minority) objects. Therefore, to improve the performance of the analysed model, we tried to correct the imbalance using under- and oversampling methods. However, because any modification of the source dataset by under- and/or oversampling can distort the data, we sought the level of resampling that would ensure satisfactory prediction accuracy without undue interference with the data.
Although there are general guidelines for the use of different sampling methods, in practice the approach that works best for a given dataset has to be found experimentally. We tested several over- and undersampling methods, and a combination of SMOTE and RU_kNN appeared to be a good choice. Even relatively mild over- and undersampling with these methods improved the AUC by more than 15% compared with the value obtained for the original data.
Discussion
The most important finding of the current study is that the magnitude of HL predicts multiple fractures in long-term follow-up. To our knowledge, the role of HL as a predictor of multiple fractures has not been analysed in other studies, so this observation is novel in the field of osteoporosis. Overall, HL was greater in women who sustained fractures during longitudinal observation than in those without fractures. However, when women with one fracture were compared separately with those without fractures, no significant difference in HL was found, which suggests that HL is not a significant predictor of a single fracture. We believe that HL should always be included in patients’ examinations and that HL of at least 6 cm is a predictor of multiple fractures. It should also be taken into account that HL may be a marker of a previous fracture, in particular a vertebral fracture. In assessing the risk of future fractures, HL can serve as a surrogate for a previous vertebral fracture, whether or not it has been diagnosed and whether or not the patient is aware of it.
The importance of HL was shown in several studies mentioned in the Introduction, but only some of them reported data from prospective studies [9–13, 19, 21, 23, 24]. In general, HL in these studies was significantly greater in subjects with prior fractures than in those without fractures. Our finding of greater HL in subjects with any fracture than in those without fractures is consistent with these results. In our previous cross-sectional study, HL of 3–4 cm or more was considered a simple indicator of fracture risk [22], and HL provided information very similar to that of fracture risk assessment with online calculators [2–5].
A subgroup with multiple fractures was not analysed separately in the studies of other authors. We observed that multiple fractures were associated with more pronounced HL and that HL of 6 cm or more should be considered a threshold of multiple-fracture risk. We consider this observation a novelty of the current study, and one that can easily be implemented in daily practice. One may hypothesize that every subject with HL of at least 6 cm should be considered an individual at high fracture risk: individuals with HL ≥ 6 cm have about 3 times the odds of multiple fractures of those with smaller HL. This observation is of great importance in daily work with patients. It is also worth emphasizing that HL is easy to measure in daily practice: it generates no costs and requires no specialist medical examination.
The comparison of prediction accuracy expressed as the area under the ROC curve is also of interest: the AUC for the HL model was 0.669, whereas that for the model based exclusively on the FN T-score was only 0.596.
Our study has some limitations. Only 38 subjects had multiple fractures, the study included only women, and spine X-rays were not routinely performed during follow-up, so some clinically silent vertebral fractures might have been missed. However, the observed group was a representative epidemiological female sample, and dropout during the observation period was acceptable, as 65.4% of the baseline population remained at follow-up. Moreover, despite the relatively low number of women with multiple fractures, the total number of such fractures (n = 99) seems sufficient for a reliable assessment of the role of HL in elderly women. To minimize the impact of class imbalance on the machine learning model used for prediction, various combinations of data balancing methods were also tested; the analyses showed that SMOTE oversampling combined with RU_kNN undersampling improved prediction accuracy. The population observed in the current study has also yielded other important findings on fracture risk assessment [30]. The use of a randomly recruited epidemiological sample of postmenopausal women strengthens the relevance of the current observation for practitioners in their daily work with patients. The study was performed in the same epidemiological RAC-OST-POL cohort that was analysed in studies presenting the role of falls in the osteoporotic population [31, 32].
In conclusion, HL predicts multiple fractures in a prospective observation of a representative epidemiological female sample. HL of at least 6 cm is a predictor of multiple fractures, and the measurement of HL should always be included in patients’ assessments.
Data availability statement
Data are available on request.
Ethics statement
The study was approved by the Ethics Committee of the Medical University of Silesia (KNW/0022/KB1/132/10).
Author contributions
W.P.: concept and design of the study, acquisition of data, analysis and interpretation of data, drafting of the manuscript, final approval of the submitted version (the first author); P.A.: analysis and interpretation of data, drafting of the manuscript, final approval of the submitted version; A.W., M.B.: participation in data preprocessing, supervision and performance of the statistical analysis and results visualisation, performing data mining experiments, including data balancing and creating mathematical models, writing some parts of the manuscript, and final approval of the submitted version. All this can be considered as the performance of tasks in the Technical Informatics and Telecommunications discipline defined by the Polish Ministry of Education. B.D.: analysis and interpretation of data, critical revision of the manuscript, final approval of the submitted version.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Acknowledgments
None declared.
Conflict of interest
W.P., P.A., A.W., M.B., and B.D. declare that they have no conflict of interest related to this manuscript.