In this work we investigated deep learning approaches on the NHANES (2003–2006) locomotor physical activity data. We estimate biological age (BA) from physical activity and chronological age (CA). To quantify how well the estimated biological age captures health risk, we apply the Cox proportional hazards model to all-cause mortality. Deep learning models such as DNN, CNN, and ConvLSTM were trained to exploit the dependence of physiological/activity changes on age. In all cases, the models were trained to minimize the mean squared error (MSE) between the estimated BA and CA.

Parameter choices

We tested the effect of smoothing/filtering the original 1D activity data using different moving averages: simple moving average (SMA), weighted moving average (WMA), and exponential moving average (EMA). EMA provided the best overall result. To test the impact of the window size (N), we ran experiments with N = 1, 2, 4, 8, 10, 16, 20, 25, 30, 35, 40. Table S1 shows the impact of the window size of the exponential moving average, combined with the Box-Cox transformation, on two machine learning algorithms, namely support vector machine (SVM) and random forest (RF). The criteria for choosing the window size were lower MAE, higher R², and higher correlation. From these results, we selected N = 35 as the best overall window size (R² = 0.48 for SVM). Table S2 shows the impact of λ for the Box-Cox transformation. The best results were obtained with λ = 0.9 (R² = 0.56 for SVM using N = 35). We report results for ConvLSTM* using three values of λ: λ = 1 (raw, with no transformation), λ = 0 (log transformation), and λ = 0.9.
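
The preprocessing described above can be illustrated with a minimal sketch. It is not the authors' code: the smoothing factor alpha = 2/(N + 1), the seeding of the EMA with the first sample, and the `offset` that keeps zero activity counts positive are all assumptions, and the sample `counts` are made up.

```python
import math

def exponential_moving_average(values, window):
    """EMA with alpha = 2 / (window + 1); this convention is an assumption."""
    if window < 1:
        raise ValueError("window must be >= 1")
    alpha = 2.0 / (window + 1)
    smoothed = []
    for i, x in enumerate(values):
        if i == 0:
            smoothed.append(float(x))  # seed with the first observation
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed

def box_cox(values, lam, offset=1.0):
    """Box-Cox transform of (x + offset); `offset` is a hypothetical guard for zeros."""
    out = []
    for x in values:
        y = x + offset
        if y <= 0:
            raise ValueError("Box-Cox requires positive inputs")
        if lam == 0:
            out.append(math.log(y))            # lambda = 0 is the log transform
        else:
            out.append((y ** lam - 1.0) / lam)
    return out

# Hypothetical activity counts, smoothed with N = 35 and transformed with lambda = 0.9.
counts = [0, 10, 40, 5, 0, 120, 80]
smoothed = exponential_moving_average(counts, window=35)
transformed = box_cox(smoothed, lam=0.9)
```

With λ = 1 the Box-Cox formula reduces to a constant shift of the input, which is consistent with the text treating λ = 1 as the raw data.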

For the ConvLSTM* layer we used 128 filters and a kernel size of 3 × 3 with a ReLU activation function. The first dense layer has 256 units and the second has 128 units. Weights were initialized with the Glorot and Bengio normal initialization38, and 30% dropout was applied after each dense layer. We tried different optimizers, namely RMSprop39, Adam40, and Nadam41; based on the empirical results, we selected the Adam optimizer for this work. Circular padding was used for the CNN. Mean squared error (MSE) was used as the loss function. We used the Keras library (https://keras.io/) with TensorFlow (https://www.tensorflow.org/) as the backend to build the deep learning models. All experiments were performed using an NVIDIA 1080Ti graphics processing unit (GPU) on a machine running Ubuntu 16.04 with an Intel Core i7 processor and 32 GB RAM.

Impact of gender

Results reported so far are from a single model that does not consider gender differences; that is, the same model is used for both female and male subjects. However, gender is expected to influence the performance of an age estimation scheme42,43,44. Table 5 shows the results for separate gender-specific models. We observe that, for gender-specific models, the normalized biological age acceleration η = (CA−BA)/CA computed from the estimated BA yields higher hazard ratios than chronological age. Moreover, for λ = 0 and λ = 1, the p-values obtained using chronological age are not significant. Using a separate model for male subjects resulted in higher HR values (for each λ). However, using a separate model for female subjects did not improve the hazard ratios compared with the single model for all subjects.

Table 5 Results of the Cox Proportional Hazard model (CoxPH) applied on the normalized biological age acceleration η = (CA−BA)/CA using separate models for female and male subjects.
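
As a small illustration of the quantity used throughout this section, the sketch below computes η = (CA−BA)/CA and assigns subjects to quartiles of η. The rank-based binning rule and the direction of the labels (Q1 holding the lowest η values) are assumptions, and the ages are hypothetical.

```python
def normalized_age_acceleration(ca, ba):
    """eta = (CA - BA) / CA, as defined in the text; CA must be positive."""
    if ca <= 0:
        raise ValueError("chronological age must be positive")
    return (ca - ba) / ca

def quartile_labels(values):
    """Label each value Q1..Q4 by rank, about a quarter of the subjects per group.

    Rank-based binning is an assumed rule; the source does not state how
    quartile boundaries or ties were handled.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = "Q" + str(rank * 4 // n + 1)
    return labels

# Hypothetical (CA, estimated BA) pairs.
subjects = [(50, 45), (60, 66), (40, 40), (70, 63)]
etas = [normalized_age_acceleration(ca, ba) for ca, ba in subjects]
groups = quartile_labels(etas)
```

A positive η means the estimated BA is below CA, and a negative η means it is above CA.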

Figure 7 shows the KM plots for gender-specific models applying η = (CA−BA)/CA (ConvLSTM estimated BA) factored into quartiles. The KM plots for the separate male model are similar to the combined KM plots (see Fig. 6(e)). For the separate female model, the KM curves are slightly different, although all the quartiles are well separated. To further quantify these results, we performed log-rank tests on the models. Table 6 shows the log-rank test results for the separate female and male models. Consistent with the Cox PH models and KM curves, the log-rank test also shows better results for the male model, with higher χ² values for all values of λ.

Figure 7 The Kaplan Meier curves for applying \(\eta =\frac{CA-BA}{CA}\) on the physical activity (a) female, and (b) male. Q1, Q2, Q3, and Q4 denote 1st, 2nd, 3rd, and 4th quartiles, respectively.

Table 6 Results of Log-rank tests applied on the normalized biological age acceleration η = (CA−BA)/CA using separate models for female and male subjects.
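
For readers unfamiliar with the log-rank statistic reported in Table 6, the following is a minimal sketch of the standard two-group version (χ² with one degree of freedom). It is an assumption that a comparison of this kind underlies the reported values; the source does not state whether groups were compared pairwise or all quartiles jointly, and the follow-up times below are made up.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic.

    times_*: follow-up times; events_*: 1 if death was observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)         # deaths at time t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance

# Hypothetical follow-up times (all deaths observed) for two groups.
chi2 = logrank_chi2([1, 2], [1, 1], [3, 4], [1, 1])
```

Larger values indicate a stronger separation between the survival curves of the two groups.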

Variations on biological age acceleration

So far in this work we have used η = (CA−BA)/CA. In previous work11, age acceleration was defined as Δ = CA−BA. We introduced the normalized form to reduce the effect of low or high values of CA. However, because the models are fitted by minimizing the mean squared error (MSE) loss, this definition of η may still suffer from the “regressing to the mean” problem19. To address this problem, we introduce variations of biological aging acceleration. We calculate the difference between an individual’s biological age and the average biological age of the corresponding age- and gender-matched cohort. Thus we define \(\Delta_g = BA_g - BA\) and \(\eta_g = \frac{BA_g - BA}{CA}\), where \(BA_g\) is the cohort average. Figure S2 shows the distribution of age acceleration for η and \(\eta_g\) over age groups. We notice that \(\eta_g\) has a better distribution for all age groups. For all the age groups except age group 1 (≤30), we observe a shift to the left from η to \(\eta_g\). Table S4 shows the correlation of the average physical activity (PA Avg), chronological age, and the variations of aging acceleration (η, \(\eta_g\)) with the biomarkers used in this work. \(\eta_g\) has a higher correlation with most biomarkers. Thus it may be that \(\eta_g\), the biological aging acceleration calculated relative to the age- and gender-matched cohort average, is more powerful in exposing the relationship between biomarkers and aging.
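
The cohort-matched variants can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the 10-year age bins are hypothetical (the source only states that age group 1 is ≤30), the cohort average is taken over the same set of subjects, and the records are made up.

```python
from collections import defaultdict

def age_bin(ca, width=10):
    """Hypothetical age grouping by fixed-width bins."""
    return int(ca // width)

def cohort_adjusted_acceleration(subjects, width=10):
    """Return a (delta_g, eta_g) pair per subject.

    subjects: list of dicts with keys 'ca', 'ba', 'gender'.
    delta_g = BA_g - BA and eta_g = (BA_g - BA) / CA, where BA_g is the mean
    estimated BA of the subject's age- and gender-matched cohort.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for s in subjects:
        key = (s["gender"], age_bin(s["ca"], width))
        sums[key][0] += s["ba"]
        sums[key][1] += 1
    results = []
    for s in subjects:
        total, count = sums[(s["gender"], age_bin(s["ca"], width))]
        ba_g = total / count
        delta_g = ba_g - s["ba"]
        results.append((delta_g, delta_g / s["ca"]))
    return results

# Hypothetical records.
records = [
    {"ca": 40, "ba": 38, "gender": "F"},
    {"ca": 45, "ba": 46, "gender": "F"},
    {"ca": 42, "ba": 50, "gender": "M"},
]
adjusted = cohort_adjusted_acceleration(records)
```

Because each subject is compared with peers of similar age and the same gender, a systematic bias of the estimator in a given age range cancels out of \(\Delta_g\) and \(\eta_g\).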

Connection with general health status

Another way to investigate the performance of the proposed ConvLSTM* in capturing health risks is to consider the relationship of its estimates with known indicators of health risk, or how the estimated biological age differentiates between subjects with known diseases and those without. Below we consider these two perspectives in evaluating a BA estimation method.

Relationship with known health indices

For general indices of health status, we can consider the body mass index (BMI), the waist-to-height ratio (WHtR), or the more recently introduced surface-based body shape index (SBSI)29 or ABSI30. In particular, we studied how the proposed normalized biological age acceleration (NBAA, denoted η), computed using the estimated BA from ConvLSTM*, varies across WHtR and SBSI categories. Earlier studies by Morkedal et al.45 have shown that the WHtR is a better measure of health status than BMI. Rahman and Adjeroh29 made a similar observation on the superiority of SBSI over BMI.

We examined the performance of ConvLSTM* with respect to the SBSI29 quartiles. Table 7 shows the log-rank test results on the SBSI quartiles, using η, for each SBSI category. We observe that, in general, the χ² values increase from the first quartile to the fourth quartile. However, the increase is not monotonic for all values of λ. For example, for λ = 1 the χ²-distance decreased from Q2 (7.75) to Q3 (2.89) and then increased for Q4 (15.83); the other variations (λ = 0, 0.9) follow a similar trend. We observe a similar trend for male-only models as well. Using female-only models, the χ²-distances increased monotonically for λ = 1 and λ = 0.

The performance of ConvLSTM* with respect to the WHtR quartiles is similar to the results on the SBSI quartiles (see Table 8). We observe that the χ²-distances increase from Q1 to Q4 for λ = 0, 0.9. However, the χ²-distances for the fourth quartile are not always greater than those for the third quartile, although they are greater than those for both the first and second quartiles.

We also examined the relationship between the variants of biological aging acceleration and SBSI. The performance of \(\eta_g\) is generally similar to that of η in Table 7 for each λ (λ = 0, 0.9, 1) using all, female-only, and male-only models. For λ = 0, the χ²-distance increased monotonically from Q1 (8.74) to Q4 (65.76), and for λ = 0.9, the χ²-distance increased from Q1 (6.70) to Q4 (26.58) for the female-only model. In general, we observed significant differences in the χ²-distances between Q1 and Q4, and also between (Q1/Q2) and (Q3/Q4). This was the case for both SBSI and WHtR.

Table 7 Log rank results applying \((\eta =\frac{CA-BA}{CA})\) for different SBSI categories. Results are shown for model with all subjects, female-only, and male-only separately. Q1, Q2, etc. denote 1st quartile, 2nd quartile, etc.

Table 8 Log rank results applying normalized biological age acceleration \((\eta =\frac{CA-BA}{CA})\) for different WHtR quartiles. Results are shown for model with all subjects, and for separate models for females and males. Q1, Q2, etc. denote 1st quartile, 2nd quartile, etc.

Relation with disease status

We also considered whether the proposed measure of biological age acceleration would show any difference between healthy subjects and those with certain known diseases. Table 9 shows the results grouped for subjects having chronic diseases such as diabetes, cardiovascular disease (CVD), and kidney disease. On average, \(\Delta_g = BA_g - BA\) is lower for individuals with chronic diseases (diabetes = −5.18, kidney = −3.66, and CVD = −2.92), whereas for all subjects \(\Delta_g\) = −0.67. Those who do not suffer from any chronic disease have \(\Delta_g\) = 0.25 on average. We observe a similar pattern using \({\eta }_{g}=\frac{B{A}_{g}-BA}{CA}\) for the same partition. Positive and Negative refer to the averages over subjects having positive and negative Δ, respectively. Positive Δ and η correspond to a biological age lower than the chronological age (healthier), while negative values correspond to a biological age higher than the chronological age. The percentage of negative Δ is higher for subjects with disease (74.53%, 65.38%, and 66.67%) compared with all subjects (56.25%). Subjects with no chronic disease have the lowest proportion of negative Δ values (52.25%).

Table 9 Performance of estimated biological age of subjects having different chronic diseases.
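
The group summaries reported in Table 9 (overall mean, mean of the positive values, mean of the negative values, and percentage of negative values) could be computed along the following lines. The function name, the returned keys, and the sample values are hypothetical.

```python
def summarize_acceleration(deltas):
    """Summarize age acceleration values for one group of subjects."""
    if not deltas:
        raise ValueError("deltas must be non-empty")
    positives = [d for d in deltas if d > 0]
    negatives = [d for d in deltas if d < 0]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return {
        "mean": mean(deltas),
        "mean_positive": mean(positives),
        "mean_negative": mean(negatives),
        "pct_negative": 100.0 * len(negatives) / len(deltas),
    }

# Hypothetical acceleration values for one disease group.
summary = summarize_acceleration([-4.0, -2.0, 1.0, 5.0])
```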

These results show that the BA estimated by the proposed ConvLSTM* from locomotor activity data can indeed capture significant information about the health status of the subjects.

Comparison

Pyrkov et al.7 proposed a deep learning architecture for analyzing physical activity data that is based on a one-dimensional convolutional neural network (CNN). We also implemented a deep neural network (DNN) to estimate biological age (our own architecture and implementation, motivated by the architecture of Putin et al.22), and a basic CNN fed into an LSTM model (CNN + LSTM). These models are used as baselines for comparison. The results on mortality modeling using the Cox model and KM curves have shown the performance of the proposed ConvLSTM* in comparison with the DNN, Pyrkov et al.’s7 1D CNN, and CNN + LSTM (see Tables 3 and 4 and Fig. 6). The results showed that the proposed ConvLSTM* method generally outperformed the 1D CNN, the CNN + LSTM model, and the DNN. Another way to compare the methods is to consider the chronological age estimated by each method. Since the deep learning methods were trained to minimize the mean squared error between the estimated and the original chronological age, we can compare the methods based on their performance in CA estimation.

Table 10 shows the mean absolute error (MAE), root mean square error (RMSE), correlation (CORR), and R-squared value (R-sq) for all the deep learning methods discussed. Results are reported for both training and test datasets. We observe that ConvLSTM* (λ = 1) on the original dataset has the lowest MAE (12.6) and RMSE (15.74), an R-sq of 0.85, and the best correlation (ρ = 0.62). ConvLSTM* with λ = 0 and λ = 0.9 had similar performance (ρ = 0.55 for both, R-sq of 0.85 and 0.80, and MAE of 13.21 and 13.4, respectively). While the 1D CNN7 has the best R-sq (0.93), followed by the DNN (R-sq = 0.89), ConvLSTM* and the CNN + LSTM model performed better on MAE and correlation. They also required fewer epochs (10, compared with 100 for the DNN and 500 for the 1D CNN). We also considered a 7 × 24 matrix representation followed by LSTMs for sequences of 60 as a variation of the ConvLSTM* architecture. With this variation we observe MAE = 16.45 and ρ = 0.32 on the test dataset.

Table 10 Results of the Deep learning Age Prediction methods.
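
The four metrics in Table 10 can be sketched as below. Pearson correlation and the coefficient of determination (1 minus the ratio of residual to total sum of squares) are assumed here; the source does not state which definitions of correlation and R-sq were used, and the ages in the example are made up.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, Pearson correlation, and R-squared for age estimates."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    corr = cov / math.sqrt(ss_t * ss_p)
    r2 = 1.0 - sse / ss_t
    return {"mae": mae, "rmse": rmse, "corr": corr, "r2": r2}

# Hypothetical chronological ages and model estimates.
metrics = regression_metrics([20, 40, 60], [25, 40, 55])
```

In this toy example the estimates are compressed toward the mean age, so the correlation is perfect while MAE and RMSE are nonzero, which shows why the metrics can rank models differently.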

The above discussion demonstrates the specific benefits of ConvLSTM* over other deep learning methods when applied to locomotor activity data, namely improved mortality modeling (using Cox PH, the χ²-distance from the log-rank test, and KM curves) and improved CA prediction (MAE, RMSE, correlation). The improved performance of the proposed ConvLSTM* can be attributed to (1) the use of a data representation that exploits the temporal patterns in the locomotor activity data, and (2) the use of a special deep learning model that combines the power of both CNN and LSTM. As discussed briefly in the introduction, there are several approaches to age estimation, using different types of data. Given the significant differences in the methodologies and data types involved, it is difficult to provide a detailed comparison with non-deep-learning approaches or with those that used other types of data. In general, the deep learning approaches on locomotor activity data tended to result in higher MAE than methods that used other data types, for instance brain MRI25 or DNA methylation profiles34,46. However, the correlation (and R² values) are generally similar. Our results using the Cox PH model also show that the performance in modeling mortality is similar to that of other popular BA methods, such as the KD method on blood biomarkers.

Does improved CA estimation really imply reduced performance in BA estimation?

All the methods described above use supervised learning that minimizes the difference between the estimated biological age and the chronological age. This difference has been called biological age acceleration11 in the literature. Pyrkov et al.7 suggested that an improvement in CA estimation can affect the significance of BA acceleration for a particular test that may involve health risks. This also relates to the “paradox of biomarkers” described by Klemera & Doubal12 and Hochschild47. However, our results show that the proposed ConvLSTM* approach gives a better estimate of chronological age (lower MAE, higher correlation) than the other deep learning methods. We have also shown that ConvLSTM* on the transformed data (λ = 0, 0.9) has better BA acceleration and better performance in modeling all-cause mortality, using both the Cox PH model and KM curves, than the 1D CNN, DNN, and CNN + LSTM. The normalized biological age acceleration (η) using the estimated BA from ConvLSTM* on the transformed activity values (λ = 0, 0.9, 1) resulted in better overall performance in capturing health risks, for instance in modeling all-cause mortality, than the other deep learning methods, namely 1D CNN7, CNN + LSTM, and DNN. These results seem to suggest that improved CA estimation may not always lead to a deterioration in BA estimation. The issue might lie in how the estimated BA is used for further analysis, rather than in the accuracy of the initial chronological age estimation. This warrants further investigation, for instance studying approaches that combine fitting-based models that minimize the mean squared error (MSE) with recent approaches (e.g. Pyrkov et al.7, Liu et al.17) that use proportional risk models to estimate biological age, rather than only to test the performance of the estimated biological age.