In the present study, the psychometric properties of the HADS were evaluated in a large general population of older people. Overall, the HADS proved to be a valid instrument for measuring psychological distress in this population. The original two-factor structure was confirmed, internal consistency was satisfactory, and no DIF for sex was detected. However, floor effects were present for all items.

The distribution of item responses was highly skewed towards lower scores, and floor effects were present for all items. However, all of the item response alternatives were endorsed, which indicates that all response categories are relevant. A potential problem with a skewed distribution and floor or ceiling effects is a negative impact on sensitivity and responsiveness [29]. This pattern has been seen in other studies using the HADS [30, 31] and could therefore be expected. Further, this study was based on data from a general population, in which only a limited proportion has symptoms of anxiety and depression. Thus, the problem is probably related to the sample rather than to the instrument.
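As an illustration of how a floor effect can be quantified for a single item, the following minimal sketch computes the proportion of respondents in the lowest and highest response categories. The function name, the example responses, and the 15% cut-off are assumptions made for illustration; they are not taken from the present study.

```python
from collections import Counter


def floor_ceiling(responses, min_score=0, max_score=3):
    """Return (floor, ceiling) proportions for one item.

    Missing responses are represented as None and are ignored.
    """
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("no valid responses")
    counts = Counter(valid)
    n = len(valid)
    return counts[min_score] / n, counts[max_score] / n


# Hypothetical responses to one item scored 0-3 (not study data).
item = [0, 0, 0, 1, 0, 2, 0, 1, 0, 3, None, 0]
floor, ceiling = floor_ceiling(item)

# Assumed rule of thumb: flag an effect when more than 15% of respondents
# fall in the extreme category.
has_floor_effect = floor > 0.15
has_ceiling_effect = ceiling > 0.15
```

In this hypothetical example, most valid responses fall in the lowest category, so the item would be flagged for a floor effect but not for a ceiling effect.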

According to Little's MCAR test, the data were not missing completely at random, which indicates systematic missingness. However, the number of missing responses was very low. Additionally, like many other statistical tests, Little's MCAR test is sensitive to large sample sizes, and a statistically significant result does not necessarily imply clinical importance [32]. The low rate of missing data indicates that the items are easy to understand and that the instrument is not too extensive or burdensome for respondents to complete.

The CFA in the present study supported the hypothesized two-factor structure with two latent variables, anxiety and depression, which has also been demonstrated in previous research on community-based healthy older people [10, 16]. However, our results identified problems with cross-loadings for items 7 and 8. This problem has been addressed in previous studies [15, 20]. After these items were allowed to cross-load on both factors (model II), the model fit was excellent according to all indices. It has been suggested that item 8 (“I feel like as if I have slowed down”) could be interpreted as age-related slowing down [15] and that item 7 (“I can sit at ease and feel relaxed”) refers to both psychomotor agitation and the anhedonia domain of the depression subscale and therefore loads on both the anxiety and the depression factor [28]. This may explain why these two items seem to be indicators of both anxiety and depression. Although model fit improved in model II, the cross-loadings resulted in poor factor loadings (below 0.5) and increased residual variances for both items 7 and 8. These findings indicate that the original two-factor model should be preferred even though its RMSEA is above 0.06. Using the hypothesized two-factor model would also facilitate comparisons between studies. However, this problem needs to be addressed in further studies, and users should be aware of this limitation of the HADS.

Because the distribution was skewed, with few responses in the third (2) and fourth (3) categories, these two categories were collapsed to examine whether this would improve model fit further. This third model resulted in excellent fit, very close to that of model II. This finding indicates that the skewed distributions, with pronounced floor effects, did not have any serious effect on the factor structure. Note that this third model was evaluated for statistical reasons only and should not be applied in clinical use.
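The collapsing of the two highest response categories described above can be sketched as a simple recoding step. The function name and the example data are hypothetical, and the sketch assumes items scored 0 to 3 with missing responses stored as None.

```python
def collapse_top_categories(responses, cap=2):
    """Recode responses so that every score above `cap` becomes `cap`.

    With the default cap of 2, categories 2 and 3 are merged; None is kept.
    """
    return [None if r is None else min(r, cap) for r in responses]


# Hypothetical item responses (not study data).
recoded = collapse_top_categories([0, 1, 2, 3, None])
```

The recoded item has three categories (0, 1, 2) instead of four, which is the form that would then enter the refitted model.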

Although anxiety and depression are known to represent two different constructs, they are highly correlated. In our study the correlation between the two latent factors was strong, which is consistent with the understanding that there are symptomatic overlaps between anxiety and depression [33].

The internal consistency is well supported by both ordinal and traditional Cronbach’s alpha values for both HADS Anxiety and HADS Depression. This is similar to findings from studies in the same age group [2, 16] and thus supports the robustness of the scale for older people.
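For readers less familiar with the coefficient, the traditional Cronbach’s alpha can be computed from the item variances and the variance of the total score, as in the minimal sketch below. The function name and the example data are hypothetical. Ordinal alpha is not covered here: it applies the same idea to a polychoric correlation matrix, which requires specialized software.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Traditional Cronbach's alpha.

    `rows` is a list of respondents, each a list of k complete item scores.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores on three items for five respondents (not study data).
scores = [
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [2, 2, 3],
    [3, 2, 3],
]
alpha = cronbach_alpha(scores)
```

Alpha approaches 1 when the items covary strongly relative to their individual variances, and it equals 1 when all items are perfectly correlated with equal variances.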

Our results show that the HADS can be used to make invariant comparisons between men and women, even though the group variable and the interaction term were significantly associated with item responses for all items. With a large sample, even small and meaningless associations will be highly significant. Therefore, changes in pseudo R2, rather than statistical significance, should be used to examine DIF. The effect measured with McFadden R2 was low, which implies that no meaningful DIF was present. According to Zumbo [22], R2 changes above 0.130 are required to determine the presence of DIF. This criterion has in later research been criticized for being too liberal [34], but even when the more conservative criterion (R2 ≥ 0.035) suggested by Jodoin & Gierl [35] was applied, no meaningful DIF was present. A few studies have previously evaluated the measurement invariance of the HADS in relation to sex, with diverging results. The HADS was shown to be a valid tool for comparisons between sexes in a population of cardiac patients [36] and in a population of outpatients attending a musculoskeletal rehabilitation program [37]. Yet, when the HADS was evaluated in a population of patients who had undergone heart surgery, DIF for sex was found [38]. Further, DIF for age and sex was found for those 55 years and over in a primary care setting [39]. Although our study showed absence of DIF for sex in an older general population, there is a further need to evaluate DIF across other groups, such as those defined by age and ethnicity.
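The effect size logic described above can be sketched as follows. The sketch assumes that two nested ordinal logistic regression models have already been fitted for an item elsewhere: a baseline model with the subscale score as the only predictor, and a full model that adds the group variable and the interaction term. Only their log-likelihoods are needed here. The function names and the log-likelihood values are hypothetical and do not come from the present study.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R2: 1 - (log-likelihood of model / log-likelihood of null)."""
    return 1.0 - ll_model / ll_null


def dif_effect(ll_null, ll_base, ll_full, threshold=0.035):
    """Return (delta_r2, flagged) for one item.

    delta_r2 is the gain in McFadden R2 when the group variable and the
    interaction term are added to the baseline model. The default threshold
    is the more conservative criterion; pass 0.130 for the Zumbo criterion.
    """
    delta = mcfadden_r2(ll_full, ll_null) - mcfadden_r2(ll_base, ll_null)
    return delta, delta >= threshold


# Hypothetical log-likelihoods for one item (not study data).
delta_r2, flagged = dif_effect(ll_null=-5000.0, ll_base=-4000.0, ll_full=-3990.0)
```

In this hypothetical case the full model fits slightly better, which in a large sample could be statistically significant, yet the R2 change of about 0.002 is far below both thresholds, so no meaningful DIF would be flagged.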

Methodological considerations

The large random sample from a general population is a strength of the present study, though one consequence of high statistical power is an increased risk of detecting statistically significant results of minor importance. We have therefore combined the use of p-values with other methods to evaluate the psychometric properties, for example graphs and effect size measures. One potential limitation is that the upper age was limited to 80 years. No strong conclusions about the HADS can therefore be drawn for the oldest age groups, and our findings need to be confirmed in people above 80 years. The dropout rate of 33% is in line with what could be expected in this type of survey [40]. Moreover, the population in this study was aged 65–80 years, an age group in which disability and poor health are more common than among younger people. Therefore, the dropout rate can be considered low. A large dropout may have serious consequences for external validity. However, in psychometric studies, large dropouts are seldom a problem as long as they do not affect the variation in the data, for example by leaving some response categories unused. According to the score distribution, this was not a problem in the present study. Another strength of this study is that we used statistical methods appropriate for ordinal-level data, which strengthens the statistical validity.