Overall, eight differences were found between the rankings of health states based on the expectations of the public and those based on the experience of patients. Given that there were six health states and five consequences, a maximum of 75 (15 per consequence) pairwise differences could have been found. Therefore, 11% of the total possible pairwise differences were found. This figure understates the extent of differences between expectations and experience because, of the 15 potential pairwise differences per consequence, eight involve a health state that dominates the other (i.e. in the pairwise comparison one health state has equal or fewer problems in every dimension than the other health state). The eight differences therefore represent 23% of total non-dominated pairwise comparisons (8/(7*5)). The findings in this paper may be a conservative estimate of the true discrepancy between expectations and experience because the comparison was at an ordinal level and thus cardinal differences could not be investigated. Overall, this study indicates that this novel method can be used to assess whether members of the public are informed. The evidence suggests that although the participants in this study are not grossly misinformed about how health affects the five consequences, their expectations are not accurate even on an ordinal scale.
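
The arithmetic behind these percentages can be reproduced with a short sketch. The Python below is illustrative only and is not the analysis code used in this study: the function name `dominates` and all variable names are assumptions introduced here, the counts are taken directly from the text, and health states are assumed to be coded as five-digit EQ-5D-5L profiles in which a higher digit means more severe problems.

```python
from math import comb

# Counts as stated in the text (not re-derived from the underlying data).
N_STATES = 6                   # health states compared
N_CONSEQUENCES = 5             # consequences ranked
DIFFERENCES_FOUND = 8          # ranking differences observed
DOMINATED_PER_CONSEQUENCE = 8  # pairs in which one state dominates the other


def dominates(a: str, b: str) -> bool:
    """Return True if state `a` has equal or fewer problems than `b` in every
    dimension and the two states are not identical.

    Hypothetical helper: assumes five-digit profiles such as "11334".
    """
    return a != b and all(int(x) <= int(y) for x, y in zip(a, b))


pairs_per_consequence = comb(N_STATES, 2)                  # 15
total_pairs = pairs_per_consequence * N_CONSEQUENCES       # 75
non_dominated_pairs = (
    pairs_per_consequence - DOMINATED_PER_CONSEQUENCE
) * N_CONSEQUENCES                                         # 7 * 5 = 35

share_of_all = DIFFERENCES_FOUND / total_pairs
share_of_non_dominated = DIFFERENCES_FOUND / non_dominated_pairs

print(f"{share_of_all:.0%} of all pairwise comparisons")            # 11%
print(f"{share_of_non_dominated:.0%} of non-dominated comparisons")  # 23%

# Two of the health states discussed in this paper: neither dominates the
# other, so their relative ranking is not determined by dominance alone.
print(dominates("11334", "44553"), dominates("44553", "11334"))  # False False
```

Restricting the denominator to non-dominated pairs is the design choice that matters here: where one state dominates the other, a ranking difference is implausible, so such pairs dilute the share of differences found.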

The most frequent difference in the ranking arose because the public underestimated the effects of moderate problems in usual activities compared to moderate problems in mobility. Both other differences in ranking involved health state 11334. For the consequences of enjoyment and relationships, health state 11334 was underestimated compared to health state 44553. This means that severe anxiety or depression was underestimated compared to problems in the other four dimensions. For the consequence of independence, health state 11334 was overestimated compared to 32322. This means that, for independence, anxiety or depression combined with pain or discomfort was overestimated compared to problems in mobility and self-care.

The STR and beta regression results for patient experience indicate that anxiety or depression and usual activities are the two dimensions with the largest odds ratios for the five consequences. These results have face validity given that some consequences would be expected to correlate with some health dimensions. For example, for the activities consequence it was the usual activities dimension that was associated with the largest odds ratio and for enjoyment it was the anxiety or depression dimension that was associated with the largest odds ratio.

The regression results of this study confirm the findings in the literature that the mental health dimension (i.e. anxiety or depression) is more strongly associated with subjective well-being than the other health dimensions [9, 10, 41]. Dolan and Metcalfe [9] find that problems in anxiety or depression are associated with a detriment in enjoyment about 10 times as large as that of mobility problems. Similarly, in this study the beta regression model estimated that the odds of having worse enjoyment were higher for anxiety or depression problems than for any other dimension. Dolan and Metcalfe [9] argue that comparing values derived from preferences to measurements of subjective well-being shows that members of the general public undervalue mental health compared to physical health. In this study, the public underestimated the effect of having anxiety or depression problems on enjoyment and personal relationships when comparing states 11334 to 44553. However, anxiety or depression was not underestimated when comparing states 44535 to 44553 (i.e. comparing extreme pain or discomfort to extreme anxiety or depression directly). No literature has been identified on whether anxiety or depression is underestimated for other consequences. While in this sample the public underestimated the effect of anxiety or depression on enjoyment and relationships, they did not underestimate its effect on the other consequences.

The findings of this study, if replicated in a larger study, have implications for the use of preference-elicitation tasks for resource allocation in health care. For example, the public’s beliefs about the consequences of problems in usual activities compared to problems in mobility were not in line with patient experience. This can mean that the use of preferences for evaluating interventions undervalues improvements in usual activities compared to improvements in mobility, although the problem is lessened if improvements in the two dimensions are correlated. Similarly, for consequences such as enjoyment and relationships the public underestimated problems with anxiety or depression compared to problems in the physical dimensions. As a result, interventions that improve mental health could be undervalued. Using uninformed preferences to value the EQ-5D could thus result in sub-optimal policy recommendations. Research that focuses on encouraging more informed preferences by developing methods to inform the public of the consequences of health states could be continued [42]. One possibility for better informed preferences is to provide members of the general public with more information about the experience of patients. It is also possible to move further away from existing methods. One suggestion in the literature is the use of patient preferences, which may have the benefit of more closely matching experience and expectations, but it requires patients to imagine full health [8] and has practical limitations [43]. An altogether different approach is to use general population preferences by developing a descriptive classification based on the consequences, perhaps by using ICECAP-A [44] or another well-being based descriptive system [43]. In addition, it would be important to know how different informed and uninformed preferences are, and whether those differences are of practical significance to cost-effectiveness analysis. This research will ultimately result in a value set that is more defensible and more in line with what the general public would want if they were informed about the consequences of health states.

The limitations of this study include the phrasing of the expectations and experience questions, the method of comparing the two datasets, and the study sample. As shown in Table 2, there are differences in the phrasing of the questions in the participant expectations and patient experience datasets. For example, the experience independence question (originally in the ICECAP-A [44]) does not mention control but the expectation question does. Another difference between the expectations and experience questions is that the scales are different, and therefore comparison could only be made on an ordinal basis. It may be that ordinal rankings are correct, while relative cardinal values are not. The datasets used in this study also have limitations. The MIC dataset had few respondents in the worst response level of the EQ-5D-5L dimensions, and more observations for those levels would make inferences more reliable. The MIC dataset is cross-sectional and there is a potential for endogeneity in this type of dataset [45]. A longitudinal panel dataset would be useful to account for individual heterogeneity and would make it easier to assess causality [45]. Additionally, only six health states were used in this study. To obtain a broader view of the difference between expectations and experience, a wider range of health states would be needed. This is further discussed in the future research section.

There are limitations in the sampling of both the public expectations and patient experience datasets. Ideally, both samples should be comparable and representative of the population. The sample of members of the general public in the expectations dataset was small and not representative of the general UK population. This is particularly important because average beliefs of members of the public were compared to average patient experiences. The MIC sample was an online sample, and although online samples are being used more frequently in health economics studies [46], they may suffer from self-selection (though any sampling method would). Lastly, this paper assumes that the ranking of the average of the public’s expectation should be the same as the ranking of the average patient’s experience. This assumption can be justified because the EQ-5D is a generic and broad instrument and various diseases map on to the same EQ-5D health profile. The average expectations thus should be close to the average experience. Furthermore, differences between expectations and experience could be caused by adaptation. The expectations of the general public were elicited by asking them to consider living 10 years in a health state, although they may instead have focused on the transition to ill health [7]. By contrast, how long a patient has been in the health state is unknown. The effect of this is difficult to estimate, but it is not necessarily expected to affect the ordinal ranking of the experience and expectations of the health state.

Future research can, in the first instance, focus on addressing the limitations of this study to more accurately assess the extent to which preferences are informed. First, the sample could be improved by measuring expectations from a larger and representative group of members of the general public, as is typically recommended for valuation studies [2]. Second, a larger number of health states could be included, which can provide a better overall indication of whether the public is informed. Third, the study methods could be improved by using the same questions and scales for both the expectation and experience questions, which would allow for better comparison between the two. If the same questions are used alongside a larger number of health states, the analysis could then focus on the difference between expectations and experience in a joint regression model. Such a model could include statistical significance testing and could investigate health state dimensions and levels rather than health states, which may reveal whether there are certain dimensions about which the public is less informed.

One issue to be further investigated is the effect of using self-reported questions to measure consequences and whether differences between experiences and expectations are partly driven by response scale heterogeneity [47]. In this study, one item from existing questionnaires was chosen to measure each consequence, but other items that measure experience could be tested. There may also be a benefit to more sophisticated elicitation procedures. For example, probabilistic expectations can be elicited, an approach that has previously been used to elicit expectations of future earnings given educational attainment [48]. In addition, uncertainty in beliefs may need to be included [48]. Different probability elicitation procedures exist for eliciting uncertainty and they may have implications for the truthfulness of reported expectations [49]. The method in this study could also be applied to other generic preference-based measures such as the SF-6D [50] and HUI [51]. In particular, the comparison between measures based on the ‘within the skin’ approach and measures that focus more on the social context or impairment may be interesting. It may be that judging the consequences is difficult when using a ‘within the skin’ type of measure, such as the HUI, and the results of this study cannot necessarily be generalized to other measures.