Our data were gathered through an interview-based survey of behaviour and attitudes towards GHEs amongst inhabitants of Hanoi and Hung Yen, Vietnam. Interviews were conducted face-to-face and responses were recorded on paper. The survey was motivated mainly by several previous studies of the effects of medical costs on patients’ lives after treatment28,29,30. One study showed that patients, especially poor patients who had borrowed money to pay for treatment, tended to fall into destitution after receiving hospital treatment28. Many desperate patients had little choice but to live together and support each other as they struggled to earn a living and pay for prolonged treatment29,30. This evidence of the harsh reality facing seriously ill patients underlined the critical importance of disease prevention and early detection.

The project consists of five phases: (1) Questionnaire design; (2) Face-to-face interviews; (3) Quality control for questionnaire answers; (4) Preparing the dataset; (5) Data analysis.

Survey sample

Participants were chosen at random: all mentally competent residents at the survey locations were invited to take part. Interviews did not begin until potential participants had been informed about the institutions responsible for the research, the objectives of the research and the methods of data analysis, and had agreed to take part. Participants were informed of the indirect identifiers in the dataset and consented to public use of their personal information on condition that their names be removed. The dataset, from which respondents’ names have been removed, is therefore suitable for open access.

Survey design

The survey was conducted between September and November 2016 at locations such as secondary schools, hospitals, companies, government agencies and randomly selected households in Hanoi, including Hospital 125 Thai Thinh (Dong Da District) and Vietnam-Germany Hospital (Hoan Kiem District). The survey team consisted of seven key members affiliated with the Vuong & Associates research office and a dozen assistants. Key members wore identification badges in the field.

Interviewers recorded the time taken for each interview. The numbers of refusals and acceptances were reported at the end of each day and summed at the conclusion of the fieldwork.

The survey team adhered to the ethical code of the institutions responsible for the research. All questionnaires were checked, and their validity confirmed, by the team member who collected them and by the team supervisor. Access to the database is open to the public, in accordance with the agreement between participants and the research team.

Survey validation

Before the interview, respondents were given instructions on the response formats for the various questions; for example, they were told to choose only one answer when a question asked for the most appropriate response among multiple choices. For questions answered on a numerical scale, the interviewer ensured that the respondent understood the scale and gave a score within the allotted range. In addition, all collected questionnaires were checked three times to ensure the quality and validity of the data: when the interviewer returned to the team, when the data were entered into the database, and before exploratory analysis.
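The range check described above could also be repeated automatically once the data are in the database. The following is a minimal sketch under stated assumptions, not the team's actual procedure: the helper name `flag_out_of_range` and the toy scores are hypothetical. It flags any entry on a 1 to 5 scale that is missing or outside the allowed range so it can be re-checked against the paper questionnaire.

```r
# Hypothetical helper: TRUE for entries that are missing or outside [lo, hi]
flag_out_of_range <- function(x, lo = 1, hi = 5) {
  is.na(x) | x < lo | x > hi
}

# Toy scores for one 1-to-5 question (illustrative values, not survey data)
scores <- c(3, 5, 0, 4, NA, 6)
which(flag_out_of_range(scores))  # positions to re-check: 3 5 6
```

Because `is.na(x)` is evaluated first, `TRUE | NA` yields `TRUE`, so missing entries are flagged rather than silently propagated as `NA`.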

Data collection

A total of 2,479 people were approached, of whom 409 refused to take part. The total number of observations was thus 2,070, two of which were invalid and excluded from analysis, yielding a final sample of 2,068 valid responses. On average, about one in six people (409/2,479 ≈ 16.5%) refused to take part when invited. Interviews lasted approximately 12–15 min. Participants of both sexes took part, ranging in age from 13 to 83 years. Female participants made up 64.80% of the sample (1,340/2,068). The average age of participants was 29.17 years (s.d.=10.09, 95% CI: 28.74–29.60). The majority of respondents (60%, 581/2,068; see Fig. 1a) were aged between 18 and 30 years. Most respondents had had their last GHE less than one year before the interview. The majority of the sample was married (57.35%), and 54.35% of respondents had a stable job (Fig. 1b).
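The sample-size arithmetic in the paragraph above can be reproduced directly from the reported counts. This sketch uses only figures stated in the text; it shows that the refusal rate is roughly one in six and that the female share of the valid sample works out to about 64.80%.

```r
# Figures reported in the Data collection paragraph
approached <- 2479
refused    <- 409
invalid    <- 2
female     <- 1340

observations <- approached - refused    # 2070 completed questionnaires
valid        <- observations - invalid  # 2068 usable responses

refusal_rate <- refused / approached    # about 0.165, roughly one in six
female_share <- female / valid          # about 0.648

round(c(refusal = 100 * refusal_rate, female = 100 * female_share), 2)
```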

Figure 1: Histograms of participants by age (a) and job status (b), and distribution of BMI values by sex (c). (a) shows the distribution of participants by age and, within each age group, the distribution of times since last GHE. (b) shows the distribution of participants by job status. (c) presents the distribution of average BMI by sex and indicates the first and third quartiles (25th and 75th percentiles).

Participants reported their weight and height, from which we calculated their body mass index (BMI). Most participants had a relatively healthy BMI (M=20.848, s.d.=2.67, 95% CI: 20.73–20.96). On average, male respondents had a higher BMI than female respondents (Fig. 1c).
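The reported 95% confidence intervals for mean age and mean BMI are consistent with the normal approximation m ± 1.96·s/√n with n = 2,068. The sketch below assumes that this is how they were obtained (the text does not say so explicitly); the helper name `ci95` is hypothetical.

```r
# Normal-approximation 95% confidence interval for a mean: m +/- z * s / sqrt(n)
ci95 <- function(m, s, n) {
  z <- qnorm(0.975)                  # about 1.96
  m + c(-1, 1) * z * s / sqrt(n)
}

n <- 2068                            # valid responses
round(ci95(29.17, 10.09, n), 2)      # age: 28.74 29.60
round(ci95(20.848, 2.67, n), 2)      # BMI: 20.73 20.96
```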

Data and materials

Data were used to analyse patterns in GHE engagement and to assess how specific variables influenced GHE behaviour. Time since last GHE was used as the dependent variable in analyses of factors affecting the frequency with which individuals attended medical checks.

Materials

The raw data were first entered into a Microsoft Excel file and then converted into comma-separated values (CSV) format (available as 11102016Med4.csv [Data Citation 1]). Data were analysed in R (3.3.1). Estimates were calculated using the baseline-categorical logit (BCL) model27.

As most variables were categorical, and most data for the response and predictor variables were discrete, we used a logistic model. Logistic models predict the probability of each value of the dependent variable given specific values of the independent variables.

The general equation for the baseline-categorical logit model is:

$$\ln\left[\frac{\pi_j(\mathbf{x})}{\pi_J(\mathbf{x})}\right] = \alpha_j + \boldsymbol{\beta}_j^{T}\mathbf{x}, \qquad j = 1, \ldots, J-1.$$

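where $\mathbf{x}$ is the vector of independent variables, $\pi_j(\mathbf{x}) = P(Y = j \mid \mathbf{x})$ is the probability that the dependent variable $Y$ falls in category $j$, and category $J$ is the baseline. Thus $\pi_j = P(Y_{ij} = 1)$, where $Y_{ij} = 1$ when observation $i$ falls in category $j$.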

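In the logit model under consideration, the probability of category $j$ is computed as:

$$\pi_j(\mathbf{x}) = \frac{\exp(\alpha_j + \boldsymbol{\beta}_j^{T}\mathbf{x})}{1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \boldsymbol{\beta}_h^{T}\mathbf{x})}, \qquad j = 1, \ldots, J-1,$$

with $\pi_J(\mathbf{x}) = 1 \big/ \left[1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \boldsymbol{\beta}_h^{T}\mathbf{x})\right]$ for the baseline category.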
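To make the two equations above concrete, the following base-R sketch computes the category probabilities of a baseline-category logit model from given coefficients and checks that the log-odds against the baseline recover the linear predictor. The function name `bcl_probs` and all coefficient values are hypothetical; in practice the coefficients would come from a fitted model such as the one reported in Table 1.

```r
# Baseline-category logit: probabilities of categories 1..J given x.
# alpha: J-1 intercepts; beta: (J-1) x p matrix of slopes (row j is beta_j).
# Category J is the baseline, whose linear predictor is fixed at 0.
bcl_probs <- function(x, alpha, beta) {
  eta   <- alpha + as.vector(beta %*% x)  # alpha_j + beta_j^T x, j = 1..J-1
  denom <- 1 + sum(exp(eta))              # 1 + sum_h exp(alpha_h + beta_h^T x)
  c(exp(eta) / denom, 1 / denom)          # pi_1..pi_{J-1}, then pi_J
}

# Hypothetical coefficients for J = 3 categories and two binary predictors
alpha <- c(-0.5, 0.2)
beta  <- matrix(c(0.8, -0.3,
                  0.1,  0.6), nrow = 2, byrow = TRUE)
x <- c(1, 0)

p <- bcl_probs(x, alpha, beta)
sum(p)               # 1
log(p[1:2] / p[3])   # 0.3 0.3, i.e. alpha + beta %*% x
```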

Beta coefficients can be estimated directly from the original CSV file. In that case, however, the reference categories of the independent variables are set by default and cannot be modified by the analyst. We therefore ran the regressions on distribution (frequency) tables of the sample, stored in CSV format; file tab4.1.csv [Data Citation 1] is an example of such a table.
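A frequency table of this kind can be built from respondent-level records with base R. The sketch below is an assumption about how such a table might be produced, not the authors' script: the toy records are invented, and only one predictor (‘HealthIns’) is used for brevity. It yields one row per predictor pattern and one count column per category of ‘RecPerExam’, which is the layout used by a `cbind(...)` response in a multinomial model formula.

```r
# Hypothetical toy records standing in for rows of 11102016Med4.csv
raw <- data.frame(
  HealthIns  = c("yes", "yes", "no", "no", "yes", "no"),
  RecPerExam = c("less12", "b1224", "g24", "less12", "less12", "unknown")
)

# Count respondents for each combination of predictor value and outcome category
tab <- as.data.frame.matrix(xtabs(~ HealthIns + RecPerExam, data = raw))
tab <- cbind(HealthIns = rownames(tab), tab)  # predictor value as a column
rownames(tab) <- NULL
tab  # columns: HealthIns, b1224, g24, less12, unknown
```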

We also used ordinary least squares (OLS) linear regression for the numerical variables. The general equation for the OLS analysis is:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$

where $Y$ is a continuous variable and the independent variables $X_i$ can be discrete, categorical or continuous.
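As a minimal sketch of how a categorical predictor enters such an equation, the example below fits an OLS model with R's `lm()` to simulated data (not survey data); the data-generating values 19, 1.5 and 0.05 are arbitrary assumptions. The factor ‘Sex’ is expanded into a dummy variable (`Sexmale`, with female as the reference level), while ‘Age’ enters as a continuous predictor.

```r
# Simulated toy data (not from the survey): BMI as a function of sex and age
set.seed(1)
toy <- data.frame(
  Sex = factor(rep(c("female", "male"), each = 50)),
  Age = round(runif(100, 18, 60))
)
toy$BMI <- 19 + 1.5 * (toy$Sex == "male") + 0.05 * toy$Age + rnorm(100, sd = 0.5)

# OLS fit: BMI = alpha + beta1 * Sexmale + beta2 * Age + error
fit <- lm(BMI ~ Sex + Age, data = toy)
coef(fit)  # estimates close to 19, 1.5 and 0.05
```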

Response coding

Both questions and participants’ responses were coded into variables and variable categories in our dataset. The demographic variables were as follows: ‘sex’ (male; female), ‘age’, ‘weight’ (in kg) and ‘height’ (in cm). Because participants were recruited randomly and fieldwork was carried out in a variety of locations, it was not practical to measure participants’ height and weight directly, so respondents were asked to provide their most recent measurements. Most Vietnamese people memorize their height and weight, as many administrative procedures in the country require personal documents for which these measurements are indispensable. In addition, measuring one’s own height and weight is not complicated, as electronic devices and mobile phone apps for doing so are widely available and fairly easy to use. For these reasons we consider the data provided by respondents to be reliable. From these data we calculated BMI using the formula BMI = weight/(height × height), with weight in kilograms and height converted to metres.
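Because height is recorded in centimetres but BMI is defined in kg/m², the height must be converted before applying the formula. A minimal sketch (the helper name `bmi` is hypothetical); applied to the dataset it would be called as, for example, `bmi(d$weight, d$height)` for a data frame `d` read from the CSV file.

```r
# BMI = weight (kg) / height (m)^2; height is recorded in centimetres
bmi <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100
  weight_kg / height_m^2
}

round(bmi(55, 165), 2)  # 20.2
```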

Marital status is referred to as ‘MaritalStt’ (married; unmarried; other). Job status was captured as the variable ‘JobStt’ (stable; unstable; student; retired; homemaker; other). Educational attainment was captured as ‘Edu’ (‘PostGrad’ (post-graduate); ‘Grad’ (college/university); ‘Second’ (high school); ‘Hi’ (middle school)). Health insurance status was represented by a binary variable, ‘HealthIns’. Questions concerning weight, height and BMI also appear in the questionnaire.

Two timing variables were recorded: time since last medical examination (‘RecExam’) and time since last GHE (‘RecPerExam’). Both were coded as follows: ‘less12’=less than 12 months; ‘b1224’=between 12 and 24 months; ‘g24’=over 24 months; ‘unknown’=respondent unable to recall. Before respondents answered the relevant questions, the interviewer carefully explained the difference between them and made sure the respondent understood the questions properly, in order to ensure that responses were accurate. ‘Time since last medical examination’ is the length of time since the respondent last visited a doctor with symptoms of disease, whereas ‘time since last GHE’ is the length of time since the respondent’s last GHE. GHEs are conducted periodically, regardless of whether an individual has any signs of illness or disease, and are intended to track individuals’ health status and detect disease at a pre-symptomatic stage. During a GHE, people undergo a series of tests, including clinical examinations and subclinical tests such as diagnostic imaging and functional exploration.
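When working with the published CSV file, it can help to turn these codes into an ordered factor with readable labels. The sketch below is a hypothetical convenience step (the helper `recode_time` and the toy responses are illustrative, and the label wording is our own); any code outside the four listed values becomes `NA`, which also makes unexpected entries easy to spot.

```r
# Codes used for 'RecExam' and 'RecPerExam', in chronological order
recode_time <- function(x) {
  factor(x,
         levels = c("less12", "b1224", "g24", "unknown"),
         labels = c("< 12 months", "12-24 months", "> 24 months", "cannot recall"))
}

# Toy responses (illustrative only)
resp <- c("g24", "less12", "unknown", "less12")
table(recode_time(resp))  # counts: 2, 0, 1, 1
```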

Reasons for their most recent GHE, captured in the variable ‘RecExam’, were coded as follows: ‘noti.disease’=concerns over illnesses/epidemics; ‘adv.sig’=worrying symptoms; ‘request’=prompted by employer/community/insurance; ‘volunteer’=no immediate reason. We also collected data on how often respondents believed GHEs should be carried out: every 6 months (‘6 m’); every 12 months (‘12 m’); every 18 months (‘18 m’); or less often than every 18 months (‘g18m’).

One question dealt with reasons why people might hesitate to take a GHE. Binary yes/no responses to the following reasons were solicited: GHE is a waste of time (‘Wsttime’); GHE is a waste of money (‘Wstmon’); fear of discovering diseases (‘DiscDisease’); little faith in the quality of the medical service (‘Lessbelqual’); do not consider GHEs to be urgent or important (‘NotImp’). A similar format was used to explore reasons for attending a GHE, with options as follows: health is first priority (‘HthyPriority’); GHEs are subsidized by employer/community (‘ComSubsidy’); have acquired the habit of regular GHEs from family/employer (‘Habit’); constantly follow updates on their health measures (‘FlwHealth’).

To gain more insight into the health status of respondents and their families, we asked participants whether they or a member of their family were receiving long-term medical treatment (‘PerTrmt’ and ‘AcqTrmt’ respectively; binary responses). We also asked respondents whether they and their family all enjoyed good health (‘StabHthStt’; binary response: ‘yes’ if the respondent and family were all in good health, otherwise ‘no’). This question was used to evaluate the extent to which family members’ health status is related. Finally, we asked participants how they would prefer to deal with new symptoms (‘StChoise’); the options were: ‘clinic’=go to the clinic and consult professionals; ‘askrel’=seek advice from family and relatives; ‘selfstudy’=do personal research.

We assumed that individuals’ attitude to health would be correlated with possession of common items of medical equipment and the ability to use them, so we asked the following questions: (1) Do you keep a medical cabinet and basic medical equipment in your house? (‘MedCabinet’); (2) Do you have the skills to use basic medical equipment? (‘Tooluseskill’); (3) Do you have experience in taking care of a sick family member? (‘ExpCare’); (4) Does your family regularly take simple medical measurements (blood pressure, eyesight, weight etc.)? (‘ExamTools’).

We assessed perceptions of the quality of periodic GHE sessions using five questions to which responses were given on a continuous 1 to 5 scale (1=lowest quality). The variables were as follows: ‘Tangibles’=quality of medical equipment and personnel; ‘Reliability’=ability of the examiner to perform medical services that meet the patient’s expectations; ‘Respon’=timeliness of service; ‘Assurance’=knowledge/ability to assure professional reliance; ‘Empathy’=thoughtfulness and a high sense of responsibility. We also asked participants for their general opinion of public health (‘CHPerc’); the options were ‘good’, ‘quite good’, ‘bad’ and ‘unknown’.
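A simple first summary of these five quality dimensions is the mean rating per dimension. The sketch below uses invented ratings from three hypothetical respondents and only the variable names listed above; `na.rm = TRUE` is included on the assumption that some respondents may skip an item.

```r
# Toy 1-to-5 ratings from three respondents (illustrative, not survey data)
quality <- data.frame(
  Tangibles   = c(4, 3, 5),
  Reliability = c(3, 3, 4),
  Respon      = c(2, 4, 3),
  Assurance   = c(4, 4, 4),
  Empathy     = c(3, 2, 4)
)

round(colMeans(quality, na.rm = TRUE), 2)  # mean rating per dimension
```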

Cost of treatment is one of the most important factors in people’s decisions to have GHEs. Cost can influence whether patients go to a hospital or clinic for health checks, particularly if they have no signs of illness. In the survey, GHE costs were divided into three categories: ‘low’=under 1 million VND; ‘med’=from 1 to 2 million VND; ‘hi’=over 2 million VND. Respondents were also asked which of the following options they would choose if they were given money to pay for a GHE (‘Usemon’): use all the money to have a GHE soon (‘allsoon’); use part of the money for a GHE and save the rest (‘partly’); take the money and have a GHE later (‘later’).

Information in the mass media on health care in general, and on GHEs in particular, can also affect attendance at periodic medical examinations and judgments of medical service quality. We therefore asked participants to evaluate several aspects of the information they had received on GHEs, using a 1 to 5 scale: sufficiency (‘SuffInfo’); attractiveness (‘AttractInfo’); impressiveness (‘ImpressInfo’); popularity (‘PopularInfo’).

Developments in science and technology mean that information technology (IT) is increasingly used in subclinical diagnosis. At present, IT is used only to a limited extent to support healthcare in Vietnam, for example in healthcare queuing apps and in more complex applications such as online consultation, diagnostic imaging, remote health treatment and electronic medical records. Not everyone is ready to accept the use of IT to support diagnostic assessment. We assessed such readiness using two questions: (1) ‘Are you willing to use IT to detect health problems if it is reliable?’ (‘UseIT’) and (2) ‘If a healthcare app indicated that you needed to have a GHE, would you actually arrange one?’ (‘AfterIT’).

At the end of the questionnaire there were two questions about participation in sports and physical exercise, used to evaluate attitudes to sports and perceptions of the health benefits of regular exercise: (1) ‘How much time do people need to spend on sports and physical exercise to stay in shape?’ (‘SuitExer’) and (2) ‘How much time do you spend on sports and physical exercise?’ (‘EvalExer’). Response options for the second question were ‘more than enough’ (‘verysuff’); ‘enough’ (‘quitesuff’); ‘only a little’ (‘little’); ‘none or almost none’ (‘trivial’).

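The following R (3.3.2) code was used to estimate the model relating the dependent variable to the predictor and control variables:

```r
library(VGAM)  # vglm() and the multinomial() family come from the VGAM package

# tab4.1.csv: one row per predictor pattern, with count columns for the outcome categories
model4.1 <- read.csv('D:/.../tab4.1.csv', header = TRUE)

# Baseline-category logit model; by VGAM's default the last response column (less12) is the reference
fit.model4.1 <- vglm(cbind(unknown, g12, less12) ~ Wsttime + Wstmon + HthyPriority + FlwHealth + HealthIns,
                     data = model4.1, family = multinomial)
summary(fit.model4.1)
```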

These commands were intended to determine how the length of time since an individual’s most recent GHE is related to possession of health insurance, concerns that GHEs are a waste of time and money, prioritisation of health and regular following of health updates. The results are presented in Table 1.

Table 1: Estimated coefficients

A likelihood-ratio test was conducted to verify that the coefficients are not all simultaneously equal to zero. Testing the null hypothesis $H_0: \beta_1 = \beta_2 = \dots = 0$ yields the P-value:

$$p = 1 - \texttt{pchisq}\big(2 \times (-151.22 + 249.91),\ 10\big) \approx 0$$

with $\mathrm{df} = 62 - 52 = 10$ (see Agresti)31. Thus, $H_0$ was decisively rejected.
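The test statistic and P-value can be reproduced from the numbers in the formula above. This sketch reads −151.22 and −249.91 as the log-likelihoods of the fitted and intercept-only models respectively (our interpretation of the formula). It uses `lower.tail = FALSE` because `1 - pchisq(197.38, 10)` evaluates to exactly 0 in double precision, whereas the upper-tail call returns the tiny but nonzero probability, which is why the text can only report p ≈ 0.

```r
ll_model <- -151.22   # log-likelihood of the fitted model (as read from the formula)
ll_null  <- -249.91   # log-likelihood of the intercept-only model
dof      <- 62 - 52   # difference in residual degrees of freedom

lr_stat <- 2 * (ll_model - ll_null)                     # 197.38
p_value <- pchisq(lr_stat, df = dof, lower.tail = FALSE)
p_value   # far below any conventional significance level
```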

The estimates in Table 1 were used to calculate conditional probabilities, which support two observations: (i) if there are no financial or time constraints, people will attend GHEs to try to ensure early detection of diseases and timely treatment; and (ii) possession of health insurance is positively associated with attendance at GHEs, even among people in financial difficulty (Fig. 2).

Figure 2: Difference between insured and uninsured patients in the likelihood of attending a GHE in the near future. The two graphs are constructed using conditional probabilities calculated from the estimated coefficients presented in Table 1, following the method described by Agresti27. In (a) and (b), the trends in empirical probabilities are similar for insured and uninsured patients. However, the numerical probabilities change in significantly different ways, and in (b) the two probability lines intersect at about 50%. Thus 50% can be seen as a probability threshold at which uninsured patients are indifferent about whether to have a GHE in the near future. As (a) shows, no such threshold exists for insured patients.

On the basis of these results we suggest that attendance at GHEs could be improved by increasing the budget for supported healthcare schemes, raising the actual coverage of health insurance and improving the quality of medical services offered to people with health insurance.

Code availability

Data were analysed using the statistical software R (release 3.3.1). The code used in the analyses is available as a pdf file (Supplementary File 1) which includes examples of code used to read the input data, create contingency tables and carry out multiple logistic regression for the dependent variable ‘RecPerExam’ and predictor variables ‘Wsttime’, ‘Wstmon’, ‘HthyPriority’, ‘FlwHealth’ and ‘HealthIns’.

The R code for generating Figs 1–3 is also included.