Creating a scalable online tool to assess social networks

We designed a HIPAA-compliant structured social network questionnaire adapted primarily from the General Social Survey12,15 (Supplementary Methods 1). The schema of data acquisition and potential use is presented in Fig. 1. The questionnaire comprises ~48 questions, with later items adapting to earlier responses. The estimated completion time of the questionnaire is 10–15 min. The questionnaire begins with three traditional name generators, in which participants named all people with whom they had discussed important matters, socialized, or sought support in the last 3 months. The number of people who could be named was not capped. Next, participants answered questions that evaluate the connections between each pair of the first ten persons in the network, including the strength of ties in three levels (strangers, weak, and strong). Finally, participants answered questions about the characteristics and health habits of each of the first ten persons in the network7. The online questionnaire was hosted on the Research Electronic Data Capture (REDCap) server, a secure web platform for administering questionnaires in clinical research16. A version of the instrument is available for use in the REDCap Shared Library. Code to analyze and visualize data created from the instrument is available on GitHub.

Fig. 1 Overview of data collection, analysis, and interventions. This flowchart shows the social network data acquisition, identification of modifiable elements in the social environment, and potential intervention strategies.

The assessment generated two main categories of network metrics, structure and composition, based on graph theoretical statistics. Within the category of social network structure, size is the number of individuals in the network, excluding the index participant or “ego”. Density is a measure of connectivity of individuals in the network, calculated as the sum of ties, excluding the ego’s ties, divided by all possible ties17. Constraint is a more detailed version of density that quantifies the extent to which the ego’s connections are to individuals who are connected to one another. Effective size is the number of non-redundant members in the network18. Maximum degree is the highest number of ties held by a network member, and mean degree is the average number of ties held by a network member. Equations for these measures are available in Supplementary Methods 2.
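As an illustration only, the structural definitions above can be sketched in a few lines of Python. This is not the code released with the instrument; the function name `ego_network_structure` and its inputs are hypothetical. Two assumptions are made here: degree counts only ties among network members (the tie to the ego is left out), and effective size uses the common simplification for binary, undirected ego networks, n − 2t/n, where t is the number of ties among the n members. The authors' exact equations are in Supplementary Methods 2 and may differ.

```python
def ego_network_structure(alters, alter_ties):
    """Structural metrics for one personal (ego) network.

    alters: list of network member identifiers (the ego is excluded).
    alter_ties: iterable of (a, b) pairs, undirected ties among members.
    Assumption: every tie refers to identifiers present in `alters`.
    """
    n = len(alters)
    # Treat ties as undirected and drop duplicates and self-ties.
    ties = {frozenset(t) for t in alter_ties if t[0] != t[1]}
    t = len(ties)

    # Density: observed ties among members divided by all possible ties.
    possible = n * (n - 1) // 2
    density = t / possible if possible else 0.0

    # Degree of each member, counting ties to other members only (assumption).
    degree = {a: 0 for a in alters}
    for tie in ties:
        for a in tie:
            degree[a] += 1

    return {
        "size": n,
        "density": density,
        "max_degree": max(degree.values()) if n else 0,
        "mean_degree": sum(degree.values()) / n if n else 0.0,
        # Simplified effective size for binary undirected ties (assumption).
        "effective_size": n - 2 * t / n if n else 0.0,
    }


# Toy network: four members, three of whom know one another.
metrics = ego_network_structure(
    ["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("A", "C")]
)
print(metrics)
```

In the toy network, the three mutually connected members are partly redundant with one another, so the effective size falls below the raw size of four.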

Within the social network composition category, several metrics quantify the proportion of network members with a given characteristic. For instance, the percent kin is the percent of individuals in the network who are family members. Standard deviation of age captures the spread of ages. The diversity of sex index is the mix of men and women in the network, according to the index of qualitative variation19, with a value of 1 indicating an equal mix of men and women. The diversity of race is the mix of races, similarly calculated. Importantly, the questionnaire also queries the health behavior environment around the participant by examining the percentage of the network members with negative health habits, including smoking, sedentary lifestyle, not visiting doctors regularly, and poor compliance with prescription medications. All compositional variables were created to account for network size. Specifically, the number of members that fit a category was divided by the total network size to create the percentage.
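The composition measures can be sketched in the same spirit. The snippet below is a minimal illustration, assuming the standard form of the index of qualitative variation, IQV = K/(K − 1) × (1 − Σ p_i²), where K is the number of possible categories and p_i is the share of members in category i. The helper names `percent_with` and `iqv` and the member records are hypothetical and are not taken from the authors' code.

```python
from collections import Counter


def percent_with(members, predicate):
    """Percent of network members for which `predicate` is true."""
    if not members:
        return 0.0
    return 100.0 * sum(1 for m in members if predicate(m)) / len(members)


def iqv(labels, n_categories):
    """Index of qualitative variation: 0 = one category, 1 = equal mix."""
    n = len(labels)
    if n == 0 or n_categories < 2:
        return 0.0
    sum_sq = sum((count / n) ** 2 for count in Counter(labels).values())
    return (n_categories / (n_categories - 1)) * (1.0 - sum_sq)


# Toy network of four members (illustrative values only).
members = [
    {"kin": True, "sex": "F", "smokes": False},
    {"kin": True, "sex": "M", "smokes": True},
    {"kin": False, "sex": "F", "smokes": False},
    {"kin": False, "sex": "M", "smokes": False},
]

percent_kin = percent_with(members, lambda m: m["kin"])
percent_smokers = percent_with(members, lambda m: m["smokes"])
sex_diversity = iqv([m["sex"] for m in members], n_categories=2)
print(percent_kin, percent_smokers, sex_diversity)
```

Dividing by the number of members, as in `percent_with`, is what makes the compositional variables comparable across networks of different sizes.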

Demonstrating network quantification in a nation-wide cohort

We assessed the social networks of 1493 GEMS participants from across the United States (Supplementary Fig. 1), which represented 57% of the cohort as of October 2016. In Table 1, we report the demographic and clinical information of the cohort at the time of the study, separated into subgroups of asymptomatic participants and participants with an MS diagnosis. Asymptomatic participants had a lower age on average than participants with an MS diagnosis, consistent with the previously reported demographics of the cohort13.

Table 1 Demographics and clinical characteristics of the participants

The primary outcome measure was the MSRS-R, a self-reported measure of functional disability validated for people with MS. The MSRS-R is a brief questionnaire that correlates with traditional clinical instruments20,21. The eight domains of the MSRS-R are walking, using arms and hands, vision, speaking clearly, swallowing, cognition, sensation, and bowel and bladder function, for a maximum score of 32. In this cohort of primarily asymptomatic people at risk for MS, we chose the MSRS-R as an outcome measure because few alternative self-reported outcome measures have the advantages of being concise and validated in early MS. As expected, the median MSRS-R score was higher in the MS group than in the asymptomatic group.

To visualize each participant’s social network structure, we plotted a montage of all participants’ networks, ranging from the smallest to the largest, with the strength of each tie highlighted in color (Fig. 2). The average network consisted of eight people who were densely linked (67% of all possible ties were present). Furthermore, an average of 44% of all network members were kin, 38% were supportive of the index participant, and there was a nearly equal mix of men and women (diversity score of 0.89 with one being an equal mixture of men and women). Race, on the other hand, was not varied within networks with a diversity score of 0, indicating that most members in a participant’s network were of the same race. Weak ties, denoting those who are less familiar with the participant, ranged from 20% to 67% depending on the measure. The percent of individuals who were known for less than 6 years by the respondent was 20% in asymptomatic persons and 12% in MS patients (P = 0.001, Wilcoxon signed-rank test), suggesting a reduction in recent acquaintances in participants with an MS diagnosis. Otherwise, differences in network structure and general network composition between asymptomatic and MS participants were small and not significant (Table 2).

Fig. 2 Structure of participants’ personal social network. Each small network has a black circle that represents the participant who is surrounded by white circles who are the network members. The lines connecting the circles are red if the relationship is strong and blue if the relationship is weak. Networks are arranged from the smallest (top left) to the largest (bottom right).

Table 2 Network characteristics

To visualize the milieu of health habits around the participant, we plotted a montage of all participants’ networks, ranging from the healthiest environment to the least healthy (Fig. 3). On average, the network composition with respect to health habits skewed toward social environments in which most network members have healthy habits. Seventeen percent of participants had personal networks in which all members were healthy. On average, the percent of network members who do not exercise was 33%, and this was the highest value out of the examined negative health habits. There was a weak negative correlation between network size and the percentage of network members with unhealthy habits (Pearson’s correlation = −0.13 ± 0.05, P < 0.0001). Because we did not detect differences in network composition with respect to healthy habits between asymptomatic and MS participants, we were able to pursue joint analyses of these two subgroups.

Fig. 3 Health habits in participants’ personal social network. In each network, a black circle is the participant, a white circle is a healthy social contact, and a red dot is an unhealthy social contact. An unhealthy contact is defined as someone who does any of the following: smokes, does not exercise, does not visit doctors regularly, or is not compliant with prescription medications. Networks are arranged from least negative health influence (top left) to most negative health influence (bottom right).

Having established the basic properties of our data, we examined the relationship between network metrics and the self-reported functional disability outcome. Given the number of network metrics and to account for the multiple testing burden, we grouped the network variables into structure and composition categories. We then used a permutation-based omnibus test to examine the associations of these two groups of network metrics with the MSRS-R. The observed distribution of P-values in the omnibus test departed from chance for network composition (P < 0.0001, all; P = 0.008, asymptomatic subgroup; P = 0.001, MS subgroup), but not for network structure (P = 0.066, all; P = 0.14, asymptomatic subgroup; P = 0.25, MS subgroup) (Table 3, Fig. 4). Thus, our global assessments indicated that network composition, rather than network structure, was associated with self-reported functional disability based on the MSRS-R scores (Table 3).
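The general idea of a permutation-based omnibus test can be illustrated with a short sketch. This is not the authors' procedure, whose test statistic and covariate handling are not described in this section. It assumes a simple statistic (the sum of squared Pearson correlations between each metric and the outcome) and builds the null distribution by shuffling the outcome, which leaves the correlation structure among the metrics intact. All function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def omnibus_statistic(columns, outcome):
    """Assumed statistic: sum of squared correlations across all metrics."""
    return sum(pearson(col, outcome) ** 2 for col in columns)


def permutation_omnibus_test(columns, outcome, n_perm=2000, seed=0):
    """Return (observed statistic, permutation P-value) for a group of metrics.

    columns: list of metric columns, each aligned with `outcome`.
    Shuffling the outcome breaks any metric-outcome association while
    preserving the correlations among the metrics themselves.
    """
    rng = random.Random(seed)
    observed = omnibus_statistic(columns, outcome)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if omnibus_statistic(columns, shuffled) >= observed:
            hits += 1
    # Add-one correction so the P-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Because a single P-value is produced for the whole group of metrics, the comparison of structure versus composition involves two tests rather than one test per metric.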

Table 3 Relationship of the composite categories of network variables to MSRS in all participants

Fig. 4 Comparison of expected versus observed regression results. Quantile–quantile plot of expected versus observed P-values of composite network structure and network composition metrics in relation to neurological function and disability in the full cohort (a, b) and subgroups of asymptomatic (c, d) and MS participants (e, f). The expected P-values (-log10[P-value]) are shown on the x-axis and the observed P-values (-log10[P-value]) are shown on the y-axis. The dark gray area indicates the confidence interval range as generated by chance at a threshold of P = 0.10, and the light gray area indicates the range at P = 0.05. The observed values for composition, and not structure, are outside of the gray areas, suggesting that composition is associated with the MSRS-R score beyond chance after accounting for the multiple testing burden and the correlation structure of the composition variables.

To deconstruct these global effects of the social network, we examined the association of individual network metrics with the MSRS-R, adjusting for sex, age, marital status, and years of education (Table 4). None of the network structure metrics were significantly associated with MSRS-R score, consistent with the global assessment. Two network composition features were significantly associated with MSRS-R score: the percent of network members who (1) do not go to a doctor regularly or (2) are deemed to have a negative health influence on the respondent. The strongest association was with the percent of network members who are deemed to have a negative health influence (β = 0.017 ± 0.005, P = 0.016, linear regression).
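A covariate-adjusted linear regression of this kind can be sketched with ordinary least squares. The example below is a generic illustration using NumPy, not the authors' analysis code. The function name `adjusted_ols` is hypothetical, the data are synthetic, and the covariate columns merely stand in for numerically coded sex, age, marital status, and years of education.

```python
import numpy as np


def adjusted_ols(y, predictor, covariates):
    """Coefficient and standard error of `predictor`, adjusted for covariates.

    y: outcome, shape (n,). predictor: shape (n,). covariates: shape (n, k).
    """
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, predictor of interest, then covariates.
    X = np.column_stack([np.ones(len(y)), predictor, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    # Classical OLS covariance of the coefficient estimates.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])


# Synthetic data for illustration only (not study data).
rng = np.random.default_rng(0)
n = 200
covariates = rng.normal(size=(n, 4))   # stand-ins for the four covariates
predictor = rng.normal(size=n)         # stand-in for one network metric
y = (2.0 + 0.5 * predictor
     + covariates @ np.array([1.0, -1.0, 0.3, 0.0])
     + rng.normal(scale=0.1, size=n))

beta, se = adjusted_ols(y, predictor, covariates)
print(beta, se)
```

Fitting one such model per network metric yields one coefficient per metric, as in a table of individual associations.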

Table 4 Relationship of individual network variables to MSRS-R

In exploratory analyses, we examined the relationship between each individual’s Genetic and Environmental Risk Score (GERS) and her or his social network size. The GERS is an aggregate estimate of an individual’s MS risk based on validated genetic and environmental susceptibility factors. We have previously reported that the GERS is informative of MS risk beyond family history in the GEMS cohort of first-degree family members13. Using the published GERS based on previously reported genetic and environmental risk factor data available among a subset of the GEMS participants (n = 999 all, n = 920 asymptomatic subgroup, n = 79 MS subgroup), we noted an association in linear regression between larger network size and increased GERS (β = 0.82 ± 0.19, P = 2.43 × 10⁻⁵, all) (Supplementary Table 1). This finding appears to be driven by the larger network size of women participants relative to men. In a regression analysis, network size is inversely related to male sex (β = −1.87 ± 0.42, P = 8.71 × 10⁻⁶, all). Among asymptomatic participants, both a history of mononucleosis (β = 1.13 ± 0.40, P = 0.005) and a higher genetic risk score for MS susceptibility (β = 0.65 ± 0.24, P = 0.006) were also associated with a larger network size in the linear regression (Supplementary Table 1).