Significance The human immune system consists of many different white blood cells that coordinate their actions to fight infections. The balance between these cell populations is determined by direct interactions and soluble factors such as cytokines, which serve as messengers between cells. Understanding how the interactions between cell populations influence the function of the immune system as a whole will allow us to better distinguish patients most at risk for specific infections or immune-mediated diseases and inform vaccination strategies. Here, we determine key collective interactions between white blood cells present in blood samples taken from healthy individuals. This perspective allows us to predict functional responses and describe previously unappreciated differences between age groups and in individuals carrying cytomegalovirus.

Abstract The immune system consists of many specialized cell populations that communicate with each other to achieve systemic immune responses. Our analyses of various measured immune cell population frequencies in healthy humans and their responses to diverse stimuli show that human immune variation is continuous in nature, rather than characterized by discrete groups of similar individuals. We show that the same three key combinations of immune cell population frequencies can define an individual’s immunotype and predict a diverse set of functional responses to cytokine stimulation. We find that, even though interindividual variations in specific cell population frequencies can be large, unrelated individuals of younger age have more homogeneous immunotypes than older individuals. Across age groups, cytomegalovirus seropositive individuals displayed immunotypes characteristic of older individuals. The conceptual framework for defining immunotypes suggested by our results could guide the development of better therapies that appropriately modulate collective immunotypes, rather than individual immune components.

In the peripheral blood of humans over 100 different immune cell populations can be distinguished based on the expression of cell surface and intracellular markers. These cell populations span various activation and differentiation states within broader classes of principal immune cell lineages in the adaptive and innate immune systems (1). With recent advances in single-cell measurement technologies such as high-dimensional flow cytometry and mass cytometry (2), these different immune cell populations can be simultaneously quantified in an individual sample. Leveraging these developments, recent population studies have profiled the global immune cell composition in healthy adult humans (3⇓–5) and found these to be highly stable within an individual over the course of weeks to months, but highly variable between individuals (3, 6). Although variations in a few immune cell populations are heritable, the majority of immune cell population frequencies and functional responses are largely determined by nonheritable influences in healthy individuals (4). For example, cohabitation has been shown to be a strong influence, emphasizing the importance of shared environmental factors within a household (6). Most of the specific environmental factors that shape human immune systems have yet to be determined, but monozygotic (MZ) twin pairs discordant for human cytomegalovirus (HCMV) seropositivity were more different in the majority of immune measurements compared with seronegative twins (4).

Many studies have focused on individual immune cell populations, their functional properties, regulatory roles, and clinical significance. However, a prototypical immune response involves the coordinated action of multiple cell populations (7, 8), providing both inhibitory and stimulatory feedback, leading to a balanced response that fulfills its protective function while preventing immunopathology. The rules for immune cell interdependence are understood in only a few cases, but there may be many that are unknown. Many current therapies target individual cell populations and sometimes give rise to unexpected side effects, involving completely different cell populations from the ones directly targeted. For example, a subgroup of patients treated with the B-cell–depleting antibody Rituximab, which targets the CD20 antigen expressed on B cells, developed delayed neutropenia despite the fact that neutrophils do not express CD20 (9). An understanding of why such complications occur in selected patients requires a more complete knowledge of how specific immune cell populations depend on each other, regulate each other, and affect systemic responses.

The immune system is a complex adaptive system, without any apparent master regulator. Therefore, all systemic behaviors are emergent properties dependent upon the collective actions of all system components. As such, immune responses will be unpredictable until these emergent properties are understood. Furthermore, specific immune cell populations are dependent upon each other for stimulation and inhibition, and different cell population frequencies vary widely across healthy individuals. Therefore, we hypothesized that differences in immune responses between individuals are influenced by differences in some collective measures of their frequencies of various immune cell populations; i.e., certain combinations of immune cell populations are determinants of response. In this paper, we test this hypothesis by analyzing responses to stimuli in cohorts of healthy humans.

We first analyzed immune cell composition data from a total of 1,575 healthy individuals across multiple cohorts, both published (6, 10) and previously unpublished (149 individuals) in addition to our previously published data (4), combined in Dataset S1. We also took advantage of a set of diverse functional responses measured in 329 individuals from a number of Stanford cohorts (refs. 4 and 11 and previously unpublished data, combined in Datasets S2 and S3). Our goal was to define a small number of collective variables, or specific combinations of cell populations, variations in which could predict functional responses. Thus, our results depend on the specific cell populations and functional responses measured in this study. As such, future studies across additional independent cohorts from different geographical locations measuring the same or different cell populations and functional responses are needed to further validate the concept that a few properly chosen collective variables can predict a person’s immune response to stimulation. Nonetheless, our current analysis is a proof of concept that immunology can benefit from a more collective view of the immune system as a whole.

We identified three key combinations of immune cell frequencies (not individual cell frequencies) that robustly predict diverse functional responses in healthy individuals. Viewed from the perspective of how individual humans are different in terms of key combinations of cell populations, we find that, rather than separate clusters of individuals sharing immunological traits, there is a continuum of human immune variation. This variation is influenced, but not dictated, by age, sex and HCMV seropositivity. Even though young individuals vary greatly in their frequencies of individual cell populations, they are highly similar to each other in terms of these three key combinations of immune cell frequencies. With age, unrelated individuals become more heterogeneous when viewed from this perspective, likely reflecting the diversity of environmental exposures with time. Our results show that HCMV seropositivity is associated with a shift in immune phenotype along an age axis, leading to a phenotype characteristic of older populations, irrespective of the actual age of the individual. This conceptual framework of defining immunotypes using the collective states of many immune system components provides a sensitive way of describing immune variation and could guide the development of better immune modulatory therapies in the future.

Discussion Here we present a conceptual framework for analyzing human immune systems in terms of “immunotypes” within a space defined by axes that represent key combinations of immune cell frequencies. First, we find that individuals’ immunotypes are distributed continuously across several healthy populations and are predictive of diverse functional immune responses as shown by in vitro cell signaling results, and these responses cannot be predicted from individual immune cell frequency measurements alone. Next, we show that specific factors, most notably age and HCMV serostatus, are associated with reproducible shifts in immunotype across unrelated, healthy individuals (shown schematically in Fig. 6B). Furthermore, by analyzing the distribution of individuals’ immunotypes, we show that the immune systems of unrelated young individuals are more similar to each other, whereas older individuals are characterized by heterogeneous immunotypes. Our results suggest that an individual’s immune cell composition may be globally modifiable by interventions to transform it from a less responsive to a more responsive immunotype. Our findings are in accordance with recent data on the dominant influence of nonheritable factors in shaping human immune systems (4⇓–6, 10). They further support previous findings suggesting that environmental influences exert a cumulative influence over time, leading to the divergence of immune phenotypes between MZ twins (4). The finding that collective sets of immune cell frequency measurements predict diverse functional responses is interesting and suggests that the composition of an individual’s circulating immune system reflects a functional network of cell populations and that the balance between these cell populations determines the overall responsiveness of the system. It also suggests that modulating this balance and perturbing the network, for example by depleting one specific cell type, can have broad implications for the functional responsiveness of all remaining cells. This is an important observation given that such depleting strategies are routine practice in current medical practice. An example of a manifestation of this hypothesis is the clinical observation of delayed loss of neutrophils upon depletion of CD20-expressing B cells by an anti-CD20 antibody (Rituximab) (37). Another important implication of this is that the infusion of specific cell populations, such as tumor-specific lymphocytes or populations of stem cells infused for a range of clinical conditions, could impact the global network of immune cell populations with unpredictable consequences for the immune competence of the individual patient. The relationship between immune cell measurements and metadata—age, gender, and HCMV serostatus—also adds to our understanding of human immune system variation. The fact that these metadata alone cannot accurately predict functional responses shows the need for collective indicators of immune health (immunotypes) such as those defined in Fig. 2 C–E. Second, a description of an aging immune system requires far fewer LVs than does gender or HCMV serostatus, suggesting that these characteristics of an individual play a more complex role in regulating the immune system. The fact that older individuals are increasingly heterogeneous with respect to their immunotypes also makes age insufficient as a predictor of immune responsiveness. Despite the potential our conceptual framework has to pose and answer important questions in immunology, it is necessary to address possible improvements in our results and methods that can be addressed in future work. First, we emphasize that PLS regression is only one method for choosing the collective variables with which to define immunotypes. Extending this method using kernels to account for nonlinear interactions between immune parameters, in addition to introducing regularization into the LV decomposition (with a sufficiently large cohort size to allow unbiased fitting and evaluation of the regularization parameter), would likely lead to more robust definitions of immunotypes. Next, we acknowledge that predictions of our functional immune responses from immunotype, although statistically significant, are moderate correlations. This is likely due to a number of reasons, the most prominent being that the 34 immune cell populations (which are restricted to peripheral blood subsets) do not afford a complete definition of immunotypes, and thus we cannot expect perfect predictions. However, more extensive immune system measurements (serum cytokines, chemokines and growth factors, antibodies, including information from additional tissue compartments, etc.), as well as taking other factors, such as the symbiotic microbiota into account, may lead to even more reliable definitions of an individual’s immunotype. Related to this, an expansion of immune measurements would also increase the potential for front loading of explanatory power into the top three LVs, covering a comparatively larger fraction of the global immune system variation in humans. It is also important to note that our finding of a continuous distribution of immunotypes may also be sensitive to the specific immune cell populations measured in this study. An expansion of the immune measurements, as described above, could potentially reveal discrete clustering of individuals, especially if the additional immune parameters themselves exhibit discrete distributions. In addition, it is always the case that in clustering analysis, finding no evidence of discrete clusters does not prove their absence, although we are confident given our thorough analysis that discrete clusters are not present in these datasets. However, we also predict that this result could change if our cohorts were expanded to include, in addition to the healthy individuals studied here, individuals with perturbed immunotypes—e.g., patients treated with immunosuppressive drugs or individuals suffering from primary and secondary immunodeficiency syndromes with severe infectious diseases susceptibility and autoinflammatory symptoms may form a distinct cluster. We predict that these individuals may be qualitatively much different from the healthy individuals in the cohorts we studied, the discovery of which would be a next step for development of immune-modulatory therapies. A possible criticism of the results we present here is that responses to cytokine stimulation in vitro may not be reflective of actual immune system competence that determines an individual’s risk of immune-mediated disease or severe infectious diseases. However, several lines of evidence suggest that the broadly different signaling responses, mediated via different signaling pathways, are reliable correlates of global immune competence. For example, in a recent study, Shen-Orr et al. (11) found that defective responsiveness in the JAK-STAT pathway to the same type of in vitro stimuli was strongly predictive of cardiovascular disease in a cohort of elderly individuals. It will be important to test the concepts revealed by studying the cohorts considered in this paper by carrying out similar analyses in additional cohorts for which both immune cell compositions (or other markers) and responses to stimuli have been measured. We anticipate that the concept of continuous immunotypes predictive of immune responses will be further validated. The precise definitions of the key combinations of cell types (LVs) that define immunotypes may be cohort dependent. Indeed, it will be interesting to study how the precise definition of the LVs depends upon diet, infection history, etc., by studying cohorts of different geographical and environmental contexts whose experiences in this regard are expected to be distinct from those of the Stanford cohorts described here. Finally, the conceptual framework we have described should be augmented by a mechanistic understanding of why and how a few specific combinations of immune cell populations define an individual’s immunotype. By addressing the above limitations in our method, we believe that our findings and the framework presented here could guide future developments of better immunomodulatory therapies designed to induce perturbations of an individual’s immunotype rather than individual immune components (Fig. 6B). This framework could also be of particular importance for the development of improved vaccines in the future, taking immunotypes into account, perhaps personalizing adjuvants and vaccine formulations as well as optimizing the time for vaccination to improve responsiveness. Also within the field of cancer immunotherapy, the induction of antitumor immune responses could be improved by stimulating immunotypes rather than individual cells to improve responsiveness. As more data become available and immunotype definitions become more robust, modeling of necessary perturbations to elicit a desired response would become possible. We envision that such a framework would make the responsiveness of individual patients to such therapies more predictable because compensatory relationships between individual cell populations are taken into account in a way in which more traditional analyses, focusing on individual cell populations only, do not. Thus, the conceptual framework we have described may have important implications for predicting immunological health and the risk of disease.

Materials and Methods Cohorts. In this work, we analyzed human immune cell frequency data from three cohorts. The Stanford cohort consisted of 398 individuals, both men and women, between ages 8 y and 89 y of age, sampled at the Stanford University clinical trials unit. The study protocol was approved by the Stanford University Administrative Panels on Human Subjects in Medical Research, and written informed consent was obtained from all participants. A total of 226 of these individuals were twin subjects from both MZ and DZ pairs from the Twin Research Registry at SRI International as previously described (4). A total of 23 additional subjects were included from a longitudinal study of aging in the immune system (11) and 149 previously unpublished subjects were included as well (combined in Datasets S1–S3). All demographic information of the subjects in the Stanford cohort is available in Dataset S1. The Stanford cohort comprises individuals from the following ClinicalTrials.gov study IDs: NCT01827462, NCT03020498, NCT03022396, NCT03022422, and NCT03022435. The Roederer et al. (10) cohort consists of 669 female twins sampled in the United Kingdom. The Carr et al. (6) cohort consists of 670 nontwin subjects, both men and women sampled in Belgium. Immune Cell Population Frequency Measurements. All three cohorts analyzed PBMCs, using either flow cytometry or mass cytometry. The immune cell composition data for the Stanford cohort were generated by the Human Immune Monitoring Center, using mass cytometry and antibody panels (4, 11). All cells were classified by manual gating, using the FlowJo Software (ThreeStar Inc.), and gating schemes are presented in the original publications. The PBMCs from the Roederer et al. (10) and Carr et al. (6) cohorts were analyzed by a combination of consecutive flow cytometry panels and manual gating. Functional Response Measurements. Frozen PBMCs were thawed and stimulated, with one of seven different cytokines (IL-2, IL-6, IL-7, IL-10, IL-21, IFN-α, or IFN-γ) for 30 min; fixed; and stained with antibodies to surface antigens (identifying eight responding cell populations: monocytes, B cells, CD4+, CD4+CD45RA+, CD4+CD45RA–, CD8+, CD8+CD45RA+, and CD8+CD45RA– T cells) as well as intracellular phosphor-STAT1, -3, and -5, respectively, as described in detail previously (4). The 90th percentile fluorescence intensity of stimulated samples was compared with the 90th percentile in PBS-treated cells (unstimulated) as a fold-change ratio used for downstream analyses (Dataset S2). In addition to the cell-signaling responses, we also include an analysis of antibody responses to seasonal flu vaccines (years 2010–2012) (Dataset S3). The vaccine administered was the Fluzone quadrivalent vaccine (Sanofi-Pasteur Inc.) and the vaccine-induced antibody responses were measured in a strain-specific manner, using HAI assays, and fold changes day 28/day 0 were used for analyses as described in detail in ref. 4. The flu-vaccine data included eight strains: A/Texas/50/2012, B/Massachusetts/2/2012, B/Brisbane/60/2008, A/California/07/2009(H1N1), A/Perth/16/2009(H3N2), A/South Dakota/06/2007(H1N1), A/Uruguay/716/2007(H3N2), and Wisconsin. The A/Texas/50/2012, B/Massachusetts/2/2012, and Wisconsin strains were removed due to small sample size (Data Preprocessing). Data Preprocessing. Of the 136 cell population frequencies measured in the Stanford cohort, 34 were considered in this study. We removed 89 cell populations because they were measured in only 10.3%, 29.1%, 39.4%, or 70.8% of the 398 patients. We also removed the following 13 subtypes because they either are rare or are the complement in a set of rare cell populations: plasmablasts, HLADR+ NK cells, CD161+CD45RA+ Tregs, CD161+CD45RA− Tregs, CD85j+CD4+ T cells, HLADR+CD38+CD4+ T cells, HLADR+CD38+CD8+ T cells, HLADR+CD38−CD4+ T cells, HLADR+CD38−CD8+ T cells, HLADR−CD38+CD4+ T cells, HLADR−CD38+CD8+ T cells, HLADR−CD38−CD4+ T cells, and HLADR−CD38−CD8+ T cells. To remove measurement dependence on measurement year (2009–2012), we applied the ComBat algorithm (38), an empirical Bayes method for removing nonbiological experimental variation that has been found to outperform other batch effect correction methods according to many metrics (39). Patient 398 was removed due to missing batch information, leaving a total of 397 patients studied from the Stanford cohort. A summary of the immune cell population frequencies and demographics data for the Stanford cohort is available in Dataset S1. Functional responses having fewer than 34 patients (equal to the number of retained cell populations) were removed from consideration: three influenza strains (A/Texas/50/2012, B/Massachusetts/2/2012, and Wisconsin), as well as all 21 cytokine stimulation responses measured in the “Non-BT” cell subtype. Clustering Methods. The Stanford, Roederer et al. (10), and Carr et al. (6) cohorts included composition measurements of nonidentical immune cell populations (Table S1 for a list of cell subtypes in each cohort), thus precluding a combined clustering analysis of the full dataset. The “monocytes” cell population was removed from the Roederer et al. (10) cohort as it was missing in 7.6% of the 729 patients. Of the remaining 32 populations, 11 had missing data in at most 1.6% of patients. For the Carr et al. (6) cohort, patients containing >20% missing measurements were removed, leaving a total of 449 patients. Cell populations with >10% missing values across patients were removed from consideration. Of the remaining 32 populations, 5 had missing data in at most 4% of patients. Missing values in both cohorts were imputed to the average across all patients. Clustering analysis was performed on the PC scores (projections of immune cell composition data onto the eigenvectors of the correlation matrix—this places all immune cell population frequencies on the same scale). Clustering of individuals was performed on the top PCs with a k-means clustering algorithm implemented using the sklearn.cluster.KMeans() class in scikit-learn v0.17 (40), using interpoint Euclidean distance in PC space. The quality of clusters from this algorithm was quantified using two metrics: i) The silhouette coefficient (41) is defined for each data point, i , as ( b i − a i ) / max { a i , b i } in which a i is the average pairwise Euclidean distance between data point i and all other points within the same cluster as i , whereas b i is the lowest average pairwise Euclidean distance between data point i and all data points in any other cluster. The silhouette coefficient ranges from −1 to 1, with a high value indicating good clustering.

ii) The between-cluster fractional explained variance is defined as the variance in the data where each point is replaced by the average of the points in the same cluster, divided by the total variance, V t o t , in the data (equal to the sum of between-cluster and within-cluster variance, V b / w and V w / i , respectively). More formally, let x i = ( x i ( 1 ) x i ( 2 ) … x i ( d ) ) T be a vector of PC scores (PCs 1 − d ) for patient i and μ = ∑ i = 1 N x i / N be the mean over all N patients. Now, let the clustering algorithm assign each patient to one of C clusters, c i ∈ { 1 , … , C } , and let X c be a matrix composed of the subset of rows of X corresponding to patients assigned to cluster c ; i.e., the rows of X c are the set { x i ∶ c i = c } . Finally, let μ c = 1 T X # { i ∶ c i = c } be the cluster average, where 1 is a vector of ones. Then, the between-cluster fractional explained variance, V b / w , f r a c , is calculated as follows: V t o t = tr [ ( X − μ ) T ( X − μ ) ] V w / i = ∑ c = 1 C tr [ ( X c − μ c ) T ( X c − μ c ) ] V b / w , f r a c = V T o t − V w / i V t o t = V b / w V t o t . Because these measures of cluster performance are difficult to interpret alone, they are compared with a null model formed by the following: (i) For each PC, i , in the original data, calculate the difference, a i , between the 97.5th and 2.5th percentiles of the scores of that PC. This gives a robust-to-outliers measure of the spread in each PC. (ii) Choose a number of PCs, d , to include in the clustering. (iii) Generate a uniform random sample of profiles within the PC space contained within a d -dimensional ellipsoid with principal axes equal to the respective differences calculated in i. If x i represents the randomly generated value for the PC i score, the null model satisfies ∑ i = 1 d x i 2 ( a i / 2 ) 2 < 1 . This null model was chosen because the PC projections appear ellipsoid with apparent uniform density of points except at the outskirts (although with a fatter tail than a multivariate Gaussian). Thus, using a uniform density for the null model is conservative (i.e., it will tend to give a larger number of clusters than the real data). These analyses are presented in Fig. S1, along with corresponding t-distributed stochastic neighbor embedding (t-SNE) plots, implemented using the sklearn.manifold.TSNE() class in scikit-learn v0.17 (40) with two components, perplexity of 30.0, and learning rate of 1,000.0. The t-SNE method for visualizing high-dimensional data overcomes some of the limitations of the linear PCA method (20). PLS Regression. Formally, we sought to develop a set of regression models that separately predict the values of each functional response, y ( k ) , given the immune cell compositions, X , of a group of individuals. Here, element y i ( k ) of vector y ( k ) represents the value of response k for individual i , whereas element X i j refers to the immune cell population frequency j in individual i . Linear models assume that the responses and predictors are related through the functional form y ^ ( k ) = X β ( k ) , where the “hat” distinguishes the predicted responses from the experimentally measured ones, and β ( k ) is the regression vector for functional response k . The elements of the regression vector are the weights of each cell population that define a linear combination that best predicts the functional response. For the regression models, the functional responses were first transformed using the log-modulus transformation (42) to preserve rank ordering of responses while minimizing the effect of large outliers. For each functional response, PLS (Dataset S4) is used to find a set of linear combinations (termed the LVs) of cell populations that have the highest covariance with the functional response. These LV signatures, r j ( k ) , are the directions in the space of possible immune cell compositions that explain a particular functional response, ordered from best to least, as columns of the LV matrix R ( k ) . In this way it is possible that only a small number of such directions need to be identified to predict functional responses well. The advantages of using PLS are its simplicity, the avoidance of multicollinearity issues resulting from highly correlated immune cell populations (SI Materials and Methods, Regression Methods), and the conceptual advantage afforded by grouping cell populations together into latent variables that collectively explain a response of interest. We compared the performance of PLS with PCR, in which directions in the space of possible immune cell compositions are chosen to explain the greatest variance in immune cell composition across patients. The regression models are evaluated using learning curves of 10-fold cross-validated (CV) negative Spearman correlation coefficients between observed and predicted response measurements for both PLS and PCR regression models. As described in ref. 43, cross-validation estimates out-of-sample model performance for smaller datasets by fitting a model on only a fraction of the available data and testing predictions on the remaining data left out. We used a 10-fold cross-validation scheme based on recommendations by Hastie et al. (43). The dataset is randomly partitioned into 10 groups and a PLS model separately fitted on each possible combination of 9/10 of those groups. The predictions on the left-out groups are compared with their experimental values, using a negative Spearman correlation coefficient. Learning curves are made by calculating CV error as a function of the number of LVs included in the model. The default error (when no LVs are used) is a correlation coefficient of zero. Error bars on learning curves are generally reported as SEs (44). However, because our measure of error combines errors over the entire dataset, we report error bars as SD of error calculated over 100 random instantiations of the CV learning curve (corresponding to different random 10-fold partitioning of the dataset). We wish to verify that the PLS model is fitting meaningful correlations between both the immune markers and response functions and that improved model performance is not an artifact of the model-fitting procedure. To address these concerns, we evaluate the performance of a “null model,” created by random reassignment of responses across patients. This randomization eliminates the relationship between a patient’s immune cell composition and the value of the corresponding response, and the model-fitting procedure should correctly detect that there is no underlying structure to be learned from the data. We calculate 95% confidence intervals of the null model learning curve, using 100 instantiations of the null model to summarize how variable the estimates from the null model are. Responses for which the minimum error plus error bar come within 0.05% of the 95% confidence interval of the null model learning curve are removed from further consideration. Six signaling responses (IL-6/Mono/STAT5, IL-7/CD4+CD45RA+/STAT1, IL-7/CD4+CD45RA–/STAT1, IL-7/CD4+/STAT1, IFN-γ/CD8+CD45RA–/pSTAT3, and IFN-γ/CD4+CD45RA+/pSTAT3) and four of the five flu-vaccine responses [A/California/07/2009(H1N1), A/Perth/16/2009(H3N2), A/South Dakota/06/2007(H1N1), and A/Uruguay/716/2007(H3N2)] were so removed. Finally, to indicate the amount of front loading of explanatory power into the top LVs, we also report a normalized error defined as follows: If the minimum observed error for a given functional response in either PLS or PCR is denoted e m i n , and the raw model error is denoted e , the normalized error is defined as e n o r m = ( e − e m i n ) / ( 0.0 − e m i n ) , where an error of 0.0 indicates a degenerate model in which the mean of the response measurements is predicted regardless of an individual’s immune cell composition. As discussed in the main text, the PLS model for the HAI response to the influenza B/Brisbane/60/2008 strain was fitted on only the subset of individuals who had a measurement increase in HAI titer upon vaccination (31% of patients with a post- to prevaccination HAI titer ratio of 1 were not included in the fit). Furthermore, the fit benefits from further outlier moderation; seven patients with log-modulus HAI titer ratios greater than 4.0 (corresponding to HAI titer ratios >53.6) were adjusted to this 4.0 threshold for further outlier moderation. These seven individuals had HAI titer ratios equal to 64, 80 (for three individuals), 160, 320, and 640. The average HAI titer ratio for the remaining individuals was 11.3. Regression Methods. We highlight three linear regression methods in this section. The main regression method used in this study is PLS regression. However, ordinary least squares (OLS) and PCR are discussed first to illustrate the shortcomings that the PLS method addresses. Furthermore, the OLS results are used repeatedly in the development of the PCR and PLS methods.

SI Materials and Methods OLS. OLS (43) is the simplest linear regression method and forms the conceptual basis for more complex methods like PCR and PLS regression. The OLS model assumes that within error ε , the response variable, y , is a linear combination of covariates, represented by a set of regression coefficients β . Here the covariates are the individual features in the observed data (columns in X ), which are treated as fixed and known. The model, least-squares estimate of the regression coefficients, β ^ , and its variance are given by y = X β + ε β ^ O L S = ( X T X ) − 1 X T y V [ β ^ O L S ] = ( X T X ) − 1 σ 2 , where σ 2 characterizes the error term, assuming spherical normal errors: V [ ε | X ] = σ 2 I n . PCR. Rather than using the original features as covariates as in OLS, PCR performs linear regression by using as features the PC scores, Z , instead of X directly (see above). The regression coefficients in PCR are thus exactly the OLS result using transformed covariates, Z : β ^ P C R = ( Z T Z ) − 1 Z T y . The variance of the PCR regression vector has a simplified (diagonal) form owing to the linear transformation to uncorrelated covariates (i.e., the PC scores): V [ β ^ P C R ] = Λ − 1 σ 2 . The dependence of the variance of the PCR regression coefficients on the inverse of the eigenvalue matrix illustrates that low-eigenvalue PCs (potentially as a result of multicollinearity) lead to high variance in the corresponding coefficients of the estimates. The effect of these low-eigenvalue modes on the OLS regression coefficients is more complicated because the original, nontransformed covariates are correlated linear combinations of the uncorrelated PCs. The effect of this can be seen by expressing the variance of the OLS regression estimator in terms of the PCs, ( V [ β ^ O L S ] ) i i = ( ∑ k λ k − 1 V i k 2 ) σ 2 , which is proportional to the sum over all PCs, k , of the reciprocal eigenvalue times the square of the loading of that covariate in PC k . Here, matrix V is the eigenvector matrix. Intuitively, this says that covariates with high loadings in the low-eigenvalue PCs tend to have inflated variance that leads to difficulties in interpretation. PCR avoids the issue of multicollinearity by regressing on a reduced subset of r < k , PCs with corresponding eigenvalues sufficiently large; i.e., β ^ r P C R = ( Z r T Z r ) − 1 Z r T y , where Z r = X V r and subscript r indicates that the matrix comprises only the first r columns. PLS. PCR suffers from an important limitation: The PCs, although capturing the maximum variability in X , may or may not be relevant for the task of predicting the response Y (single-response vector y is expanded to multiple-response matrix Y , indicating that these methods can be applied to multiple responses simultaneously), because no information regarding Y is incorporated in computing the PCs. This is addressed by PLS, which computes a simultaneous decomposition of X and Y such that these components explain as much as possible of the covariance between X and Y . The PLS method works by decomposing both the covariate, X , and the response, Y , matrices into sums of h rank-1 matrices. These decompositions are termed the “outer relations” for X and Y : X = ∑ i = 1 h t i p i T + E h = T h P h T + E h Y = ∑ i = 1 h u i q i T + F h ∗ = U h Q h T + F h ∗ . Here, E h and F h ∗ refer to error terms from (potentially) incomplete decomposition. An example of such a decomposition is PCA, in which individual PCs and corresponding scores give these rank-1 approximations; i.e., X = Z V T . Matrices T h and U h are referred to as score matrices, whereas matrices P h and Q h are referred to as loading matrices, in analogy with PCA. Unlike the PCA loading matrix, V , however, P h and Q h are not necessarily orthogonal. The score matrices are modeled to be related by the linear “inner relation”: U h = T h D h + G h . Here, D h is an h × h matrix of OLS regression coefficients relating the respective scores of X and Y , whereas G h is the inner relation error. Note that substituting this inner relation into the outer relation for Y results in a linear relationship between responses Y and the scores of X : Y = U h Q h T + F h ∗ = ( T h D h + G h ) Q h T + F h ∗ = T h D h Q h T + G h Q h T + F h ∗ = T h M h T + H , where M h T ≡ D h Q h T and H ≡ G h Q h T + F h ∗ . Thus, unlike PCA, which determines the score matrix for X without considering Y , PLS incorporates information about Y for the decomposition; the first set of PLS scores, t 1 and u 1 , is chosen to maximize their covariance: w 1 , c 1 = arg max ‖ w ‖ , ‖ c ‖ = 1 [ Cov ( X w , Y c ) ] 2 t 1 = X w 1 u 1 = Y c 1 . Here, vectors w 1 and c 1 are unit vectors that weight covariates (columns in X ) and responses (columns in Y ) to generate the linearly transformed scores, t 1 and u 1 , respectively. In this study, w 1 is referred to as “latent variable 1” (“LV1”) (note that there is ambiguity in naming subsequent latent variables due to deflation, discussed below), whereas t 1 is termed the “LV1 scores.” Note that the optimal w 1 and c 1 are equal to the first left- and right-singular vectors from the singular value decomposition of the cross-product matrix X T Y , with covariance equal to the corresponding singular value. PLS employs a “deflation” scheme to ensure that latent variable scores are statistically uncorrelated and that later latent variables attempt to model only the portion of the responses that are not already well captured by earlier latent variables. After each latent variable, LV i , is chosen, X and Y are deflated as follows: p i = X T t i / ( t i T t i ) X ← X − t i p i T m i = Y T t i / ( t i T t i ) Y ← Y − t i m i T . The rank-1 approximations, t i p i T and t i m i T , are obtained by OLS regression of (deflated versions of) X and Y , respectively, on score t i . As a consequence of this asymmetric deflation scheme, the process of selecting latent variables in PLS can be continued until the number of latent variables equal to the number of covariates in X has been obtained, using the same objective function, with now X and Y deflated: w i , c i = arg max ‖ w ‖ , ‖ c ‖ = 1 [ Cov ( X w , Y c ) ] 2 t i = X w i u i = Y c i . Because the approximation to Y is imperfect (i.e., it is obtained by regressing on LV scores for X rather than for Y itself), PLS still works if the number of latent variables used is greater than the number of responses in Y, and even a full model may not result in a residual matrix of 0 . A final consideration in the PLS model is that interpretation of the weight vectors W is complicated by the fact that they are calculated on an iteratively deflated X matrix. In other words, T ≠ X W . Thus, we define a matrix R , such that columns of R describe linear combinations of the original X covariates that give rise to the score matrix T ; i.e., T = X R . In this study, column j of R is referred to as the “latent variable j ” even though latent variable often refers to columns of matrix W in other studies. This allows the PLS results to be expressed in terms of an estimated regression matrix Β ^ h P L S , based on h latent variables, in analogy with OLS and PCR: Y = T h M h T + H = ( X R h ) M h T + H = X Β ^ h P L S + H Β ^ h P L S = R h M h T . In the case of a single response variable, this reduces to y = T h m h T + h = ( X R h ) m h T + h = X β ^ h P L S + h β ^ h P L S = R h m h T . Whereas M h previously had a number of rows equal to the number of responses, m h is now an h -dimensional row vector. Pseudocode for a modified PLS algorithm, which makes use of the NIPALS algorithm described in ref. 45, is provided in Dataset S4. Latent Variable Similarities. We calculated the similarities between LVs i and j of functional responses k 1 and k 2 as the absolute dot product between them: | r i ( k 1 ) ⋅ r j ( k 2 ) | = | ∑ l r i , l ( k 1 ) r j , l ( k 2 ) | , with the absolute value indicating that the LVs extend in both positive and negative directions. The distributions for these dot products across all pairs of responses are shown in Fig. S3. The extent of similarity is found as the fraction of response pairs with P value <0.05 (corresponding to absolute dot product >0.335) compared with a null distribution of randomly chosen pairs of vectors on a 34-dimensional unit hypersphere (Fig. S3). This null distribution was approximated by simulating 1 million random pairs of vectors according to the null model. To assess whether the number of statistically similar pairs, n , is significantly greater than expected given the null distribution, we sum the corresponding binomial distribution to find the P value, p = ∑ i = n N ( N n ) 0.05 n 0.95 N − n , where ( ⋅ ⋅ ) is the binomial coefficient, and N is the total number of response pairs. Note that this assumes response LVs are independent; however, the extent to which responses are correlated (due to sharing of cytokine stimulant, responding cell type, and/or STAT measurement) is unknown. Immunological Aging Axis. The immunological aging axis from Fig. 4 was calculated using logistic regression of age on the top three LV scores. All ages below the median were binarized to zero whereas the remaining ages greater than or equal to the median were binarized to unity. Logistic regression was favored over ridge regression to diminish the effect of outliers, especially near the base of the cone. Logistic regression was implemented using the sklearn.linear_model.LogisticRegressionCV() class in scikit-learn v0.17 (40), with a 10-fold randomized cross-validation scheme. The dashed red line in Fig. 5A is plotted as a = a 0 + β t , where β is the regression vector from logistic regression, t parameterizes the equation, and a 0 is chosen to minimize the sum of perpendicular distances (2-norm) from each point to the dashed line, a 0 = arg min a ′ 0 ∑ i = 1 N ‖ ( a ′ 0 − x i ) − ( ( a ′ 0 − x i ) ⋅ β ) β ‖ 2 , where x i is a vector specifying the top three LV scores for individual i . The coefficients of the age axis from Fig. 4C are obtained by the product of the first three columns of the LV matrix, R 3 , with the logistic regression vector, β , from above; i.e., the coefficients of the age axis are the elements of R 3 β . Twin Pair Immunotype Similarities Null Model. To determine the effect of twin status on immunotype similarities, we compared the interindividual distances between twin pairs (MZ or DZ) to those between randomly chosen individuals in the cohort. However, we wanted to ensure that the null distribution had the same distribution of ages as that in the twin distributions (note that this age distribution may be different for MZ vs. DZ twins, so a separate null model is constructed for both). Using the MZ null as an example, for each age, a , represented in the MZ distribution, we randomly select two nontwin individuals from the full dataset where the age of each individual is within the k nearest individuals by age to a —note that as ages in the dataset are discrete by year, the set of individuals may be larger than k itself if there are many individuals in the age group farthest away from a . Using this as a null distribution, with k chosen to be 10 nearest neighbors, we find that the means of both the MZ and DZ distributions are significantly lower than the means of their respective null distributions (P < 0.0001), indicating twin status is correlated with similarity in immunotype. Although this result is theoretically dependent on k , where low values of k indicate more strict enforcement of age correction in the null model at the expense of greater sampling noise, we found the conclusion robust to choice of k up to 10.

Acknowledgments We thank all members of the A.K.C. and P.B. laboratories for insightful comments on this study. A.K.C. and K.J.K. received financial support from NIH Grant R01 HL120724 and the Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University. K.J.K. was also a National Science Foundation predoctoral fellow. P.B. is supported by a starting grant from the European Research Council, the Swedish Research Council, the Swedish Society for Medical Research, the Swedish Cancer Foundation, and Karolinska Institutet.

Footnotes Author contributions: M.M.D., A.K.C., and P.B. designed research; K.J.K., C.L.D., H.M., and P.B. performed research; K.J.K., D.N., A.K.C., and P.B. analyzed data; and K.J.K., K.S., M.M.D., A.K.C., and P.B. wrote the paper.

Reviewers: R.A., Emory University; and J.H., Icahn School of Medicine.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1705065114/-/DCSupplemental.