Systemic sclerosis (SSc) is an autoimmune disease associated with fibrosis and serious complications, including pulmonary arterial hypertension (PAH). Abnormal B cell responses have been associated with SSc pathogenesis, and de Bourcy et al. analyzed immunoglobulin heavy chain (IGH) transcripts of SSc-PAH patients enrolled in a B cell depletion clinical study. SSc-PAH was associated with several B cell development anomalies, particularly underuse of the IGHV2-5 segment and B cell homeostasis abnormalities. Depletion temporarily reversed these anomalous SSc-PAH disease signatures, and the data suggest that the rate of naïve B cell replenishment could be estimated from baseline measurements. These results define antibody signatures associated with SSc-PAH and reveal how B cell depletion shapes the antibody repertoire during reconstitution.

Systemic sclerosis with pulmonary arterial hypertension (SSc-PAH) is a debilitating and frequently lethal disease of unknown cause lacking effective treatment options. Lymphocyte anomalies and autoantibodies observed in systemic sclerosis have suggested an autoimmune character. We study the clonal structure of the B cell repertoire in SSc-PAH using immunoglobulin heavy chain (IGH) sequencing before and after B cell depletion. We found SSc-PAH to be associated with anomalies in B cell development, namely, altered VDJ rearrangement frequencies (reduced IGHV2-5 segment usage) and an increased somatic mutation–fixation probability in expanded B cell lineages. SSc-PAH was also characterized by anomalies in B cell homeostasis, namely, an expanded immunoglobulin D–positive (IgD + ) proportion with reduced mutation loads and an expanded proportion of highly antibody-secreting cells. Disease signatures pertaining to IGHV2-5 segment usage, IgD proportions, and mutation loads were temporarily reversed after B cell depletion. Analyzing the time course of B cell depletion, we find that the kinetics of naïve replenishment are predictable from baseline measurements alone, that release of plasma cells into the periphery can precede naïve replenishment, and that modes of B cell receptor diversity are highly elastic. Our findings reveal humoral immune signatures of SSc-PAH and uncover determinism in the effects of B cell depletion on the antibody repertoire.

( A ) Study design. Whether infusions were of rituximab or placebo depended on the study arm. (Note that not all the indicated time points were available for all participants.) ( B ) Number of distinct IGH sequences observed as a function of time, colored by study arm. ( C ) Fraction of observed sequences that had the IgA or IgG isotype as a function of time, colored by study arm. ( D ) Fraction of IgD or IgM sequences that displayed zero somatic mutations as a function of time, colored by study arm. ( E ) Summary of the signatures identified in (B) and (C): fold change in isotype-switched proportion from baseline to depletion versus fold change in sequence diversity from baseline to depletion. Here, “depletion” refers to the week 4 and week 12 visits; diversities and isotype-switched proportions were averaged across these two time points. Numeric data values are also provided in table S1. ( F ) Phylogenetic distance between each participant’s baseline repertoire and her repertoire at the week 36 visit. Week 36 was chosen as the common second time point because all but two study arm A recipients had recovered more than 10,000 sequences by then and times beyond week 36 had not been sampled for all participants. Numeric data values are also provided in table S2.

Here, we elucidate signatures of SSc-PAH and the dynamics of B cell replenishment after depletion using massively parallel sequencing of IGH transcripts found in peripheral blood mononuclear cells (PBMCs) (“antibody repertoire sequencing”), which provides a wealth of information on the processes of VDJ recombination, clonal expansion, somatic hypermutation, and isotype switching that shape the B cell repertoire ( 26 , 27 ). We longitudinally studied 11 participants from an ongoing randomized, double-blind, placebo-controlled phase 2 multicenter trial of rituximab for the treatment of SSc-PAH, sequencing 85 samples and obtaining >4 million IGH sequences. Infusions of 1000 mg of rituximab or a placebo were given at weeks 0 and 2; blood was sampled at baseline (preinfusion), at weeks 2 and 4 relative to the first infusion, and roughly in 12-week increments after that ( Fig. 1A ). All SSc-PAH participants were females between the ages of 48 and 73, inclusive; healthy baseline controls were 15 females between the ages of 50 and 75, inclusive.

Systemic sclerosis (SSc), or scleroderma, is a rare chronic autoimmune disease that leads to tissue fibrosis and vasculopathy ( 1 ). Indications of a dysregulated immune system include reduced lymphocyte counts ( 2 ), T cell anomalies ( 3 ), and B cell anomalies, such as the presence of autoantibodies ( 4 ), increased CD19 expression ( 5 ), increased naïve B cell proportions, and diminished memory B cell proportions ( 5 ). Both preclinical and clinical studies have implicated B cell abnormalities in the pathogenesis of SSc ( 6 , 7 ). A complication with particularly poor prognosis is pulmonary arterial hypertension (PAH) ( 8 ), which can arise from SSc-associated interstitial lung disease (ILD) but also directly from SSc without notable ILD; the latter case, which we will refer to as SSc-PAH, is the object of the present study. Current treatment typically focuses on pulmonary vasodilation and is not curative ( 9 ). Immunotherapy with the B cell–depleting agent rituximab, a monoclonal antibody against CD20 ( 10 , 11 ), has shown promise as a potential treatment for SSc-associated skin fibrosis and/or ILD ( 12 – 17 ). There is early anecdotal evidence that rituximab is effective in certain refractory cases of PAH, including SSc-PAH: For example, B cell depletion with rituximab is effective in a model of T helper 2 cell–mediated PAH ( 18 ), and an SSc patient with mild ILD and very substantial PAH who was refractory to standard PAH therapy was successfully treated with rituximab twice ( 19 ). The B cell basis of SSc has previously been studied ( 20 ) with respect to subset sizes ( 5 ), cytokine levels ( 21 ), response regulators ( 5 , 22 ) and autoantibody specificities ( 23 , 24 ) but not at the level of sequence-based B cell phylogeny. Reconstitution of the immunoglobulin heavy chain (IGH) repertoire after rituximab administration has been studied before ( 25 ), but the sample size was small (two participants) and the sequencing depth was low (on the order of 680 sequences).

RESULTS

Identification of a study group undergoing B cell depletion Because the clinical trial providing study samples is ongoing, it is blinded to whether study arm A or B corresponded to the rituximab group as opposed to placebo (Fig. 1A). Here, we show that the characteristics of group A are consistent with CD20+ B cell depletion. Group A shows a transient reduction in overall B cell receptor (BCR) diversity (Fig. 1B), and B cells that remain during the depletion period preferentially tend to have undergone isotype switching (Fig. 1C) or acquired one or more somatic mutations (Fig. 1D), consistent with the hypothesis that they are CD20− B cells in the terminal stages of differentiation [plasmablasts and plasma cells (28)]. B cell depletion was observed to be transient, ended by repopulation with new naïve cells (Fig. 1, B to D). Summarizing the above depletion signatures as fold changes between baseline and early postinfusion time points, it is evident that participants cluster into unaffected and affected groups (Fig. 1E). To compare the initial B cell repertoire with the reconstituted repertoire after depletion, we applied the phylogenetic distance metric UniFrac (29), whose use on immune repertoire data was recently demonstrated (30). As expected, we found that group A displayed greater preinfusion-to-postinfusion distances than group B (Fig. 1F), consistent with repertoire reconstitution by a fresh set of naïve B cells not clonally related to the baseline population. These observations establish group A as a cohort that can be used to study B cell depletion.

B cell signatures of SSc-PAH The hypervariable complementarity-determining region 3 (CDR3) of antibodies is a key determinant of antigen specificity (31). Antibody repertoire analysis has been used to search for convergent CDR3 sequences that characterize the immune response to certain infectious agents, notably dengue virus (32). To investigate which antibodies may be responsible for autoimmunity in SSc, we tested all CDR3 amino acid sequences identified in this study, together with their one-mismatch derivatives, for enrichment in the SSc-PAH cohort versus the healthy cohort (Fisher’s exact tests with Benjamini-Hochberg correction). No sequences displayed a statistically significant association with SSc-PAH (fig. S1, A and B). This negative result may be due to limited statistical power or undersampling of the repertoire, but also to biological idiosyncrasy: The relevant autoantigen(s) may differ from participant to participant, and antibodies binding any given autoantigen may be encoded by distinct amino acid sequences. We may nevertheless gain insight into the disease by studying the global structure of the antibody repertoire rather than individual antibody sequences. SSc-PAH participants tended to display slightly lower overall BCR diversity (fig. S1C), a higher proportion of immunoglobulin D (IgD) diversity, and lower mean somatic mutation loads in IgD than healthy participants (Fig. 2A), consistent with previous reports of reduced lymphocyte counts (2) and expanded naïve B cell proportions (5). This finding constitutes evidence for altered B cell homeostatic mechanisms. Fig. 2 Signatures of SSc-PAH compared with healthy Ig repertoires at baseline. (A) IgD proportion of the IGH sequence repertoire and average mutation load in IgD. (B) Average number of transcripts recorded per distinct IGH sequence (i.e., sum total of all transcripts across all observed IGH sequences divided by the number of observed IGH sequences). Each point corresponds to one study participant. Wilcoxon–Mann-Whitney test, P = 0.0025; n = 26. (Asterisks in the figure correspond to significance codes: *0.01 ≤ P < 0.05; **0.001 ≤ P < 0.01; ***0.0001 ≤ P < 0.001.) (C) Distribution of transcript abundances, separated by isotype compartment. Each curve corresponds to one study participant. Here, “abundance” is the number of transcript copies associated with a sequence, and “frequency” refers to the number of distinct sequences that display a given abundance value. (D) Percentage of IGH sequences that had a relative transcript abundance of at least 0.05% of the total repertoire, separated by isotype compartment. Points straddling the lower edge of the panel correspond to the value 0, which cannot be rendered on the logarithmic scale. Wilcoxon–Mann-Whitney tests: P = 0.026, n = 26 for IgD/IgM; P = 0.00061, n = 26 for IgG/IgA. [Note: When the single high value in SSc-PAH IgD/IgM is removed (participant SR3), the P value for the IgD/IgM comparison changes from 0.026 to 0.049 (<0.05); there are two points at 0 for healthy IgM/IgD]. (E) Proportion of somatic mutations that were considered to have become fixed in their B cell lineage, that is, present in at least 80% of sequences in that lineage. The proportion of fixed mutations was calculated separately for each lineage and then averaged across all lineages in the relevant size bin. Here, lineage size refers to the number of distinct IGH sequences in the lineage. Wilcoxon–Mann-Whitney tests, for lineage size bins from left to right: P = 0.00040, n = 25; P = 0.27, n = 24; P = 0.015, n = 20; P = 0.022, n = 19; P = 0.023, n = 15. (Note: The two highest points for the “15–19” bin, the highest point for the “20–24” bin, and the two highest points for the “25–29” bin correspond to five distinct SSc-PAH participants, namely, SP5, SP4, SP2, SR4, and SR2.) (F) Percentage of lineages that used the IGHV2-5 segment for each isotype compartment. Points straddling the lower edge of the panel correspond to the value 0, which cannot be rendered on the logarithmic scale. Wilcoxon–Mann-Whitney tests: P = 0.00051, n = 26 for IgD/IgM; P = 0.0045, n = 26 for IgG/IgA. [Note: When the single high point for SSc-PAH IgD/IgM is removed (participant SR3), the P value for the IgD/IgM comparison changes from 5.1 × 10−4 to 7.3 × 10−6. When the single point at 0 in SSc-PAH IgG/IgA is removed (participant SP2), the P value for the IgG/IgA comparison changes from 4.4 × 10−3 to 9.6 × 10−3.] It has also been reported that memory B cells are reduced in number but activated in scleroderma (5). Here, we are able to obtain further insight into the anomalies of the antigen-experienced compartment. We found that the average expression of distinct BCRs was higher in SSc-PAH (Fig. 2B), that this increase in average expression was due to an accumulation of BCRs at the highest observed expression levels (Fig. 2C), and that the effect was mostly driven by the isotype-switched (i.e., antigen-experienced) compartment (Fig. 2D). The effect persisted when abundances were defined in a relative rather than absolute fashion, that is, as a proportion of the total repertoire abundances, eliminating any undesired potential systematic variations in sequencing depth (Fig. 2D). Note that amplification biases were corrected for using unique molecular identifiers (UIDs) (see Materials and Methods). IGH sequences observed at high abundances are likely to correspond to plasmablasts, so the excess diversity of high-abundance BCRs suggests that B cells differentiate more frequently into plasmablasts in SSc-PAH participants than in healthy ones. To further elucidate the importance of selection processes acting on proliferating B cell lineages, we measured what fraction of somatic mutations had become fixed in each lineage, defining a fixed mutation as one that was present in at least 80% of the lineage’s sequences. (A lineage was defined as a set of sequences with the same V segment, same J segment, same CDR3 length, and a 90% CDR3 sequence similarity in single-linkage clustering.) To eliminate artifacts due to differences in lineage sizes, we separately carried out the analyses for different ranges of lineage sizes. Germline alleles distinct from the reference could be misinterpreted as fixed somatic mutations, but no convincing evidence for the presence of novel germline alleles was found using TIgGER (33) on the present data set. We found that the SSc-PAH cohort, on average, tended to have a larger fraction of fixed mutations than the healthy cohort across lineage sizes of up to 30 sequences (larger lineages were not consistently found in participants) (Fig. 2E). This finding suggests that SSc-PAH lineages have, on average, undergone more selective sweeps, that is, have been subject to sustained affinity maturation in response to antigen. One can speculate that the antigens in question are self-antigens, perhaps triggered by molecular mimicry (34), and are not necessarily the same for each patient. To assess whether VDJ rearrangement probabilities or postrearrangement tolerance checkpoints (35) may be affected in SSc-PAH, we tested for differential expression of V segments, which form the most diverse set of germline segments compared with D and J. To focus on original VDJ recombination events while removing the effects of clonal expansion and transcript abundance, we defined V segment counts as a number of lineages (rather than a number of sequences or a number of transcripts). We found that IGHV2-5 was used at a lower level in SSc-PAH participants (fig. S1D and Fig. 2F). Note that IGHV2-5 usage was higher for the isotype-switched than for the unswitched compartment (Fig. 2F), but differential expression between disease and healthy cohorts was observed separately in both compartments, so the effect cannot be solely a consequence of the SSc-PAH–related expansion of the unswitched compartment (Fig. 2A). Differential expression was particularly pronounced in the unswitched compartment (Fig. 2F). Having identified these B cell signatures of SSc-PAH, we can ask how they are affected in participants undergoing B cell depletion (study arm A). In study arm A, the SSc-PAH–related excess of IgD diversity and deficit in mean IgD mutation load were transiently reversed (Fig. 3, A and B), as was the deficit in IGHV2-5 usage (Fig. 3C). By contrast, B cell depletion did not have an evident effect on the fraction of fixed mutations (Fig. 3D). Fig. 3 Time courses of SSc-PAH signatures. Dashed black lines indicate the range of values observed in healthy participants at baseline. Note that the data displayed for study arms A and B in the present figure (longitudinal experiment) were acquired in a separate batch from the data on healthy participants (cross-sectional experiment; already described in Fig. 2) and so may not be directly comparable to the healthy data; we nevertheless include the dashed lines as a guide. (A) IgD proportion. (B) IgD mutation loads. (C) Percentage of lineages using the IGHV2-5 segment (all isotypes combined). (D) Proportion of somatic mutations observed in a lineage that were considered “fixed,” that is, present in at least 80% of the lineage’s sequences. The proportion of fixed mutations was calculated separately for each lineage containing between 5 and 29 distinct sequences and then averaged across all those lineages.

Kinetics of repertoire reconstitution Because we have inferred that the peripheral naïve repertoire is almost entirely wiped out in study group A (Fig. 1, B to E), we can use that group’s longitudinal repertoire data to deduce various rates governing homeostasis of the humoral immune system. Using the number of IgM sequences with zero somatic mutations as a proxy for the naïve diversity of the repertoire, we can express the rate of change of naïve diversity as (1)where M x denotes the number of IgM sequences with x mutations, a is the generation rate of naïve sequences entering the periphery from the bone marrow, b is the mutation probability per unit time, c 1 is the class switch probability per unit time, and c 2 is the probability of apoptosis per unit time. Sequences that exit the M 0 diversity by acquiring a mutation will increase the M 1 diversity (2) Cells can leave the IgM compartment by undergoing either apoptosis (rate c 2 ) or isotype switching (predominantly to IgA or IgG; rate c 1 ). In the latter case, they increase the isotype-switched diversity (3)where S x denotes the number of IgA and IgG sequences with x mutations. In Eqs. 1 and 2, we have made the simplifying assumption that the “exit” rates b, c 1 , and c 2 applicable to the M 0 diversity are the same for the M 1 diversity; similarly, in Eq. 3, we have assumed that the rates b and c 2 governing exits from the S 0 diversity have the same values as in Eqs. 1 and 2. For a first approximation, it seems reasonable to assume that the fraction of a compartment that undergoes apoptosis, mutation, or class switching is roughly independent of the identity of the compartment, particularly if we restrict ourselves to lowly mutated compartments that likely have a smaller proportion of long-lived memory or plasma cells than more highly mutated compartments. Given that the assessment of diversity using repertoire sequencing is a reliable measurement (fig. S2, A and B), we can determine a and d ≡ (b + c 1 + c 2 ) from the B cell replenishment time course using Eq. 1, starting at the onset of replenishment (after the more or less extended near-total depletion period). Fitting the linear model for all groups simultaneously as described in fig. S2C [after correcting observed sequence numbers for unseen sequences using the Chao1 estimator (36)] suggested that coefficient d varied little between participants, whereas coefficient a varied significantly, so we fitted the data again using a reduced model with a universal coefficient d for all participants and coefficients a that were allowed to vary from participant to participant (fits shown in Fig. 4A). As expected for a steady-state diversity controlled by these generation and exit rates, the diversity to which the M 0 compartment returns after depletion is commensurate with its initial diversity (fig. S2D). Fig. 4 Quantitative modeling of repertoire replenishment rates. (A) Fit for the naïve diversity M 0 (number of nonmutated IgM sequences, Chao1 estimate) as a function of time, after the onset of replenishment. Only the two participants (SR1 and SR6) with the largest numbers of corresponding time points are shown. Numeric data values are also provided in table S3. (B) Fit parameters describing sequence generation (a), mutation (b), class switch (c 1 ), and apoptosis (c 2 ) rates. SR1 to SR6 are labels for individual participants. Participant SR3 is omitted because an insufficient number of time points after the onset of depletion were available. (C) Baseline naïve diversity versus naïve generation rate. R2 value is for the linear fit shown by the dashed line performed with intercept set to 0. Numeric data values are also provided in table S4. (D) Baseline naïve diversity versus time to onset of replenishment (defined as the interval between second visit and the earliest visit exhibiting more than 5000 distinct IGH sequences). Numeric data values are also provided in table S5. PPMC, Pearson product-moment correlation. In steady state, that is, in the baseline homeostatic regime, the time derivatives in Eqs. 2 and 3 are zero, so we can deduce (4)and (5)followed by c 2 = d − b − c 1 (Fig. 4B). Note that the steady-state form of Eq. 2 can be generalized to higher mutation loads (6)meaning that the distribution of mutation loads in the IgM compartment should be an exponential decay, which is roughly obeyed in our data (fig. S2E; note that the isotype-switched compartment displayed in fig. S2F follows a contrasting distribution peaked at a nonzero mutation load). Because the data show a hint that there may be a slight excess of highly mutated IgM sequences compared with the exponential law, which may be attributable to long-lived IgM memory cells and plasmablasts that do not follow our simple system of equations above, we decided to determine b on the basis of only M 0 and M 1 using Eq. 4 rather than fitting the entire distribution M x using Eq. 6. The resulting sequence generation, mutation, class switch, and apoptosis rates (Fig. 4B) show that apoptosis due to the absence of activation by any cognate antigen is the most important factor limiting the naïve diversity; once a cell gets activated by a cognate antigen, a mutation event is more likely to occur first than a class switching event (c 2 > b > c 1 ). If one wanted to predict the time course of naïve B cell replenishment after the first depleted time point, one would need three pieces: the time to onset of replenishment, the generation rate a, and the exit rate d. Now, first, the exit rate d appeared to be near-“universal” between participants. Second, in steady state (from Eq. 1 evaluated with ), the rate a is expected to be proportional to the baseline naïve diversity, a fact that was borne out by our data (Fig. 4C). Third, the time to onset of replenishment was found to be inversely correlated with baseline diversity (Fig. 4D). These observations suggest the intriguing possibility of predicting depletion and replenishment time courses from baseline measurements alone.