Abstract We previously showed, in a pilot study with publicly available data, that T cell receptor (TCR) repertoires from tumor infiltrating lymphocytes (TILs) could be distinguished from adjacent healthy tissue repertoires by the presence of TCRs bearing specific, biophysicochemical motifs in their antigen binding regions. We hypothesized that such motifs might allow development of a novel approach to cancer detection. The motifs were cancer specific and achieved high classification accuracy: we found distinct motifs for breast versus colorectal cancer-associated repertoires, and the colorectal cancer motif achieved 93% accuracy, while the breast cancer motif achieved 94% accuracy. In the current study, we sought to determine whether such motifs exist for ovarian cancer, a cancer type for which detection methods are urgently needed. We made two significant advances over the prior work. First, the prior study used patient-matched TILs and healthy repertoires, collecting healthy tissue adjacent to the tumors. The current study collected TILs from patients with high-grade serous ovarian carcinoma (HGSOC) and healthy ovary repertoires from cancer-free women undergoing hysterectomy/salpingo-oophorectomy for benign disease. Thus, the classification task is distinguishing women with cancer from women without cancer. Second, in the prior study, classification accuracy was measured by patient-hold-out cross-validation on the training data. In the current study, classification accuracy was additionally assessed on an independent cohort not used during model development to establish the generalizability of the motif to unseen data. Classification accuracy was 95% by patient-hold-out cross-validation on the training set and 80% when the model was applied to the blinded test set. 
The results on the blinded test set demonstrate a biophysicochemical TCR motif found overwhelmingly in women with HGSOC but rarely in women with healthy ovaries, strengthening the proposal that cancer detection approaches might benefit from incorporation of TCR motif-based biomarkers. Furthermore, these results call for studies on large cohorts to establish higher classification accuracies, as well as for studies in other cancer types.

Citation: Ostmeyer J, Lucas E, Christley S, Lea J, Monson N, Tiro J, et al. (2020) Biophysicochemical motifs in T cell receptor sequences as a potential biomarker for high-grade serous ovarian carcinoma. PLoS ONE 15(3): e0229569. https://doi.org/10.1371/journal.pone.0229569
Editor: David Wai Chan, The University of Hong Kong, HONG KONG
Received: November 19, 2019; Accepted: February 9, 2020; Published: March 5, 2020
Copyright: © 2020 Ostmeyer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All sequence data are freely available from the VDJServer Community Data Portal (CDP) (vdjserver.org) under the project accession 3276777473314001386-242ac116-0001-012.
Funding: This project was supported by funding to LGC from UT Southwestern Medical Center, Be the Difference Foundation, Commercial Real Estate Women of Dallas (CREW Dallas), and an anonymous donor. CREW Dallas is NOT a commercial entity. It is a 501c3. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: NO authors have competing interests. The authors are not aware of any competing interests.

Introduction Despite the tremendous genomic heterogeneity between cancers, there is evidence that cancer patients mount T cell responses against antigens they have in common, including tumor antigens. Shared tumor antigens can be generally classified into three categories: (1) self-antigens with dysregulated expression or increased copy numbers, such as MelanA, HER2, SOX2, and NY-ESO-1 [1–5]; (2) altered self-antigens, such as recurrent oncogenic mutations, including BRAF V600E and CDK4 R24C [6] and TGF-βRII frameshift mutations [7]; and (3) non-self-antigens, namely viral epitopes expressed by virus-induced cancers, such as those derived from Human Papilloma Virus [8, 9], Hepatitis B Virus [10], and Epstein Barr Virus [11]. Ovarian cancer is considered rich in the first category of shared tumor antigens, with relatively large percentages of ovarian cancers expressing MAGE-A1, MAGE-A3, NY-ESO-1, and others [12, 13]. In the case of the alpha folate receptor, 97% of ovarian cancers were found to express it, with the vast majority having moderate or strong expression levels, while only 63% of healthy ovaries were found to express it, and in all cases the expression was weak [14]. Evidence for T cell responses against shared tumor antigens comes from studies demonstrating the presence of T cells with binding capacity for, and reactivity to, the shared antigens [1, 3, 4, 15–20]. Indeed, responses against shared tumor antigens may outnumber those against mutated neoantigens, including for highly mutated cancers such as melanoma [21, 22]. In addition to effector T cells responding to tumor antigens, a significant portion of the tumor-infiltrating lymphocyte (TIL) population is expected to be regulatory T cells that are reactive to tissue-restricted self-antigens associated with the organ of cancer origin, as these T cells are highly enriched in cancer lesions [21]. 
Thus, on balance, we expect much of a TIL population to be composed of T cells with specificity for antigens shared across cancer patients and not present, or present at significantly reduced levels, in cancer-free individuals. We hypothesized that the above-described T cell responses could serve as the basis for biomarkers for early cancer detection and sought to develop a method for detecting them that didn’t require knowledge of the target antigens and didn’t rely on the assumption that T cells responding to a common target would express T cell receptors with the same amino acid sequence. Utilizing publicly available TCR deep sequencing data, we applied multiple instance learning (MIL) after converting the TCR amino acid sequences to a biophysicochemical representation using Atchley Factors [23–25]. We found that TCR repertoires from breast or colorectal cancer TILs could be distinguished from adjacent healthy tissue repertoires by the presence of TCRs bearing specific, biophysicochemical motifs in their antigen binding regions [25]. The motifs were different between the two cancer types, and both achieved high classification accuracy. The colorectal cancer motif achieved 93% accuracy, while the breast cancer motif achieved 94% accuracy. In the current study, we sought to establish the plausibility of using TCR motifs for ovarian cancer detection and applied our method to locally collected patient samples. We made two significant advances over the prior work. First, the prior study used patient-matched TILs and healthy repertoires, collecting healthy tissue adjacent to the tumors. Thus, the classification task was to distinguish two repertoires that had both been collected from an organ affected by cancer, one repertoire from within the cancerous lesion and one repertoire from a lesion-free region. 
The current study collected TILs from patients with high-grade serous ovarian carcinoma (HGSOC) and collected healthy ovary repertoires from cancer-free women undergoing hysterectomy with salpingo-oophorectomy for benign disease. Thus, the current classification task is distinguishing repertoires from women with HGSOC versus repertoires from women with healthy ovaries. The second advance comes from the opportunity to assess the motif on a blinded test data set. In the prior study, only a training data set was available, and classification accuracy was measured by patient-hold-out cross-validation. In the current study, both a training and a test data set were available. Thus, in addition to assessing classification accuracy of the motif by patient-hold-out cross-validation, the ability of the motif to generalize to a new, independent cohort of data not used for motif discovery was assessed. The current study revealed a TCR biophysicochemical motif present overwhelmingly in HGSOC TILs repertoires but rarely in healthy ovary repertoires. The motif is specific to HGSOC, i.e., it is different from the motifs previously identified for colorectal and breast cancer. The classification accuracy assessed by cross-validation on the training data was 95% (19/20). Applying the same model selection and cross-validation procedure to data with permuted labels resulted in an average classification accuracy of 55%, and the accuracies of all 20 permutations were < 95%. Application of the best model to the unseen test set resulted in a classification accuracy of 80% (16/20), indicating that the motif has some capacity to generalize. These results strengthen the proposal that cancer detection approaches might benefit from incorporation of TCR motif-based biomarkers and call for studies assessing the approach on large training and testing data sets and on additional cancer types.

Results Cohort I The best performing model by patient-holdout cross-validation on Cohort I used a motif of three amino acid residues and allowed for a single gap (Table 2). Under that model, the average number of motifs per tumor sample was 7,683.3, and the average number of motifs per healthy sample was 6,154.2. The largest number of motifs in any sample was 13,277. The best average log-likelihood was observed at the last (2,500th) gradient optimization step. The model correctly classified 95% (19/20) of held-out samples with an average log-likelihood of 0.332 bits (Fig 3A). The model correctly classified all healthy ovarian samples, giving a specificity of 100%, although one was quite close to the threshold score of 0.5. The model correctly classified all but one tumor sample, giving a sensitivity of 90%. To estimate the probability of correctly classifying 19 of 20 samples by chance, we performed a permutation analysis with 20 permutation runs. For each permutation, the sample labels were permuted and then patient-holdout cross-validation was performed. Early stopping was applied. The classification accuracies of all 20 permutations were < 95%, allowing us to assign p < 0.05 to the observed accuracy (Table 3). The average log-likelihood over all permutations was 0.993 bits, and the average accuracy was 55%.
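The summary statistics above can be reproduced from the reported counts. The following Python sketch is illustrative and is not the authors' code. It assumes that Cohort I contained 10 tumor and 10 healthy samples (implied by a 90% sensitivity with one missed tumor sample), that the p-value bound follows the common add-one permutation estimate (b + 1)/(n + 1), and that the reported log-likelihood in bits is the mean negative log2-likelihood of the true labels. All function names are hypothetical.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # fraction of tumor samples called tumor
    specificity = tn / (tn + fp)   # fraction of healthy samples called healthy
    return accuracy, sensitivity, specificity


def permutation_p_value(observed, permuted):
    """Add-one empirical p-value: (b + 1) / (n + 1).

    b counts permuted statistics at least as large as the observed one,
    and n is the number of permutations.
    """
    b = sum(1 for value in permuted if value >= observed)
    return (b + 1) / (len(permuted) + 1)


def mean_log_loss_bits(labels, probabilities):
    """Mean negative log2-likelihood of the true labels (binary case)."""
    total = 0.0
    for label, p in zip(labels, probabilities):
        p_true = p if label == 1 else 1.0 - p
        total -= math.log2(p_true)
    return total / len(labels)


if __name__ == "__main__":
    # Cohort I counts as implied by the text (assumed 10 tumor, 10 healthy).
    print(confusion_metrics(tp=9, fn=1, tn=10, fp=0))

    # Placeholder permuted accuracies: only "all below 0.95" is taken from
    # the text; the individual values are stand-ins, not Table 3 data.
    print(permutation_p_value(0.95, [0.55] * 20))

    # A classifier that always outputs 0.5 defines the chance baseline.
    print(mean_log_loss_bits([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

Under these assumptions, zero of 20 permutations reaching 95% gives p = 1/21, or about 0.048, which is consistent with the reported p < 0.05. A classifier that outputs 0.5 for every sample scores exactly 1 bit, which is why the permuted-label average of 0.993 bits reads as chance-level performance.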


Fig 3. Results. (a) Classification results obtained by leave-out cross-validation for each patient in Cohort I. (b) Illustration of the classifier weights averaged across all 20 cross-validation runs (error bars for the standard deviation are omitted because the range was too small to plot relative to the size of each arrow). For each of the five Atchley factors, the weights are shown for the three residue positions. The weight for the log-frequency of the receptor is also shown. Positive weight values are shown pointing up, and negative weight values are shown pointing down. The length of the arrow corresponds to the weight's magnitude. (c) All motifs with a score above 0.5 (middle column) are shown for the 20 patient samples. Each motif is shown in the context of its respective CDR3. The leftmost column indicates the patient and the rightmost column indicates the number of times the motif is observed in the sample. (d) Classification results obtained on Cohort II test samples. (e) The ROC curve shows true and false positive rates for different thresholds of a positive diagnosis based on the model applied to Cohort II. The area under the curve is 0.79. (f) All motifs with a score above 0.5 (middle column) shown for the 20 patient samples in Cohort II. Each motif is shown in the context of its respective CDR3. The leftmost column indicates the patient and the rightmost column indicates the number of times the motif is observed in the sample. https://doi.org/10.1371/journal.pone.0229569.g003


Table 3. Permutation results. Each row corresponds to a single permutation of the Cohort I data set, indicated in column 1. The second column shows the loss averaged over all patient-hold-out cross-validations. The third column shows the classification accuracy over all patient-hold-out cross-validations. The fourth column shows the fitting step, out of 2500, at which the lowest average loss was observed. https://doi.org/10.1371/journal.pone.0229569.t003

To discern the features that increase the probability of an HGSOC categorization, we examined the model weights across all 20 cross-validation runs (Fig 3B). The weights reveal how each Atchley factor contributes to the score and the relative importance of each position in the motif. Motifs with a positively charged, hydrophilic residue that tends to participate in alpha-helices in position 1, followed by a small residue that tends to participate in bends and coils in position 2, followed by a large, positively charged residue in position 3 will be scored by the model with a high probability of deriving from an HGSOC-associated repertoire. The weight for the relative abundance of the motif is positive, indicating that more abundant motifs would have a higher probability than less abundant motifs. We aligned the high scoring motifs from each holdout sample and present them within the context of the CDR3 sequences from which they originated (Fig 3C). The motifs varied in terms of their component residues, but a restricted set of amino acids was observed at each position. Amino acids Glutamic acid, Lysine, and Arginine were common in position 1, Tryptophan and Tyrosine were common in position 2, and Histidine and Tryptophan were common in position 3. We also determined the number of times each CDR3 appeared in each sample and noted that most of them appear only once. None of the CDR3 sequences are shared across patients. 
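The way position-specific weights turn into a motif score can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation. It assumes that a gap means one skipped residue inside a four-residue window, that each motif is scored by a logistic function of 15 Atchley-factor features plus the natural log of relative abundance (16 weights, with a separate bias term added here as an assumption), and that a repertoire's score is the maximum over its motifs, which is one common multiple instance learning formulation. The factor table must be supplied by the caller. `TOY_FACTORS` below holds made-up placeholder numbers, not real Atchley values, and all function names are hypothetical.

```python
import math
from itertools import combinations

# Placeholder table: five made-up numbers per residue, NOT real Atchley factors.
TOY_FACTORS = {
    "C": [0.1, -0.2, 0.3, 0.0, 0.5],
    "A": [-0.5, 0.4, 0.1, 0.2, -0.1],
    "S": [0.2, 0.2, -0.3, 0.1, 0.0],
    "L": [-0.1, 0.0, 0.4, -0.4, 0.3],
}


def gapped_motifs(cdr3, size=3, max_gap=1):
    """List motifs of `size` residues taken in order from each window of
    length size..size+max_gap, always keeping the window's end residues."""
    motifs = []
    for window in range(size, size + max_gap + 1):
        for start in range(len(cdr3) - window + 1):
            segment = cdr3[start:start + window]
            # Choose which interior positions are kept; the rest are the gap.
            for inner in combinations(range(1, window - 1), size - 2):
                middle = "".join(segment[i] for i in inner)
                motifs.append(segment[0] + middle + segment[-1])
    return motifs


def motif_features(motif, factors, relative_abundance):
    """Concatenate per-residue factors and append log relative abundance."""
    features = [value for residue in motif for value in factors[residue]]
    features.append(math.log(relative_abundance))
    return features


def motif_score(features, weights, bias=0.0):
    """Logistic score in (0, 1) for a single motif."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def repertoire_score(feature_rows, weights, bias=0.0):
    """Bag-level score: the highest-scoring motif decides the call."""
    return max(motif_score(row, weights, bias) for row in feature_rows)


if __name__ == "__main__":
    motifs = gapped_motifs("CASSL")
    rows = [motif_features(m, TOY_FACTORS, relative_abundance=1e-4) for m in motifs]
    weights = [0.0] * 16  # untrained placeholder weights
    print(motifs)
    print(repertoire_score(rows, weights))
```

Taking the maximum over motifs means a single high-scoring motif is enough to push a repertoire above the 0.5 threshold, which fits the observation that the high-scoring CDR3 sequences mostly appear only once per sample.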
Cohort II Given the potential for overfitting and model selection bias, we assessed the model’s performance on samples not used for model selection or parameter fitting, i.e., on Cohort II. After selecting the best performing model using cross-validation on Cohort I, as described above, we then refit the parameters of the selected model on all 20 Cohort I samples with 2,500 gradient optimization steps, which was determined to be the optimal number of steps in the cross-validation (Table 2). The resulting weights β1 through β16 appear indistinguishable from those in Fig 3B. The newly fitted model was then applied to Cohort II and correctly classified 80% (16/20) of the samples with an average log-likelihood fit of 0.821 bits. The model correctly classified all but one healthy ovarian sample (specificity 90%) and misclassified three tumor samples (sensitivity 70%) (Fig 3D). The area under the Receiver Operating Characteristic (ROC) curve was 0.79 (Fig 3E). We aligned the high scoring motifs from the Cohort II samples and present them within the context of the CDR3 sequences from which they originated (Fig 3F). As with the Cohort I motifs, the amino acid residues present at each position vary, but the variability is restricted to a subset. As with the Cohort I samples, amino acids Glutamic Acid and Lysine are common in position 1, Tryptophan and Tyrosine are common in position 2, and Histidine and Tryptophan are common in position 3. In contrast, Arginine was common in position 1 of Cohort I motifs but is found in position 1 of only one Cohort II motif, and Aspartic Acid is common in position 3 of Cohort II motifs but was not observed in position 3 of Cohort I motifs. As with Cohort I, we found that the majority of CDR3s containing high-scoring motifs were present only one time in their sample.
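The ROC area reported above has a simple pairwise reading that a short sketch makes concrete. Assuming Cohort II held 10 tumor and 10 healthy samples (implied by the reported sensitivity and specificity), there are 100 tumor-healthy pairs, so an area of 0.79 corresponds to the model ranking the tumor sample higher in 79 of them, counting ties as half. The function below is an illustrative Python implementation of that rank-based definition. Its name is hypothetical, and the scores in the example are invented placeholders, not study data.

```python
def roc_auc(tumor_scores, healthy_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (tumor, healthy) pairs in which the tumor
    sample receives the higher score, with ties counted as half a win.
    """
    wins = 0.0
    for t in tumor_scores:
        for h in healthy_scores:
            if t > h:
                wins += 1.0
            elif t == h:
                wins += 0.5
    return wins / (len(tumor_scores) * len(healthy_scores))


if __name__ == "__main__":
    # Invented placeholder scores for illustration only.
    tumor = [0.6, 0.4]
    healthy = [0.5, 0.3]
    # Pairs: 0.6>0.5, 0.6>0.3, 0.4<0.5, 0.4>0.3, so 3 of 4 are ordered correctly.
    print(roc_auc(tumor, healthy))
```

Because the area depends only on how samples are ranked, it does not change with the choice of the 0.5 decision threshold, which is why it complements the threshold-dependent sensitivity and specificity figures.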

Discussion We previously hypothesized that T cell responses against antigens shared among cancer patients might enable development of a new approach to cancer detection [25]. Shared tumor antigens are not favored for antigen-targeted immunotherapy where the goal is to elicit such a high degree of tumor-cell killing that the tumor is eradicated. In that case, antigens with expression patterns highly-restricted to the tumor and that are targeted by high-affinity TCRs are needed. For cancer detection, however, it is only necessary that the corresponding TCRs be present in patients with the cancer and not in those without or that they be present with an elevated abundance in those with cancer relative to those without. To determine whether such T cell responses might enable cancer detection, we first sought to develop a method for identifying the corresponding TCRs that didn’t require knowledge of the target antigens and didn’t rely on the assumption that T cells responding to a common target would express TCRs with the same amino acid sequence. To accomplish this, we developed the method described here, converting amino acid sequences into numerical vectors whose components correspond to amino acid biophysicochemical values, such as charge, and applying multiple instance learning. In all cases in which the method has been applied, it has identified a motif that can distinguish the tissue or patient groups of interest with solid performance [25, 33]. We hypothesize that TCRs bearing these motifs have overlapping antigen binding profiles and are concentrated in cancer tissue due to the presence of a common antigen there. This is a hypothesis that will have to be tested experimentally, but the strong classification performance of the motifs warrants further study, despite uncertainty regarding any shared antigen specificity. In our first application of this method to TCRs, we considered motifs of four residues and did not allow gaps [25]. 
Additionally, we took the natural logarithm of the motif relative abundance term. Taking that same model and fitting the weight values on Cohort I, we obtained a classification accuracy of 90% with a likelihood error of 0.666 (Table 2). To determine whether we could improve the performance, we explored additional models not considered in our prior work (Table 2). The best performing model used a three-residue motif allowing for one gap and achieved a classification accuracy of 95% with a likelihood error of 0.332 (Table 2). Thus, while the approach has produced good results across multiple cancer types, each one has required optimization of the motif representation to obtain the best performance. Additional innovation to the modeling approach is required to produce a method that works across multiple cancer types without this customization. Whenever multiple models are evaluated on the same data and the best performing model is selected, model selection bias can occur. To determine the extent of model selection bias in our Cohort I result, we evaluated the selected model’s performance on Cohort II, which is wholly unseen (i.e., not used for parameter fitting or model selection). The classification accuracy on Cohort II is 80% with a likelihood error of 0.821. Reduced performance on test data is expected, and these results indicate that the model has identified a signal that is expected to generalize to new samples with 80% accuracy. We have applied the method to three cancer types and in each case identified a distinct biophysicochemical motif. While for breast cancer, all receptors bearing the motif were of high abundance, and in some cases were the top most abundant clone, for colorectal cancer, all but a few of the motif-bearing clones were of low abundance [25]. 
In the case of ovarian cancer, we again observed that motif-bearing clones are of low abundance, and in fact, in all but a few cases, the corresponding CDR3 sequences were observed in the sample only a single time. While this is perhaps surprising, we note that frozen tissue was used for the colorectal samples in our prior study, while the ovarian samples in this study were all formalin-fixed paraffin-embedded samples that had been collected between 2009 and 2016. The samples are therefore likely subject to significant DNA damage and to have significantly reduced sequence coverage of target regions [34]. It seems unlikely that the motif identified by our approach is purely an artifact given that it correctly classified 80% of the Cohort II samples. Taking the data at face value, it appears the motifs that mark repertoires as being HGSOC-associated are found in low frequency clones. While our previous results demonstrate that TCR repertoires from TILs can be distinguished from adjacent healthy tissue repertoires by the presence of TCRs bearing specific, biophysicochemical motifs in their antigen binding regions, our current results go further by demonstrating that TILs repertoires from women with HGSOC can be distinguished from ovarian tissue-associated repertoires from women with healthy ovaries. Thus, in this case, we are distinguishing women with cancer from women without cancer, which is the classification task that is directly relevant to cancer detection. Despite this significant advance over the prior work, however, there are still several limitations that must be addressed. First, the HGSOC samples used in this study were primarily from women with stage III or IV disease. It is critical to determine whether this or another signature can be detected at early stages of disease, particularly before the appearance of invasive disease. 
Second, to have any potential utility for cancer detection, the signature must be detectable in tissue collected by minimally invasive means. That typically means blood. While the overlap between TILs T cell repertoires and the peripheral T cell repertoire has been shown to be relatively low, it is much higher, with as much as ~50–60% overlap, when the CD8+PD-1+ subset of peripheral T cells is sorted [35–40]. Furthermore, the specific antigens recognized by this subset were similar to those recognized by the TILs population [40]. Thus, it is reasonable to expect that a TCR signature found in the tissue can be detected in this or another peripheral T cell subset. An additional potential utility of our approach is in the diagnosis of women who present with an ovarian mass. Thus, it will be essential to assess the signature on benign ovarian tumors, as well as on ovarian cancers of other types, to determine whether the signature presented here is present in those cases or whether these have their own unique signature. Taken together, our current and prior results indicate that TCR-based biomarkers have potential utility for cancer detection. They justify further studies on larger patient cohorts designed to improve the generalizability of the signature with a particular focus on blood samples from patients with early stage disease. Additionally, they justify application of this method in other cancer types, such as pancreatic cancer, where, as in ovarian cancer, the need for early detection methods is particularly critical.