The gist of GLIPH (Grouping of Lymphocyte Interactions by Paratope Hotspots) is that, because TCRs with similar antigen specificity have been observed to have similar CDR3 sequences, clustering TCRs in amino acid sequence space will also effectively cluster them by specificity. The GLIPH algorithm links two TCRs into a cluster when either of the following heuristics applies:

global: nearly identical sequences (differing by only 1 amino acid)

local: sharing distinctive k-mers (for k=2,3,4) in the region of the TCR most likely to contact the peptide, where distinctiveness is defined as being comparatively rare in a reference population of naive T-cells
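
To make the two linkage heuristics concrete, here is a minimal Python sketch. It is not the GLIPH implementation. Several things are assumptions made for illustration: the function names, treating "nearly identical" as equal-length CDR3s with at most one substitution, approximating the peptide-contacting region by dropping three residues from each end of the CDR3 (`trim=3`), and defining "distinctive" as a simple fold-enrichment over the reference with a pseudocount (`min_fold`, `min_count`). The actual algorithm's statistics and thresholds may differ.

```python
from collections import Counter
from itertools import combinations


def global_link(a, b):
    """Global heuristic (assumed form): equal length, at most one mismatch."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def kmers(cdr3, ks=(2, 3, 4), trim=3):
    """Set of k-mers in the CDR3 core; `trim` residues are dropped per end (assumption)."""
    core = cdr3[trim:len(cdr3) - trim]
    return {core[i:i + k] for k in ks for i in range(len(core) - k + 1)}


def distinctive_kmers(sample, reference, min_fold=10.0, min_count=2):
    """k-mers found in >= min_count sample CDR3s and enriched >= min_fold
    relative to the reference (hypothetical scoring, with a +1 pseudocount)."""
    in_sample = Counter(m for c in sample for m in kmers(c))
    in_ref = Counter(m for c in reference for m in kmers(c))
    motifs = set()
    for motif, n in in_sample.items():
        sample_freq = n / len(sample)
        ref_freq = (in_ref[motif] + 1) / (len(reference) + 1)
        if n >= min_count and sample_freq / ref_freq >= min_fold:
            motifs.add(motif)
    return motifs


def link_pairs(sample, reference, **thresholds):
    """Return the CDR3 pairs linked by either the global or the local heuristic."""
    motifs = distinctive_kmers(sample, reference, **thresholds)
    edges = set()
    for a, b in combinations(sorted(set(sample)), 2):
        if global_link(a, b) or (kmers(a) & kmers(b) & motifs):
            edges.add((a, b))
    return edges
```

Clusters would then be the connected components of the graph defined by these edges. In practice the reference would be a large naive repertoire, so the pseudocount would matter far less than it does on toy inputs.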

In addition to these linkage criteria, GLIPH also uses a variety of features to score the quality/purity of each cluster. The complete algorithm schematic is given in Extended Data Figure 3: