a, Spike-in normalized COF-STAP-seq tag counts (left heat map) for 30,936 CP candidates (columns) clustered based on their preferential activation by different COFs (rows). These tag counts were transformed for each CP separately into Z-scores (right heat map) to highlight the differential activation by different COFs independently of the overall activity of the CP. We then used these Z-score-transformed values to cluster the CPs by k-means into five groups, each with similar activation profiles across all COFs irrespective of absolute activation levels (the CPs in both heat maps are organized identically according to these groups; see coloured bar on top). The line plot on the left shows the average spike-in normalized COF-STAP-seq tag count across all CPs of each group for each of the 13 COFs and the two controls. b, Per cent of variance in the data explained by clustering CPs into different numbers of clusters with k-means (k ranging from 1 to 10). Increasing the number of clusters beyond five is of little benefit in explaining the variance in the data. c, Gain in per cent variance explained by increasing the number of clusters in steps of one from three to six. d, Distribution of the sum of squared distances to the cluster centroids for numbers of clusters ranging from one to ten, using a fivefold cross-validation approach. The data were binned randomly into five equally sized bins; one bin was set aside as a test set and clustering was performed on the remaining four bins. The sum of squared distances to the nearest centroid was then calculated for each data point in the test set. The procedure was repeated for each number of clusters (k). Increasing the number of clusters beyond five does not lead to substantially more coherent or dense clusters. For each box, n = 30,936 independent CPs.
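The per-CP Z-score transformation, the per cent variance explained and the fivefold cross-validation described for panels a to d can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the clustering is a plain Lloyd-style k-means written with the Python standard library only, and the Z-score uses the population standard deviation, which is an assumption because the legend does not specify it.

```python
import random
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Z-score each CP (one row = one CP across COFs) separately, so that
    only the shape of its activation profile remains."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)  # population SD: an assumption
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centroids.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def variance_explained(points, centroids):
    """Per cent of the total sum of squares accounted for by the clustering.
    Assumes the points are not all identical (total sum of squares > 0)."""
    grand = [mean(col) for col in zip(*points)]
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)
    return 100.0 * (1.0 - within / total)

def cv_point_sq_dists(points, k, n_folds=5, seed=0):
    """Fivefold cross-validation: cluster on four bins, then record, for each
    held-out point, its squared distance to the nearest training centroid.
    Returns one value per input point (order follows the folds)."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    dists = []
    for test in folds:
        held_out = set(test)
        train = [points[i] for i in idx if i not in held_out]
        centroids, _ = kmeans(train, k, seed=seed)
        for i in test:
            dists.append(sq_dist(points[i], centroids[nearest(points[i], centroids)]))
    return dists
```

In this sketch, calling `variance_explained` on the output of `kmeans` for k = 1 to 10 would give a curve of the kind shown in b, differences between consecutive values correspond to c, and the values returned by `cv_point_sq_dists` for each k correspond to one box in d. Because the Z-score is computed per row, two CPs whose tag counts differ only by a constant factor receive identical profiles.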
e–g, Clustering of 30,936 CPs (columns) based on their preferential activation by different COFs (rows) as in a, but using data for only one replicate as indicated. k-means clustering (k = 5) for each individual replicate reproduces qualitatively the same groups as those obtained with the merged replicates (see a). h, Agreement between the assignment of CPs to groups in individual replicates and in the pooled data (left). In each replicate, around 85% of CPs are assigned to the same group as in the assignment based on pooled replicates. The bar plot on the right shows the number of replicates that reproduce the group assignment for individual CPs. For around 94% of CPs, the group assignment is reproduced in at least two replicates. i, Pairwise distances in CP response to six COFs and two controls for CPs belonging to the same (intra-) or different (inter-) clusters (defined in S2 cells) in all three D. melanogaster cell lines. n = 115,508,123 and 362,994,457 independent CP pairs for intra- and inter-cluster boxes, respectively. *P ≤ 0.01; one-sided Wilcoxon rank-sum test. j, Induction (activation above GFP) of CPs (five groups defined in S2 cells; see a) by P65 and six COFs in S2 (top), OSC (middle) and Kc167 (bottom) cells. Each of the six COFs preferentially activates the same CP groups in all three cell lines; that is, COF–CP preferences appear to be cell-type independent. n = 5,723, 11,538, 3,203, 5,038 and 5,434 CPs, for groups 1 to 5, respectively. In d, i, j, boxes show median and interquartile range; whiskers indicate 5th and 95th percentiles.
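The replicate comparison in h can be sketched in the same hedged way. Because k-means cluster numbers are arbitrary, replicate clusters must first be matched to the pooled groups; the legend does not state how this was done, so the greedy largest-overlap matching below is purely an assumption, and all function names are hypothetical.

```python
from collections import Counter

def align_labels(rep_labels, pooled_labels):
    """Rename each replicate cluster after the pooled group it overlaps most.
    Greedy one-to-one matching by overlap size (an assumed matching rule)."""
    overlap = Counter(zip(rep_labels, pooled_labels))
    mapping, used = {}, set()
    for (rep, pooled), _count in overlap.most_common():
        if rep not in mapping and pooled not in used:
            mapping[rep] = pooled
            used.add(pooled)
    # Clusters left unmatched map to None and therefore never agree.
    return [mapping.get(rep) for rep in rep_labels]

def agreement(rep_labels, pooled_labels):
    """Fraction of CPs assigned to the same group as in the pooled data."""
    aligned = align_labels(rep_labels, pooled_labels)
    same = sum(a == p for a, p in zip(aligned, pooled_labels))
    return same / len(pooled_labels)

def replicates_reproducing(all_rep_labels, pooled_labels):
    """For each CP, the number of replicates that reproduce its pooled group."""
    aligned = [align_labels(rep, pooled_labels) for rep in all_rep_labels]
    return [sum(rep[i] == p for rep in aligned)
            for i, p in enumerate(pooled_labels)]
```

Under these assumptions, `agreement` applied to one replicate gives the kind of per-replicate fraction reported on the left of h, and a tally of the values returned by `replicates_reproducing` gives the kind of bar plot shown on the right.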