The multiple co-clustering method yielded 15 views in which there was large variation in the number of features, and the number of subject and feature clusters (Supplementary Table S3; views are sorted in descending order of the number of features). The number of subject clusters ranged from 3 to 9, while the number of feature clusters ranged from 1 to 11 for numerical features, from 1 to 3 for categorical features, and 1 for integer features in case that these types of features may be included in a view. Visual inspection of these views (Fig. 2) shows that the distributions in numerical feature clusters are similar for each subject cluster within a view, except for view 10 (several different features clusters are clearly visible). Further, view memberships of features are displayed in Supplementary Fig. S1 for FC features and in Table 2 for non-FC features. With respect to relationships between views, we focused on the first principal component of numerical features for each view. The first principal component is a weighted linear combination of features that explains most of the variability of features, representing the underlying trend of the data. To compare underlying trends of views, we evaluated Pearson’s correlation coefficient between the first principal components of relevant FC features in each view (Fig. 3a). First principal components are highly correlated between different views, except for view 10, in which the first principal component seems to be independent of those in most of the remaining views (correlated only with views 6, 7, 8 and 12 at significant level 0.05). Note that here we relied on PCA rather than CCA, because CCA focuses on maximum correlation with specific combinations of features while our interest is to evaluate correlations of underlying trends.

Figure 2 Visualization of all views (view 1–15). The horizontal axis denotes features while the vertical axis denotes subjects. Depressive subjects are indicated by a hyphen while features directly related to diagnose of depression, namely, BDI and HRSD are indicated by a vertical bar. Note that when adjacent subjects or features are indicated by hyphens, these hyphens appear to be merged in a panel. Subject clusters are sorted in ascending order of proportions of depressive subjects from top to bottom. Further, in each subject cluster, control subjects are first displayed. Feature clusters are sorted in order of numerical, categorical and integer types from left to right (separated by thick bold lines). In each type of features, feature clusters are further sorted in descending order of cluster size. Within this sortation, the order of subjects or features is arbitrary. Grey color denotes a missing entry. Full size image

Table 2 Members of non-FC features in a view. Full size table

Figure 3 Analysis of feature/subject clusters. Panel (a) Pearson’s correlation coefficients between first principal components of FC features in views. Panel (b): Agreement between the subject cluster and the label of control/depression in blue. The agreement is measured in terms of adjusted Rand Index52: zero for no agreement (chance level); one for complete agreement. Also, the proportion of depression-related features (i.e., clinical questionnaire) for selected features is displayed in yellow. Panel (c): Average Cohen’s d for FC feature clusters in each view. For each feature cluster, we sorted estimated means in Gaussian component, and evaluated Cohen’s d, i.e., \(d=({\mu }_{2}-{\mu }_{1})/\sqrt{({\sigma }_{2}^{2}+{\sigma }_{1}^{2}\mathrm{)/2}}\), between neighboring means μ 1 and μ 2 (μ 1 < μ 2 ). Finally, we took the average of all Cohen’s d within a view. In this analysis, we conditioned feature clusters that had more than three FC features, while subject clusters that had more than three depressive subjects. Red bold lines denote levels of separability37 while dashed lines denote boundaries between these levels. Note that for view 15, no FC feature is included; hence, we omitted this view from the analysis in Panels (a) and (c). Full size image

Next, we analyzed subject clusters. Our intent was to find meaningful subtypes of depression. Accordingly, we focused on relevance of cluster memberships to depression and separability between subject clusters. First, we evaluated relevance of subject cluster solutions for depression (Fig. 3b). In terms of the concordance between cluster memberships, the label of control/depression, and the proportion of depression related features for selected features, view 10 is the most useful for control/depression labels. Here we defined depression related features as follows: for numerical features BAS, BDI, BIS, GAF, PHQ9, HRSD, JART, PANASP, PANASN, SHAPS, STAI, N, E, O, A, and C; for categorical features drug, Melancholic, Recurrent, Response, Remission, BDI items, HRSD items, SNPs, and MINIs. In short, these features represent the current status of the psychiatric condition of a subject. In Supplementary Table S2, these features are shown with asterisks *. We did not include FC features for this definition.

Second, we evaluated separability of subject clusters of depressive subjects in FC feature clusters by means of Cohen’s d (Fig. 3c)37, which is commonly used to measure effect size of differences of two distributions. Here, we follow the criterion of effect size37: 0.2 small; 0.5 medium; 0.8 large. Using this criterion, subject clusters of depressive subjects in view 10 were well separated, while those in views 4, 6, 7, 8, 12, 13 and 14 were modestly so, and the remainder of the views were slightly separated. These results suggest that view 10 is specific in two regards: relevance for control/depression and large separability of depressive clusters in terms of FC.

Further, we analyzed view 10 more in detail. This view consists of five subject clusters, which match the label of control/depression well: Two clusters for control subjects (subject clusters C1, C2) and three clusters for depressive subjects (subject clusters D1, D2, D3) (Fig. 4a). View 10 contained several non-FC features that discriminate well among subject clusters (features in bold in Table 2). Further, the view had a high proportion of depression-related features (60%), including 39 numerical features and 19 categorical features, but none of the integer features. Since the categorical features do not clearly distinguish between subject clusters (Fig. 4b), we focus only on numerical features in this view for further analysis. Numerical features are clustered into five feature clusters F1-5. We characterize each feature cluster based on characteristics of their member features (Supplementary Table S4). Since this view is closely related to the label of control/depression, it is consistent that the view has feature clusters (namely, F3 and F5) that are related to the initial status of depression. However, it is noteworthy that the view also contains a feature cluster that is related to the after-treatment status of depression (namely, F1). Furthermore, features related to CATS (Child Abuse Trauma Scale) are included in the same feature cluster F1. These results suggest that the subject clusters D1, D2 and D3 for depressive subjects may be related to after-treatment status of depression, which might be further related to stress experiences during childhood. Lastly, feature clusters F2 and F4 are related to specific functional connectivity in fMRI image data, which suggests a possible association between the subject clusters and neural substrates (Supplementary Fig. S1). In this view, the majority of these functional connections are, on average, higher for depressed patients than for controls, while the remainder of views, except for view 11, has more connectivity that is higher for controls than for depressed patients. Further, this connectivity network in view 10 forms the topology of a star with the central hub, which is also observed in other views. The relevant brain areas for the functional connectivity of view 10 are Dorsal DMN.02 (hub), Dorsal DMN.04, Dorsal DMN.06, Ventral DMN.01, Ventral DMN.05, Ventral DMN.09, LECN.01, Precuneus.01, Dorsal DMN.01, Dorsal DMN.03, Ventral DMN.07, RECN.02, and RECN.04, where DMN denotes Default Mode Network; RECN Right Executive Control network; LECN Left Executive Control network (Supplementary Fig. S2). Based on Automated Anatomical Labeling (AAL), the hub of this network is identified as the right angular gyrus (AG) while the remainder of brain areas are ACC (Anterior Cingulate Cortex). R.L, Angular.L, Calcarine.R.L, Occip.Mid.L, Frontal.Med.Orb.R.L., Frontal.Mid.R.L, Frontal.Mid.Orb.L, Frontal.Sup.R.L, Frontal.Sup.Medial.R.L, PCC (Posterior Cingulate Cortex). R.L, Precuneus.L, and Ventral PCC.R.L (Supplementary Table S5). The star topology structure of this network is visualized in Fig. 5, which shows the key role of AG in relevant default mode networks.

Figure 4 Visualization of view 10. For numerical features in Panel (a) and categorical features in Panel (b). The horizontal axis denotes features while the vertical axis denotes subjects. Depressive subjects are indicated by hyphens. Both subjects and features are sorted in the order of the subject- and feature-cluster indices. Subject clusters are named C1, C2, D1, D2, and D3 from top to bottom, while for numerical feature clusters, F1, F2, F3, F4, F5 from left to right. Subjects within a subject cluster are sorted in the order of control and depressive subjects. The color in a cell denotes the corresponding value of the data matrix indicated in the color bar where the missing entries are in gray. Full size image

Figure 5 Structure of functional connectivity in feature clusters F2 (in bold line) and F4 (in dashed line). Color corresponds to intrinsic connectivity networks. Relevant brain areas for network nodes are based on Automated Anatomical Labeling (AAL). DMN denotes default mode network; RECN right executive control network; LECN left executive control network. Full size image

In the cluster analysis of view 10, it is important to note that view 10 contains after-six-week scores such as BDI. Since view 10 also includes other features that are available before onset of treatment, this raises the possibility of prediction of treatment outcome prior to treatment. In this regard, we explore several important implications drawn from the results of cluster analysis. First, subject clusters can be represented by a small number of relevant features. Since in our clustering method each feature cluster consists of similar (i.e., highly correlated) features, a feature cluster can be represented by a reduced number of these features. It turns out that a subject cluster solution in a view can be represented by a small number of features associated with each feature cluster. View 10 (Fig. 4a) is represented by CATS scores (associated to feature cluster F1), the first principal scores of angular-gyrus related FC (associated to F2; we simply refer to it ‘AG related FC score’), and BDI (associated to F3). These features can indeed explain the resultant subject cluster membership properly (Fig. 6a). Here, we do not consider features in F4 and F5, because these features are strongly correlated with those in F2 and F3, respectively. Based on distributions of these scores in each subject cluster (Fig. 6b–e), we can characterize subject clusters as follows: subject cluster D1 by high CATS, high FC, low BDI and high BDI6w; subject cluster D2 by low CATS, moderate FC, low BDI and low BDI6w; subject cluster D3 by high CATS, low FC, high BDI and low BDI6w (Supplementary Table S6), where we also include the after-six-week BDI (BDI6w) score associated with F1. This characterization provides a basis for predicting remission of SSRI treatment that is measured by the after-six-week BDI score, because for subject cluster D1, the after-six-week BDI score is high; hence, a subject in D1 is unlikely to remit while a subject in D2 or D3 is likely to remit.

Figure 6 Characterization of D1, D2 and D3. Panel (a): Results of prediction of subject cluster memberships via the leave-one-out cross validation. The y-axis denotes the proportion of correct predictions of subject cluster membership: true cluster memberships are those of view 10, while predicted cluster memberships are those yielded based on finite mixture models via a leave-one-out cross validation. We carried out the leave-one-out cross validation as follows. First, we remove a particular depression subject from the data. Second, using the reduced data, we re-estimate parameters in Gaussian mixture models for specific predictors. For this parameter estimation, we use subject cluster memberships on D1, D2 and D3. Third, based on the obtained Gaussian mixture model, we estimate a cluster membership of the removed subject, which enables us to evaluate whether the model can correctly predict the cluster membership of the removed subject. Lastly, we repeat these steps for all depressive subjects, evaluating a proportion of correct predictions on cluster memberships of the depression subjects. For predictors (i.e., selected features for mixture models) in the x-axis, we started with CATS, subsequently adding functional connectivity in feature clusters F2 and F4, BDI and the remainder of features in view 10. Panels (b)–(e): Boxplots for each subject cluster in view 10. Panel (b) for CATS scores; Panel (c) for the first principal scores for FC features in feature cluster F2; Panel (d) for BDI scores; Panel (e) for after-six-week BDI scores. Full size image

Finally, we investigate the remainder of views. We have identified non-FC features that discriminate between subject clusters based on a statistical test for individual features (features in bold in Table 2). From these results, significant correspondences between non-FC features and brain areas are found in view 4, view 8, and view 12 (Supplementary Table S7). Note that significant correspondence is also observed in view 10). These correspondences help us to interpret these views. First, view 4 includes depression items related to BDI, MINI1, and MINI3, while it also includes FC-features in which the three dominant brain areas are Language.02 (Temporal.Mid.L, in AAL), Ventral DMN.04 (Occipital.Mid.L), and Language.05 (Frontal.Inf.Orb.R) with 25, 22, and 21 connectivity, respectively. Second, view 8 includes features BDI_t1_17 and HRDS_t2_15. These items in clinical questionnaires are related to fatigue and hypochondriasis, respectively. The dominant brain area is Dorsal DMN.02 (Angular gyrus.R). Third, view 12 includes Episode and RecNum, which are related to the extent of repetitions of MDD. The dominant brain areas are Language.04 (Temporal.Mid.L), Dorsal DMN.01 (Cingulum.Ant, Frontal.Sup.Medial), and RECN.04 (Frontal.Sup.Medial).

We have so far analyzed subject cluster solutions one by one. One may wonder if it is possible to integrate all these cluster solutions, yielding a single cluster solution, which may exhibit distinct demographic, clinical and end-phenotypic characteristics. One possible way of such analysis is to carry out hierarchical clustering, using Hamming distance defined as the percentage of views in which subject cluster memberships differ (Supplementary Fig. S3). However, in the present study, we were not able to characterize the yielded clusters in a meaningful manner. Possibly, by integrating cluster solutions, useful information on cluster solutions might be lost.