Patient characteristics and fully automated image features

We obtained 2,186 haematoxylin and eosin (H&E) stained whole-slide histopathology images from The Cancer Genome Atlas (TCGA)37,38, encompassing lung adenocarcinoma and lung squamous cell carcinoma as well as adjacent benign tissue. All images captured at × 40 magnification were tiled with open microscopy environment tools39. To target regions with pathological changes, our automated pipeline skipped regions with relatively sparse cellularity such as alveolar spaces and selected the 10 densest tiles per image for further analysis. We also acquired 294 tissue microarray images from the Stanford Tissue Microarray (TMA) Database40, with one representative histopathology image selected by pathologists for each of the 227 lung adenocarcinoma and 67 lung squamous cell carcinoma patients. Patient characteristics of both the TCGA and TMA cohorts are summarized in Tables 1 and 2, respectively.

Table 1 Patient characteristics of TCGA cohort. Full size table

Table 2 Patient characteristics of the TMA cohort. Full size table

To extract objective morphological information from thousands of images, we built a fully automated image-segmentation pipeline to identify the tumour nuclei and tumour cytoplasm from the histopathology images using the Otsu method41 (see Methods for details), and extracted quantitative features from the identified tumour nuclei and cytoplasm (Supplementary Fig. 1). Our fully automated pipeline reliably identified most tumour cells and tumour nuclei, and the results were consistent across different slides and images from different batches (Supplementary Fig. 2). A total of 9,879 quantitative features were extracted from each image tile with CellProfiler42,43. Types of image features included cell size, shape, distribution of pixel intensity in the cells and nuclei, as well as texture of the cells and nuclei. Supplementary Table 1 provides a list of feature categories included in this study.

Image features accurately identify tumour parts

To determine if the quantitative image features were biologically relevant, we first examined if they could distinguish malignancy from normal adjacent tissue (inflammation, atelectasis or lymphocytic infiltration in the absence of tumour cells) for the TCGA cohort. We used seven classifiers: naive Bayes, support vector machines (SVM) with Gaussian kernel, SVM with linear kernel, SVM with polynomial kernel, bagging for classification trees, random forest utilizing conditional inference trees44 and Breiman's random forest45. The TCGA data set was randomly partitioned into distinct training and test set, with models built and optimized through the training data and classification performance evaluated through the test set. This process was repeated 20 times to ensure the robustness of our classifiers. Our classifiers achieved an average area under the receiver operating characteristic curve (AUC) of 0.81 (best classifiers: SVM with Gaussian kernel, random forest utilizing conditional inference trees, and Breiman's random forest (AUC=0.85). The performance of these three classifiers did not differ significantly (analysis of variance (ANOVA) test P value=0.8514)) in distinguishing between adenocarcinoma and adjacent dense benign tissue when using the top 80 quantitative features (Fig. 1a and Supplementary Table 2). When classifying squamous cell carcinoma with adjacent benign tissue, the AUCs of our classifiers with 80 features were >0.85 (Fig. 1b and Supplementary Table 2). The performance of the top three classifiers did not differ much (ANOVA test P value=0.31). In general, the top quantitative features were Haralick features of the nuclei (sum variance, difference variance, correlation coefficient of adjacent pixels), radial distribution of pixel intensity and intensity mass displacement of the cytoplasm.

Figure 1: Quantitative image features accurately distinguished malignancies from adjacent dense normal tissues. (a) ROC curves for classifying lung adenocarcinoma versus adjacent dense normal tissues in the TCGA test set. Classifiers with 80 features attained average AUC of 0.81. (b) ROC curves for classifying lung squamous cell carcinoma from adjacent dense normal tissues in the TCGA test set. Classifiers with 80 features attained average AUC of 0.85. The performance of different classifiers is shown. CIT, conditional inference trees; ROC, receiver operator characteristics. Full size image

Image features distinguish tumour types in both cohorts

To further validate the biological relevance of the quantitative features, we applied our classifiers to distinguish between adenocarcinoma and squamous cell carcinoma using the same set of fully automated features in both TCGA and TMA data sets. Our results showed that using 240 features selected by their utility in this task (assessed through the information gain ratio measurement), our best classifiers, including SVMs with Gaussian kernel and random forest classifiers, attained an AUC of above 0.75 in the TCGA data set (average of all classifiers: 0.72; Fig. 2a and Supplementary Table 3). The performance of the top classifiers did not differ significantly (ANOVA test P value=0.08). The top quantitative features selected by information gain ratio included Haralick texture features of the nuclei (sum entropy, InfoMeas1, difference variance, angular second moment), edge intensity of the nuclei, texture features of the cytoplasm and intensity distribution of the cytoplasm. Some of the feature groups overlapped with those that were used to distinguish between benign and malignant lesions. For instance, Haralick texture features such as sum entropy and difference variance were among the top features in both classification tasks.

Figure 2: Quantitative image features successfully distinguished histopathology images of lung adenocarcinoma from those of lung squamous cell carcinoma. (a) ROC curves for classifying the two malignancies in the TCGA test set. Most classifiers achieved AUC>0.7. (b) ROC curves for classifying the two malignancies in the TMA test set. Most classifiers achieved AUC >0.75, indicating that our informatics pipeline was successfully validated in the independent TMA data set. The performance of different classifiers is shown. CIT, conditional inference trees; ROC, receiver operator characteristics. Full size image

The relevance of our quantitative image features for diagnostic classification was also validated in the TMA data set. Utilizing the same informatics pipeline on these samples, most of the classifiers achieved AUC around 0.78 (SVM with Gaussian kernel has the highest AUC of 0.85; the performance of the top three classifiers did not differ significantly (ANOVA test P value=0.13)), indicating the robustness of our informatics method (Fig. 2b and Supplementary Table 3). The slightly higher AUC in the TMA samples relative to the TCGA samples may be due to the manual selection of representative views by the pathologist, whereas the entire slide was used for the TCGA samples. The top quantitative features included texture features in the tumour nucleus and cytoplasm, and radial distribution of pixel intensity.

Image features predict stage I adenocarcinoma survival

We next investigated the prognostic values of our quantitative feature sets. Stage I adenocarcinoma patients are known to have diverse survival outcomes (Fig. 3a). In the TCGA cohort, more than 50% of the stage I adenocarcinoma patients died within 5 years after the initial diagnosis, whereas ∼15% of the patients survived for more than 10 years. A number of studies aimed to distinguish patients with different survival outcomes with additional visual patterns22,23. However, non-systematic errors may take place using these subjective assessments, and these visual evaluations are hard to standardize26,27,28. It is thus difficult for human evaluators to predict survival outcomes based purely on the H&E stained microscopic slides4,18. Although higher tumour grade is thought to be associated with poorer survival outcomes14, this association is weak in patients with stage I lung adenocarcinoma in both TCGA and TMA data sets (log-rank test P value>0.05; Fig. 3b).

Figure 3: Quantitative image features predicted the survival outcomes of stage I lung adenocarcinoma patients. (a) Kaplan–Meier curves of lung adenocarcinoma patients stratified by tumour stage. Patients with higher stages tended to have worse prognosis (log-rank test P value <0.001 in TCGA data set, log-rank test P=0.0068 in TMA data set). However, the survival outcomes varied widely. (left: TCGA data set, right: TMA data set). (b) Kaplan–Meier curves of stage I lung adenocarcinoma patients stratified by tumour grade. Tumour grade did not significantly correlate with survival (left: TCGA data set, log-rank test P value=0.06; right: TMA data set, log-rank test P value=0.0502). (c) Kaplan–Meier curves of stage I lung adenocarcinoma patients stratified using quantitative image features. Image features predicted the survival outcomes. Elastic net-Cox proportional hazards model categorized patients into two prognostic groups, with a statistically significant difference in their survival outcomes in the TCGA test set (log-rank test P value=0.0023). (d) The same classification workflow was validated in the TMA data set, with comparable prediction performance. (log-rank test P value=0.028). (e) Sample image of stage I adenocarcinoma with long survival. This patient suffered from stage IB, grade 3 lung adenocarcinoma, and survived more than 99 months after diagnosis. Our classifier correctly predicted the patient as a long survivor. (f) Sample image of stage I adenocarcinoma with short survival. This patient suffered from stage IB, grade 3 lung adenocarcinoma, and survived less than 12 months after diagnosis. Our classifier correctly predicted the patient as a short survivor. Full size image

With an aim to provide better prognostic prediction with the H&E slides, we investigated whether our quantitative features could predict survival in stage I patients. We built elastic net-Cox proportional hazards models46 to select the most informative quantitative image features and calculated survival indices derived from H&E stained microscopic pathology images (see Methods). Patients were categorized into longer-term or shorter-term survivors based on their survival indices. Our model successfully distinguished shorter-term survivors from longer-term survivors in the test set (log-rank test P value=0.0023; Fig. 3c). Among the 60 image features selected by our methods, the top features that facilitated classification of survival outcomes included texture of the nuclei, Zernike shape decomposition of the nuclei, and Zernike shape decomposition of the cytoplasm (Supplementary Data 1).

Our approach for survival prediction was validated with images from an independent data set (the Stanford TMA database). The same image processing workflow with elastic net-Cox proportional hazards model selected a similar set of features, which also successfully distinguished longer-term survivors from shorter-term survivors in the stage I adenocarcinoma cohort (log-rank test P value=0.028; Fig. 3d). The patients in different survival groups did not have significantly different treatments (χ2-test P value>0.9 for neoadjuvant chemotherapy, radiation therapy and targeted molecular therapy).

Figure 3e,f show some examples of histopathology images from stage I lung adenocarcinoma patients with the same pathology grade, but with different survival outcomes. The differences in tumour cell morphology between the two histopathology images were not easily identified by visual inspection, but could be distinguished based on our quantitative image features. These quantitative features proved to be useful in predicting survival outcomes of stage I adenocarcinoma patients.

Image features predict squamous cell carcinoma survival

Stage and grade alone only have limited predictive values in stratifying survival outcomes in patients with squamous cell carcinoma (log-rank test P value>0.2; Fig. 4a,b)19. To validate the generalizability of our survival prediction method to other lung cancers, we utilized similar informatics workflow incorporating image features and tumour stage to build prediction models in squamous cell carcinoma based on our quantitative image features. Our elastic net models selected 15 features and classified patients into different survival groups (log-rank test P value=0.023; Fig. 4c). Features most indicative of survival outcomes included Zernike shape in the tumour nuclei and cytoplasm (Supplementary Data 2).

Figure 4: Quantitative image features predicted the survival outcomes of lung squamous cell carcinoma patients. (a) Kaplan–Meier curves of lung squamous cell carcinoma patients stratified by tumour stage. Although patients with higher stages generally have worse outcomes, the trend was not statistically significant (left: TCGA data set, log-rank test P value=0.216; right: TMA data set, log-rank test P value=0.388). (b) Kaplan–Meier curves of stage I lung squamous cell carcinoma patients stratified by tumour grade. Tumour grade did not significantly correlate with survival. (left: TCGA data set, log-rank test P value=0.847; right: TMA data set, log-rank test P value=0.964). (c) Kaplan–Meier curves of lung squamous cell carcinoma patients stratified using quantitative image features. The image features predicted the survival outcomes. Elastic net-Cox proportional hazards model categorized patients into two prognostic groups, with a statistically significant difference in their survival in the TCGA test set (log-rank test P value=0.023). (d) The same classification workflow was validated in the TMA data set, with comparable prediction performance. (log-rank test P value=0.035). (e) Sample image of lung squamous cell carcinoma in a patient with long survival. This patient suffered from stage I, grade 1 lung squamous cell carcinoma, and survived more than 70 months after diagnosis. Our classifier correctly predicted the patient as a long survivor. (f) Sample image of squamous cell carcinoma in a patient with short survival. This patient suffered from stage I, grade 1 lung squamous cell carcinoma, and only survived 12.4 months after diagnosis. Our classifier correctly predicted the patient as a short survivor. Full size image

Our prognostic methodology for squamous cell carcinoma was also confirmed in the independent Stanford TMA cohort. Elastic net-Cox proportional hazards model successfully distinguished longer-term survivors from shorter-term survivors with lung squamous cell carcinoma (log-rank test P value=0.035; Fig. 4d). The patients in different survival groups did not have significantly different treatments (χ2-test P value>0.71 for neoadjuvant chemotherapy, radiation therapy and targeted molecular therapy). Similarly, Zernike shape, texture and radial distribution of intensity were among the top prediction features. Figure 4e,f shows examples of histopathology images from squamous cell carcinoma patients with the same pathology stage and grade, but with different survival outcomes. As with lung adenocarcinoma, the visual features associated with survival outcomes of lung squamous carcinoma were not well established24,25, but our methodology could quantify some of the pathology patterns predictive of patient survival.