Background

According to international diagnostic guidelines, high-resolution CT plays a central part in the diagnosis of fibrotic lung disease. In the correct clinical context, when high-resolution CT appearances are those of usual interstitial pneumonia, a diagnosis of idiopathic pulmonary fibrosis can be made without surgical lung biopsy. We investigated the use of a deep learning algorithm to provide automated classification of fibrotic lung disease on high-resolution CT according to criteria specified in two international diagnostic guideline statements: the 2011 American Thoracic Society (ATS)/European Respiratory Society (ERS)/Japanese Respiratory Society (JRS)/Latin American Thoracic Association (ALAT) guidelines for diagnosis and management of idiopathic pulmonary fibrosis and the Fleischner Society diagnostic criteria for idiopathic pulmonary fibrosis.

Methods

In this case-cohort study, a database of 1157 anonymised high-resolution CT scans showing evidence of diffuse fibrotic lung disease was generated from two institutions for algorithm development and testing. We separated the scans into three non-overlapping cohorts (training set, n=929; validation set, n=89; and test set A, n=139) and classified them using 2011 ATS/ERS/JRS/ALAT idiopathic pulmonary fibrosis diagnostic guidelines. For each scan, the lungs were segmented and resampled to create a maximum of 500 unique four-slice combinations, which we converted into image montages. The final training dataset consisted of 420 096 unique montages. We evaluated algorithm performance, reported as accuracy, prognostic accuracy, and weighted κ coefficient (κw) of interobserver agreement, on test set A and on a cohort of 150 high-resolution CT scans with fibrotic lung disease (test set B), compared with the majority vote of 91 specialist thoracic radiologists drawn from multiple international thoracic imaging societies. We then reclassified high-resolution CT scans according to Fleischner Society diagnostic criteria for idiopathic pulmonary fibrosis. We retrained the algorithm using these criteria and evaluated its performance on 75 fibrotic lung disease-specific high-resolution CT scans, compared with four specialist thoracic radiologists, using the weighted κ coefficient of interobserver agreement.
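
The montage step described above might be sketched as follows. This is a minimal illustration and not the study's implementation: the function name `sample_montages`, the random sampling strategy, the 2×2 tiling, and the `(n_slices, height, width)` array layout are all assumptions. Only the cap of 500 unique four-slice combinations per scan comes from the text.

```python
from math import comb

import numpy as np


def sample_montages(volume, max_montages=500, seed=0):
    """Draw unique four-slice combinations from one segmented scan.

    volume: array of shape (n_slices, height, width); assumed layout.
    Returns (indices, montages), where each montage tiles four slices
    in a 2x2 grid (an assumed layout, chosen for illustration).
    """
    n_slices = volume.shape[0]
    if n_slices < 4:
        raise ValueError("need at least four slices to build a montage")

    rng = np.random.default_rng(seed)
    # Cannot draw more unique combinations than C(n_slices, 4).
    limit = min(max_montages, comb(n_slices, 4))

    seen = set()
    indices, montages = [], []
    while len(indices) < limit:
        # Sort so that the same four slices always map to the same key.
        idx = tuple(sorted(rng.choice(n_slices, size=4, replace=False).tolist()))
        if idx in seen:
            continue  # reject duplicates to keep combinations unique
        seen.add(idx)
        a, b, c, d = (volume[i] for i in idx)
        montages.append(np.block([[a, b], [c, d]]))
        indices.append(idx)
    return indices, np.stack(montages)
```

With a hypothetical scan of 40 slices, `sample_montages(volume)` would return 500 index tuples and an array of 500 montages, each twice the height and width of a single slice.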

Findings

The accuracy of the algorithm on test set A was 76·4%, with 92·7% of diagnoses within one category. The algorithm took 2·31 s to evaluate 150 four-slice montages (each montage representing a single case from test set B). The median accuracy of the thoracic radiologists on test set B was 70·7% (IQR 65·3–74·7), and the accuracy of the algorithm was 73·3% (93·3% were within one category), outperforming 60 (66%) of 91 thoracic radiologists. Median interobserver agreement between each of the thoracic radiologists and the radiologists' majority opinion was good (κw=0·67 [IQR 0·58–0·72]). Interobserver agreement between the algorithm and the radiologists' majority opinion was good (κw=0·69), outperforming 56 (62%) of 91 thoracic radiologists. The algorithm provided equivalent prognostic discrimination between usual interstitial pneumonia and non-usual interstitial pneumonia diagnoses (hazard ratio 2·88, 95% CI 1·79–4·61, p<0·0001) compared with the majority opinion of the thoracic radiologists (2·74, 1·67–4·48, p<0·0001). For Fleischner Society high-resolution CT criteria for usual interstitial pneumonia, median interobserver agreement between the radiologists was moderate (κw=0·56 [IQR 0·55–0·58]), but was good between the algorithm and the radiologists (κw=0·64 [0·55–0·72]).
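
The agreement metrics reported above (accuracy, proportion within one category, and weighted κ) could be computed along the following lines. This is a sketch under stated assumptions: diagnostic categories are encoded as ordinal integers 0 to k−1 (a hypothetical coding), and linear disagreement weights are used by default, because the weighting scheme is not stated here. The function names are illustrative only.

```python
import numpy as np


def accuracy_within(pred, ref, tolerance=0):
    """Fraction of cases where |pred - ref| <= tolerance.

    tolerance=0 gives plain accuracy; tolerance=1 gives the share of
    diagnoses within one category of the reference.
    """
    pred, ref = np.asarray(pred), np.asarray(ref)
    return float(np.mean(np.abs(pred - ref) <= tolerance))


def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    power=1 uses linear weights, power=2 quadratic weights; the choice
    is an assumption, not taken from the study. The result is undefined
    if both raters assign every case to the same single category.
    """
    observed = np.zeros((n_categories, n_categories))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1
    observed /= observed.sum()

    # Agreement expected by chance, from the two raters' marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Disagreement weights grow with the distance between categories.
    i, j = np.indices((n_categories, n_categories))
    weights = (np.abs(i - j) / (n_categories - 1)) ** power

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Comparing the algorithm with the majority opinion would then amount to calling `weighted_kappa(algorithm_labels, majority_labels, n_categories)`, with both label lists being hypothetical inputs.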

Interpretation

High-resolution CT evaluation by a deep learning algorithm might provide low-cost, reproducible, near-instantaneous classification of fibrotic lung disease with human-level accuracy. These methods could be of benefit to centres at which thoracic imaging expertise is scarce, as well as for stratification of patients in clinical trials.

Funding

None.