An artificially intelligent nanoarray based on chemiresistive layers of molecularly modified gold nanoparticles and a random network of single-wall carbon nanotubes was designed and fabricated (detailed information can be found in the Experimental Methods). The inorganic nanomaterials in these sensors (i.e., gold nanoparticles or single-wall carbon nanotubes) provide the electrical conductivity, whereas the organic layer functions as the sensing layer (recognition element) for adsorbed VOCs. (7, 36, 58) The sensor response can arise from one or a combination of the following mechanisms. Sorption of VOCs into the organic film affects electron tunneling through reversible swelling or aggregation of the layer, which increases or decreases the interparticle distance and thereby increases or decreases, respectively, the electrical resistance of the film. (6, 36, 58) Another possible mechanism, which involves no steric change within the sensing layer, is charge transfer to or from the inorganic nanomaterial upon exposure to VOCs. (6, 36, 58) In addition, the dielectric constant of the organic layer can change significantly when the dielectric constant of the sorbed vapor differs markedly from it. (6, 36, 58) When the sorbed vapor has a higher dielectric constant than the organic layer (e.g., water, methanol), the permittivity of the organic matrix surrounding the metal cores increases. The tunneling activation energy then decreases, leading to a decrease in the electrical resistance of the sensing film. Conversely, a sorbed vapor with a lower dielectric constant (e.g., toluene, n-hexane) increases the resistance. (6, 36, 58) The chemical diversity of both the conductive inorganic nanomaterials and the organic layers causes the sensors to respond differently to breath VOCs, creating unique fingerprints in the resistance changes.
The conductive inorganic nanomaterials and organic layers can also be selected to tailor the array to a desired sensing application.
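The two transduction routes described above (a change in interparticle distance and a change in the permittivity of the organic matrix) can be illustrated with a simple activated-tunneling model for a film of ligand-capped metal cores. The Python sketch below is illustrative only and is not the model used in this study: the functional form R = R0·exp(βδ)·exp(Ea/kBT), the charging-energy expression for Ea, and all parameter values (core radius, gap δ, decay constant β, dielectric constants) are assumptions.

```python
import math

# Physical constants (SI units)
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
K_B = 1.380649e-23          # Boltzmann constant, J/K


def activation_energy(radius_m, gap_m, eps_r):
    """Assumed charging energy for hopping between neighboring metal cores:
    Ea = e^2 / (8*pi*eps0*eps_r) * (1/r - 1/(r + gap)).
    A higher relative permittivity eps_r lowers Ea."""
    prefactor = E_CHARGE**2 / (8 * math.pi * EPS0 * eps_r)
    return prefactor * (1 / radius_m - 1 / (radius_m + gap_m))


def film_resistance(radius_m, gap_m, eps_r, beta_per_m=1.0e10, temp_k=298.0, r0_ohm=1.0):
    """Assumed activated-tunneling model: R = R0 * exp(beta*gap) * exp(Ea / (kB*T))."""
    ea = activation_energy(radius_m, gap_m, eps_r)
    return r0_ohm * math.exp(beta_per_m * gap_m) * math.exp(ea / (K_B * temp_k))


def relative_response(baseline_ohm, exposed_ohm):
    """Relative resistance change (R - Rb) / Rb, the chemiresistor read-out."""
    return (exposed_ohm - baseline_ohm) / baseline_ohm


# Hypothetical film: 2.5 nm core radius, 1.0 nm ligand gap, organic layer eps_r = 3
radius, gap, eps_film = 2.5e-9, 1.0e-9, 3.0
r_base = film_resistance(radius, gap, eps_film)

# Route 1: swelling widens the gap by 5% with eps_r unchanged
swelling = relative_response(r_base, film_resistance(radius, gap * 1.05, eps_film))

# Route 2: no steric change, but a polar vapor raises the effective eps_r to 4
polar = relative_response(r_base, film_resistance(radius, gap, 4.0))

print(f"swelling: {swelling:+.3f}")  # positive: resistance increases
print(f"polar vapor: {polar:+.3f}")  # negative: resistance decreases
```

With these assumed numbers, widening the gap gives a positive relative response and raising the dielectric constant gives a negative one, matching the qualitative trends described above. The magnitudes carry no quantitative meaning.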

In conjunction with artificial intelligence methods, the nanoarray was used for a meta-analysis of several groups of subjects under real-world conditions, each group manifesting a specific health condition (Figure 1). The analysis was carried out on breath samples collected in a controlled manner between January 2011 and June 2014 from 1404 eligible subjects at 14 departments in nine clinical centers in five countries (Israel, France, USA, Latvia, and China). Of the subjects, 813 were patients diagnosed with one of the following 17 diseases: chronic kidney failure (CKD), idiopathic Parkinson’s disease (IPD), atypical Parkinsonism (PDISM), multiple sclerosis (MS), Crohn’s disease (CD), ulcerative colitis (UC), irritable bowel syndrome (IBS), pulmonary arterial hypertension (PAH), pre-eclampsia in pregnant women (PET), head and neck cancer (HNC), lung cancer (LC), colorectal cancer (CRC), bladder cancer (BC), kidney cancer (KC), prostate cancer (PC), gastric cancer (GC), and ovarian cancer (OC). Some of these diseases are not clinically correlated (e.g., pre-eclampsia and Parkinson’s disease) and can therefore serve as a model for evaluating the performance of the artificially intelligent nanoarray in disease diagnosis without interference from clinically confounding factors. The other diseases are clinically correlated with one another (e.g., lung cancer and pulmonary arterial hypertension; colorectal cancer and Crohn’s disease) and can therefore serve as a model for evaluating diagnostic performance in the presence of clinically confounding factors. Most of the diseases can be categorized as cancerous, inflammatory, neurological, or independent diseases. Therefore, comparisons between and within these categories can evaluate the clinical classification ability of the nanoarray.
The mean age of the patient groups was 55 ± 10 years; 423 (52%) of the patients were male, and 296 (36%) were active smokers. Breath samples were also collected from 591 control subjects enrolled concurrently with the patients at each site. The mean age of the control population was 52 ± 8 years; 257 (43%) were male, and 134 (23%) were active smokers. The demographic characteristics of all tested patients and controls are summarized in Table 1. All samples were collected according to the Helsinki ethics protocol issued for the study at each of the collaborating institutes, after signed consent was obtained from each subject. Detailed information on the clinical design and the inclusion and exclusion criteria can be found in the Experimental Methods and Supporting Information (SI), section 1.1, including Tables S1–S12. Detailed information on breath collection and analysis is also provided in the Experimental Methods.
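The cohort totals and percentages quoted above can be re-derived from the raw counts. The short Python check below uses only the numbers given in this paragraph; rounding to the nearest integer percent is assumed to match the reporting convention.

```python
# Counts reported in the text
patients = {"n": 813, "male": 423, "smokers": 296}
controls = {"n": 591, "male": 257, "smokers": 134}


def percent(part, whole):
    """Percentage rounded to the nearest integer (assumed reporting convention)."""
    return round(100 * part / whole)


summary = {
    "total_subjects": patients["n"] + controls["n"],
    "patients_male_pct": percent(patients["male"], patients["n"]),
    "patients_smoker_pct": percent(patients["smokers"], patients["n"]),
    "controls_male_pct": percent(controls["male"], controls["n"]),
    "controls_smoker_pct": percent(controls["smokers"], controls["n"]),
}
print(summary)
# {'total_subjects': 1404, 'patients_male_pct': 52, 'patients_smoker_pct': 36,
#  'controls_male_pct': 43, 'controls_smoker_pct': 23}
```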

Breath Analysis with the Artificially Intelligent Nanoarray

During exposure to breath samples, interaction between the VOCs and the organic sensing layer changes the electrical resistance of the sensors. The resistance recovers to its baseline almost immediately after the exposure ends. Particular care was taken to ensure the stability of the sensors and sensing features, with no (or minimal) drift, over the entire study period. The sensors proved highly stable, with negligible drift and fluctuation during the study (see SI, section 2.1 and Figures S1–S3). This finding rules out the possibility that the discrimination between different diseases was caused by sensor drift. From each sensing response, four numerical sensing features (SFs) were read out: the relative change in the sensor’s resistance at the peak (beginning), middle, and end of the exposure, and the area under the curve of the whole measurement. In total, 59 eligible and stable SFs (SF-01–SF-59) were used for the statistical analysis (Figure 2). For details regarding each of the sensing features, see SI, Table S13. Figure 2 shows that some sensors were more sensitive than others to the differences between the VOC patterns of the different disease populations. For example, SF-43 (Figure 2, black arrow) proved highly discriminative between head and neck cancer and other cancerous diseases, but not between inflammatory bowel disease and other internal (noncancerous) diseases (see SI, Figure S4).
Other sensors were less sensitive, as indicated by a wide overlap in their responses to breath samples from different diseases; SF-29 (Figure 2, blue arrow) is one example. Before the statistical analysis, multiple linear regression models were explored to examine and stratify the effects of possible confounding factors: sex, age, smoking status, and location of the sampling site. This analysis showed that, of the 59 sensing features, 39 were correlated with age and/or smoking, most of them negatively (i.e., lower signals were obtained from older smokers). Three of the 59 sensing features were correlated with sex. None of the sensing features were correlated with geographical location. The raw data were stratified and adjusted using these linear correlations, and a second regression model was applied to confirm that the correction was effective. For the detailed statistical analysis, refer to the Experimental Methods.
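The confounder correction described above can be sketched as residualization: a sensing feature is regressed on the covariates, the fitted covariate effect is subtracted, and a second regression on the adjusted values should then show no remaining association. This is a minimal Python sketch under stated assumptions; the exact stratification procedure is described in the Experimental Methods, and the covariate coding (age in years, 0/1 indicators for male sex and smoking) and the toy values are hypothetical. Sampling-site indicators could be appended as further 0/1 columns in the same way.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(covariates, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in covariates]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def adjust_for_confounders(covariates, y):
    """Subtract the fitted covariate effects, keeping the overall mean of y."""
    coef = ols(covariates, y)
    mean_y = sum(y) / len(y)
    adjusted = []
    for row, yi in zip(covariates, y):
        fitted = coef[0] + sum(b * x for b, x in zip(coef[1:], row))
        adjusted.append(yi - fitted + mean_y)
    return adjusted


# Hypothetical subjects: (age in years, male = 1, smoker = 1)
subjects = [(45, 1, 0), (60, 0, 1), (52, 1, 1), (38, 0, 0),
            (70, 1, 0), (49, 0, 1), (66, 0, 0), (55, 1, 1)]
noise = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, -0.005, 0.0]
# Toy sensing feature with negative age and smoking effects
feature = [1.0 - 0.01 * age - 0.2 * smoker + e
           for (age, male, smoker), e in zip(subjects, noise)]

before = ols(subjects, feature)
adjusted = adjust_for_confounders(subjects, feature)
after = ols(subjects, adjusted)  # second regression: covariate slopes should vanish
print([round(b, 4) for b in before[1:]], [round(b, 10) for b in after[1:]])
```

Because ordinary least-squares residuals are orthogonal to every covariate column, the second regression returns slopes of (numerically) zero, which is the property the confirmatory regression in the text checks for.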

Figure 2. Heat map of 59 stable sensing features extracted from 20 different nanomaterial-based sensors on the artificially intelligent nanoarray. Each row in the heat map represents the mean responses for each of the 17 diseases tested. Some sensing features (SFs) were more sensitive than others to differences in the breath VOCs. No individual sensing feature was sufficiently informative to discriminate among all the diseases, but the overall response patterns (columns in the heat map) had discriminative potential. For details regarding each of the measured sensing features, see SI, Table S13.
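Each of the 20 sensors contributes up to four sensing features (peak, middle, end, and area under the curve), i.e., up to 80 candidate features, of which 59 were retained as stable. The Python sketch below shows one way to read the four features from a single sampled response. The function name extract_sensing_features, the sample-index exposure window, and the definition of the peak as the largest absolute relative change during exposure are assumptions made for illustration, not the study's implementation (the actual feature definitions are in SI, Table S13).

```python
def relative_change(resistance, baseline):
    """Relative resistance change (R - R0) / R0 for every sample."""
    return [(r - baseline) / baseline for r in resistance]


def trapezoid_area(times, values):
    """Area under a sampled curve by the trapezoidal rule."""
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))


def extract_sensing_features(times, resistance, baseline, start, end):
    """Hypothetical read-out of four sensing features from one response.

    start and end are the (inclusive) sample indices of the breath exposure.
    Assumed definitions: 'peak' is the largest absolute relative change
    during exposure, 'middle' and 'end' are taken at the midpoint and last
    sample of the exposure, and 'area' spans the whole measurement.
    """
    rel = relative_change(resistance, baseline)
    window = rel[start:end + 1]
    middle_index = start + (end - start) // 2
    return {
        "peak": max(window, key=abs),
        "middle": rel[middle_index],
        "end": rel[end],
        "area": trapezoid_area(times, rel),
    }


# Toy response: 100-ohm baseline, exposure from sample 2 to sample 8
times = list(range(11))  # seconds (hypothetical sampling)
resistance = [100, 100, 104, 103, 102.5, 102, 101.8, 101.6, 101.5, 100.2, 100]
features = extract_sensing_features(times, resistance, 100.0, 2, 8)
print(features)
```

Using the absolute value to locate the peak keeps the same definition valid for sensors whose resistance decreases on exposure (for example, with high-permittivity vapors).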

To semiquantify the differences seen in the columns of Figure 2, combinations of sensitive sensors were used to create a series of binary discriminant factor analysis (DFA) classifiers (see Experimental Methods) to obtain disease breath signatures that allow the different diseases to be distinguished. To ensure valid results free from artifacts or overfitting, the data set of each analysis was divided into training and validation sets: 77% of each group was randomly selected for the training set, and the remaining 23% were withheld as blind samples. The DFA classifier consisted of 120 binary models, each discriminating a pair of diseases. To ensure a uniform sample size, 30 randomly chosen samples from each group were used for this analysis. For each binary classifier, 46 samples were used as a training set (23 samples for each of the two compared diseases), and 14 randomly chosen independent samples (7 per group) were classified in a blind manner (prostate cancer samples were excluded from this specific analysis because of the small sample size; n = 11). The accuracy of the blind analysis of each model was calculated as the number of correctly classified samples divided by the total number of independent samples (n = 14). In some cases, the accuracy of discriminating between two groups was low (e.g., 64% in the comparison of gastric vs bladder cancer).
In other cases, 100% accuracy was obtained; this occurred in 13 different comparisons (e.g., lung vs head and neck cancer). The average accuracy of all 120 classifiers was 86%. Figure 3 presents the discriminative power of the nanoarray in terms of the accuracy scored in the blind analysis; for the exact sensitivity, specificity, and accuracy of each comparison, please refer to the SI, Table S14. To test whether the discrimination between the different groups was influenced by bias, possibly caused by the confounding factors of geography and/or methodology, we applied the same classifiers that successfully discriminated among the diseases to the corresponding control groups, which were collected at the same sites under the same conditions and environment. This analysis yielded accuracies ranging from 29% for PAHC vs CC to 86% for TOX vs OC, with an overall average accuracy of 58% (Figure 3). (For the exact sensitivity, specificity, and accuracy of each comparison, see SI, Table S15.) In some cases, two or more diseases shared the same control group, as for (1) Crohn’s disease, ulcerative colitis, and irritable bowel syndrome; (2) kidney and bladder cancer; and (3) idiopathic and atypical Parkinsonism. The control analysis was therefore not applicable in these cases (Figure 3, hatched boxes). In contrast to the high accuracy achieved among diseases (86%), the classification of the control samples was close to random, with an overall accuracy of 58%, ruling out the possibility that the discrimination between diseases was coincidental. In certain comparisons, however, the accuracy obtained for the control subjects was higher than would be expected from arbitrary (random) classification. Overall, these findings indicate that the differences in VOC composition associated with disease are much more pronounced and more significant than the minor differences found among the heterogeneous control groups.
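The blind-validation protocol above can be sketched with a two-class Fisher linear discriminant, used here as a stand-in for the study's DFA classifiers, whose software and settings are not described in this section. The Python sketch assumes synthetic data (two groups of 30 samples with 3 features each), a 23/7 training/blind split per group, equal class priors, and a small ridge term for numerical stability; all helper names are hypothetical. It also confirms that 16 diseases (17 minus prostate cancer) give C(16, 2) = 120 binary classifiers.

```python
import random
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mean_vector(samples):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def fisher_fit(group_a, group_b, ridge=1e-6):
    """Two-class Fisher discriminant: w = Sw^-1 (mean_b - mean_a)."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    d = len(mean_a)
    scatter = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for samples, mean in ((group_a, mean_a), (group_b, mean_b)):
        for s in samples:
            diff = [s[j] - mean[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    scatter[i][j] += diff[i] * diff[j]
    w = solve(scatter, [mb - ma for ma, mb in zip(mean_a, mean_b)])
    # Equal priors: threshold halfway between the projected class means
    threshold = sum(wj * (ma + mb) / 2 for wj, ma, mb in zip(w, mean_a, mean_b))
    return w, threshold


def predicts_b(model, x):
    """True if sample x is assigned to group b."""
    w, threshold = model
    return sum(wj * xj for wj, xj in zip(w, x)) > threshold


def blind_accuracy(model, test_a, test_b):
    """Correctly classified blind samples divided by all blind samples."""
    correct = sum(not predicts_b(model, x) for x in test_a)
    correct += sum(predicts_b(model, x) for x in test_b)
    return correct / (len(test_a) + len(test_b))


def split(samples, n_train, rng):
    """Random training/blind split of one group."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


rng = random.Random(0)
# Synthetic groups: 30 samples x 3 features each, with shifted means
group_a = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)]
group_b = [[rng.gauss(4.0, 1.0) for _ in range(3)] for _ in range(30)]

train_a, blind_a = split(group_a, 23, rng)  # 23 training, 7 blind
train_b, blind_b = split(group_b, 23, rng)
model = fisher_fit(train_a, train_b)
accuracy = blind_accuracy(model, blind_a, blind_b)  # correct / 14

# 17 diseases minus prostate cancer leaves 16 groups: C(16, 2) pairs
n_classifiers = len(list(combinations(range(16), 2)))
print(f"blind accuracy: {accuracy:.2f}, binary classifiers: {n_classifiers}")
```

The control-group check described in the text corresponds to calling blind_accuracy with a model fitted on patient samples but scored on the corresponding control samples; if the discrimination reflects disease rather than site or procedure, that accuracy would be expected to stay near chance (50%).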

Figure 3. Graphical presentation of the accuracy of the binary DFA classifiers. Each box represents the accuracy achieved in the blind validation of one pair of subject groups. The left heat map shows the results of comparisons between patient groups, whereas the right graph shows the results of the same classifiers applied to the corresponding control groups. The average accuracy was 86% for all disease classifiers (left graph) and 58% for the corresponding control groups (right graph). The letter “C” beside each disease name in the right graph denotes the control group corresponding to that disease.

To explore similarities and/or differences among the breath VOCs associated with each disease, hierarchical clustering analysis was conducted. In this analysis, the sensor responses were clustered and regrouped according to similarities and/or differences in the collective VOC pattern. Each clustering step reflects greater similarity between the profiles, suggesting considerable resemblance among the samples (subjects) within a specific cluster (see Experimental Methods for more details). Two important inferences emerged from the results. The first is that the data were not clustered according to possible confounding factors, such as sampling location, race, and/or ethnicity: for example, no resemblance was seen between the multiple sclerosis and Parkinsonian groups (for either the patients or their corresponding control groups), although both were enrolled and tested in the same department (Carmel Medical Center, Haifa, Israel). In addition, there is no evidence that the samples were clustered by sex and/or smoking habits; for example, pre-eclampsia and ovarian cancer did not cluster together, although both groups included only nonsmoking females (see Figure 4 and SI, Figure S5).
Second, there was a strong resemblance between subgroups with common pathophysiologies: for example, high similarity was found among most of the cancerous diseases, as well as among diseases associated with increased inflammatory activity (Crohn’s disease, ulcerative colitis, and pre-eclampsia), and the Parkinsonian-related cases (idiopathic and atypical Parkinsonism) were subgrouped together (Figure 4). In a complementary parallel analysis, hierarchical clustering analysis was carried out for the corresponding control groups. The results indicated greater similarity and more homogeneous sensing responses than in the disease clustering: Figure S5 of the SI shows that the intercluster distances are shorter than those in the disease-group clustering. These results support the hypothesis that similar pathophysiological processes are expressed in similar breath patterns. However, further in vitro and ex vivo studies are required to support this conclusion.
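The hierarchical clustering step can be illustrated with a small agglomerative routine. This section does not state the linkage criterion or the distance metric, so average linkage on Euclidean distance is assumed here, and the group labels and two-feature profiles are invented for illustration only; they are not the study's data. The routine returns the merge history, from which a dendrogram can be drawn.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage on Euclidean distance.

    profiles maps a group label to its mean sensing-feature vector.
    Returns the merge history as (cluster_1, cluster_2, distance) tuples,
    where each cluster is a tuple of group labels.
    """
    clusters = {(label,): [vector] for label, vector in profiles.items()}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                members_1, members_2 = clusters[keys[i]], clusters[keys[j]]
                # Average of all pairwise distances between the two clusters
                d = sum(euclidean(u, v) for u in members_1 for v in members_2)
                d /= len(members_1) * len(members_2)
                if best is None or d < best[2]:
                    best = (keys[i], keys[j], d)
        key_1, key_2, _ = best
        clusters[key_1 + key_2] = clusters.pop(key_1) + clusters.pop(key_2)
        merges.append(best)
    return merges


# Invented two-feature profiles for six hypothetical groups (not study data)
profiles = {
    "A1": [1.0, 0.9], "A2": [1.05, 0.95],
    "B1": [-1.0, 0.2], "B2": [-0.9, 0.3],
    "C1": [0.0, -2.0], "C2": [0.2, -2.2],
}
for cluster_1, cluster_2, distance in average_linkage(profiles):
    print(cluster_1, "+", cluster_2, f"at {distance:.3f}")
```

With average linkage, merge distances never decrease from one step to the next, so a dendrogram whose merges occur at shorter distances (as reported for the control groups in SI, Figure S5) indicates more homogeneous profiles.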