Cell cultures

Six different human breast cell lines were used in the experiments: MDA-231, MCF-7, SKBR3 (kindly provided by Prof. Giannini G., Department of Molecular Medicine, “Sapienza” University of Rome, Rome, Italy), BT474, ZR75-1 and MCF-10A (kindly supplied by Dr. Falcioni R., Department of Experimental Oncology, Regina Elena National Cancer Institute, Rome, Italy).

The immortalised, non-transformed human mammary epithelial cell line MCF-10A, referred to as healthy control, was grown in DMEM/F12 medium (Sigma-Aldrich) supplemented with 5% fetal bovine serum, 20 ng/ml epidermal growth factor (EGF), 10 μg/ml insulin, 0.5 μg/ml hydrocortisone (Sigma-Aldrich), 100 units/ml penicillin and 100 μg/ml streptomycin (Sigma-Aldrich), as previously described [52].

The five human breast cancer cell lines (canc1 to canc5) were derived from different breast cancer histotypes: MDA-231 (canc4), MCF-7 (canc5) and SKBR3 (canc1) cell lines from metastatic breast adenocarcinoma (MetAC), BT474 cells (canc2) from invasive ductal carcinoma (IDC) and ZR75-1 cells (canc3) from metastatic invasive ductal carcinoma (MetIDC) (see ATCC.org website). These cancer cells were grown in DMEM culture medium, i.e. DMEM high-glucose medium (Sigma-Aldrich) supplemented with 10% fetal bovine serum (Sigma-Aldrich), 100 units/ml penicillin and 100 μg/ml streptomycin (Sigma-Aldrich). All cells were cultured under standard conditions at 37°C in a humidified atmosphere containing 5% CO₂.

Analysis of cell growth rate

The media for the later VOC experiments had to be inoculated with a number of cells that would result in a similar cell number for all cell lines after 24 h. To this end, the proliferation rate of healthy control and cancer cell lines was analysed beforehand by cell count over 96 h (Supplementary Fig. S4). Each cell line was seeded in triplicate in its specific culture medium at an initial concentration of 2.5 × 10⁵ cells/flask (25 cm²) and cell growth was monitored after 24, 48, 72 and 96 h. Data from individual growth curves were used to calculate growth rate and doubling time (t_d) with http://www.doubling-time.com/compute.php. The proliferation of healthy control cells was also analysed in DMEM culture medium (as used in the later VOC analysis) to check for a possible alteration of growth rate: no significant changes (compared to their growth in the specific culture medium) were observed up to 96 h of culture (Supplementary Fig. S4c).
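
As an illustration of the doubling-time calculation, the following minimal sketch assumes purely exponential growth between two counts. The online calculator named above may fit the data differently, the function names are hypothetical, and the cell counts in the example are invented.

```python
import math

def growth_rate(n_start, n_end, hours):
    """Exponential growth rate per hour, assuming N(t) = N0 * exp(r * t)."""
    return math.log(n_end / n_start) / hours

def doubling_time(n_start, n_end, hours):
    """Doubling time t_d = ln(2) / r, in hours."""
    return math.log(2) / growth_rate(n_start, n_end, hours)

# Invented example: 2.5e5 cells seeded, 1.0e6 cells counted after 48 h.
t_d = doubling_time(2.5e5, 1.0e6, 48)
print(round(t_d, 1))
```

In this invented example the population quadruples in 48 h, i.e. it doubles twice, so the doubling time is 24 h.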

Sample preparation

For VOC analysis, healthy control and cancer lines (canc1 to canc5) were seeded in triplicate in culture flasks (25 cm²) in 5 ml of their specific culture medium and were grown for 24 h. The number of plated cells was chosen based on the specific doubling time of every cell line, in order to obtain a comparable cell number at the end of the 24 h incubation. After 24 h, the specific culture medium was removed and replaced with 5 ml of the DMEM culture medium. Cells were grown in these conditions for the next 96 h, up to a confluence of 50%-60% (around 1.5 × 10⁶ cells/flask). After this incubation period, the DMEM culture medium was harvested, centrifuged at 1200 rpm for 5 min and collected in sterilised glass vials. Note that with this procedure, all samples derive from flasks with comparable cell density.
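
The way a doubling time translates into a seeding number can be sketched as follows. This assumes exponential growth without a lag phase. The target cell number and both doubling times are invented for illustration and are not values from this study.

```python
def cells_to_seed(target_cells, doubling_time_h, incubation_h=24.0):
    """Cells to plate so that exponential growth reaches target_cells after incubation_h."""
    return target_cells / 2 ** (incubation_h / doubling_time_h)

# Invented doubling times (hours) for two hypothetical cell lines.
for name, t_d in {"fast line": 24.0, "slow line": 48.0}.items():
    print(name, round(cells_to_seed(4.0e5, t_d)))
```

A faster-growing line is plated at a lower number so that both lines reach the same count at the end of the incubation.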

The medium control was obtained by incubating DMEM culture medium in the same conditions as the cell samples, but without seeded cells. Thus, the medium control contains the same background odour as the cell samples, but without the influence of cells. Cell viability was evaluated by Trypan Blue exclusion test in order to assess the effect of any cell stress during the incubation time.

Animals

Drosophila melanogaster were kept at 25°C on a 12/12 light/dark cycle. Flies were reared on standard medium (100 ml contain: 0.7 g of agar, 2.4 g yeast, 2.1 g of sugar beet syrup, 7.1 g of cornmeal, 6.7 g of fructose, 1.4 ml of Nipagin (10%), 0.6 ml of propionic acid).

Flies were of genotype w; P[Orco:Gal4]; P[UAS:GCaMP3]attP40, expressing the Ca²⁺ reporter GCaMP3 [42,37] in all Orco-bearing cells (UAS-GCaMP3 flies were provided by Loren L. Looger, Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia, USA). Female flies aged 1-5 days were used for experiments.

Odorant preparation

The reference odorant 1-butanol was purchased from Sigma (Sigma-Aldrich, Steinheim, Germany; CAS: 71-36-3) in ≥99.5% purity and diluted in 5 ml mineral oil (Sigma-Aldrich, Steinheim, Germany; CAS: 8042-47-5) to a concentration of 10⁻⁴ vol/vol. All odours were prepared in 20 ml headspace vials, covered with nitrogen and sealed with a Teflon septum (Axel Semrau, Germany). Cancer samples were used in 1 ml aliquots. Nitrogen was directly taken from an injector connected to a gas bottle.

Calcium imaging

Calcium imaging was performed as described elsewhere [44,53]. In brief, we used a fluorescence microscope (BX50WI, Olympus, Tokyo, Japan) equipped with a 50× air lens (Olympus LM Plan FI 50×/0.5). A CCD camera (SensiCam, PCO, Kelheim, Germany) was mounted on the microscope recording with 8 × 8 pixel on-chip binning, which resulted in 80 × 60 pixel sized images. For each stimulus, recordings of 20 s at a rate of 4 Hz were performed using TILLvisION (TILL Photonics, Gräfelfing, Germany). A monochromator (Polychrome V, TILL Photonics, Gräfelfing, Germany) produced excitation light of 470 nm wavelength which was directed onto the antenna via a 500 nm low-pass filter and a 495 nm dichroic mirror. Emission light was filtered through a 505 nm high-pass emission filter.

Stimulus application

Odours were applied automatically using a computer-controlled autosampler (PAL, CTC Switzerland). 2 ml of headspace was injected in two 1 ml portions at time points 6 s and 9 s with an injection speed of 1 ml/s into a continuous flow of purified air flowing at 60 ml/min. The 1 s stimulus was directed onto the antenna of the fly via a Teflon® tube (inner diameter 1 mm, length 38 cm).
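
To relate the injection times to the recorded frames, the following sketch converts the two pulse onsets into frame indices for a 20 s recording at 4 Hz (the acquisition settings given under Calcium imaging). It assumes that recording and autosampler share the same time zero and it ignores the travel time through the tube. The constant and function names are hypothetical.

```python
FRAME_RATE_HZ = 4        # acquisition rate
RECORDING_S = 20         # recording length per stimulus
PULSE_ONSETS_S = (6, 9)  # start of each 1 ml injection
PULSE_DURATION_S = 1     # 1 ml injected at 1 ml/s

def pulse_frames(onset_s, duration_s=PULSE_DURATION_S, rate_hz=FRAME_RATE_HZ):
    """Zero-based frame indices acquired while one pulse is being injected."""
    first = int(onset_s * rate_hz)
    return list(range(first, first + int(duration_s * rate_hz)))

n_frames = RECORDING_S * FRAME_RATE_HZ
frames = [pulse_frames(t) for t in PULSE_ONSETS_S]
print(n_frames, frames)
```

Under these assumptions each recording has 80 frames, and the two pulses cover frames 24-27 and 36-39.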

Experiments were performed double-blind. The seven test odours (healthy control, medium control, canc1, canc2, canc3, canc4, canc5) were measured in random order. Reference odours (1-butanol and N₂) were measured before and after a full block of test odours in order to ensure reproducibility and viability of the preparation (1-butanol) and to exclude contamination of the system (N₂). The autosampler syringe was flushed with purified air for 1 min after each injection and washed with pentane (Merck, Darmstadt, Germany) automatically after each application of 1-butanol.
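
The block structure described above (references, randomised test odours, references) can be sketched as follows. The order of the two reference odours within each pair and the use of a seeded generator are assumptions made only for this illustration. In a double-blind design the mapping from vial code to odour identity would additionally be hidden from the experimenter.

```python
import random

TEST_ODOURS = ["healthy control", "medium control",
               "canc1", "canc2", "canc3", "canc4", "canc5"]
REFERENCE_ODOURS = ["1-butanol", "N2"]

def stimulus_block(rng):
    """References, then the seven test odours in random order, then references again."""
    shuffled = TEST_ODOURS[:]
    rng.shuffle(shuffled)
    return REFERENCE_ODOURS + shuffled + REFERENCE_ODOURS

block = stimulus_block(random.Random(0))  # seed chosen only for reproducibility
print(len(block))
```

Such a block contains 2 + 7 + 2 = 11 stimuli.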

GC-MS measurements

Vials containing samples were placed in a water bath equilibrated at 40°C. Headspace VOCs were preconcentrated onto a SPME fibre (50/30 μm divinylbenzene/carboxen/PDMS, SUPELCO, Bellefonte, PA, USA) manually exposed to sample vapours for 1 h. The fibre with extracted VOCs was transferred to the GCMS (GCMS-QP 2010 Shimadzu) and desorbed at 250°C for 3 min in the injection port of the GC. The instrument was equipped with an EQUITY-5 (poly(5% diphenyl/95% dimethyl siloxane) phase, SUPELCO, Bellefonte, PA, USA) capillary column, 30 m length × 0.25 mm I.D. × 0.25 μm thickness. The analysis was conducted in splitless mode using ultra-high purity helium as carrier gas. The carrier gas linear velocity was kept constant at 30.2 cm/min. The oven temperature was kept at 40°C for 5 min, then increased by 7°C/min up to 220°C and then at 15°C/min up to 300°C. The final temperature was held for 2 min (total run time: 39 min). The mass spectrometer was used in the full scan mode over a mass range of 40–450 m/z. The detector voltage was 0.7 kV. The temperature of interface and ion source was kept constant at 250°C. The GC area of the samples was calculated using the section GCMS post-run analysis of the GCMS solutions software (version 2.4, Shimadzu Corporation).

Unsupervised feature selection

Antenna imaging movies were processed in a KNIME (www.knime.org) workflow using the ImageBee plugin [54] for insect neuroimage data (http://tech.knime.org/community/image-processing). For feature selection, all 11 movies recorded in one fly (reference and test odours) were concatenated, resulting in a movie matrix A with dimensions (m = 80 × 11 time points) × (n = 80 × 60 pixels). All images were aligned by cross correlation to correct for animal movement.
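
The layout of the movie matrix A can be sketched as follows: every frame is flattened into one row, and the frames of all movies are stacked. This is only an illustration with toy data and hypothetical function names, not the ImageBee implementation, and it omits the movement correction.

```python
def flatten_frame(frame):
    """Turn a 2-D image (list of pixel rows) into one row vector."""
    return [value for row in frame for value in row]

def movie_matrix(movies):
    """Stack all frames of all movies: one row per time point, one column per pixel."""
    return [flatten_frame(frame) for movie in movies for frame in movie]

# Toy data: 2 movies x 3 frames of 2 x 2 pixels (values are arbitrary).
toy = [[[[m, f], [f, m]] for f in range(3)] for m in range(2)]
A = movie_matrix(toy)
print(len(A), len(A[0]))  # 6 rows, 4 columns

# Dimensions for the recordings described here.
m = 80 * 11  # frames per movie x movies per fly
n = 80 * 60  # pixels per binned image
print(m, n)
```

For the recordings described here, A therefore has 880 rows and 4800 columns.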

In principle, all n pixels could be used for distance computations, however at the cost of including many unresponsive or noisy pixels that can obscure odour distances. We thus selected c = 300 pixels (features) based on their contribution to the norm of A.

Before feature selection, data was preprocessed to ensure that the norm of A was not dominated by e.g. unspecific background fluorescence: 1) Background fluorescence was removed by subtracting the mean image separately for each of the 11 movies. 2) Photon shot noise was reduced by smoothing individual images with a Gaussian kernel (width 9). Then, we selected c column vectors (pixels) into the m × c matrix C such that the Frobenius norm error ‖A − C C⁺ A‖_F was minimised (where C⁺ is the pseudoinverse of C). The objective criterion was optimised with the convex cone algorithm [55]. Minimising the Frobenius norm error in this way corresponds to the norm error minimisation objective of PCA and it helps to identify a diverse set of pixels that contribute strongly to the variance in A. That is, instead of selecting pixels on an equally spaced grid, pixel selection was biased towards areas that actually responded to the odour stimuli.
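
To make the column selection objective concrete, the sketch below picks columns greedily and reports the residual error, assuming the objective is the standard column subset selection error ‖A − C C⁺ A‖_F. It is a simple greedy heuristic substituted for illustration and is not the convex cone algorithm used here. The function name and the toy matrix are hypothetical.

```python
import math

def greedy_column_selection(A, c):
    """Pick up to c column indices of A (a list of rows) by greedy residual deflation.

    Returns (indices, error), where error is the Frobenius norm of A after
    projecting out the span of the chosen columns, i.e. ||A - C C^+ A||_F.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]  # residual, starts as a copy of A
    chosen = []
    for _ in range(c):
        norms = [sum(R[i][j] ** 2 for i in range(m)) for j in range(n)]
        j_best = max(range(n), key=norms.__getitem__)
        if norms[j_best] < 1e-12:  # nothing left to explain
            break
        chosen.append(j_best)
        scale = math.sqrt(norms[j_best])
        q = [R[i][j_best] / scale for i in range(m)]  # unit vector
        for j in range(n):  # remove the component along q from every column
            dot = sum(q[i] * R[i][j] for i in range(m))
            for i in range(m):
                R[i][j] -= dot * q[i]
    error = math.sqrt(sum(v * v for row in R for v in row))
    return chosen, error

# Toy matrix: column 1 is a scaled copy of column 0, column 2 is independent.
A = [[1.0, 0.5, 0.0],
     [2.0, 1.0, 0.0],
     [0.0, 0.0, 3.0]]
chosen, error = greedy_column_selection(A, 2)
print(chosen, error)
```

In the toy matrix, two columns suffice to reconstruct A, so the residual error is zero up to rounding. With real data, the error decreases as c grows, and the selected pixels concentrate in regions with strong, mutually different signals.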

Response spot time series

Unsupervised feature selection from the movie matrix A (with m time points and n pixels) resulted in a set of c = 300 pixel indices. Before extracting the pixels from (the movement-corrected, but otherwise unprocessed) A, dye bleaching and other global intensity changes were reduced by histogram normalisation: for each of the 11 movies in A, we took the first image as a reference and matched the histograms of all remaining images to the histogram of the first image, obtaining a histogram-normalised movie. Then, the c pixels were extracted from this normalised movie. In order to reduce noise, a postprocessing step described in [55] replaced each pixel p_j by the average of those pixels whose time series was more similar to that of p_j than to that of any of the other c − 1 selected pixels.

In summary, from each antenna recording we obtained a set of c representative response spots, i.e. pixel positions and the corresponding averaged and processed time series, in an unsupervised fashion. Response spots were normalised to the prestimulus interval, separately for each of the 11 stimuli, by computing (F_i − F_0)/F_0 for each time point i. Here, F_0 is the mean fluorescence during 20 time points before stimulus application and F_i is the fluorescence value at time point i.
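
The prestimulus normalisation can be sketched as follows. The sketch assumes that the 20 baseline time points are the frames immediately preceding stimulus onset and that onset falls on frame 24 (6 s at 4 Hz). The function name and the trace are invented for illustration.

```python
def delta_f_over_f(trace, stimulus_frame, n_baseline=20):
    """Return (F_i - F_0) / F_0 for every time point of one fluorescence trace.

    F_0 is the mean of the n_baseline frames preceding stimulus_frame.
    """
    baseline = trace[stimulus_frame - n_baseline:stimulus_frame]
    f0 = sum(baseline) / len(baseline)
    return [(f - f0) / f0 for f in trace]

# Invented trace: flat baseline of 100, then four frames at 110.
trace = [100.0] * 24 + [110.0] * 4 + [100.0] * 52
response = delta_f_over_f(trace, stimulus_frame=24)
print(response[0], response[24])
```

In this invented trace, the baseline frames map to 0 and the response frames to 0.1, i.e. a 10% increase over baseline.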

For each time series, we selected five time points from each of the two response peaks (marked in Fig. 3b), resulting in t = 10 time points for each of the s = 11 stimuli, i.e. 110 time points. Together, each fly contributed a c × (s × t) response profile matrix M which was then z-score normalised: from each row i (response spot), the mean μ_i was subtracted and the row was divided by its standard deviation σ_i. Likewise, from each column j (time point), the mean μ_j was subtracted and the column was divided by its standard deviation σ_j.
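
The two-step z-score normalisation can be sketched as follows, applying the row step first and the column step second, in the order listed above. Whether the sample or the population standard deviation was used is not stated. The sketch assumes the sample standard deviation, and the matrix values are arbitrary.

```python
import statistics

def zscore_rows(M):
    """Subtract each row's mean and divide by its (sample) standard deviation."""
    out = []
    for row in M:
        mu, sigma = statistics.mean(row), statistics.stdev(row)
        out.append([(v - mu) / sigma for v in row])
    return out

def transpose(M):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def zscore_rows_then_columns(M):
    """Row-wise z-score followed by column-wise z-score."""
    return transpose(zscore_rows(transpose(zscore_rows(M))))

# Toy 3 x 4 matrix (3 response spots x 4 time points).
M = [[1.0, 2.0, 4.0, 7.0],
     [2.0, 1.0, 0.0, 5.0],
     [9.0, 3.0, 5.0, 4.0]]
Z = zscore_rows_then_columns(M)
```

After the second step every column of Z has mean 0 and standard deviation 1. The rows are then only approximately standardised, because the column step rescales them again.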

Clustering, distance matrices, PCA

For Fig. 3, clustering was performed on complete (unnormalised) response spot time series from a single fly. For Fig. 4, clustering of (normalised) response profiles was performed on the row-concatenated matrix of the per-fly matrices M_a (a = 1, …, N; all flies pooled). In both cases, we used the k-means clustering algorithm (stats package for R, default settings, 1000 restarts). The number of clusters (15) was estimated based on a scree plot of the overall within-cluster sum of squares error.
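
The idea of k-means with restarts and a scree curve can be sketched as follows. This is a plain Lloyd's algorithm in Python for illustration only. It is not the R implementation used here, whose default algorithm (Hartigan-Wong) differs. The function names, seed and toy points are assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, n_iter=100):
    """One run of Lloyd's algorithm started from k randomly chosen data points."""
    centres = [p[:] for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centres[l]) for p, l in zip(points, labels))
    return labels, wss

def kmeans(points, k, restarts=10, seed=0):
    """Keep the restart with the smallest within-cluster sum of squares (WSS)."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(restarts)), key=lambda r: r[1])

# Toy data: two well-separated groups of three points each.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, wss = kmeans(points, 2)
wss_by_k = {k: kmeans(points, k)[1] for k in (1, 2, 3)}  # values for a scree plot
```

Plotting wss_by_k against k gives the scree curve. The number of clusters is read off where the curve flattens, which for this toy data is at k = 2.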

For analysis of odour distances, the M_a were reshaped as s × (t × c). Odour × odour (s × s) Euclidean distance matrices (Fig. 5e, f) were computed on the M_a. For correlation analysis, we regarded only the 7 × 7 distance matrices for the 7 test odours, enabling us to state explicitly that distances between the relevant test odours are correlated. We correlated, for each time point, the individual distance matrices: Fig. 5e shows the mean of all (N × (N − 1))/2 pairwise correlations between the N flies over time. By correlation we refer to the Pearson product moment correlation coefficient. Only the lower triangular parts of the distance matrices (without the diagonal) were correlated.
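
The distance and correlation analysis can be sketched as follows for a single time point. The response vectors of the three toy "flies" are invented, the function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def distance_matrix(profiles):
    """Euclidean distance between every pair of odour response vectors."""
    return [[math.dist(a, b) for b in profiles] for a in profiles]

def lower_triangle(D):
    """Entries below the diagonal, row by row (diagonal excluded)."""
    return [D[i][j] for i in range(len(D)) for j in range(i)]

def pearson(x, y):
    """Pearson product moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_pairwise_correlation(matrices):
    """Mean Pearson r over all N * (N - 1) / 2 pairs of distance matrices."""
    tris = [lower_triangle(D) for D in matrices]
    rs = [pearson(a, b) for a, b in combinations(tris, 2)]
    return sum(rs) / len(rs)

# Invented data: 3 odours x 2 features per fly; fly_b is fly_a scaled by 2.
fly_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
fly_b = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
fly_c = [[0.0, 0.0], [3.0, 0.0], [0.0, 1.0]]
D_a, D_b, D_c = (distance_matrix(f) for f in (fly_a, fly_b, fly_c))
r_ab = mean_pairwise_correlation([D_a, D_b])
r_all = mean_pairwise_correlation([D_a, D_b, D_c])
```

Because Pearson correlation is invariant to scaling, two flies whose odour distances differ only by a constant factor (fly_a and fly_b) correlate perfectly. A fly with a different distance pattern (fly_c) lowers the mean.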