Some of the data described in this Methods section were not included in the present paper (primarily due to a lack of sufficient time points for longitudinal analysis), but their descriptions are included for completeness in describing the shared data sets.

Participant

The participant (author R.A.P.) is a right-handed Caucasian male, aged 45 years at the onset of the study. He suffers from plaque psoriasis but is otherwise generally healthy. The participant has a history of anxiety disorder, but no other neuropsychiatric disorders. Before initiation of the pilot period, the participant received a physical examination with a full blood workup (comprehensive metabolic panel (CMP), complete blood count (CBC), thyroid-stimulating hormone (TSH), erythrocyte sedimentation rate (ESR) and lipid panel), along with tests for HIV and hepatitis B and C. No significant findings were observed in these tests.

Ethical review

The present study was submitted to the University of Texas Office of Research Support, which determined that it did not meet the requirements for human subjects research as defined by the Common Rule (45 CFR 46) or FDA Regulations (21 CFR 50 and 56), and thus that Institutional Review Board (IRB) approval was not necessary. Subsequent data collection at Stanford University was performed under a similar determination, while data at Washington University were collected under an approved IRB protocol.

Study phases

A pilot period began on 25 September 2012. During the pilot phase, the initial protocol was tested for several weeks. Upon examination of the data and further discussion with a number of other researchers, several changes were made to the MRI acquisition during the week of 15 October 2012. The production phase began on 22 October 2012 (with scan session 13) and ended on 11 March 2014, with an extended hiatus from 6 March 2013 to 30 April 2013 (further discussed in the Audiometry section below). Additional data were collected on two later occasions; an extensive rsfMRI data set was collected at Washington University on 3 April 2015, and a diffusion MRI data set was collected at Stanford University on 15 May 2015.

Sample size

The initial plan was to collect data for one calendar year, yielding roughly 100 samples. However, due to travel and the extended hiatus mentioned above, scanning was extended to obtain sufficient samples. Once the target of 100 scanning sessions was reached, the revised stopping criterion was to obtain 48 blood draws (since the RNA sequencing was performed in batches of six samples), yielding a total of 104 scanning sessions. No time-series statistical analyses were performed before the determination of this criterion.

Subject blinding

To prevent knowledge of the results from affecting the subject, the subject was initially blinded to the results of any analyses of the repeated tests. He had access to the clinical blood work results, and also examined individual MRI scans for quality control purposes, but was not exposed to any analysis of temporal changes during the initial period of the study. After the hiatus in March 2013, this blinding was discontinued and the subject analysed the first set of data collected to that point. Initial a priori hypotheses were recorded before these analyses.

Imaging procedures

MRI was performed on a fixed schedule, subject to availability of the participant. Scans were performed at fixed times of day: Monday sessions at 1700 hours, and Tuesday and Thursday sessions at 0730 hours. After the hiatus in March 2013, the Monday sessions were eliminated. Imaging was performed on a Siemens Skyra 3 T MRI scanner using a 32-channel head coil. Parameters described below are for the production phase. The imaging protocols included the following:

Anatomical MRI

T1- and T2-weighted anatomical images were acquired using a protocol patterned after the Human Connectome Project34. These data were collected for 14 Monday afternoon sessions (3 during the pilot phase, 10 during the production phase through 30 April 2013, and a 1-year follow-up collected on 4 November 2013 (sub090)). T1-weighted data were collected using an MP-RAGE sequence (sagittal, 256 slices, 0.8 mm isotropic resolution, echo time (TE)=2.14 ms, repetition time (TR)=2,400 ms, inversion time (TI)=1,000 ms, flip angle=8 degrees, GRAPPA factor=2, 7 min 40 s scan time). T2-weighted data were collected using a T2-SPACE sequence (sagittal, 256 slices, 0.8 mm isotropic resolution, TE=565 ms, TR=3,200 ms, GRAPPA factor=2, 8 min 24 s scan time).

Diffusion-weighted imaging

Diffusion-weighted imaging data were collected for 19 Tuesday morning sessions (15 during the production phase) through 30 April 2013. In each session, two diffusion-weighted imaging scans were acquired using a multi-band35 Stejskal–Tanner echo-planar imaging (EPI) sequence, with opposite (L>R and R>L) gradient readout directions. Two shells of 30 directions each were acquired (b=1,000 and 2,000 s mm−2), plus four low-b acquisitions interspersed among the 60 diffusion-weighted images (one every 15 frames). Parameters were b=1,000/2,000 s mm−2, 1.74 × 1.74 × 1.7 mm resolution, 72 axial slices, field of view (FOV)=223 mm, 128 × 128 matrix, TR=5,000 ms, TE=108 ms, GRAPPA factor=2, MB factor=3.

An additional session of HARDI data was collected at Stanford University on 15 May 2015. Data were acquired using 3 × slice acceleration with a blipped-CAIPI shift of FOV/3 and minimum TE (81 ms) using a 5/8 partial-k readout with homodyne reconstruction. Scanner calibrations for RF transmit power can interact with nonuniform RF power across the head (due to B1 inhomogeneity), resulting in excessive RF power and thus over-flipping in the centre of the head. To alleviate this problem, and to keep the peak B1 amplitude under the hardware limit, the prescribed excitation and refocusing flip angles were set to 77 deg and 160 deg, respectively36. Seventy-five diffusion-weighted directions were collected at two b-values (1,500 and 3,000 s mm−2), plus 10 acquisitions with a nominal b-value of 0; each direction was collected twice across runs with opposite (A–P and P–A) gradient readout directions.

Resting-state fMRI

RSfMRI was performed in 100 scans throughout the data collection period (89 in the production phase), using a multi-band EPI sequence (TR=1.16 s, TE=30 ms, flip angle=63 degrees (the Ernst angle for grey matter), voxel size=2.4 × 2.4 × 2 mm, distance factor=20%, 68 slices, oriented 30 degrees back from AC/PC, 96 × 96 matrix, 230 mm FOV, MB factor=4, 10:00 scan length). Starting with session 27 (12 December 2012), the number of slices was changed to 64 because of an update to the multi-band sequence that increased the minimum TR beyond 1.16 s for 68 slices. Acoustic noise cancellation for the resting-state scan was attempted in each session using the Optoacoustics active noise cancellation system, but in some runs the system failed to cancel the noise successfully (which was noted in the scan database). The first minute of acquisition was discarded before subsequent analyses to exclude fMRI responses evoked by the start of the scan and the noise-cancelling headphones. Five full production-phase sessions were also excluded from analysis due to low signal-to-noise ratio (SNR) that led to poor registration (identified by visual inspection), resulting in 84 sessions that were included in the analysis.

An additional session of rsfMRI data was collected at Washington University on 3 April 2015 to assess cross-site scanner and sequence effects, as well as the effect of eyes open versus eyes closed on resting-state networks. The data were collected on a 3-T TIM TRIO system with a standard (non-multi-band) EPI sequence on a 12-channel coil, with TR=2.5 s and 4-mm isotropic voxels. This data set included ten 10-min runs of eyes-closed resting-state data and ten 10-min runs of eyes-open resting-state data (with fixation crosshair).

Field mapping

A dual-gradient echo-field map sequence was acquired with the same prescription as the functional images. In addition, spin echo field maps were collected with A–P and P–A phase encoding. Collection of field maps was discontinued as of 30 April 2013, resulting in 38 acquisitions.

Task fMRI: n-back

An n-back task was performed using a blocked design, with a factorial combination of memory load (1 versus 2 back) and stimulus type (faces, houses and Chinese characters) across blocks. Twenty percent of items were targets, and 20% were foils. The acquisition sequence was identical to that used for the rsfMRI scan (acquisition time=8 min). The task was acquired weekly, 15 times across different sessions.

Task fMRI: motion/stop signal

A motion discrimination task with an embedded stop signal task was performed eight times across different sessions. On each trial, a moving dot stimulus was presented, with coherence of either upward or downward motion varying across trials (levels: 0, 10, 30 and 70% coherence). On 25% of trials, a visual stop signal (change of the fixation cross from white to red) was presented, at a delay controlled by a 1-up/1-down staircase to ensure 50% stopping accuracy37. The subject’s task was to perform the motion discrimination as quickly as possible, but withhold responses when the stop signal occurred. The MRI acquisition was identical to that used for the rsfMRI scan (acquisition time=7 min 11 s).
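The 1-up/1-down staircase logic can be sketched as follows. This is an illustrative re-implementation of the general technique, not the study's task code; the step size, bounds and any starting delay are hypothetical values chosen for demonstration.

```python
def update_ssd(ssd, stopped, step=0.05, min_ssd=0.0, max_ssd=1.0):
    """One step of a 1-up/1-down staircase for the stop-signal delay (SSD).

    After a successful stop, the delay is lengthened (making stopping
    harder); after a failed stop, it is shortened (making stopping easier).
    Over trials this converges on the delay yielding ~50% stopping accuracy.
    Step size and bounds here are illustrative, not study parameters.
    """
    if stopped:
        ssd += step   # successful stop -> increase delay (harder)
    else:
        ssd -= step   # failed stop -> decrease delay (easier)
    return min(max(ssd, min_ssd), max_ssd)
```

For example, after a successful stop at a 0.2-s delay, the next stop signal would be presented at roughly 0.25 s.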

Task fMRI: object localizer

A multiple-object localizer (including both cropped and naturalistic faces, human bodies, human limbs, houses, places, cars, guitars, words and numbers) was performed eight times (twice each across four sessions); this task was kindly provided by K. Grill-Spector from Stanford University. Each stimulus class was presented in 4-s mini-blocks, with items presented at 2 Hz (eight items per mini-block). In each run, 12 mini-blocks of each class were presented, along with 12 interspersed 4-s fixation blocks (acquisition time: 5 min 13 s). Half of the blocks included a single phase-scrambled image; the subject's task was to press a button whenever a phase-scrambled item appeared.

Task fMRI: language localizer

A verbal working-memory localizer38 was performed five times across separate sessions. In each trial, a string of 12 words (either a sentence or a string of non-words) was presented sequentially (400 ms per word), followed by a 1-s probe item; the subject’s task was to decide whether the probe item matched any of the words in the preceding string.

Task fMRI: spatial working-memory localizer

A spatial working-memory localizer39 was performed four times across separate sessions. On each trial, a 4 × 2 spatial grid was presented, and locations in that grid were presented sequentially (1,000 ms per location), followed by a forced-choice probe between two grids, one of which contained all of the locations presented in the preceding series. In the easy condition, one location was presented on each presentation, whereas in the hard condition two locations were presented on each presentation. Twelve 32-s experimental blocks were interspersed with four 16-s fixation blocks (acquisition time=7 min 28 s).

Task fMRI: retinotopic mapping

Polar angle (with reference to the vertical meridian, with the centre of fovea as the origin) was mapped using a flickering checkerboard wedge (45 degree) that rotated periodically in a counterclockwise direction through the visual field with a cycle duration of 20 s. As the wedge rotates, it creates a wave of activation throughout retinotopically organized visual areas, successively and systematically stimulating portions of each map. In this way, the entire visual field is represented by a time-dependent pattern of activity across space. In each fMRI run, the wedge completed 12 cycles of rotation (240 s total). The acquisition sequence was identical to that used for the rsfMRI scan (acquisition time=4 min).

Breath-holding fMRI

A breath-holding task40 was performed 18 times across different sessions. Visual cues instructed the subject to breathe in and out every 6.96 s (3.48 s inhale, 3.48 s exhale), followed by a 19.72-s breath hold; all events were time-locked to image acquisition. The MRI acquisition was identical to that used for the rsfMRI scan (acquisition time=6 min 8 s).

Other measurements

A set of questionnaires was administered in the morning and evening and after each scanning session (see Supplementary Note 1 for complete listing of questions). All survey data were collected using a custom survey created with the appsoma.com system, and saved to a CouchDB database server for later analysis. All scores were rectified so that values were consistent with the name of the variable.

Naked weight and estimated body fat were measured upon waking using the FitBit Aria scale. Weight remained within a relatively small range during the study (mean=68.0 kg, range: 65.8–69.9 kg), showing a slow rise over the first 10 months of the study and a decline afterwards.

Sleep was measured on most nights before scanning sessions using the Zeo sleep monitor. Data were missing for a number of nights due to the sensor falling off or sensor failure.

Daily weather data (high- and low-temperature and precipitation) were obtained from NOAA (http://cdo.ncdc.noaa.gov/qclcd/QCLCD?prior=N&callsign=ATT).

Audiometry

During the pilot period, the participant noticed an increase in the level of his pre-existing tinnitus. Because of the potential for noise-induced hearing damage, baseline audiometry was performed at the UT Speech and Hearing Clinic on 16 October 2012 (within 8 h of a scanning session), 17 October 2012 and 19 October 2012 (both roughly 26 h after the previous scanning session). These exams showed moderate hearing loss (40–60 dB HL) in the high-frequency range (at or above 3 kHz), which was consistent across sessions. In addition, there was a slight (10–15 dB HL) loss in the lower frequencies, which recovered by 5 dB in the later tests, suggesting a potential short-term effect possibly due to seasonal allergies. Distortion product otoacoustic emission (DPOAE) testing on the first testing day showed consistent results, with impaired function at 3 and 4 kHz. Tympanometry and reflex threshold testing were performed at the first exam, and were normal. These findings are consistent with noise-related hearing loss that was almost certainly pre-existing at the onset of scanning (based on the subject's history of environmental noise exposure and long-standing tinnitus), though the lack of previous audiometry makes this impossible to confirm. Audiometry was repeated regularly to assess potential further hearing damage. On 6 March 2013, audiometry showed increased hearing loss, reaching 70 dB HL at 6,000 Hz. For this reason, imaging was temporarily stopped. The protocol was re-started on 30 April 2013 with a more limited set of measurements (that is, limited to functional acquisitions for which the active noise cancellation system could be employed). Follow-up testing was performed on 1 May 2013, which showed that the apparent loss was not sustained. Final testing after completion of the study showed no clinically significant changes from the initial tests.

Omics profiling

Blood was drawn every Tuesday around 0800 hours, following the MRI scan, by a phlebotomist at the UT Student Health Center Laboratory. The subject abstained from food and beverages (other than water) from 2000 hours the previous night until after the blood draw was completed. The standard blood draw included two 10 ml venous blood collection tubes (lavender K+/EDTA), which were transported immediately to the UT Genomic Sequencing and Analysis Facility for processing. In addition, a 5-ml lavender EDTA tube was collected once per month for CBC/differential analysis. On several occasions, additional blood was taken to perform comprehensive metabolic panels. A total of 48 samples were collected.

Peripheral blood mononuclear cells (PBMCs) were harvested by Ficoll gradient from whole blood. The PBMC population was then split into two equal fractions: one washed in PBS and frozen for future use, and one immediately placed in lysis solution to eliminate nuclease activity for RNA isolation. Plasma was saved for future analysis.

RNA sequencing

RNA was extracted using the Qiagen RNeasy Mini Kit. RNA integrity number (RIN) was measured using an Agilent Bioanalyzer. RNA libraries were prepared for sequencing according to vendor protocols using the NEBNext® Small RNA Library Prep Set for Illumina® (Multiplex Compatible), Cat. #E7330L, following the protocol described by Podnar et al.22. RNA was fragmented at elevated temperature in carefully controlled buffer conditions to yield average fragment sizes of 200 nucleotides. These fragments were directionally ligated to 5′ and 3′ adaptors so that sequence orientation was preserved throughout sequencing. Reverse transcription and PCR were performed to complete the sequencing libraries, which were then sequenced on the HiSeq 2500.

Metabolomics

Metabolomic profiling was performed by the West Coast Metabolomics Center at UC Davis41. Thirty-microlitre aliquots of serum are extracted with 1 ml of degassed acetonitrile:isopropanol:water (3:3:2, v/v/v) at 20 °C, centrifuged and decanted, with subsequent evaporation of the solvent to complete dryness. A clean-up step with acetonitrile/water (1:1) removes membrane lipids and triglycerides. The cleaned extract is aliquoted into two equal portions and the supernatant is dried down again. Internal standards (C08–C30 FAMEs) are added, and the sample is derivatized with methoxyamine hydrochloride in pyridine and subsequently with N-methyl-N-trimethylsilyltrifluoroacetamide for trimethylsilylation of acidic protons. Data are acquired using the following chromatographic parameters, with more details to be found in ref. 42. Column: Restek Corporation rtx5Sil-MS (30 m length × 0.25 mm internal diameter with a 0.25-μm film made of 95% dimethyl/5% diphenyl polysiloxane); mobile phase: helium; column temperature: 50–330 °C; flow rate: 1 ml min−1; injection volume: 0.5 μl; injection: 25 s splitless time into a multi-baffled glass liner; injection temperature: 50 °C ramped to 250 °C by 12 °C s−1; gradient: 50 °C for 1 min, then ramped at 20 °C min−1 to 330 °C, held constant for 5 min. A Leco Pegasus IV mass spectrometer is used with unit mass resolution at 17 spectra per second from 80 to 500 Da at −70 eV ionization energy and 1,800 V detector voltage, with a 230 °C transfer line and a 250 °C ion source. ChromaTOF (version 2.32) was used for data preprocessing without smoothing, with a 3-s peak width, baseline subtraction just above the noise level, and automatic mass spectral deconvolution and peak detection at a signal/noise level of 5:1 throughout the chromatogram.
Apex masses were submitted to the BinBase algorithm using the following settings: validity of chromatogram (<10 peaks with intensity >10^7 counts per s), unbiased retention index marker detection (MS similarity >800, validity of intensity range for high-m/z marker ions) and retention index calculation by 5th-order polynomial regression. Spectra are cut to 5% base peak abundance and matched to database entries from most to least abundant using the following matching filters: retention index window ±2,000 units (equivalent to about ±2 s retention time), validation of unique ions and apex masses (the unique ion must be included in the apexing masses and present at >3% of base peak abundance), mass spectrum similarity must-fit criteria dependent on peak purity and signal/noise ratios, and a final isomer filter. Peak height values were normalized across samples by dividing by the mean peak height across all metabolites. For the purposes of further analysis, only metabolites with known chemical names were used (106 named metabolites out of 258 total).
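The peak-height normalization can be sketched as follows, under one reading of the text (each sample's values divided by that sample's mean peak height across metabolites). The function name and matrix layout are illustrative assumptions, not part of the BinBase pipeline.

```python
import numpy as np

def normalize_peaks(peaks):
    """Normalize a samples x metabolites matrix of peak heights by
    dividing each sample's values by that sample's mean peak height
    across all metabolites (one interpretation of the normalization
    described in the text)."""
    peaks = np.asarray(peaks, dtype=float)
    return peaks / peaks.mean(axis=1, keepdims=True)
```

After this step, each sample has a mean normalized peak height of 1, removing sample-to-sample differences in overall signal intensity.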

Genomics

Genotyping was performed commercially by 23andMe; it was originally performed on the V2 platform and then updated to the V3 platform, giving a total of 996,000 single-nucleotide polymorphisms. In addition, whole-exome sequencing was performed. At the first and third collection, 200 μl of whole blood were used for DNA isolation. Ten micrograms of genomic DNA were used to create next-generation sequencing libraries for exome sequencing. The standard Illumina DNA fragment library protocol was used to shear, end-repair, dA-tail and ligate sequencing adaptors.

These finished libraries were enriched using the Nimblegen SeqCap kit.

Data analysis

Analysis code is openly accessible at https://github.com/poldrack/myconnectome and http://www.nil.wustl.edu/labs/petersen/Resources.html.

A fully reproducible analysis workflow was developed for the present data set using virtual machine provisioning. Using this system, it is possible to replicate all of the reported statistical analyses on one’s own computer. The system installs a virtual machine and all necessary software, downloads the necessary data and runs the analyses. To use the virtual machine:

1. Install the necessary software dependencies.
2. Move to the directory where you want to house the project, then clone the myconnectome vagrant set-up: git clone https://github.com/poldrack/myconnectome-vm.git
3. Change into the VM directory: cd myconnectome-vm
4. Set up the vagrant VM (which may take a significant amount of time): vagrant up

The final step will automatically start the analysis processes, which will take several hours to complete. Using a web browser on your local machine, you can view the analysis status and results at http://192.128.0.20:5000. A precomputed version of the same results can be viewed at http://results.myconnectome.org.

Time-series analyses

Time-series analyses relating the different types of data (behavioural/imaging/omics) were performed in R. Initial examination of the time-series data showed significant autocorrelation in many of the variables, so an autoregressive modelling approach was employed using the auto.arima function within the forecast package for R32, which automatically determines the appropriate autoregressive model order and tests for association between variables. Because of the uneven missingness across variables, each analysis was performed in a bivariate manner, such that correlations between variables were not accounted for in the analysis. Control for multiple tests was achieved in these analyses using the Benjamini–Hochberg FDR correction at P<0.1, with additional thresholding for absolute Pearson correlation >0.2 to exclude occasional significant results with small observed correlations.
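The dual significance criterion (Benjamini–Hochberg FDR at 0.1 combined with an absolute-correlation threshold of 0.2) can be sketched in Python as follows. This is an illustrative re-implementation for exposition, not the R code used in the study.

```python
import numpy as np

def significant_associations(pvals, rvals, q=0.1, r_thresh=0.2):
    """Benjamini-Hochberg FDR control at rate q, with an additional
    absolute-correlation threshold, mirroring the dual criterion
    described in the text. Returns a boolean mask of tests that pass
    both criteria."""
    pvals = np.asarray(pvals, dtype=float)
    rvals = np.asarray(rvals, dtype=float)
    n = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order]
    # find the largest k with p_(k) <= (k/n) * q; all tests up to k pass
    below = ranked <= (np.arange(1, n + 1) / n) * q
    passed = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        passed[order[: k + 1]] = True
    # additionally require a non-trivial observed correlation
    return passed & (np.abs(rvals) > r_thresh)
```

Note that the second criterion excludes tests that are statistically significant but carry negligible effect sizes, as described above.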

Email analysis

Sent emails were obtained for the period of the study, and were preprocessed to extract the body text (excluding signatures and included messages) and to remove common proper names. All texts were scanned for words from the LIWC 2007 dictionary using RIOT Scan word counting software (version 1.8.2)43, available from http://riot.ryanb.cc. Three measures were used: positive words, negative words and the categorical dynamic index, which reflects the prevalence of terms reflecting categorical versus dynamic thinking44.

fMRI preprocessing

All fMRI data were preprocessed according to a pipeline developed at Washington University, St Louis45. Data were realigned to correct for head motion and normalized to a mode of 1,000 (one multiplicative constant applied to all voxels and all frames). Each session was registered to a single session that had previously been registered to the mean T1-weighted structural image and an atlas. The session-to-atlas transform was inverted and applied to the mean field map so that the distortion correction could be applied in each session's space. The undistorted data were then re-registered to the atlas space. The transforms for head motion correction and affine registration to atlas space were combined with the field-map-based distortion correction to resample the data from the original session space to the undistorted 3-mm isotropic atlas space in a single step using FSL's applywarp tool46.
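The mode-1,000 normalization amounts to locating the mode of the intensity histogram and scaling the entire 4D series by a single constant. A minimal sketch follows; the histogram bin count and the nonzero masking are illustrative choices, not parameters from the pipeline.

```python
import numpy as np

def mode_1000_normalize(data, n_bins=100):
    """Scale a 4D fMRI array by one multiplicative constant so that the
    mode of the (nonzero) intensity histogram is 1,000, a sketch of the
    mode-1000 normalization described in the text."""
    vals = np.asarray(data, dtype=float)
    nonzero = vals[vals > 0]
    counts, edges = np.histogram(nonzero, bins=n_bins)
    peak = np.argmax(counts)
    mode = 0.5 * (edges[peak] + edges[peak + 1])  # bin-centre estimate of mode
    return vals * (1000.0 / mode)
```

Because a single constant is applied to all voxels and all frames, relative signal differences within and across frames are preserved.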

Because field maps were not available for all sessions, a mean field map was generated from 34 available field maps (after exclusion of four poor-quality field maps based on visual inspection). Magnitude images were registered to each other, and transforms were resolved (by reconstructing the n−1 transforms between all images using the n*(n−1)/2 computed transform pairs47) and applied to generate a mean magnitude image. The mean magnitude image was registered to an atlas representative template, and transforms from magnitude image to atlas space were computed for each session by combining the session-to-mean and mean-to-atlas transforms. Phase images were then transformed using the composed transforms modified to eliminate intensity scaling (which was relevant only for the magnitude images), and a mean phase image in atlas space was computed.

Artefacts were reduced using frame censoring, regression (excluding censored frames) and spectral filtering45. Frames with framewise displacement >0.25 mm were censored, as well as uncensored segments of data lasting fewer than five contiguous volumes (frames kept: 97.1±3.7%). Nuisance regressors included whole-brain, white matter and ventricular signals and their derivatives, in addition to 24 movement regressors derived by Volterra expansion48. Interpolation over censored frames was computed by least-squares spectral estimation, so that continuous data could be bandpass filtered (0.009<f<0.08 Hz).
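The censoring rule (framewise displacement >0.25 mm, plus removal of any surviving segment shorter than five contiguous volumes) can be sketched as follows; this is an illustrative implementation of the rule as stated, not the pipeline's code.

```python
import numpy as np

def censor_mask(fd, fd_thresh=0.25, min_segment=5):
    """Build a frame-keep mask: censor frames with framewise displacement
    above threshold, then additionally censor any surviving contiguous
    segment shorter than min_segment frames, as described in the text."""
    fd = np.asarray(fd, dtype=float)
    keep = fd <= fd_thresh
    # scan contiguous runs of kept frames; drop runs shorter than min_segment
    start = None
    for i in range(len(keep) + 1):
        if i < len(keep) and keep[i]:
            if start is None:
                start = i
        else:
            if start is not None and i - start < min_segment:
                keep[start:i] = False
            start = None
    return keep
```

The second pass matters because a brief run of low-motion frames sandwiched between two censored frames carries little usable signal for the subsequent regression and filtering steps.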

fMRI surface mapping and parcellation

Surface generation and sampling of functional data to anatomical surfaces followed a similar procedure as described in Glasser et al.49 and Wig et al.50 First, following volumetric registration, anatomical surfaces were generated from the subject's MP-RAGE image using FreeSurfer's default recon-all processing pipeline (version 5.0). This pipeline included brain extraction, segmentation, generation of white matter and pial surfaces, inflation of the surfaces to a sphere, and surface shape-based spherical registration of the subject's native surface to the fsaverage surface51,52,53,54. The fsaverage-registered left and right hemisphere surfaces were brought into register with each other using deformation maps from a landmark-based registration of left and right fsaverage surfaces to a hybrid left–right fsaverage surface (fs LR55) and resampled to a resolution of 164,000 vertices (164k fs LR) using Caret tools56. Finally, the subject's 164k fs LR surface was downsampled to a 32,492-vertex surface (fs LR 32k), which allowed for analysis in a computationally tractable space while still oversampling the underlying resolution of the BOLD data used in subsequent analyses. The various deformations from the native surfaces to the fs LR 32k surface were composed into a single deformation map allowing for one-step resampling. The above procedure results in a surface space that allows for quantitative analysis across subjects as well as between hemispheres. A script for this procedure is available on the Van Essen Lab website (http://brainvis.wustl.edu/wiki/index.php/Caret:Operations/Freesurfer_to_fs_LR).

Surface processing of the BOLD data proceeded through the following steps. First, the BOLD volumes were sampled to the subject's individual native mid-thickness surface (generated as the average of the white and pial surfaces) using the ribbon-constrained sampling procedure available in Connectome Workbench (version 0.7). This procedure samples data from voxels within the grey matter ribbon (that is, between the white and pial surfaces) that lie in a cylinder orthogonal to the local mid-thickness surface, weighted by the extent to which each voxel falls within the ribbon; it is designed to minimize partial-volume effects arising from the low sampling resolution of the BOLD data relative to the structural image acquisition57. Voxels with a time-series coefficient of variation 0.5 s.d. higher than the mean coefficient of variation of nearby voxels (within a 5-mm sigma Gaussian neighbourhood) were excluded from the volume-to-surface sampling, as described in ref. 49. This procedure is designed to avoid sampling highly variable voxels to the surface, which likely contain large blood vessels that sit outside brain tissue. Once sampled to the native surface, time courses were deformed and resampled from the individual's native surface to the 32k fs LR surface in a single step using the deformation map generated as described above. Finally, the time courses were smoothed along the 32k fs LR surface using a Gaussian smoothing kernel (σ=2.55).
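The coefficient-of-variation exclusion rule can be sketched as follows. This is one plausible reading of the rule (the text does not specify which standard deviation the 0.5 multiplier refers to, and here the sigma is given in voxels rather than mm), so the details below are illustrative assumptions rather than the Workbench implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def cov_exclusion_mask(data, sigma_vox=5.0, n_sd=0.5):
    """Flag voxels whose time-series coefficient of variation (CoV) is
    more than n_sd standard deviations above the locally smoothed CoV.
    `data` is a 4D array (x, y, z, time); sigma_vox is the Gaussian
    neighbourhood in voxels (the text specifies a 5-mm sigma).
    Assumption: the 0.5 multiplier is applied to the global s.d. of CoV."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=-1)
    std = data.std(axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        cov = np.where(mean != 0, std / np.abs(mean), 0.0)
    local_mean = gaussian_filter(cov, sigma=sigma_vox)  # neighbourhood mean CoV
    return cov > local_mean + n_sd * cov.std()
```

Voxels flagged by this mask would simply be omitted from the ribbon-constrained sampling.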

These surfaces were then combined with volumetric subcortical and cerebellar data into the CIFTI format using Connectome Workbench49, creating full-brain time courses that exclude non-grey-matter tissue. Subcortical (including accumbens, amygdala, caudate, hippocampus, pallidum, putamen and thalamus) and cerebellar voxels were selected based on the FreeSurfer segmentation of the individual subject. Volumetric data were smoothed within this mask with a Gaussian kernel (σ=2.55) before being combined with the surface data.

Individual subject parcellation

An individual subject parcellation was generated following the procedures described in detail in Gordon et al.6 and Wig et al.50, with minor modifications related to processing single-subject as opposed to group-average data. In brief, for each hemisphere, whole-brain CIFTI-space correlation maps were computed at every surface vertex from the BOLD time courses concatenated across all sessions. For each vertex, spatial gradients of the similarity of resting-state correlation maps were computed along the cortical surface. Edges in the spatial gradients were identified by the watershed transform58 and averaged across all vertices to generate a 'resting-state functional connectivity (RSFC) boundary map', indicating the frequency with which a given vertex was identified as an edge. To produce discrete parcels, the watershed transform was applied again, starting from all local minima. Parcels were merged if they were considered insufficiently dissimilar based on the edge frequency value (below the 55th percentile) in the RSFC boundary map. We then eliminated all parcels and portions of parcels in vertices with high boundary-map values (top quartile of values in the boundary map), as well as parcels containing fewer than 20 cortical vertices (∼40 mm2). This procedure generated a 616-region parcellation that forms the basis for many of the subsequent analyses. An additional 14 subcortical regions were obtained from the FreeSurfer subcortical parcellation, giving a total of 630 regions for subsequent analysis.

Module assignment

The modular structure of the parcel-wise graph was derived using the Infomap algorithm8, following Power et al.59 The parcel-wise adjacency matrix was computed by cross-correlating the average time courses from each parcel, concatenated across all sessions. Connections were removed if the geodesic distance between parcel centroids was <30 mm. Module assignment was computed at several correlation thresholds (ranging from 1 to 6% edge density, in steps of 1%). Modules with eight or fewer parcels were eliminated from consideration, and those parcels were considered unassigned. A 'consensus' assignment was derived by collapsing across thresholds, giving each node the assignment it had at the sparsest threshold at which it was successfully assigned. Small communities that were present at only a single threshold were removed.
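The consensus rule can be sketched as follows; this is an illustrative implementation of the rule as stated (sparsest successful assignment wins), with the data layout chosen for demonstration.

```python
def consensus_assignment(assignments):
    """Collapse module assignments across edge-density thresholds: each
    node takes the label it received at the sparsest threshold at which
    it was assigned (None = unassigned at that threshold).

    `assignments` is a list of per-threshold label lists, ordered from
    sparsest to densest threshold, a sketch of the consensus rule
    described in the text."""
    n = len(assignments[0])
    consensus = [None] * n
    for node in range(n):
        for labels in assignments:  # sparsest threshold first
            if labels[node] is not None:
                consensus[node] = labels[node]
                break
    return consensus
```

A node that is never assigned at any threshold remains unassigned in the consensus.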

Resting fMRI: network analyses

For each of the regions identified in the foregoing parcellation schemes, we estimated connectivity matrices using both full (Pearson) correlation and partial correlation; censored time points were excluded from computation of these correlation matrices. L1-regularized partial correlation was estimated using the graphical lasso implemented in the QUIC R package60; the regularization parameter was iteratively modified for each session to identify the value at which the network had a density of 7.5%, and the mean across sessions was thresholded to obtain the specified level of network density. L2-regularized partial correlation was estimated for each session using the rags2ridges R package61 with lambda=0.0001. Graph-theoretic metrics of modularity, global efficiency and participation index were computed using the Brain Connectivity Toolbox (http://www.brain-connectivity-toolbox.net/) on unthresholded (that is, signed and weighted) correlation matrices.
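The density-targeting search over the regularization parameter can be sketched as a bisection, exploiting the fact that network density decreases monotonically as the graphical-lasso penalty grows. This is an illustrative sketch: `fit` stands in for any estimator mapping a penalty value to a precision/partial-correlation matrix (e.g. a graphical-lasso fit), and the search bounds are hypothetical.

```python
import numpy as np

def edge_density(mat):
    """Fraction of off-diagonal entries that are nonzero."""
    mat = np.asarray(mat)
    off = ~np.eye(mat.shape[0], dtype=bool)
    return float(np.count_nonzero(mat[off])) / off.sum()

def tune_regularization(fit, target=0.075, lo=1e-4, hi=1.0, n_iter=30):
    """Bisect the regularization parameter so the estimated network
    reaches the target edge density (7.5% in the text); `fit` is any
    callable mapping a penalty value to a sparse matrix estimate."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if edge_density(fit(mid)) > target:
            lo = mid   # network too dense -> increase penalty
        else:
            hi = mid   # network too sparse -> decrease penalty
    return 0.5 * (lo + hi)
```

The same density target applied per session makes the session-wise networks directly comparable before averaging.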

Diffusion-weighted imaging processing

Diffusion data were processed using the FSL Diffusion Toolbox. Using the pairs of images with opposite phase encoding, the susceptibility-induced off-resonance field was estimated using a method similar to that described in Andersson62, as implemented in FSL’s TOPUP tool, and the two images were combined into a single corrected image. Simultaneous correction of eddy-current effects and head motion was performed using the FSL EDDY tool. Diffusion parameters were estimated at each voxel using BEDPOSTX with a two-fibre model. Probabilistic tractography was performed using probtrackx2. The cortical and subcortical parcels were spatially transformed into the space of the diffusion data by registering the low-b image to the main anatomical image and then inverting the warp. Tracking was performed using each parcel as a seed, with all other parcels specified as termination masks, white matter specified as a waypoint mask and the cerebrospinal fluid (CSF) mask specified as a rejection mask; the probtrackx2 distance correction was used. A total of 500,000 samples was drawn from each seed. Tract counts were summarized and used to generate binarized adjacency matrices at a given tract density, for comparison with resting-state connectomes.
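The final binarization step can be sketched as follows, assuming density-matched thresholding in which the strongest connections are retained until the requested edge density is reached (function and variable names are hypothetical):

```python
import numpy as np

def binarize_at_density(counts, density):
    """Binarize a symmetric connection-count matrix by keeping the strongest
    edges until the requested edge density is reached.

    counts: (n, n) symmetric matrix of summarized tract counts.
    density: target fraction of possible edges to retain.
    """
    n = counts.shape[0]
    iu = np.triu_indices(n, k=1)                  # unique node pairs
    vals = counts[iu]
    k = int(round(density * vals.size))           # number of edges to keep
    keep = np.argsort(vals)[::-1][:k]             # indices of the strongest edges
    adj = np.zeros_like(counts, dtype=int)
    adj[iu[0][keep], iu[1][keep]] = 1
    return adj + adj.T                            # symmetric binary adjacency

rng = np.random.default_rng(2)
C = rng.integers(0, 500_000, size=(10, 10))
C = np.triu(C, 1)
C = C + C.T                                       # symmetric toy tract counts
A = binarize_at_density(C, 0.2)
```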

Whole-exome sequencing analysis

Alignment to the hg19 reference genome was performed using BWA-MEM (version 0.7.4)63. PCR duplicates were removed using the samtools dedup tool. The GATK IndelRealigner module was used to correct misalignments due to indels, and base quality scores were recalibrated using the GATK BaseRecalibrator tool. Variant calling was performed using the GATK UnifiedGenotyper and VariantRecalibrator tools.

RNA-seq analysis

Initial quality assurance was performed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Examination of RIN values showed five sessions with RIN below the commonly used threshold of RIN 6. However, further examination of expression profiles using clustering did not identify these sessions as outliers, and they were thus included in the analysis to maximize sample size, with RIN values included as a nuisance covariate.

Paired-end reads were mapped to the hg19 reference genome using bowtie2 (version 2.2.2) via tophat (version 2.0.11)64. Per-gene read counts were obtained using htseq-count (HTSeq package, version 0.5.4p5) (http://www-huber.embl.de/users/anders/HTSeq/doc/index.html), resulting in counts for 23,715 genes.

For subsequent analyses using general statistical analysis tools (which are not adapted to the analysis of overdispersed count-valued data such as RNA-seq reads), we generated variance-stabilized versions of the read counts. Read counts were first normalized for library size using the estimateSizeFactors function (DESeq package, version 1.14.0)65, and genes were then filtered for a mean read count across sessions of at least 4 and no more than 10,000, as well as to remove small RNAs (using the VEGA database66), resulting in 13,847 genes passing filtering (removing 4,133 small RNAs, 5,508 below threshold and 221 above threshold). The relationship between mean and dispersion of read counts was estimated using estimateDispersions (DESeq) and a variance-stabilizing transform was applied using varianceStabilizingTransformation (DESeq). Subsequent visual examination of clustering across sessions and genes showed no visible evidence of outliers.
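The mean-count filter can be sketched as follows (boundary handling is assumed to be inclusive, and the small-RNA filter against VEGA is not reproduced; names are hypothetical):

```python
import numpy as np

def filter_by_mean_count(counts, low=4, high=10_000):
    """Keep genes whose mean read count across sessions lies in [low, high].

    counts: (n_genes, n_sessions) array of library-size-normalized counts.
    Returns a boolean mask over genes.
    """
    mean_counts = counts.mean(axis=1)
    return (mean_counts >= low) & (mean_counts <= high)

# Toy data: 1,000 genes across 48 sessions.
rng = np.random.default_rng(3)
counts = rng.poisson(lam=50, size=(1000, 48)).astype(float)
keep = filter_by_mean_count(counts)
```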

Gene network analysis

Gene networks were identified from the RNA-seq data using WGCNA23, which was applied to the variance-stabilized count data using the WGCNA R package67 (version 1.47). Before performing WGCNA, each gene was regressed against the RIN values for each session along with the top three principal components estimated across all genes (to remove global effects of technical variation across sessions), and the residuals were used for the WGCNA analysis. The power for soft thresholding was chosen as 8 based on the scale-free topology criterion. Correlations were estimated using the biweight midcorrelation, a robust correlation estimator. The resulting networks were further functionally characterized using DAVID24. The set of genes associated with each module was submitted to DAVID for functional annotation. The default background set (comprising all human genes) was used. Two separate annotation analyses were performed. One focused on curated biological pathways, using the following pathway databases: Reactome68, Panther69 and BioCarta (http://www.biocarta.com/). A second analysis used Gene Ontology biological processes and molecular functions; while this database includes a larger number of annotated gene associations, most of these are not manually curated or experimentally verified, and are thus more likely to reflect false positives70.
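The covariate-removal step preceding WGCNA can be sketched as an ordinary least-squares residualization (a generic re-implementation with hypothetical names, not the code used in the study):

```python
import numpy as np

def residualize(expr, rin, n_pcs=3):
    """Regress each gene on per-session RIN plus the top principal components
    of the full expression matrix, returning residuals for network analysis.

    expr: (n_sessions, n_genes) variance-stabilized expression.
    rin:  (n_sessions,) RNA integrity numbers.
    """
    centered = expr - expr.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    pcs = U[:, :n_pcs]                                 # top global components
    X = np.column_stack([np.ones(len(rin)), rin, pcs])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - X @ beta                             # residual expression

rng = np.random.default_rng(4)
expr = rng.standard_normal((48, 200))   # e.g. 48 sessions, 200 genes
rin = rng.uniform(5, 9, size=48)
resid = residualize(expr, rin)
```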

Phenome-wide network analyses

All of the different data types were combined using a phenome-wide network analysis approach. For this approach, the associations between each pair of variables (for example, all behavioural phenotypes versus all metabolites) were computed using the automated ARIMA model selection approach and thresholded at FDR q<0.05. To make the graph more interpretable, within-type connections were excluded for gene modules and metabolites; in addition, graph-theoretic functional connectivity measures were excluded because they are derivative of the other connectivity measures included in the analysis. Significant associations (either positive or negative) between variables were treated as edges in the graph. Clustering was performed using the Infomap algorithm as implemented in igraph (version 0.7.1), and visualization was performed using the Cytoscape software package71 (version 3.2.1).
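The FDR-thresholding step that converts pairwise association p-values into graph edges can be sketched with the standard Benjamini–Hochberg procedure (the ARIMA association modeling itself is not reproduced; names are hypothetical):

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg FDR control: boolean mask of significant tests."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m          # step-up thresholds i*q/m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])          # largest rank passing
        reject[order[: k + 1]] = True             # reject all smaller p-values
    return reject

# All pairwise tests enter one correction; significant pairs become edges.
p = np.array([0.001, 0.01, 0.03, 0.04, 0.2])
print(bh_reject(p))  # -> [ True  True  True  True False]
```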

GOBS data set

The GOBS study sample28 consists of Mexican-American individuals from large extended pedigrees sampled randomly from the San Antonio community. The sample analysed for the present study (which included all subjects for whom fMRI and gene expression data had passed quality assurance) comprised 591 individuals (236 male; mean age 43 years, range 18–85). All experiments were performed with IRB approval from the University of Texas Health Science Center at San Antonio (UTHSCSA). All participants provided written informed consent on forms approved by the Institutional Review Boards at the UTHSCSA and Yale University.

Transcriptomic methods for the GOBS study were similar to those previously described in detail by Sprooten et al.29 Briefly, peripheral blood samples were obtained in the morning after an overnight fast during the MRI clinic visit of study participants, and lymphocytes were isolated from the fresh samples and subsequently frozen and stored. Genome-wide transcriptional profiles were generated using the Illumina HumanHT-12 v3.0 Expression BeadChip, which contains more than 47,000 unique probes, with hybridization to each probe assessed on an average of 30 beads.

Individual probes were mapped to genes, taking the probe with the highest mean expression level across subjects for each gene72. These gene-level expression measures were then combined using the gene clusters identified from the MyConnectome data set, as well as clusters identified by performing WGCNA directly on the GOBS data. In each case, the first principal component was computed across genes to obtain a cluster eigengene for each cluster and all subjects, after removing the top five principal components computed across all transcripts as well as the top 10 genotypic principal components to reduce effects of admixture. Heritability of expression was assessed using the SOLAR software package73 (http://solar.txbiomedgenetics.org/).
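The eigengene computation (first principal component across a module's genes) can be sketched as follows; the sign alignment with mean expression is a common WGCNA convention assumed here, and the covariate removal described above is not reproduced:

```python
import numpy as np

def module_eigengene(expr_module):
    """First principal component across a module's genes.

    expr_module: (n_subjects, n_genes_in_module) expression matrix.
    Returns a length-n_subjects eigengene, sign-aligned with mean expression.
    """
    centered = expr_module - expr_module.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    eig = U[:, 0] * s[0]                            # scores on the first PC
    if np.corrcoef(eig, expr_module.mean(axis=1))[0, 1] < 0:
        eig = -eig                                  # fix the arbitrary PCA sign
    return eig

rng = np.random.default_rng(5)
expr = rng.standard_normal((591, 40))               # e.g. 591 subjects, 40 genes
eg = module_eigengene(expr)
```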