The following standard describes a way of arranging data (see Fig. 1) and specifying metadata for a subset of neuroimaging experiments. It follows a simple but carefully defined terminology. Filenames are formed from a series of key-value pairs and end with a file type, where keys and file types are predefined and values are chosen by the user. Some aspects of the standard are mandatory. For example, each dataset needs to have at least one subject directory. Some aspects are regulated but optional. For example, T1-weighted scans do not need to be included, but when they are available they should be saved under a particular file name pattern specified in the standard. The standard provides data dictionaries and strict naming conventions for structural (T1w, T2w, etc.), diffusion, and functional MRI data as well as accompanying behavioural and physiological data. In addition, clear definitions of terms used in TSV and JSON files are provided together with links to DICOM, Cognitive Atlas24, and Cognitive Paradigm25 ontologies.

Figure 1: Illustration of a BIDS structured dataset. BIDS is a format for standardizing and describing outputs of neuroimaging experiments (left) in a way that is intuitive to understand and easy to use with existing analysis tools (right).
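To make the key-value naming pattern concrete, the following Python sketch splits a filename into its key-value pairs, file type, and extension. The function name `parse_bids_filename` is hypothetical and the parsing rules are a simplification for illustration, not the logic of any official BIDS tool.

```python
def parse_bids_filename(filename):
    """Split a BIDS-style filename into (key-value pairs, file type, extension)."""
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition(".")
    # The last underscore-separated chunk is the file type (e.g. "bold", "T1w").
    *pairs, file_type = stem.split("_")
    entities = {}
    for pair in pairs:
        key, separator, value = pair.partition("-")
        if not (separator and key and value):
            raise ValueError(f"not a key-value pair: {pair!r}")
        entities[key] = value
    suffix = "." + extension if extension else ""
    return entities, file_type, suffix


entities, file_type, extension = parse_bids_filename(
    "sub-01_task-stopsignal_bold.nii.gz"
)
```

Here `entities` holds the user-chosen values under the predefined keys `sub` and `task`, while `file_type` is `bold`.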

This standard aspires to describe a majority of datasets, but acknowledges that there will be cases that do not fit the present version (1.0.0) of BIDS. In such cases one can add files and subfolders to the existing folder structure, following a set of general naming guidelines and common sense. For example, one may want to include eye tracking data (BIDS does not cover this type of data yet). A sensible place to put it is next to the continuous recording file, using the same naming scheme but a different extension. To make sure such additions are not accidental, the provided validator raises a warning for all files that do not fit the specification. The solutions will vary from case to case, and publicly available datasets will be periodically reviewed so that common data types can be included in future releases of the BIDS specification.

Raw versus derived data

BIDS in its current form is designed to standardize (convert to a common file format) and describe raw data. During analysis, such data will be processed and intermediate as well as final results will be saved. Derivatives of the raw data should be kept separate from the raw data. This clearly separates raw from processed data, makes sharing of raw data easier, and prevents accidental changes to the raw data. Even though the BIDS specification currently does not contain a particular naming scheme for different data derivatives (correlation maps, brain masks, contrast maps, etc.), we recommend keeping them in a separate ‘derivatives’ folder with a folder structure similar to the one presented below for the raw data. For example: derivatives/sub-01/ses-pre/mask.nii.gz. In future releases of BIDS we plan to provide more detailed recommendations on how to organize and describe various data derivatives.

The inheritance principle

Any metadata file (e.g., files ending with: .json, .bvec, _events.tsv, _physio.tsv.gz, and _stim.tsv) may be defined at one of four levels (in hierarchical order): MRI acquisition, session, subject, or dataset. Values from the top level are inherited by all lower levels unless they are overridden by a file at the lower level. For example, /task-nback_bold.json may be specified at the dataset level to set the repetition time (TR) for all subjects, sessions and runs. If one of the runs has a different TR than the one specified in the dataset level file, a /sub-<subject_id>/sub-<subject_id>_task-nback_bold.json file can be used to specify the TR for that specific run.
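The override behaviour can be sketched in a few lines of Python. This is a minimal illustration that assumes the metadata files have already been loaded into dictionaries; the function name `resolve_metadata` and the example values are hypothetical.

```python
def resolve_metadata(*levels):
    """Combine metadata dictionaries ordered from dataset level downwards."""
    resolved = {}
    for level in levels:
        # Values defined at a lower level replace the inherited ones.
        resolved.update(level)
    return resolved


# Illustrative values only: a dataset-level file and a subject-level override.
dataset_level = {"RepetitionTime": 2.0, "TaskName": "nback"}
subject_level = {"RepetitionTime": 2.5}
resolved = resolve_metadata(dataset_level, subject_level)
```

In this example `resolved` keeps the task name from the dataset level and takes the repetition time of 2.5 s from the lower level.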

File formats

Imaging files

Since BIDS is aimed at facilitating data sharing as well as analysis, the file format for storing imaging data was selected based on support from various neuroimaging data analysis packages. We have chosen the NIfTI file format because it is the largest common denominator across neuroimaging software. However, since it offers limited support for the various image acquisition parameters available in DICOM or other scanner specific files, the BIDS standard requires users to provide additional meta information in a sidecar JSON file (with the same filename as the .nii.gz file, but with a .json extension; see section ‘Key/value files’ for more information). The BIDS standard specifies a carefully selected set of fields together with their definitions, which extends the standard DICOM ontology with terms that are crucial for data analysis, such as the polarity of phase encoding direction or slice timing (which traditionally have been recorded in inconsistent ways across scanner manufacturers and are not part of the DICOM ontology). In addition to terms specified in BIDS, we encourage users to include other information extracted from DICOMs (including private manufacturer fields) during the conversion process so that no metadata are lost. Extraction of a minimal set of BIDS compatible metadata can be performed using the dcm2niix (https://www.nitrc.org/projects/dcm2nii/) and dicm2nii (http://www.mathworks.com/matlabcentral/fileexchange/42997) DICOM to NIfTI converters. A provided validator (https://github.com/INCF/bids-validator) will check completeness of provided metadata and look for conflicts between the JSON file and the data recorded in the NIfTI header.

Tabular files

Metadata that are most naturally stored as an array are kept in tab-separated value (TSV) files, which are similar to comma-separated value (CSV) files except that commas are replaced by tabs. A header line naming each column is generally required and, depending on the use, some specific variable names are required (see the full specification for details). String values containing tabs should be escaped using double quotes.

Missing values should be coded as ‘n/a’.
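As an illustration, such a file can be read with Python's standard `csv` module, which by default treats double quotes as the quoting character. The helper name `read_tsv` and the example table are hypothetical; mapping ‘n/a’ to `None` is one possible way of representing missing values after loading.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text into a list of row dictionaries, mapping 'n/a' to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {column: (None if value == "n/a" else value) for column, value in row.items()}
        for row in reader
    ]


# Illustrative table: the age of the second participant is missing.
example = "participant_id\tage\tsex\nsub-01\t25\tF\nsub-02\tn/a\tM\n"
rows = read_tsv(example)
```

All other values are returned as strings, so numeric columns still need to be converted by the caller.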

Key/value files (dictionaries)

JSON files are used for storing key/value pairs, with the key names following a fixed dictionary in the specification. Extensive documentation of the JSON format can be found at http://json.org. Several editors have built-in support for JSON syntax highlighting that aids manual creation and editing of such files. An online editor for JSON with built-in validation is available at http://jsoneditoronline.org. JSON files need to be encoded in ASCII or UTF-8. The order of keys is arbitrary and does not convey any meaning.
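The following short Python sketch shows these properties using the standard `json` module. The two field names and their values are used purely as examples of sidecar content.

```python
import json

# Serialize a small metadata dictionary; json.dumps emits ASCII-safe text by default.
sidecar_text = json.dumps({"RepetitionTime": 2.0, "TaskName": "stop signal"}, indent=4)

# The same content with the keys in a different order.
reordered_text = '{"TaskName": "stop signal", "RepetitionTime": 2.0}'

# Key order carries no meaning: both documents decode to equal dictionaries.
same_content = json.loads(sidecar_text) == json.loads(reordered_text)

# Bytes as they would be written to disk in UTF-8.
encoded = sidecar_text.encode("utf-8")
```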

Required, recommended and optional metadata

To maximize adoption and flexibility of the BIDS standard, only a small subset of metadata fields and files is required (compulsory). The decision of which metadata fields and files are required was based on the minimal metadata needed to perform standard basic analyses on each type of data. For anatomical scans, only specifying the type (T1 weighted, T2 weighted, T1 map, etc.; see Section 8.3 in Supplementary File 1) is required. For functional scans (fMRI), the researcher is required to specify a task name (which could be ‘rest’ in so-called resting-state scans), repetition time (in seconds) and timing and duration of all events (stimuli and/or responses, unless the subject was not performing any task; for more details see Section 8.4 in Supplementary File 1). For diffusion weighted imaging the required metadata is limited to b-values (in the form of .bval files) and diffusion gradient tables (in the form of .bvec files; for more information see Section 8.8 in Supplementary File 1). Different types of fieldmaps also include a set of corresponding required fields (see Section 8.9 in Supplementary File 1). Similarly, when including physiological (breathing or cardiac) or other continuous recordings the researcher is required to specify a start time (relative to the beginning of image acquisition) and sampling frequency (for more details see Section 8.6 in Supplementary File 1). When a required file or metadata field is missing, the BIDS Validator will report an error.

In addition to those mandatory pieces of metadata, the BIDS standard strongly recommends inclusion of other metadata that are crucial for performing some additional types of analyses. Those include, but are not limited to, slice timing (necessary for slice timing correction), phase encoding direction, effective echo spacing, and echo time (required for performing field unwarping). When a recommended piece of metadata is missing, the BIDS Validator will report a warning.
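The distinction between errors and warnings can be sketched as follows for a functional scan sidecar. This is a simplified illustration, not the BIDS Validator's implementation; the function name is hypothetical and the field names are assumed to match the keys defined in the specification.

```python
# Assumed JSON keys for the metadata named in the text.
REQUIRED_BOLD_FIELDS = ("RepetitionTime", "TaskName")
RECOMMENDED_BOLD_FIELDS = (
    "SliceTiming",
    "PhaseEncodingDirection",
    "EffectiveEchoSpacing",
    "EchoTime",
)


def check_bold_sidecar(metadata):
    """Return (errors, warnings): missing required and recommended fields."""
    errors = [field for field in REQUIRED_BOLD_FIELDS if field not in metadata]
    warnings = [field for field in RECOMMENDED_BOLD_FIELDS if field not in metadata]
    return errors, warnings


# Illustrative sidecar that lacks a task name and most recommended fields.
errors, warnings = check_bold_sidecar({"RepetitionTime": 2.0, "SliceTiming": [0.0, 1.0]})
```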

Finally, the BIDS specification also defines a large set of metadata fields that are optional. Those include information that is not crucial for any particular data analysis method, but can be useful when trying to understand the nature of the data or combining data from multiple sources. Those fields include, but are not limited to, scanner manufacturer, scanner software version, head coil name, instructions given before the task, and multiband acceleration factor. In addition, the researcher can extend the metadata dictionaries with their own keys (as long as they do not collide with those already defined in the BIDS specification) to include additional information.

Creating a BIDS compatible dataset

The process of creating a BIDS compatible dataset can be split into several steps. In the following section we present this procedure using a dataset acquired at UCLA by Jessica Cohen as a part of her Ph.D. research28. This dataset includes anatomical, diffusion and task fMRI data and is available (in BIDS format) in the OpenfMRI repository under the accession number ds000009 v2.0.1 (Data Citation 1).

Step 1: Convert DICOM files to NIfTI

This dataset was acquired using an MRI scanner that outputs DICOM files (Siemens Trio), so we can use a DICOM to NIfTI converter such as dcm2niix. This particular converter supports BIDS: it normalizes idiosyncrasies of different scanner manufacturers that are not standardized by DICOM, and outputs a BIDS compatible JSON with most of the required and recommended metadata (such as repetition time, slice timing, and phase encoding direction).
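One way to script this step is to call dcm2niix from Python, as in the sketch below. It assumes dcm2niix is installed and on the PATH; the helper names and any directory names passed to them are hypothetical. The options used are `-b y` (write a BIDS sidecar JSON), `-z y` (gzip the NIfTI output) and `-o` (output directory).

```python
import shutil
import subprocess


def build_dcm2niix_command(dicom_dir, output_dir):
    """Assemble the argument list for one dcm2niix conversion."""
    return ["dcm2niix", "-b", "y", "-z", "y", "-o", str(output_dir), str(dicom_dir)]


def convert(dicom_dir, output_dir):
    """Run dcm2niix and raise if it is unavailable or exits with an error."""
    if shutil.which("dcm2niix") is None:
        raise FileNotFoundError("dcm2niix was not found on the PATH")
    subprocess.run(build_dcm2niix_command(dicom_dir, output_dir), check=True)
```

The converted files still carry names chosen by the converter, so they need to be renamed as described in the next step.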

Step 2: Create folder structure, rename and copy NIfTI files

BIDS relies heavily on a particular folder structure and naming scheme of files. We begin by creating one subfolder for each of the 29 subjects, named ‘sub-01’, ‘sub-02’, ‘sub-03’, etc. Inside each of the subject subfolders we create three subfolders: ‘anat’ (for anatomical scans), ‘dwi’ (for diffusion scans), and ‘func’ (for task fMRI). Those names are not arbitrary and must follow the BIDS specification (see Supplementary File 1). This dataset includes two anatomical scans per subject: high-resolution T1 weighted and in-plane T2 weighted. They need to be renamed to ‘sub-01_T1w.nii.gz’ and ‘sub-01_inplaneT2.nii.gz’ (respectively) and moved to the ‘anat’ subfolder. This operation has to be repeated for all subjects. The .json files accompanying the .nii.gz files should be renamed in the same way (same filename, but with a .json extension) and moved along with them.

Similarly, we move the diffusion files into the ‘dwi’ folder. The naming scheme is analogous: ‘sub-01_dwi.nii.gz’. In addition to the .json files, we also move the .bvec and .bval files containing gradient information produced by dcm2niix.

Finally, we follow suit with the task fMRI files. This dataset includes four different tasks with the following names: stop-signal, Balloon analog risk task (BART), discounting, and emotion regulation, which we label as ‘stopsignal’, ‘bart’, ‘discounting’ and ‘emotionregulation’, respectively. The naming scheme for functional files is ‘sub-01_task-stopsignal_bold.nii.gz’ (where ‘01’ is replaced by the corresponding subject label for the other subjects and ‘stopsignal’ is replaced by the corresponding task label for the other tasks).
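The target locations described in this step can be generated programmatically. The Python sketch below covers only the subject and task keys used in this example dataset; the function name `bids_path` and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def bids_path(subject, data_type, file_type, task=None, extension=".nii.gz"):
    """Build the relative path of a file inside the dataset folder."""
    parts = [f"sub-{subject}"]
    if task is not None:
        parts.append(f"task-{task}")
    filename = "_".join(parts + [file_type]) + extension
    return PurePosixPath(f"sub-{subject}") / data_type / filename


# Relative targets for the three kinds of scans of the first subject.
anat_target = bids_path("01", "anat", "T1w")
dwi_target = bids_path("01", "dwi", "dwi")
func_target = bids_path("01", "func", "bold", task="stopsignal")
```

Passing `extension=".json"` yields the matching sidecar name, so the same helper can be used when moving the metadata files.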

Step 3: Add remaining data

In addition to imaging data and metadata, we also need to provide details of the experimental paradigm for the task fMRI data. This is done by creating a tab-delimited text file following the naming scheme of ‘sub-01_task-stopsignal_events.tsv’ for each of the .nii.gz files. These files include two compulsory columns, ‘onset’ and ‘duration’ (both in seconds), and any number of other arbitrarily named columns to categorize and describe events (both stimuli and responses) recorded during the experiments. In the case of this task, we will add columns describing reaction time (in seconds), trial type (‘go’ or ‘stop’), subject response, response correctness, and trial outcome.
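A minimal sketch of producing such a file with Python's `csv` module is shown below. The two compulsory column names come from the text; the extra column names (`trial_type`, `response_time`), the helper name, and the event values are assumptions made for illustration only.

```python
import csv
import io


def events_to_tsv(events, columns):
    """Serialize event dictionaries as TSV text, writing 'n/a' for absent keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        restval="n/a",  # used when an event has no entry for a column
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


# Illustrative events: the stop trial has no recorded response time.
events = [
    {"onset": 0.0, "duration": 1.5, "trial_type": "go", "response_time": 0.435},
    {"onset": 4.0, "duration": 1.5, "trial_type": "stop"},
]
tsv_text = events_to_tsv(events, ["onset", "duration", "trial_type", "response_time"])
```

The resulting text would then be saved under a name such as ‘sub-01_task-stopsignal_events.tsv’ next to the corresponding functional file.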

On top of the experimental paradigm information we also have some demographic information about the participants of the study such as age and sex. This data should be saved in a text file called ‘participants.tsv’ in the root of the dataset directory. This file has one compulsory column: ‘participant_id’ (for example ‘sub-01’, ‘sub-02’) and can include any number of other arbitrarily named columns describing participants. Optionally a ‘participants.json’ file can be provided with description of each column and links to external ontologies (see Section 4.2 in Supplementary File 1).

Step 4: Add missing metadata

All of the metadata in the .json files were so far obtained using the dcm2niix converter. In addition to these, we need to provide the name of each fMRI task. Optionally, we can add information about task instructions and description, as well as link the tasks to an external ontology such as Cognitive Atlas or Cognitive Paradigm Ontology. Metadata organization can also be simplified using the inheritance rule: metadata fields common across all subjects can be specified in one JSON file in the root of the directory instead of being repeated for each subject (see Section 3.5 in Supplementary File 1 for details). Finally, we need to create a dataset_description.json file with fields that include the name and description of the dataset as well as the version of the BIDS standard used. This file can also be used to list authors and ways to reference the dataset (see Section 8.1 in Supplementary File 1).
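As a sketch, the content of dataset_description.json could be generated as follows. The keys `Name`, `BIDSVersion` and `Authors` are assumed to match the field names defined in the specification; the function name and the dataset name used below are placeholders.

```python
import json


def make_dataset_description(name, bids_version="1.0.0", authors=None):
    """Return the JSON text for a minimal dataset_description.json file."""
    description = {"Name": name, "BIDSVersion": bids_version}
    if authors:
        description["Authors"] = list(authors)
    return json.dumps(description, indent=4)


# Placeholder dataset name; the text would be written to the dataset root.
description_text = make_dataset_description("Example dataset")
```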

Step 5: Validate the dataset

Once the dataset is assembled, the BIDS Validator can be used to check if any of the required or recommended metadata are missing. In addition, the validator has built-in heuristics to spot incorrect definitions of missing values (for example ‘NA’ instead of ‘n/a’), use of wrong units (milliseconds instead of seconds), missing scans and inconsistent scanning parameters across subjects. The validator works in the Chrome web browser with no need to install additional software, and performs the validation on the client side (i.e., no data are uploaded or shared), so it is suitable for sensitive datasets that are not intended for public sharing.
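To give a flavour of such a heuristic, the sketch below flags table cells that look like a missing-value code other than ‘n/a’. It is not the BIDS Validator's implementation; the set of suspicious codes and the function name are assumptions made for illustration.

```python
import csv
import io

# Assumed examples of codes that are easily confused with the required 'n/a'.
SUSPICIOUS_MISSING = {"NA", "N/A", "NaN", "nan", ""}


def find_suspicious_missing(tsv_text):
    """Return (line number, column name, value) for each suspicious cell."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    findings = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for column, value in zip(header, row):
            if value in SUSPICIOUS_MISSING:
                findings.append((line_number, column, value))
    return findings


findings = find_suspicious_missing("participant_id\tage\nsub-01\tNA\nsub-02\tn/a\n")
```

In this example only the ‘NA’ entry on line 2 is reported, while the correctly coded ‘n/a’ on line 3 passes.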

Any BIDS compatible dataset can be readily fed into the MRIQC or QAP toolboxes (see Adoption), which calculate quality measures. Thanks to the formal structure of BIDS, no additional metadata are required as input. Outputs of those quality analyses can be included along with the dataset (see Section 3.4 in Supplementary File 1).

Adoption

Despite its relatively young age, BIDS has already been adopted by the OpenfMRI repository16. Since the switch to the new standard in December 2015, thirteen new BIDS compatible datasets have been published. In addition, several software packages have added support for BIDS: SciTran (database)29, Quality Assurance Protocol (QA toolbox, https://github.com/preprocessed-connectomes-project/quality-assessment-protocol), MRIQC (QA toolbox, https://github.com/poldracklab/mriqc), and automatic analysis (workflow toolbox)30.

A number of tools have also been developed to help work with BIDS datasets. Those include: bids-validator (a validation tool, https://github.com/INCF/bids-validator), openfmri2bids (OpenfMRI convention to BIDS converter, https://github.com/INCF/openfmri2bids), and BIDSto3col (FSL modelling helper tool, https://github.com/INCF/bidsutils/tree/master/BIDSto3col).