Background

Many infectious diseases have similar signs and symptoms, making it challenging for healthcare providers to identify the disease-causing agent. Clinical samples are often tested with multiple methods to help reveal which microbe is causing the infection, and the results of these tests can help healthcare professionals determine the best treatment for patients. Today, High-Throughput Sequencing (HTS), also known as Next Generation Sequencing (NGS), can accomplish in a single test what might have required several different tests in the past.

NGS technology may allow infections to be diagnosed without prior knowledge of their cause, because it can potentially reveal the presence of all microorganisms in a patient sample. Using infectious disease NGS (ID-NGS) technology, each microbial pathogen may be identified by its unique genomic fingerprint. The vision of ID-NGS technology is to further improve patient care by delivering diagnostics that can help identify the microbial makeup of patient samples quickly and accurately.

Challenge Data

A set of 15 mock clinical and 6 in silico metagenomics samples will be provided. Each challenge dataset is a mixture of background (mock matrix) short reads and target (microbe) short reads in a given proportion. The test algorithm's performance will be judged on its estimate of the composition of the target short reads (the microbial composition). The background reads ensure that the samples resemble clinically relevant samples.
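To make the mixture concrete, the sketch below splits a fixed read budget into background reads and target reads for each microbe. It is purely illustrative: the function name `mixture_read_counts`, the read total, the background share, and the microbe names are hypothetical, and the challenge samples were not necessarily built this way. The target proportions correspond to the microbial composition that the test algorithm is asked to estimate.

```python
from typing import Dict, Tuple


def mixture_read_counts(
    total_reads: int,
    background_fraction: float,
    target_proportions: Dict[str, float],
) -> Tuple[int, Dict[str, int]]:
    """Split a read budget into background reads and per-microbe target reads.

    Illustrative only: the name, parameters, and rounding scheme are
    assumptions, not the challenge's actual sample-construction procedure.
    """
    if not 0.0 <= background_fraction <= 1.0:
        raise ValueError("background_fraction must lie in [0, 1]")
    if target_proportions and abs(sum(target_proportions.values()) - 1.0) > 1e-9:
        raise ValueError("target proportions must sum to 1")
    # Number of reads that come from the microbes rather than the mock matrix.
    n_target = round(total_reads * (1.0 - background_fraction))
    # Truncate each microbe's share; reads lost to truncation go to background.
    target_counts = {
        name: int(n_target * share) for name, share in target_proportions.items()
    }
    background = total_reads - sum(target_counts.values())
    return background, target_counts
```

For example, `mixture_read_counts(1_000_000, 0.99, {"microbe_A": 0.75, "microbe_B": 0.25})` returns `(990000, {"microbe_A": 7500, "microbe_B": 2500})`: only 1% of the reads carry the microbial signal, yet the composition to be estimated is still 0.75/0.25.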

21 metagenomics samples (15 mock, 6 in silico)*

15 mock clinical samples: background at a mock clinically relevant level, biothreat agent, near neighbor, coinfection, lab contaminant, no template control (NTC), positive control

6 in silico samples: background, biothreat, near neighbor, coinfection, lab contaminant

*Mix blinded; a sample may not contain all constituents.

Submission Format

Input

Hello World data set

Reads from the 21 blinded metagenomics sequence files (FASTA/FASTQ)

100,000 subsampled reads from each of the 21 metagenomics sequence files, for the optional per-read taxonomic origin challenge

FDA-ARGOS database: blinded regulatory-grade microbial genomes
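Both FASTA and FASTQ inputs reduce to a stream of (read name, sequence) pairs. The following is a minimal, dependency-free reader, sketched under assumptions: the helper names are hypothetical, FASTQ records are taken to occupy exactly four lines (header, sequence, `+` line, quality), and the read name is the first whitespace-delimited token of the header. Established parsers such as Biopython's `Bio.SeqIO` handle more edge cases.

```python
from typing import Iterable, Iterator, Optional, Tuple

Read = Tuple[str, str]  # (read name, nucleotide sequence)


def _name(header: str) -> str:
    """Read name = first whitespace-delimited token after '>' or '@'."""
    fields = header[1:].split()
    return fields[0] if fields else ""


def _iter_fasta(first: str, rest: Iterator[str]) -> Iterator[Read]:
    name, parts = _name(first), []
    for line in rest:
        if line.startswith(">"):
            yield name, "".join(parts)
            name, parts = _name(line), []
        else:
            parts.append(line)  # FASTA sequences may wrap over several lines
    yield name, "".join(parts)


def _iter_fastq(first: str, rest: Iterator[str]) -> Iterator[Read]:
    header: Optional[str] = first
    while header is not None:
        # Assumes single-line FASTQ records: sequence, '+' line, quality.
        seq, plus, qual = next(rest, None), next(rest, None), next(rest, None)
        if qual is None or not plus.startswith("+"):
            raise ValueError(f"truncated or malformed FASTQ record: {header!r}")
        yield _name(header), seq
        header = next(rest, None)


def iter_reads(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (name, sequence) pairs from FASTA or FASTQ text lines."""
    stream = (line.strip() for line in lines)
    stream = (line for line in stream if line)  # skip blank lines
    first = next(stream, None)
    if first is None:
        return
    if first.startswith(">"):
        yield from _iter_fasta(first, stream)
    elif first.startswith("@"):
        yield from _iter_fastq(first, stream)
    else:
        raise ValueError("expected a FASTA ('>') or FASTQ ('@') header")
```

Compressed files can be opened with `gzip.open(path, "rt")`, which yields text lines, so `sum(1 for _ in iter_reads(handle))` counts the reads in a file.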



Your pipeline

Run on precisionFDA using a wrapper

Download the data and post the results using the template

Output

FDA-ARGOS genome species identification: normalized confidence score [0,1]

FDA-ARGOS genome species identification: normalized quantity percentage [0,1]

Per-read FDA-ARGOS genome species identification: normalized confidence score [0,1] (optional)

Evaluation

Participants are asked to submit a normalized confidence score (between 0 and 1) for the presence of each FDA-ARGOS species in each of the 21 metagenomics samples, together with a description of their method for calculating the confidence score.

FDA-ARGOS Genome Species ID   Sample 1   Sample 2   ...   Sample 21
Species 1                     1          0.8        ...   0.8
Species N                     0          0.3        ...   0.6
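A results matrix in this species-by-sample layout can be produced with the standard library. This is a sketch assuming a tab-separated file; the actual precisionFDA template may require a different delimiter or header, and `format_score_matrix` is a hypothetical helper. Species are sorted by ID, a (species, sample) pair with no score is written as 0, and any value outside [0, 1] is rejected.

```python
import csv
import io
from typing import Dict, List


def format_score_matrix(
    scores: Dict[str, Dict[str, float]],
    samples: List[str],
    header: str = "FDA-ARGOS Genome Species ID",
) -> str:
    """Render species x sample values in [0, 1] as tab-separated text.

    Assumptions: tab delimiter, species sorted by ID, and a missing
    (species, sample) pair written as 0.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([header, *samples])
    for species in sorted(scores):
        row = []
        for sample in samples:
            value = scores[species].get(sample, 0.0)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{species}/{sample}: {value} is outside [0, 1]")
            row.append(f"{value:g}")  # 1.0 -> "1", 0.8 -> "0.8"
        writer.writerow([species, *row])
    return buffer.getvalue()
```

The same helper can format the normalized quantity matrix, which has the same species-by-sample shape.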

Participants are asked to determine the quantity of genetic material in each sample that originates from each species in the provided FDA-ARGOS reference database, and to submit each species' normalized quantity percentage (expressed between 0 and 1) for each sample.

FDA-ARGOS Genome Species ID   Sample 1   Sample 2   ...   Sample 21
Species 1                     0.5        0.2        ...   0
Species N                     0.5        0.8        ...   0
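One straightforward way to obtain normalized quantities is to divide each species' read count by the total number of reads assigned to reference species in that sample. The sketch assumes that background and unclassified reads have already been excluded from the counts; `normalized_quantities` is a hypothetical name. A sample in which no read is assigned gets 0 for every species, like the all-zero Sample 21 column in the illustration.

```python
from typing import Dict


def normalized_quantities(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-species read counts for one sample into fractions in [0, 1].

    Assumes the counts include only reads assigned to FDA-ARGOS species
    (background and unclassified reads already removed), so the fractions
    sum to 1 unless no read was assigned at all.
    """
    if any(count < 0 for count in read_counts.values()):
        raise ValueError("read counts must be non-negative")
    total = sum(read_counts.values())
    if total == 0:
        # No reads assigned to any reference species: report 0 for every species.
        return {species: 0.0 for species in read_counts}
    return {species: count / total for species, count in read_counts.items()}
```

For example, counts of 120 and 480 reads give quantities of 0.2 and 0.8. Counting reads is the simplest proxy for the amount of genetic material; weighting each read by its length (counting bases) is an alternative when read lengths vary.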

*Because only a subset of the short reads may be taxonomically classified, the final quantifications will be evaluated by their root mean square deviation (RMSD) from the known quantities.
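RMSD compares each predicted quantity p_i with the known quantity q_i over the n species considered: RMSD = sqrt((1/n) * sum_i (p_i - q_i)^2). The sketch below computes it for a single sample. Treating a species missing from either side as 0 and averaging over the union of species are assumptions; the official scoring may aggregate across species and samples differently, and `rmsd` is an illustrative name.

```python
import math
from typing import Dict


def rmsd(predicted: Dict[str, float], known: Dict[str, float]) -> float:
    """Root mean square deviation between predicted and known quantities.

    Species missing from either dict are treated as 0; averaging over the
    union of species within one sample is an assumption, not the official
    scoring rule.
    """
    species = sorted(set(predicted) | set(known))
    if not species:
        raise ValueError("no species to compare")
    squared = [(predicted.get(s, 0.0) - known.get(s, 0.0)) ** 2 for s in species]
    return math.sqrt(sum(squared) / len(squared))
```

For instance, predicting `{"Species 1": 1.0}` when the truth is 0.5 for each of Species 1 and Species N gives an RMSD of 0.5: missing a species is penalized just as misestimating one is.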

(OPTIONAL) Participants are asked to submit a read-based normalized confidence score (between 0 and 1) for the presence of each FDA-ARGOS species within designated subsamples: 100,000 reads were subsampled from each of the 21 metagenomics samples and provided to participants.

Read Name   FDA-ARGOS Genome Species ID   Score
Read 1      Species 1                     0.9
Read N      Species N                     0.6
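A per-read table of this shape can be derived from whatever per-read evidence a pipeline computes. In this sketch the raw scores are hypothetical non-negative values per read and species (for example alignment scores or k-mer hit counts); each read is assigned its best-scoring species, and that score is divided by the read's total across species so it falls in [0, 1]. `per_read_calls` is an illustrative name, and dropping reads with no evidence is a design choice, not a challenge rule.

```python
from typing import Dict, List, Tuple


def per_read_calls(
    raw_scores: Dict[str, Dict[str, float]],
) -> List[Tuple[str, str, float]]:
    """Turn hypothetical raw per-read species scores into (read, species, score) rows.

    Each reported score is the best species' share of the read's total raw
    score, so it lies in [0, 1]; only the best-scoring species is reported.
    """
    rows = []
    for read_name, species_scores in raw_scores.items():
        if any(value < 0 for value in species_scores.values()):
            raise ValueError(f"{read_name}: raw scores must be non-negative")
        total = sum(species_scores.values())
        if total <= 0:
            continue  # no evidence for any species: leave the read out
        best = max(species_scores, key=species_scores.__getitem__)
        rows.append((read_name, best, species_scores[best] / total))
    return rows
```

With raw scores of 90 for Species 1 and 10 for Species N, Read 1 yields the row ("Read 1", "Species 1", 0.9), matching the first line of the table.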

Team