Due to the high homology between the Ifna family genes, accurately quantifying the expression of the different gene members by Sanger sequencing or next generation sequencing is difficult. We instead employed nanopore sequencing, which allowed us to acquire full-length reads from each individual sequence that was amplified by PCR. We aimed to determine the relative quantities of the various Ifna family and Ifnb transcripts in helminth-treated mouse ear tissue using the MinION, thereby enabling both differentiation between the various Ifna genes and the potential for quantitative analysis.

Using R9.4 flow cells we have evaluated the MinION technology for the amplicon sequencing of highly similar genes. Since we have an interest in the interferon response during helminth infection 16, we sequenced the type I interferon (IFN) family. Type I IFNs are a family of intronless antiviral response genes comprising, in mice, 14 highly homologous Ifna members, as well as the genes Ifnb, Ifnk and Ifne 17. In humans, sequence similarity across the 14 members of the Ifna genes is 70–80%, with a further 35% sequence similarity between Ifna and Ifnb. Type I IFN has an important role both in innate antiviral immunity and in mounting adaptive T helper cell responses 16, 18. Building on previous observations, we aimed to identify which type I IFN member(s) were responsible for driving the type I IFN signalling in our infection model.

Instead of exploiting base-pairing as in the sequencing-by-synthesis approach used by Illumina and others, nanopore sequencing uses an electronic sensor to detect DNA via a change in electric current (reviewed in 4). The MinION’s flow cell comprises 2048 wells containing a membrane perforated by nanopores. Ligated to a molecular motor, a single-stranded DNA molecule passes through the pore, altering the recorded current. After the electronic sequencing is carried out, a software base-calling algorithm transforms the current trace into a modelled DNA sequence. The advantages of the MinION are rapid library preparation, portability 5, 6, long molecule sequencing 7, and sequencing of non-model modifications of the DNA strand 8. Recent improvements in the chemistry of the MinION have overcome the majority of issues associated with low yield and high error rates that have limited the range of its application. The MinION sequencing device has now been successfully used to sequence genomes of a wide range of sizes, from bacterial and viral genomes 9, 10 to, more recently, a human genome 12, and for amplicon sequencing such as bacterial 16S rRNA sequencing 11. The MinION has also been used for cDNA sequencing 13, for detecting DNA methylation patterns without chemical treatment 8, 14, and for direct RNA sequencing with detection of modified 16S rRNA nucleotides 15.

High-throughput DNA sequencing is a rapidly evolving field with new methods and applications introduced almost weekly 1. One of the most recent sequencing technologies available on the market is the MinION sequencing device from Oxford Nanopore Technologies (ONT) 2. A brief overview of MinION sequencing technology is given in our previous study on mitochondrial genome assembly 3.

Reads were initially base-called during the sequencing runs in January 2017 using Metrichor 2D base-calling, from MinKNOW v1.3.25. An initial analysis of called reads demonstrated substantial disagreement between base-calls and the raw signal (e.g. hairpin adapter sequences matching multiple times when the signal showed only one present), so reads were re-called in March 2017 using Albacore v0.7.5.

The ONT Native Barcoding Kit (EXP-NBD002) and 2D Ligation Sequencing Kit (SQK-LSK208) were used to prepare the samples for sequencing, as per the manufacturer’s protocol. Briefly, purified PCR amplicon products were blunt-ended, ligated with barcode sequences, pooled in approximately equimolar amounts, then ligated with flow cell adapters and a hairpin linker. In order to explore the effect of ligation method on the degree of chimerism, two different adapter/hairpin ligation reactions were carried out: one using the standard quick (10-minute) ligation, and the other using an overnight ligation at 4 °C. No additional adapter-free controls were used; it has been our prior experience that sequencing does not proceed in a callable fashion unless adapter sequences are present. The barcoding scheme used in the library preparation is shown in Figure 2. Samples were quantified after barcoding for overnight ligation (2.14 ng/µl, 2.54 ng/µl and 2.56 ng/µl for Ifna, Ifnb, and Actb respectively) and for quick ligation (2.13 ng/µl, 2.68 ng/µl and 2.45 ng/µl for Ifna, Ifnb, and Actb respectively). These samples were normalised and pooled together to give 26.6 ng each in 33.1 µl distilled water for ligation. After adapter ligation, the quick ligation method showed no detectable nucleic acid, as measured by fluorescence quantitation with the Quantus fluorometer (Promega), while the overnight ligation quantified at 0.239 ng/µl. We decided to pool the samples together anyway, and were pleasantly surprised to discover a substantial proportion of reads from quick-barcoded sequences.
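The normalisation step above is a mass-over-concentration calculation. As a minimal sketch, assuming that "26.6 ng each" refers to the mass taken from every barcoded sample, the volume needed from each sample could be worked out as follows. The dictionary layout and the helper name `volume_for_mass` are illustrative only and are not part of the kit protocol.

```python
# Concentrations after barcoding (ng/µl), as quoted in the text above.
concentrations = {
    "overnight": {"Ifna": 2.14, "Ifnb": 2.54, "Actb": 2.56},
    "quick": {"Ifna": 2.13, "Ifnb": 2.68, "Actb": 2.45},
}

# Assumption: every barcoded sample contributes this mass to its pool.
TARGET_MASS_NG = 26.6


def volume_for_mass(mass_ng, conc_ng_per_ul):
    """Return the volume (µl) that contains mass_ng at the given concentration."""
    return mass_ng / conc_ng_per_ul


# Volume of each barcoded sample needed to reach the target mass.
volumes = {
    method: {amp: volume_for_mass(TARGET_MASS_NG, conc)
             for amp, conc in samples.items()}
    for method, samples in concentrations.items()
}

for method, samples in volumes.items():
    for amp, vol in samples.items():
        print(f"{method:9s} {amp}: {vol:.2f} µl")
```

Because the quoted concentrations are rounded, volumes computed this way should be treated as approximate.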

Ethics approval for maintenance of the N. brasiliensis life cycle is overseen and approved by the Victoria University of Wellington Animal Ethics Committee. C57BL/6J mice were originally obtained from The Jackson Laboratory, Bar Harbor, Maine, USA, and maintained at the Biomedical Research Unit of the Malaghan Institute of Medical Research by brother × sister mating. Breeding pairs were refreshed regularly to maintain the genetic integrity of the strain. Mice were maintained in specific pathogen-free conditions and all mouse experiments were approved by the Victoria University Animal Ethics Committee (permit number 23907) and carried out according to institutional guidelines.

Ifna cDNAs were amplified by PCR using primers designed across a highly conserved region of all Ifna coding sequences, which resulted in a mixed PCR product containing all 14 Ifna genes. cDNAs of Ifnb and Actb were amplified separately and used as quantification controls. The three amplicons were then pooled, loaded into a flow cell and sequenced. Among the reads that we obtained, we noticed long chimeric reads comprising two or more sequences from different amplicons. We decided to examine this phenomenon further.

Ifna, Ifnb, and Actb amplicons were generated using specific primers: IfnaF (ATGGCTAGRCTCTGTGCTTTCCT) and IfnaR (AGGGCTCTCCAGAYTTCTGCTCTG) 20; IfnbF (CTGGCTTCCATCATGAACAA) and IfnbR (GCAACCACCACTCATTCTGA); and ActbF (AGGGAAATCGTGCGTGACAT) and ActbR (ACGCAGCTCAGTAACAGTCC). PCR amplification was performed using the Phusion High-Fidelity PCR Kit (Thermo Scientific) (see Figure 1). PCR products were cleaned using the QIAquick PCR Purification Kit (QIAGEN) and verified by gel electrophoresis.
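The two Ifna primers each contain one IUPAC degenerate base (R = A or G, Y = C or T), so each is in effect a small pool of oligonucleotides. The following minimal sketch lists the concrete sequences each primer encodes. The function name `expand_degenerate` is illustrative, and the lookup table only covers the codes that occur in these primers.

```python
from itertools import product

# IUPAC codes needed for the primers above (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer)
    return ["".join(bases) for bases in product(*choices)]


ifna_f = "ATGGCTAGRCTCTGTGCTTTCCT"
ifna_r = "AGGGCTCTCCAGAYTTCTGCTCTG"

for name, primer in (("IfnaF", ifna_f), ("IfnaR", ifna_r)):
    variants = expand_degenerate(primer)
    print(name, len(variants), variants)
```

Each primer expands to two sequences, which is what allows a single primer pair to anneal across the conserved region of the different Ifna family members.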

Two 8-week-old male C57BL/6J mice (Jackson Laboratories, approx. 23 g), housed and bred at the MIMR under specific pathogen-free conditions in accordance with local and New Zealand ethical guidelines, were chosen for the investigation. After anaesthesia with an intraperitoneal injection of 200 µl ketamine/xylazine, one mouse was injected intradermally in each ear with 300 dead infective N. brasiliensis L3 larvae in 30 µl PBS. The other mouse was similarly anaesthetised and injected intradermally in each ear with 30 µl PBS alone. The mice were euthanised in a CO2 chamber 3 h post injection, and ears (approx. 27–30 mg in weight) were immediately harvested and stored in RNALater at 4 °C for <1 h. RNA extraction of each whole ear was done in 1 ml of Trizol following the manufacturer’s guidelines (ThermoFisher). cDNA was synthesised using the High Capacity RNA-to-cDNA kit (Applied Biosystems), according to the manufacturer’s instructions. Only the cDNA from the N. brasiliensis-treated mouse was used for this investigation.

Nippostrongylus brasiliensis was originally sourced from Lindsey Dent of the University of Adelaide, South Australia, and has been maintained for 22 years by serial passage at the Malaghan Institute. Female Lewis rats were bred and used for maintenance of the N. brasiliensis life cycle at 4 months of age (and weighing over 150 g), as outlined in Camberis et al. 19.

Results and discussion

During the initial MinION sequencing run to investigate the expression of Ifna-family members in mice (comparing with Ifnb and Actb transcripts), we encountered issues with 2D base-calling through the Metrichor web service, which seemed to be due to failed alignment of component 1D strands. A BLAST search on some of the longest base-called 1D reads revealed that some reads had multiple mappings to our target Ifna-family members. Further exploration of the data showed that both Ifna and Actb sequences were present in the same read (see Figure 3). This was an unexpected result; we had carried out separate PCR reactions for each transcript, so we were not expecting reads that mapped to different transcripts. Our conclusion was that chimeric ligation of input DNA was occurring at some stage during the sample preparation process, but all we were able to determine at the time was that this chimerism was happening some time after the PCR, but before the sequencing. The present experiment was designed in light of these prior results to more easily quantify the degree of chimeric ligation that was happening.

Read counts

Despite using a 2D ligation chemistry in the sample preparation, and selecting out hairpin-containing reads using streptavidin beads, the majority of reads could not be called as an aligned 2D sequence: of 329,591 sequenced reads, 299,124 were base-called by Albacore, and 1,005 (0.3%) of these base-called reads had an aligned 2D sequence (see Supplementary File 1). Any called reads that were not called as 2D were processed further as 1D sequence, i.e. the remaining 298,119 (99.7%) of called reads. Discussions with ONT staff, in particular Forrest Brennen, during the London Calling conference in 2017 provided insight into what had caused the failure in 2D base-calls. ONT had introduced a chemistry upgrade for their 2D ligation sequencing kits that produced a different, and more obvious, hairpin signal with three peaks rather than two. This modified hairpin signal was the one that the Metrichor and Albacore base-callers were looking for in January 2017 and March 2017 respectively. However, the 2D barcoding kit that we used still included the old hairpin adapter, which meant that the base-callers ignored the hairpin region and attempted to call the entire sequence as a 1D read. ONT subsequently updated their Albacore base-caller to correct this error for 2D barcoded reads, but because the 2D chemistry was discontinued in favour of the faster and more accurate 1D² chemistry, the 2D base-caller is no longer developed or included in Albacore. We were able to obtain from ONT the latest, and only, Albacore version that included this fix (version 1.2.4), and re-calling reads with it showed substantial improvement in detecting 2D sequence: 40.8% of reads were called as 2D reads, which was much closer to the 48.6% of reads that we found with a detectable hairpin adapter in the 1D base-called sequence.
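The proportions quoted for the March 2017 base-calls follow directly from the three read counts. A short check, using only the numbers stated above (the variable names are illustrative):

```python
# Read counts from the initial Albacore base-calling, as stated in the text.
sequenced = 329_591   # reads produced by the sequencing run
called = 299_124      # reads base-called by Albacore
called_2d = 1_005     # base-called reads with an aligned 2D sequence

# Reads without an aligned 2D sequence were processed as 1D reads.
called_1d = called - called_2d

pct_2d = 100 * called_2d / called
pct_1d = 100 * called_1d / called

print(f"2D reads: {called_2d:,} ({pct_2d:.1f}% of called reads)")
print(f"1D reads: {called_1d:,} ({pct_1d:.1f}% of called reads)")
```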

Read mapping

Called 1D reads were mapped to Actb, Ifnb1, an Ifna consensus sequence, additional interferon sequences, the ONT control strand sequence, and known ONT adapter sequences (see Supplementary File 2) using LAST v833 21. A total of 261,183 reads (87.6% of called 1D reads) mapped to at least one known amplicon and/or barcode sequence.

Categorisation of non-chimeric reads

After elimination of definitively chimeric reads, 256,620 reads remained that appeared to map uniquely to single sequences (see Figure 5). A small proportion of these reads (14,223, or 5.5%) had detectable barcode sequences but did not map to any amplicon (i.e. they were mappable to an overnight or quick barcode sequence only). We expect that these unmapped barcoded sequences were unamplified mouse cDNA sequences. A difference in read counts was observed between overnight-barcoded sequences and quick-barcoded sequences (77.8% overnight, 22.2% quick), which was consistent with the difference in input amount observed during sample preparation. An attempt was made during sample preparation to add the three different amplicon preparations in equimolar quantities, which was more successful for the Actb preparation (33.6%) than it was for the Ifna and Ifnb preparations (42.7% and 23.7%, respectively). An additional categorisation of Ifna family members (see Supplementary File 3) was attempted, but is not presented here as it detracts from the main chimeric read investigation. Intermediate results and a processing script from this categorisation are available in verbose form as Supplementary File 4.
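To illustrate the kind of categorisation described above, the sketch below sorts a read into one of four groups from the names of the targets it maps to. This is a simplified, hypothetical criterion and not necessarily the one used in this study: the target labels, the `classify_read` function and the rule itself (more than one distinct amplicon or barcode in a read means chimeric) are all assumptions made for illustration. Repeated hits to the same amplicon are deliberately not flagged, because a 2D read called as 1D contains the template and complement copies of a single amplicon.

```python
# Hypothetical target labels; real reference names may differ.
AMPLICONS = {"Actb", "Ifnb1", "Ifna"}
BARCODES = {"BC_overnight", "BC_quick"}


def classify_read(hits):
    """Classify one read from the names of the targets it maps to.

    hits: list of target names found along the read, in any order.
    Adapter or control hits that are neither amplicon nor barcode are ignored.
    """
    amplicons = {h for h in hits if h in AMPLICONS}
    barcodes = {h for h in hits if h in BARCODES}
    if not amplicons and not barcodes:
        return "other"            # no amplicon or barcode detected
    if len(amplicons) > 1 or len(barcodes) > 1:
        return "chimeric"         # distinct amplicons/barcodes in one read
    if not amplicons:
        return "barcode_only"     # barcode present, no amplicon mapped
    return "single_amplicon"


examples = {
    "read_a": ["BC_overnight", "Ifna", "hairpin", "Ifna", "BC_overnight"],
    "read_b": ["BC_overnight", "Ifnb1", "BC_overnight", "Ifna"],
    "read_c": ["BC_quick"],
    "read_d": ["hairpin"],
}

for name, hits in examples.items():
    print(name, classify_read(hits))
```

A rule of this kind can only identify reads that are definitively chimeric; concatenations of two copies of the same barcoded amplicon would pass as non-chimeric.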

Read signal confirmation of chimerism

A few of the reads were investigated at the raw signal level to make sure that the electrical trace was in agreement with the base-called sequence. A demonstrative signal trace for a non-chimeric 2D read comprising a single barcode-adapted amplicon is shown in Figure 6. Read traces typically began with a high-current (but relatively uniform) open pore state, followed by an intermediate stall signal (also fairly uniform), after which the highly variable sequence trace began. Hairpin adapters could be easily identified in the raw signal as a bridge structure a little over halfway through a 2D sequence. A number of situations were observed in the base-called sequence where ligation during sample preparation seems to have occurred, and in some cases this ligation resulted in multiple hairpin adapters being present in the same sequence. One such occurrence is shown in Figure 7, where two overnight-barcoded sequences from two different amplicons (Ifnb1 and Ifna2) were joined together. Because two barcoded amplicons were concatenated, this ligation must have happened after the barcoding step of sample preparation (i.e. during adapter ligation). This finding has potential implications for other sequencing technologies, as the ligation process used for sample preparation is unlikely to be specific to nanopore sequencing. The formation of chimeric reads during sample preparation may be one explanation for the index switching phenomenon seen in Illumina-sequenced reads (e.g. see 22–24), and presents a substantial problem for dual-indexed reads where identical indexes are used for different samples. Where dual-indexed reads are not used, ligation of reads with the same index may still be problematic depending on the particular sequencing application.