The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole-genome sequencing of the acidogenic A. niger wild-type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence, and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 Mb of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis supported up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases, and protein transporters in the protein producing CBS 513.88 strain. Our results and data sets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi.

The saprotrophic filamentous fungus Aspergillus niger is found globally and exhibits a great diversity in its phenotype. A. niger has become one of the major workhorses in industrial biotechnology, being very efficient in producing both polysaccharide-degrading enzymes (particularly amylases, pectinases, and xylanases) or organic acids (mainly citric acid) in high amounts. It also has a long history of safe use (Schuster et al. 2002; van Dijck et al. 2003; van Dijck 2008). Commercial importance is illustrated by a world market for industrial enzymes of nearly US$ 5 billion in 2009, of which filamentous fungi account for roughly half of the production (Lubertozzi and Keasling 2008), and a global citric acid production of 9 × 106 metric tons in 2000 (Karaffa and Kubicek 2003).

In 2007, the genome sequence of A. niger strain CBS 513.88, used for industrial enzyme production, was published (Pel et al. 2007). This strain was derived from A. niger NRRL 3122, a strain developed for glucoamylase A production by classical mutagenesis and screening methods (van Lanen and Smith 1968). This work initiated a number of new genome-based investigations (Sun et al. 2007; Andersen et al. 2008a,b; Martens-Uzunova and Schaap 2008; Yuan et al. 2008a,b) but did not reveal differences between citric-acid-producing and enzyme-producing A. niger strains (Cullen 2007).

In this study, we present the nearly complete genome sequence of the citric-acid-producing A. niger wild-type strain ATCC 1015 and compare it to the genome sequence of the enzyme-producing strain CBS 513.88. The genetic diversity of these two A. niger strains was determined by applying systems biology tools as well as new bioinformatics methods to examine multi-level differences that distinguish the wild-type citric-acid-producing strain from the mutagenized glucoamylase A–producing strain.

Results

General genome statistics The 34.85-Mb genome sequence of A. niger ATCC 1015 was generated using a shotgun approach and then further improved to a high-quality assembly of 24 finished contigs separated by 15 gaps (including eight from centromeric regions). Genome statistics are summarized in Table 1, with details in Supplemental Text 1. The full sequence and annotations are available from the Joint Genome Institute (JGI) Genome Portal (http://genome.jgi-psf.org/Aspni5) and from NCBI (accession number ACJE00000000). View this table: Table 1. General genome statistics for A. niger ATCC 1015 and A. niger CBS 513.88 The genome sequence of A. niger CBS 513.88 (Pel et al. 2007) was improved using the ATCC 1015 sequence to close 186 contig gaps and modify the gene models associated with these gaps (Table 1; Supplemental Table 1). The updated A. niger CBS 513.88 genome sequence is accessible through EMBL (accession numbers AM269948–AM270415). We note a large difference (2882) in the number of called genes in the two strains (Table 1). A thorough analysis indicates an overprediction of genes in CBS 513.88 and an underprediction of genes in ATCC 1015, and 396/510 unique genes in CBS 513.88/ATCC 1015 (Supplemental Text 2; Supplemental Fig. 1; for details, see Supplemental Tables 2–9). For further validation of the absence/presence of individual proteins, we have performed gDNA hybridizations, which can be consulted for reference (Supplemental Table 16).

Unique genes in both strains suggest horizontal gene transfer to be a cause for the amylase hyper-producer phenotype For the genes unique to the amylase-producing strain CBS 513.88 (Supplemental Table 6), the most notable genes are two alpha-amylases that are identical to the Aspergillus oryzae alpha-amylase, a possible cause for the amylase hyper-producer phenotype. We discuss this in detail below (Fig. 2). Furthermore, we find three possible polyketide synthases, which suggests a unique secondary metabolite profile for this strain. Examining the genes found only in ATCC 1015 (Supplemental Table 6) does not point to any obvious cause for citric acid hyper-production, but four possible polyketide synthases and a putative NRPS are found to be unique for ATCC 1015, suggesting this strain has unique secondary metabolites as well. Multiple effects of this type are, indeed, seen for both strains in analyses below (Fig. 1; Supplemental Fig. 10; Table 2). View this table: Table 2. Exo-metabolomic profiling of 11 A. niger strains based on HPLC-DAD-FLD and HPLC-DAD-HRMS (this study) View larger version: Download as PowerPoint Slide Figure 1. Synteny map of the contigs of A. niger ATCC 1015 to the supercontigs of A. niger CBS 513.88. The coloring of the chromosomes shows syntenic regions in A. niger CBS 513.88. Arabic numerals show the number of the supercontig in A. niger CBS 513.88. Gray areas show regions not found in the CBS 513.88 genome sequence (Pel et al. 2007). (Filled black circles) Proposed locations of centromeric regions. Sequenced telomeres are marked with a T. Zeroes mark the first base of the contigs. A black line underneath a section of the chromosomes denotes inverted sequence. Black histograms show SNPs per kilobase (number of single nucleotide polymorphisms/kilobase) between the sequences of the two strains (y-axis: 0–160 SNPs/kb). Gaps between contigs and centromeres are not to scale. The alignment demonstrates almost complete synteny between the two strains, with the exception of a cross-over event between the left arms of chromosomes III and VII. An overview of the genes found in the gaps unique to ATCC 1015 can be found in Supplemental Table 2.

Synteny mapping shows 0.5 Mb of chromosomal rearrangements and a whole-arm inversion The genomes of the two strains are largely syntenic (Fig. 1). For example, the 1429 protein-encoding genes of CBS 513.88 supercontig An01 that have predicted counterparts in ATCC 1015 are, with one exception, all mapped to chromosome 2b. A similar example is seen for all but one of 1486 protein-encoding genes on supercontig An02. Both exceptions encode putative Tan1-like transposases. Notwithstanding, a number of significant differences in genome configuration exist (Fig. 1). More than 0.5 Mb of additional genome sequence in ATCC 1015 resides in four large regions on chromosomes (Chr) II, III, and V (Fig. 1). The flanking sequences of three of these additional elements in ATCC 1015 are syntenic to continuous sequences in CBS 513.88 and are thus true differences in genome configuration (Supplemental Fig. 2; Supplemental Table 2; for details on ChrIII, see next section). The fourth region on the left arm of ChrV in ATCC 1015 could not be verified due to a contig gap for CBS 513.88. The contig gap in the right arm of ChrV in the genome sequence of ATCC 1015 is spanned by continuous sequence in CBS 513.88, containing 15 predicted genes. An overview of the genes found in the gaps unique to ATCC 1015 can be found in Supplemental Table 2. Other striking differences include a large inversion in ChrVIII and the inversion and translocation of a large fragment between the left arms of ChrIII and ChrVII (Supplemental Fig. 3). Both were confirmed with PCR spanning the breakpoints (data not shown). Finally, the presence of telomere sequences in genome data confirms an inversion of the complete right arm of ChrVI.

Two extra alpha-amylase-encoding genes likely obtained through HGT An unmatched region identified for the left arm of ChrIII (Fig. 1) spans 72.5 kb of unique sequence in ATCC 1015. Remarkably, for CBS 513.88, a unique 85.3-kb sequence is found at this location (Fig. 2A). Gene annotation may be found in Supplemental Table 10. To confirm the actual absence of the region in the chromosome of ATCC 1015, we performed gDNA hybridizations, which did indeed support this finding (Supplemental Fig. 4). View larger version: Download as PowerPoint Slide Figure 2. Horizontal gene transfer of alpha-amylase genes from A. oryzae to A. niger CBS 513.88. (A) Unmatched region identified for the left arm of chromosome III spanning 65 kb for ATCC 1015 encoding 30 predicted genes and 85 kb for CBS 513.88 encoding 24 predicted genes. The unmatched region is flanked on one side by a small local inversion of three predicted genes (in green). (B) The 12.4-kb HGT region is part of an identical 12.7-kb region present in A. oryzae RIB40 supercontigs SC113 and SC023. In A. niger CBS 513.88, the transferred region is enclosed by a 203-bp inverted repeat (red arrow). An12g07000 (blue) is a putative transposase and identical to the A. oryzae transposon Aot1 tnpA gene. Similarly, the A. niger CBS 513.88 alpha-amylase encoding gene An12g06930 (orange) is identical to A. oryzae annotated genes AO090120000196 (SC113) and AO090023000944 (SC023). (C) Proposed duplication–recombination event between supercontig 12 and supercontig 05 of the 12-kb HGT region. Breakpoints are indicated with dotted lines. The fragment encoding alpha-amylase An05g02200 is identical to An12g06930 (orange). The breakpoints are flanked with additional copies of the 203-bp repeat region (red arrow). The region encoding genes An12g06940 to An12g06970 is identical to the region encoding An05g02210 to An05g02130. The downstream breakpoint occurred in the predicted gene coding region of An12g06970, thereby deleting the original start codon and upstream region. Approximately 12 kb of this region shares >99.8% sequence identity at DNA level with genomic DNA from A. oryzae RIB40 (Fig. 2B). The complete 85-kb fragment including the 12-kb alpha-amylase region also appears to be involved in a recent duplication recombination event to ChrVII, (Fig. 2C). Thus, the CBS 513.88 genome harbors two additional alpha-amylase-encoding genes that are orthologs to the alpha-amylase-encoding genes (AO090023000944, AO090120000196) of A. oryzae RIB40. These findings suggest that CBS 513.88 and the parental strain NRRL 3122 (data not shown) acquired these duplicate alpha-amylase genes (An12g06930, An05g02100) through horizontal gene transfer (HGT). The occurrence of HGT in Aspergilli would further be supported by the presence of alpha-amylase-encoding genes in other black Aspergilli that display >99% nucleotide identity to those of A. oryzae RIB40 and the A. niger CBS 513.88 alpha-amylases (Korman et al. 1990; Shibuya et al. 1992). The origin of the remaining part of the unmatched region remains unclear. No significant similarity was observed at the DNA level, and only a few of the encoded proteins show similarity with other proteins present in the NR protein database. There is also evidence that this HGT may have occurred by the action of transposases. The 12-kb HGT region is flanked by 202-bp inverted, perfect, terminal repeats (ITR) (Fig. 2A,B). Furthermore, this HGT region harbors another gene, An12g07000, that is identical to the A. oryzae tnpA transposase gene. Interestingly, in the A. oryzae RIB40 genome, the same 12-kb fragment has been duplicated twice and is present on multiple chromosomes, but only one perfect copy of the ITRs is retained (Fig. 2C).

Transposon presence is strain-specific Transposon-like sequences were identified in both genomes and quantified (Supplemental Table 11). This comparison pinpoints a remarkable difference in the presence and amount of transposon-related sequences in ATCC 1015 and CBS 513.88 both for class I and for class II transposons: Only 16 sequences were identified in ATCC 1015, but 55 were detected in CBS 513.88. Whereas the ATCC 1015 genome contains no class I superfamily copia and a single copy of class II superfamily Fot1/pogo, these sequences are much more abundant in CBS 513.88, where 15 class I copia, and 13 class II Fot1/pogo are found.

SNP analysis reveals high mutation rates and hypervariable regions We found 8 ± 16 SNPs/kb (average ± standard deviation) and a maximum of 163 SNPs/kb single-nucleotide polymorphisms (SNPs) between A. niger strains ATCC 1015 and CBS 513.88 (Supplemental Table 12). This value is much higher than the maximum of 9 SNPs/kb found by Cuomo et al. (2007) in a SNP analysis of two Fusarium graminearum strains. A comparison of the A. niger ATCC 1015 genome sequence to that of A. niger ATCC 9029 revealed markedly less variation between the two (2 ± 5 SNPs/kb) (Supplemental Fig. 5), indicating a large genomic variation in the A. niger group but little between ATCC 1015 and 9029. These polymorphisms are not uniformly distributed but cluster in hypervariable regions (Fig. 1). This is supported by a gDNA hybridization study, which confirms the presence of distinct hypervariable regions (Supplemental Fig. 4).

Gene comparisons reveal industrially relevant strain differences To identify strain-defining systemic effects of the genome variation, results from the Imprint gene synteny analysis (see Supplemental Table 5; Methods) were used for additional genome-scale investigation. GO term over-representation analysis was performed on gene groups with distinct common properties (for Group overview, see Supplemental Table 5; for details, see Supplemental Table 13). Most interesting is the observation that 37 proteins involved in transcriptional regulation are nonfunctional in CBS 513.88 due to frameshifts or stop codons. This suggests a less stringent regulation of CBS 513.88 relative to ATCC 1015. A comparative shake-flask study of the two strains also suggests mutations in regulatory elements as nitrogen source utilization is impaired in the CBS 513.88 strain (data not shown). While the nitrogen catabolism regulator AreA was originally thought to be mutated in CBS 513.88 (Pel et al. 2007), a resequencing of areA showed the ORFs to be identical. To detect potential differences in the metabolism of the two strains, we compared all genes encoding proteins that display differences at the amino acid level between the two strains on the metabolic network of A. niger (Andersen et al. (2008a) and mapped them to metabolic pathways (Supplemental Fig. 6). Mutations were found in the pathways for biosynthesis of proline, aspartate, asparagine, tryptophan, and histidine, which may be relevant to protein production. Also, mutations were found in the plasma membrane-bound ATPase, in the enzymes of the GABA shunt, of the TCA cycle, and in components of all steps of the electron transport chain, which could be relevant for the production of citric acid.

Phylogenetic analysis confirms high variability between genome sequences To place the comparison of A. niger ATCC 1015 and CBS 513.88 into the larger context of the Aspergillus section Nigri, variation in the two strains was analyzed by phylogenetic analysis of a part of the beta-tubulin sequence for a number of A. niger strains and other black Aspergilli (Fig. 3A). To further explore the genome variability across the A. niger species, we identified four 1-kb regions from ChrII, ChrIV, ChrVI, and ChrVIII that were identical in the ATCC 1015 and ATCC 9029 strains but had ∼20 SNPs/kb relative to the genome sequence of CBS 513.88. These regions were PCR-sequenced in seven A. niger strains including the CBS 513.88 progenitor, NRRL 3122, and the A. niger neotype strain, CBS 554.65 (Kozakiewicz et al. 1992) to form a phylogenetic tree with high variation (Fig. 3B). For previously sequenced strains, the resequencing results were identical to the previously determined genomic sequence, thereby excluding that the genomic variation is an artifact of low-fidelity sequencing. View larger version: Download as PowerPoint Slide Figure 3. (A) Phylogenetic relationship of several black Aspergilli based on partial sequencing of beta-tubulin. The tree was rooted to Aspergillus aculeatus CBS 172.66. (B) Phylogenetic relationship of seven strains of A. niger based on sequencing of 1-kb variable regions from four chromosomes. The tree was rooted to the sequence obtained from Aspergillus carbonarius IMI 388653. Clades based on the exo-metabolomic groupings of Table 2 are shown. For both trees, bootstrap values above 80% of the 1000 performed reiterations are shown. All A. niger strains have an identical sequence for beta-tubulin and form a strongly supported terminal clade together with Aspergillus awamori (Fig. 3A). The other taxonomically different species are phylogenetically distinct from this group, which is in accordance with the serial classification of Frisvad et al. (2007a). The latter clustering based on the selected regions (Fig. 3B) was able to separate the A. niger strains into three groups. Strain CBS 513.88 and its progenitor NRRL 3122 were identical for the regions, indicating limited SNP introduction due to classical strain improvement, and thus confirms high variation in the A. niger group.

Exo-metabolomic profiling describes three distinct groups of A. niger To further compare the two sequenced strains on a system-wide, but nongenomic level, to each other and to other members of the A. niger species group, exo-metabolomic profiles of the two strains were compared to nine other strains, including the CBS 513.88 progenitor strain, NRRL 3122; the widely used laboratory strain ATCC 9029; and the A. niger neotype, CBS 554.65 (Table 2). This gave three distinct clades, which follow the clustering of Figure 3B, but does not conform to the phylogenetic distances calculated. Strain CBS 513.88 and its progenitor strain had very similar secondary metabolite profiles. These two strains differed from the other strains analyzed. Strain ATCC 1015 had a profile similar to seven other strains, although there were some quantitative differences. Strain CBS 126.49 had a unique profile. In the case of ochratoxin A, it is interesting that in ATCC 1015 and ATCC 9029, a remnant of the PKS gene (An15g07920) of the putative ochratoxin cluster in A. niger CBS 513.88 (Pel et al. 2007) was identified, which could be related to a 21-kb deletion in both strains (Supplemental Fig. 7). We have noted that a putative NRPS is found in one of the unique chromosome regions found in ATCC 1015 (Fig. 1), which may account for some of the difference (Supplemental Table 2).

Transcriptome analysis of A. niger ATCC 1015 and CBS 513.88 growing on glucose To evaluate the effect of the differences in genome sequence on the physiology of the two strains, batch cultivations in bioreactors and a comparative transcriptome analysis were performed. Strains ATCC 1015 and CBS 513.88 were grown under the same conditions in batch cultures in a glucose-based minimal medium. The GlaA-producing strain CBS 513.88 produced >1.2 g/L GlaA more than ATCC 1015, while producing ∼1 g/L biomass less. Other measured characteristics were similar for the two strains (Table 3). Statistical analysis showed 4784 significantly (adj. p < 0.05) differentially expressed genes, with an almost equal number of genes up-regulated in either strain (2431 in CBS 513.88 vs. 2353 for ATCC 1015). View this table: Table 3. Statistics for batch cultivations of A. niger ATCC 1015 and A. niger CBS 513.88 Examining the differential expression in the context of metabolic pathways (Supplemental Fig. 8), only the alternative oxidative pathway has uniformly higher expression in ATCC 1015, while a substantial subset of metabolism was up-regulated in CBS 513.88, including glycolysis and the TCA cycle. As the specific growth rates of the strains are similar, we suggest that this extra activity of central metabolism provides precursors for the higher productivity of glucoamylase. We also find increased expression of genes involved in amino acid metabolism, especially the entire biosynthetic pathways of threonine, serine, and tryptophan. Intriguingly, analysis of the amino acid composition of the glucoamylase A protein (Supplemental Fig. 9) revealed that GlaA is atypical in that it has a higher content of specifically tryptophan, threonine, and serine than 90% of the predicted genes of CBS 513.88. For all three amino acids, the content is almost twice as high as the average in the composition of total protein (Christias et al. 1975). We initially thought that this was due to a frameshift mutation identified in the gene for the general amino acid–regulating transcription factor (cross-pathway control protein) CpcA (Supplemental Table 5), but resequencing of this gene showed this frameshift to be a sequencing error. The presence of two sense mutations was confirmed and is currently being investigated. To further support that the increased production of GlaA is significant enough to affect amino acid biosynthesis, we used a genome-scale metabolic model of A. niger (Andersen et al. 2008a) to model the two strains under the growth conditions used (Table 3). The computed fluxes (Supplemental Table 14) show that for all three amino acids, the fluxes through the biosynthetic pathways must be at least twice as high in CBS 513.88 compared to ATCC 1015 to support the increased GlaA production, thus corresponding well with the transcriptome results. For a broader analysis of trends revealed by the transcriptome profiles, a GO term over-representation analysis was conducted on the genes significantly up-regulated in either strain (Supplemental Table 15). The analysis confirmed that traits relevant for protein production—such as amino acid biosynthesis and tRNA aminoacylation—appear to be highly significant for CBS 513.88. For ATCC 1015, GO biological functions for electron transport (adj. p = 2.7 × 10−5), carbohydrate transport (adj. p = 6.8 × 10−3), and organic acid transport (adj. p = 0.035) were significant. An examination of expression of individual genes showed that glaA had significantly increased expression in CBS 513.88 (adj. p = 84 × 10−6), but the fold change (3.2) was lower than the increase in enzyme activity (sixfold) (Table 3). Interestingly, all identified tRNA-aminoacyl synthases were found to be twofold to sixfold up-regulated in CBS 513.88.