Identification of homologous genes

The HomoloGene database contains information on homologous genes in 5 mammals and 15 non-mammals [28]. We searched the HomoloGene database for all 1496 genes of Recon 1 and found a human match for 1,464 genes (97.8%). The mammalian organism with the highest number of genes homologous to Recon 1 genes was the mouse (Mus musculus) (1,415 genes, 97%). The non-mammalian organism with the highest number of genes homologous to Recon 1 genes was the zebra-fish (Danio rerio) (1,200 genes, 82%) (Additional file 1).

Creation of draft metabolic network reconstructions

For each mammal, a draft reconstruction was created via two different approaches (Figure 1a). In approach A (modelA), all reactions linked to Recon 1 genes that did not have a homologous gene were removed from the reconstruction of the corresponding species. Therefore, modelA included all reactions linked to genes homologous to Recon 1 genes in addition to all non-gene associated reactions (1,514). Of the non-gene-associated reactions, 676 were transporter reactions, 452 were demand or exchange reactions, leaving 385 reactions within a metabolic pathway but without gene-association. In approach B (modelB), we included only genes and their reactions homologues to Recon 1 genes as well as transporter and demand reactions. All other reactions, including non-gene associated reactions, were removed from the reconstruction.

Figure 1 Creation of draft mammalian reconstructions. a) A schematic figure showing the two approaches used to generate draft mammalian reconstructions using Recon 1. Approach A removes all gene-associated reactions from Recon 1 without a homologous gene in the reconstructed animal, while keeping all non-gene-associated reactions. Approach B removes all gene-associated reactions from Recon 1 without a homologous gene in the reconstructed as well as non-gene-associated reactions (excluding transporters and demand reactions). GAR - gene-associated-reactions; nGAR - non-gene-associated reaction; transp/dem - transporters and demand reactions. b) Ratio of reactions (black bar) and genes (gray bar) that were successfully mapped from Recon 1 to the indicated mammalian draft reconstruction. c) A phylogenetic tree based on all transcripts of protein domain sequences from the SuperFamily database [64] for all reconstructed mammals. d) A phylogenetic tree based on flux variability analysis (FVA) of all reactions in all mammals reconstructed via approach A. e) A phylogenetic tree based on flux variability analysis (FVA) of all reactions in all mammals reconstructed via approach B. Full size image

The ratio of Recon 1 genes with a homologous gene as well as the ratio of reactions from Recon 1 included in the draft reconstructions for each mammal is shown in Figure 1b. The mammal with the highest ratio of mapped genes (97%) and reactions (98%) was the mouse while the chimpanzee (Pan troglodytes) had the lowest ratio of both mapped genes (84%) and reactions (91%). Furthermore, we created for each mammalian modelA and modelB a compartmentalized and non-compartmentalized version. The more complex, compartmentalized versions had more metabolic dead ends (321 vs. 150) and therefore more reactions with zero fluxes. The number of reactions with non-zero fluxes for both compartmentalized and non-compartmentalized versions of all draft mammalian models created using approach A and B are summarized in Table 1. We performed flux variability analysis (FVA) of models of these reconstructions, allowing uptake of all extracellular metabolites in all models in order to maximize likelihood of non-zero fluxes. Compared to humans, the mouse model had the highest percentage (84-89%) and the chimpanzee model had the lowest percentage (41-54%) of active reactions with non-zero fluxes for all model versions (Table 1). We then constructed a phylogenetic tree of the reconstructed mammals based on the FVA results (Figure 1d,e). The results indicated that the mouse draft model had the highest degree of similarity with the human model for both modelA (Figure 1d) and modelB (Figure 1e). These findings are surprising as the chimpanzee is the closest relative of humans with approximately 99% overall genome similarity [29]. It is also the closest ancestor to humans of the mammals reconstructed when protein coding sequence is compared (Figure 1c).

Table 1 Results from flux variability analysis of the draft mammalian models compared to the human model (H. sapiens, Recon 1). Full size table

Identification of unique metabolic functions in the mouse

Given the high number of reactions mapped to the draft mouse reconstruction, and the high number of reactions with non-zero fluxes resulting in the corresponding metabolic modelA and modelB, we sought to finalize the mouse reconstruction by gap analysis/filling. First, we aimed to understand the metabolic differences between human and mouse to ensure that the mouse reconstruction is not merely a modified version of human metabolism. This implies that also genes and pathways unique to mouse metabolism needed to be identified. Therefore, we employed the Comparative Pathway Analyzer 1.0 [30] relying on the KEGG database [31] to extract metabolic maps displaying the existence of enzymes in both mouse and human for 66 out of 99 Recon 1 subsystems (Additional file 2). We then used these maps to perform a manual search of potential gaps between human and mouse metabolism. Out of 1550 reactions present within these subsystems, 1492 (96%) reactions existed in both species, 46 (3%) reactions existed only in humans and 1 (1%) reactions existed only in the mouse (Additional file 2). For the 25 remaining Recon 1 subsystems (excluding transporters), we did an extensive literature search for differences between the two species. Of those, only cytochrome P 450 metabolism was reported to differ significantly between mouse and human [32]. These defined differences will guide the subsequent gap filling process. However, since most of the metabolic functions seemed to be present in both human and mouse, we can employ the validation tests which were designed to evaluate the predictive potential of Recon 1 [17].

Gap filling of the mouse metabolic reconstruction

First, we decided to test which of the four mouse metabolic models were able to produce biomass by optimizing for the corresponding reaction accounting for all known biomass precursors required for cell replication. The compartmentalized and non-compartmentalized draft mouse models from approach A produced biomass without any modifications. In contrast, both versions of modelB were unable to produce biomass. Therefore, we sought to gap fill the draft non-compartmentalized modelB to identify reactions that needed to be added to the model for production of all biomass components. The SMILEY gap filling algorithm identifies reactions from a database that need to be added to a model to fulfill the optimality condition (e.g. production of biomass) [7].. Using the SMILEY algorithm, we searched for the minimal number of reactions within the entire KEGG database necessary to add to the non-compartmentalized mouse modelB in order to enable the production of biomass.

The results from 40 iterations of the algorithm were checked manually for two criteria: i) the result did not suggest a reversible reaction for a known irreversible reaction and ii) the added reaction(s) are known to occur in mouse. Of the 40 iterations, 30 suggested adding the same reaction either alone or in addition to some other metabolic reactions. This reaction (KEGG ID R06522) exists in mouse and humans (according to KEGG and Entrez gene [33]) but was not included in Recon1. It is a phosphohydrolase reaction involved in sphingolipid metabolism. Adding this reaction resulted in the ability of the refined non-compartmentalized modelB model to produce biomass at a similar rate as the non-compartmentalized modelA. The addition did, however, not result in biomass production in the compartmentalized modelB. Subsequently, we decided to focus the remainder of the study on modelA, as it captures most of the known metabolic capabilities in the mouse. ModelB was not further developed since it is missing a significant fraction of metabolic reactions and therefore does not function.

Validation of the mouse metabolic reconstruction

To validate the mouse reconstruction and to ensure mouse-characteristic metabolic properties of the resulting models, we manually tested each of the 288 FBA validation tests that were developed for Recon 1 [17] using both versions of modelA. Our literature survey on mouse metabolism, also focusing on the mouse essentiality of the 288 human tests, revealed no tests that were essential only in the human metabolism. Furthermore we found no evidence of mouse-specific additional tests of essentiality that could be added to our validation process. Therefore, we evaluated manually if all required reactions for each test are present in the mouse metabolism using Comparative Pathway Analyzer 1.0. A total of 260 tests passed the requirements and were therefore used for validation of the compartmentalized mouse modelA (Additional file 3). For the non-compartmentalized version of modelA, six additional tests were removed since they were compartment specific, resulting in 254 validation tests.

We ran the validation tests on the compartmentalized mouse draft modelA in an iterative manner and evaluated particularly the failed tests. This process revealed that out of the 131 reactions initially removed due to missing homologous genes in the HomoloGene database, 36 had sequence and physiological evidence of existence in the mouse (according to KEGG and EntrezGene databases) (Additional file 4). Those 36 reactions were therefore added again to the model, but the remaining 95 reactions were left out of the final model due to missing physiologic or sequence evidence. This addition resulted in a finalized mouse modelA that passed all the 260 validation tests. Also, we added the reaction discovered by the gap filling of modelB (KEGG ID R06522). Furthermore, unique mouse reactions which do not lead to metabolic dead ends in the model were added (KEGG IDs R03184, R00647 and R01465). The non-compartmentalized modelA similarly passed 100% of its 254 validation tests after the addition of these reactions. We also determined the functionality of modelB, even though the compartmentalized version cannot produce biomass. The compartmentalized and non-compartmentalized models created via approach B passed 50% and 85% of the validation tests respectively.

Properties of iMM1415

The resulting compartmentalized mouse metabolic reconstruction (Additional file 5), created by curation of modelA, termed iMM1415, contains a total of 3,724 reactions and 1,415 metabolic genes (Additional file 6 contains detailed description of all reactions in iMM1415, including the biomass reaction). Analysis of the reactions removed from Recon 1 during the creation of iMM1415 revealed that the greatest absolute number of non-included reactions in the mouse reconstruction were within steroid metabolism (16 out of 46 reactions) and blood group biosynthesis (14 out of 46 reactions). However, the largest percentage of non-included reactions was within the subsystems of stilbene, coumarine and lignin biosynthesis (50%, 1 out of 2 reactions) and limonene and pinene degradation (50%, 3 out of 6 reactions). This data suggests that simulation results from these metabolism subsystems will be more likely to be inaccurate and points out areas for future improvement of the mouse model. In the remainder of the paper, we will use models derived from iMM1415 for computations.

Comparison of iMM1415 with published mouse metabolic networks

Table 2 provides a comparison between our reconstruction and other published reconstructions, the most detailed to date being the recently published mouse metabolic network by Selvarasu et al[24]. Our model functions in eight cellular compartments compared to three compartments in most reconstructions. It has the most detailed description of mitochondria metabolism to date but fewer reactions in the cytosol. Furthermore iMM1415 contains the first attempt to describe metabolism in the golgi apparatus, lysosome, ribosome, peroxisome and nucleus. Due to the compartmentalization, a much greater number of transport reactions was required in our reconstruction compared to the previous ones. As less biochemical data exists on transport reactions a larger number of non-gene associated reactions are needed in our model than in the other models. Albeit difficult to compare directly due to different definition of minimum growth medium and biomass, our model predicts fewer essential genes but a higher percentage of them have been experimentally verified than in the other models published to date (Table 2).

Table 2 Properties of iMM1415 and comparison with existing models Full size table

Essentiality of mouse metabolic genes

Given the high degree of functionality of iMM1415 we decided to employ it for phenotype simulations and to compare the in silico results to published experimental results from knockout mice or mice with mutations in metabolic genes. First, we performed a simulation of single gene knockouts for all genes in iMM1415 in order to determine in silico gene essentiality. A minimal growth medium supplemented with glucose was used for this simulation (as defined in additional file 7). A total of 53 genes were found to be essential as their deletion resulted in zero biomass production (Additional file 8). We found information on homozygous knockout phenotype for 17 of those genes in literature (Table 3). Of those, 14 (82%) genes had a confirmed lethal phenotype, whereas the remaining three (18%) had non-lethal phenotypes (Table 3). The majority of the genes with a predicted and confirmed essentiality status were within the cholesterol metabolism. This result indicates that cholesterol metabolism is especially vulnerable to mutations and environmental insults, perhaps due to the inability of the cholesterol metabolism to overcome disruption in metabolic reactions via alternative pathways.

Table 3 Results on gene essentiality predictions by the finalized mouse model. Full size table

Second, we searched the Mouse Genome Informatics database [27] for mouse reconstruction genes where a phenotype had been described with the word "lethality". Furthermore, we required that these genes were homozygous for a gene knockout and had either an embryonic or prenatal lethal phenotype. A total of 88 genes were identified this way. Five genes overlapped with the in silico essential genes (FDFT1, SPTLC1, PHGDH, HMGCR, CBS, SPTLC2). The observed discrepancy in lethality can be explained by either, i) growth environment simulations, ii) incomplete biomass reaction, iii) missing regulation, iv) wrongly included reactions, or v) any combination of the aforementioned possibilities. Wrongly included reactions could be identified by systematically eliminating non-gene-associated reactions from iMM1415 (e.g., using GrowMatch algorithm [34]) to improve the prediction of lethality. In contrast, the two genes that were essential in silico but in vivo non-essential for growth suggest missing functions in the metabolic network. Using SMILEY, or related algorithms, it might be possible to identify missing candidate genes.

Prediction of normal phenotypes

To further evaluate the predictive potential of iMM1415, we identified genes from the Mouse Genome Informatics database [27], for which a null mutation resulted in a normal phenotype. A total of eight such genes were found that were also present in the mouse reconstruction (OCRL, ACO1, PAFAH1B3, PGM1, FUT9, RHBG, ITPKC, SORD). For five of those, a gene deletion had no effect in the model, since isozymes existed within the reconstruction. For the three remaining genes (PGM1, FUT9, SORD), the deletion led to elimination of corresponding reactions. The effect of these deletions on growth depends on i) the presence of alternative reactions/pathways or ii) importance of reactions for biomass precursor synthesis. We investigated the effect of these three gene deletions on the overall network by performing a FVA. We then compared the FVA results of essential gene knockout (DHCR7) to the FVA results of wild type (Table 4) by considering only major sections of metabolism (Table 4). The largest perturbation resulted from the knockout simulation of the essential gene DHCR7, where seven major subsections of metabolism changed significantly. The knockout simulation of PGM1 led to significant changes within six major subsections of metabolism. The knockout simulation of the FUT9 gene led to significant changes within two major subsections of metabolism and the knockout simulation of SORD led to no significant changes within the major subsections of metabolism. Therefore, we suggest that six out of the eight genes are not likely to have noticeable metabolic phenotypes under our minimal medium condition due to presence of isozymes or alternative reactions/pathways.

Table 4 Results from Flux Variability Analysis (FVA) of 3 knockout models for genes with a confirmed normal phenotype (PGM1, FUT9, SORD) and one essential gene (DHCR7). Full size table

Analysis of lipoprotein lipase deficiency on in silico phenotype

Finally, we sought to simulate knockouts of genes with softer phenotypes. The LPL gene encodes lipoprotein lipase (EC 3.1.1.34) [35], an enzyme that hydrolyzes chylomicrons and very low-density lipoproteins (VLDL) into free fatty acids. Individuals born with lipoprotein lipase deficiency have elevated levels of triglycerides and VLDL and suffer from recurrent episodes of abdominal pain and pancreatitis as well as eruptive xanthomas of the skin [35]. Mutations have been associated with increased risk of ischemic heart disease in man [36]. Mice without lipoprotein lipase are born with greatly elevated levels of triglycerides and VLDL, and after nursing, triglyceride levels soon become extremely high. Heterozygotes for the null mutation of LPL survive until adulthood but with elevated triglyceride levels [37].

We simulated the knockout of the LPL gene and performed FVA to determine the effect of the gene deletion to the network properties. The largest perturbation was observed within i) the glycan metabolism, where 31 reactions had decreased and 312 reactions increased flux capacity (p < 10-16 against even probability of decreased and increased flux capacity); and ii) lipid metabolism, where 100 reactions with decreased and 253 reactions with increased flux capacity (p < 10-15 against even probability of decreased and increased flux capacity). Additionally, the flux capacity changed significantly within the triacylglycerol metabolism (Figure 2). The results suggest that there is increased flux capacity towards triglyceride synthesis while there is decreased flux capacity towards triglyceride degradation. The simulation results are therefore consistent with hypertriglyceridemia resulting from LPL mutations, as previously demonstrated with the mouse knockout model [37–51].