We explored the hypothesis that genomic subsystems of interacting genes are formed during brain development, thus extending the standard analysis previously based on single genes. We looked for correlations between gene‐expression patterns in the developing brain, assuming that they show underlying common molecular processes of gene regulation. Using time series of gene‐expression data from rat (Stead et al . 2006 ) and, to some extent, human brain (Somel et al . 2010 ), we (1) looked for occurrence of correlated groups of co‐expressed genes – coherent gene groups (CGGs) – that differ in their developmental evolution in rat cerebral cortex (CTX), hippocampus (HPC) and hypothalamus (HYP), (2) distinguished successive steps of gene‐expression patterns in CTX evolution that correlate with its anatomical development, postnatal synaptogenesis being considerably prolonged in humans, (3) elucidated the signaling pathways and TFs controlling CGGs assumed to behave as functionally coding units with distinct patterns, in particular those that occur at birth and postnatally and are controlled by the current activity of the neural network and thus (4) uncovered nested hierarchical interactions between TFs during brain development that possibly account for the distinct temporal evolution of CGGs in major brain disorders such as ASDs and schizophrenia. Building on these findings, we aim to develop a novel strategy of drug design to treat these disorders based on a hierarchical network of TFs, paving the way to new pharmacological tools: TF orthosteric and allosteric ligands administered at relevant sensitive stages of brain development in genetically predisposed patients.

Yet applying this paradigm to brain genes and relevant pathological phenotypes is not straightforward. The major difficulty resides in the hundreds of genetic determinants that predispose to brain disorders, such as autism‐spectrum disorders (ASDs) (Girirajan et al . 2011 ; Sebat et al . 2009 ) and schizophrenia (Bayès et al . 2011 ; Kirov et al . 2012 ). The lack of a simple relationship between gene sets and how brain connectivity becomes established requires, in our opinion, a deeper analysis of overall gene expression in brain development.

Attempts to model the relationship between genotype and phenotype in multicellular organisms are fundamentally based on extension of the bacterial operon scheme (Jacob & Monod 1961 ) to eukaryotic cell differentiation and embryonic development (Britten & Davidson 1969 ; Davidson 2010 ; Driever & Nüsslein‐Volhard 1998 ; Jiang & Levine 1993 ; Kerszberg & Changeux 1994a,b , 1998 ; Li E & Davidson 2009 ; Monod & Jacob 1961 ; Wolpert 1969 ). According to this scheme, transcription factors (TFs) serve as diffusible signaling allosteric proteins, binding to specific DNA elements in the promoter regions and triggering (or inhibiting) in cis the transcription of adjacent genes by DNA–RNA polymerase (Mannervik et al . 1999 ), depending on the organization of their complex with RNA polymerase and DNA (Liu et al . 2010 ), as with other TFs. Moreover, ligands that allosterically regulate TF conformations (or prevent their proper organization) can affect their function. Transcription factors may, in addition, control the transcription of their own structural gene (Noll 1993 ; Thayer et al . 1989 ; Walther & Grüss 1991 ), generating autocatalytic feedback loops, which may serve as gene switches in embryonic development (Kerszberg and Changeux 1994a,b , 1998 ). TFs also control other genes, including TFs genes, thus generating hierarchical trees of gene‐expression patterns in the course of development (Davidson 2010 ; Kerszberg and Changeux 1998 ; Larson et al . 2010 ).

Several difficulties impede our understanding of the relationships between the organization of the genome and the neural phenotype of the brain: (1) The human genome has only 22 000 structural genes and (2) the differences in sequences of mouse, rat, monkey, chimpanzee and human genomes are so small that genome evolution, as opposed to brain evolution, appears strikingly non‐linear ( Changeux 1983, 2004 ; Ebersberger et al . 2002 ; Jiang Z et al . 2007 ; Konopka et al . 2012 ; Liu et al . 2012 ). Plausible explanations for such non‐linear evolution include (1) the combinatorial co‐expression of genetic determinants creating networks of interacting genes accounting for normal brain development (Johnson et al . 2011 ; Kang et al . 2011 ) and its pathologies (Ben‐David and Shifman 2012 ; Torkamani et al . 2010 ), (2) the length of postnatal development (13–15 years in Homo sapiens ), accompanied by major increase of brain weight (about fivefold before adulthood) (Lagercrantz et al . 2010 ), (3) the selective stabilization and elimination (pruning) of synapses under the control of evoked or spontaneous activity elicited by interactions with the physical, biological, social and cultural environment (Benoit & Changeux 1975 , 1978 ; Changeux 2004 ; Changeux & Danchin 1976 ; Changeux et al . 1973 ; Luo & O'Leary 2005 ; Purves & Lichtman 1980 ; Stretavan et al . 1988 ); and eventually, (4) chromatin epigenesis (Lagercrantz 2012 ). Concomitantly, the development of brain transcriptome follows definite sequential patterns in the course of pre‐ and postnatal development (Bayès et al . 2011 ; Fu et al . 2011 ; Kang et al . 2011 ; Liu et al . 2012 ; Somel et al . 2009 ; Stead et al . 2006 ) that need further study.

Materials and methods

The strategy used applies the microarray data analysis and clustering with self‐organized maps (SOMs) consisted of the following steps: (A) initial processing of microarray data and first‐level SOM clustering (which continues by the three branches), (B) determination of gene zones associated with significant developmental changes and brain disorders, (C) analysis of stages of development and quick comparison between rat and human brain development and (D) pathway analysis, (E) functional and hierarchical network analysis, (F) elimination of genes associated with brain disorders – schizophrenia and ASD; and (G) directions to the future drug design. A flowchart of methods used is shown in Fig. S1, Supporting Information.

Step A included (1) calculation of mean from replicates, (2) generation of time series, (3) selecting set of genes by significance and (4) clustering genes into the set of SOMs, which we call the first‐level SOMs to distinguish them from the two following SOM clusterings using Gene Expression Dynamic Inspector (gedi) program (Eichler et al. 2003; Guo et al. 2006). In Step B we selected the top of differentially expressed genes and mapped them back into the first‐level SOMs, forming the gene zones that showed expression‐profile similarities. Step C contained two independent parallel processes: SOM informational entropy calculation and orthogonal second‐level SOM generation that resulted in obtaining maps each representing clusters of developmental days, or ‘coherent‐day groups’(CDG), for the each of 650 CGGs. Both of these parameters – informational entropy and resulting ‘density’ map (the gene density map displays the number of genes assigned to every particular CGG. In this case it is day‐of‐development density map and displays the number of days assigned to every particular CDG) – helped elucidating stages of brain development and were used along with obtained on Step D signaling pathways and biological processes in hierarchical network analysis represented by Step E. In Step D we elucidated the signaling pathways, grouping genes by SOM method in 30 CGGs and creating corresponding gene networks using Ingenuity Pathway Analysis (IPA®) program (Ingenuity Systems Inc., Redwood City, CA, USA), which assigned the conventional signaling pathways (CSPs) for each of 30 CGGs. Analysis of obtained pathways with IPA led to listing biological processes inherent in each of them. Step E was involved in multilevel hierarchical network clustering using BiologicalNetworks (Baitaluk et al. 2006b; Kozhenkov et al. 2010) – and data obtained by Steps B–D. Analysis of hierarchical networks showed gene–TF interaction on different levels. Results of Step E allowed analyzing specific gene–TF interactions, for example, for neurological disorders, expanded the former along the length of time series. This was implemented in Step F. Finally, Step G showed further directions for drug design based on analysis of hierarchical networks.

Sample preparation, microarray and data analysis Sample preparation and preliminary filtering and normalization were provided by J. D. H. Stead et al. (Carlton University, Ottawa, Canada) as described in the article (Stead et al. 2006). The samples were taken from three different regions of the brains of Sprague Dawley® rats from the Charles River Laboratories (Wilmington, MA, USA) – CTX, HPC and HYP – on different days of development. Each region included several time points: CTX – 11 (from prenatal day E16 to postnatal day P90), HPC – 7 (postnatal time points from day P01 to P90) and HYP – 9 (from prenatal day E18 to postnatal day P90). Six samples were taken for each time point. Affymetrix GeneChip® Array RAE230A (Santa Clara, CA, USA) was used. Each probe set originally included 10 063 genes that were subjected to statistical analysis (Dai et al. 2005; Stead et al. 2006). The microarray was validated by Stead et al. (2006) , using reverse transcription polymerase chain reaction of 33 genes. We calculated the average for each probe from the six sample replicas of each time point. The rat microarray data was obtained from J. D. H. Stead (J. D. H. Stead, personal communication). In the original article (Stead et al. 2006) authors filtered the initial set of around 10 000 genes using levels of detection for all six biological replicates. This filtration brought 6653 genes that undergone two‐way analysis of variance (anova). Total 87.8% of these genes (5844) showed significant differences in expression between brain regions, and 97.2% (6470) genes showed significant differences in expression over time. We deleted expressed sequence tag genes with 4819 genes remaining, and then the averaged expression values for each time point were normalized along the time series separately for each probe for each region of the brain. This operation allowed for comparing gene expression within different expression ranges; so that genes with low expressions would not be discriminated. The human microarray data (Somel et al. 2010) was obtained from the NCBI GEO Datasets, ID GSE 18069 and subjected to the processing described above.

Self‐organized map generation For clustering microarray data we selected the method of SOMs (Kohonen 2000) because it is a two‐way clustering method that allows clustering of both genes and samples (i.e. time points), taking into account the relationship between both gene and sample clusters. SOMs allow visualization, multidimensional scaling and data reduction from thousands of genes to hundreds of gene groups [we used implementation of GEDI (http://www.childrenshospital.org/research/ingber/GEDI/gedihome.htm) (Eichler et al. 2003; Guo et al. 2006)], while clustering performance is comparable with other clustering techniques (Tsigelny et al. 2008). To generate SOMs one needs to set several parameters. First of all it is a number of clusters that is represented as a two‐dimensional grid map and number of training iterations. Also other parameters such as neighborhood, learning factor and distance metrics, should be assigned. GEDI program projects genes with the same expression profile along time points in the same location of the map, creating ‘coherent‐gene group’ (CGG) and placing genes with minimal distance metrics in neighborhood vicinity (Eichler et al. 2003) (Fig. S2, left). Moreover, GEDI calculates expression centroid for each CGG and assigns a color according to a color spectrum of gene expression. Derived heatmap‐like image was used to calculate informational entropy for each time point. Using a 26 × 25 map and Euclidean distance metric, we generated 650 CGGs for each day of development.

Elucidation of zones with highly‐expressed genes Differentially expressed genes within each brain region were derived by calculating expression fold changes (the ratios between the minimal and maximal expressions along the time series) for each probe along all time points. Because not every brain region contained the same time points, we examined only seven time points for postnatal period; 1086 genes with a fold change ≥2.5 were considered differentially expressed and used for further analysis. Such a cutoff value is used often in microarray study (Avasarala et al. 2008; Li et al. 2007; Sarkar et al. 2008). These 1086 genes were then mapped into the SOM, where they fit to 138 (out of 650 totally) CGGs that were grouped in several zones (outlined in white in Figs 1 and S3). We considered CGG ‘active’ if it contained 50% and more of genes having 2.5‐fold change of expression. This percent had been chosen after analysis of the dependency of the average number of genes from percentage of active genes in a CGG. The inflexion point of the curve was found if 50% of CGG's genes were active (see upper right chart of Fig. S3).We conducted additional two‐way anova analysis of the initial gene set using the program MeV (Saeed et al. 2006). In the zones selected (including both the genes with 2.5‐fold change and other genes) 97.6% of genes for zone 1 and 98.3% for zone 2 (that were used in further analysis) were considered significant by anova analysis. The mapped SOMs for CTX on day P01 are shown in Fig. S3. Distribution of differentially expressed CGGs and genes along the time series in the rat cortex is presented in Tables S1 and S2. Figure 1 Open in figure viewer PowerPoint Self‐organized maps ( SOMs) of gene expression: (a) during postnatal development of rat brain for CTX, HPC and HYP. The method of SOMs identifies from microarray data the genes exhibiting correlated expression profiles referred to as ‘CGG’. Each pixel of the SOM represents a group of genes having correlated profile of expression in time. A color code (dark‐blue to dark‐red) denotes the centroid (average) of expression values for the entire CGG at this pixel. The program further distributes them on two dimensional maps (26 × 25 grid), positioning closer to each other the groups with the gene‐expression profiles with the highest similarity. Outlined by white lines are the CGGs in which the activity of more than 50% of the genes change their expression more than 2.5 folds during development. It is clearly seen that these CGGs create several connected areas that we called ‘zones’ enumerated with figures near them. Such display visualizes the CGGs with the most profound change in expression during brain development. (b) Higher‐level ‘second‐order’ analysis of SOMs where the relationships of rat CTX, HPC and HYP transcriptome during development are represented as a single square map. A single building block (pixel) gives these relationships for each day of development. Close‐up of the pixel [1,3] at the left explains the contents of each of the pixels, including the entire expression profile for the specified region of the brain during development. At late prenatal stages HYP data are close to CTX data but segregate from CTX and HPC at early postnatal stages. Between days P7 and P14 the transcriptome of CTX and HPC evolve in parallel, in distinction to HYP. (c) The same analysis for the development of human CTX. The data at postnatal days P004 and P034 are close to each other as well as those for postnatal days P094, P204, P443 and P787, while data of the day P002 (which is close to birth) are distinct.

Informational entropy calculations To estimate the complexity of brain growth and organization during development, informational entropy was used, which value is reciprocal to each system organization level (Kouznetsova & Rakov 1987). From the obtained set of SOMs, image histograms were generated for each time point (Davies 2005; Kapur et al. 1985). On the image histograms the horizontal axis represents the intensity variations of the image and the vertical axis represents the number of pixels captured in each intensity zone. The transcription intensities of gene groups were calculated as centroids that are the multivariate equivalent of the mean intensities. There were 11 histograms for the CTX, 7 for the HPC and 9 for the HYP. The histogram for the SOM of day P07 is shown in the top left corner of Fig. S2. The two peaks of intensity in two groups appeared in this histogram: 47 units collecting 81 pixels and 170 units collecting 35 pixels of the total 650. p i = n/N, where n is the number of intensities that fell in the i‐th sampling and N is the total number of intensities. These probabilities were used in the Tsallis entropy calculation (eqn 1 below) (Tsallis 1988 (1) p denotes the probability distribution of the SOM and q is a parameter that measures non‐extensivity. This definition is useful in distinguishing developmental changes related to the general growth of the system from those related to its self‐organization. When a parameter is ‘extensive’, it is dependent on the size of the system (e.g. the mass of developing brain); when a parameter is ‘non‐extensive’, it is not dependent on the size of the system (e.g. brain‐tissue differentiation and organization). The developing brain possesses both properties. It is constantly growing in size (extensivity) and simultaneously developing functionally important structures, including anatomical changes in shape, cell‐type diversification and synapse formation (non‐extensivity). The maximum gene‐expression centroid value for CGGs in the SOM dataset was 10.85 and the minimum 4.91, a span of 5.94. The span was divided by 0.5 intervals, providing 13 samplings, and plotted across the number of intensities for each sampling. The 0.5 intervals for the sampling were chosen to simplify the calculations; further study indicated no significant difference in entropy profiles between 13 and the higher‐value samplings, and explained the image patterns. From these histograms, probability was calculated for each sampling using the formula, whereis the number of intensities that fell in the‐th sampling andis the total number of intensities. These probabilities were used in the Tsallis entropy calculation (eqn 1 below) (Tsallis).In the equation,denotes the probability distribution of the SOM andis a parameter that measures non‐extensivity. This definition is useful in distinguishing developmental changes related to the general growth of the system from those related to its self‐organization. When a parameter is ‘extensive’, it is dependent on the size of the system (e.g. the mass of developing brain); when a parameter is ‘non‐extensive’, it is not dependent on the size of the system (e.g. brain‐tissue differentiation and organization). The developing brain possesses both properties. It is constantly growing in size (extensivity) and simultaneously developing functionally important structures, including anatomical changes in shape, cell‐type diversification and synapse formation (non‐extensivity). Tsallis entropy was chosen because of its suitability for calculation of non‐extensive or mixed entropy. Parameter q is thought to quantify the degree of departure from extensivity: If q = 0, the system is non‐extensive, whereas if q = 1, the system is extensive, and its entropy is equivalent to extensive entropies such as Boltzmann–Gibbs or Shannon entropy. An entropic index of q = 0.25 was selected because lower the q lesser the extensivity it has and less the system entropy depends on its size; the change in entropy therefore corresponds to the change in morphology and other function‐related characteristics of the brain but not to the increased physical dimensions. Tsallis entropy was calculated separately for each set of gene‐expression data for the rat CTX, HPC and HYP as described above to create the profile chart.

Signaling pathways analysis The sets of genes over‐expressed during each day of development (extracted from 1 of 11 6 × 5 SOMs generated by GEDI for the rat CTX – Fig. S5) were submitted to the IPA® program (Ingenuity Systems Inc.), which assigned the conventional signaling pathways (CSP) for each CGG. The criterion for selection of a specific pathway as corresponding to the genes on a given day of development was the P value threshold, so that the selected set of over‐expressed genes from this day was not randomly included in the existing scheme of the CSP.