Following the methodology described above, we constructed a map of science that visualizes the relationships between journals according to user clickstreams. We first discuss the visual structure of the map, and then attempt to validate the structural features of its underlying clickstream model by comparing the latter to journal centrality rankings and to an alternative model of journal relations derived from classification data.

The connections between the journals in the map's rim cross multiple domains. For example, alternative energy (rim, 3PM) connects to pharmaceutical research and chemical engineering, which in turn connects to toxicology studies and biotechnology. Brain research (rim, 6PM) is connected to genetics, biology, animal behavior, and social and personality psychology. Human geography connects to geography, plant genetics, and finally agriculture. A number of clusters are well-connected to both the natural science and social science clusters. For example, ecology and biodiversity (5PM) connects the domains of biology (rim, 5PM) and architecture and design (hub, 5PM). Production and manufacturing (12PM) bridges the domains of physics and engineering (rim, 2PM) and economics (hub, 11PM).

To provide a visual frame of reference, we summarize the overall visual appearance of the map of science in Fig. 5 in terms of a wheel metaphor. The wheel's hub consists of a large inner cluster of tightly connected social sciences and humanities journals (white, yellow and gray). Domain classifications for the journals in this cluster include international studies, Asian studies, religion, music, architecture and design, classical studies, archeology, psychology, anthropology, education, philosophy, statistics, sociology, economics, and finance. The wheel's outer rim results from a myriad of connections between journals in the natural sciences (red, green, blue). In clockwise order, starting at 1PM, the rim contains physics, chemistry, biology, brain research, health care and clinical trials journals. Finally, the wheel's spokes are given by connections that point from journals in the central hub to the outer rim.

Any interpretation of the visual structure of the map in Fig. 5 should be governed by the following considerations:

The FR algorithm can converge on different visualizations of the same network data. We do not claim that Fig. 5 is the only or best possible visualization. It was selected because it represents a particularly clear and uncluttered visualization of the connections between journals in the clickstream matrix and, most importantly, because its main structural features were stable across many different iterations of the FR algorithm.

The journal connections shown in the map are given by the clickstream matrix, not by the FR algorithm. They are thus not artifacts of the visualization.

The FR algorithm pulls together small-scale clusters of journals that are strongly connected in the clickstream matrix. The appearance of small-scale journal clusters is thus directly related to the entries of that matrix, and these clusters are not considered artifacts of the visualization.

Although the positions of journals and clusters relative to each other are shaped by their connections in the clickstream matrix, their exact geometric coordinates vary depending on the layout algorithm and are thus considered artifacts of the visualization.

In summary, the connections between journals and small-scale clusters in the network visualization in Fig. 5 are determined by the clickstream matrix. They are not artifacts of the visualization. However, one cannot draw conclusions from the exact geometric coordinates of journals and clusters in the map.

Regardless of their use for cross-validating features of the produced map of science, the rankings in Table 3 and Table 4 illustrate the possibility of ranking journals according to various aspects of their centrality in clickstream data. For example, we note that Nature and Science are among the 15 top-ranked journals in both Table 3 and Table 4. This indicates that they have considerable interdisciplinary appeal as well as high prestige among users. The betweenness centrality and PageRank of PNAS diverge more strongly: PNAS was ranked 2nd by betweenness centrality, but 24th by PageRank. This suggests that PNAS has strong interdisciplinary appeal among users, but a slightly smaller degree of prestige than other top 15 journals.

PageRank favors prestigious journals that are well-connected to other well-connected journals. Table 4 lists the 15 journals with the highest PageRank values in the network; this ranking indeed favors more specialized, prestigious journals, such as Applied Physics Letters, Ecology, Physical Review B and American Anthropologist. The presence of social science and humanities journals in the PageRank ranking, such as American Historical Review and Annals of the American Academy of Political and Social Science, indicates their connectedness to other highly ranked journals and consequently their centrality in the network.

The PageRank of a journal is calculated by an iterative procedure in which the PageRank of a journal is repeatedly recalculated as a function of the PageRank of its predecessors in the graph, according to Equation 2:

$$PR(v_i) = \frac{1-\lambda}{N} + \lambda \sum_{v_j \to v_i} \frac{PR(v_j)}{O(v_j)} \qquad (2)$$

where $PR(v_i)$ denotes the PageRank of journal $v_i$, $N$ the number of nodes in the network, and $O(v_j)$ the out-degree of the predecessor journal $v_j$; $\lambda$ is the damping factor of the standard PageRank formulation. PageRank values converge from a set of random initial values toward a stable ranking after a number of iterations.
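The iterative procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the damping value of 0.85, the uniform (rather than random) initial values, the uniform redistribution of rank from nodes without out-links, and the toy graph with nodes A to D are all assumptions made for the example.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph {node: [successors]}.

    damping=0.85 is an assumed, conventional value; the source does not
    state which value was used.
    """
    nodes = sorted(set(out_links) | {v for succ in out_links.values() for v in succ})
    n = len(nodes)
    # Assumption: start from uniform values instead of random ones.
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            successors = out_links.get(u, [])
            for v in successors:
                # Each predecessor passes on rank divided by its out-degree.
                new_rank[v] += damping * rank[u] / len(successors)
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy network: C is pointed to by A, B and D.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
toy_ranks = pagerank(toy_links)
```

In the toy network, node D has no predecessors and therefore keeps only the baseline share of rank, while C, which is linked to by three other nodes, ranks highest.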

Journals with high betweenness centrality values are those that frequently sit on paths that connect a large number of other journals and journal clusters; they will often be interdisciplinary journals that serve as connectors between various domains. Table 3 lists the 15 journals with the highest betweenness centrality; most of these journals are indeed highly interdisciplinary, such as Nature, Science, PNAS, Milbank Quarterly, and Behavioral Ecology and Sociobiology. The presence of social science journals, such as Child Development and American Anthropologist, in this ranking confirms their interdisciplinary nature and agrees with their central position in the map.

The betweenness centrality of a journal $v_k$ is defined as the number of geodesics (shortest paths) in the journal network that pass through $v_k$. Let $\sigma_{ij}$ be the number of weighted shortest paths between journals $v_i$ and $v_j$ in the graph and $\sigma_{ij}(v_k)$ be the number of those shortest paths that pass through node $v_k$. The weighted betweenness centrality of node $v_k$ is then given by Equation 1:

$$C_B(v_k) = \sum_{i \neq j \neq k} \frac{\sigma_{ij}(v_k)}{\sigma_{ij}} \qquad (1)$$
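As an illustration of the definition above, the following sketch computes weighted betweenness centrality with Brandes' accumulation scheme on top of Dijkstra's algorithm. It is not taken from the paper: the function name, the toy graph, and in particular the use of explicit edge distances are assumptions. The source does not state how clickstream connection strengths were converted into path lengths, so the example simply takes distances as given.

```python
import heapq


def weighted_betweenness(graph):
    """Betweenness for a directed graph {u: {v: distance}} (Brandes' scheme).

    Distances must be positive. Ties between paths are detected with exact
    equality, so exactly representable distances (e.g. integers) are assumed.
    """
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    centrality = {v: 0.0 for v in nodes}
    for s in nodes:
        dist = {s: 0}                        # best known distance from s
        sigma = {v: 0.0 for v in nodes}      # number of shortest s->v paths
        sigma[s] = 1.0
        preds = {v: [] for v in nodes}       # predecessors on shortest paths
        order, done = [], set()
        heap = [(0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue                     # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in graph.get(v, {}).items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]     # another equally short path
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality


# Hypothetical toy network: two equally short routes from A to C.
toy_graph = {"A": {"B": 1, "E": 1}, "B": {"C": 1}, "E": {"C": 1}, "C": {"D": 1}}
toy_betweenness = weighted_betweenness(toy_graph)
```

In the toy network, B and E each carry half of the shortest paths from A to C and from A to D, while every path that reaches D must pass through C, which therefore receives the highest score.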

To verify this, we calculated the betweenness centrality [25] (Table 3) and PageRank [26], [27] (Table 4) of all journals in the network. Each ranking highlights a different interpretation of a particular journal's centrality in the network.

The map displays a dense, centrally located cluster of social science and humanities journals (hub). The question arises whether the central position of the social sciences and humanities journals is merely an artifact of the visualization, or whether these journals are in fact also central to the topology of the underlying network.

Cross-validation of the clickstream model and map to the AAT

The clickstream model, represented by the clickstream matrix, expresses the relations between pairs of journals. An inspection of the individual journal relationships in Table 5 may provide an informal sense of the validity of the journal relations in this matrix. We selected 6 prominent journals, i.e. those with high values, and for each retrieved the 5 journals to which it has the highest-probability connections. All journal relations in Table 5 seem highly valid, but this is a subjective observation.

However, we can cross-validate the map's structure, represented by the clickstream matrix, in a more objective manner by comparing it to an independent set of journal relations, as demonstrated by [28]. Assume we create an alternative matrix of journal relations from an independent, yet trusted data source unrelated to our usage data. If the entries of this alternative matrix correspond to the structure of the clickstream matrix, that finding corroborates the validity of the latter's structure.

To perform such cross-validation two conditions need to be satisfied:

The AAT classification meets these requirements. First, the journal classifications in the AAT are derived from two well-established, commonly used classification schemes, namely Dewey Decimal and JCR classification codes. These were defined independently of our usage data and thus of the relationships in the clickstream matrix. Second, the AAT expresses the classification of journals at various levels of granularity, to which the structural features of our map can be compared.

We derived a model of journal relations, represented by matrix , from the AAT as follows. We denote the AAT classification of journal as . Since journal classifications can be retrieved from the AAT at various distances from the root of the taxonomy, we denote the journal classification of journal at root distance as .

For each journal pair we can retrieve the corresponding AAT classification pair . We thus define the match function such that maps each journal pair in to a binary value depending on whether their AAT classifications match at the particular root distance .

We then define the AAT classification match matrix whose entries are given by ; they represent a binary indication of journal relationships according to their AAT classifications. We can generate matrices at any root distance . However, not all branches of the AAT taxonomy are equally represented at . We therefore chose 4 values that provided a consistent range of classification granularities, namely each of which corresponds to an increasingly detailed classification level with 4 being the most specific. The root distances and the number of distinct classifications at that level in the taxonomy are listed in Table 6.
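The construction of the binary match matrix can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: it assumes that each journal's AAT classification is available as a path of labels from the taxonomy root, that two journals match at a given root distance when their paths agree up to that depth, that journals classified more shallowly than the requested depth never match, and that diagonal entries are set to 0. The journal identifiers and classification labels are hypothetical.

```python
def classification_at(path, root_distance):
    """Return the classification at the given root distance, or None if absent."""
    if len(path) < root_distance:
        return None  # this branch of the taxonomy is not that deep
    return tuple(path[:root_distance])


def match_matrix(journals, aat_paths, root_distance):
    """Binary matrix: 1 where two distinct journals share a classification."""
    n = len(journals)
    matrix = [[0] * n for _ in range(n)]
    for i, a in enumerate(journals):
        for j, b in enumerate(journals):
            if i == j:
                continue  # assumption: a journal is not compared with itself
            class_a = classification_at(aat_paths[a], root_distance)
            class_b = classification_at(aat_paths[b], root_distance)
            matrix[i][j] = int(class_a is not None and class_a == class_b)
    return matrix


# Hypothetical journals and classification paths, for illustration only.
toy_journals = ["J1", "J2", "J3"]
toy_paths = {
    "J1": ["natural sciences", "biology", "ecology"],
    "J2": ["natural sciences", "biology", "genetics"],
    "J3": ["social sciences", "economics"],
}
match_level_1 = match_matrix(toy_journals, toy_paths, 1)
match_level_3 = match_matrix(toy_journals, toy_paths, 3)
```

At root distance 1 the two natural science journals match; at root distance 3 they no longer do, which mirrors the way deeper levels yield increasingly specific classifications.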

We now formulate the null-hypothesis as follows:

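Null hypothesis: “Over all non-zero entries of the clickstream matrix, the magnitude of an entry is not related to the probability that the corresponding entry of the AAT classification match matrix equals 1.”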

The probability of rejecting the null hypothesis increases as the root distance decreases, since classifications are then retrieved closer to the AAT root and thus result in increasingly general associations.

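We test the stated null hypothesis by performing Pearson's chi-squared test (with Yates' continuity correction) on four 2×2 contingency tables, each constructed from a pairwise comparison of the non-zero entries of the clickstream matrix with the corresponding entries of the AAT classification match matrix at one of the four root distances.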

For each non-zero entry of the clickstream matrix we thus compare the following two factors for the corresponding journal pair:

Factor 1: whether the entry is above or below the median of the set of all non-zero entries of the clickstream matrix

vs.

Factor 2: whether the corresponding entry of the AAT classification match matrix is 0 or 1.

If the set of journal connections in the clickstream matrix is unrelated to those given by the journals' AAT classifications, i.e. if the null hypothesis holds, we expect the frequencies in the cells of the 2×2 contingency tables to match those predicted from their column and row totals on the assumption of statistical independence.
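The test described above can be sketched as follows: a 2×2 table is built from a median split of the connection values against the binary match indicator, and the Yates-corrected chi-squared statistic compares observed counts with those expected under independence. This is a minimal sketch under stated assumptions, not the authors' analysis: values equal to the median are counted as "not above", all row and column totals are assumed to be non-zero, and the input values are made up for the example.

```python
import math
from statistics import median


def contingency_table(connection_values, match_values):
    """2x2 counts: rows = above / not above median, columns = match 1 / 0."""
    cut = median(connection_values)
    table = [[0, 0], [0, 0]]
    for value, match in zip(connection_values, match_values):
        row = 0 if value > cut else 1   # assumption: ties count as "not above"
        col = 0 if match == 1 else 1
        table[row][col] += 1
    return table


def chi2_yates(table):
    """Pearson chi-squared statistic with Yates' correction and its p-value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence (margins assumed non-zero).
            expected = row_totals[i] * col_totals[j] / n
            diff = max(abs(table[i][j] - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up inputs, for illustration only.
toy_table = contingency_table([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1])
related_chi2, related_p = chi2_yates([[30, 10], [10, 30]])
unrelated_chi2, unrelated_p = chi2_yates([[20, 20], [20, 20]])
```

A table whose cells equal the counts expected from its margins yields a statistic of zero, whereas a table in which high connection values coincide with matching classifications yields a large statistic and a small p-value.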

However, values were found at all levels, i.e. for , , , and . We can thus reject the null-hypothesis at high levels of confidence for each level, and conclude that the entries of are indeed related to the AAT classifications of the journals thereby corroborating the validity of at least to the degree that the AAT can be considered a valid taxonomy.

Fig. 6 provides a summary of the procedure described above.

At the top level, the AAT distinguishes between 4 classifications: natural sciences, social sciences, humanities and interdisciplinary science. The null hypothesis was rejected at this level, indicating a statistically significant relation between the journal relations in the clickstream matrix and the AAT classifications of the corresponding journals. To visually illustrate the overlap between the journal relations in the clickstream matrix and the journals' AAT classifications at this level, we assigned each journal a color according to its classification. The natural sciences were assigned the color blue, while the social sciences and humanities combined were assigned the color yellow. Since only a small fraction of journals (3%) were classified as interdisciplinary, they were colored gray along with all other journals that could not be classified.

Fig. 7 results from this procedure; it shows the overlap between the AAT subject classifications and the map's layout of journals in the hub, rim and spokes described above, confirming that the visual separation of these domains effectively follows their separation according to the AAT subject classification.

Figure 7. Cross-validating the map of science's layout by retrieving each journal's top-level AAT classification (natural sciences vs. social sciences and humanities). This map colors journals according to whether the AAT classifies them as either social sciences and humanities journals (yellow) vs. natural science journals (blue). Highly connected clusters corresponding to biology and psychology contain a mix of journals classified in either the social and natural sciences. https://doi.org/10.1371/journal.pone.0004803.g007

The map in Fig. 7 also shows blue circles connected to journals in the central yellow hub, and yellow circles connected to journals in the blue rim. These discrepancies indicate a divergence between the AAT classification scheme compiled by experts and how journals are connected in the map according to the clickstream matrix, i.e. user clickstreams. For example, the AAT assigns numerous journals in biology, neurology and hydrology to the social sciences and humanities, whereas their connections in the clickstream matrix place them within the cluster of natural sciences (rim, 6PM). Conversely, several journals in clinical pharmacology and statistics are assigned to the natural sciences by the AAT, although their connections place them within the cluster of social science and humanities journals (hub, 10PM). Psychology (rim, hub 8PM) is an example of a domain whose connections place it at the intersection of the social sciences and natural sciences. Psychology journals are almost equally divided between the natural sciences and the social sciences in the AAT.