Space shaping networks: the polynomial volume law

Networks are in many cases shaped by space: the edges of the network, which represent thematic information, relate in some way to distance in space. If this relation between network and space is strong, both exhibit similar characteristics, and the network is called spatial. We can accordingly expect the size of a neighbourhood in the network to depend on its ‘radius’ in a similar way as the volume of a ball does in Euclidean space. Next, we make the concept of a neighbourhood in a network precise; such a neighbourhood is called a ball. The ball \(B_n(r)\) of radius r centred at a node n is defined as the set of nodes within distance r of the node n, i.e., the nodes that can be reached by traversing at most r edges or, in a weighted network, the nodes that can be reached by traversing edges of total weight at most r. The volume \(|B_n(r)|\), in turn, is defined as the number of nodes contained in \(B_n(r)\). Now assume a network that is embedded in a Euclidean space. The volume of a ball in Euclidean space scales as \(r^d\), where r denotes the radius and d the dimension of the embedding space. The volume of a ball in the network can be expected to scale in a similar way if the edges of the network are related to the distance between the nodes. In fact, many real-world networks statistically exhibit the polynomial volume law (Fig. 1a–p):

$$|{B}_{n}(r)|=1+k\cdot {r}^{d}$$ (1)

where k and d are positive real numbers. The left-hand side of the polynomial volume law (Equation 1) refers to the volume of a ball in the network. The right-hand side refers to the volume of a ball in Euclidean space (up to the factor k) incremented by 1, reflecting that the ball of radius 0 in the network contains exactly one node. This law has been discussed previously by Song et al.13 and Shanker14 but has, to my knowledge, never been examined in detail with respect to hierarchies inside the network.
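
To make the definition of \(B_n(r)\) concrete, the following minimal sketch computes ball volumes in an unweighted network by breadth-first search. The square lattice used as a test network and the function names `ball_volume` and `grid_graph` are illustrative assumptions, not part of the original study. On such a lattice the volume away from the boundary has the closed form 1 + 2r(r + 1), a polynomial whose leading term corresponds to d = 2.

```python
from collections import deque


def ball_volume(adj, centre, radius):
    """|B_n(r)| in an unweighted network: number of nodes at most `radius` hops from `centre`."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # nodes on the boundary of the ball are not expanded further
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return len(dist)


def grid_graph(size):
    """Adjacency lists of a size x size square lattice (a hypothetical test network)."""
    adj = {}
    for x in range(size):
        for y in range(size):
            adj[(x, y)] = [(x + dx, y + dy)
                           for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= x + dx < size and 0 <= y + dy < size]
    return adj


grid = grid_graph(41)
for r in range(5):
    # Away from the lattice boundary, |B(r)| = 1 + 2r(r + 1): prints 1, 5, 13, 25, 41.
    print(r, ball_volume(grid, (20, 20), r))
```

Near the boundary of a finite network the volume saturates at the total number of nodes. This is why the fits in Fig. 1 only use the radii before saturation.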

Figure 1 Volumes of real-world networks. Depicted are the arithmetic means of the volumes for 10,000 randomly chosen nodes. The data (dots) in the white part of the plot are fitted by the polynomial volume law (lines) in the case of a–p, and by an exponential volume law in the case of q–s. The volume is restricted by the total number of nodes, which is represented by a horizontal line. The radius is given in minutes (a–f) or as the number of traversed edges (g–u). For polynomial fits, the estimated dimension including the standard deviation is given. (a–c) Volumes in the bus network of Manhattan, NYC3 (with logarithmically and linearly scaled axes), and the Metro North Railroad in NY3. These transport networks are defined by stops as nodes, and pairs of successive stops as edges with the travel time as weights. (d, e) Corresponding transport networks. (f) Volumes in the subway network of NYC3. (g–i) Volumes in the road networks of California, Pennsylvania, and Texas4. (j–o) Volumes in a brain network31,40, the metabolic network of Caenorhabditis elegans33,34, the Youtube social network4, the email communication network from Enron4, the collaboration network of the Arxiv astrophysics category4, and a network of social circles in Facebook4. (p) Volumes in a Barabási-Albert model6 with 20,000 nodes. (q–s) Volumes in an Erdős-Rényi model35,36 with 10,000 nodes and a probability of \(6.495 \cdot 10^{-4}\), the Gnutella peer-to-peer computer network4, and the Amazon product co-purchasing network4. The data are fitted by an exponential volume law. (t, u) Volumes in the web graph from a Google programming contest4, and the Stanford web graph4. The data follow neither a polynomial nor an exponential volume law.

When the volume of a ball is fitted statistically by Equation 1 for different radii r, the parameter d can, in contrast to Euclidean space, be a non-integer. This is in particular the case if the network does not extend equally into every dimension of the embedding space. A meaningful embedding in space can explain why many networks exhibit the polynomial volume law. Such an embedding is not needed, however, to compute the volumes \(|B_n(r)|\) for different nodes n and radii r. Nor is it needed, in turn, to determine whether a network follows the law and which real number d (called the dimension of the network) fits best.
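
One simple way to estimate k and d from mean volumes is an ordinary least-squares fit of log(|B(r)| − 1) against log r, since Equation 1 becomes linear in these variables. This is an assumption for illustration: the fitting procedure behind Fig. 1 is not specified here and may differ (for example, a nonlinear least-squares fit of Equation 1 itself). The function name `fit_polynomial_volume_law` is hypothetical.

```python
import math


def fit_polynomial_volume_law(radii, volumes):
    """Estimate k and d in |B(r)| = 1 + k * r**d by least squares on
    log(|B(r)| - 1) = log(k) + d * log(r).  Only pairs with r > 0 and
    |B(r)| > 1 are usable, since the logarithm is undefined otherwise."""
    xs, ys = [], []
    for r, v in zip(radii, volumes):
        if r > 0 and v > 1:
            xs.append(math.log(r))
            ys.append(math.log(v - 1))
    if len(xs) < 2:
        raise ValueError("need at least two usable (radius, volume) pairs")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    d = sxy / sxx                        # slope = dimension
    k = math.exp(mean_y - d * mean_x)    # intercept = log(k)
    return k, d


# Synthetic mean volumes following the law exactly with k = 3 and a
# non-integer dimension d = 1.7 (illustrative values only).
radii = [1, 2, 3, 5, 8, 13]
volumes = [1 + 3 * r ** 1.7 for r in radii]
k, d = fit_polynomial_volume_law(radii, volumes)
print(f"k = {k:.3f}, d = {d:.3f}")  # k = 3.000, d = 1.700
```

With real data, only radii below the saturation point (the white region in Fig. 1) should be passed. Otherwise the plateau at the total number of nodes biases d downwards.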

Comparison to the fractal dimension

The dimension derived by the polynomial volume law is in many aspects similar to other approaches that relate a network to the dimension of the space it is embedded in. Most notably, the box counting dimension, also called Minkowski-Bouligand dimension or fractal dimension, establishes a relation between the embedding space and the network by comparing the complexity at different scales13,15. Thereby, space is tessellated with a grid of boxes, and the number of boxes containing at least one node (or, alternatively, the number of boxes intersecting at least one edge of the network) is determined. The dimension can then be derived from the relation between the number of such boxes and their side length. The box counting dimension has been discussed in various articles, among others with respect to self-similarity in networks16,17,18. Efficient algorithms for the computation of this dimension have been published19, and a comparison of such algorithms has been provided by Song et al.20. The concept of the box counting dimension has itself been subject to advancements17,21, and it has been applied in various contexts, among others the geographical one22,23.
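
The box counting procedure described above can be sketched for a two-dimensional point set as follows. This minimal sketch makes several assumptions: it uses the node-based variant (boxes containing at least one node), and a single grid anchored at the origin rather than the randomly translated grids averaged in Fig. 2. It also fits the slope of log N(s) against log(1/s) by least squares. The function names are hypothetical.

```python
import math


def box_count(points, side):
    """Number of grid boxes of the given side length containing at least one point."""
    return len({(math.floor(x / side), math.floor(y / side)) for x, y in points})


def box_counting_dimension(points, sides):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1 / s) for s in sides]
    ys = [math.log(box_count(points, s)) for s in sides]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Hypothetical point sets: a dense grid filling the unit square and a line segment.
square = [(i / 256, j / 256) for i in range(256) for j in range(256)]
line = [(i / 256, 0.0) for i in range(256)]
sides = [2.0 ** -m for m in range(1, 7)]
print(round(box_counting_dimension(square, sides), 6))  # 2.0
print(round(box_counting_dimension(line, sides), 6))    # 1.0
```

Box side lengths that are powers of two keep all divisions exact here. For real node coordinates, averaging over several randomly shifted grids reduces the dependence on where the grid happens to be placed.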

Approaches similar to the dimension defined by the polynomial volume law have been discussed in the literature. For instance, Daqing et al.24 have considered the average Euclidean distance \(E_n(r)\) from a centre node n to all nodes inside a ball \(B_n(r)\), i.e., to all nodes that can be reached by traversing at most r edges of the network. This average Euclidean distance has been compared to the volume of the ball in the network, as defined previously24. Thereby, a number referred to as the dimension is assigned to the network, much as in the case of the polynomial volume law. Comparing the volume of a ball \(B_n(r)\) to the average distance \(E_n(r)\) instead of to \(1 + k \cdot r^d\) has two major consequences. First, the average distance \(E_n(r)\) explicitly includes the concept of Euclidean distance, which presumes the network to be explicitly embedded in a Euclidean space. The comparison to \(1 + k \cdot r^d\), in contrast, can also be performed for an abstract network, without any knowledge about the potential location of a node. Second, the comparison of the volume of a ball \(B_n(r)\) to the average distance \(E_n(r)\) examines how topological and Euclidean aspects of the very same network relate. The comparison to \(1 + k \cdot r^d\), by contrast, examines how the topological aspects of the network relate to the universal polynomial law that describes the Euclidean volume of a ball in general. In short, the considerations of Daqing et al.24 include an explicit Euclidean embedding of the network, while the polynomial volume law \(1 + k \cdot r^d\) only compares to Euclidean spaces in general.
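
The first consequence above can be made tangible in code: computing \(E_n(r)\) requires node coordinates, whereas \(|B_n(r)|\) does not. The following minimal sketch returns both quantities for a hop-based ball. Including the centre node itself (at distance 0) in the average is an assumption, as are the function name and the toy path network.

```python
import math
from collections import deque


def volumes_and_mean_distances(adj, pos, centre, max_radius):
    """For r = 0..max_radius return (|B_n(r)|, E_n(r)), where B_n(r) is the ball of
    radius r in hops and E_n(r) the mean Euclidean distance from `centre` to its
    nodes.  Including the centre itself (distance 0) is an assumption."""
    hops = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if hops[node] == max_radius:
            continue
        for nb in adj[node]:
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    result = []
    for r in range(max_radius + 1):
        ball = [m for m, h in hops.items() if h <= r]
        # Needs the coordinates `pos`: not available for an abstract network.
        mean_dist = sum(math.dist(pos[centre], pos[m]) for m in ball) / len(ball)
        result.append((len(ball), mean_dist))
    return result


# Hypothetical example: a path of 101 nodes embedded on a straight line.
n = 101
pos = {i: (float(i), 0.0) for i in range(n)}
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
for r, (volume, mean_dist) in enumerate(volumes_and_mean_distances(adj, pos, 50, 4)):
    print(r, volume, round(mean_dist, 3))
```

On this path the volume is 2r + 1, while \(E_n(r) = r(r+1)/(2r+1)\) grows roughly like r/2. The volume therefore scales linearly in \(E_n(r)\), as expected for a one-dimensional network.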

Further approaches exist to characterize networks by their dimension. Daqing et al.24 examine the root mean square displacement of a random walk. Song et al.13 have pointed out that the different estimates of the dimension of space do not coincide in some cases, e.g., for small-world networks.

Figure 2 compares different types of network dimensions for two real-world networks, the bus network of Manhattan and the Metro North Railroad in NY. The figure shows the dimensions resulting from the polynomial volume law in four variants. First, the volumes are determined by the distance in the unweighted network. Second, the Euclidean distance between two adjacent nodes is used as weight, and the volumes are computed for the weighted network. Third, the distance between adjacent nodes is measured along the embedded network, i.e., the weights correspond to the distance a bus or train needs to travel. Fourth, travel times are used as weights. In addition to these dimensions resulting from the polynomial volume law, the box counting dimension is computed in two ways: by counting the boxes that contain a node of the network, or by counting the boxes that intersect an edge of the network.
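
The weighted variants differ from the unweighted one only in how distance is measured, so ball volumes can be computed with Dijkstra's algorithm instead of breadth-first search. The following minimal sketch uses a hypothetical toy network whose weights stand for travel times in minutes. It also shows that unit weights reproduce the unweighted (hop-count) variant. The function name is an assumption.

```python
import heapq


def weighted_ball_volume(adj, centre, radius):
    """|B_n(r)| in a weighted network: nodes whose shortest-path distance (sum of
    non-negative edge weights) from `centre` is at most `radius` (Dijkstra)."""
    dist = {centre: 0.0}
    heap = [(0.0, centre)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        for nb, w in adj[node]:
            nd = d + w
            # Only nodes inside the ball are ever queued, so `settled` is the ball.
            if nd <= radius and nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return len(settled)


# Hypothetical stops; weights are travel times in minutes.
edges = [("A", "B", 2.0), ("B", "C", 3.0), ("A", "D", 4.0), ("D", "C", 0.5)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))
unit = {u: [(v, 1.0) for v, _ in nbrs] for u, nbrs in adj.items()}  # unweighted variant

print([weighted_ball_volume(adj, "A", r) for r in (1, 2, 4, 4.5)])
print([weighted_ball_volume(unit, "A", r) for r in (1, 2)])
```

Swapping the weight attribute (Euclidean length, length along the route, or travel time) is all that distinguishes the weighted variants in this sketch.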

Figure 2 Different concepts of network dimension. (a–d, f–i, k–n, p–s) Estimation of the dimension by the volume in the unweighted network, or in the weighted network considering distance in space, distance in the network, or travel time, respectively. For each computation of a dimension, the arithmetic means of the volumes for 10,000 randomly chosen nodes have been examined. The data (dots) in the white part of the plot are fitted by the polynomial volume law (lines). The volume is restricted by the total number of nodes, which is represented by a horizontal line. The estimated dimension includes information about the standard deviation. (e, j, o, t) Estimations of the dimension by the box counting method. For each computation of a dimension, the average over 500 grids of boxes randomly translated in space has been examined. The data (dots) in the white part of the plot are fitted by a double logarithmic law (lines). In (e, o), the number of boxes is restricted by the total number of nodes, which is represented by a horizontal line. The estimated dimension includes information about the standard deviation.

The different concepts of dimension result in different values, as can be seen in Fig. 2. The dimensions estimated from the volumes in the weighted networks are very similar, both for the bus network and for the railroad network. In the unweighted network, however, the estimated dimension is higher and subject to a large standard deviation. In the case of the Metro North Railroad, a fit can hardly be made because the effective diameter of the network is small, which is reflected by the small range of the fit. The box counting dimension provides lower estimates when referring to nodes than when referring to edges. Both variants of the box counting dimension provide lower estimates than the polynomial volume law. This is most likely because the two dimensions reflect different concepts: the box counting dimension compares complexity at different scales, while the polynomial volume law carries over concepts from Euclidean space to the network. Despite this difference, the box counting dimension is higher for the bus network than for the Metro North Railroad, which is consistent with the polynomial volume law.

Local and global optimization principles

Many generation principles are known to guide the emergence of networks. Among them are principles that avoid edges between distant nodes in space, leading to a large diameter of the network, as well as principles that minimize the average distance between the nodes of the network and thus lead to small-world networks. In the following, we discuss factors that lead to these principles and how they relate.

The polynomial volume law is often the result of a local optimization principle: assuming that the cost of an edge depends on its length, how can a node be adjacent to as many nodes of the network as possible? This principle is of local nature because the question can be answered independently for each node. In the resulting network, a node is adjacent to the nodes of its neighbourhood in space while being non-adjacent to more distant nodes. This local optimization principle has been captured by different models. One approach is to introduce edges with a probability that depends on the distance l between the nodes in space. Examples are a probability of \(P(l) = \alpha \exp(-l/l_0)\) with positive values α and \(l_0\)25, or a probability of P(l) = 1 if \(l < l_0\) and P(l) = 0 otherwise26. Another model, which we refer to as the spatial network model or Mocnik model, has been proposed by Mocnik27,28. Assume a number of nodes embedded in space. We then introduce a directed edge \((n_1, n_2)\) if and only if
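
The two distance-dependent edge rules cited above can be sketched for points in the plane. Several details are assumptions of this sketch: the function names, the use of a seeded `random.Random` for reproducibility, undirected edges, and the requirement α ≤ 1 (so that P(l) is a valid probability for every l ≥ 0).

```python
import math
import random


def exponential_model(points, alpha, l0, rng):
    """Undirected edges, each pair linked with probability P(l) = alpha * exp(-l / l0),
    where l is the Euclidean distance (alpha <= 1 keeps P(l) a probability)."""
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            l = math.dist(points[i], points[j])
            if rng.random() < alpha * math.exp(-l / l0):
                edges.add((i, j))
    return edges


def threshold_model(points, l0):
    """Undirected edges exactly between nodes closer than l0: P(l) = 1 if l < l0, else 0."""
    return {(i, j)
            for i in range(len(points)) for j in range(i + 1, len(points))
            if math.dist(points[i], points[j]) < l0}


rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(200)]  # uniform in the unit square
print(len(threshold_model(pts, 0.1)), len(exponential_model(pts, 1.0, 0.05, rng)))
```

Both rules penalize long edges, which is the local optimization principle in probabilistic form. Unlike the Mocnik model that follows, their edge density depends directly on the chosen length scale \(l_0\).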

$${\rm{dist}}({n}_{1},{n}_{2})\le \rho \cdot \mathop{{\rm{\min }}}\limits_{m\ne {n}_{1}}\,{\rm{dist}}({n}_{1},m)$$ (2)

where dist denotes the Euclidean distance and ρ > 1 is a parameter that influences the density of the network (Fig. 3a). The model prototypically reflects Tobler’s first law of geography: ‘everything is related to everything else, but near things are more related than distant things’9,10,11. Nevertheless, the model applies to scales other than the geographical one as well.
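
Equation 2 translates almost literally into code. The following brute-force sketch assumes nodes given as 2D coordinate tuples and is quadratic in the number of nodes; the function name `mocnik_edges` is hypothetical. A node always links to its nearest neighbour, because ρ > 1. The relation is not symmetric, which is why the figures refer to the undirected network associated with the model.

```python
import math
import random


def mocnik_edges(points, rho):
    """Directed edges (i, j) of the Mocnik model, following Equation 2:
    i -> j iff dist(i, j) <= rho * (distance from i to its nearest other node).
    Brute force, O(n^2): fine for a sketch, too slow for very large n."""
    if rho <= 1:
        raise ValueError("the model assumes rho > 1")
    edges = set()
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        nearest = min(d for d, _ in dists)
        edges.update((i, j) for d, j in dists if d <= rho * nearest)
    return edges


# Four nodes on a line at x = 0, 1, 3, 7 with rho = 1.5: node 2 links to node 0,
# but node 0 does not link to node 2.
line = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
print(sorted(mocnik_edges(line, 1.5)))  # [(0, 1), (1, 0), (2, 0), (2, 1), (3, 1), (3, 2)]

# Uniformly distributed nodes in the unit square (an assumption, as for Fig. 3b).
rng = random.Random(7)
points = [(rng.random(), rng.random()) for _ in range(500)]
edges = mocnik_edges(points, 1.8)
undirected = {(min(a, b), max(a, b)) for a, b in edges}  # associated undirected network
print(len(edges) / len(points), len(undirected) / len(points))
```

Printing edges per node for several network sizes is a quick way to observe the roughly linear growth in the number of edges mentioned in the text below.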

Figure 3 Hierarchical spatial networks. In (b, c) and (e, f), the arithmetic means of the volumes for 10,000 arbitrarily chosen nodes are depicted. The estimated dimension including the standard deviation is given. (a) Mocnik model with 13 nodes and ρ = 1.5. (b) Volumes of the undirected network associated with a Mocnik model with 10,000 nodes in two-dimensional space and ρ = 1.8. (c) Volumes in the public transport network of Sweden41, which is a multi-modal and hierarchical network. The data are fitted by the polynomial volume law. (d) Hierarchical Mocnik model, with the base layer depicted in grey and layer 1 in black. (e) Volumes of the undirected network associated with two-dimensional hierarchical Mocnik models with ρ = 1.8. (In fact, the value of ρ is below 2 for most real-world networks.) The following hierarchies are depicted: no hierarchy (\(N_0 = 10000\)), flat hierarchy (\(N_0 = 10000, N_1 = 1000\)), steep hierarchy (\(N_0 = 10000, N_1 = 100\)), and two-layered hierarchy (\(N_0 = 10000, N_1 = 1000, N_2 = 100\)). (f) Volumes of the undirected network associated with two-dimensional weighted hierarchical Mocnik models with ρ = 1.8. The following hierarchies are depicted: no hierarchy (\(N_0 = 10000\); \(w_0 = 1\)), flat hierarchy (\(N_0 = 10000, N_1 = 3000\); \(w_0 = 1, w_1 = 0.375\)), steep hierarchy (\(N_0 = 10000, N_1 = 100\); \(w_0 = 1, w_1 = 0.25\)), and two-layered hierarchy (\(N_0 = 10000, N_1 = 3000, N_2 = 100\); \(w_0 = 1, w_1 = 0.375, w_2 = 0.25\)).

The Mocnik model follows the polynomial volume law. When the nodes are distributed uniformly at random in space, the edges introduced by the model reflect properties of space, e.g., the existence of proximity. As a consequence, the number of edges is expected to be linear in the number of nodes27, and the dimension of space has an impact on the configuration of the edges. In fact, the volume of the undirected network associated with this model follows the polynomial volume law (Fig. 3b), in which the exponent d reflects the dimension of space. The parameter ρ determines the density of the network, i.e., the ratio of the number of actual edges to the maximal number of edges in a simple network. Thereby, ρ also affects the configuration of edges. In a model with an infinite number of nodes, however, ρ does not influence the exponent d obtained by fitting the polynomial volume law. Even in medium-sized and large networks, the influence of ρ does not, in practice, mask the impact of the dimension28. The Mocnik model (a network embedded in space with only short-distance edges) can thus serve as an explanation of the polynomial volume law by local optimization.

In contrast to the local optimization principle that maximizes the number of adjacent nodes, global optimization principles often play a major role: assuming that a network may contain only a limited number of edges, how can the average distance between pairs of nodes be statistically minimized for the entire network? This principle refers not to individual nodes but to the entire network. If a network complies with this optimization principle, most shortest paths between two randomly chosen nodes are, in fact, very short, but single nodes may suffer from a longer distance to large parts of the network. Among the models that create such small-world networks are the Watts-Strogatz model2 and the Barabási-Albert model6.
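
The effect of global optimization can be illustrated with a deterministic toy example: a ring lattice with only local edges, and the same ring with two long-range shortcuts. This is not the Watts-Strogatz model itself, which rewires edges at random. The ring size, the shortcut positions and the function names are hypothetical choices made for illustration.

```python
from collections import deque


def mean_shortest_path(adj):
    """Average hop distance over all ordered pairs of mutually reachable nodes (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def ring_lattice(n):
    """Each node linked only to its two neighbours on a ring: purely local edges."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


n = 100
local = ring_lattice(n)
with_shortcuts = ring_lattice(n)
for a, b in [(0, 50), (25, 75)]:  # two long-range shortcuts (hypothetical choice)
    with_shortcuts[a].add(b)
    with_shortcuts[b].add(a)
print(mean_shortest_path(local), mean_shortest_path(with_shortcuts))
```

Adding just two of the 102 edges as shortcuts noticeably lowers the average distance. Nodes far from the shortcut endpoints benefit less, which matches the remark that single nodes may still be far from large parts of the network.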

Real-world networks are often organized by both local and global optimization principles. Local optimization principles naturally occur when the cost of an edge correlates positively with its length. This is the case for physical networks (road and railway networks, etc.) but also for many types of communication networks (postal delivery services, the telephone network, etc.). Global optimization, in contrast, often minimizes the average length of shortest paths by introducing edges between nodes that are distant in space. Such global principles naturally occur in networks that are, at least in large parts, of virtual nature, e.g., friendship networks in social media. Most networks, though, are guided by a combination of local and global optimization to achieve a balance between costs and the length of shortest paths in the network. In the remainder of the article, we explore the interaction between local and global optimization and discuss its effect on the polynomial volume law.

A model of hierarchical spatial networks

The Mocnik model27 is guided by a local optimization principle, as is apparent from Equation 2. In order to study the interaction between local and global optimization principles, we extend it in the following to a hierarchical Mocnik model. The hierarchical model includes a global optimization principle by introducing different layers in the network. This hierarchy is, as we show later, to some extent compatible with local optimization principles. If the layers of the hierarchy share nodes, i.e., if they are connected, shortest paths in the network become shorter in comparison to the non-hierarchical model. This is because shortest paths often traverse higher layers of the hierarchy, which are more efficient in bridging space.

Hierarchies and the principle of layered networks can be found in many transport networks. For instance, many road networks exhibit layers: motorways, primary, secondary and tertiary roads, residential roads, etc. Railway networks often consist of long-distance and local trains, the former usually having fewer stops and being much faster than the latter. The shortest route in a railway network thus often combines a local train to a larger station, then long-distance trains, and potentially another local train. The universal nature of this principle has been widely recognized, and important routing algorithms thus take advantage of hierarchies inside the data29.

The hierarchical Mocnik model applies the non-hierarchical model in every layer of the hierarchy. Assume nested node sets \({N}_{l}\subset {N}_{l-1}\subset \ldots \subset {N}_{0}\) embedded in space, each corresponding to one layer of the network. Then, for each layer consisting of the nodes \(N_i\), edges \(E_i\) are created in accordance with Equation 2, where the minimum is taken over the nodes of \(N_i\) only. The nodes \(N_i\) of a layer together with the corresponding edges \(E_i\) thus form a Mocnik model. The lowest layer \(N_0\) will in the following be referred to as the base layer of the network.
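
The layer-by-layer construction can be sketched by applying Equation 2 to each node subset separately and taking the union of the resulting edge sets. The text does not specify how the nodes of a higher layer are selected. This sketch assumes each higher layer is a uniformly random subset of the layer below, and all names are hypothetical.

```python
import math
import random


def mocnik_edges(points, ids, rho):
    """Directed Mocnik edges (Equation 2) restricted to the node subset `ids`:
    the minimum in Equation 2 is taken over that layer's nodes only."""
    edges = set()
    for i in ids:
        dists = [(math.dist(points[i], points[j]), j) for j in ids if j != i]
        nearest = min(d for d, _ in dists)
        edges.update((i, j) for d, j in dists if d <= rho * nearest)
    return edges


def hierarchical_mocnik(points, upper_sizes, rho, rng):
    """Nested layers: N_0 holds all nodes and each higher layer is a uniformly
    random subset of the layer below (a hypothetical choice).
    Returns the layers and a dict {layer index i: edge set E_i}."""
    layers = [list(range(len(points)))]
    for size in upper_sizes:
        layers.append(rng.sample(layers[-1], size))
    edges = {i: mocnik_edges(points, ids, rho) for i, ids in enumerate(layers)}
    return layers, edges


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(500)]
layers, layer_edges = hierarchical_mocnik(points, [50], 1.8, rng)  # |N_0| = 500, |N_1| = 50
network = set().union(*layer_edges.values())  # hierarchical network: union of all E_i
print([len(ids) for ids in layers], [len(e) for e in layer_edges.values()], len(network))
```

Because the layers share nodes, the long edges of \(E_1\) attach directly to the base layer. This is what lets shortest paths use the upper layer as a shortcut.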

Local and global optimization coexist in the hierarchical Mocnik model. The base layer is guided by local optimization in the same way as the non-hierarchical Mocnik model: in each neighbourhood, the constellation of edges is optimized for a high number of adjacent nodes. Beyond the non-hierarchical model, the layers of the hierarchical variant exhibit different degrees of local and global optimization. The fewer nodes a layer contains, i.e., the higher the layer in the hierarchy, the more global the optimization becomes. The optimization in a higher layer of the hierarchy involves only some nodes of the network while ignoring many others, which means that the optimization is no longer performed within spatial neighbourhoods.

A weighted variant of the hierarchical Mocnik model can also be introduced, in which the edges carry weights. The weight of an edge corresponds to its length in Euclidean space. Introducing a new layer to an existing Mocnik model potentially makes shortest paths shorter, even in the weighted model: while the nodes stay untouched, new edges are introduced in each layer of the network but none is removed. Accordingly, two nodes may be directly connected in some layer \(E_i\) while the shortest path between them in \(E_j\) with j < i is potentially longer. By the triangle inequality, no path between two nodes can be shorter than the straight edge connecting them.

The weights of the weighted variant of the hierarchical Mocnik model can further be adjusted systematically with respect to the hierarchy. The weights \(w_i\) for each layer of the hierarchy reflect that the layers differ in speed, communication costs, etc. In the weighted hierarchical Mocnik model, the weight of an edge in layer i is defined as the length of the edge in Euclidean space multiplied by \(w_i\). If all \(w_i\) equal 1, the weights are, accordingly, equal to the lengths of the edges. The resulting network usually consists of many edges with low weights in the base layer, and only some edges with slightly greater weights in higher layers (Fig. 3d). Such a weighted hierarchical model is very similar to transport networks, in which local transport connects adjacent places, and more distant places are connected by motorways or long-distance trains operating at a higher speed.
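
The weighting rule can be sketched as a small function that turns per-layer edge sets into weights for the associated undirected network. If the same node pair occurs in several layers, the sketch keeps the smallest weight. This is an assumption, since only the cheapest parallel edge affects shortest paths. The function name and the toy input are hypothetical.

```python
import math


def weighted_hierarchical_edges(points, layer_edges, layer_weights):
    """Undirected weights for the weighted hierarchical Mocnik model: an edge of
    layer i gets weight (Euclidean length) * w_i.  When the same node pair occurs
    in several layers, the smallest weight is kept (an assumption)."""
    weights = {}
    for i, edges in layer_edges.items():
        for a, b in edges:
            key = (min(a, b), max(a, b))  # associated undirected network
            w = math.dist(points[a], points[b]) * layer_weights[i]
            weights[key] = min(w, weights.get(key, math.inf))
    return weights


# Toy input (hypothetical): three collinear nodes, a base layer 0 and a faster layer 1.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
layer_edges = {0: {(0, 1), (1, 2)}, 1: {(0, 2), (1, 0)}}
print(weighted_hierarchical_edges(points, layer_edges, {0: 1.0, 1: 0.25}))
print(weighted_hierarchical_edges(points, layer_edges, {0: 1.0, 1: 1.0}))
```

With all \(w_i = 1\), the direct edge (0, 2) has weight 10, equal to the path through node 1, because the nodes are collinear. The triangle inequality holds with equality here. Setting \(w_1 < 1\) makes the upper layer a genuinely faster shortcut.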

Synthesis of local and global optimization principles

The hierarchical Mocnik model is characterized by local optimization in each layer and global optimization by hierarchies. With only one layer, the model is prototypically characterized by local optimization. With several layers, there are shortcuts, which exhibit characteristics of global optimization and lead to small-world properties in the network. Here, we examine the impact of coexisting local and global optimization principles on the polynomial volume law.

The volume of an unweighted hierarchical network is never smaller, and usually larger, than the volume in its base layer, because the additional edges of higher layers can only shorten shortest paths. In fact, in the example of Fig. 3e, the volume increases when a layer with 3000 nodes (flat hierarchy) or a layer with 100 nodes (steep hierarchy) is introduced on top of the base layer. For smaller radii, the volume is larger for flatter hierarchies. In a flatter hierarchy, more nodes are adjacent to the higher layer, and shortest paths between two nodes of the same small spatial neighbourhood more often traverse nodes of a higher layer. At larger radii, however, the increase in volume is larger for steeper hierarchies, because the shortcuts introduced by the hierarchy are more efficient. If several layers are added on top of each other, the increase in volume at smaller radii is governed by the lower layers of the hierarchy, and the increase at larger radii by the higher layers.
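
The claim that adding layers never shrinks a ball in the unweighted case can be checked on a toy example. The sketch uses a one-dimensional base layer (a path) and an upper layer that links every tenth node to the next one in that layer. These are hypothetical choices for illustration, not the two-dimensional models of Fig. 3e.

```python
from collections import deque


def ball_volumes(adj, centre, max_radius):
    """Hop-ball volumes |B(r)| for r = 0..max_radius (BFS)."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if dist[node] == max_radius:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return [sum(1 for h in dist.values() if h <= r) for r in range(max_radius + 1)]


n = 200
base = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
# Hypothetical upper layer: every 10th node, linked to the next node of that layer.
hier = {i: set(nbrs) for i, nbrs in base.items()}
for i in range(0, n - 10, 10):
    hier[i].add(i + 10)
    hier[i + 10].add(i)

v_base = ball_volumes(base, 100, 10)
v_hier = ball_volumes(hier, 100, 10)
print(v_base)
print(v_hier)
```

The hierarchical volumes dominate the base-layer volumes at every radius, and the gap widens with r as the upper layer carries paths further. This mirrors, in miniature, the growth at larger radii discussed above.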

While the influence of the hierarchies is obvious when the base layer of the network is known, such comparisons can hardly be drawn in general. Instead, we may ask how the hierarchies affect the measured volumes in comparison to the fit to the polynomial volume law. The difference between the fit and the actual data can be examined without any knowledge of the prevailing layers. In fact, the fit underestimates the volume and overestimates the exponent d, the dimension, in different ways. For a steep hierarchy with far fewer nodes in a higher layer than in the base layer, the fit underestimates the volume at smaller radii (Fig. 3e). In the unweighted model, this effect is independent of whether additional layers exist in the hierarchy, i.e., the number of nodes in the highest layer of the hierarchy has a major impact on the underestimation. At the same time, the rate of growth at larger radii is higher for a steep hierarchy, leading to higher estimates of the dimension. If the hierarchy is flatter, the estimated dimension is lower than for a steep hierarchy but higher than for the base layer alone.

The underestimation of the volume and overestimation of the dimension can also be observed for the weighted hierarchical Mocnik model (Fig. 3f). The effect is, though, less pronounced because the lengths of the edges are taken into account, and higher layers provide less effective shortcuts than in the unweighted model. In the weighted hierarchical Mocnik model, the presence of a layer with more nodes can even mask the effect of a layer with far fewer nodes. Both kinds of hierarchical Mocnik models follow a polynomial volume law, despite being layered networks with several hierarchies. This suggests that the polynomial volume law is robust and not necessarily masked by other structures inside the network.