In this section we present the results of our visual and algorithmic analyses. In the first part, we examine the extent to which a visual approach to the ACS commuting and workplace data can help us identify natural “communities” or “regions” of interaction within which journeys to work take place. These provide a visual approximation of labor market areas of the kind discussed in the academic literature for decades [20, 35–38]. Rather than attempt this at the scale of the whole United States, we focus our visual heuristic on California and the Minneapolis-St. Paul metro area. The former provides a good example of an apparently polycentric urban network, with multiple centers and a more complex commuting structure, as described previously by Cervero and Wu [39,40]. The latter provides a good example of an apparently monocentric commuting structure, dominated by a core employment zone in the center, of the kind studied in the past by, inter alia, Bogart and Ferry [41] and Richardson [42]. Both cases can help us understand more about the underlying economic geography of major metropolitan areas of the United States.

The second part of this section presents the results of our algorithmic approach to assigning census tracts to ‘natural’ communities based on their relational position within the network data set. We show the promising results of this algorithmic evaluation and the success of the Combo software package in discriminating at a fine scale between the edges of megaregions. We also show some of the difficulties and data outliers produced by such an approach, and suggest some qualifications which must be attached to purely statistical analyses.

The simple question we first hope to answer by taking a visual heuristic approach to commute data is whether it is possible to divide geographic space by taking an iterative approach to filtering and visualizing the ACS commuting and workplace dataset. Fig 2 represents a first step towards this objective. We present all journey to work flows of 50 miles (80.5 km) or less which begin or end in California. Immediately, we can see what appear to be a number of separate functional economic zones. For example, we can see a large interconnected urban region spanning from San Luis Obispo on the California coast, extending through Los Angeles and San Diego in the south and Palm Springs in the east. We can also observe another large commuter region which takes in the metropolitan areas of San Francisco, Sacramento, San Jose and Monterey. Additionally, we observe smaller, separate interconnected commuter regions including Eureka and Redding in the north, Fresno and Bakersfield in the Central Valley and El Centro in the south. These patterns do, of course, map onto underlying patterns of population density but they also provide valuable additional information in relation to the connectivity of these areas with each other.

This initial plotting of commutes is quite useful in that it provides a simple visual depiction of economic linkages and we can begin to understand the spatial structure of commuting in California. If we refine this representation still further, as in Fig 3 where we focus on the San Francisco Bay Area, a more detailed representation of a polycentric urban region emerges. In this representation we display longer, lower volume commutes in darker shades of red and shorter journeys in lighter shades of orange, in order to help the viewer identify the main employment centers. In Fig 3 they include San Francisco, Oakland and Sacramento, but also Stockton, Modesto and Santa Rosa. Nonetheless, it is difficult to determine from this view the extent to which these links are statistically significant and whether this nexus of economic activity constitutes a single functional zone in and of itself. We deal with this issue in the next section of the paper through the algorithmic method.

Before doing so, we present the case of Minneapolis-St. Paul, which presents a rather different case from that of California and the Bay Area. In Fig 4 we have plotted tract to tract commutes centered on Minneapolis-St. Paul, which appears to form a major monocentric employment zone in Minnesota, also extending into western Wisconsin. More distant satellite cities such as St. Cloud to the northwest and Rochester to the southeast are less strongly connected to this dominant urban employment destination, but it is difficult to know from a visual inspection alone the extent to which they are functionally separate. This is the point: cognitively, the viewer makes assumptions about the modularity of a network based on visual representations like the commute maps shown here, but these assumptions are imprecise and somewhat subjective. The mapping of flows alone can only take us so far if we are interested in knowing more about the underlying network structure of the data.

Having said this, a visual approach provides a very useful method in and of itself because it allows us to interpret data spatially and quickly identify anomalous connections and possible sources of error (cf. Anselin [43]). The next section of the paper now focuses on algorithmic community partitioning and allows us to gauge whether the processes of visual cognition (which we can think of here as a ‘manual modularity’ approach) initiated by the flow maps above are reflected in the results of the algorithmic approach. It would appear that we can identify natural economic communities or regions from a visual inspection alone, but for real-world applications such as regional transit planning, where statistical accuracy is required, this is not sufficient.
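
The 50-mile filter used for Fig 2 can be illustrated with a short sketch. This is not the authors' code: the record layout, the field names `o` and `d`, and the use of great-circle (haversine) distance between approximate city-center coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
KM_PER_MILE = 1.609344

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def filter_flows(flows, max_miles=50.0):
    """Keep only flows whose origin-destination distance is max_miles or less."""
    max_km = max_miles * KM_PER_MILE
    return [f for f in flows if haversine_km(*f["o"], *f["d"]) <= max_km]

# Hypothetical flows between approximate city-center coordinates (lat, lon).
flows = [
    {"name": "San Francisco-Oakland", "o": (37.7749, -122.4194), "d": (37.8044, -122.2712)},
    {"name": "San Francisco-Sacramento", "o": (37.7749, -122.4194), "d": (38.5816, -121.4944)},
]
kept = filter_flows(flows)  # only the short San Francisco-Oakland flow survives
```

In practice the distance would be measured between tract locations rather than city centers, and the threshold can be varied to see how the apparent regions change.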

Algorithmic community partitioning

In their algorithmic partitioning of data from telephone calls, Ratti et al. [44] and Sobolevsky et al. [45] found promising results which exhibited a high degree of geographic contiguity. Kallus et al. report similar results using social media interactions [46]. We sought to test whether similarly robust conclusions could be drawn from applying these same partitioning algorithms to commutes, which, unlike telephone calls and social media interactions, are far more closely bound to the physical structure of existing places.

In a trial run of the ACS data set limited to commutes both originating and concluding within the state of Massachusetts, Nelson produced a Combo-generated partition of that state into nine communities [47]. These nine communities were all geographically contiguous, and, moreover, matched closely with both lay interpretations and existing administrative divisions of that state’s regions. This initial test lent credence to the theory that commuter patterns would exhibit an algorithmically-legible grouping into “natural” communities centered on major economic/employment hubs.

In our first run of the national data, Combo produced a partitioning with a modularity score greater than 0.9, and we found a large number of strong communities centered on major cities. However, the initial output communities also exhibited a considerable amount of “noise” when evaluated visually. Certain census tracts were assigned to communities that displayed little or no geographic coherence and were confusingly scattered across the entire United States.

We proceeded through several steps to achieve a more accurate partitioning—“accuracy,” in this case, determined according to an interpretive standard of where geographic clusterings “should” be. First, we corrected a data error wherein the FIPS codes for tracts were being mishandled due to the loss of leading zeroes. Second, we stripped all commutes with origins or destinations in Alaska, Hawaii, and Puerto Rico, under the logic that these areas are not functionally integrated into the mainland United States through commuter behaviors. Third, we stripped the data of all “same-origin” commutes; that is, commutes whose origins and destinations lie in the same census tract. Fourth, we stripped the data of “orphan” nodes, that is, census tracts which are not the origin or destination of any commutes in the ACS data set. Fifth, we experimented with different tolerances of maximum commute length, in order to eliminate “ultra-commutes,” like those which stretch across the entire continent, from the data set. Such commutes may reveal significant economic ties between places (such as New York and Los Angeles) but for the purposes of identifying geographically coherent megaregions they must be excluded. Sixth, we experimented with different ways of assigning connection weight based on the Census’s variables of commute volume and margin of error. Seventh, we experimented with limiting Combo to a maximum number of total output communities.
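
The first four cleaning steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tuple layout `(origin, destination, commuters)` and the function names are hypothetical. The sketch relies on two facts about census identifiers: tract GEOIDs are 11 digits long (2 state, 3 county, 6 tract), and the state FIPS codes for Alaska, Hawaii, and Puerto Rico are 02, 15, and 72.

```python
# State FIPS prefixes excluded in the second step (Alaska, Hawaii, Puerto Rico).
EXCLUDED_STATES = {"02", "15", "72"}

def normalize_fips(code):
    """Step 1: restore leading zeroes lost when an 11-digit tract GEOID is read as a number."""
    return str(code).zfill(11)

def clean_flows(raw_flows):
    """Apply steps 1-3 to an iterable of (origin, destination, commuters) tuples."""
    cleaned = []
    for origin, dest, commuters in raw_flows:
        o, d = normalize_fips(origin), normalize_fips(dest)
        if o[:2] in EXCLUDED_STATES or d[:2] in EXCLUDED_STATES:
            continue  # step 2: drop flows touching AK, HI, or PR
        if o == d:
            continue  # step 3: drop "same-origin" commutes
        cleaned.append((o, d, commuters))
    return cleaned

def drop_orphans(tracts, flows):
    """Step 4: keep only tracts that are the origin or destination of some flow."""
    linked = {t for o, d, _ in flows for t in (o, d)}
    return [normalize_fips(t) for t in tracts if normalize_fips(t) in linked]

# Hypothetical records; integer IDs mimic the lost-leading-zero error.
raw = [
    (6075010100, 6001400100, 120),      # California to California
    (6075010100, 6075010100, 300),      # same tract
    (2020000100, 6075010100, 5),        # Alaska origin
    ("25025000100", "15003000100", 3),  # Hawaii destination
]
cleaned = clean_flows(raw)
kept_tracts = drop_orphans([6075010100, 6001400100, 6075010200], cleaned)
```

Steps five to seven (distance tolerance, weighting, and the cap on output communities) depend on choices discussed later in the text and are not shown here.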

Each time we modified these input parameters, we compared the output results with the visual heuristic method of regionalization. Our goal was to minimize the number of output regions which exhibited spatial incoherence. In general, Combo produces the most successful partitioning in areas where nodes are well-linked in the data set to many other nodes, as in major metropolitan regions. Difficulties arise with nodes which are weakly connected to each other or which, due to small populations, have only a small number of commuters traveling to locations which are weakly centralized on major employment hubs. Because these weakly-linked nodes could be assigned to almost any community with little effect on the achieved modularity score, they often caused trouble with geographic coherence, as the algorithm assigned them to far-flung communities which made little sense from an interpretive standpoint. By iterating through various stages of parameterization with the input data, we sought to minimize the scattering effect of these weakly-linked nodes. For a more detailed explanation of the impact of parameter modification on network modularity, Sobolevsky et al. provide a useful additional point of reference [14]. In general, however, reducing the distance parameter to a level which matched realistic commute distances improved the results.

After several iterations, we found that the most successful national-level partitioning was produced by a data set which stripped all commutes with a Euclidean distance ≥ 262 kilometers, assigned a connection weight w where w = (estimated commutes)/(margin of error), and limited Combo to 50 output communities. The modularity score of this partitioning was 0.948469: extremely high relative to the expected range of such scores reported by Newman and Girvan [32]. Fig 5 shows the results of this computation, with census tracts in the contiguous United States color-coded according to their assigned community.
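
To make the weighting and the modularity score concrete, the following sketch computes w = (estimated commutes)/(margin of error) and the standard Newman-Girvan modularity for a small weighted, undirected toy network. It is an illustration only: the function names and toy data are hypothetical, the margin of error is assumed to be positive, and the objective Combo optimizes internally (including its handling of directed flows) may differ from this textbook form.

```python
from collections import defaultdict

def connection_weight(estimate, margin_of_error):
    """w = (estimated commutes) / (margin of error); assumes margin_of_error > 0."""
    return estimate / margin_of_error

def modularity(edges, community):
    """Newman-Girvan modularity of a weighted, undirected edge list.

    edges: list of (u, v, w) with each pair listed once.
    community: dict mapping node -> community label.
    """
    total = sum(w for _, _, w in edges)
    internal = defaultdict(float)  # edge weight falling inside each community
    strength = defaultdict(float)  # summed node strengths per community
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (strength[c] / (2 * total)) ** 2 for c in strength)

# Toy network: two triangles joined by one bridge; every link has
# a hypothetical estimate of 30 commuters and a margin of error of 15.
w = connection_weight(30, 15)
edges = [
    ("a", "b", w), ("b", "c", w), ("a", "c", w),  # first triangle
    ("d", "e", w), ("e", "f", w), ("d", "f", w),  # second triangle
    ("c", "d", w),                                # bridge
]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(edges, community)  # equals 5/14 for this partition
```

Dividing by the margin of error gives less influence to flows whose estimates are statistically uncertain, which is one plausible reading of why this weighting reduced noise in the output.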

As is evident in this visualization, Combo was able to divide the contiguous United States into geographically-contiguous regions which are interpretively recognizable as 'megaregions' with major cities at their centers: for example, Greater Chicago (blue in Fig 5), Washington, D.C.-Baltimore (forest green), Greater Miami (sky blue), Dallas-Fort Worth (teal), or Seattle (goldenrod). This offers strong evidence that commuter patterns really do divide functionally in space according to the clustering of regional labor markets, and that the structure of 'megaregions' can be detected algorithmically.

Importantly, several of these algorithmically-detected megaregions also show spatial divisions which are not immediately evident in visual interpretation. Consider, for example, the detected community in the state of Connecticut (purple). Southwestern Connecticut is strongly linked to the New York City commuter region, and most visual heuristic regionalizations would merge this area into Greater New York. Yet Combo assigned an almost perfect break at the New York-Connecticut state border, creating a discrete Connecticut region which encompasses the state of Connecticut together with the Connecticut River Valley corridor running through western Massachusetts, from the city of Springfield to the Vermont border. Since Combo does not know “where” these nodes are in space, and does not know which state each node belongs to, the emergence of a community border which almost perfectly follows the real jurisdictional border between Connecticut and New York is highly suggestive, indicating perhaps that commuting decisions are being modified by factors that have to do with crossing this state border. To be clear, there are still many commutes crossing between Connecticut and New York; what the algorithm finds, however, is that there is a stronger internal than external matrix of connections on either side of this edge. A similar pattern is evident along the Delaware River between New Jersey and Pennsylvania, where the New York City region gives way almost perfectly to the Philadelphia region.

There are many similarly interesting conclusions to be drawn from this algorithmic partitioning scheme, conclusions which would not necessarily be legible from an interpretive visual heuristic method. Just a few other examples include: the merger of most of Iowa together with a corridor stretching through western Illinois to Springfield (grass green); the absorption of Toledo into the Michigan community (red); the merger of the Columbus and Cincinnati metropolitan areas (slate gray); the merger of Florida’s panhandle into the Alabama commuting region (salmon); the merging of the Little Rock and Memphis commuter areas (clay brown); and the absorption of Sacramento into the California Bay Area (navy blue).

Instead of mapping only nodes coded according to their assigned community, we can get a better sense of the complexity of the community assignments by mapping connections. However, this raises the question of whether commutes should be classified according to the community assignment of their origin node or their destination node. Although the majority of commutes occur within an assigned community, some commutes stretch from one assigned community to another. Thus commutes may have one community assignment, if their origin and destination points lie within the same algorithmically-assigned community, or two assignments, if they cross between communities. On the assumption that community structure is stronger at a central point, which in commuting terms is the job end of the route, we color-coded flows according to the assigned community of their destination node. Fig 6 shows this at the national scale.
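
The destination-based coloring rule can be expressed as a simple labeling step. The sketch below is illustrative only; the field names, the toy tract IDs, and the community labels are hypothetical.

```python
def label_flows(flows, community):
    """Tag each flow with the community used for coloring and flag boundary crossings.

    flows: list of (origin, destination, commuters).
    community: dict mapping tract -> assigned community.
    """
    labeled = []
    for origin, dest, commuters in flows:
        labeled.append({
            "origin": origin,
            "dest": dest,
            "commuters": commuters,
            "color_by": community[dest],  # destination (job end) decides the color
            "crosses": community[origin] != community[dest],
        })
    return labeled

# Hypothetical tracts assigned to two detected communities.
community = {"t1": "Los Angeles", "t2": "Los Angeles", "t3": "San Diego"}
flows = [("t1", "t2", 10), ("t1", "t3", 4), ("t3", "t2", 2)]
labeled = label_flows(flows, community)
```

Coloring by origin instead would only require swapping `community[dest]` for `community[origin]`, which makes it easy to compare the two choices on the same map.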

By coloring connections according to the assigned community of their destination node, we can see cases where neighboring communities are strongly interlinked, and also cases where communities are fairly autarchic in terms of their commuting patterns. Fig 7 shows the relative density of interconnections between the Los Angeles and San Diego detected regions (inter-community connections to mid-coast California and Las Vegas are also evident). This can be compared to Fig 8, which shows how the detected community in the Minneapolis-St. Paul region is more self-contained in terms of its commuter flows, with relatively few commutes stretching to or from neighboring communities.

The interweaving colors evident in such maps show just how difficult it is to discover a perfect natural break within the pattern of commuter geography. The high modularity score of Combo’s output shows that the algorithm has produced a partitioning scheme in which the vast majority of commutes are contained within a single community. However, this still leaves thousands of commutes which cross communities. Fig 9 shows every commute in the Lower 48 states where the assigned community of the origin and the assigned community of the destination are different. This gives a sense of just how incorrect it is to call these partitioned communities truly ‘independent’ or autarchic in terms of their economic geography. For instance, the northeastern seaboard, the Great Lakes, and California are heavily interlinked by commutes which stretch across regions. A large number of east-to-west flows connect between the Miami and Central Florida regions. By contrast, not a single commute to New Orleans originates from outside of the Combo-assigned New Orleans community; the Twin Cities, similarly, pulls relatively few commuters from outside its own assigned region.
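
One way to quantify how self-contained a detected community is would be to compute, for each community, the share of inbound commuters who originate outside it. The sketch below is a hypothetical illustration of that calculation; the toy tracts, labels, and counts are invented and do not reproduce the paper's data.

```python
from collections import defaultdict

def inbound_external_share(flows, community):
    """Share of each community's inbound commuters whose origin lies in another community."""
    inbound = defaultdict(float)
    external = defaultdict(float)
    for origin, dest, commuters in flows:
        target = community[dest]
        inbound[target] += commuters
        if community[origin] != target:
            external[target] += commuters
    return {c: external[c] / inbound[c] for c in inbound}

# Hypothetical tracts and flows.
community = {"n1": "New Orleans", "n2": "New Orleans",
             "m1": "Twin Cities", "m2": "Twin Cities", "x": "Other"}
flows = [
    ("n1", "n2", 50),  # internal to New Orleans
    ("m1", "m2", 90),  # internal to the Twin Cities
    ("x", "m2", 10),   # crosses into the Twin Cities
    ("n2", "x", 5),    # crosses out of New Orleans
]
share = inbound_external_share(flows, community)
```

A share of zero corresponds to the situation described for New Orleans, where no inbound commute originates outside the assigned community.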

Although this algorithmic partitioning of the contiguous United States is highly satisfactory, it is not perfect. Several cases remain where nodes have been assigned to confusing, non-contiguous communities, in some cases stretching haphazardly across the entire United States. Such artifacts are especially common in less-dense areas of the country where the network structure of commutes is far weaker than in urban megaregions, and nodes are consequently less well-integrated into functional clusters. Consider, for example, the community which Combo has assigned in eastern Kentucky (jungle green). An examination of the structure of the network data (Fig 10) shows that this area really is coherent and independent in a certain sense, for it has very weak commuter relations with neighboring communities, and a reasonably strong internal structure of commuter relations. However, does it deserve its “own” unitary region? The algorithm believes that it does, whereas an interpretive method might well have included this area together with the rest of Kentucky (pale brown) or the Columbus-Cincinnati-West Virginia region (gray-blue).

In such cases, it becomes clear that the dream of a regionalization based purely on statistical analysis is unviable; any division of space into unit areas will have to take into account a “common sense” interpretation of the validity and cohesion of the regions resulting from an algorithmic approach. For this reason, the visual heuristic method coupled with the algorithmic method offers a good combination of human interpretation and statistical precision. The algorithm is able to detect subtle boundary definitions and evaluate edges where the human eye would struggle to draw a clear line. However, the visual method has an advantage in matching coherent regions to an interpretive understanding of regions conjoined by cultural, political, or other similarities which are not captured in the data structure of the commute patterns. Fig 11 shows the result of a combined computational-visual approach. To produce this map of U.S. megaregions, we began by tracing convex hulls around communities as assigned by the partitioning algorithm. We then overlaid these shapes onto the flow map and interpretively cleaned up boundary lines, eliminating outliers and emphasizing geographic contiguity. In some places, like the High Plains, the relatively limited level of commuter activity meant that coherent communities could not be constructed. However, the result offers what we consider to be a compelling new regionalization of the United States: one which is grounded in empirical analysis but clarified using interpretive cartographic methods.
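
The convex hull step can be sketched with Andrew's monotone chain algorithm. This is an assumption-laden illustration rather than the authors' workflow, which may well have relied on GIS software: it treats tract locations as planar (x, y) points, and the toy coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain because it repeats the start of the other.
    return lower[:-1] + upper[:-1]

# Hypothetical projected tract centroids grouped by assigned community.
centroids = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)],
    "B": [(5, 5), (6, 5), (5, 6)],
}
hulls = {label: convex_hull(pts) for label, pts in centroids.items()}
```

Because a convex hull encloses every member point, a single misassigned outlier can stretch the shape a long way, which is one reason the subsequent interpretive cleanup of boundaries matters.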