General patterns in labor flow

Workers tend to change their jobs between geographically close firms with similar skill requirements20,21,22,23. This tendency leads to knowledge spillover and innovation, serving as a prominent feedback mechanism in the formation of geo-industrial clusters9,24,25,26,27. As geo-industrial clusters form, they also affect labor flow by attracting workers with pertinent skills, creating a strong local concentration of skills and knowledge. This feedback, in which geo-industrial clusters and workers influence each other, produces concentrated job movements, which in turn can be leveraged to identify clusters as network communities, that is, groups of cohesively interconnected nodes on a network28,29. In a labor flow network, clusters of firms would manifest as network communities tied together by concentrated labor flow (see Fig. 1).

From our data, relevant geo-industrial clusters can easily be found across domains, from technology firms of distinct flavors and ages (Fig. 1b) to clusters of travel and hospitality industries (Fig. 1c), which are concentrated with respect to both specialization (e.g., airlines, promotional credit cards, food service, or cruise lines) and geography. The hierarchical structure of these geo-industrial clusters is evident in the makeup of the non-US airline geo-industrial cluster, which is itself composed of smaller sub-modules serving geographically distinct markets such as Europe and the Middle East.

The concentration based on industrial and geographic proximity can be observed separately through an industry-wise and a region-wise transition matrix. We calculate two normalized transition matrices, one between industries and one between US states (Fig. 1d, e; see Methods and Supplementary Note 1 for details). Industries are split into two large clusters, which roughly correspond to production (upper left) and public and consumer services (bottom right). In the context of the three-sector theory30,31, or rather a more recent four-sector framework32, the upper-left cluster is organized around the primary, secondary, and some of the tertiary sector (infrastructure and business support), whereas the bottom-right cluster consists of industries mostly in the quaternary sector, including higher education, government, law, healthcare, leisure, and media (see Supplementary Figure 1 for all original industry labels). Although finance and information technology are often classified into the quaternary sector, here they are clustered with production and manufacturing, highlighting their strong connection to engineering and production. Retail, on the other hand, is clustered more closely with quaternary services than with other tertiary services.

The abundance of off-diagonal interactions emphasizes the complex interconnected nature of the economy. For instance, the law and government sectors are more likely to generate a cluster with military, trade, and environment sectors than other sectors of the economy, although such connections cross the boundary of the two largest industry clusters. Curiously, the leisure industry is one of the most widely connected, exhibiting strong connections to many other sectors, including healthcare, education, art, media, and manufacturing. The labor flow network also displays strong geographical clustering, as shown in Fig. 1e.
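One way to picture the normalized transition matrices of Fig. 1d, e is as observed-over-expected ratios. The sketch below is illustrative only: the function name, the toy data, and the choice of normalization (dividing each observed count by the count expected if origin and destination were independent) are assumptions, and the normalization described in Methods and Supplementary Note 1 may differ.

```python
from collections import Counter

def normalized_transition_matrix(transitions):
    """Observed-over-expected ratio for every (origin, destination) pair.

    `transitions` holds one (origin, destination) label pair per job change.
    ASSUMPTION: the expected count treats origin and destination as
    independent; this is not necessarily the normalization used in the paper.
    """
    transitions = list(transitions)
    total = len(transitions)
    pair_counts = Counter(transitions)
    out_counts = Counter(src for src, _ in transitions)
    in_counts = Counter(dst for _, dst in transitions)
    labels = sorted(set(out_counts) | set(in_counts))
    matrix = {}
    for src in labels:
        for dst in labels:
            expected = out_counts[src] * in_counts[dst] / total
            observed = pair_counts[(src, dst)]
            matrix[(src, dst)] = observed / expected if expected > 0 else 0.0
    return labels, matrix

# Hypothetical toy data: most moves stay within the same industry.
moves = ([("Software", "Software")] * 6 + [("Retail", "Retail")] * 2
         + [("Software", "Retail")] + [("Retail", "Software")])
labels, ratio = normalized_transition_matrix(moves)
for src in labels:
    print(src, [round(ratio[(src, dst)], 2) for dst in labels])
```

Under this normalization, values above 1 mark pairs with more labor flow than expected by chance, so diagonal blocks of large values correspond to clustering and large off-diagonal values to cross-sector ties.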

Industry versus geography

The clear presence of clustering with respect to both industry and geography prompts the following questions: which factor is more important in determining the structure of geo-industrial clusters, industry or geography? How do these factors shape the hierarchical structure of these clusters? If the composition of a geo-industrial cluster is heavily constrained by industrial or geographical proximity, we expect to see clusters form around an industry or a location, respectively. Therefore, measuring cluster homogeneity in terms of industry and region allows us both to evaluate the validity of the clustering and to estimate the strength of each constraint.

We quantify the homogeneity of network communities by calculating the Shannon entropy of cluster feature vectors that document the fraction of people in the geo-industrial cluster who belong to each industry or region (see Methods). We quantify the relative importance of industry and geography by calculating the ratio between the number of geo-industrial clusters at each level with a greater reduction in industrial entropy and those with a greater reduction in geographical entropy. Our measurement in Fig. 2b, c shows that industry tends to play a more important role than geography in constraining labor flow and that its influence is strongest in the middle of the hierarchy (see Methods, Supplementary Figure 2, and Supplementary Note 2 for more details). In other words, network communities tend to be broken down into smaller communities mainly based on industrial categories. As shown in Fig. 2d, the average entropy reduction is larger than expected by chance throughout the hierarchy, indicating that the identified clusters are cohesive and meaningful. How, then, are they organized within the global network?
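The entropy-based homogeneity measure can be sketched as follows. This is a minimal illustration rather than the procedure in Methods: the function names and toy labels are hypothetical, entropy is taken in bits, and the reduction is expressed relative to the parent community, which is an assumption.

```python
import math

def shannon_entropy(fractions):
    """Shannon entropy (in bits) of a feature vector of fractions summing to 1."""
    return -sum(p * math.log2(p) for p in fractions if p > 0)

def feature_vector(labels):
    """Fraction of people carrying each label (industry or region)."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return [c / len(labels) for c in counts.values()]

def entropy_reduction(parent_labels, child_labels):
    """Relative drop in entropy from a parent community to one child.

    ASSUMPTION: the drop is normalized by the parent's entropy.
    """
    h_parent = shannon_entropy(feature_vector(parent_labels))
    h_child = shannon_entropy(feature_vector(child_labels))
    return (h_parent - h_child) / h_parent if h_parent > 0 else 0.0

def share_industry_dominated(reductions):
    """Share of communities whose industry entropy drops more than their region entropy.

    `reductions` is a list of (industry_reduction, region_reduction) pairs.
    """
    wins = sum(1 for d_ind, d_geo in reductions if d_ind > d_geo)
    return wins / len(reductions)

# Hypothetical parent community of four people and one child holding the first two.
parent_industry = ["Banking", "Banking", "Retail", "Retail"]
parent_region = ["New York", "California", "New York", "California"]
child_industry, child_region = parent_industry[:2], parent_region[:2]

d_ind = entropy_reduction(parent_industry, child_industry)
d_geo = entropy_reduction(parent_region, child_region)
print(d_ind, d_geo, share_industry_dominated([(d_ind, d_geo)]))
```

In this toy case the child community is uniform in industry but still mixed in region, so the split counts as industry-dominated.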

Fig. 2 The impact of geography and industry across scales. The top-level communities of this network found through the Louvain method (see Methods) have a modularity of 0.47. a We recursively apply network community detection to discover the labor flow network’s hierarchical structure. See Methods for more details. b Both industry and geography affect job transitions across all scales, but industry has a more important role in the middle of the hierarchy, as seen in the proportion of communities with a greater reduction in industry entropy (\(\rho_{\mathrm{ind}}\)). c The average reduction of metadata entropy \(\bar{\boldsymbol{d}}\) (see Methods for the definition) at each level of the hierarchical community structure, calculated with respect to the whole network. The monotonic increase indicates that smaller communities are more homogeneous, as expected. d This entropy reduction is greater than expected under a null model. The difference between the observed entropy reduction and the reduction in a randomized hierarchical null model is denoted as Δ. Positive Δ indicates that the homogeneity of clusters is stronger than expected.
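For readers unfamiliar with the modularity score quoted in the caption, the sketch below computes Newman's modularity for a given partition of a small graph. It only scores a partition; it does not perform the Louvain search itself. Treating labor flows as undirected weighted edges is an assumption made for illustration, and the function name and toy graph are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community_of: dict node -> community id.
    ASSUMPTION: flows are symmetrized into undirected edge weights.
    """
    total_weight = 0.0
    internal = {}  # community -> weight of edges with both ends inside it
    degree = {}    # community -> summed weighted degree of its nodes
    for u, v, w in edges:
        total_weight += w
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / total_weight - (d / (2 * total_weight)) ** 2
        for c, d in degree.items()
    )

# Hypothetical toy network: two triangles of firms joined by a single flow.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

Splitting the toy network into its two triangles gives a positive score, whereas lumping all firms into one community gives zero, which is the sense in which a higher modularity indicates more concentrated within-community flow.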

Hierarchical structure of the labor flow network

We visualize the network of geo-industrial clusters in Fig. 3a (see Methods for details), where each circle represents a geo-industrial cluster, colored based on the highest-level community membership. We label each highest-level cluster based on the dominant industry or geographical region (see Methods). The map exhibits both industry- and geography-dominated clusters. Cultural and regional economic blocs, such as Northern Europe, stand out, whereas industrial clustering is also evident. For instance, engineering and machinery are associated with automotive clusters, and food production and chemicals are associated with pharmaceuticals and medical devices. The map also reveals geographical specializations. Firms located in the Midwest of the United States closely interact with retail and consumer goods industries worldwide, whereas India-based clusters are strongly associated with information technology.

Fig. 3 Example of hierarchical structure. a The large-scale organization of geo-industrial clusters in the labor flow network. Each circle represents a geo-industrial cluster, with size proportional to its number of employees. The colors represent the highest-level community membership. b–e Two examples of hierarchical sub-structures in the labor flow network are illustrated. Each circle represents a firm and the bar charts show the reduction in industry and region entropy within the cluster as a proportion of the parent cluster’s entropy. b, c The organization of banking and financial geo-industrial clusters is affected more by industry than by geography. d, e The geo-industrial clusters in the US Midwest and South regions form strong geographical clusters.

Zooming into lower levels of the geo-industrial hierarchy reveals more intricate structures (see Fig. 3b–e). Two high-level clusters are shown: one focused on banking and financial services in the US, and the other spanning higher education, healthcare, and retail industries in the US. The banking and financial cluster is broken into more specific industries, such as investment banking and real estate (Fig. 3b). The entropy reduction measure confirms that this hierarchical structure is dominated by industrial categories rather than geographical clustering. On the other hand, the Higher Education, Health Care, and Retail cluster is mostly divided along regional lines. These examples depict the structure of the labor flow network as a complex tapestry of industry and geography.

Association with economic performance

If geo-industrial clusters can effectively capture both industrial and geographical proximity, can they serve as a useful framework to study the effects of strategic advantage on economic performance? The competition for highly desirable jobs implies that well-educated individuals who are equipped with strong skill sets would be attracted to the sectors and regions that can pay premium wages, or to rapidly growing ones that may do so in the future. Furthermore, the industries and regions that attract well-educated people are more likely to benefit from accumulated human capital and spillover effects33,34,35,36,37,38,39,40. Motivated by these insights as well as other studies on the effect of labor market integration and knowledge spillover within geo-industrial clusters12,13,14, we examine the labor flow of college-degree workers across regions, industries, and geo-industrial clusters.

We test how well the influx of educated labor correlates with financial performance when aggregated into different units of analysis. Focusing on the firms in the S&P 500 Index and a time window between 2010 and 2014, we compare their market capitalization growth (measured by the linear temporal trend of log-scaled market capitalization) to the labor flux growth (measured by the linear temporal trend of the log ratio of college-degree labor influx to outflux, aggregated in each grouping; see Fig. 4 and Methods).
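The comparison described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: trends are ordinary least-squares slopes on natural logarithms, the association is summarized by a Pearson correlation, and all names and numbers are hypothetical rather than taken from the data or from Methods.

```python
import math

def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def labor_flux_trend(years, influx, outflux):
    """Linear time trend of log(influx / outflux) for one grouping."""
    return ols_slope(years, [math.log(i / o) for i, o in zip(influx, outflux)])

def market_cap_trend(years, market_caps):
    """Linear time trend of log-scaled market capitalization for one grouping."""
    return ols_slope(years, [math.log(c) for c in market_caps])

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

years = [2010, 2011, 2012, 2013, 2014]
# Hypothetical groupings: (yearly influx, yearly outflux, yearly market cap).
groups = {
    "cluster_a": ([100, 120, 150, 190, 240], [100] * 5, [10, 12, 15, 19, 24]),
    "cluster_b": ([100, 100, 100, 100, 100], [100] * 5, [10, 10, 10, 10, 10]),
    "cluster_c": ([100, 90, 80, 70, 60], [100] * 5, [10, 9, 8, 7, 6]),
}
flux_trends = [labor_flux_trend(years, i, o) for i, o, _ in groups.values()]
cap_trends = [market_cap_trend(years, c) for _, _, c in groups.values()]
correlation = pearson(flux_trends, cap_trends)
print(flux_trends, cap_trends, correlation)
```

Changing how firms are assigned to `groups` (by firm, region, industry, or geo-industrial cluster) is what distinguishes the four units of analysis; the slope and correlation steps stay the same.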

Fig. 4 The influx of educated labor force is linked to the growth of geo-industrial clusters. The horizontal axis represents the 5-year trend in college-degree labor flux from 2010 to 2014. Similarly, the vertical axis represents the 5-year trend in log-scaled market capitalization within the cluster over time. a The trends for individual firms. b The trends for geographical regions. c The trends for industries, and d the trends for geo-industrial clusters, which display the strongest relationship.

Overall, we see a positive relationship between the acceleration of college-degree employment growth and market capitalization growth, although the strength of the relationship depends on the aggregation used (see Fig. 4). At the level of individual firms, the data are too noisy to establish any clear patterns (Fig. 4a). Geographical aggregation similarly shows little association between labor growth and market capitalization growth, suggesting that location-based grouping is also not a good approach, probably because each location hosts a multitude of disparate industries. Although the industry-level aggregation in Fig. 4c shows a stronger relationship, the strongest correlation is found in the geo-industrial cluster-based aggregation (see Fig. 4d). These results hold for more complex Bayesian models and are robust to the selection of time window and to the inclusion or exclusion of first-job influx and last-job outflux (see Supplementary Figures 3–7, with Supplementary Note 3). The stronger association between the influx of educated labor and economic growth at the geo-industrial cluster level, in comparison with traditional industry- or region-based aggregation, suggests that firms that share labor also share economic growth or decline. This is perhaps driven by shared competitive advantages due to labor market integration and knowledge spillover effects1,2,12,13,14.

Emerging geo-industrial clusters

We see that the influx of educated workers to a geo-industrial cluster is a meaningful signal of growth, so we can ask which regions, industries, and geo-industrial clusters are seeing that growth. We measure the total growth in terms of influx during the period from 2010 to 2014, using the log ratio of influx to outflux of college-educated workers for each region, industry, and geo-industrial cluster, \(\log(S_{\mathrm{in}}/S_{\mathrm{out}})\) (see Fig. 5a–c and Methods). We then estimate the change of this growth, denoted β, by estimating the linear trend in time of the influx log-ratio during the same period. If a region, an industry, or a geo-industrial cluster exhibits a positive net influx and a positive β, it has been growing and its growth has been increasing during this period.
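Written out, and assuming that the linear trend is an ordinary least-squares slope over the five yearly values (the symbols \(r_i(t)\), \(\bar{t}\), and \(\bar{r}_i\) are introduced here for illustration and do not appear in the original), the two quantities for a grouping \(i\) are:

\[
r_i(t) = \log\frac{S_{\mathrm{in},i}(t)}{S_{\mathrm{out},i}(t)}, \qquad
\beta_i = \frac{\sum_{t}\,(t-\bar{t})\,\bigl(r_i(t)-\bar{r}_i\bigr)}{\sum_{t}\,(t-\bar{t})^{2}}, \qquad t \in \{2010, \dots, 2014\},
\]

where \(\bar{t}\) and \(\bar{r}_i\) are the means of \(t\) and \(r_i(t)\) over the period. A positive log-ratio with \(\beta_i > 0\) then corresponds to accelerating growth, and a positive log-ratio with \(\beta_i < 0\) to growth that is slowing down.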

Fig. 5 Growth of regions, industries, and geo-industrial clusters and associated skills in growing and declining geo-industrial clusters. a–c The log-ratio of influx to outflux and its growth over time, aggregated by region (a), industry (b), and geo-industrial cluster (c). The amount of growth is calculated by the log-ratio of influx to outflux (\(\log(S_{\mathrm{in}}/S_{\mathrm{out}})\)) during each year from 2010 to 2014; its linear time trend (\(\beta_i\)) is estimated by the linear regression coefficient of influx ratios to time over this period. The size of a circle represents the total number of transitions either into or out of the corresponding category. d, e Over-represented skills in geo-industrial clusters in the top and bottom quartiles of the log-ratio of influx to outflux (d) and its linear time trend (e). The fraction of people who have a certain skill in the top \(\left( P_q^t \right)\) and bottom \(\left( P_q^b \right)\) geo-industrial clusters reveals that specialized and business-oriented skills are more common in growing geo-industrial clusters than in declining ones.

Figure 5a shows that most regions are located in the fourth quadrant, with decelerating growth following a strong bounce-back from the Great Recession41 of 2007–2009. The San Francisco Bay area and the Greater Seattle Area exhibited the strongest growth, whereas places such as San Antonio have been losing educated population. Similarly, most industries also show slowing growth out of the recession (see Fig. 5b). In this period, the Computer Software industry has been showing the strongest growth, whereas Retail has been losing its educated labor force. This trend has been accelerating. Also note that the Mining & Metals industry has been growing but decelerating, and the Internet and Oil & Energy industries experienced large growth during this period. These employment growth patterns match the relative growth projections from the US Bureau of Labor Statistics’ Occupational Outlook Handbook42, except that our analysis detects a loss of Retail jobs among the college-educated, and a pronounced deceleration in growth across many fields.

Although these region- and industry-based views paint a rough picture that fits the known recent trends of the global economy, it is the geo-industrial cluster-based analysis that provides the best snapshot of the evolution of the economy. The fact that the San Francisco Bay area has been rapidly growing does not tell us which industry propelled the growth; likewise, the growth of the computer software and internet industries does not inform us where this growth has occurred. By contrast, a cluster-based comparison in Fig. 5c reveals nuanced information about the growth of geo-industrial clusters, completing the picture of economic evolution during this period. The clusters that are based on internet and computer software companies in the San Francisco area, real estate companies in the Los Angeles area, and computer software companies in the Seattle area experienced some of the strongest growth with respect to college-degree workers, while military-related firms and organizations in Washington D.C. and retail companies in the Chicago area experienced the largest decline.

Emerging skills

This pattern of growth can be supplemented with an even more detailed analysis of associated skills. Here, we identify over- and under-represented skills in emerging and declining geo-industrial clusters (see Supplementary Figure 7 for similar analysis with regions and industries). We compare the aggregated skill distribution of geo-industrial clusters in the top quartile of total influx (\(\log(S_{\mathrm{in}}/S_{\mathrm{out}})\); Fig. 5d) or growth (β; Fig. 5e) during this period against those in the bottom quartile. The vertical axis represents the fraction of employees with each skill within the top quartile, and the horizontal axis represents the fraction in the bottom quartile. The intensity of the color represents the degree to which each skill is concentrated in the top (red) or bottom (blue) quartile, as measured by the z score of the log-odds ratio between the top and bottom skill distributions (see Methods). With respect to the total influx, the over-represented skills in the top geo-industrial clusters are concentrated around management skills, such as Management, Project Management, and Team Management. These results concur with studies on the importance of cognitive-social skills and the prevalence of management-related jobs in high-wage occupations43,44,45. In addition, oil and energy-related skills such as Petroleum, Oil & Gas, Gas, and Onshore are more prevalent in the top quartile, which captures the recent growth of the oil and natural gas industry, driven by the new drilling and fracking technologies applied in the US during this period46,47,48,49.

On the other hand, the most over-represented skills in geo-industrial clusters in the bottom quartile feature widely available, common skills such as Customer Service and Microsoft Office, or vague skills such as Leadership. This bias towards common and vague skills in the bottom quartile remains consistent regardless of whether the focus is on the total influx or its growth (Fig. 5e). Although the Leadership skill is more common in the bottom quartile, related but more specific skills, such as Cross-functional Team Leadership or Process Improvement, are over-represented in the top growing geo-industrial clusters. The over-represented skills in the top quartile of influx growth feature newer skills, such as Pharmaceuticals, Biotechnology, and Cloud Computing, capturing new innovations that are attracting educated labor flow.
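The z score of the log-odds ratio used to color skills in Fig. 5d, e can be illustrated with a simple version of the statistic. The sketch below is an assumption-laden stand-in, not the estimator in Methods: it uses the standard large-sample standard error of a log-odds ratio with a 0.5 continuity correction, and the function name and counts are hypothetical.

```python
import math

def log_odds_z(top_with, top_total, bottom_with, bottom_total):
    """Z score of the log-odds ratio of holding a skill, top vs. bottom quartile.

    ASSUMPTION: a 0.5 continuity correction and the usual large-sample
    standard error; the paper's Methods may use a different estimator.
    """
    a = top_with + 0.5                    # top quartile, has the skill
    b = top_total - top_with + 0.5        # top quartile, lacks the skill
    c = bottom_with + 0.5                 # bottom quartile, has the skill
    d = bottom_total - bottom_with + 0.5  # bottom quartile, lacks the skill
    log_odds_ratio = math.log((a / b) / (c / d))
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_odds_ratio / std_error

# Hypothetical counts of employees listing each skill in the two quartiles.
z_top_heavy = log_odds_z(300, 1000, 100, 1000)
z_bottom_heavy = log_odds_z(100, 1000, 300, 1000)
z_balanced = log_odds_z(200, 1000, 200, 1000)
print(z_top_heavy, z_bottom_heavy, z_balanced)
```

With this sign convention, positive scores mark skills concentrated in the top quartile and negative scores mark skills concentrated in the bottom quartile.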