We study a sample of 100,000 ties drawn at random from the Call Detail Records (CDR) of 20 million people from a single mobile phone operator over a period of 19 months. As in [17], we divide the time interval into three periods: the 7 months in the middle, Ω, define our observation and measurement period for the ties. We select only the 60,592 ties in which there are at least 5 calls between the users in Ω, with at least one call in each direction. We consider only ties that have been observed for at least 50 days, to exclude very short ties. As in [17], the first and last periods of 6 months, \(\Omega_{\mathrm{before}}\) and \(\Omega_{\mathrm{after}}\), are used to assess whether the tie has formed and/or decayed. In our particular case, since there is no explicit information about whether social interactions stop, we say that the tie between users i and j has decayed if there are no calls between them in \(\Omega_{\mathrm{after}}\). This functional definition of the existence of a tie neglects the possibility of another call after those 6 months but, as was shown in [17], only 3% of ties contain such long inter-event times \(\delta_{ij}\) between calls (see Figure 1), so our method is subject to only a small error. Because activity within ties is bursty, long inter-event times between interactions are likely and could be mistaken for tie decay. In particular, in our database we find that the average time between calls in a tie is \(\overline{ \delta }_{ij} = 14\) days (with a standard deviation of 18 days), and thus we might get spurious effects if \(\Omega_{\mathrm{after}}\) were of the order of a month, as interactions may fall outside the \(\Omega_{\mathrm{after}}\) period. See the Methods section for further description of the mobile phone dataset. We have also considered another (smaller) database of Facebook communication through wall posts. Since the results on both databases are similar, we discuss here only the mobile phone database and refer to the Methods section for further details about the Facebook database analysis.
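The selection and labelling rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the paper: months are approximated as 30 days, "observed for at least 50 days" is read as the span between the first and last call in Ω, and the names `Call` and `label_tie` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

DAYS_PER_MONTH = 30  # assumption: months approximated as 30 days

# Assumed period boundaries (days since the start of the data): 6 + 7 + 6 months
OMEGA_START = 6 * DAYS_PER_MONTH    # end of Omega_before, start of Omega
OMEGA_END = 13 * DAYS_PER_MONTH     # end of Omega, start of Omega_after
AFTER_END = 19 * DAYS_PER_MONTH     # end of Omega_after


@dataclass(frozen=True)
class Call:
    t: float      # call time in days since the start of the data
    caller: str   # id of the user who initiated the call


def label_tie(calls: Sequence[Call],
              min_calls: int = 5,
              min_span_days: float = 50) -> Optional[bool]:
    """Return None if the tie is filtered out, True if it decayed, False if it persisted."""
    in_omega = [c for c in calls if OMEGA_START <= c.t < OMEGA_END]
    if len(in_omega) < min_calls:
        return None                       # fewer than 5 calls in Omega
    if len({c.caller for c in in_omega}) < 2:
        return None                       # no call in one of the two directions
    times = [c.t for c in in_omega]
    if max(times) - min(times) < min_span_days:
        return None                       # assumed reading of "observed for at least 50 days"
    # The tie has decayed if there is no call at all in Omega_after
    return not any(OMEGA_END <= c.t < AFTER_END for c in calls)
```

A tie with five reciprocal calls spread over 60 days of Ω and no later activity would be labelled as decayed; a single call in \(\Omega_{\mathrm{after}}\) is enough to label it persistent.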

Figure 1 Detecting tie decay and strength. Definition of observation periods and examples of call activity for 4 given ties. Each vertical segment is a call between the users in a particular tie. Our 19-month database is divided into three periods, where the 7 months in the middle, Ω, is our observation period in which all the tie features are measured. The period \(\Omega_{\mathrm{after}}\) is used to assess whether ties are persistent, i.e. whether there is activity in the tie. For example, ties (A) and (D) are persistent, while ties (B) and (C) are said to have decayed in \(\Omega_{\mathrm{after}}\). All ties have similar numbers of calls in the observation period, with \(w_{ij} \in [30,40]\). We also show specific examples of one inter-event time \(\delta_{ij}\) (tie (B)) and freshness \(f_{ij}\) (tie (C)).

To characterize the strength of the tie we will find those features that can anticipate its persistence. Thus, we implicitly identify strong relationships with persistence, while weak ties are those more likely to decay. This dynamical definition of strength is a much more functional way of describing the utility of a tie in present and future social processes, and it operationalizes Granovetter’s idea that strong ties are those which are more likely to persist. To describe which tie features are related to its dynamical strength (persistence), we also follow Granovetter’s notion of static strength of an interpersonal tie [20]: ‘the strength of a tie is a combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie’. Within that framework, we define four categories of tie features: intensity, temporal, structural and intimacy features, and we try to characterize which ties are the strongest (most persistent) according to these variables. Intensity, temporal and intimacy features refer to properties of the communication patterns between users, while structural variables are those derived from how the tie is embedded in the rest of the social network. Given the nature of our data, our features are constructed solely from the information about call events between users. Our working assumption is that there is enough information in those events to predict the persistence of the tie.

Some of the variables are adapted from previous works on both tie formation and decay prediction [12, 15, 17, 21], but others are introduced for the first time in this work. Specifically, we introduce a number of variables that take into account the temporal patterns of the communication between users [1, 17]. In contrast with the static and aggregated picture of relationships and networks, ties and networks are always evolving: not only is communication between users highly bursty and correlated in time [6, 7], but the dynamical strategies by which users create and destroy ties are also very different [17, 22]. The hypothesis we investigate in this paper is whether those patterns convey information about the fate of a social relationship. For example, we ask whether the periodicity or burstiness of how two people communicate, or whether they are involved in very fast creation and destruction of ties, can inform us about the persistence of social ties.

Intensity features

The first group of variables describes the amount of communication between users. Stronger relationships imply more frequent communication, which we can quantify by the number of calls \(w_{ij}\) between users. This variable is highly heterogeneous in our database, as in similar works in the literature [23] (see Figure 5). Specifically, we find that the average number of calls is \(\overline{w_{ij}} = 76\), ranging from a minimum of 5 to a maximum of 2468 calls per tie. To take this heterogeneity into account, the rest of the variables we consider are calculated relative to that level of activity per tie. For example, instead of considering the total duration of calls per tie we consider the average duration \(d_{ij}\). On the other hand, several works have found that if the tie is highly reciprocal, the relationship is stronger and thus less likely to decay [8, 12, 24]. Our database contains information about which user initiates the call, so we can measure \(w^{\rightarrow }_{ij}\), the number of calls from i to j, i.e. those initiated by i. Using this, we define the level of reciprocity between users i and j as

$$ r_{ij} = \biggl\vert \frac{w^{\rightarrow }_{ij}}{w_{ij}}-\frac{1}{2}\biggr\vert . $$ (1)

Note that this variable takes values between 0 and \(1/2\). When user i initiates most of the calls in the tie, then \(w^{\rightarrow } _{ij} \simeq w_{ij}\) and \(r_{ij} \simeq 1/2\); by the symmetry of the absolute value, the same holds when j initiates most of them. On the contrary, when the number of calls from i to j is equal to the number of calls from j to i, we have that \(w^{\rightarrow }_{ij} = w_{ij}/2\) and then \(r_{ij}=0\). Thus larger values of \(r_{ij}\) indicate less reciprocity.
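As a quick check of the limiting cases of Eq. (1), the definition can be written as a one-line function. This is a minimal sketch; the function name `reciprocity` and its argument names are hypothetical.

```python
def reciprocity(w_i_to_j: int, w_total: int) -> float:
    """r_ij = |w_ij^-> / w_ij - 1/2|, following Eq. (1).

    w_i_to_j: number of calls initiated by i; w_total: all calls in the tie.
    """
    if w_total <= 0 or not 0 <= w_i_to_j <= w_total:
        raise ValueError("need 0 <= w_i_to_j <= w_total and w_total > 0")
    return abs(w_i_to_j / w_total - 0.5)
```

A perfectly balanced tie, `reciprocity(5, 10)`, gives 0, while a tie in which one user initiates every call, `reciprocity(10, 10)` or `reciprocity(0, 10)`, gives 1/2.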

Structural features

The formation and decay of a tie are also related to the social structure around it. People tend to form groups and, in particular, people tend to form relationships with friends of friends (triadic closure), which leads to high clustering around a tie [10]. This is the reasoning behind Granovetter’s influential ‘strength of weak ties’ argument, which implies that structurally embedded ties are not only more likely to arise in a social network but are also more persistent, a result corroborated by Burt in different works [13, 25]. Although there are many metrics to quantify the embeddedness of a tie within the social network, we use the topological overlap \(o_{ij}\), defined as the fraction of neighbors of i and j which are shared [23]. Specifically,

$$ o_{ij}=\frac{\vert n_{i} \cap n_{j}\vert }{\vert n_{i} \cup n_{j}\vert }, $$ (2)

where \(n_{i}\) and \(n_{j}\) are respectively the sets of neighbors of nodes i and j, and \(\vert n_{i}\vert \) indicates the number of them. Note that this variable takes values between 0 and 1: if i and j have no common neighbors, then \(o_{ij}\) takes the value 0, while if i and j call the same circle of users, \(o_{ij}\) takes the value 1. The topological overlap is then a variable measuring the (normalized) number of ‘common friends’ between two nodes.
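Equation (2) is the Jaccard index of the two neighbor sets and can be sketched as follows. We assume here that i and j themselves are excluded from \(n_{i}\) and \(n_{j}\), which is the reading under which two users calling exactly the same circle reach \(o_{ij}=1\); the adjacency dictionary `adj` and the function name are hypothetical.

```python
from typing import Dict, Set


def topological_overlap(adj: Dict[str, Set[str]], i: str, j: str) -> float:
    """o_ij = |n_i ∩ n_j| / |n_i ∪ n_j|, following Eq. (2).

    Assumption: the endpoints i and j are removed from the neighbor sets.
    """
    n_i = adj[i] - {i, j}
    n_j = adj[j] - {i, j}
    union = n_i | n_j
    if not union:
        return 0.0  # convention chosen here for an isolated pair
    return len(n_i & n_j) / len(union)
```

For example, with `adj = {"i": {"j", "a", "b"}, "j": {"i", "a", "c"}}` the users share one neighbor out of three distinct ones, so the overlap is 1/3.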

The topological overlap is a particular way to measure the structural information around a tie. Another metric we will consider is the level of social connectivity around a tie. In particular, if \(k_{i}\) and \(k_{j}\) are the number of neighbors of i and j we will construct the geometric mean of connectivity \(k_{ij} = \sqrt{k_{i} k_{j}}\). This variable is introduced to take into account the effect of the different importance of a tie for the users involved in the relationship. If \(k_{ij}\) is small, the tie between i and j is important for both or one of them, while if \(k_{ij}\) is large, then it is just another tie among the many they have. Variations of structural connectivity around a tie have been considered in other works studying tie strength and dynamics [12, 19].
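A small numerical sketch shows why the geometric mean is a natural choice here: it is pulled toward the smaller of the two degrees, so a tie that matters to at least one user keeps a comparatively low \(k_{ij}\). The function name is hypothetical.

```python
import math


def geometric_mean_connectivity(k_i: int, k_j: int) -> float:
    """k_ij = sqrt(k_i * k_j) for the numbers of neighbors of the two users."""
    if k_i < 0 or k_j < 0:
        raise ValueError("degrees must be non-negative")
    return math.sqrt(k_i * k_j)
```

For a user with a single contact tied to a user with 100 contacts, `geometric_mean_connectivity(1, 100)` is 10, well below the arithmetic mean of 50.5.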

Intimacy features

Following Granovetter’s hypothesis of a strong tie, the intimacy (mutual confidence) between two nodes could provide a better characterization of the tie and allow a more accurate prediction of its dynamics. As opposed to other studies in social networks [19], our mobile phone database does not contain any information about the context and content of the call. Thus we quantify the mutual confidence by the day or hour when the calls are made. Specifically, we consider the fraction of calls within a tie that are made after 8 pm and during the weekend, \(\mu^{\mathrm{int}}_{ij}\). As was shown recently, calls made in the evening and at night are typically focused on a small number of emotionally intense relationships [26]; thus, quantifying the amount of communication happening at that time of the day can give us a proxy for intimacy.
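The intimacy proxy can be sketched from call timestamps alone. The text does not fully specify the time window, so the sketch below makes two labelled assumptions: a call counts if it starts at or after 20:00 on any day or at any time on Saturday or Sunday, and "after 8 pm" is taken to end at midnight. The function name is hypothetical.

```python
from datetime import datetime
from typing import Sequence


def intimacy_fraction(call_times: Sequence[datetime]) -> float:
    """Fraction of calls made in the evening or at the weekend (proxy for mu_ij^int).

    Assumptions: evening means hour >= 20 (until midnight); weekend means
    Saturday or Sunday; a call counts if either condition holds.
    """
    if not call_times:
        raise ValueError("need at least one call")
    intimate = sum(1 for t in call_times if t.hour >= 20 or t.weekday() >= 5)
    return intimate / len(call_times)
```

Other readings of the definition (for example, including the early hours of the morning) would only change the condition inside the sum.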

On the other hand, demographic differences between users have an impact on tie dynamics. For example, the temporal communication patterns formed by groups of males or females are different [27], and those patterns can be associated with the different preference strategies of both sexes across the lifespan [28]. To quantify those relationship preferences, we consider the age and gender difference between the users participating in a tie. Age difference \(\mathit{age}_{ij}\) is measured as the absolute value of the difference in years, while gender difference is a dichotomous variable where \(\mathit{gender}_{ij} = 1\) if both users have the same gender and \(\mathit{gender}_{ij} = 0\) if they are different.

Temporal features

Finally, we characterize the temporal patterns within and around the tie. Since communication within the tie is very heterogeneous (see Figure 1), we want to understand whether that heterogeneity might reveal something about the persistence of the tie. The first variable we consider is the freshness of the tie \(f_{ij}\), i.e. the time elapsed between the last call between i and j and the end of Ω [12, 19]. Since activity within ties is very heterogeneous, we consider the relative freshness, i.e. the time elapsed since the last call relative to the typical time between calls in the tie, \(\hat{f}_{ij} = f_{ij} / \overline{\delta }_{ij}\), where \(\overline{\delta }_{ij}\) is the average inter-event time between calls. We also consider the age of the tie, given by the time of the first call between the users in our database, \(t^{\mathrm{min}}_{ij}\), measured in days.
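The freshness variables follow directly from the sorted call times of a tie. The sketch below assumes that times are given in days, that only calls inside Ω are passed in, and that inter-event times are differences between consecutive call start times; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def freshness(times: Sequence[float], omega_end: float) -> Tuple[float, float]:
    """Return (f_ij, f_ij / mean inter-event time) for the calls of one tie.

    times: call times in days within Omega; omega_end: end of Omega in days.
    """
    ts = sorted(times)
    if len(ts) < 2:
        raise ValueError("need at least two calls to define an inter-event time")
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap == 0:
        raise ValueError("all calls are simultaneous")
    f = omega_end - ts[-1]          # time since the last call at the end of Omega
    return f, f / mean_gap          # freshness and relative freshness
```

For calls at days 0, 10, 20 and 30 with Ω ending at day 50, the freshness is 20 days and the relative freshness is 2, i.e. the silence at the end of Ω is twice the typical gap in this tie.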

Another feature we consider is the burstiness of the communication patterns. The hypothesis we want to test is whether more regular communication patterns could reflect stronger/more persistent ties. For example, strong relationships like family and close friends require constant communication and thus they might have more regular patterns than acquaintances (see [29] and references therein). Although there are many ways to characterize the burstiness of events [30], we use two simple metrics. The first one is the coefficient of variation of the inter-event times \(\mathit{cv}_{ij} = \sigma_{ij}/\overline{\delta }_{ij}\), where \(\overline{ \delta }_{ij}\) is the average inter-event time between two calls and \(\sigma_{ij}\) is their standard deviation [2]. If \(\mathit{cv}_{ij} \gg 1\), communication is very bursty, with atypically long periods of time in which the users did not communicate (see for example tie B in Figure 1), while if \(\mathit{cv}_{ij} \ll 1\), communication is very regular, happening at almost equal time intervals (see tie A in Figure 1). The value \(\mathit{cv}_{ij} = 1\) corresponds to the homogeneous Poissonian case, in which calls are distributed randomly along the Ω period [30]. Another way to characterize burstiness is to quantify how many communication events happen in bursts, or rapid consecutive successions of calls (we call them chats) [6, 31]. To do that, we calculate the fraction of calls \(\mu^{\mathrm{chats}}_{ij}\) that happened within 5 minutes of another call in the tie.
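Both burstiness metrics can be sketched from the call times of a tie. The sketch makes several assumptions that the text leaves open: times are in seconds, the population standard deviation is used for \(\sigma_{ij}\), and a call belongs to a chat if the previous or the next call of the tie is at most 5 minutes away. Function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(times: Sequence[float]) -> float:
    """cv_ij = sigma_ij / mean inter-event time (population std, an assumption)."""
    ts = sorted(times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        raise ValueError("need at least three calls that are not all simultaneous")
    return pstdev(gaps) / mean(gaps)


def chat_fraction(times: Sequence[float], max_gap: float = 300.0) -> float:
    """Fraction of calls with another call of the tie within max_gap seconds."""
    ts = sorted(times)
    n = len(ts)
    if n == 0:
        raise ValueError("need at least one call")
    in_chat = 0
    for idx, t in enumerate(ts):
        close_prev = idx > 0 and t - ts[idx - 1] <= max_gap
        close_next = idx < n - 1 and ts[idx + 1] - t <= max_gap
        if close_prev or close_next:
            in_chat += 1
    return in_chat / n
```

Perfectly periodic calls give a coefficient of variation of 0, while a sequence with gaps of 1, 1, 1 and 9 time units gives a value above 1, matching the regular and bursty regimes described above.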

Finally, another reason why a tie decays is simply that the users involved in the tie have very different dynamical social strategies. As was found in [17], humans constantly create and destroy ties, and they have different strategies to do so. While some individuals create and destroy a lot of ties (explorers), others tend to maintain their social circle (keepers). If both users in a tie are explorers, the probability for the tie to decay is high. To measure how dynamical the strategies of the users in a tie are, we consider \(a_{i}\), the number of ties created by user i in period Ω. As in [17], we say that a tie is created in Ω if there is no call between the users in \(\Omega_{\mathrm{before}}\). The ratio between the number of created ties and the total number of ties, \(a_{i}/k_{i} \in [0,1]\), describes how frequently user i changes her social neighborhood. If \(a_{i}/k_{i} \simeq 1\), most of the ties of user i were created during Ω (i.e. the user is a social explorer), while if \(a_{i}/k_{i} \ll 1\), most of the ties are stable (social keeper). To characterize how dynamical the strategies of both i and j are, we consider the geometric mean

$$ a_{ij}=\sqrt{\frac{a_{i}}{k_{i}}\cdot \frac{a_{j}}{k_{j}}}. $$ (3)

If both i and j are explorers, \(a_{ij} \simeq 1\) and the tie is more likely to decay since it connects users with highly dynamical social strategies, while if they are both keepers, \(a_{ij} \simeq 0\) and the tie most likely will persist.
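The limiting cases of Eq. (3) can be checked with a short sketch. The function name and argument names are hypothetical; \(a_{i}\) and \(k_{i}\) are assumed to have been counted beforehand.

```python
import math


def joint_social_activity(a_i: int, k_i: int, a_j: int, k_j: int) -> float:
    """a_ij = sqrt((a_i / k_i) * (a_j / k_j)), following Eq. (3)."""
    for a, k in ((a_i, k_i), (a_j, k_j)):
        if k <= 0 or not 0 <= a <= k:
            raise ValueError("need k > 0 and 0 <= a <= k for each user")
    return math.sqrt((a_i / k_i) * (a_j / k_j))
```

Two pure explorers, `joint_social_activity(5, 5, 4, 4)`, give 1, and two pure keepers, `joint_social_activity(0, 5, 0, 4)`, give 0. Note that a single pure keeper is enough to bring the geometric mean to 0, whatever the strategy of the other user.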

Table 1 summarizes the features considered to assess the dynamical strength of persistent ties. Because of the large heterogeneity found in connectivity, activity and burstiness across ties in social networks, we scale and normalize our variables before using them in a model. For example, we consider \(\log w_{ij}\) instead of \(w_{ij}\), since the distribution of the number of calls per tie is heavily skewed in mobile phone databases [23]. On the other hand, burstiness within ties makes variables like \(\mathit{cv}_{ij}\) or \(\hat{f}_{ij}\) also very heavy-tailed across our dataset. Thus we also use a logarithmic scaling for them. Although they are logarithmically scaled, in the rest of the paper we denote them by their original names for the sake of clarity, except where numerical values are given (for example in Figure 3). Finally, since the correlation between the variables is small, we keep all features in our analysis except \(t^{\mathrm{min}}_{ij}\), which is moderately correlated with \(w_{ij}\) (see the Methods section for the preprocessing and selection of variables).
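The kind of preprocessing described above can be illustrated with a logarithmic transform followed by standardization. The exact scaling used in the paper is given in the Methods section; the z-score step below is an assumption made for illustration, and the function name is hypothetical. The sketch only applies to strictly positive variables such as \(w_{ij}\).

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def log_standardize(values: Sequence[float]) -> List[float]:
    """Log-transform a strictly positive, heavy-tailed variable, then z-score it.

    Assumption: standardization to zero mean and unit variance; the paper's
    own normalization may differ.
    """
    if any(v <= 0 for v in values):
        raise ValueError("logarithmic scaling needs strictly positive values")
    logs = [math.log(v) for v in values]
    mu, sigma = mean(logs), pstdev(logs)
    if sigma == 0:
        raise ValueError("variable is constant")
    return [(x - mu) / sigma for x in logs]
```

After the transform, ties with 5, 50 and 500 calls are equally spaced, which is the effect the logarithmic scaling is meant to achieve for heavily skewed counts.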