We will take as our starting point measurements from a stochastic process. This could be a vector of chemical concentrations over time, the abundance of various cell types, or probabilities of observing coherent behaviors. We apply coarse-grained, information-theoretic filters that quantize the measurements. Some of these filters will reveal a coordinated pattern of behavior, whereas others will filter out all signal and detect nothing. Thus, signal amplitude given an appropriate filter becomes a means of discovering different forms of individuality. This is somewhat analogous to observing patterns in infrared that would be invisible using the wavelengths of visible light—individuality is revealed through characteristic patterns of information flow.

The basis for this approach to aggregation comes from information theory, and throughout this paper we assume that individuals are best thought of in terms of dynamical processes and not as stationary objects that leave information-theoretic traces. In this respect, our approach might reasonably be framed through the lens of “process philosophy” (Rescher 2007), which makes the elucidation of the dynamical and coupled properties of natural phenomena the primary explanatory challenge. From the perspective of “process philosophy,” the tendency to start with objects and then list their properties—“substance metaphysics”—puts the cart before the horse.

The origin of information

Our proposal that individuals are aggregates that propagate information from the past to the future and have temporal integrity can be viewed as a pragmatic operational definition that captures the idea there is something persistent about individuals. However, our motivation for defining individuality this way is actually much deeper. It lies in the information-theoretic interpretation of entropy, its connection to the physical theory of thermodynamics, and formal definition of work introduced by Clausius in the 1860s [see (Müller 2007) for an introduction to this history].

Briefly, work (displacement of a physical system) is produced by transferring thermal energy from one body to another (heat). Entropy captures, or measures, the loss in temperature over the range of motion of the working body. In other words, entropy measures the energy lost from the total energy available for performing work. The insights of Clausius were formalized and placed in a mathematical framework by Gibbs in 1876.

In 1877, Boltzmann provided in his kinetic theory of gases an alternative interpretation of entropy. For Boltzmann, entropy is a measure of the potential disorder in a system. This definition shifts the emphasis from energy dissipated through work to the number of unobservable configurations (microstates) of a system, e.g., particle velocities, consistent with an observable measurement (macrostate), e.g., temperature. The thermodynamic and Boltzmann definitions are closely related, as Boltzmann entropy increases following the loss of energy available for work attendant upon the collision of particles in motion during heat flow. There are many different microscopic configurations of individual particles compatible with the same macroscopic measurement, only a few of which are useful.

In 1948, encouraged by John von Neumann, Claude Shannon used the thermodynamical term entropy to capture the information capacity of a communication channel. A string of a given length (macrostate) is compatible with a large number of different sequences of symbols (microstates). A target word will be disordered during transmission in proportion to the noise in a channel. If there were no noise, each and every microstate could be resolved and the entropy would define an upper limit on the number of signals that could be transmitted. The study of the maximum number of states that can be transmitted from one point to another across a channel, in the face of noise and when efficiently encoded, is called information theory.

Shannon did not describe entropy in terms of heat flow and work but in terms of information shared through a channel transmitted from a signaler to a receiver. The power of information theory derives in part from the incredible generality of Shannon’s scheme. The signaler can be a phone in Madison and the receiver a phone in Madrid, or the signaler can be a parent and the receiver its offspring. For phones, the channel is a fiber-optic cable and the signal pulses of light. For organisms the channel is the germ line and the signal the sequence of DNA or RNA polynucleotides in the genome. Increasing entropy for a phone-call corresponds to the loss or disruption of light-pulses, whereas increasing entropy during inheritance corresponds to mutation or developmental noise. The same scheme can be applied to development, in which case the signaler is an organism in the past and the receiver the same organism in the future. One way in which we might identify individuals is to check to see whether we are dealing with the same aggregation at time t and \(t+1\). If the information transmitted forward in time is close to maximal, we take that as evidence for individuality.

In its simplest form, Shannon made use of the following formal measures when defining information. The entropy H of a random variable S measures the uncertainty or information of the states that it can adopt:

$$\begin{aligned} H(S) = -\sum _i P(s_i)\log _2 P(s_i) \end{aligned}$$

where \(s_i\) are the possible values of the state and \(P(s_i)\) the probabilities of these states. For a coin there would be two possible values for S, heads and tails, and for a fair coin each of these states would have probability 0.5, yielding an entropy value of 1 bit. Deviation from a fair coin corresponds to a reduction in information: in the limit of bias where only one side of the coin is favored, the outcome is known in advance and any toss of the coin is perfectly predictable. This produces an entropy value of 0. Hence, information is minimized when predictability is maximized.
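This calculation is easy to reproduce numerically. The following is a minimal sketch of our own (the function name and example probabilities are illustrative, not taken from the original analysis):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                 # by convention, 0 * log2(0) contributes nothing
    return float(-(p * np.log2(p)).sum())

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))   # biased coin: roughly 0.47 bits
print(entropy([1.0, 0.0]))   # fully predictable coin: 0.0 bits
```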

To capture the communication value of information Shannon introduced a signaler–receiver structure, which is now typically described using two random variables S and R. The maximum information transmitted between signaler and receiver is given by the Mutual Information (I). The I can be written in several different forms. One intuitive expression is:

$$\begin{aligned} I(S;R) = H(S) + H(R) - H(S,R) \end{aligned}$$

where H(S) and H(R) are the entropies of the signals, and H(S, R) the joint entropy of the two variables,

$$\begin{aligned} H(S,R) = -\sum _i \sum _j P(s_i, r_j)\log _2 P(s_i, r_j) \end{aligned}$$

The joint entropy is at a maximum when there is no relationship between the S and R variables. The I is therefore high when the information in S and R is high and the two variables are strongly coupled in their values (H(S, R) is low). The I measures the information shared between S and R over a communication channel, because the only source of structure in R is assumed to come from S.

Another conventional way of writing I is,

$$\begin{aligned} I(S;R) = H(R) - H(R|S) \end{aligned}$$

where H(R|S) is the conditional entropy of R given S, i.e., the amount of information in R that is not in S. Hence, if all the information in R comes from S then H(R|S) will be zero, and \(I(S;R) = H(R)\). If one of the random variables, for example the sender S, consists of two parts \(S=\{S_1,S_2\}\), we can decompose the mutual information using the chain rule (Cover and Thomas 1991)

$$\begin{aligned} I(S_1,S_2;\,R)=I(S_1;\,R)+I(S_2;\,R|S_1) \end{aligned}$$

with the second term being the conditional mutual information

$$\begin{aligned} I(S_2;\,R|S_1):=H(R|S_1)-H(R|S_1,S_2). \end{aligned}$$

These measures provide the necessary statistics for an informational theory of the individual.
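All of these quantities can be evaluated directly from a joint probability table. The sketch below is our own illustration (the toy distribution and helper names are arbitrary); it computes the mutual information and verifies the chain rule numerically:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a joint probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table p_xy[x, y]."""
    return entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy)

# Arbitrary toy joint distribution p(s1, s2, r) over three binary variables.
p = np.random.default_rng(0).random((2, 2, 2))
p /= p.sum()

i_joint = mutual_information(p.reshape(4, 2))   # I(S1,S2; R), treating (S1,S2) as one variable
i_s1 = mutual_information(p.sum(axis=1))        # I(S1; R)

# I(S2; R | S1) = H(R|S1) - H(R|S1,S2), expressed through joint entropies
i_s2_given_s1 = (entropy(p.sum(axis=1)) - entropy(p.sum(axis=(1, 2)))
                 - entropy(p) + entropy(p.sum(axis=2)))

# Chain rule: I(S1,S2; R) = I(S1; R) + I(S2; R | S1)
print(np.isclose(i_joint, i_s1 + i_s2_given_s1))   # True
```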

Fig. 1 The causal diagram of the system–environment interaction

When we model the interaction between a system and its environment we have to consider a more complicated situation which involves two channels. To be more precise, let \({{\mathcal {S}}}\) and \({{\mathcal {E}}}\) be the state sets of the system and the environment, respectively. For simplicity, we assume that \({{\mathcal {S}}}\) and \({{\mathcal {E}}}\) are finite. The dynamics of the system is influenced by its own state, but it can also be influenced by the state of the environment. This can be modeled in terms of a channel \(\varphi : {{\mathcal {E}}} \times {{\mathcal {S}}} \rightarrow {{\mathcal {S}}}\), where \(\varphi (e,s;s')\) denotes the probability of the next system state \(s'\) given that the current system state is s and the environment is in state e. In particular, we assume that \(\varphi (e,s;s') \ge 0\) for all \(e,s,s'\), and \(\sum _{s'} \varphi (e,s;s') = 1\) for all e, s. We can model the dynamics of the environment in the same way, using a Markov kernel \(\psi : {{\mathcal {S}}} \times {{\mathcal {E}}} \rightarrow {{\mathcal {E}}}\), where \(\psi (s,e;e')\) denotes the probability of the next state \(e'\) of the environment given the current states e and s of the environment and the system, respectively. The kernels \(\varphi\) and \(\psi\) model the mechanisms that constitute the system–environment interaction. If we start this interaction process by selecting states s and e according to some probability distribution \(\mu\), we obtain a process \((S_k,E_k)\), \(k = 1,2,\ldots\), in \({{\mathcal {S}}} \times {{\mathcal {E}}}\) that satisfies

$$\begin{aligned}&{\mathbb P}(S_1 = s_1, E_1 = e_1, S_2 = s_2, E_2 = e_2, \ldots , S_n \\&\quad = s_n, E_n = e_n) \\&\quad = \mu (s_1,e_1) \, \varphi (e_1,s_1;s_2) \psi (s_1,e_1;e_2) \, \ldots \, \\&\qquad \varphi (e_{n - 1} , s_{n - 1} ; s_n) \psi (s_{n - 1} , e_{n - 1} ; e_n), \qquad n = 1, 2, \ldots . \end{aligned}$$

Clearly, we can recover the mechanisms from the distribution of the process \((S_k, E_k)\), \(k = 1, 2, \ldots\):

$$\begin{aligned} {\mathbb P}(S_1&= {} s, E_1 = e) = \mu (s,e), \\ {\mathbb P}(S_k&= {} s' \, | \, E_{k - 1} = e, S_{k - 1} = s) = \varphi (e , s ; s' ), \\ {\mathbb P}(E_k &= {} e' \, | \, S_{k - 1} = s, E_{k - 1} = e) = \psi (s , e ; e' ). \end{aligned}$$

We apply information-theoretic quantities, such as the mutual information, to variables of the process \((S_k, E_k)\), thereby quantifying information flows between the system and the environment. The causal structure of the process, as shown in Fig. 1, implies a number of conditional independence statements. For instance, \(E_{n + 1}\) is conditionally independent of \(S_{n-1}, E_{n-1}\) given \(S_n, E_n\).
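As a concrete, hedged illustration of this construction (a sketch of our own, with arbitrary kernels standing in for \(\varphi\) and \(\psi\)), the process \((S_k, E_k)\) can be simulated by sampling alternately from the two channels:

```python
import numpy as np

rng = np.random.default_rng(1)

# Binary state sets for system and environment, with arbitrary illustrative kernels.
# phi[e, s, s2] = probability of the next system state s2 given current (e, s)
# psi[s, e, e2] = probability of the next environment state e2 given current (s, e)
phi = rng.random((2, 2, 2)); phi /= phi.sum(axis=2, keepdims=True)
psi = rng.random((2, 2, 2)); psi /= psi.sum(axis=2, keepdims=True)

def simulate(n_steps):
    """Sample a trajectory (s_k, e_k), k = 1..n_steps; the initial state plays the role of mu
    and is drawn uniformly here."""
    s, e = rng.integers(2), rng.integers(2)
    trajectory = [(s, e)]
    for _ in range(n_steps - 1):
        s_next = rng.choice(2, p=phi[e, s])   # system channel phi
        e_next = rng.choice(2, p=psi[s, e])   # environment channel psi
        s, e = s_next, e_next
        trajectory.append((s, e))
    return trajectory

print(simulate(10))
```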

The informational individual

In the previous section, we set down the information-theoretic foundations for our formalism. Here, we discuss the additional mathematical properties required of the formalism if it is to capture the concept of individuality we developed in “A way forward” section.

We remind the reader that our starting point is the assumption that biological individuality can usefully be understood as an “informational individual.” We further remind the reader that this is not to be confused with Dawkins’s replicator, as we want to allow the possibility that replication is not a fundamental feature of individuality and be able to ask what role individuality plays in facilitating replication. What is fundamental in our view is the idea that information can be propagated forward through time, meaning that uncertainty is reduced over time. In this way, and returning to our opening remarks in “A way forward” section, we suggest individuality is a natural extension of the ideas of Boltzmann and Von Neumann, and as such has foundations in statistical mechanics and thermodynamics, which consider the conditions required for persistently ordered states.

Defining properties and implications of the formalism

1 The system–environment decomposition Consider a dynamical set of quantifiable measurements that we coarse grain into components of a system and components of an environment. We seek a way of establishing whether this partition is justifiable, and whether the individuality concept is relevant. We wish to allow for a hierarchy of such partitions in order to capture biological examples such as organelles within cells, and cells within bodies within populations, where in each case the target entity and the environment assume a different identity. We retain those partitions that meet our information-theoretic inclusion criteria and can then ask which among the natural, intuitive categories of biology—e.g., cells, organelles, organisms, populations—are recovered.

2 Informational individuals In the pursuit of generality, we consider a discrete, stochastic process where the state of the system in the future is determined by some subset of states in the present. If we arbitrarily divide these states into system and environment, we should like to be able to determine the degree to which the current system state \(S_n\) and the current state of the environment \(E_n\) together determine the next system state \(S_{n+1}\). Formally, the predictability of the next state of the system is quantified via the mutual information:

$$\begin{aligned} I(S_n, E_n; \,S_{n+1}) = H(S_{n+1}) - H(S_{n+1} | S_n, E_n) . \end{aligned}$$

This expression seeks to capture how much of the information in the system at time \(n+1\), \(S_{n+1}\), comes from the system itself at a previous time step (or generation) \(S_n\)—the individual—versus from the environment at a previous time \(E_n\). This mutual information can now be decomposed in two ways

$$\begin{aligned} I(S_n, E_n; S_{n+1})&= I(S_{n+1}; S_n) + I(S_{n+1}; E_n | S_n) \\&= I(S_{n+1}; E_n) + I(S_{n+1}; S_n | E_n) \end{aligned}$$

Each decomposition can be interpreted as a different way of allocating the observed past regularities between the system and the environment. Each will allow us to define a different form of individuality.

a Endogenous determination Consider \(I(S_{n+1}; S_n) + I(S_{n+1}; E_n | S_n)\): here, we measure the influence of the system state on itself (at the next generation or time step). For a preferred interval of time, all observed dependencies between successive system states are attributed to the system. The quantity \(I(S_{n+1}; S_n)\) has been called autonomy in Krakauer and Zanotto (2006) and will be denoted \(A^*\) in the following. It should be high when the system, rather than its environment, is largely in control of its own dynamics. The influence of the environment, as measured by \(I(S_{n+1}; E_n | S_n)\), can be interpreted as new information flowing from the environment into the system. When this information flow vanishes completely, a system can be said to be informationally closed. This quantity, denoted nC, therefore measures the degree to which the system is controlled by the environment. Note that closure does not require causal independence; it only states that all influences from the environment are predictable by the system.

b Environmentally driven An alternative to endogenous determination is structure imposed largely through environmental gradients driving the system. In other words, the history of the system is not as consequential as the history of the environment, which imposes strong boundary conditions on the system. Consider \(I(S_{n+1}; E_n) + I(S_{n+1}; S_n | E_n)\): here, the observed influences are attributed to the environment (as far as possible, according to \(I(S_{n+1}; E_n)\)). Only the remaining influence \(I(S_{n+1}; S_n |E_n)\) is due to the system. This can be interpreted as an alternative concept of system autonomy (Bertschinger et al. 2008) and will be denoted A in the following. It is valid under the assumption that all dependencies between the states of the system and the environment are attributed to the environment.

These properties allow us to identify three quantities, each corresponding to a type of individuality:

$$\begin{aligned} \text{ Colonial } \text{ Individuality } \quad A&:=I(S_{n+1};S_n|E_n) \\ \text{ Organismal } \text{ Individuality } \quad A^{*}&:=I(S_{n+1};S_n) \\ \text{ Environmentally } \text{ Determined } \text{ Individuality } \quad nC&:=I(S_{n+1};E_n|S_n) \end{aligned}$$

To rigorously formalize these different types of individuality, however, we need to consider them on a more fine-grained scale.
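Before turning to that finer scale, note that these three coarse quantities can already be estimated from a joint distribution over \((S_n, E_n, S_{n+1})\). The following sketch is our own illustration (function names and the toy distribution are arbitrary):

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a joint probability table."""
    p = np.asarray(p, float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def individuality_measures(p):
    """p[s_n, e_n, s_next]: joint distribution of current system state, current environment
    state, and next system state. Returns (A_star, A, nC)."""
    h_joint = H(p)
    h_s    = H(p.sum(axis=(1, 2)))    # H(S_n)
    h_e    = H(p.sum(axis=(0, 2)))    # H(E_n)
    h_se   = H(p.sum(axis=2))         # H(S_n, E_n)
    h_ssn  = H(p.sum(axis=1))         # H(S_n, S_{n+1})
    h_esn  = H(p.sum(axis=0))         # H(E_n, S_{n+1})
    h_next = H(p.sum(axis=(0, 1)))    # H(S_{n+1})

    A_star = h_s + h_next - h_ssn                 # I(S_{n+1}; S_n)
    A      = h_esn - h_e - h_joint + h_se         # I(S_{n+1}; S_n | E_n)
    nC     = h_ssn - h_s - h_joint + h_se         # I(S_{n+1}; E_n | S_n)
    return A_star, A, nC

# Arbitrary toy joint distribution for illustration.
p = np.random.default_rng(2).random((2, 2, 2)); p /= p.sum()
print(individuality_measures(p))
```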

Fine-grained decomposition

Using the chain rule for mutual information, we encounter an ambiguity in attributing influence to the environment or to the system. The partial information decomposition (Williams and Beer 2010; Bertschinger et al. 2013) allows us to resolve this ambiguity by introducing notions of unique, shared, and complementary information.

The mutual information between the future state of the system at time \(n+1\) and the joint state of system and environment at time n is decomposed into four terms:

$$\begin{aligned} I(S_{n+1};S_n,E_n)&= \underbrace{SI(S_{n+1};S_n,E_n)}_{\text {shared}} + \underbrace{CI(S_{n+1}; S_n,E_n)}_{\text {complementary}} \\&\quad+ \underbrace{UI(S_{n+1};S_n\backslash E_n)}_{\text {unique }(S_n\text { wrt }E_n)} + \underbrace{UI(S_{n+1};E_n\backslash S_n)}_{\text {unique }(E_n\text { wrt } S_n)}. \end{aligned}$$ (1)

Those four terms appear in the pairwise mutual information and conditional mutual information that we obtained from the chain rule:

$$\begin{aligned} I(S_{n+1};S_n)&= SI(S_{n+1};S_n,E_n) + UI(S_{n+1};S_n\backslash E_n) , \end{aligned}$$ (2)

$$\begin{aligned} I(S_{n+1};E_n|S_n)&= CI(S_{n+1};S_n,E_n) + UI(S_{n+1};E_n\backslash S_n) , \end{aligned}$$ (3)

$$\begin{aligned} I(S_{n+1};E_n)&= SI(S_{n+1};S_n,E_n) + UI(S_{n+1};E_n\backslash S_n) ,\end{aligned}$$ (4)

$$\begin{aligned} I(S_{n+1};S_n|E_n)&= CI(S_{n+1};S_n,E_n) + UI(S_{n+1};S_n\backslash E_n) , \end{aligned}$$ (5)

In our context the four terms have the following meaning:

a The unique information from the system, \(UI(S_{n+1};S_n \setminus E_n)\). This is information maintained by the system.

b The shared information between the system and environment, \(SI(S_{n+1};S_n,E_n)\).

c The unique information from the environment, \(UI(S_{n+1};E_n \setminus S_n)\). This quantifies the influence of the environment on the system (information flow in the narrow sense).

d The complementary or synergistic information, \(CI(S_{n+1};S_n,E_n)\). Information that is only present in the interaction of system and environment.

It is important to emphasize that these decompositions are a means of supporting our formal intuition and do not by themselves uniquely specify the information-theoretic quantities. The choice of definition remains disputed, and several alternative proposals have been published; these are reviewed in a special issue of the journal Entropy (Lizier et al. 2018). Nevertheless, the measures that we derive fully accord with the conceptual decomposition.
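To make this concrete, one contested but simple choice is the original Williams–Beer redundancy measure \(I_{\min }\). The sketch below is our own illustration (the function names, the toy distribution, and the choice of redundancy measure are ours, not prescribed by the text); it splits \(I(S_{n+1};S_n,E_n)\) into the four terms of Eq. (1):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel(); p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def williams_beer_pid(p):
    """Decompose I(Z; X, Y) for a joint table p[x, y, z] into shared (SI), complementary (CI)
    and unique (UI) terms, using the Williams-Beer redundancy I_min as one (contested) choice."""
    px, py, pz = p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))
    pxz, pyz = p.sum(axis=1), p.sum(axis=0)

    def specific_info(paz, pa, z):
        # I(Z = z; A) = sum_a p(a|z) * log2( p(z|a) / p(z) )
        total = 0.0
        for a in range(paz.shape[0]):
            if paz[a, z] > 0:
                total += (paz[a, z] / pz[z]) * np.log2((paz[a, z] / pa[a]) / pz[z])
        return total

    SI = sum(pz[z] * min(specific_info(pxz, px, z), specific_info(pyz, py, z))
             for z in range(p.shape[2]) if pz[z] > 0)

    I_xz = H(px) + H(pz) - H(pxz)               # I(Z; X)
    I_yz = H(py) + H(pz) - H(pyz)               # I(Z; Y)
    I_xyz = H(p.sum(axis=2)) + H(pz) - H(p)     # I(Z; X, Y)

    UI_x, UI_y = I_xz - SI, I_yz - SI           # from Eqs. (2) and (4)
    CI = I_xyz - SI - UI_x - UI_y               # from Eq. (1)
    return SI, CI, UI_x, UI_y

# Here X plays the role of S_n, Y the role of E_n, and Z the role of S_{n+1}.
p = np.random.default_rng(3).random((2, 2, 2)); p /= p.sum()
print(williams_beer_pid(p))
```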

Forms of individuality

With a good understanding of the implications of the partial information decomposition in hand, we can now rigorously define three forms of individuality, plus an additional measure quantifying the contribution of each in the case of hybrid types. These measures are defined in terms of the information that is shared by system and environment (e.g., adaptive information), information that is unique to either the system or the environment (e.g., memory in each), and information that depends in some complicated way on both the system and the environment (e.g., regulatory information).

Organismal Individuality \(A^*\) $$\begin{aligned} A^{*}=SI(S_{n+1};S_n,E_n) + UI(S_{n+1};S_n\backslash E_n) \end{aligned}$$ Organisms are well adapted when, through adaptation or learning, they share significant information with the environment in which they live. In addition, they contain a large amount of private information required for effective function. By maximizing this measure, we are able to identify complex organisms in their environments.

Colonial Individuality A $$\begin{aligned} A=CI(S_{n+1};S_n,E_n) + UI(S_{n+1};S_n\backslash E_n) \end{aligned}$$ Many organisms such as microbes share only a small amount of information with the environment in which they live. They contain regulatory mechanisms that allow for adaptation through ongoing interaction with their biotic and abiotic environment. By maximizing this measure, we are able to identify “environmentally regulated aggregations,” which we call “colonial individuals.”

Environmental determination nC $$\begin{aligned} nC&= {} I(S_{n+1};E_n|S_n) = CI(S_{n+1};S_n,E_n) \\&\quad + UI(S_{n+1};E_n\backslash S_n) \end{aligned}$$ This measure quantifies the degree of environmental determinism on the temporal evolution of an individual. When this measure is minimized an individual becomes completely insensitive to the environment—and hence is neither in the organismal nor the colonial form—and not in any real sense adaptive. It represents the persistence of an environmental memory capable, through interaction with the system, of generating structure, such as temperature gradients in a fluid that produce vortices.

Environmental Coding $$\begin{aligned} NTIC=SI(S_{n+1};S_n,E_n) - CI(S_{n+1};S_n,E_n) \end{aligned}$$ The intuition behind this measure is to quantify the difference between the colonial and organismal measures of individuality. This is captured by the difference between shared information (e.g., adaptive information) and the interaction of individual and environment (e.g., regulatory information). One way to think about this is how much information about the environment can be encoded in the system innately (e.g., inherited information) versus how much information needs to be encoded through ongoing interaction. When the measure is large, nature dominates nurture. As the measure declines, nurture begins to dominate nature.
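Combining Eqs. (3) and (4), this difference equals \(I(S_{n+1};E_n) - I(S_{n+1};E_n|S_n)\), so unlike the individual decomposition terms it can be computed without committing to a particular redundancy measure. A minimal sketch of our own (arbitrary toy distribution):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel(); p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def ntic(p):
    """NTIC = SI - CI = I(S_{n+1}; E_n) - I(S_{n+1}; E_n | S_n) for a joint table p[s_n, e_n, s_next]."""
    ps, pe, pnext = p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))
    pse, pe_next, ps_next = p.sum(axis=2), p.sum(axis=0), p.sum(axis=1)

    i_e_next = H(pe) + H(pnext) - H(pe_next)                 # I(S_{n+1}; E_n)
    i_e_next_given_s = H(ps_next) - H(ps) - H(p) + H(pse)    # I(S_{n+1}; E_n | S_n)
    return i_e_next - i_e_next_given_s

p = np.random.default_rng(4).random((2, 2, 2)); p /= p.sum()
print(ntic(p))
```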

Individuality measures in an illustrative example

To gain a better understanding of each of these measures, we work through a quantitative example.

We consider two binary units \(E_n\) and \(S_n\), with state sets \(\{-1, + 1\}\). Following the general structure introduced in sect. 2.1 and Fig. 1, these states are synchronously updated according to the following conditional distribution:

$$\begin{aligned} p(s_{n+1},e_{n+1} | s_n,e_n) \, = \, p_S(s_{n+1} | s_n, e_n) \cdot p_E(e_{n+1} | s_n,e_n), \end{aligned}$$

where

$$\begin{aligned} p_S(s_{n+1} | s_n, e_n)= {} \frac{1}{1 + \mathrm{e}^{- 2 s_{n+1} \left( \delta _S + \alpha _S s_n + \beta _S e_n + \gamma _S s_n e_n\right) }} \end{aligned}$$ (6)

$$\begin{aligned} p_E(e_{n+1} | s_n, e_n)= \frac{1}{1 + \mathrm{e}^{- 2 e_{n+1} \left( \delta _E + \alpha _E e_n + \beta _E s_n + \gamma _E s_n e_n\right) }}. \end{aligned}$$ (7)

Evaluating the individual conditional distributions, we obtain

$$\begin{aligned} p_S(+1 | +1, +1)&= \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _S + \alpha _S + \beta _S + \gamma _S \right) }} =: a_S \\ p_S(+ 1 | -1, +1)&= \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _S - \alpha _S + \beta _S - \gamma _S \right) }} =: b_S \\ p_S(+ 1 | +1,-1)&= \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _S + \alpha _S - \beta _S - \gamma _S \right) }} =: c_S \\ p_S(+ 1 | -1,-1)&= \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _S - \alpha _S - \beta _S + \gamma _S \right) }} =: d_S \end{aligned}$$

and correspondingly

$$\begin{aligned} p_E(+1 | +1, +1)&= {} \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _E + \alpha _E + \beta _E + \gamma _E \right) }} =: a_E \\ p_E(+ 1 | -1, +1)&= {} \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _E + \alpha _E - \beta _E - \gamma _E \right) }} =: b_E \\ p_E(+ 1 | +1,-1)&= {} \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _E - \alpha _E + \beta _E - \gamma _E \right) }} =: c_E \\ p_E(+ 1 | -1,-1)&= {} \frac{1}{1 + \mathrm{e}^{- 2 \left( \delta _E - \alpha _E - \beta _E + \gamma _E \right) }} =: d_E \end{aligned}$$

Finally, this yields the following stochastic matrix with entries \(p(s_{n+1},e_{n+1} | s_n,e_n)\):

$$\begin{aligned} \begin{array}{ c || c | c | c | c |} &{} (+ 1, + 1) &{} (- 1, + 1) &{} (+ 1, -1) &{} (-1 , -1) \\ \hline \hline (+ 1, + 1) &{} a_S a_E &{} (1 - a_S) a_E &{} a_S (1 - a_E) &{} (1 - a_S) (1 - a_E) \\ (- 1, + 1) &{} b_S b_E &{} (1 - b_S) b_E &{} b_S (1 - b_E) &{} (1 - b_S) (1 - b_E) \\ (+ 1, - 1) &{} c_S c_E &{} (1 - c_S) c_E &{} c_S (1 - c_E) &{} (1 - c_S) (1 - c_E) \\ (- 1, - 1) &{} d_S d_E &{} (1 - d_S) d_E &{} d_S (1 - d_E) &{} (1 - d_S) (1 - d_E) \\ \hline \end{array} \end{aligned}$$
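The construction of this matrix from Eqs. (6) and (7) is straightforward to reproduce. The sketch below is our own, with arbitrary illustrative parameter values (the figures sweep \(\alpha _S\), \(\beta _S\), and \(\gamma _S\)):

```python
import numpy as np

def update_table(delta, alpha, beta, gamma):
    """p(next = +1 | own, other) for the logistic update of Eqs. (6)-(7): alpha couples a unit to
    its own previous state, beta to the other unit, gamma to their product."""
    return {(own, other): 1.0 / (1.0 + np.exp(-2 * (delta + alpha * own + beta * other + gamma * own * other)))
            for own in (+1, -1) for other in (+1, -1)}

pS = update_table(delta=0.0, alpha=1.0, beta=0.5, gamma=0.0)   # system update, keyed by (s_n, e_n)
pE = update_table(delta=0.0, alpha=0.0, beta=0.0, gamma=0.0)   # random environment, keyed by (e_n, s_n)

states = [(+1, +1), (-1, +1), (+1, -1), (-1, -1)]              # (s, e) ordering as in the matrix above
T = np.zeros((4, 4))
for i, (s, e) in enumerate(states):
    for j, (s2, e2) in enumerate(states):
        prob_s2 = pS[(s, e)] if s2 == +1 else 1.0 - pS[(s, e)]
        prob_e2 = pE[(e, s)] if e2 == +1 else 1.0 - pE[(e, s)]
        T[i, j] = prob_s2 * prob_e2                            # p(s2, e2 | s, e)

print(T.sum(axis=1))   # each row sums to 1

# Stationary distribution (left eigenvector of T for eigenvalue 1). From it and T one can build
# the joint over (s_n, e_n, s_{n+1}) and evaluate the individuality measures defined above.
w, v = np.linalg.eig(T.T)
pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()
print(pi)
```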

Fig. 2 Mutual information between two time steps (Total_MI), entropy of the system (H_sys), colonial (A) and organismal (A_star) individuality, and environmental determination (nC) for different values of \(\alpha _S\), \(\beta _S\), and \(\gamma _S\) (subscript “S” omitted in the figure) with a random environment, \(\alpha _E=\beta _E=\gamma _E=0\)

Fig. 3 Mutual information between two time steps (Total_MI), entropy of the system (H_sys), colonial (A) and organismal (A_star) individuality, and environmental determination (nC) for different values of \(\alpha _S\), \(\beta _S\), and \(\gamma _S\) with a correlated environment, \(\alpha _E=2\), \(\beta _E=\gamma _E=0\)

We apply each individuality measure to this stochastic process. The results of this analysis are shown in Fig. 2 for a random environment and in Fig. 3 for an environment with memory. The panels sweep through three coupling parameters for the system state \(s_{n+1}\): \(\alpha _S\), the coupling of the system state to its previous state \(s_n\); \(\beta _S\), the coupling to the environment; and \(\gamma _S\), the coupling mediating the combined influence of the previous system and environmental states. When \(\gamma _S=0\), we are not imposing any higher-order correlations on the time series.

When \(\gamma _S=0\), we detect colonial individuals as well as organismal individuals most readily at high values of \(\alpha _S\) and \(\beta _S\). When there are no higher-order interactions between system and environment, these two types of individual become indistinguishable in this parameter region and represent unique information in the system state. Both forms of individuality become more visible as more information is transmitted into the future. In the case of a non-random environment, the system can adapt to the environment and we observe high values of \(A^\star\) together with low values of A and nC for high values of \(|\beta _S|\) and low values of \(\gamma _S\). Thus, the information flow from the environment into the system, represented by high values of nC in the case of the random environment, is now internalized into the system.

As the value of \(\gamma _S\) increases, the signatures of the organismal and colonial individuals diverge. Colonial individuals are most apparent at low values of \(\alpha _S\) and \(\beta _S\), where most of the information persistence derives from ongoing interactions between system and environment. Organismal individuals begin to disappear at high \(\gamma _S\) as autonomy is lost; organismal individuality is preserved only at high levels of \(\alpha _S\).

The environmentally determined information nC goes from being distinct from colonial individuality at low \(\gamma _S\) to becoming almost indistinguishable from it at high values of \(\gamma _S\). This is because when the system and environment become strongly coupled, complementary information comes to dominate the signal, and the environment on its own becomes less predictive of the future state of the system.

The effect of \(\gamma _S\) is to reduce the total entropy of the system (by creating systematic correlations and hence regularities in the information channel), and to reverse the pattern of total mutual information between successive time steps. This value is a minimum for low \(\alpha _S\) and \(\beta _S\) when \(\gamma _S=0\) and a maximum when \(\gamma _S=5.\)

From the previous empirical example, we discern a process for identifying different forms of informational individuals in a more general setting. We find that system–environment distinctions increase in those parameters that increase independent memory (\(\alpha , \beta\)) when higher-order coupling is low. When this coupling increases, organismal individuals disappear and colonial individuals appear with reduced independent memories.

Let us assume that the transition parameters are held constant and we vary the system states. By systematically increasing the number of variables that we assign to the target system while reducing the environmental states, we can deduce whether this procedure leads to an increase in a suitable individuality measure.

If the expansion of the boundary of the system does not lead to an increase in information, then we have incorporated an environmental variable needlessly. In this way, individuals maximize their prediction of the future while minimizing their coding capacity. If individuality increases as we expand our system and environmental determination decreases, then we have grounds for the belief that we are capturing more of the individual by including more processes formerly treated as environmental.
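One naive way to operationalize this procedure is a greedy loop that moves a variable from the environment into the system only if doing so strictly increases the measure. The sketch below is our own (the helper a_star, the toy distribution, and the tolerance are illustrative choices; as noted at the end of this section, a cost term on system size would be needed for the expansion to terminate at a meaningful boundary):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel(); p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def a_star(p, system_vars, k):
    """A* = I(S_{n+1}; S_n) when the (sorted) variable indices in system_vars are treated as the system.
    p has one binary axis per variable at time n (axes 0..k-1) and at time n+1 (axes k..2k-1)."""
    keep = list(system_vars) + [k + i for i in system_vars]
    drop = tuple(ax for ax in range(2 * k) if ax not in keep)
    p_ss = p.sum(axis=drop)                                  # joint over (S_n, S_{n+1})
    half = len(system_vars)
    p_sn = p_ss.sum(axis=tuple(range(half, 2 * half)))       # marginal of S_n
    p_sn1 = p_ss.sum(axis=tuple(range(half)))                # marginal of S_{n+1}
    return H(p_sn) + H(p_sn1) - H(p_ss)

# Toy joint distribution over k = 3 binary variables at two time steps (arbitrary, for illustration).
k = 3
p = np.random.default_rng(5).random((2,) * (2 * k)); p /= p.sum()

# Greedy boundary expansion: a variable crosses from environment into system
# only if it strictly increases the individuality measure.
system = [0]
for candidate in range(1, k):
    if a_star(p, sorted(system + [candidate]), k) > a_star(p, sorted(system), k) + 1e-9:
        system.append(candidate)
print(sorted(system))
```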

Let us denote the original system with S and the part of the initial environment which becomes the system by \(\Delta S\). The remaining environment should be denoted by \(E'\).

Environmental determination We get the two information flows $$\begin{aligned} nC&= I(S_{n+1};E_n|S_n) \\&= I(S_{n+1};E'_n \Delta S_n|S_n) \end{aligned}$$ and $$\begin{aligned} nC'=I(S_{n+1} \Delta S_{n+1};E'_n| \Delta S_n,S_n) \end{aligned}$$ Using some algebra we get $$\begin{aligned} nC'&= nC-I(S_{n+1};\Delta S_n|S_n)\\&\quad+I(\Delta S_{n+1};E'_n|\Delta S_n, S_{n+1},S_n) \end{aligned}$$ The first term subtracts the information flow which is now internalized, and the second term adds the flow which previously resided in the environment. Clearly, the system becomes more closed if the former, now internalized, flow is larger than the latter.
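For completeness, the algebra referred to above amounts to two applications of the chain rule for conditional mutual information (these intermediate steps are ours):

$$\begin{aligned} nC'&=I(S_{n+1} \Delta S_{n+1};E'_n|\Delta S_n,S_n) \\&= I(S_{n+1};E'_n|\Delta S_n,S_n)+I(\Delta S_{n+1};E'_n|\Delta S_n, S_{n+1},S_n) , \\ nC&=I(S_{n+1};E'_n \Delta S_n|S_n) = I(S_{n+1};\Delta S_n|S_n)+I(S_{n+1};E'_n|\Delta S_n,S_n) , \end{aligned}$$

and substituting the second identity into the first yields the expression for \(nC'\) given above.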

Organismal individuality Let us start with the simpler measure \(A^*\), the mutual information between subsequent states: $$\begin{aligned} A^*=I(S_{n+1};S_n) \end{aligned}$$ and $$\begin{aligned} A'^*&= I(S_{n+1} \Delta S_{n+1};S_n \Delta S_n) \\&= A^*+I(\Delta S_{n+1};S_n|S_{n+1})+I(S_{n+1} \Delta S_{n+1};\Delta S_n|S_n) . \end{aligned}$$

Colonial individuality $$\begin{aligned} A&= I(S_{n+1};S_n|E_n) \\&= I(S_{n+1};S_n|E'_n \Delta S_n) \\ A'&= I(S_{n+1} \Delta S_{n+1};S_n \Delta S_{n}|E'_n) \\&= A+I(S_{n+1} \Delta S_{n+1};\Delta S_n|E'_n)\\&\quad +I(\Delta S_{n+1};S_n|E'_n \Delta S_n S_{n+1}) \\&= A+I(S_{n+1};\Delta S_n|E'_n)+I(\Delta S_{n+1};S_n \Delta S_n|E'_n S_{n+1}) \end{aligned}$$

Both individuality measures can only grow or stay constant with increasing system size when information is available; they never decrease. Thus, they are not sufficient to detect the precise boundaries between individuals. In order to obtain precise boundaries we would need to impose a cost function—or regularizer—on system size to establish a threshold for termination. Our objective here is not to find the optimal partition but to present different informational “windows” on individuality.