The first problem: What determines to what extent a system has conscious experience?

We all know that our own consciousness waxes when we awaken and wanes when we fall asleep. We may also know first-hand that we can "lose consciousness" after receiving a blow on the head, or after taking certain drugs, such as general anesthetics. Thus, everyday experience indicates that consciousness has a physical substrate, and that that physical substrate must be working in the proper way for us to be fully conscious. It also prompts us to ask, more generally, what may be the conditions that determine to what extent consciousness is present. For example, are newborn babies conscious, and to what extent? Are animals conscious? If so, are some animals more conscious than others? And can they feel pain? Can a conscious artifact be constructed with non-neural ingredients? Is a person with akinetic mutism – awake with eyes open, but mute, immobile, and nearly unresponsive – conscious or not? And how much consciousness is there during sleepwalking or psychomotor seizures? It would seem that, to address these questions and obtain a genuine understanding of consciousness, empirical studies must be complemented by a theoretical analysis.

Consciousness as information integration

The theory presented here claims that consciousness has to do with the capacity to integrate information. This claim may not seem self-evident, perhaps because, being endowed with consciousness for most of our existence, we take it for granted. To gain some perspective, it is useful to resort to some thought experiments that illustrate key properties of subjective experience: its informativeness, its unity, and its spatio-temporal scale.

Information

Consider the following thought experiment. You are facing a blank screen that is alternately on and off, and you have been instructed to say "light" when the screen turns on and "dark" when it turns off. A photodiode – a very simple light-sensitive device – has also been placed in front of the screen, and is set up to beep when the screen emits light and to stay silent when the screen does not. The first problem of consciousness boils down to this. When you differentiate between the screen being on or off, you have the conscious experience of "seeing" light or dark. The photodiode can also differentiate between the screen being on or off, but presumably it does not consciously "see" light and dark. What is the key difference between you and the photodiode that makes you "see" light consciously? (see Appendix, i)

According to the theory, the key difference between you and the photodiode has to do with how much information is generated when that differentiation is made. Information is classically defined as reduction of uncertainty among a number of alternative outcomes when one of them occurs [4]. It can be measured by the entropy function, which is the weighted sum of the logarithm of the probability (p_i) of the alternative outcomes (i): $H = -\sum_i p_i \log_2 p_i$. Thus, tossing a fair coin and obtaining heads corresponds to 1 bit of information, because there are just two equally likely alternatives; throwing a fair die yields $\log_2 6 \approx 2.58$ bits of information, because there are six equally likely alternatives (H decreases if some of the outcomes are more likely than others, as would be the case with a loaded die).
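The coin and die figures can be checked with a few lines of Python. This is only an illustrative sketch: the function name `entropy_bits` and the particular loaded-die probabilities are assumptions made for the example, not values from the text.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = entropy_bits([0.5, 0.5])          # fair coin: two equally likely outcomes
die = entropy_bits([1 / 6] * 6)          # fair die: six equally likely outcomes
# Hypothetical loaded die: one face comes up half the time.
loaded = entropy_bits([0.5, 0.1, 0.1, 0.1, 0.1, 0.1])

print(f"coin:   {coin:.3f} bits")
print(f"die:    {die:.3f} bits")
print(f"loaded: {loaded:.3f} bits")
```

The loaded die comes out below the fair die, in line with the remark that H decreases when some outcomes are more likely than others.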

When the blank screen turns on, the photodiode enters one of its two possible alternative states and beeps. As with the coin, this corresponds to 1 bit of information. However, when you see the blank screen turn on, the state you enter, unlike the photodiode, is one out of an extraordinarily large number of possible states. That is, the photodiode's repertoire is minimally differentiated, while yours is immensely so. It is not difficult to see this. For example, imagine that, instead of turning homogeneously on, the screen were to display at random every frame from every movie that was or could ever be produced. Without any effort, each of these frames would cause you to enter a different state and "see" a different image. This means that when you enter the particular state ("seeing light") you rule out not just "dark", but an extraordinarily large number of alternative possibilities. Whether you think or not of the bewildering number of alternatives (and you typically don't), this corresponds to an extraordinary amount of information (see Appendix, ii). This point is so simple that its importance has been overlooked.

Integration

While the ability to differentiate among a very large number of states is a major difference between you and the lowly photodiode, by itself it is not enough to account for the presence of conscious experience. To see why, consider an idealized one megapixel digital camera, whose sensor chip is essentially a collection of one million photodiodes. Even if each photodiode in the sensor chip were just binary, the camera as such could differentiate among $2^{1,000,000}$ states, an immense number, corresponding to 1,000,000 bits of information. Indeed, the camera would easily enter a different state for every frame from every movie that was or could ever be produced. Yet nobody would believe that the camera is conscious. What is the key difference between you and the camera?

According to the theory, the key difference between you and the camera has to do with information integration. From the perspective of an external observer, the camera chip can certainly enter a very large number of different states, as could easily be demonstrated by presenting it with all possible input signals. However, the sensor chip can be considered just as well as a collection of one million photodiodes with a repertoire of two states each, rather than as a single integrated system with a repertoire of $2^{1,000,000}$ states. This is because, due to the absence of interactions among the photodiodes within the sensor chip, the state of each element is causally independent of that of the other elements, and no information can be integrated among them. Indeed, if the sensor chip were literally cut down into its individual photodiodes, the performance of the camera would not change at all.
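The point about independent photodiodes can be illustrated numerically on a much smaller sensor. The sketch below assumes ten binary elements that are each on with probability 0.5 and statistically independent; the names `entropy_bits`, `n_elements`, and `p_on` are hypothetical. Under that assumption the entropy of the whole sensor equals the sum of the entropies of its parts, so describing it as a single system adds nothing.

```python
import math
from itertools import product

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n_elements = 10   # a toy sensor; the camera in the text has one million
p_on = 0.5        # assumed probability that a single photodiode is on

# Joint distribution over all 2**n states, built as a product of independent elements.
joint = [
    math.prod(p_on if bit else 1 - p_on for bit in state)
    for state in product((0, 1), repeat=n_elements)
]

h_joint = entropy_bits(joint)                           # entropy of the sensor as a whole
h_parts = n_elements * entropy_bits([p_on, 1 - p_on])   # sum of per-element entropies

print(len(joint), h_joint, h_parts)
```

Because the two entropies coincide, entropy alone cannot tell a collection of independent elements from an integrated system with the same number of states.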

By contrast, the repertoire of states available to you cannot be subdivided into the repertoire of states available to independent components. This is because, due to the multitude of causal interactions among the elements of your brain, the state of each element is causally dependent on that of other elements, which is why information can be integrated among them. Indeed, unlike disconnecting the photodiodes in a camera sensor, disconnecting the elements of your brain that underlie consciousness has disastrous effects. The integration of information in conscious experience is evident phenomenologically: when you consciously "see" a certain image, that image is experienced as an integrated whole and cannot be subdivided into component images that are experienced independently. For example, no matter how hard you try, you cannot experience colors independently of shapes, or the left half of the visual field independently of the right half. Indeed, the only way to do so is to physically split the brain in two to prevent information integration between the two hemispheres. But then, such split-brain operations yield two separate subjects of conscious experience, each of them having a smaller repertoire of available states and more limited performance [5].

Spatio-temporal characteristics

Finally, it is important to appreciate that conscious experience unfolds at a characteristic spatio-temporal scale. For instance, it flows in time at a characteristic speed and cannot be much faster or much slower. No matter how hard you try, you cannot speed up experience to follow a movie accelerated a hundred times, nor can you slow it down if the movie is decelerated. Studies of how a percept is progressively specified and stabilized – a process called microgenesis – indicate that it takes up to 100–200 milliseconds to develop a fully formed sensory experience, and that the surfacing of a conscious thought may take even longer [6]. In fact, the emergence of a visual percept is somewhat similar to the developing of a photographic print: first there is just the awareness that something has changed, then that it is something visual rather than, say, auditory, later some elementary features become apparent, such as motion, localization, and rough size, then colors and shapes emerge, followed by the formation of a full object and its recognition – a sequence that clearly goes from less to more differentiated [6]. Other evidence indicates that a single conscious moment does not extend beyond 2–3 seconds [7]. While it is arguable whether conscious experience unfolds more akin to a series of discrete snapshots or to a continuous flow, its time scale certainly lies between these lower and upper limits. Thus, a phenomenological analysis indicates that consciousness has to do with the ability to integrate a large amount of information, and that such integration occurs at a characteristic spatio-temporal scale.

Measuring the capacity to integrate information: The Φ of a complex

If consciousness corresponds to the capacity to integrate information, then a physical system should be able to generate consciousness to the extent that it has a large repertoire of available states (information), yet it cannot be decomposed into a collection of causally independent subsystems (integration). How can one identify such an integrated system, and how can one measure its repertoire of available states [2, 8]?

As was mentioned above, to measure the repertoire of states that are available to a system, one can use the entropy function, but this way of measuring information is completely insensitive to whether the information is integrated. Thus, measuring entropy would not allow us to distinguish between one million photodiodes with a repertoire of two states each, and a single integrated system with a repertoire of $2^{1,000,000}$ states. To measure information integration, it is essential to know whether a set of elements constitutes a causally integrated system, or whether it can be broken down into a number of independent or quasi-independent subsets among which no information can be integrated.

To see how one can achieve this goal, consider an extremely simplified system constituted of a set of elements. To make matters slightly more concrete, assume that we are dealing with a neural system. Each element could represent, for instance, a group of locally interconnected neurons that share inputs and outputs, such as a cortical minicolumn. Assume further that each element can go through discrete activity states, corresponding to different firing levels, each of which lasts for a few hundred milliseconds. Finally, for the present purposes, let us imagine that the system is disconnected from external inputs, just as the brain is virtually disconnected from the environment when it is dreaming.

Effective information

Consider now a subset S of elements taken from such a system, and the diagram of causal interactions among them (Fig. 1a). We want to measure the information generated when S enters a particular state out of its repertoire, but only to the extent that such information can be integrated, i.e. each state results from causal interactions within the system. How can one do so? One way is to divide S into two complementary parts A and B, and evaluate the responses of B that can be caused by all possible inputs originating from A. In neural terms, we try out all possible combinations of firing patterns as outputs from A, and establish how differentiated the repertoire of firing patterns they produce in B is. In information-theoretical terms, we give maximum entropy to the outputs from A ($A^{H^{max}}$), i.e. we substitute its elements with independent noise sources, and we determine the entropy of the responses of B that can be induced by inputs from A. Specifically, we define the effective information between A and B as $EI(A \to B) = MI(A^{H^{max}}; B)$. Here $MI(A;B) = H(A) + H(B) - H(AB)$ stands for mutual information, a measure of the entropy or information shared between a source (A) and a target (B). Note that since A is substituted by independent noise sources, there are no causal effects of B on A; therefore the entropy shared by B and A is necessarily due to causal effects of A on B. Moreover, EI(A→B) measures all possible effects of A on B, not just those that are observed if the system were left to itself. Also, effective information is in general not symmetric: EI(A→B) and EI(B→A) can differ. Finally, note that the value of EI(A→B) is bounded by the maximum entropy available to A or to B, $H^{max}(A)$ or $H^{max}(B)$, whichever is less. In summary, to measure EI(A→B), one needs to apply maximum entropy to the outputs from A, and determine the entropy of the responses of B that are induced by inputs from A.
It should be apparent from the definition that EI(A→B) will be high if the connections between A and B are strong and specialized, such that different outputs from A will induce different firing patterns in B. On the other hand, EI(A→B) will be low or zero if the connections between A and B are such that different outputs from A produce scarce effects, or if the effect is always the same. For a given bipartition of a subset, then, the sum of the effective information for both directions is indicated as EI(A↔B) = EI(A→B) + EI(B→A). Thus, EI(A↔B) measures the repertoire of possible causal effects of A on B and of B on A.
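The definition can be made concrete for a toy discrete case. The sketch below assumes binary elements and a deterministic response of B to A; the function `effective_information` and the three example response rules are hypothetical illustrations, not the author's implementation. A is replaced by independent fair coin flips (maximum entropy), and the mutual information between A's outputs and B's responses is then computed from the definition MI(A;B) = H(A) + H(B) - H(AB).

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

def effective_information(n_a, respond):
    """EI(A->B) in bits for n_a binary elements in A.

    `respond` is an assumed deterministic rule mapping A's output pattern
    (a tuple of 0/1) to the resulting state of B.
    """
    a_states = list(product((0, 1), repeat=n_a))          # all outputs, equally likely
    h_a = entropy_bits(Counter(a_states))
    h_b = entropy_bits(Counter(respond(a) for a in a_states))
    h_ab = entropy_bits(Counter((a, respond(a)) for a in a_states))
    return h_a + h_b - h_ab                                # mutual information

ei_copy = effective_information(2, lambda a: a)                  # B mirrors A: specialized effects
ei_const = effective_information(2, lambda a: (0, 0))            # B ignores A: always the same effect
ei_or = effective_information(2, lambda a: (a[0] | a[1],))       # B keeps only a coarse summary

print(ei_copy, ei_const, ei_or)
```

The copy rule yields the full 2 bits, the constant rule yields zero, and the OR rule falls in between, matching the contrast between strong, specialized connections and connections whose effect is always the same.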

Figure 1. Effective information, minimum information bipartition, and complexes. a. Effective information. Shown is a single subset S of 4 elements ({1,2,3,4}, blue circle), forming part of a larger system X (black ellipse). This subset is bisected into A and B by a bipartition ({1,3}/{2,4}, indicated by the dotted grey line). Arrows indicate causally effective connections linking A to B and B to A across the bipartition (other connections may link both A and B to the rest of the system X). To measure EI(A→B), maximum entropy $H^{max}$ is injected into the outgoing connections from A (corresponding to independent noise sources). The entropy of the states of B that is due to the input from A is then measured. Note that A can affect B directly through connections linking the two subsets, as well as indirectly via X. Applying maximum entropy to B allows one to measure EI(B→A). The effective information for this bipartition is EI(A↔B) = EI(A→B) + EI(B→A). b. Minimum information bipartition. For subset S = {1,2,3,4}, the horizontal bipartition {1,3}/{2,4} yields a positive value of EI. However, the bipartition {1,2}/{3,4} yields EI = 0 and is a minimum information bipartition (MIB) for this subset. The other bipartitions of subset S = {1,2,3,4} are {1,4}/{2,3}, {1}/{2,3,4}, {2}/{1,3,4}, {3}/{1,2,4}, {4}/{1,2,3}, all with EI>0. c. Analysis of complexes. By considering all subsets of system X one can identify its complexes and rank them by the respective values of Φ – the value of EI for their minimum information bipartition. Assuming that other elements in X are disconnected, it is easy to see that Φ>0 for subsets {3,4} and {1,2}, but Φ = 0 for subsets {1,3}, {1,4}, {2,3}, {2,4}, {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, and {1,2,3,4}. Subsets {3,4} and {1,2} are not part of a larger subset having higher Φ, and therefore they constitute complexes. This is indicated schematically by having them encircled by a grey oval (darker grey indicates higher Φ). Methodological note.
In order to identify complexes and their Φ(S) for systems with many different connection patterns, each system X was implemented as a stationary multidimensional Gaussian process such that values for effective information could be obtained analytically (details in [8]). Briefly, we implemented numerous model systems X composed of n neural elements with connections $CON_{ij}$ specified by a connection matrix CON(X) (no self-connections). In order to compare different architectures, CON(X) was normalized so that the absolute value of the sum of the afferent synaptic weights per element corresponded to a constant value w<1 (here w = 0.5). If the system's dynamics corresponds to a multivariate Gaussian random process, its covariance matrix COV(X) can be derived analytically. As in previous work, we consider the vector X of random variables that represents the activity of the elements of X, subject to independent Gaussian noise R of magnitude c. We have that, when the elements settle under stationary conditions, $X = X \cdot CON(X) + cR$. By defining $Q = (1 - CON(X))^{-1}$ and averaging over the states produced by successive values of R, we obtain the covariance matrix $COV(X) = \langle X^t X \rangle = \langle Q^t R^t R Q \rangle = Q^t Q$, where the superscript t refers to the transpose. Under Gaussian assumptions, all deviations from independence among the two complementary parts A and B of a subset S of X are expressed by the covariances among the respective elements. Given these covariances, values for the individual entropies H(A) and H(B), as well as for the joint entropy of the subset H(S) = H(AB) can be obtained as, for example, $H(A) = \frac{1}{2} \ln[(2\pi e)^n |COV(A)|]$, where |•| denotes the determinant. The mutual information between A and B is then given by $MI(A;B) = H(A) + H(B) - H(AB)$. Note that MI(A;B) is symmetric and positive.
To obtain the effective information between A and B within model systems, independent noise sources in A are enforced by setting to zero strength the connections within A and afferent to A. Then the covariance matrix for A is equal to the identity matrix (given independent Gaussian noise), and any statistical dependence between A and B must be due to the causal effects of A on B, mediated by the efferent connections of A. Moreover, all possible outputs from A that could affect B are evaluated. Under these conditions, $EI(A \to B) = MI(A^{H^{max}}; B)$. The independent Gaussian noise R applied to A is multiplied by $c_p$, the perturbation coefficient, while the independent Gaussian noise applied to the rest of the system is given by $c_i$, the intrinsic noise coefficient. Here $c_p = 1$ and $c_i = 0.00001$ in order to emphasize the role of the connectivity and minimize that of noise. To identify complexes and obtain their capacity for information integration, one considers every subset S of X composed of k elements, with k = 2,..., n. For each subset S, we consider all bipartitions and calculate EI(A↔B) for each of them. We find the minimum information bipartition MIB(S), the bipartition for which the normalized effective information reaches a minimum, and the corresponding value of Φ(S). We then find the complexes of X as those subsets S with Φ>0 that are not included within a subset having higher Φ and rank them based on their Φ(S) value. The complex with the maximum value of Φ(S) is the main complex. MATLAB functions used for calculating effective information and complexes are at http://tononi.psychiatry.wisc.edu/informationintegration/toolbox.html.
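The Gaussian entropy and mutual information formulas in the methodological note can be tried out on the smallest possible case: one element A driving one element B with weight w. This is a hedged sketch and not the MATLAB toolbox referenced above. The function names and the noise parameters `c_a` and `c_b` (standing in for the perturbation and intrinsic noise coefficients) are assumptions, and the covariance is written out by hand for this two-element case rather than derived from a general connection matrix.

```python
import math

def gaussian_entropy(cov_det, n):
    """H = 1/2 ln[(2*pi*e)^n |COV|], in nats, for an n-dimensional Gaussian."""
    return 0.5 * math.log((2 * math.pi * math.e) ** n * cov_det)

def mi_a_drives_b(w, c_a=1.0, c_b=1.0):
    """MI(A;B) in bits when X_A = c_a * R_A and X_B = w * X_A + c_b * R_B.

    R_A and R_B are independent unit-variance Gaussian noise sources.
    """
    var_a = c_a ** 2
    var_b = (w * c_a) ** 2 + c_b ** 2
    cov_ab = w * c_a ** 2
    det_ab = var_a * var_b - cov_ab ** 2          # determinant of the 2x2 covariance matrix
    mi_nats = (gaussian_entropy(var_a, 1)
               + gaussian_entropy(var_b, 1)
               - gaussian_entropy(det_ab, 2))
    return mi_nats / math.log(2)                  # convert nats to bits

mi_unconnected = mi_a_drives_b(0.0)               # no connection from A to B
mi_unit_noise = mi_a_drives_b(0.5)                # w = 0.5, unit noise on both elements
mi_quiet_target = mi_a_drives_b(0.5, c_b=0.1)     # same weight, less intrinsic noise in B

print(mi_unconnected, mi_unit_noise, mi_quiet_target)
```

In this toy case, lowering the noise on B while keeping the perturbation of A fixed increases the mutual information, which is in the spirit of choosing a small intrinsic noise coefficient to emphasize connectivity.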

Information integration

Based on the notion of effective information for a bipartition, we can assess how much information can be integrated within a system of elements. To this end, we note that a subset S of elements cannot integrate any information (as a subset) if there is a way to partition S in two parts A and B such that EI(A↔B) = 0 (Fig. 1b, vertical bipartition). In such a case, in fact, we would clearly be dealing with at least two causally independent subsets, rather than with a single, integrated subset. This is exactly what would happen with the photodiodes making up the sensor of a digital camera: perturbing the state of some of the photodiodes would make no difference to the state of the others. Similarly, a subset can integrate little information if there is a way to partition it in two parts A and B such that EI(A↔B) is low: the effective information across that bipartition is the limiting factor on the subset's information integration capacity. Therefore in order to measure the information integration capacity of a subset S, we should search for the bipartition(s) of S for which EI(A↔B) reaches a minimum (the informational "weakest link"). Since EI(A↔B) is necessarily bounded by the maximum entropy available to A or B, min{EI(A↔B)}, to be comparable over bipartitions, should be normalized by $H^{max}(A \leftrightarrow B) = \min\{H^{max}(A), H^{max}(B)\}$, the maximum information capacity for each bipartition. The minimum information bipartition MIB(A↔B) of subset S – its "weakest link" – is its bipartition for which the normalized effective information reaches a minimum, corresponding to $\min\{EI(A \leftrightarrow B) / H^{max}(A \leftrightarrow B)\}$. The information integration for subset S, or Φ(S), is simply the (non-normalized) value of EI(A↔B) for the minimum information bipartition: Φ(S) = EI(MIB(A↔B)). The symbol Φ is meant to indicate that the information (the vertical bar "I") is integrated within a single entity (the circle "O", see Appendix, iii).
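The search for the minimum information bipartition can be sketched as follows. Everything numerical here is a stand-in: `CONNECTIONS` is a hypothetical wiring loosely modeled on the four-element subset of Fig. 1 (elements 1 and 2 linked, elements 3 and 4 linked, nothing in between), `toy_ei` simply adds up the strengths of the connections crossing a bipartition instead of computing true effective information, and `h_max` assumes one bit of capacity per element. Only the procedure (enumerate bipartitions, normalize, take the minimum, report the non-normalized value) follows the text.

```python
from itertools import combinations

# Hypothetical directed connection strengths (source, target) -> bits.
CONNECTIONS = {(1, 2): 1.0, (2, 1): 1.0, (3, 4): 1.0, (4, 3): 1.0}

def bipartitions(subset):
    """Yield every unordered bipartition (A, B) of `subset` exactly once."""
    elems = sorted(subset)
    first, rest = elems[0], elems[1:]
    for k in range(len(rest)):                 # A = {first} plus k others; B stays non-empty
        for extra in combinations(rest, k):
            a = frozenset((first, *extra))
            yield a, frozenset(elems) - a

def toy_ei(a, b):
    """Stand-in for EI(A<->B): total strength of connections crossing in either direction."""
    return sum(w for (src, dst), w in CONNECTIONS.items()
               if (src in a and dst in b) or (src in b and dst in a))

def h_max(a, b):
    """Assumed capacity of a bipartition: one bit per element of the smaller part."""
    return min(len(a), len(b))

def phi(subset):
    """Return (Phi, MIB): the non-normalized EI at the bipartition minimizing EI / Hmax."""
    mib = min(bipartitions(subset), key=lambda ab: toy_ei(*ab) / h_max(*ab))
    return toy_ei(*mib), mib

phi_whole, mib_whole = phi({1, 2, 3, 4})
phi_pair, mib_pair = phi({1, 2})
print(phi_whole, mib_whole)
print(phi_pair, mib_pair)
```

With this toy wiring the four-element subset has a bipartition with no crossing connections, so its Φ is zero, while the pair {1,2} has a positive value.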

Complexes

We are now in a position to establish which subsets are actually capable of integrating information, and how much of it (Fig. 1c). To do so, we consider every possible subset S of m elements out of the n elements of a system, starting with subsets of two elements (m = 2) and ending with a subset corresponding to the entire system (m = n). For each of them, we measure the value of Φ, and rank them from highest to lowest. Finally, we discard all those subsets that are included in larger subsets having higher Φ (since they are merely parts of a larger whole). What we are left with are complexes – individual entities that can integrate information. Specifically, a complex is a subset S having Φ>0 that is not included within a larger subset having higher Φ. For a complex, and only for a complex, it is appropriate to say that, when it enters a particular state out of its repertoire, it generates an amount of integrated information corresponding to its Φ value. Of the complexes that make up a given system, the one with the maximum value of Φ(S) is called the main complex (the maximum is taken over all combinations of m>1 out of n elements of the system). Some properties of complexes worth pointing out are, for instance, that a complex can be causally connected to elements that are not part of it (the input and output elements of a complex are called ports-in and ports-out, respectively). Also, the same element can belong to more than one complex, and complexes can overlap.
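Once a Φ value is available for every subset, the selection of complexes is a simple filter. The sketch below takes the Φ values as given; both tables are hypothetical numbers chosen for illustration (the first mimics the pattern of Fig. 1c, where only {1,2} and {3,4} have positive Φ), and the function name `find_complexes` is an assumption.

```python
def find_complexes(phi_by_subset):
    """Return [(subset, phi)] for subsets with phi > 0 that are not contained
    in a larger subset of higher phi, ranked from highest to lowest phi."""
    complexes = [
        (subset, value)
        for subset, value in phi_by_subset.items()
        if value > 0 and not any(
            subset < other and other_value > value      # `<` is proper-subset for frozensets
            for other, other_value in phi_by_subset.items()
        )
    ]
    return sorted(complexes, key=lambda item: item[1], reverse=True)

# Hypothetical values following the pattern of Fig. 1c.
fig1_like = {
    frozenset({1, 2}): 2.0,
    frozenset({3, 4}): 1.0,
    frozenset({1, 3}): 0.0,
    frozenset({1, 2, 3}): 0.0,
    frozenset({1, 2, 3, 4}): 0.0,
}

# Hypothetical values where a pair is merely part of a larger, more integrated whole.
nested = {
    frozenset({1, 2}): 1.0,
    frozenset({1, 2, 3}): 3.0,
}

complexes_fig1 = find_complexes(fig1_like)
complexes_nested = find_complexes(nested)
main_complex = complexes_fig1[0][0]     # the complex with the maximum phi
print(complexes_fig1, complexes_nested, main_complex)
```

In the second table the pair {1,2} is discarded because it sits inside {1,2,3}, which has higher Φ.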

In summary, a system can be analyzed to identify its complexes – those subsets of elements that can integrate information, and each complex will have an associated value of Φ – the amount of information it can integrate (see Appendix, iv). To the extent that consciousness corresponds to the capacity to integrate information, complexes are the "subjects" of experience, being the locus where information can be integrated. Since information can only be integrated within a complex and not outside its boundaries, consciousness as information integration is necessarily subjective, private, and related to a single point of view or perspective [1, 9]. It follows that elements that are part of a complex contribute to its conscious experience, while elements that are not part of it do not, even though they may be connected to it and exchange information with it through ports-in and ports-out.

Information integration over space and time

The Φ value of a complex is dependent on both spatial and temporal scales that determine what counts as a state of the underlying system. In general, there will be a "grain size", in both space and time, at which Φ reaches a maximum. In the brain, for example, synchronous firing of heavily interconnected groups of neurons sharing inputs and outputs, such as cortical minicolumns, may produce significant effects in the rest of the brain, while asynchronous firing of various combinations of individual neurons may be less effective. Thus, Φ values may be higher when cortical minicolumns rather than individual neurons are taken as elements, even if their number is lower. On the other hand, Φ values would be extremely low with elements the size of brain areas. Timewise, Φ values in the brain are likely to show a maximum between tens and hundreds of milliseconds. It is clear, for example, that if one were to stimulate one half of the brain by inducing many different firing patterns, and examine what effects this produces on the other half, no stimulation pattern would produce any effect whatsoever after just a tenth of a millisecond, and Φ would be equal to zero. After say 100 milliseconds, however, there is enough time for differential effects to be manifested, and Φ would grow. On the other hand, given the duration of conduction delays and of postsynaptic currents, much longer intervals are not going to increase Φ values. Indeed, a neural system will soon settle down into states that become progressively more independent of the stimulation. Thus, the search for complexes of maximum Φ should occur over subsets at critical spatial and temporal scales.

To recapitulate, the theory claims that consciousness corresponds to the capacity to integrate information. This capacity, corresponding to the quantity of consciousness, is given by the Φ value of a complex. Φ is the amount of effective information that can be exchanged across the minimum information bipartition of a complex. A complex is a subset of elements with Φ>0 that is not included within a larger subset of higher Φ. The spatial and temporal scales defining the elements of a complex and the time course of their interactions are those that jointly maximize Φ.

The second problem: What determines the kind of consciousness a system has?

Even if we were reasonably sure that a system is conscious, it is not immediately obvious what kind of consciousness it would have. As was mentioned early on, our own consciousness comes in specific and seemingly irreducible qualities, exemplified by different modalities (e.g. vision, audition, pain), submodalities (e.g. visual color and motion), and dimensions (e.g. blue and red). What determines that colors look the way they do, and different from the way music sounds, or pain feels? And why can we not even imagine what a "sixth" sense would feel like? Or consider the conscious experience of others. Does a gifted musician experience the sound of an orchestra the same way you do, or is his experience richer? And what about bats [10]? Assuming that they are conscious, how do they experience the world they sense through echolocation? Is their experience of the world vision-like, audition-like, or completely alien to us? Unless we accept that the kind of consciousness a system has is arbitrary, there must be some necessary and sufficient conditions that determine exactly what kind of experiences it can have. This is the second problem of consciousness.

While it may not be obvious how best to address this problem, we do know that, just as the quantity of our consciousness depends on the proper functioning of a physical substrate – the brain, so does the quality of consciousness. Consider for example the acquisition of new discriminatory abilities, such as becoming expert at wine tasting. Careful studies have shown that we do not learn to distinguish among a large number of different wines merely by attaching the appropriate labels to different sensations that we had had all along. Rather, it seems that we actually enlarge and refine the set of sensations triggered by tasting wines. Similar observations have been made by people who, for professional reasons, learn to discriminate among perfumes, colors, sounds, tactile sensations, and so on. Or consider perceptual learning during development. While infants experience more than just a "buzzing confusion", there is no doubt that perceptual abilities undergo considerable refinement – just consider what your favorite red wine must have tasted like when all you had experienced was milk and water.

These examples indicate that the quality and repertoire of our conscious experience can change as a result of learning. What matters here is that such perceptual learning depends upon specific changes in the physical substrate of our consciousness – notably a refinement and rearranging of connection patterns among neurons in appropriate parts of the thalamocortical system (e.g. [11]). Further evidence for a strict association between the quality of conscious experience and brain organization comes from countless neurological studies. Thus, we know that damage to certain parts of the cerebral cortex forever eliminates our ability to perceive visual motion, while leaving the rest of our consciousness seemingly intact. By contrast, damage to other parts selectively eliminates our ability to perceive colors [12]. There is obviously something about the organization of those cortical areas that makes them contribute different qualities – visual motion and color – to conscious experience. In this regard, it is especially important that the same cortical lesion that eliminates the ability to perceive color or motion also eliminates the ability to remember, imagine, and dream in color or motion. By contrast, lesions of the retina, while making us blind, do not prevent us from remembering, imagining, and dreaming in color (unless they are congenital). Thus, it is something having to do with the organization of certain cortical areas – and not with their inputs from the sensory periphery – that determines the quality of conscious experiences we can have. What is this something?

Characterizing the quality of consciousness as a space of informational relationships: The effective information matrix

According to the theory, just as the quantity of consciousness associated with a complex is determined by the amount of information that can be integrated among its elements, the quality of its consciousness is determined by the informational relationships that causally link its elements [13]. That is, the way information can be integrated within a complex determines not only how much consciousness it has, but also what kind of consciousness. More precisely, the theory claims that the elements of a complex constitute the dimensions of an abstract relational space, the qualia space. The values of effective information among the elements of a complex, by defining the relationships among these dimensions, specify the structure of this space (in a simplified, Cartesian analogue, each element is a Cartesian axis, and the effective information values between elements define the angles between the axes, see Appendix, v). This relational space is sufficient to specify the quality of conscious experience. Thus, the reason why certain cortical areas contribute to conscious experience of color and other parts to that of visual motion has to do with differences in the informational relationships both within each area and between each area and the rest of the main complex. By contrast, the informational relationships that exist outside the main complex – including those involving sensory afferents – do not contribute either to the quantity or to the quality of consciousness.

To exemplify, consider two very simple linear systems of four elements each (Fig. 2). Fig. 2a shows the diagram of causal interactions for the two systems. The system on the left is organized as a divergent digraph: element number 1 sends connections of equal strength to the other three elements. The analysis of complexes shows that this system forms a single complex having a Φ value of 10 bits. The system on the right is organized as a chain: element number 1 is connected to 2, which is connected to 3, which is connected to 4. This system also constitutes a single complex having a Φ value of 10 bits. Fig. 2b shows the effective information matrix for both complexes. This contains the values of EI between each subset of elements and every other subset, corresponding to all informational relationships among the elements (the first row shows the values in one direction, the second row in the reciprocal direction). The elements themselves define the dimensions of the qualia space of each complex, in this case four. The effective information matrix defines the relational structure of the space. This can be thought of as a kind of topology, in that the entries in the matrix can be considered to represent how close such dimensions are to each other (see Appendix, vi). It is apparent that, despite the identical value of Φ and the same number of dimensions, the informational relationships that define the space are different for the two complexes. For example, the divergent complex has many more zero entries, while the chain complex has one entry (subset {1 3} to subset {2 4}) that is twice as strong as all other non-zero entries.
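Part of the difference between the two matrices follows from the wiring alone, which the sketch below makes explicit. It assumes the connections are unidirectional, as the description suggests (1→2, 1→3, 1→4 for the divergent system and 1→2→3→4 for the chain), and it counts the connections running from A to B for every ordered bipartition. This count is only a structural proxy and not an effective information value: it shows which entries must be zero (with no connection from A to B, A cannot affect B) but says nothing reliable about the magnitudes of the non-zero entries. All names are hypothetical.

```python
from itertools import combinations

# Assumed unidirectional wiring, as (source, target) pairs.
DIVERGENT = {(1, 2), (1, 3), (1, 4)}
CHAIN = {(1, 2), (2, 3), (3, 4)}
ELEMENTS = (1, 2, 3, 4)

def directed_bipartitions(elements):
    """Yield every ordered pair (A, B) of complementary non-empty parts."""
    elems = frozenset(elements)
    for k in range(1, len(elems)):
        for a in combinations(sorted(elems), k):
            yield frozenset(a), elems - frozenset(a)

def crossing(edges, a, b):
    """Number of connections that run from an element of A to an element of B."""
    return sum(1 for src, dst in edges if src in a and dst in b)

def zero_entries(edges):
    """Ordered bipartitions with no connection from A to B (entries that must be zero)."""
    return [(a, b) for a, b in directed_bipartitions(ELEMENTS) if crossing(edges, a, b) == 0]

zeros_divergent = zero_entries(DIVERGENT)
zeros_chain = zero_entries(CHAIN)
chain_13_to_24 = crossing(CHAIN, frozenset({1, 3}), frozenset({2, 4}))

print(len(zeros_divergent), len(zeros_chain), chain_13_to_24)
```

Under these assumptions the divergent wiring forces more zero entries than the chain, and in the chain the entry from {1,3} to {2,4} is the only one crossed by two connections, which agrees qualitatively with the description of the two matrices.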

Figure 2 Effective information matrix and activity states for two complexes having the same value of Φ. a. Causal interactions diagram and analysis of complexes. Shown are two systems, one with a "divergent" architecture (left) and one with a "chain" architecture (right). The analysis of complexes shows that both contain a complex of four elements having a Φ value of 10 bits. b. Effective information matrix. Shown is the effective information matrix for the two complexes above. For each complex, all bipartitions are indicated by listing one part (subset A) on the upper row and the complementary part (subset B) on the lower row. In between are the values of effective information from A to B and from B to A for each bipartition, color-coded as black (zero), red (intermediate value) and yellow (high value). Note that the effective information matrix is different for the two complexes, even though Φ is the same. The effective information matrix defines the set of informational relationships, or "qualia space" for each complex. Note that the effective information matrix refers exclusively to the informational relationships within the main complex (relationships with elements outside the main complex, represented here by empty circles, do not contribute to qualia space). c. State diagram. Shown are five representative states for the two complexes. Each is represented by the activity state of the four elements of each complex arranged in a column (blue: active elements; black: inactive ones). The five states can be thought of, for instance, as evolving in time due to the intrinsic dynamics of the system or to inputs from the environment. Although the states are identical for the two complexes, their meaning is different because of the difference in the effective information matrix. The last four columns represent four special states, those corresponding to the activation of one element at a time. Such states, if achievable, would correspond most closely to the specific "quale" contributed by that particular element in that particular complex.

These two examples are purely meant to illustrate how the space of informational relationships within a complex can be captured by the effective information matrix, and how that space can differ for two complexes having similar amounts of Φ and the same number of dimensions. Of course, for a complex having high values of Φ, such as the one underlying our own consciousness, qualia space would be extraordinarily large and intricately structured. Nevertheless, it is a central claim of the theory that the structure of phenomenological relationships should reflect directly that of informational relationships. For example, the conscious experiences of blue and red appear irreducible (red is not simply less of blue). They may therefore correspond to different dimensions of qualia space (different elements of the complex). We also know that, as different as blue and red may be subjectively, they are much closer to each other than they are, say, to the blaring of a trumpet. EI values between the neuronal groups underlying the respective dimensions should behave accordingly, being higher between visual elements than between visual and auditory elements. As to the specific quality of different modalities and submodalities, the theory predicts that they are due to differences in the set of informational relationships within the respective cortical areas and between each area and the rest of the main complex. For example, areas that are organized topographically and areas that are organized according to a "winner takes all" arrangement should contribute different kinds of experiences. Another prediction is that changes in the quality and repertoire of sensations as a result of perceptual learning would also correspond to a refinement of the informational relationships within and between the appropriate cortical areas belonging to the main complex. 
By contrast, the theory predicts that informational relationships outside a complex – including those among sensory afferents – should not contribute directly to the quality of conscious experience of that complex. Of course, sensory afferents, sensory organs, and ultimately the nature and statistics of external stimuli play an essential role in shaping the informational relationships among the elements of the main complex, but that role is an indirect and historical one, played out through evolution, development, and learning [14] (see Appendix, vii).

Specifying each conscious experience: The state of the interaction variables

According to the theory, once the quantity and quality of conscious experience that a complex can have are specified, the particular conscious state or experience that the complex will have at any given time is specified by the activity state of its elements at that time (in a Cartesian analogue, if each element of the complex corresponds to an axis of qualia space, and effective information values between elements define the angles between the axes specifying the structure of the space, then the activity state of each element provides a coordinate along its axis, and each conscious state is defined by the set of all its coordinates). The relevant activity variables are those that mediate the informational relationships among the elements, that is, those that mediate effective information. For example, if the elements are local groups of neurons, then the relevant variables are their firing patterns over tens to hundreds of milliseconds.

The state of a complex at different times can be represented schematically by a state diagram as in Fig. 2c (for the divergent complex on the left and the chain complex on the right). Each column in the state diagram shows the activity values of all elements of a complex (here between 0 and 1). Different conscious states correspond to different patterns of activity distributed over all the elements of a complex, with no contribution from elements outside the complex. Each conscious state can thus be thought of as a different point in the multidimensional qualia space defined by the effective information matrix of a complex (see Appendix, viii). Therefore, a succession or flow of conscious states over time can be thought of as a trajectory of points in qualia space. The state diagram also illustrates some states that have particular significance (second to fifth columns). These are the states with just one active element, and all other elements silent (or active at some baseline level). It is not clear whether such highly selective states can be achieved within a large neural complex of high Φ, such as the one postulated to underlie human consciousness. To the extent that this is possible, such highly selective states would represent the closest approximation to experiencing that element's specific contribution to consciousness – its quality or "quale". However, because of the differences in the qualia space between the two complexes, the same state over the four elements would correspond to different experiences (and mean different things) for the two complexes. It should also be emphasized that, in every case, it is the activity state of all elements of the complex that defines a given conscious state, and both active and inactive elements count.
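The idea of a state as a point, and of a flow of states as a trajectory, can be made concrete with a minimal Python sketch. It assumes a four-element complex with activity values between 0 and 1, as in the state diagram. The first state in the trajectory is a hypothetical pattern chosen for illustration, and the helper names are introduced here rather than taken from the theory. The sketch records coordinates only; it does not model the structure of qualia space, which depends on the effective information matrix.

```python
from typing import List, Tuple

# One activity value per element of the complex (one coordinate per axis).
State = Tuple[float, ...]


def is_valid_state(state: State, n_elements: int) -> bool:
    """A state must give a value in [0, 1] for every element, active or not."""
    return len(state) == n_elements and all(0.0 <= v <= 1.0 for v in state)


def single_element_states(n_elements: int) -> List[State]:
    """States with exactly one active element and all other elements silent."""
    return [
        tuple(1.0 if i == k else 0.0 for i in range(n_elements))
        for k in range(n_elements)
    ]


# ASSUMPTION: an arbitrary mixed pattern standing in for the first column.
mixed_state: State = (1.0, 0.0, 1.0, 0.0)

# A succession of states over time is an ordered list of points.
trajectory: List[State] = [mixed_state] + single_element_states(4)

# Inactive elements count too: changing one silent element gives a new point.
other_state: State = (1.0, 1.0, 1.0, 0.0)
same_point = mixed_state == other_state
```

Here `trajectory` holds five points, matching the five columns of the state diagram, and `same_point` is false because the two patterns differ in the second element. The same list of coordinates would apply to both the divergent and the chain complex; according to the theory, what differs between them is the space in which those coordinates are interpreted.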

To recapitulate, the theory claims that the quality of consciousness associated with a complex is determined by its effective information matrix. The effective information matrix specifies all informational relationships among the elements of a complex. The values of the variables mediating informational interactions among the elements of a complex specify the particular conscious experience at any given time.