Simultaneously measuring the activities of all neurons in a mammalian brain at millisecond resolution is a challenge beyond the limits of existing techniques in neuroscience. Entirely new approaches may be required, motivating an analysis of the fundamental physical constraints on the problem. We outline the physical principles governing brain activity mapping using optical, electrical, magnetic resonance, and molecular modalities of neural recording. Focusing on the mouse brain, we analyze the scalability of each method, concentrating on the limitations imposed by spatiotemporal resolution, energy dissipation, and volume displacement. Based on this analysis, all existing approaches require orders of magnitude improvement in key parameters. Electrical recording is limited by the low multiplexing capacity of electrodes and their lack of intrinsic spatial resolution, optical methods are constrained by the scattering of visible light in brain tissue, magnetic resonance is hindered by the diffusion and relaxation timescales of water protons, and the implementation of molecular recording is complicated by the stochastic kinetics of enzymes. Understanding the physical limits of brain activity mapping may provide insight into opportunities for novel solutions. For example, unconventional methods for delivering electrodes may enable unprecedented numbers of recording sites, embedded optical devices could allow optical detectors to be placed within a few scattering lengths of the measured neurons, and new classes of molecularly engineered sensors might obviate cumbersome hardware architectures. We also study the physics of powering and communicating with microscale devices embedded in brain tissue and find that, while radio-frequency electromagnetic data transmission suffers from a severe power–bandwidth tradeoff, communication via infrared light or ultrasound may allow high data rates due to the possibility of spatial multiplexing. 
The use of embedded local recording and wireless data transmission would only be viable, however, given major improvements to the power efficiency of microelectronic devices.

“To understand in depth what is going on in a brain, we need tools that can fit inside or between neurons and transmit reports of neural events to receivers outside. We need observing instruments that are local, non-destructive and non-invasive, with rapid response, high band-width and high spatial resolution… There is no law of physics that declares such an observational tool to be impossible.”

Freeman Dyson, Imagined Worlds, 1997

1. Introduction

Neuroscience depends on monitoring the electrical activities of neurons within functioning brains (Alivisatos et al., 2012; Bansal et al., 2012; Gerhard et al., 2013) and has advanced through steady improvements in the underlying observational tools. The number of neurons simultaneously recorded using wired electrodes, for example, has doubled every 7 years since the 1950s, currently allowing electrical observation of hundreds of neurons at sub-millisecond timescales (Stevenson and Kording, 2011). Recording techniques have also diversified: activity-dependent optical signals from neurons endowed with fluorescent indicators can be measured by photodetectors, and radio-frequency emissions from excited nuclear spins allow the construction of magnetic resonance images modulated by activity-dependent contrast mechanisms. Ideas for alternative methods have been proposed, including the direct recording of neural activities into information-bearing biopolymers (Kording, 2011; Zamft et al., 2012; Glaser et al., 2013).

Each modality of neural recording has characteristic advantages and disadvantages. Multi-electrode arrays enable the recording of ~250 neurons at sub-millisecond temporal resolutions. Optical microscopy can currently record ~100,000 neurons at a 1.25 s timescale in behaving larval zebrafish using light-sheet illumination (Ahrens et al., 2013), or hundreds to thousands of neurons at a ~100 ms timescale in behaving mice using a 1-photon fiber scope (Ziv et al., 2013). Magnetic resonance imaging (MRI) allows non-invasive whole brain recordings at a 1 s timescale, but is far from single neuron spatial resolution, in part due to the use of hemodynamic contrast. Finally, molecular recording devices have been proposed for scalable physiological signal recording but have not yet been demonstrated in neurons (Kording, 2011; Zamft et al., 2012; Glaser et al., 2013).

Figure 1 illustrates the recording modalities studied here. While further development of these methods promises to be a crucial driver for future neuroscience research (Kandel et al., 2013), their fundamental scaling limits are not immediately obvious. Furthermore, inventing new technologies for scalable neural recording requires a quantitative understanding of the engineering problems that such technologies must solve, a landscape of constraints which should inform design decisions.

FIGURE 1

Figure 1. Four generalized neural recording modalities. (A) Extracellular electrical recording probes the voltage due to nearby neurons. (B) Optical microscopy detects light emission from activity-dependent indicators. In two-photon laser scanning microscopy, shown here, an excitation beam at 2× the peak excitation wavelength of the fluorescent indicator is scanned across the sample, while an integrating detector captures the emitted fluorescence. (C) Magnetic resonance imaging detects radio-frequency magnetic induction signals from aqueous protons, after weak thermal alignment of the proton spins by a static magnetic field. A resonant radio-frequency pulse tips the spins into a plane perpendicular to the static field, causing the net magnetization to precess. The resulting signals are affected by the local chemical and magnetic environment, which can be altered dynamically by imaging agents in response to neural activity. Activity-dependent contrast agents are necessary to transduce neural activity into an MRI readout, whereas current functional MRI methods rely on blood oxygenation signals which cannot reach single-neuron resolution. (D) Molecular recording devices have been proposed, in which a “ticker tape” record of neural activity is encoded in the monomer sequence of a biomolecular polymer, a form of nano-scale local data storage. This could be achieved by coupling correlates of neural activity to the nucleotide misincorporation probabilities of a DNA or RNA polymerase as it replicates or transcribes a known DNA strand.

Our analysis is predicated on assumptions that enable us to estimate scaling limits. These include assumptions about basic properties of the brain, which are treated in Section 2, as well as those pertaining to the required measurement resolution and the limits to which a neural recording method may perturb brain tissue, which are treated in Section 3. Together, these considerations form the basis for our estimates of the prospects for scaling of neural recording technologies. We analyze four modalities of brain activity mapping—electrical, optical, magnetic resonance and molecular—in light of these assumptions, and conclude with a discussion on opportunities for new developments.

Importantly, our assumptions, analyses and the conclusions thereof are intended as first approximations and are subject to debate. We anticipate that as much can be learned from where our logic breaks down as from where it succeeds, and from methods to work around the limits imposed by our assumptions.

2. Basic Constraints

2.1. Mouse Brain

The mouse brain contains ~7.5 × 10⁷ neurons in a volume of ~420 mm³ (Vincent et al., 2010) and weighs about 0.5 g. The packing density of neurons varies widely between brain regions. In what follows, we will use a cell density of ρ_neurons ≈ 92,000/mm³, as measured for mouse cortex (Braitenberg and Schüz, 1991). This corresponds roughly to one neuron per (22 μm)³ voxel. The density of cortical synapses, on the other hand, approaches 10⁹/mm³, i.e., one synapse per 1 μm³ voxel. For comparison, the human brain has roughly 8 × 10¹⁰ neurons (Azevedo et al., 2009) in a volume of 1200 cm³ (Allen et al., 2002).

The human brain consumes ~15 W of power (performing, at synapses, a rough equivalent of at least 10¹⁷ floating point computational operations per second on that power budget, according to one definition (Sarpeshkar, 2010), although the analogy with digital computers should not be taken literally). Because power consumption scales approximately linearly with the number of neurons (Herculano-Houzel, 2011), the mouse brain is expected to utilize ~15 mW. For comparison, the metabolic rate of the ~20–30 g mouse is ~200–600 mW depending on its degree of physical activity (Speakman, 2013).
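As a rough check of the figures in this section, the following Python sketch recomputes the voxel size implied by the cortical neuron density and the mouse brain power obtained by linear scaling from the human brain. All numerical inputs are the values quoted above; the function and variable names are illustrative and not part of the original analysis.

```python
# Values quoted in Section 2.1; names are illustrative assumptions.
CORTICAL_DENSITY_PER_MM3 = 92_000   # neurons per mm^3 in mouse cortex
MOUSE_NEURONS = 7.5e7
HUMAN_NEURONS = 8e10
HUMAN_BRAIN_POWER_W = 15.0


def voxel_side_um(density_per_mm3):
    """Side (micrometers) of the cube that holds one neuron on average."""
    side_mm = (1.0 / density_per_mm3) ** (1.0 / 3.0)
    return side_mm * 1000.0  # mm -> um


def scaled_power_mw(ref_power_w, ref_neurons, target_neurons):
    """Scale power linearly with neuron count; result in milliwatts."""
    return ref_power_w * (target_neurons / ref_neurons) * 1000.0


side_um = voxel_side_um(CORTICAL_DENSITY_PER_MM3)
mouse_power_mw = scaled_power_mw(HUMAN_BRAIN_POWER_W, HUMAN_NEURONS, MOUSE_NEURONS)
print(f"one neuron per ({side_um:.1f} um)^3 voxel")
print(f"mouse brain power ~ {mouse_power_mw:.1f} mW")
```

Under these inputs the voxel side comes out near 22 μm and the scaled power near 14 mW, consistent with the rounded values in the text.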

2.2. Neural Activities

Action potentials (spikes) last ~2 ms. The rate of neuronal spiking is highly variable. Some authors have assumed an average rate of 5 Hz (Sarpeshkar, 2010; Harris et al., 2012), but certain neurons spike at 500 Hz or faster (Gittis et al., 2010), while many neurons spike much more slowly. For example, cerebellar granule cells, which make up half of the neurons in the brain, have spontaneous firing rates of ~0.5 Hz (Chadderton et al., 2004). In neocortex, one analysis estimated 0.16 spikes per second per neuron (in primate) as energetically sustainable (Lennie, 2003). There may be as much as a two-fold change in metabolism and hence firing rate across brain states (Howarth et al., 2012). Certain neurons (possibly up to 90% for some neuron types in some brain areas) may be effectively silent (Shoham et al., 2006; Barth and Poulet, 2012), e.g., spiking less than once every 10 s. Some studies have attempted to measure the distribution of neural firing rates in various cortical areas (as opposed to just the average rate), and have observed that these distributions are often long-tailed: a small minority of the neurons fires a majority of the spikes (Shafi et al., 2007; Hromádka et al., 2008; O'Connor et al., 2010; Roxin et al., 2011).

While these estimates of typical firing rates are useful numbers to have in mind, in what follows we aim to sample all neurons at 1 kHz rates (or higher for techniques requiring observation of detailed spike waveforms). This choice is informed by several factors. First, measuring spike timing with millisecond precision is relevant for understanding network function, due to the possibilities for timing codes, spike-timing dependent plasticity mechanisms, and other effects relying on temporally-precise spiking patterns (Markram et al., 2011; Babadi and Abbott, 2013; Gire et al., 2013; Taillefumier and Magnasco, 2013). In this regard, it is also important for a recording method to maintain precise temporal phasing between measurements at different brain locations: activity measurements should be locked to precise global clocks, perhaps with a tolerable phase imprecision between any two measurements in the range of 1/(2π) × 1 ms ≈ 100–200 μs. Furthermore, the activities of neurons can be highly correlated locally or across large networks (Schneidman et al., 2006), suggesting that local activity sensors may be subjected to high instantaneous total firing rates due to simultaneously-active neurons.

2.3. Absorption and Scattering of Radiation

All existing methods of neural recording utilize electromagnetic waves, from the near-DC frequencies of wired electrical recordings (~1 kHz) to the radio-frequencies of wireless electronics and fMRI (MHz–GHz) to visible light in optical approaches (~500 THz). These electromagnetic waves are attenuated in brain tissue by absorption and scattering. As an approximation to the electromagnetic absorption by brain tissue, we treat the absorption by water, the brain's main constituent (68–80% by mass in humans Dobbing and Sands, 1973; Fatouros and Marmarou, 1999). At visible and near-IR wavelengths, scattering dominates absorption: absorption lengths are in the ~1 mm range, while scattering lengths are ~25–200 μm (Wilt et al., 2009). The combined effect of absorption and scattering is measured by the attenuation length, the distance over which the signal strength is reduced by a factor of 1/e along a path. Figure 2 shows the absorption length of water (Kou et al., 1993), and the attenuation length in a Mie scattering model from Horton et al. (2013) intended to approximate the scattering properties of cortical tissue [and see (Gabriel et al., 1996) for tissue skin depth measurements in the 10 Hz to 100 GHz range]. This gives a preliminary indication of which wavelengths can be used to measure deep-brain signals with external detectors. Note that the attenuation length is only one of several relevant metrics: for example, scattering not only causes signal attenuation, but also causes noise and impairs signal separation, so the magnitude of the scattering is a key figure of merit.

FIGURE 2

Figure 2. Penetration depth (attenuation length) of electromagnetic radiation in water vs. wavelength [data from Jonasz (2007)]. The approximate diameter of the mouse brain is shown as a black dashed line. Inset: approximate tissue model based on Mie scattering theory and water absorption. Absorption length of water (Kou et al., 1993) (blue), approximate tissue scattering length in a simple Mie scattering model (red) and the resulting attenuation length (green) of infrared light [inset reproduced from Horton et al. (2013), with permission].

3. Challenges for Brain Activity Mapping

Any activity mapping technology must extract the required information without disrupting normal neuronal activity. As such, we consider three primary challenges: spatiotemporal resolution and informational throughput, energy dissipation and volume displacement.

3.1. Spatiotemporal Resolution and Informational Throughput

A sampling rate of 1 kHz is necessary to capture the fastest trains of action potentials at single-spike resolution. A minimal data rate of 7.5 × 10¹⁰ bits processed per second is then required to record 1 bit per mouse neuron at 1 kHz.

In electrical recording, higher sampling rates (e.g., 10–40 kHz) are often necessary to distinguish neurons based on spike shapes when each electrode monitors multiple neurons. More fundamentally, one bit per neuron sampling at 1 kHz would likely not be sufficient to reliably distinguish spikes above noise: transmitting ~10 bit samples at ~10 kHz (full waveform) or ~10–20 bit time-stamps upon spike detection would be more realistic.

Conversely, it may be possible to locally compress measurements of a spike train before transmission. The degree of compressibility of neural activity data is related to the variability in the distribution of neural responses (e.g., such a distribution may be defined across time bins or repeated stimulus presentations) (Strong et al., 1998). In the blowfly Calliphora vicina, the entropy of spike trains has been measured to be up to ~180 bit/s, and the information about a stimulus encoded by a spike train was as high as ~90 bit/s (Strong et al., 1998). Extrapolating from fly to mouse, this would suggest that a compression factor of 5–10× should be possible, relative to a 1000 bit/s raw binary sampling.

As a naïve estimate of the entropy as a function of firing rate, one can write the entropy H in bit/s, assuming 1 ms long spikes and f = 1000 Hz sampling rate, as

$$ H \approx \left( -P_{\text{spike}} \log_2 P_{\text{spike}} - (1 - P_{\text{spike}}) \log_2 (1 - P_{\text{spike}}) \right) \cdot f $$

where P_spike is the probability of spiking during the sampling interval (average firing rate/f). For an average firing rate of 5 Hz, P_spike = 0.005 and H ≈ 45 bit/s, corresponding to a compression factor of ~20×. However, at 500 Hz average firing rate, P_spike = 0.5 with H ≈ 1000 bit/s, i.e., there is no compressibility. Therefore, compression could conceivably reduce the data transmission burden for activity mapping by 1–2 orders of magnitude, depending on the neurons and activity regimes under consideration. Note that these compressibility calculations have assumed that firing patterns are independent across cells; they represent the temporal compressibility of the spike train from each cell, treated individually. Patterns across cells could conceivably be compressed by a much larger amount, to the extent that there is redundancy between cells. Nevertheless, we use 1 bit/neuron/ms, or ~100 Gbit/s for the whole mouse brain, as a “minimal whole brain data rate” in what follows. In many cases, this likely constitutes a lower bound on what is feasible in practice.
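The naïve entropy estimate can be reproduced with a few lines of Python. This is only a sketch of the binary-entropy formula above, assuming independent 1 ms bins and a 1 kHz sampling rate; the function names are illustrative.

```python
import math


def entropy_rate_bits_per_s(firing_rate_hz, sampling_rate_hz=1000.0):
    """Entropy rate of a spike train with independent binary time bins."""
    p = firing_rate_hz / sampling_rate_hz  # P_spike per sampling interval
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a bin that is always empty or always full carries no information
    h_per_bin = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return h_per_bin * sampling_rate_hz


def compression_factor(firing_rate_hz, sampling_rate_hz=1000.0):
    """Raw binary sampling rate divided by the entropy rate."""
    return sampling_rate_hz / entropy_rate_bits_per_s(firing_rate_hz, sampling_rate_hz)


for rate_hz in (0.5, 5.0, 50.0, 500.0):
    h = entropy_rate_bits_per_s(rate_hz)
    print(f"{rate_hz:6.1f} Hz: H = {h:7.1f} bit/s, compression ~ {compression_factor(rate_hz):6.1f}x")

# Uncompressed whole-brain rate at 1 bit per neuron per millisecond.
raw_whole_brain_bits_per_s = 7.5e7 * 1000.0
print(f"raw whole-brain rate: {raw_whole_brain_bits_per_s:.2e} bit/s")
```

At 5 Hz the sketch gives about 45 bit/s, and at 500 Hz it gives 1000 bit/s, matching the two cases discussed in the text.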

3.2. Energy Dissipation

Brain tissue can sustain local temperature increases (ΔT) of ~2°C without severe damage over a timescale of hours. Indeed, changes of this magnitude may occur naturally in rats in response to varying activity levels (Wolf, 2008). Assuming that the brain is receiving a constant power influx P_delivered and that the local thermal transport properties of mouse brains are similar to those of humans, we can approximate the temperature change in deep-brain tissue as a function of the applied power (Lazzi, 2005; Sotero and Iturria-Medina, 2011):

$$ \frac{dT}{dt} = \frac{P_{\text{delivered}} + P_{\text{metabolic}} - \rho_{\text{blood}} C_{\text{blood}} f_{\text{blood}} \Delta T}{C_{\text{tissue}}} $$

where P_metabolic = 0.0116 W/g is the power per unit mass of basal metabolism, C_tissue ≈ 3.7 J/(K·g) ≈ 0.88 · C_water is the specific heat capacity of brain tissue, ρ_blood = 1.05 g/cm³ is the density of blood, C_blood = 3.9 J/(K·g) is the specific heat capacity of blood, f_blood = 9.3 × 10⁻⁹ m³/g/s is the volume flow rate of blood per gram of tissue, and ΔT is the temperature difference between the brain tissue and the blood (at 37°C). A steady-state temperature increase (dT/dt = 0) of 2°C corresponds to dissipation of ~40 mW per 500 mg mouse brain. Therefore, a recording technique should not dissipate more than ~40 mW of power in a mouse brain at steady state.
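The steady-state figure can be checked with a short Python sketch. It assumes, as an interpretation that is not spelled out in the text, that the 2°C rise is measured relative to the baseline at which metabolic heat is already balanced by blood flow, so that the extra power per gram equals ρ_blood · C_blood · f_blood times the extra temperature rise. The constant and function names are illustrative.

```python
# Parameter values from the text; f_blood is converted to cm^3/(g s).
RHO_BLOOD_G_PER_CM3 = 1.05
C_BLOOD_J_PER_G_K = 3.9
F_BLOOD_CM3_PER_G_S = 9.3e-9 * 1e6   # 9.3e-9 m^3/(g s) -> cm^3/(g s)
P_METABOLIC_W_PER_G = 0.0116
BRAIN_MASS_G = 0.5


def perfusion_coefficient_w_per_g_k():
    """Heat removed by blood flow per gram of tissue per kelvin of ΔT."""
    return RHO_BLOOD_G_PER_CM3 * C_BLOOD_J_PER_G_K * F_BLOOD_CM3_PER_G_S


def baseline_delta_t_k():
    """Steady-state tissue-blood ΔT from metabolism alone (P_delivered = 0)."""
    return P_METABOLIC_W_PER_G / perfusion_coefficient_w_per_g_k()


def allowed_power_mw(extra_delta_t_k, mass_g=BRAIN_MASS_G):
    """Delivered power (mW) that raises steady-state ΔT by extra_delta_t_k.

    Assumption: the rise is counted on top of the metabolic baseline.
    """
    return perfusion_coefficient_w_per_g_k() * extra_delta_t_k * mass_g * 1000.0


print(f"baseline ΔT ~ {baseline_delta_t_k():.2f} K")
print(f"power for a 2 K rise ~ {allowed_power_mw(2.0):.0f} mW per 0.5 g brain")
```

With these inputs the allowed power is about 38 mW, in line with the ~40 mW limit quoted above.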

This estimate of the power dissipation limit in mouse brains, based on such a simplified model of the brain's thermal transport mechanisms, is likely an underestimate of the actual maximum steady-state power dissipation. Radiative heat loss was ignored here since infrared light emitted by deep-brain tissue is quickly re-absorbed by nearby tissue. We have also ignored cooling due to flows in the cerebrospinal ventricles (Smith and Zhu, 2010) and in the glymphatic system (Iliff et al., 2012). We have further assumed that conductive heat loss from the brain surface is negligible compared to the heat extracted volumetrically by blood flow. While this may hold true locally in deep brain voxels and over short timescales (e.g., <1 min), further work [e.g., a whole-head model (Sukstanskii and Yablonskiy, 2004; Lazzi, 2005)] is needed to define the true limits of sustained volumetric heat production by neural recording systems distributed throughout the mouse brain. Indeed, the characteristic length scale of temperature inhomogeneities in the brain is on the order of millimeters (Sukstanskii and Yablonskiy, 2006), whereas heat exchange with the flowing blood dampens the effects of local perturbations over longer length scales. For large brains, this means that sources and sinks of heat exert only local thermal effects; for a mouse brain on the scale of <10 mm, however, surface and volumetric effects likely combine to influence temperature changes at any site in the brain (Sukstanskii and Yablonskiy, 2007). Experimentally, increasing the temperature gradient at the brain surface, via a cranial window exposed to ambient air at ~25°C (i.e., the common craniotomy technique used to access mouse neocortex), has been shown to dysregulate brain temperature down to a depth of several millimeters (Kalmbach and Waters, 2012). For the above reasons, our estimates of the brain's capacity for heat dissipation should be treated only as first approximations.

Higher power levels, compared to the maximum steady state power, may be introduced into brains transiently. According to the above equation, if a neural recorder dissipates ~40 mW per 500 mg mouse brain, then the brain approaches the steady-state temperature in 2–3 min, making shorter experiments potentially feasible. This is in agreement with the estimate from Sukstanskii and Yablonskiy (2006) of a ~1 min time constant for brain temperature changes, as well as with experimental measurements showing similar time constants for temperature variations resulting from sustained neural stimulation (McElligott and Melzack, 1967; Trübel et al., 2005). Increasing convective heat loss from the brain by increasing blood flow (e.g., via increased heart rate) or cooling the brain (volumetrically or via its surface Sukstanskii and Yablonskiy, 2007), the blood, the cerebrospinal fluid (CSF), or the whole animal (Polderman, 2004), could increase the allowable transient or steady-state power dissipation.
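The approach to steady state can be illustrated with the closed-form solution of the linear heat balance used in this section. For constant delivered power, the temperature rise follows ΔT(t) = ΔT_ss · (1 − e^(−t/τ)) with τ = C_tissue / (ρ_blood · C_blood · f_blood). The Python sketch below evaluates τ and the fraction of the steady-state rise reached after a given time; the parameter values are those quoted in the text and the names are illustrative.

```python
import math

# Parameter values from the text; f_blood is converted to cm^3/(g s).
RHO_BLOOD_G_PER_CM3 = 1.05
C_BLOOD_J_PER_G_K = 3.9
F_BLOOD_CM3_PER_G_S = 9.3e-9 * 1e6
C_TISSUE_J_PER_G_K = 3.7


def thermal_time_constant_s():
    """Time constant of the linear model: C_tissue / (rho * C * f of blood)."""
    perfusion = RHO_BLOOD_G_PER_CM3 * C_BLOOD_J_PER_G_K * F_BLOOD_CM3_PER_G_S
    return C_TISSUE_J_PER_G_K / perfusion


def fraction_of_steady_state(t_s):
    """Fraction of the final temperature rise reached t_s seconds after power-on."""
    return 1.0 - math.exp(-t_s / thermal_time_constant_s())


tau_s = thermal_time_constant_s()
print(f"time constant ~ {tau_s:.0f} s")
for minutes in (1, 2, 3, 5):
    print(f"after {minutes} min: {100 * fraction_of_steady_state(60 * minutes):.0f}% of steady-state rise")
```

In this model the time constant is roughly 1.6 min, so about 70% of the final rise is reached after 2 min and about 85% after 3 min.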

There are also limits on the power density of radiation applied to brain tissue. For radio-frequency electromagnetic radiation, the specific absorption rate (SAR) limit on the power density exposed to human tissue is ~10 mW/cm² (IEEE, 2006), while for ultrasound (which couples less strongly to dissipative loss mechanisms in tissue) the SAR limit is up to 72× higher (FDA, 2008). The power density limits for visible and near-IR light exposures are also in the ~10–100 mW/cm² range for ~1 ms long exposures, decreasing as the exposure time lengthens [based on the IEC 60825 formulas (IEC, 2007)].

High local power dissipation (transient or steady-state) can modify the electrical properties of excitable membranes, altering neuronal activity patterns. For example, heating of cell membranes and of the surrounding solution by millisecond-long optical pulses leads to changes in membrane electrical capacitance mediated by the ionic double layer (Shapiro et al., 2012). Slower temperature changes (on a scale of seconds) resulting from RF radiation lead to accelerated ion channel and transporter kinetics (Shapiro et al., 2013). Both of these effects are appreciable when the temperature changes are on the order of 1–10°C.

For comparison with current practice, common guidelines for chronic heat exposure from biomedical implants (Wolf, 2008) use upper limits of 2°C temperature change, 40 mW/cm² heat flux from the surface of implanted brain machine interface (BMI) hardware, and an SAR limit of

$$ \frac{\sigma E^2}{2\rho} < 1.6\ \text{mW/g} $$

for electromagnetic energy absorbed by tissue, where E is the peak electric field amplitude of the applied radiation, σ ≈ 0.18 S/m is the electrical conductivity of grey matter and ρ ≈ 1 g/cm³ is the tissue density (Lazzi, 2005) (this corresponds to an irradiance of ε₀cE²/2 ≈ 2.4 mW/cm²). A 96-channel BMI system demonstrated in living brains had a dissipated areal power density approaching 40 mW/cm² (Rizk et al., 2009).
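The conversion from the SAR limit to an equivalent irradiance can be sketched in Python as follows. The sketch solves the SAR expression for the peak field E and then applies the plane-wave irradiance formula ε₀cE²/2 used in the text; the physical constants are standard values, and the function names are illustrative.

```python
import math

EPSILON_0_F_PER_M = 8.854e-12     # vacuum permittivity
SPEED_OF_LIGHT_M_PER_S = 2.998e8
SIGMA_S_PER_M = 0.18              # grey matter conductivity (from the text)
RHO_KG_PER_M3 = 1000.0            # 1 g/cm^3 tissue density
SAR_LIMIT_W_PER_KG = 1.6          # 1.6 mW/g


def peak_field_v_per_m(sar_w_per_kg, sigma=SIGMA_S_PER_M, rho=RHO_KG_PER_M3):
    """Peak E-field at which sigma * E^2 / (2 * rho) equals the SAR limit."""
    return math.sqrt(2.0 * rho * sar_w_per_kg / sigma)


def irradiance_mw_per_cm2(e_peak_v_per_m):
    """Plane-wave irradiance eps0 * c * E^2 / 2, converted from W/m^2 to mW/cm^2."""
    w_per_m2 = 0.5 * EPSILON_0_F_PER_M * SPEED_OF_LIGHT_M_PER_S * e_peak_v_per_m ** 2
    return w_per_m2 * 0.1  # 1 W/m^2 = 0.1 mW/cm^2


e_peak = peak_field_v_per_m(SAR_LIMIT_W_PER_KG)
print(f"peak field ~ {e_peak:.0f} V/m")
print(f"irradiance ~ {irradiance_mw_per_cm2(e_peak):.1f} mW/cm^2")
```

The resulting peak field is about 133 V/m and the irradiance about 2.4 mW/cm², matching the value given in parentheses above.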

3.3. Sensitivity to Volume Displacement

To prevent damage to the brain, we assume that a recording technique should not displace >1% of the brain's volume. The appropriate damage threshold is not yet established, however, so this constitutes a first guess. It is possible to insert large numbers of probes throughout multiple brain areas without compromising function. In rats, 96 electrodes of 50 μm diameter were simultaneously inserted across four forebrain structures (cortex, thalamus, hippocampus and putamen) (Ribeiro et al., 2004). In rhesus macaque, 704 electrodes of diameter 50 μm and average depth 2.5 mm were chronically implanted in cortex (Nicolelis et al., 2003). Note, however, that the total volume displacement in these experiments was below 0.1%, and below 0.01%, respectively. Furthermore, these studies used a low density of electrodes. Thus, detailed limits on the amount and density of inserted material are unknown.

Furthermore, the nature of the volume displacement is important—sheets of instrumentation that sever long-range connectivity, for example, would disrupt normal brain function regardless of the degree of volume displacement. Conversely, higher volume displacement might be possible if introduced gradually, or during early development, insomuch as the brain can adapt without disrupting natural computation. One important consideration in this regard would be the disruption of blood circulation by inserted material; a high density of implanted material in a brain region could cause stroke due to widespread vascular damage. Recent studies have defined in microscopic detail the complete vascular network of the mouse cortex using high-throughput histology (Blinder et al., 2013); this type of information could be used to enumerate key vascular pathways which could be spared from damage. To apply this in a particular animal, however, would require a non-destructive method to image the vasculature at a similar resolution; otherwise, only a broad statistical view can be obtained, since the detailed vascular geometry will vary from animal to animal.

Secondary effects like glial scarring may also pose obstacles to the long-term implantation of large numbers of probes (Polikov et al., 2005; Ward et al., 2009), although methods are being developed to alleviate this (Reichert et al., 2008; Reichert, 2010; Taub et al., 2012). In the context of electrical recording, the impact of glial scarring may vary depending on geometry. For example, the recording sites at the tip of a Utah or Duke multi-electrode array are typically viable in chronic recordings of up to 18 months in primates (Nicolelis et al., 2003; Suner et al., 2005), whereas in array formats with multiple electrodes along each shaft, such as the Michigan array, chronic recordings of up to 4 months have been reported in rats (Vetter et al., 2004). Differences in recording lifetime may be due to differences in the pattern of glial encapsulation of the contacts.

4. Evaluation of Modalities

We next evaluate neural recording technologies with respect to the above challenges, using the mouse brain as a model system. Table 1 lists the modalities studied, the assumptions made, the analysis strategies applied, and the conclusions derived.

TABLE 1

Table 1. Summary of modalities, models, assumptions and conclusions.

4.1. Electrical Recording

In the oldest strategy for neural recording, an electrode is used to measure the local voltage at a recording site, which conveys information about the spiking activity of one or more nearby neurons. The number of recording sites may be smaller than the number of neurons recorded since each recording site may detect signals from multiple neurons. As a note for practitioners, we use the term “electrode” interchangeably with the terms “recording site” or “contact”, meaning a point-like voltage sensing node: many multi-electrode arrays in common use (e.g., the Duke and Utah arrays) are conductive only at the tip, whereas other designs (such as the Michigan array) have multiple contacts along the shaft. Each shaft in a Michigan array would thus constitute multiple “electrodes” or “recording sites” in our parlance. Traditional electrical recording techniques keep active devices such as amplifiers outside the skull and therefore do not pose a heat dissipation challenge; this may change if amplifiers are brought closer to the signal sources to reduce noise.

Slowly varying (e.g., <300 Hz) extracellular potentials (LFPs) (Buzsáki et al., 2012; Reimann et al., 2013) on the order of 0.1–1 mV, and fields (Anastassiou et al., 2010) on the order of 1–10 mV/mm, are generated by neural activity. While LFPs can be filtered from the higher-frequency signals associated with extracellular voltage spikes, these and other effects necessitate maintaining precise potential references (i.e., ground levels) for voltage measurements distributed widely across the brain.

4.1.1. Spatiotemporal resolution

4.1.1.1. Limits assuming perfect spike sorting. We begin with an idealized estimate of the number of electrodes required to record from the entire mouse brain, neglecting the difficulty of assigning observed spikes to specific cells (spike sorting), and focusing only on what is needed to detect spikes from every neuron on at least one electrode. The key variable here is the maximum distance between an extracellular electrical recorder and a neuron from which it records spikes. In a first approximation, this is determined by two factors: the decay of the signal with distance from the spiking neuron and the background noise level at the recording site. We assume that for an electrode to reliably detect the signal from a given neuron, the magnitude of that neuron's signal must be larger than the electrode's noise level. Note, however, that knowledge of spike shape distributions could potentially be used to extract low-amplitude spikes from noise.

The peak signals of spikes from neurons immediately adjacent to an electrode are in the 0.1–1.0 mV range and scale roughly as e^(−r/r_0), where r is the distance from the cell surface and the 1/e falloff distance, r_0, has been experimentally measured at ~28 μm in both salamander retina (Segev et al., 2004) and cat cortex (Gray et al., 1995), and computed at ~18 μm in a biophysically realistic simulation (Gold et al., 2007; Anastassiou et al., 2013). However, this decay is strongly influenced by the detailed geometry of neuronal currents and the properties of the extracellular space [e.g., its inhomogeneity, which may lead to a frequency-dependent falloff of the extracellular potential (Bédard et al., 2004)], making analytical calculation of the decay rate difficult (at large distances, a much slower 1/r² dipole falloff is expected).

Several sources of background noise enter the recordings. Johnson noise, which arises from thermal fluctuations in the electrode, is

$$ V_{\text{johnson}} = \left( 4 k_B T Z \, \mathrm{BW} \right)^{1/2} $$

which for physiological temperature, electrodes of impedance Z = 0.5 MΩ, and BW = 10 kHz bandwidth is V_johnson ≈ 9 μV. The recordings are also affected by interference from other neurons, which has been reported to exceed the Johnson noise, and is non-stationary due to changes in the cells' firing properties (Sahani, 1999). The noise and interference from these sources realistically produce >10–20 μV of voltage fluctuations (Camuñas Mesa and Quian Quiroga, 2013). Current recording setups thus have signal to interference-plus-noise ratios (SINRs) of <100, where the SINR is defined as the ratio of the peak voltage from immediately adjacent neurons to the voltage fluctuation floor of the electrode.
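The Johnson noise estimate is easy to verify numerically. The Python sketch below evaluates (4 k_B T Z BW)^(1/2) for an electrode impedance of 0.5 megaohm and a 10 kHz bandwidth; taking physiological temperature as 310.15 K (37°C) is an assumption of this sketch, and the function name is illustrative.

```python
import math

K_BOLTZMANN_J_PER_K = 1.380649e-23
BODY_TEMPERATURE_K = 310.15   # assumed: 37 degrees C
IMPEDANCE_OHM = 0.5e6         # 0.5 megaohm electrode
BANDWIDTH_HZ = 10e3           # 10 kHz recording bandwidth


def johnson_noise_v(temperature_k, impedance_ohm, bandwidth_hz):
    """RMS thermal (Johnson) noise voltage of a resistive electrode."""
    return math.sqrt(4.0 * K_BOLTZMANN_J_PER_K * temperature_k * impedance_ohm * bandwidth_hz)


v_noise = johnson_noise_v(BODY_TEMPERATURE_K, IMPEDANCE_OHM, BANDWIDTH_HZ)
print(f"Johnson noise ~ {v_noise * 1e6:.1f} uV")

# Ratio of a 1 mV peak spike to a 10 uV fluctuation floor, as in the SINR bound.
sinr_upper = 1e-3 / 10e-6
print(f"SINR for a 1 mV spike over a 10 uV floor: {sinr_upper:.0f}")
```

The result is a little over 9 microvolts, which is below the 10–20 microvolt fluctuation floor reported for practical recordings.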

A limit on the maximum recording distance is the distance at which the signal from the farthest neuron falls below the noise floor, r_max ≈ r_0 ln(SINR). For SINR ≈ 100, r_max ≈ 130 μm. For comparison, recent experimental data from multi-site silicon probes has shown few detectable neurons beyond ~100 μm and none detectable beyond 160 μm (Du et al., 2011). Recordings in the hippocampal CA1 region could not detect spikes from cells farther than 140 μm from the electrode tip (Henze et al., 2000), even after averaging over observations triggered on an intracellularly recorded spike; in hippocampus, this corresponds to a detection volume containing approximately 1000 neurons (Buzsáki, 2004). Furthermore, in many studies (in monkeys, rats and mice) using multi-electrode arrays with 150–300 μm inter-electrode spacings, no neuron is seen by more than one electrode (Wessberg et al., 2000; Carmena et al., 2003; Jin and Costa, 2010; Koralek et al., 2012).

Due to the steep local falloff, even improving the SINR by a factor of 10 only extends the maximal recording distance to r_max ≈ 190 μm. Assuming packing of the brain into equal sized cubes of side length d = (2√3/3) r_max ≈ 150 μm (the largest cube that fits inside a sphere of radius r_max) gives N > 130,000 electrodes for whole brain recording using recording sites with r_max ≈ 130 μm. Note that N varies as the inverse third power of r_max and is therefore highly sensitive to variations in the assumed maximal recording distance; the number of required recorders can range from 38,000 to 210,000 as r_max varies from 190 to 110 μm.
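The scaling of the electrode count with recording distance can be sketched in Python. The sketch uses r_max = r_0 ln(SINR), tiles the brain with cubes whose corners lie on a sphere of radius r_max (side 2·r_max/√3), and divides the ~420 mm³ brain volume from Section 2.1 by the cube volume. Function names are illustrative, and because the inputs are rounded the counts should be read as order-of-magnitude estimates (about 10⁵ for r_max ≈ 130 μm) rather than exact reproductions of the figures in the text.

```python
import math

MOUSE_BRAIN_VOLUME_MM3 = 420.0
R0_UM = 28.0  # measured 1/e falloff distance of the extracellular spike amplitude


def max_recording_distance_um(sinr, r0_um=R0_UM):
    """Distance at which an exponentially decaying spike reaches the noise floor."""
    return r0_um * math.log(sinr)


def cube_side_um(r_max_um):
    """Side of the largest cube inscribed in a sphere of radius r_max."""
    return 2.0 * r_max_um / math.sqrt(3.0)


def electrodes_required(r_max_um, brain_volume_mm3=MOUSE_BRAIN_VOLUME_MM3):
    """One electrode per cube: brain volume divided by cube volume."""
    side_mm = cube_side_um(r_max_um) / 1000.0
    return brain_volume_mm3 / side_mm ** 3


print(f"r_max at SINR 100:  {max_recording_distance_um(100):.0f} um")
print(f"r_max at SINR 1000: {max_recording_distance_um(1000):.0f} um")
for r_max in (110.0, 130.0, 190.0):
    print(f"r_max = {r_max:.0f} um -> d = {cube_side_um(r_max):.0f} um, N ~ {electrodes_required(r_max):,.0f}")
```

Because N scales as r_max⁻³, moving r_max from 190 μm to 110 μm changes the required count by roughly a factor of five.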

These calculations, by assuming perfect spike sorting, greatly underestimate the required number of electrodes in practice. First, signals from the weakest cells are far weaker than those from the strongest cells and the signals from some cells decay much faster than others (Gray et al., 1995). Second, because of neuronal synchronization, the local noise produced by nearby neurons may sometimes be large. Third, spike waveforms can vary over the course of a recording session (Fee et al., 1996; Stratton et al., 2012). Finally, with many neurons per electrode or at high firing rates, spikes from detectable neurons will often temporally overlap, making spike sorting difficult.

4.1.1.2. Limits from spike sorting. The previous calculations have assumed that any spike which is visible above the noise on at least one electrode can be detected and correctly assigned to a particular cell, i.e., that the problem of spike sorting can be solved perfectly. However, perfect spike sorting is far beyond current algorithmic capabilities and in fact may not be possible in principle.

To achieve the scenario described above, with N = 130,000 recording sites per mouse brain, would require each electrode to sort spikes from all $\frac{4}{3}\pi r_{\max}^3 \rho_{\text{neurons}}$ neurons in a sphere of radius $r_{\max}$ ≈ 130 μm surrounding the recording site, where $\rho_{\text{neurons}}$ ≈ 92,000/mm³ is the density of neurons. This assigns ~800 neurons to a single electrode. Roughly half (i.e., 400) of these neurons will lie at >100 μm distance from the electrode, and their signals on the electrode will therefore have voltage SINRs of $< 100\,e^{-100\,\mu\text{m}/28\,\mu\text{m}} \approx 2.8$, assuming as above that extracellular spike amplitudes decay exponentially in space.

Electrical recording can be viewed as a data transmission problem, with the electrode playing the role of a communication channel (see section 4.4). According to the Shannon Capacity Theorem (Cover and Thomas, 2006), the information capacity C of a single analog channel (with additive white Gaussian noise) is

$$C = BW \log_2\left(1 + \frac{S}{N}\right)$$

where BW is the bandwidth, S is the signal power (proportional to the square of the voltage), and N is the noise power. Here the bandwidth is BW ≈ 10 kHz, and the ratio of peak signal power to noise power of a single spike for the outer 400 cells is no more than $2.8^2$, or $0.5 \times 2.8^2$ using the RMS signal power instead of the peak. With 400 cells emitting 2 ms spikes at 5 Hz, there will be an average of 4 cells spiking at a time, for S/N ≈ $0.5 \times 4 \times 2.8^2$ ≈ 15.7 counting the signal power from all the spikes. The channel capacity is then C ≈ 40 kbit/s. This represents the maximum amount of information (e.g., about which neuron spiked when) that the population of spiking neurons can transmit via the electrode which measures them. To transmit uniquely identifiable signals from all 400 neurons at millisecond temporal precision, however, requires 1 kbit/s × 400 = 400 kbit/s, which is roughly 10× greater than the channel capacity and is therefore not achievable. Even with optimal temporal compression of ~5 Hz spikes (see section 2), we would need to transmit ~400 kbit/s / 20 = 20 kbit/s, which is less than the channel capacity and thus possible in principle, but barely so. Furthermore, the channel capacity given here is an overestimate, since 2.8 is an upper bound on the SINR of the outer cells. On the other hand, note that the use of a nominal 5 Hz average firing rate here (in the estimates of signal to noise ratio and of temporal compressibility) greatly oversimplifies the distribution of firing rates across neurons, as discussed in section 2 above, so this analysis can only be treated as a first approximation.

Based on these rough estimates, perfect spike sorting may not be possible at ~800 neurons per electrode, in a sphere of radius 130 μm surrounding a recording site, and at the noise levels typical of current electrodes. In essence, there may not be enough room on the electrode's voltage trace to discriminate such a large number of weak, noisy signals. Note that these information-theoretic limits still apply even if it is possible to resolve temporally overlapping spikes. In fact, the channel capacity is what ultimately limits the ability of a spike sorting algorithm to resolve such overlapping spikes.

To see the regime in which spike sorting becomes feasible, suppose that each electrode is only responsible for spike sorting from the population of ~100 neurons nearest to the electrode, i.e., in a sphere of radius r ≈ 64 μm, assuming the 92,000/mm³ cell density from mouse cortex. The outermost 50% of these neurons are then positioned >50 μm from the recording site. For these outermost 50 neurons, the voltage SINR is $< 100\,e^{-50\,\mu\text{m}/28\,\mu\text{m}} \approx 17$ and S/N < $0.5 \times 17^2 \times$ (2 ms × 5 Hz × 50) ≈ 72.3. The channel capacity is therefore <62 kbit/s, whereas 50 kbit/s is needed for signal transmission from 50 neurons without temporal compression versus ~2.5 kbit/s with temporal compression. Even 100 neurons per electrode may therefore still be close to the limits of information transmission through the noisy channel corresponding to a single electrode.
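The two capacity estimates above (400 outer cells beyond 100 μm, and 50 outer cells beyond 50 μm) can be reproduced with the following sketch. All parameter values are taken from the text; the function names are illustrative, and the factor-of-20 compression is simply the ratio used in the text rather than something derived here. Using unrounded SINR values shifts the results slightly relative to the rounded figures quoted.

```python
import math

R0_UM = 28.0                 # exponential decay length of spike amplitude (um)
SINR_ADJACENT = 100.0        # voltage SINR for a neuron adjacent to the electrode
BW_HZ = 10e3                 # recording bandwidth (Hz)
RATE_HZ = 5.0                # nominal mean firing rate (Hz)
SPIKE_S = 2e-3               # spike duration (s)
DENSITY_PER_MM3 = 92_000.0   # neuron density (per mm^3)


def neurons_in_sphere(radius_um: float) -> float:
    """Expected number of neurons in a sphere of the given radius."""
    r_mm = radius_um * 1e-3
    return 4.0 / 3.0 * math.pi * r_mm ** 3 * DENSITY_PER_MM3


def voltage_sinr(distance_um: float) -> float:
    """Upper bound on voltage SINR for neurons at or beyond this distance."""
    return SINR_ADJACENT * math.exp(-distance_um / R0_UM)


def capacity_bits_per_s(n_outer_cells: int, inner_radius_um: float) -> float:
    """Shannon capacity C = BW * log2(1 + S/N) for the outer cell population."""
    mean_active = n_outer_cells * RATE_HZ * SPIKE_S          # cells spiking at once
    power_ratio = 0.5 * mean_active * voltage_sinr(inner_radius_um) ** 2  # RMS power
    return BW_HZ * math.log2(1.0 + power_ratio)


def required_bits_per_s(n_cells: int, compression: float = 1.0) -> float:
    """1 kbit/s per neuron at millisecond precision, optionally compressed."""
    return n_cells * 1e3 / compression


if __name__ == "__main__":
    for n_outer, radius in ((400, 100.0), (50, 50.0)):
        print(f"{n_outer} cells beyond {radius:.0f} um: "
              f"C = {capacity_bits_per_s(n_outer, radius) / 1e3:.1f} kbit/s, "
              f"needed = {required_bits_per_s(n_outer) / 1e3:.0f} kbit/s raw, "
              f"{required_bits_per_s(n_outer, 20) / 1e3:.1f} kbit/s compressed")
```

The comparison of interest is between the capacity and the raw requirement: the 400-cell case falls short by about an order of magnitude, while the 50-cell case clears it only narrowly.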

In practice these limits are likely to be highly optimistic, since the set of spikes emerging from a neuronal population is far from an optimally designed code from the perspective of multiplexed signal transmission through a voltage-sensing electrode: the waveforms for different neurons are similarly-shaped rather than orthogonal, the spikes emitted by a given neuron vary somewhat in amplitude and exhibit shape fluctuations (signal-dependent noise), and it is not known in advance what the characteristic signal from each neuron looks like (or even how many neurons there are).

Indeed, current practice is far from the above information-theoretic limits. At present, spike sorting algorithms operating on data from large-scale (250–500 electrodes), densely spaced (~30 μm), 2D multi-electrode arrays can reliably identify and distinguish spikes from nearly all of the 200–300 retinal ganglion cells (Marre et al., 2012; Pillow et al., 2013) in a small patch of retina, and can also infer approximate cell locations through spatial triangulation of spike amplitudes. This represents a roughly 1:1 ratio of cells to electrodes. Electrodes with up to 4 single units can be found in chronically implanted multi-electrode arrays (in both mouse and primate) (Nicolelis et al., 2003; Costa et al., 2004), where the electrodes are sparse, although the average yield of cells per electrode is closer to 1:1; if only electrodes with at least one cell are counted, the average rises to ~1.5–1.7 cells per electrode. Optimistically, simulations of neural activity suggest that 5–10 neurons per electrode may be distinguishable using current spike sorting algorithms (Sahani, 1999; Pedreira et al., 2012; Camuñas Mesa and Quian Quiroga, 2013). A limit of ~10 neurons per electrode would imply N = $7.5 \times 10^6$ electrodes to record from all neurons in the mouse brain, which could be accomplished by positioning recording sites on a cubic lattice with ~40 μm edge length.

Future algorithmic improvements could enable sorting from more than ~10 cells per electrode, but this becomes increasingly challenging. One simple estimate of a reasonable practical limit, for the regime of many neurons per electrode, would be the largest number of neurons that can be sorted without requiring the frequent resolving of temporally overlapping spikes: if the average neuron fires at ~5 Hz and spikes last ~2 ms, then at most roughly 100 neurons per electrode can be sorted without requiring overlaps to be resolved. Note that while some present-day algorithms can successfully resolve overlapping spikes (Segev et al., 2004; Ge et al., 2011; Prentice et al., 2011; Marre et al., 2012; Pillow et al., 2013), they typically do so only in the case where electrodes are densely spaced and any given spike appears on many electrodes, such that spatial information can be used to resolve the overlap. Resolving overlaps when spikes appear on only one or a few channels is more difficult due to noise and spike-shape variation.

Overall, ~100 cells per electrode may be taken as a rough estimate of the limits of spike sorting, and would imply N = 750,000 electrodes and an edge spacing of ~80 μm if a cubic lattice of recording sites were used. However, we should not exclude the possibility of game-changers which could alter the nature of the recorded data to improve the available information. For instance, CCD cameras could be attached to multi-electrode arrays to aid in the identification and localization of cells, or directional information on the source of spikes could be obtained at each recording site, for example by measuring the directions of gradients in voltage. Systems that capture such additional information could circumvent the above information-theoretic limits and improve spike sorting.
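The electrode counts and lattice spacings quoted for the 10- and 100-neurons-per-electrode regimes follow from simple division, as the sketch below illustrates. The total of 7.5 × 10⁷ neurons is implied by the figures in the text; the brain volume of 420 mm³ is an assumption made here for illustration.

```python
N_NEURONS = 7.5e7      # mouse brain neurons (implied by 7.5e6 electrodes at 10 per electrode)
V_BRAIN_MM3 = 420.0    # ASSUMED mouse brain volume (mm^3); not stated in this section


def lattice(neurons_per_electrode: float) -> tuple[float, float]:
    """Return (number of electrodes, cubic lattice edge in um)."""
    n_electrodes = N_NEURONS / neurons_per_electrode
    edge_um = (V_BRAIN_MM3 / n_electrodes) ** (1.0 / 3.0) * 1e3
    return n_electrodes, edge_um


def max_cells_without_overlap(rate_hz: float = 5.0, spike_s: float = 2e-3) -> float:
    """Largest population whose spikes fill the trace without overlapping on average."""
    return 1.0 / (rate_hz * spike_s)


if __name__ == "__main__":
    for per_electrode in (10, 100):
        n, edge = lattice(per_electrode)
        print(f"{per_electrode} neurons/electrode: N = {n:.2e}, spacing = {edge:.0f} um")
    print(f"overlap-free limit: {max_cells_without_overlap():.0f} neurons/electrode")
```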

4.1.2. Volume displacement

We require <1% total volume displacement from N recorders. Wires from each electrode must make it to the surface of the brain, which implies an average length l ≈ 4 mm for the mouse brain (depending on assumptions about the wiring geometry).

As a rough approximation, consider each recorder to produce a volume displacement associated with a single cylindrical wire, with length l and radius r. Thus r must satisfy

$$\pi r^2 \, l \, N_{\min,\mathrm{rd}} < 0.01 \, V_{\text{brain}}$$

Using $N_{\min,\mathrm{rd}}$ = 38,000 or 210,000 recording sites (lower and upper limits from the perfect spike sorting case above) and l ≈ 4 mm requires wires of diameter ≈ 6.0 μm or 2.5 μm (radius ≈ 3.0 μm or 1.3 μm), respectively. Alternatively, if $7.5 \times 10^6$ electrodes must be used (current spike sorting case from above), the required wire radius is ~200 nm. While these dimensions are readily achievable using lithographic fabrication, there would be a challenge to produce isolated wires of such dimensions at scale (perhaps suggesting the use of wire bundles). Still, volume constraints per se are unlikely to fundamentally limit whole-mouse-brain electrical recording even in the most pessimistic scenario.
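As a check on the wire dimensions, the volume constraint can be solved for the radius directly. The sketch below assumes a mouse brain volume of 420 mm³ (an assumption made here, not a value from this section). Under that assumption, the 6.0 μm and 2.5 μm figures emerge as wire diameters for 38,000 and 210,000 sites respectively, and the ~200 nm figure as a radius for 7.5 × 10⁶ sites.

```python
import math

V_BRAIN_MM3 = 420.0     # ASSUMED mouse brain volume (mm^3)
WIRE_LENGTH_MM = 4.0    # average wire length l, from the text
MAX_FRACTION = 0.01     # at most 1% total volume displacement


def max_wire_radius_um(n_wires: float) -> float:
    """Largest radius r satisfying pi * r^2 * l * N < 0.01 * V_brain, in micrometres."""
    r_mm = math.sqrt(MAX_FRACTION * V_BRAIN_MM3 / (math.pi * WIRE_LENGTH_MM * n_wires))
    return r_mm * 1e3


if __name__ == "__main__":
    for n in (38_000, 210_000, 7.5e6):
        r = max_wire_radius_um(n)
        print(f"N = {n:.2e}: radius = {r:.2f} um, diameter = {2 * r:.2f} um")
```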

Figure 3 illustrates the above considerations as a function of the electrode SINR.

FIGURE 3

Figure 3. The voltage signal to interference-plus-noise ratio (SINR) for neurons immediately adjacent to the recording site sets an approximate upper bound on the distance, $r_{\max}$, between the recording site and the farthest neuron it can sense (blue), due to the exponential falloff of the voltage SINR with distance. Assuming at least one electrode per cube of edge length $\frac{2\sqrt{3}}{3} r_{\max}$ in turn limits the number of neurons per recording site (gold), the total number of recording sites (red) and the maximal diameter of wiring consistent with <1% total brain volume displacement (turquoise). SINR values for current recording setups are <$10^2$. In practice, the number of neurons per electrode distinguishable by current spike sorting algorithms is only ~10, with an estimated information theoretic limit of ~100, so these curves greatly under-estimate the number of electrodes which would be required based on realistic spike sorting approaches in a pure voltage-sensing scenario.

4.1.3. Implanting electrodes in the brain

There are several technology options for introducing many electrodes into a brain. For example, flexible nanowire electrodes could, in theory, be threaded through the capillary network (Llinás et al., 2005). Capillaries are present in the brain at a density of 2500–3000/mm³ (Schmidt and Thews, 1989), which equates to one capillary per 73 μm, with each neuron lying within ~200 μm of a capillary (Loffredo and Lee, 2008). The minimum capillary diameter is as small as 3–4 μm, although the average diameter is ~8 μm, comparable to the non-deformed size of the red blood cells (Freitas, 1999). Blocking a significant fraction of capillaries could lead to stroke or to unacceptable levels of tissue necrosis/liquefaction.

The cerebrospinal ventricles may also provide a convenient location for recording hardware. Furthermore, neural tissues could be grown around pre-fabricated electrode arrays (Jadhav et al., 2012), or silicon probe arrays with many nano-fabricated recording sites per probe (Du et al., 2011) could be inserted into the brain.

Mechanical forces during insertion and retraction of silicon and tungsten microelectrodes from brain tissue have been measured in rat cortex at ~1 mN for electrodes of ~25 μm radius (Jensen et al., 2003). These forces are of the same order as the Euler buckling force F of a 2 mm long cylindrical tungsten rod of r = 5 μm radius

$$F = \frac{\pi^2 E I}{(K L)^2} \approx 0.5\ \text{mN}$$

where E = 411 GPa is the elastic modulus of tungsten, $I = (\pi/4) r^4$ is the second moment of area of the wire cross-section, L ≈ 2 mm is the length of the wire, and K is the column effective length factor which depends on the boundary conditions and is set to K = 1 here for simplicity. This suggests that it may be possible to push structures of <10 μm diameter into brain tissue [see (Najafi and Hetke, 1990) for related calculations]. It might be advantageous to pull rather than push wires into the brain [e.g., using applied fields, or perhaps even cellular oxen (Weibel et al., 2005) to carry the wires], since the thinnest wires could withstand tension forces much higher than the compressive force at which they buckle (although there may also be ways to circumvent buckling, e.g., via rapid vibration).
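The buckling estimate is easy to evaluate numerically, which also makes the steep dependence on wire radius explicit. This sketch uses the second moment of area for a solid circular section, $I = \pi r^4 / 4$; a prefactor of π/2 (the polar moment) would double the result. The function name and default arguments are illustrative.

```python
import math


def euler_buckling_force_n(radius_m: float, length_m: float,
                           youngs_modulus_pa: float = 411e9,
                           k: float = 1.0) -> float:
    """Euler critical load F = pi^2 * E * I / (K * L)^2 for a solid circular rod."""
    second_moment = math.pi * radius_m ** 4 / 4.0   # area moment of a circular section
    return math.pi ** 2 * youngs_modulus_pa * second_moment / (k * length_m) ** 2


if __name__ == "__main__":
    for radius_um in (2.5, 5.0, 10.0, 25.0):
        force = euler_buckling_force_n(radius_um * 1e-6, 2e-3)
        print(f"r = {radius_um:>4} um, L = 2 mm: F = {force * 1e3:.3g} mN")
```

Since the critical load scales as the fourth power of the radius, halving the wire radius lowers the force it can sustain in compression by a factor of 16, which is one reason pulling may be preferable to pushing for the thinnest wires.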

4.1.4. Conclusions and future directions

Electrical recording has the advantage of high temporal resolution, but the large number of required recording sites poses challenges for delivery mechanisms. Ongoing innovations in electrical recording that could be leveraged for dramatic scaling include the development of highly multiplexed probes, multilayer lithography for routing electrical traces, novel methods to implant large numbers of electrodes, smaller electrode impedances to reduce the Johnson noise, amplifiers with lower input-referred noise levels, spike sorting algorithms capable of handling temporally overlapping spikes and adaptively modeling the noise, and hybrid systems integrating electrical recording with implantable optics or other methods.

One challenge for a purely-electrical recording paradigm pertains to the ability to relate the measured electrical signals to specific cells within a circuit. As the set of neurons recorded by each electrode grows to encompass a large volume around the electrode, it will become more difficult to attribute the recorded spikes to particular neurons. Furthermore, given the complex geometries of neuronal processes, it is not obvious how to determine the spatial position or layout of a neuron from its electrical signature on a nearby electrode. A given electrode will be positioned near the axons or dendrites of some neurons, and near the cell bodies of other neurons, complicating data interpretation. If the spatial density of recording sites is increased such that many electrodes sample the same neuron, however, this could enable imaging of neuronal morphology and signal propagation via voltage signals across multiple electrodes (Bakkum et al., 2013). Currently, extracellular electrical recording also does not allow extraction of molecular information on the cells being recorded, although intracellular electrophysiological recording methods [e.g., (Kodandaramaiah et al., 2012)] might enable this for a limited number of cells.

4.2. Optical Recording

Optical techniques measure activity-dependent light emissions from neurons, typically generated by fluorescent indicator proteins, although activity-dependent bioluminescent emissions are an emerging possibility. Current genetically encoded calcium indicators can only distinguish spikes below ~50–100 Hz firing rates without averaging (Smetters et al., 1999) due to slow intra-molecular kinetics and indicator saturation at high firing rates, although significant improvements in speed are ongoing (Sun et al., 2013b). Intracellular calcium rises and drops can occur within 1 ms and 10–100 ms respectively (Higley and Sabatini, 2008), which sets the ultimate speed limit for calcium imaging. The field of genetically-encoded high-speed fluorescent voltage indicators is also advancing quickly (Barnett et al., 2012; Kralj et al., 2012; Akemann et al., 2013; Cao et al., 2013; Gong et al., 2013; Storace et al., 2013) and these may find particular use in monitoring sub-threshold events (Scanziani and Häusser, 2009).

4.2.1. Spatiotemporal resolution

4.2.1.1. Multiplexing strategies. For optical approaches, the light originating from the activity of each neuron must be separated from emissions originating from other points in the brain: this can be accomplished in many ways, leading to a variety of architectures for 3D imaging. Epi-fluorescence microscopy images a plane in the specimen (i.e., with depth of field $\mathrm{DOF} = 2 n \lambda / \mathrm{NA}^2$, where n is the refractive index, λ is the wavelength and NA is the numerical aperture of the imaging system; Quirin et al., 2013) onto a spatially-resolved two-dimensional detector (e.g., a CCD camera). The focal plane is then scanned in order to reconstruct 3D images; because the entire 3D volume is illuminated during image acquisition, out-of-focus neurons cause background emissions. Light sheet imaging is similar to epi-fluorescence imaging, except that only neurons near the focal plane are illuminated, reducing out-of-focus noise. Unfortunately, this requires transparent brains (Ahrens et al., 2013). Volumetric imaging can also be performed in a single snapshot using lightfield microscopes (Levoy et al., 2009; Broxton et al., 2013), which capture the directions of incoming light rays, trading in-plane resolution for axial resolution, or by using multi-focus microscopes (Abrahamsson et al., 2012). In multi-photon microscopy, non-linearities result in fluorescence excitation occurring only near the focal point of the excitation laser, which is scanned across the sample. In confocal scanning microscopy, only photons from a point of interest are measured due to geometric constraints (e.g., pinholes). Alternatively, 3D imaging can be performed via wavefront coding, which extends the depth of field by creating an axially-independent point-spread function using known optical aberrations, in combination with computational deconvolution (Dowski and Cathey, 1995).
With a known 3D pattern of excitation light, wavefront coding can be applied to 3D fluorescence microscopy without scanning using a 2D detector array (Quirin et al., 2013). Emerging, alternative strategies rely on tagging emissions from different sources with distinguishable modulation patterns (Yin, 2006; Wu et al., 2006; Wang et al., 2012; Diebold et al., 2013; Ducros et al., 2013), or precisely controlling and tracking the timing of light emissions (Cheng et al., 2011). Optical techniques thus achieve signal separation by multiplexing spatially (e.g., direct imaging) or temporally (e.g., beam scanning), or often by a combination of the two.

While optics might seem to require a number of photodetectors comparable to the number of neurons (or a similar number of sampling events in the time domain, e.g., for scanning microscopies), new developments suggest ways of imaging with fewer elements. For example, compressive sensing or ghost imaging techniques based on random mask projections (Wakin et al., 2006; Tian et al., 2011; Studer et al., 2012; Sun et al., 2013a) might allow a smaller number of photodetectors to be used. In an illustrative case, an imaging system may be constructed simply from a single photodetector and a transmissive LCD screen presenting a series of random binary mask patterns (Huang et al., 2013), where the number of required mask patterns is much smaller than the number of image pixels due to a compressive reconstruction.

4.2.1.2. Effects of light scattering. Single-photon techniques limit imaging to a depth of a few scattering lengths at the excitation and emission wavelengths of activity indicators: up to ~1–2 mm for certain infrared wavelengths (Horton et al., 2013; Kobat et al., 2009, 2011) vs. a few hundred microns for visible wavelengths (Wilt et al., 2009). Activity dependent dyes are currently available only in the visible spectrum; indicators operating in the infrared [see (Shcherbo et al., 2009; Filonov et al., 2011; Shcherbakova and Verkhusha, 2013) for far-red fluorescent proteins] could improve imaging depth.

Multi-photon excitation takes advantage of the deeper penetration of infrared light. Two or more infrared photons may together excite a fluorophore with an excitation peak in the visible range, leading to the emission of a visible photon. If only one neuron is illuminated with sufficient intensity to generate multi-photon excitation, all photons captured by the detector originate from that neuron, regardless of the scattering of the outgoing light. Hence, the emission pathway is limited less by scattering than by absorption. This has resulted in imaging at >1 mm depth (Kobat et al., 2009, 2011; Horton et al., 2013).

There are at least five options for overcoming visible light scattering to enable signal separation from deep-brain neurons (Alivisatos et al., 2012, 2013):

1. Infrared light can excite multi-photon fluorescence in an excitation-scanning architecture.

2. Fluorophores with both excitation and emission wavelengths in the infrared could be developed.

3. By knowing the precise form of the scattering, it can be possible to correct for it. Emerging techniques based on beam shaping allow transmission of focused light through random scattering media by inverting the scattering matrix (Conkey et al., 2012). Because the scattering properties change over time, this must be done quickly, possibly faster than the imaging frame rate, necessitating high-speed wavefront modulation. This can currently be achieved with digital micro-mirror devices (DMDs), but not with the phase-only spatial light modulators (SLMs) that are used to prevent power losses in the excitation pathways for non-linear microscopies, although GHz switching of phase-only modulators appears feasible in principle (Alivisatos et al., 2013). High speed focusing through turbid media is also achievable using all-optical feedback in a laser cavity (Nixon et al., 2013), and it is even possible to measure the scattering matrix non-invasively (Chaigne et al., 2013) using a photo-acoustic technique, or via all-optical approaches based on speckle correlation (Bertolotti et al., 2012). Similar techniques are available for incoherent light (Katz et al., 2012). When using short optical pulses, scattering can lead to temporal distortions that degrade the peak light intensity at a focal spot. The <100 fs pulse durations used in two-photon microscopy, for example, are comparable to the time it takes light to travel 30 μm in vacuum. Fortunately, wavefront shaping techniques can correct for scattering-induced temporal distortions as well (Katz et al., 2011; McCabe et al., 2011).

4. Light sources and/or detectors could be positioned close to the measured neurons, necessitating the use of embedded optical devices. This could be done using optical fiber (Mahalati et al., 2013) and/or waveguide (Zorzos et al., 2010, 2012) technologies, which are developing rapidly. For example, single-mode fiber cables can support >1 TB/s data rates (Ono and Yano, 1998; Bozinovic et al., 2013) with low light loss over hundreds of kilometers (Miya et al., 1979). It is possible to directly image through gradient index of refraction (GRIN) lenses (Murray and Levene, 2012) or optical fibers (Flusberg et al., 2005; Kang et al., 2010; Mahalati et al., 2013), which provides one way to multiplex multiple observed neurons per fiber.

5. Light emissions from distinct locations can be tagged with distinguishable time-domain modulation patterns, and the emission time-series for each source can later be decoded from the summed signal resulting from scattering (Wu et al., 2006; Yin, 2006; Cheng et al., 2011; Wang et al., 2012; Diebold et al., 2013; Ducros et al., 2013). For example, ultrasound encoding (Wang et al., 2012; Judkewitz et al., 2013), which frequency-tags light emissions from a known location via a mechanical Doppler shift of the emitter (Mahan et al., 1998), provides a generic mechanism to sidestep problems of elastic optical scattering, although it requires distinguishing MHz frequency modulations in THz light waves (part per million frequency discrimination). Radio-frequency tagging of light emissions via a digitally synthesized optical approach is also an option and may be applicable to combatting the problem of emission scattering in deep-tissue, multi-point, multi-photon imaging (Diebold et al., 2013).

4.2.1.3. Speed of beam scanning. The speed of scanning microscopes is currently limited by beam repositioning times [~0.1 μs for spinning disk (Flusberg et al., 2005; Kang et al., 2010; Mahalati et al., 2013), ~3 μs for piezo-controlled linear scan mirrors, ~10 μs for acousto-optic deflectors (Vučinić and Sejnowski, 2007), ~8 kHz line scans for resonant galvanometer mirrors]. The 10 μs repositioning time for acousto-optic deflectors is set by the speed of sound in the deflector crystal, while scanning mirrors and spinning disks are limited by inertia. Note that 0.1 μs repositioning time for current spinning-disk confocal techniques would require 10 s per frame for whole mouse brain imaging with a single scanned beam ($10^{-7}$ s/site × $10^8$ sites/brain). There is therefore a need for a $10^4$-fold improvement in beam repositioning time and/or beam parallelization in order to achieve 1 kHz imaging frame rates for whole mouse brains.
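The frame-time arithmetic above generalizes to the other scanning technologies listed. The following sketch uses the repositioning times and the 10⁸ sites per brain quoted in the text; the function names and the `n_beams` parameter are illustrative additions.

```python
def frame_time_s(reposition_s: float, n_sites: float = 1e8, n_beams: int = 1) -> float:
    """Time to visit every site once, with sites split evenly across parallel beams."""
    return reposition_s * n_sites / n_beams


def required_speedup(reposition_s: float, target_rate_hz: float = 1e3,
                     n_sites: float = 1e8) -> float:
    """Combined speed-up and/or parallelization factor needed to reach the target rate."""
    return frame_time_s(reposition_s, n_sites) * target_rate_hz


if __name__ == "__main__":
    scanners = {
        "spinning disk": 0.1e-6,
        "piezo linear scan mirror": 3e-6,
        "acousto-optic deflector": 10e-6,
    }
    for name, t in scanners.items():
        print(f"{name}: {frame_time_s(t):.0f} s/frame, "
              f"needs {required_speedup(t):.0e}x for 1 kHz")
```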

One strategy to implement parallelization would exploit (yet to be developed) fast, high-resolution phase modulator arrays to arbitrarily re-shape coherent optical wavefronts for multisite holographic multi-photon excitation in 3D (Papagiakoumou et al., 2010; Vaziri and Emiliani, 2012; Alivisatos et al., 2013). With fast phase modulation (e.g., ~1 GHz), beating each excitation spot at a different frequency could allow a single detector to probe multiple sites in parallel, despite arbitrarily-large scattering of the outgoing light (Alivisatos et al., 2013). Emerging optical techniques may provide alternative means to implement similar strategies (Diebold et al., 2013). Temporal multiplexing of excitation pulses at distinct locations (e.g., via few-nanosecond beam delays) also allows parallelization of the excitation beam while combatting scattering ambiguity of the emitted light (Cheng et al., 2011). Furthermore, temporal focusing techniques in two-photon microscopy (depth-dependent pulse duration) can excite an entire plane or line within the sample (Oron et al., 2005; Tal et al., 2005; Sela et al., 2013; Packer et al., 2013), as well as arbitrary patterns of points (Papagiakoumou et al., 2010), potentially allowing fast axial scanning (somewhat analogous to light-sheet techniques used with transparent samples). This method intrinsically corrects for scattering of the excitation light (Papagiakoumou et al., 2013), although not of the emission light. Like other multi-photon techniques, however, all these methods remain highly dissipative, as discussed below.

Fluorescence lifetimes in the 0.1–1 ns range (Striker et al., 1999) ultimately constrain the design of scanning fluorescence microscopies. A delay of 0.1 ns per mouse neuron per frame corresponds to only 100 Hz frame rate without parallelization, implying that parallelization into at least 10 to 100 beams is essential. The fluorescence lifetime also limits the achievable modulation frequencies in beat-frequency-multiplexed parallelization strategies (Diebold et al., 2013), bit lengths in encoded strategies (Ducros et al., 2013), and temporal offsets in temporally-multiplexed strategies (Cheng et al., 2011), suggesting that parallelization of detectors may be necessary in a strongly scattering environment. Depending on the degree of parallelization, which constrains the achievable dwell times given a fixed frame rate, photon counts may also become a limiting factor for high-speed scanning in some approaches.

4.2.1.4. Diffraction. Using the small angle approximation, the diffraction-limited angular resolution of an aperture is $\theta \approx \Delta x / y \approx \lambda / D$, where Δx is the spacing which must be resolved, y is the imaging depth, λ is the wavelength, and D is the aperture diameter. Thus distinguishing neurons which are 10 μm apart and at a depth of 10 mm requires a lens aperture D of >1 mm when λ ≈ 1 μm. Diffraction therefore does not appear to be a limiting factor for cellular resolution imaging, except in the context of microscale apertures that might find use in embedded optics approaches.
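To illustrate why diffraction matters mainly for microscale apertures, the small-angle relation can be rearranged either for the aperture or for the working distance. In this sketch the 50 μm aperture is a hypothetical embedded-optics example, not a value from the text.

```python
def min_aperture_m(spacing_m: float, depth_m: float, wavelength_m: float = 1e-6) -> float:
    """Aperture D needed to resolve `spacing_m` at `depth_m`: D = lambda * y / dx."""
    return wavelength_m * depth_m / spacing_m


def max_depth_m(aperture_m: float, spacing_m: float, wavelength_m: float = 1e-6) -> float:
    """Greatest distance at which an aperture D still resolves `spacing_m`."""
    return aperture_m * spacing_m / wavelength_m


if __name__ == "__main__":
    d = min_aperture_m(spacing_m=10e-6, depth_m=10e-3)
    print(f"10 um spacing at 10 mm depth: D >= {d * 1e3:.1f} mm")
    # Hypothetical 50 um embedded aperture (illustrative value only)
    y = max_depth_m(aperture_m=50e-6, spacing_m=10e-6)
    print(f"50 um aperture resolves 10 um spacing out to {y * 1e6:.0f} um")
```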

4.2.2. Energy dissipation

Light that does not leave the brain is ultimately dissipated as heat. The total light power requirements for optical measurement of neuronal activity using fluorescent indicators depend on factors including fluorophore quantum efficiency, absorption cross-section, activity-dependent change in fluorescence, background fluorescence, labeling density, activation kinetics, detector noise, scattering and absorption lengths, and others. Unfortunately, many of these variables are unknown or highly dependent on particular experimental parameters.

A statistical analysis of photon count requirements for spike detection (in the context of calcium imaging) can be found in (Wilt et al., 2013), which derived a relationship between the number of background photon counts ($N_{\text{bg}}$) and the number of signal photon counts required for high fidelity spike detection given photon shot noise. This scales roughly as $N_{\text{signal}} > 3\sqrt{2 N_{\text{bg}}}$, even at low absolute photon count rates. While this analysis governs the number of detected photons, the number of emitted photons will be higher due to losses. In one example using two-photon excitation, 5% of the emitted photons were captured by the photodetector (Kim et al., 1999). One implication of photon shot noise is that faster-responding indicators (e.g., voltage indicators which respond in near-real-time to the membrane potential) must be brighter.

4.2.2.1. Multi-photon excitation. Multi-photon experiments rely on short laser pulses with high peak light intensities at a focused excitation spot to excite non-linear transitions (Kim et al., 1999). This imposes an experimentally relevant physical limit: at least one excitation pulse of sufficient intensity per neuron per frame is required in order to excite multi-photon fluorescence during each frame. Assuming 1 kHz frame rate and 0.1 nJ pulses (Cheng et al., 2011), delivering only one pulse per neuron per frame would dissipate roughly $10^8$ × 1 kHz × 0.1 nJ = 10 W in the mouse brain, which is clearly prohibitive. This is a lower bound because, in general, more than one excitation pulse per neuron per frame may be required to excite detectable fluorescence [e.g., one reference reported 12 pulses per spot (Kim et al., 1999)]. For three-photon excitation, the situation will be even worse as higher peak light intensities are required to excite three-photon fluorescence.

Could the single-pulse energy be reduced while maintaining efficient two-photon excitation? The number of two-photon (2P) transitions excited per fluorophore per pulse is $n_a = F^2 C / t$, where F is the number of photons per pulse per area in units of photon/cm², C is the two-photon cross-section in units of cm⁴ s/photon, and t is the pulse duration in seconds. This can be approximated as

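$$n_a = \left( \frac{E / (hc/\lambda)}{\left( \lambda / (2\,\mathrm{NA}) \right)^2} \right)^2 \frac{C}{t} = \left( \frac{4 E\,(\mathrm{NA})^2}{h c \lambda} \right)^2 \frac{C}{t}$$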

where NA is the numerical aperture of the focusing optics, E is the pulse energy and λ is the stimulation wavelength. For a 2P experiment with 100 fs, 0.1 nJ pulses, assuming a 2P cross section (Masters, 2006; Drobizhev et al., 2011) of $10^{-48}$ cm⁴ s/photon (i.e., 100 Goeppert-Mayer units; Goeppert-Mayer, 1931; comparable to that of DsRed2; Drobizhev et al., 2011), λ = 900 nm and NA = 1.0, $n_a \approx 1/2$. Thus, a few pulses are likely necessary and sufficient to excite 2P fluorescence by each fluorophore within the focal spot. With a 2P cross section above $10^{-47}$ cm⁴ s/photon (1000 Goeppert-Mayer units, higher than that of any fluorescent protein that we are aware of; Drobizhev et al., 2011), one could reduce the pulse energy by an order of magnitude (which, at fixed cross section, lowers $n_a$ by two orders of magnitude) while maintaining $n_a > 1/20$, i.e., one in 20 fluorophores excited by each pulse. Reducing the pulse energy much further might lead to unacceptably low excitation levels. Alternatively, shorter pulse durations could increase the light intensity, and hence 2P excitation probability, at fixed pulse energy.
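The excitation probability per pulse can be evaluated directly from the closed-form expression, using the parameter values given in the text. In this sketch the function name is illustrative, CGS length units are used to match the cross-section units, and the physical constants are the standard SI values converted accordingly.

```python
H_J_S = 6.62607015e-34        # Planck constant (J s)
C_CM_PER_S = 2.99792458e10    # speed of light (cm/s)
GM = 1e-50                    # one Goeppert-Mayer unit (cm^4 s / photon)


def excitations_per_pulse(energy_j: float, pulse_s: float, cross_section_gm: float,
                          wavelength_cm: float = 900e-7, na: float = 1.0) -> float:
    """n_a = (4 E NA^2 / (h c lambda))^2 * C / t for a diffraction-limited focal spot."""
    photons_per_cm2 = 4.0 * energy_j * na ** 2 / (H_J_S * C_CM_PER_S * wavelength_cm)
    return photons_per_cm2 ** 2 * cross_section_gm * GM / pulse_s


if __name__ == "__main__":
    base = excitations_per_pulse(0.1e-9, 100e-15, 100.0)
    bright = excitations_per_pulse(0.01e-9, 100e-15, 1000.0)
    print(f"0.1 nJ, 100 GM:   n_a = {base:.3f}")
    print(f"0.01 nJ, 1000 GM: n_a = {bright:.3f}")
```

Because $n_a$ scales as the square of the pulse energy but only linearly with the cross section, a tenfold brighter fluorophore buys only about a threefold reduction in pulse energy at constant excitation probability.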

Quantum dots can have 2P cross sections much higher than those of fluorescent proteins: water-soluble cadmium selenide–zinc sulfide quantum dots have been reported with 2P cross sections of 47,000 Goeppert-Mayer units and are compatible with in vivo imaging (Larson et al., 2003). These would allow excitation efficiencies of $n_a > 1/20$ at pJ pulse energies, bringing whole-brain 2P imaging into the ~100 mW range. Thus, the use of quantum dots or other ultra-bright multi-photon indicators could be decisive for supporting the energetic feasibility of multi-photon methods at whole brain scale; there are also plausible strategies for coupling quantum dot fluorescence to neuronal voltage (Marshall and Schnitzer, 2013). However, some quantum dots have long fluorescence lifetimes (Dahan et al., 2001), which may constrain scan speed.

For comparison to current practice, in a typical multi-photon experiment on mice, ~50 mW of time-averaged laser power at the sample was used with a dwell time of ~3 μs (Wilson et al., 2007), corresponding to ~150 nJ energy dissipation per spot per frame. This dwell time would allow imaging only ~300 neurons at millisecond resolution with a single scanned excitation beam. The average excitation power here is likely already close both to whole-brain thermal dissipation limits, and to photo-damage limits for pulsed two-photon excitation (König et al., 1997; Hopt and Neher, 2001).

4.2.3. Bioluminescence

To work around the requirement for large amounts of excitation light, bioluminescent rather than fluorescent activity indicators could be used (Martin et al., 2007; Martin, 2008; Naumann et al., 2010). Consider a hypothetical activity-dependent bioluminescent indicator emitting at ~1700 nm (IR), in order to evade light scattering. As a crude estimate, assuming that 100 photons must be collected by the detector per neuron per 1 ms frame, and 1% light collection efficiency by the detector relative to the emitted photons, ~100 μW of bioluminescent photon emission is required for the entire mouse brain (using $E_\text{photon} = hc/\lambda$). This would be feasible from the perspective of heat dissipation. By contrast, in a 1-photon fluorescent scenario, if 100 excitation photons must be delivered into the brain to generate a single fluorescent emission photon, the power requirement becomes 10 mW, which is on the threshold of the steady-state heat dissipation limit. Therefore, bioluminescent indicators could potentially circumvent problems of heat dissipation even in the 1-photon case.

The widely used bioluminescent protein firefly luciferase is ~80% efficient in converting ATP hydrolysis coupled with luciferin oxidation into photon production, yielding ~0.8 photons per ATP-luciferin pair consumed (Seliger and McElroy, 1960), and has ~90% energetic efficiency in converting free energy to light production. Heat dissipation associated with the luciferase biochemistry itself is therefore not a significant overhead relative to the ~100 μW of emitted photons calculated above. In the same scenario, however, each neuron would consume ~$6 \times 10^8$ additional ATP molecules per minute in order to power the bioluminescence, which is within the limits of cellular aerobic respiration rates [~1 fmol O$_2$ per minute per cell (Molter et al., 2009), with ~30 ATP per 6 O$_2$, hence $3 \times 10^9$ molecules of ATP synthesized per minute from ADP via glucose oxidation], but not by a large margin. Transient increases in metabolic rate are possible: energy dissipation more than doubles in the mouse during high physical activity (Speakman, 2013). Therefore, whole-brain activity-dependent bioluminescence, at speeds high enough to achieve millisecond frame rates, may be metabolically taxing for the cell but is nevertheless plausible as a light generation strategy. Note that we have not treated the energy required to bio-synthesize the luciferin compound, which may create additional overhead (though conceivably luciferin could be provided exogenously).
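The photon and ATP budgets for this scenario follow from a few lines of arithmetic. The sketch below assumes $10^8$ neurons and the detection parameters quoted above; the function names are illustrative. With these inputs the emitted optical power comes out near 0.1 mW for the whole brain.

```python
# Sketch of the bioluminescence budget: emitted optical power for the whole
# brain and ATP consumed per neuron, compared with aerobic ATP supply.
H = 6.62607015e-34        # Planck constant, J s
C_LIGHT = 2.99792458e8    # speed of light, m/s
AVOGADRO = 6.02214076e23  # molecules per mole


def bioluminescence_budget(neurons=1e8, photons_detected=100,
                           collection_efficiency=0.01, frame_s=1e-3,
                           wavelength_m=1700e-9, photons_per_atp=0.8):
    """Return (optical power in W, ATP molecules per neuron per minute)."""
    photons_per_neuron_s = photons_detected / collection_efficiency / frame_s
    photon_energy_j = H * C_LIGHT / wavelength_m
    optical_power_w = neurons * photons_per_neuron_s * photon_energy_j
    atp_per_neuron_min = photons_per_neuron_s * 60 / photons_per_atp
    return optical_power_w, atp_per_neuron_min


def atp_supply_per_min(o2_fmol_per_min=1.0, atp_per_o2=30 / 6):
    """ATP molecules synthesized per cell per minute from O2 consumption."""
    return o2_fmol_per_min * 1e-15 * AVOGADRO * atp_per_o2


power_w, atp_demand = bioluminescence_budget()
print(power_w)                            # on the order of 1e-4 W
print(atp_demand, atp_supply_per_min())   # demand vs. supply, molecules/min
```

Including the ~0.8 photons per ATP yield raises the demand to about $7.5 \times 10^8$ ATP per neuron per minute, slightly above the ~$6 \times 10^8$ obtained when one ATP per photon is assumed; either way the demand is a sizeable fraction of the ~$3 \times 10^9$ per minute supply.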

4.2.4. Conclusions and future directions

Scattering of visible light in the brain creates a problem of signal-separation from deep-brain neurons. Multi-photon techniques, which scan an infrared excitation beam, can work around this scattering problem. However, current multi-photon techniques using fluorescent protein indicators, when applied at whole brain scale, would dissipate too much power to avoid thermal damage to brain tissue. Systems [such as plasmonic nano-antennas (Blanchard et al., 2011) or subwavelength metallic gratings (Harats et al., 2011)] that could locally excite multi-photon fluorescence without the need for high-energy laser pulses could conceivably ameliorate this issue. Importantly, quantum dots show promise as ultra-bright multi-photon indicators, if they can be targeted to neurons and optimized in terms of fluorescence lifetime. New methods besides multi-photon techniques could also work around the scattering of visible light in the brain. For example, fluorophores or bio-luminescent proteins could be developed which operate at infrared wavelengths. A compelling example from nature is the black dragonfish, which generates far red light (~705 nm) via a multi-step bioluminescent process (using this light to see in deep ocean waters) (Widder et al., 1984; Campbell and Herring, 1987). A large set of activity indicators with distinguishable colors, generated through a combinatorial genetic recombination mechanism such as BrainBow (Livet et al., 2007), could also improve signal separation. Targeting, via protein tags, of activity indicators to specific locations—such as the axon, soma, soma and proximal dendrites, distal dendrites, pre-synaptic terminals, post-synaptic terminals, or intact synapses—could also aid in signal discrimination (El-Husseini et al., 2001; Jacobs et al., 2003; Boeckers et al., 2005; Arnold, 2007; Feinberg et al., 2008; Vacher et al., 2008; Corrêa et al., 2009; Yamagata and Sanes, 2012). 
In addition, implanted optical devices, which place emitters and detectors within a few scattering lengths of the neurons being probed, could potentially obviate the negative effects of scattering and allow visible-wavelength indicators to be used without a need for multi-photon excitation. In principle, excitation and detection do not need to make use of the same modality. For example, photoacoustic microscopy (Filonov et al., 2012) uses pulsed laser excitation to drive ultrasonic emission, leading to optical absorption contrast. Such asymmetric techniques impose fundamentally different requirements from pure-optical techniques relative to fluorophore properties, required light intensities and other parameters.

4.3. Embedded Active Electronics

The preceding sections have assumed that electrical or optical signals from the recorded neurons are shuttled out of the brain before digitization and storage, but it is also conceivable to develop embedded electronic systems that locally digitize and then store or transmit (e.g., wirelessly) measurements of the activities of nearby neurons. This could allow for shorter wires in electrical recording approaches, and for shorter light path lengths in optical recording approaches, as well as for more facile (e.g., non-surgical) delivery mechanisms for the recording hardware.

Integrated circuits have shrunk to a remarkable degree: in about 3 years, following the Moore's law trajectory, it will likely be possible to fit the equivalent of Intel's original 4004 micro-processor in a 10 × 10 μm chip area. Functional wirelessly powered radio-frequency identification (RFID) chips as small as 50 μm in diameter have been developed (Usami et al., 2007) and tags with chip-integrated antennas function at the 400 μm scale (Impinj, Inc.). Integrated neural sensors including analog front ends are also scaling to unprecedented form factors: a 250 × 450 μm wireless implant—including the antenna, but not including a ~1 mm electrode shank used to separate signal from ground—draws only 2.5 μW per recording channel (Biederman et al., 2013). The system operates at ~1 mm range in air, powered by a transmitter generating ~50 mW of transmitted power. Note that for a single such embedded recording device, the heat dissipation constraint is set not by the device's own dissipation (10 μW for four recording channels) but rather by the RF specific absorption rate limit associated with the 50 mW transmit power.

Possibilities may exist for non-surgical delivery of embedded electronics to the brain: remarkably, cells such as macrophages (~13 μm in size) can engulf structures up to at least 20 μm in diameter (Cannon and Swanson, 1992) and have been studied as potential delivery vehicles for nano-particle drugs (Kadiu et al., 2011), suggesting that they might be used to deliver tiny microchips. T-cells and other immune cells can trans-migrate across the blood brain barrier (Engelhardt, 2006) and ghost cells (membranes purged of their contents) engineered to encapsulate synthetic cargo (Cinti et al., 2011) can fuse with neurons (Hikawa et al., 1989). It might even be possible to engineer such cell-based delivery vehicles to form electrical gap junctions (Spruston, 2001) with neurons or to act as local biochemical sensors (Nguyen et al., 2009).

The real-time transmission bandwidth requirements for neural recording could be significantly reduced if it is only desired to take a “snapshot” of neural activity patterns over a limited period of time, but this would require a large amount of local storage. For example, flash memory can store >10 Mbit of data in a device 100 μm on a side: a 64 gigabyte microSD card with 1.5 cm$^2$ area corresponds to 34 megabits per (100 μm)$^2$ area. Even denser forms of memory storage are under development and could perhaps be used in a one-time-write mode in the context of neural recording long before they become commercially viable for use as rewritable media in the electronics industry.

Here we consider the power dissipation associated with embedded electronic recording devices, as well as the constraints on possible methods to power them. In the next section, we describe how physics constrains the data transmission rates from such devices.

4.3.1. Power requirements for recording

Any embedded system needs to process data, in preparation for either local storage or wireless transmission. Physics defines hard limits on the required power consumption associated with data processing (neglecting the possibility of reversible logic architectures Bennett, 1973), arising from the entropy cost for erasing a bit of information (Landauer, 1961):

$$ E_\text{Landauer} = \ln(2)\, k_B T \approx 3 \times 10^{-21}\ \text{J/bit} \quad \text{(the Landauer limit)} $$

Ambitious yet physically realistic values for beyond-CMOS logic lie in the tens of $k_B T$ per bit processed (Yablonovitch, 2008). Scaling 40 $k_B T$/bit to record raw voltage waveforms at a minimal 1 kbit/s/neuron (e.g., 1 kHz sampling rate, 1 bit processed per neuron per sample), the total power consumption for whole mouse brain recording could in principle be as low as ~16 nW. While this leaves >$10^6$-fold more room (energetically) for increased data processing (more required bit flips per second), or energetic inefficiency of the switching device (greater dissipation per bit), realistic devices in the near-term may in fact require this much overhead, if not more. This necessitates a more detailed consideration of limiting factors for today's microelectronic devices.
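As a check on these figures, the following sketch evaluates the Landauer energy and the whole-brain power at a given energy per bit. It assumes a temperature of 300 K, a bit rate of $10^8$ neurons × 1 kbit/s, and the ~40 mW dissipation budget used elsewhere in this analysis; names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def landauer_j_per_bit(temp_k=300.0):
    """Minimum energy to erase one bit, ln(2) k_B T."""
    return math.log(2) * K_B * temp_k


def logic_power_w(kt_per_bit, bit_rate_bps, temp_k=300.0):
    """Power when each processed bit costs kt_per_bit thermal energies."""
    return kt_per_bit * K_B * temp_k * bit_rate_bps


whole_brain_bit_rate = 1e8 * 1e3            # neurons x bits per second each
power_w = logic_power_w(40, whole_brain_bit_rate)
print(landauer_j_per_bit())                 # roughly 3e-21 J
print(power_w)                              # tens of nW
print(40e-3 / power_w)                      # headroom below a 40 mW budget
```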

In the context of electrical recording, the first step that must be performed by an embedded neural recording device is digitization of the voltage waveform. Until mV-scale switching devices are developed (see discussion below), it is necessary to amplify the ~10–100 μV spike potential in order to drive digital switching events in downstream gates. During this sub-threshold amplification step, a CMOS (or BJT) device will dissipate static power (associated with a bias current). Importantly, in order to decrease the input-referred voltage noise of this amplification process, it is necessary to increase the bias current and hence the static power dissipation. For a simple differential transistor amplifier, the minimal bias current scales as

$$ I_d = \frac{\pi}{2} \cdot \frac{4 k_B T}{V_\text{noise}^2} \cdot \frac{k_B T}{q} \cdot \mathrm{BW} $$

where $V_\text{noise}$ is the input-referred voltage noise of the amplifier and $q$ is the electron charge. For an extracellular recording with BW = 10 kHz and $V_\text{noise}$ = 10 μV, this implies a minimal bias current $I_d \approx 60$ nA or a minimal static power of $I_d V_{dd} \approx 6 \times 10^{-8}$ W at $V_{dd} \approx 1$ V operating voltage. Assuming 10 neurons per recording channel, there are then 7.5 million recording channels for a mouse brain, which gives a power dissipation associated with signal amplification of ~500 mW. Note that realistic analog front ends (which are subject to 1/f noise and require multiple gain stages) draw 6×–10× greater bias current, quantified by the noise efficiency factor (NEF) (Steyaert et al., 1987), to achieve the same input-referred noise levels.
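The bias-current bound and the resulting whole-brain amplifier power can be evaluated directly. This is a minimal sketch assuming a temperature of 300 K and a 1 V supply; the function names are illustrative.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def min_bias_current_a(v_noise_v, bandwidth_hz, temp_k=300.0):
    """Minimal bias current of a simple differential transistor amplifier."""
    kt = K_B * temp_k
    return (math.pi / 2) * (4 * kt / v_noise_v**2) * (kt / Q_E) * bandwidth_hz


def front_end_power_w(channels, v_noise_v=10e-6, bandwidth_hz=10e3,
                      v_dd=1.0):
    """Static power of `channels` amplifiers at supply voltage v_dd."""
    return channels * min_bias_current_a(v_noise_v, bandwidth_hz) * v_dd


print(min_bias_current_a(10e-6, 10e3))  # a few tens of nA
print(front_end_power_w(7.5e6))         # roughly half a watt
```

Since the bound scales as $1/V_\text{noise}^2$, relaxing the noise target from 10 μV to 20 μV would cut the amplifier power fourfold, at the cost of poorer spike detection.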

Local on-chip digital computation also incurs an energy cost. Current CMOS digital circuits consume 5–6 orders of magnitude (Yablonovitch, 2008; Koomey et al., 2011; Tucker, 2011; Tucker and Hinton, 2011) more energy per switching event [~1 fJ/bit including charging of the wires (Tucker and Hinton, 2011)] compared to the Landauer limit (e.g., for a digital CMOS inverter, and ignoring the static power associated with the leakage current). This corresponds to a ~1 fF total load capacitance at 1 V operating voltage. For 100 GHz switching rates ($10^8$ neurons × 1 kHz) as above, this corresponds to 0.01–0.1 mW. Realistic architectures, however, will incur overhead in the number of switching events required to store, compress and/or transmit neural signals, likely bringing the power consumption into an unacceptable range (e.g., 1000 bits processed per sample would be 100 mW here). To take a concrete example, commercial RFID tags consume ~10 μW (Fraunhofer Institute for Photonic Microsystems, 2011). At a chip rate of 256 kbit/s (with a Miller encoding of 2), this yields $7.8 \times 10^{-11}$ J/bit, which is ~10 orders of magnitude higher than the Landauer limit. Applying current RFID technology to whole mouse brain recording at 1 kbit/s/neuron would thus draw ~8 W of power. Therefore, at least 2–3 orders of magnitude reduction in power consumption will be necessary in order to apply embedded electronics for whole-brain neural recording.
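The per-bit energies in this paragraph can be reproduced as follows. The sketch assumes a tag power of ~10 μW, which is the value implied by the quoted $7.8 \times 10^{-11}$ J/bit at 256 kbit/s with two chips per bit, and a ~40 mW whole-brain power budget; names are illustrative.

```python
# Sketch: energy per bit for CMOS switching and for an RFID tag, scaled to a
# whole-brain bit rate of 1e8 neurons x 1 kbit/s.
WHOLE_BRAIN_BIT_RATE = 1e8 * 1e3  # bits per second


def rfid_energy_per_bit_j(tag_power_w=10e-6, chip_rate_hz=256e3,
                          chips_per_bit=2):
    """Tag power divided by the data rate after Miller encoding."""
    return tag_power_w / (chip_rate_hz / chips_per_bit)


def whole_brain_power_w(energy_per_bit_j, bits_processed_per_sample=1):
    """Power if every recorded bit costs energy_per_bit_j, with overhead."""
    return energy_per_bit_j * bits_processed_per_sample * WHOLE_BRAIN_BIT_RATE


cmos_switch_j = 1e-15  # ~1 fJ per switching event, including wires
print(whole_brain_power_w(cmos_switch_j))        # about 0.1 mW
print(whole_brain_power_w(cmos_switch_j, 1000))  # about 100 mW
print(rfid_energy_per_bit_j())                   # about 7.8e-11 J/bit
print(whole_brain_power_w(rfid_energy_per_bit_j()) / 40e-3)  # excess factor
```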

Until recently, the energy efficiency of digital computing has scaled on an exponential improvement curve (Koomey et al., 2011). This was a consequence of Moore's law and Dennard scaling, where both the capacitance of each transistor and its associated interconnect, as well as the operating voltages, were reducing with the device dimensions. Unfortunately, issues related to device variability and the 3D structures needed to maintain the on-to-off current ratio have largely stopped the reduction in effective capacitance per device; current devices are stuck at ~100–200 aF for a minimum sized transistor. Furthermore, the exponential increase in leakage current that comes along with the scaling of the threshold voltage in this scenario has precluded substantial further decreases in voltage at a given performance level. Indeed, for the past several technology generations (since about 2005), CMOS devices have operated at a supply voltage of ~1 V.

While neural signal processing does not demand very stringent transistor speeds and so reductions below ~1 V are certainly feasible, a fundamental limitation in scaling the supply voltage still remains. Specifically, CMOS has a well-defined minimum energy per bit and an associated minimum-energy operating voltage that is defined by the tradeoff between static (leakage) and dynamic (switching) energy: as the operating voltage is decreased, the capacitive switching energy decreases, but the ratio of off-state to on-state current, $I_\text{off}/I_\text{on}$, increases exponentially, increasing the energy associated with leakage (this effect is independent of the threshold voltage in the sub-threshold regime). For practical circuits, the supply voltage that leads to this minimum energy is on the order of 300–500 mV, and thus supply voltage scaling will at most provide 3×–10× improvement in energy over today's designs.

Thus, a paradigm shift in microelectronic hardware is needed to reduce power by several orders of magnitude if we are to approach the physical limits. Developing a switching device operating in the mV range, rather than the 1 V range of current transistors, would allow a $(1\ \text{V}/1\ \text{mV})^2 = 10^6$-fold reduction in power consumption (Yablonovitch, 2008). Electronic circuits constructed using analog techniques (Sarpeshkar, 1998), which sometimes rely on bio-inspired computational architectures, show promise for reducing energy costs by up to five orders of magnitude (Sarpeshkar, 1998; Mandal and Sarpeshkar, 2007; Rapoport et al., 2009), depending on the nature of the computation and the required level of precision.

Figure 4 shows the power consumption per bit processed for several technology classes as well as the corresponding total power consumption required for whole brain readout, assuming a minimal whole-brain bit rate of 100 Gbit/s.

FIGURE 4

Figure 4. Energy cost of elementary operations across a variety of recording and data transmission modalities, expressed in units of the thermal energy (left axis) and as a power assuming 100 GHz switching rate (right axis). The Landauer limit of $k_B T \ln 2$ sets the minimum energy associated with a logically irreversible bit flip. The practical limit will likely lie in the tens of $k_B T$ per bit (Yablonovitch, 2008), comparable to the free energy release for hydrolysis of a single ATP molecule (or addition of a single nucleotide to DNA or RNA). The energy of a single infrared photon is ~50 $k_B T$. Single gates in current CMOS chips dissipate ~$10^5$–$10^6$ $k_B T$ per switching event, including the capacitive charging of the wires interconnecting the gates (red curve). The switching energy for the gate, not including wires, is ~100× lower (blue curve). The power efficiency of CMOS has been on an exponential improvement trend due to the miniaturization of components according to Moore's law [data re-digitized from Tucker and Hinton (2011)], although power efficiency gains have slowed recently. Current RFID chips compute and communicate at ~$10^9$–$10^{10}$ $k_B T$ (>10 pJ) per bit transmitted, while the total energy cost per floating point operation in a 2010 laptop was ~$10^{12}$ $k_B T$. The power associated with a minimal low-noise CMOS analog front end for signal amplification corresponds to ~500 mW at whole mouse brain scale. A single two-photon laser pulse at 0.1 nJ pulse energy corresponds to ~$10^{10}$ $k_B T$. For comparison, the 40 mW approximate maximal allowed power dissipation, according to Section 2 above, is shown with its equivalent per-bit energy of ~$10^8$ $k_B T$ at the minimal 100 Gbit/s bit rate.

4.3.2. Powering embedded devices

Embedded systems need power, which could be supplied via electromagnetic or acoustic energy transfer, or could be harvested from the local environment in the brain.

There are two key regimes for wireless electromagnetic power transfer: non-linear device rectification and photovoltaics. If the single-photon energy is sufficient to allow electrons to move from the valence to the conduction band—that is, band gap < hν/q, where q is the electron charge, h is Planck's constant, and ν is the frequency of the photon—a photovoltaic effect can occur. Otherwise, electromagnetic energy is converted to voltage by an antenna and non-linear device rectification may occur.

When photon energies are much lower than the band gap, power conversion is governed by the total RF power and by the impedances of the antenna and the rectifier, rather than by the individual photon energy. For a monochromatic RF source, there is no thermodynamic or quantum limit to the RF to DC conversion efficiency, other than the resistive losses and threshold voltages for a semiconductor process. For rectification, when the input voltage to the rectifier is much higher than a semiconductor process threshold, conversion efficiencies of 85% have been achieved (Sun and Chang, 2002). At low input voltages relative to the semiconductor process threshold, efficiencies as high as 25% with a 2 μW load have been achieved [see Mandal and Sarpeshkar (2007) for an analysis of power efficiency]. Ultimately, rectification improvements are dependent on the same improvements which will be needed for next-generation low-power computing: mV scale switching devices [promising research directions include tunnel FETs (Ionescu and Riel, 2011), electromechanical relays (Liu et al., 2012) and other options].

While efficient rectification is thus not a fundamental issue, capturing sufficient RF energy in the first place becomes increasingly challenging as microchips become smaller and more deeply embedded in tissue. Wireless electromagnetic power transfer imposes range constraints due to the loss in power density with distance. For directional power transfer, placing the receiver at the edge of the transmitter's near field (the Rayleigh distance $D^2/(4\lambda)$, where $D$ is the transmitter aperture) has advantages in terms of energy capture efficiency (Ozeri and Shmilovitz, 2010), whereas for omni-directional antennas it is advantageous to place the receiver as close as possible to the transmitter. If embedded chips are oriented randomly with respect to the transmitter, the radiation patterns of their antennas cannot be highly directional, i.e., their gains $G_r$ (a measure of directionality) must be close to one. In the far field, this lack of directionality limits power capture by the antenna [due to antenna reciprocity (Gershenfeld, 2000)]: the maximal power $P_A$ available to the chip is

$$ P_A = G_r P_\text{rad} \frac{\lambda^2}{4\pi} $$

where $P_\text{rad}$ is the power density of radiation around the antenna, $\lambda$ is the wavelength and $G_r \approx 1$ for a non-directional antenna (Mandal and Sarpeshkar, 2007).

It may be possible to power devices with pure magnetic fields (which are highly penetrant) via near-field (non-radiative) inductive coupling, which is widely used in systems ranging from biomedical implants to electric toothbrushes, or conceivably by using magneto-electric materials (Fiebig, 2005; Priya et al., 2009; Kitagawa et al., 2010; Yue et al., 2012). For the case of simple inductive coupling, however, the tiny cross-sections of micro-devices limit the amount of power which can be captured: a loop of 10 μm diameter in an applied field of 1 T switching at 1000 Hz produces an induced electromotive force of only 0.1 μV. Assuming a copper loop (~17 nΩ·m resistivity) with 1 × 1 μm cross-section and 40 μm length (around the outer edge of the chip) gives a power ($V^2/R$) of only ~15 fW associated with the induced current. In general, the use of coupled high-Q resonators can increase the range and efficiency of near-field electromagnetic power transfer by orders of magnitude (Karalis et al., 2008) compared to non-resonant inductive power transfer and may be particularly relevant for implanted devices (Ho et al., 2013). Unfortunately, at the ~10 μm length scale, the achievable on-chip inductances and capacitances are severely limited, which restricts the operating range of any resonant device to high frequencies [$f_\text{resonant} = (2\pi\sqrt{LC})^{-1}$] which will be attenuated by tissue. Electromagnetic near-field power transfer through tissue to ultra-miniaturized microchips may thus be inefficient, again due to low capture efficiency of the applied fields by tiny device cross-sections.
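The inductive-coupling estimate can be sketched as below, taking the rate of change of the field as roughly the field amplitude times the switching frequency (an order-of-magnitude assumption) and treating the loop as a plain resistive wire. Function names are illustrative.

```python
import math


def induced_emf_v(loop_diameter_m, field_t, switching_hz):
    """Order-of-magnitude EMF: loop area times dB/dt ~ B * f."""
    area_m2 = math.pi * (loop_diameter_m / 2) ** 2
    return area_m2 * field_t * switching_hz


def wire_resistance_ohm(resistivity_ohm_m, length_m, cross_section_m2):
    """Resistance of a uniform wire, R = rho * L / A."""
    return resistivity_ohm_m * length_m / cross_section_m2


emf = induced_emf_v(10e-6, 1.0, 1000.0)
resistance = wire_resistance_ohm(17e-9, 40e-6, 1e-6 * 1e-6)
print(emf)                       # just under 0.1 microvolt
print(resistance)                # a bit under 1 ohm
print(emf**2 / resistance)       # femtowatt scale
print((0.1e-6)**2 / resistance)  # same, with the EMF rounded to 0.1 microvolt
```

Whether the EMF is rounded up to 0.1 μV or not, the available power is of order 10 fW, roughly nine orders of magnitude below the ~10 μW drawn by current RFID-class chips.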

Alternatively, if the photon energy is above the silicon band gap [$\lambda < hc/(q V_\text{th}) \approx 1.1$ μm or less for silicon], the chip is essentially acting as a photovoltaic cell. There is no thermodynamic or quantum limit to the conversion efficiency of light to DC electrical power for monochromatic sources, other than resistive losses and dark currents in the material [86% in GaAs, for example (Bett et al., 2008)]. Again, however, capturing sufficient light becomes difficult for tiny devices. To supply 10 μW (typical of current wirelessly-powered RFID chips) photovoltaically to a 10 × 10 μm (cell sized) chip at 34% photovoltaic efficiency requires a light intensity of ~300 kW/m$^2$ at the chip, which is prohibitive. Furthermore, in the use of infrared light for photovoltaics, the penetration of the photons through tissue is decreased compared to radio frequencies.
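Both numbers in this paragraph follow from simple relations. The sketch below assumes a silicon band gap of 1.12 eV (a textbook value, not stated in the text) and a square chip fully illuminated at normal incidence; names are illustrative.

```python
H = 6.62607015e-34     # Planck constant, J s
C_LIGHT = 2.99792458e8  # speed of light, m/s
Q_E = 1.602176634e-19   # elementary charge, C


def cutoff_wavelength_m(band_gap_ev):
    """Longest wavelength whose photons exceed the band gap: hc / (q V)."""
    return H * C_LIGHT / (Q_E * band_gap_ev)


def required_intensity_w_m2(load_power_w, chip_side_m, efficiency):
    """Light intensity needed at the chip to deliver load_power_w."""
    return load_power_w / (efficiency * chip_side_m**2)


print(cutoff_wavelength_m(1.12))                     # about 1.1e-6 m
print(required_intensity_w_m2(10e-6, 10e-6, 0.34))   # about 3e5 W/m^2
```

For scale, ~300 kW/m$^2$ is a few hundred times the intensity of direct sunlight at the Earth's surface (about 1 kW/m$^2$).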

Piezoelectric harvesting of ultrasound energy by micro-devices is a possibility (Seo et al., 2013). The efficiency of electrical harvesting of mechanical strain energy in piezoelectrics can be above 30% for materials with high electromechanical coupling coefficients (e.g., PZT) (Ahmad and Akdoan, 2008; Xu et al., 2012). The losses in the piezoelectric transduction process are well described by models such as the KLM model (Krimholtz et al., 1970; Castillo et al., 2003).

An alternative to wireless energy transmission is the local harvesting of biochemical energy carriers. Implanted neural recording devices could conceivably be powered by free glucose, the main energy source used by the brain itself. The theoretical maximum thermodynamic efficiency for a fuel cell in aqueous solution is equal to that of the hydrogen fuel cell: $\Delta G^0/\Delta H^0 = 83\%$ at 25°C. Furthermore, if glucose is only oxidized to gluconic acid, the Coulombic (electron extraction) efficiency is at most 8.33% (Rapoport et al., 2012), which bounds the thermodynamic efficiency. The blood glucose concentration in rats has been measured at ~7.6 mM, with an extracellular glucose concentration in the brain of ~2.4 mM (Silver and Erecińska, 1994). A hypothetical highly miniaturized neural recorder with a device area of 25 × 25 μm and efficiency of 80%, processing a blood flow rate of ~1 mm/s (Ivanov et al., 1981), could extract (80%)(7.6 mM)(25 μm)$^2$(1 mm/s)(2880 kJ/mol) ≈ 11 μW, which is sufficient for low-power devices such as RFID chips (Cho et al., 2005). Unfortunately, current non-microbial glucose fuel cells obtain only ~180 μW/cm$^2$ peak power and ~3.4 μW/cm$^2$ steady state power (Rapoport et al., 2012). Thus there is a need for $10^4$- and $10^6$-fold improvements in peak and steady state power densities, respectively, for non-microbial glucose fuel cells to power brain-embedded electronics of the complexity of today's RFID chips (or for the corresponding decrease in power requirements, as emphasized above).
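The glucose power estimate and the required improvement factors can be checked with the sketch below. It assumes that all glucose in the blood volume swept across the device area is available for complete oxidation at the stated efficiency; names are illustrative.

```python
def glucose_power_w(efficiency=0.8, glucose_mol_m3=7.6, side_m=25e-6,
                    flow_m_s=1e-3, energy_j_mol=2880e3):
    """Power from fully oxidizing glucose carried through the device area."""
    molar_flow = glucose_mol_m3 * side_m**2 * flow_m_s  # mol/s (1 mM = 1 mol/m^3)
    return efficiency * molar_flow * energy_j_mol


def power_density_uw_cm2(power_w, side_m=25e-6):
    """Areal power density in microwatts per square centimeter."""
    area_cm2 = (side_m * 100) ** 2
    return power_w * 1e6 / area_cm2


power = glucose_power_w()
density = power_density_uw_cm2(power)
print(power)            # about 1.1e-5 W
print(density / 180)    # improvement needed over reported peak density
print(density / 3.4)    # improvement needed over reported steady-state density
```

The required areal density is close to 2 W/cm$^2$, which is what places the reported peak and steady-state values roughly four and six orders of magnitude short, respectively.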

4.3.3. Conclusions and future directions

The power consumption of today's microelectronic devices is more than six orders of magnitude higher than the physical limit for irreversible computing, and 2–3 orders of magnitude higher than would be permissible for use in whole brain millisecond resolution activity mapping, even under favorable assumptions on the required switching rates and neglecting both the power associated with noise rejection in the analog front end and the CMOS leakage current. Thus, the first priority is to reduce the power consumption associated with embedded electronics. In principle, methods such as infrared light photovoltaics, RF harvesting via diode rectification, or glucose fuel cells, could supply power to embedded neural recorders, but again, significant improvements in the power efficiency of electronics are necessary to enable this. Other potential energy harvesting strategies include materials/enzymes harnessing local biological gradients such as in voltage, osmolarity, or temperature. An analysis of the energy transduction potential of each of these systems is beyond the scope of this discussion. Fortunately, with many orders of magnitude potential for improvement before physical limits are reached, we may expect that embedded nano-electronic devices will emerge as an energetically viable neural interfacing option at some point in the future.

4.4. Embedded Devices: Information Theory

Most recording methods envisioned thus far rely on the real-time transmission of neural activity data out of the brain. Physics and information theory impose fundamental limits on this process, including a minimum power consumption required to transmit data through a medium. The most basic of these results hold irrespective of whether the data transmission is wired or wireless, and regardless of the particular physical medium (optical, electrical, acoustic) used as the information carrier.

A communication “channel” is a set of transmitters and receivers that share access to a single physical medium with fixed bandwidth. The bandwidth is the range of frequencies present in the time-varying signals used to transmit information. In wireless communications, information is transmitted by modulating a carrier wave. To allow modulation, the frequency of the carrier wave must be higher than the bandwidth: for example, a 400 THz visible light wave may be modulated at a 100 GHz rate. The physical medium underlying a channel could be a wire (with a bandwidth set by its capacitive RC time constant), an optical fiber, free space electromagnetic waves over a certain frequency range, or other media.

As a concrete example, consider a police department with 100 officers, each possessing a hand-held radio. The radios transmit vocalizations by modulating an 80 MHz carrier wave at ~10 kHz. This constitutes a single shared communications channel with 10 kHz bandwidth. Simultaneously, the fire department may communicate via a separate channel, also with a bandwidth of ~10 kHz, by modulating a 90 MHz carrier wave. The channels are separate because modulation introduced into one does not affect the other. If the neighboring town's police department makes the mistake of also operating at 80 MHz carrier frequency, then they share a channel and conflicts will arise.

4.4.1. Power requirements for single-channel data transmission

We first treat the case in which there is a single channel for transmitting data out of the brain. As discussed above in the context of electrical spike sorting, the Shannon Capacity Theorem (Cover and Thomas, 2006) sets the maximal bit rate for a channel (assuming additive white Gaussian noise) to

$$ R_{\max} = \mathrm{BW} \log_2 (1 + \mathrm{SNR}) $$

where BW is the channel bandwidth and SNR is the signal-to-noise ratio. If there is only thermal noise, then SNR = $P/(N_0\,\mathrm{BW})$, where $N_0 = k_B T$ is the thermal noise power spectral density (in W/Hz) and $P = (\mathrm{PL}) P_0$ is the power of the transmitted signal $P_0$, weakened by the path loss PL. Therefore the transmitted power $P_0$ is lower-bounded:

$$ P_0 > \frac{k_B T\, \mathrm{BW} \left( 2^{R_{\max}/\mathrm{BW}} - 1 \right)}{\mathrm{PL}} $$

as shown in Figure 5 (bottom). In a minimal model of a transmitter-receiver system, there thus exists a tradeoff between the required signal power and the bandwidth of the carrier radiation, due to the thermal noise floor, even in the absence of path loss (PL = 1).
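The power–bandwidth tradeoff can be explored numerically with the bound above. This is a minimal sketch assuming thermal noise at 310 K and treating PL as the fraction of transmitted power that reaches the receiver; the function name and the infinite return value for out-of-range cases are illustrative choices.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def min_transmit_power_w(bit_rate_bps, bandwidth_hz, path_loss=1.0,
                         temp_k=310.0):
    """Lower bound on P_0 from the Shannon capacity with thermal noise only.

    path_loss is the fraction of transmitted power reaching the receiver
    (1.0 means no loss).
    """
    exponent = (bit_rate_bps / bandwidth_hz) * math.log(2)
    try:
        snr_needed = math.expm1(exponent)  # equals 2**(R/BW) - 1
    except OverflowError:
        return math.inf  # required SNR exceeds floating-point range
    return K_B * temp_k * bandwidth_hz * snr_needed / path_loss


target = 100e9  # 100 Gbit/s whole-brain bit rate
for bandwidth in (1e9, 10e9, 100e9, 1e15):
    print(bandwidth, min_transmit_power_w(target, bandwidth))
```

With these assumptions the bound is sub-nanowatt when the bandwidth matches the bit rate, but it grows exponentially once the bandwidth falls below the bit rate, exceeding any plausible power budget at 1 GHz. In the wide-bandwidth limit it approaches $k_B T \ln 2$ per bit, the same value as the Landauer limit.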

FIGURE 5

Figure 5. Power requirements imposed by information theory on data transmission through a single (additive white Gaussian noise) channel with carrier frequency ν (an upper bound on the bandwidth), given thermal noise and path loss. Bottom: absorption length of water as a function of frequency (blue), minimal power to transmit data at 100, 1000, and 10,000 Gbit/s (green) as a function of frequency, assuming thermal noise but no path loss. Top: minimal power to transmit data at 100, 1000 and 10,000 Gbit/s as a function of frequency, assuming thermal noise and a path loss corresponding to the attenuation by water absorption over a distance of 2 mm. While formulated for a single channel, at certain wavelengths (e.g., RF) these factors also constrain multiplexed data transmissions between many transmitters and many receivers, depending on capacity of the system for spatial multiplexing. Horizontal dashed lines: 40 mW, the approximate maximal whole-brain power dissipation in steady state.

Path loss weakens the proportion of the power that can reach the detector. Using the above equation, we can calculate, as a function of bandwidth, the power necessary to transmit a target whole-brain bit rate of 100 Gbit/s through a medium with path loss dependent on the carrier wavelength, as shown in Figure 5 (top).

For RF wavelengths, the radiation penetrates deeply but the achievable data rates are low without excessive power consumption, due to the limited bandwidth. For wavelengths intermediate between RF and infrared, the penetration depth is low and power must be expended to combat these losses, despite the high carrier bandwidth. Only in the infrared and visible ranges do the tradeoffs between power, bandwidth and penetration depth allow transmission of >100 Gbit/s out of the brain through a single channel without unacceptable power consumption.

The analysis above has ignored the effects of noise sources other than thermal noise, but many additional noise sources will increase the amount of power needed to transmit data, via a decrease in the SNR at fixed input power. For optical transmission in the brain, the noise is dominated by time-correlated “speckle noise” below 200 kHz, which arises mostly from local blood flow (Carp et al., 2011). This correlated noise, which cannot be filtered by simple averaging, could be avoided by modulating optical signals at frequencies above 200 kHz.

4.4.2. Spatially multiplexed data transmission

As discussed above, transmitting information through a single channel imposes direct limits on bit rate, carrier frequency and input power. However, it is conceivable to divide the data transmission burden over many independent channels, i.e., over many pairs of transmitters and receivers, each operating at lower bandwidth (e.g., at radio frequencies). Indeed, this would be optimal in a scenario where many embedded devices measure and then transmit the activities of nearby neurons. As a concrete example of such “spatial multiplexing,” an effective capacity of 1 Tbit/s could conceivably be obtained by splitting the data over 1000 transmitter-receiver pairs each operating at 1 Gbit/s, with the transmitters arranged in a 10 × 10 × 10 grid. Importantly, in order to exceed the above limits for single-channel data transmission, it must be possible for these transmitter-receiver pairs to share the same bandwidth and operate simultaneously without conflicts, for example by modulating distinguishable carrier waves or by transferring data over separate wires. The conditions under which this may occur, however, can be counter-intuitive. For example, for antennas to operate independently, they must be spaced apart from one another by roughly a wavelength. For 10 GHz microwaves, the wavelength is ~3 cm, so no more than a handful of microwave transmitters (e.g., operating at frequencies in the 100 GHz–1 THz range) can co-occupy the mouse brain while operating independently.
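The arithmetic behind this example is simple enough to check directly. The sketch below computes the aggregate rate of the 10 × 10 × 10 grid and the free-space wavelength at 10 GHz; the function names are hypothetical, and the wavelength is the free-space value (in tissue it would differ with the refractive index).

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def free_space_wavelength_m(freq_hz):
    """Free-space wavelength (m) of a carrier at freq_hz."""
    return C / freq_hz


def aggregate_rate_bps(grid_shape, per_channel_bps):
    """Total bit rate if every grid site is an independent channel."""
    n_channels = 1
    for dim in grid_shape:
        n_channels *= dim
    return n_channels * per_channel_bps


if __name__ == "__main__":
    total = aggregate_rate_bps((10, 10, 10), 1e9)
    print(f"Aggregate rate: {total / 1e12:.1f} Tbit/s")
    wavelength = free_space_wavelength_m(10e9)
    print(f"Wavelength at 10 GHz: {wavelength * 100:.1f} cm")
```

The aggregate figure holds only if the channels are truly independent, which is why the roughly one-wavelength antenna spacing matters.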

Even with many non-independent transmitters co-occupying the brain and operating simultaneously over the same frequency spectrum, it may be possible under some conditions to “factor out” the effects of the coupling and allow an increase in channel capacity relative to the single-channel result. To treat such scenarios, a generalization of Shannon's capacity theorem to multi-input-multi-output (MIMO) channels has shown that the maximal total data rate is

$$R_{\max} = \mathrm{BW}\cdot\log_2\left|\,I + (\mathrm{SNR})\,H H^{*}\,\right|$$

where I is the identi