It is very likely that everything that makes you be you is encoded in the physical structure of your brain. If we could extract all this information, then we ought to be able to 'run' you as a program on some powerful computer. This potential future technology gets called "whole brain emulation", and the Future of Humanity Institute prepared a very detailed roadmap in 2008 which covers a huge amount of research in this direction, combined with some scale analysis of the difficulty of various tasks.

We're currently nowhere near having the technology to do this. We have neither the ability to extract this information from your brain nor the computational power to simulate even a minute of thought a year. [1] The social and economic impact of running people as software would be huge, however, so it would be nice to know how likely this is. Could we have it in a decade? A century? Ever?

The human brain is incredibly complex. People have around a hundred billion neurons, with even more connections between them. Even the fruit fly has around a hundred thousand. That's more than we can handle right now. The tiny nematode C. elegans, a 1mm roundworm, has only 302 neurons, so few that we have names for all of them. It is also very well understood, having been studied as a model animal for at least fifty years. We have its brain fully mapped out: perhaps we should be able to simulate it by now?

There have been several projects to create a nematode simulation. The NemaSys project at the University of Oregon in the late 1990s planned a full model, including the body, every neuron, every "electrical and chemical synapse and neuromuscular junction", and a "complete set of sensory modalities". They never published a paper on their simulation, however, so I can't tell if they even got funded.

The Perfect C. elegans Project was a 1998 attempt at something similar by a different group of people. They got as far as releasing an initial report (pdf) but their simulation was not complete at the time of the paper. I don't see anything else from them afterwards, so it looks to me like they did not end up completing it.

The Virtual C. elegans Project at Hiroshima University around 2004 was another attempt at nematode simulation. They released two papers describing their simulation: A Dynamic Body Model of the Nematode C. elegans With Neural Oscillators and A Model of Motor Control of the Nematode C. elegans With Neuronal Circuits. The basic idea is that they would set up the most realistic nematode they could, then simulate poking it on the head. It should back away from the poke. While they did manage this, one of their steps was kind of cheating. They simulated the neurons of the nematode's brain, but they didn't know the connection weights [2]. Instead of getting this information from the nematode, they used a machine learning algorithm to find some weights that would work.
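To make the "connection weights" idea concrete, here is a minimal Python sketch of the single signed weight model described in footnote [2]. It is a toy illustration only, not the Hiroshima group's simulation: the function name `net_input` and all the numbers are made up for the example.

```python
def net_input(activity, weights):
    """Sum presynaptic activity scaled by signed connection weights.

    weights[i] > 0 models an excitatory connection from neuron i,
    weights[i] < 0 an inhibitory one; the magnitude is the strength.
    """
    return sum(a * w for a, w in zip(activity, weights))


# Hypothetical values: three active input neurons, two excitatory
# connections and one stronger inhibitory connection.
activity = [1.0, 1.0, 1.0]
weights = [0.5, 0.25, -1.0]

drive = net_input(activity, weights)
print(drive)  # negative here: inhibition outweighs excitation
```

The connectome tells you which entries of `weights` are nonzero, but not their sign or size, which is why a simulation has to get those values from somewhere else.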

Recently there has been new work on simulating movement of C. elegans. Past simulations had used simple enough body models that the simulated worms would have been unable to move in their normal way: not simulating the weight distribution and instead having equal friction at all points. Two papers proposed improved body models: A Biologically Accurate 3D Model of the Locomotion of Caenorhabditis Elegans (2010) and C. elegans Locomotion: an Integrated Approach (2009).

David Dalrymple is currently working on this at MIT. He writes:

In short form, my justification for working on such a project where many have failed before me is: The "connectome" of C. elegans is not actually very helpful information for emulating it. Contrary to popular belief, connectomes are not the biological equivalent of circuit schematics. Connectomes are the biological equivalent of what you'd get if you removed all the component symbols from a circuit schematic and left only the wires. Good luck trying to reproduce the original functionality from that data. What you actually need is to functionally characterize the system's dynamics by performing thousands of perturbations to individual neurons and recording the results on the network, in a fast feedback loop with a very very good statistical modeling framework which decides what perturbation to try next. With optogenetic techniques, we are just at the point where it's not an outrageous proposal to reach for the capability to read and write to anywhere in a living C. elegans nervous system, using a high-throughput automated system. It has some pretty handy properties, like being transparent, essentially clonal, and easily transformed. It also has less handy properties, like being a cylindrical lens, being three-dimensional at all, and having minimal symmetry in its nervous system. However, I am optimistic that all these problems can be overcome by suitably clever optical and computational tricks. I'm a disciple of Kurzweil, and as such I'm prone to putting ridiculously near-future dates on major breakthroughs. In particular, I expect to be finished with C. elegans in 2-3 years. I would be Extremely Suprised, for whatever that's worth, if this is still an open problem in 2020.

There are also several researchers working in a distributed open source manner on the OpenWorm project. Stephen Larson, from that group, writes:

We've just published a structural model of all 302 neurons represented as NeuroML. NeuroML allows the representation of multi-compartmental models of neurons. We are using this as a foundation to overlay the c. elegans connectivity graph and then add as much as we can find about the biophysics of the neurons. We believe this represents the first open source attempt to reverse-engineer the c. elegans connectome. One of the comments mentioned Andrey Palyanov's mechanical model of the c. elegans. He is part of our group and is currently focused on moving to a soft-body simulation framework rather than the rigid one they created here. Our first goal is to combine the neuronal model with this physical model in order to go beyond the biophysical realism that has already been done in previous studies. The physical model will then serve as the "read out" to make sure that the neurons are doing appropriate things.

I wrote to Ken Hayworth who is a neuroscience researcher working on scanning and interested in whole brain emulation, and he wrote back:

I have not read much on the simulation efforts on C. elegans but I have talked several times to one of the chief scientists who collected the original connectome data and has been continuing to collect more electron micrographs (David Hall, in charge of www.wormatlas.org). He has said that the physiological data on neuron and synapse function in C. elegans is really limited and suggests that no one spend time simulating the worm using the existing datasets because of this. I.e. we may know the connectivity but we don't know even the sign of many synapses. If you look at a system like the retina I would argue that we already have quite good models of its functioning and thus it is a perfect ground for testing emulation from known connectivity. So the short answer is that I think it may be far easier to emulate a well characterized and mapped part of the mammalian brain than it is to emulate the worm despite its smaller size.

I then asked:

So even a nanoscale SEM pass over the whole brain wouldn't be enough unless we could find some way to visually read off the sign of a synapse, perhaps with a stain, perhaps by learning what different types of neurons look like, perhaps by something not yet discovered?

And he replied:

That is right, but those tell tale signs are well known for certain systems (like the retina) already, and will become more clear for others once large scale em imaging combined with functional recording becomes routine.

Stephen Larson disagrees with Hayworth:

I would respectfully disagree with Dr. Hayworth. I would challenge him to show a "well characterized and mapped out part of the mammalian brain" that has a fraction of the detail that is known in c. elegans already. Moreover, the prospect of building a simulation requires that you can constrain the inputs and the outputs to the simulation. While this is a hard problem in c. elegans, its orders of magnitude more difficult to do well in a mammalian system. There is still no retina connectome to work with (c. elegans has it). There are debates about cell types in retina (c. elegans has unique names for all cells). The gene expression maps of retina are not registered into a common space (c. elegans has that). The ability to do calcium imaging in retina is expensive (orders of magnitude easier in c. elegans). Genetic manipulation in mouse retina is expensive and takes months to produce specific mutants (you can feed c. elegans RNAi and make a mutant immediately). There are methods now, along the lines of GFP to "read the signs of synapses". There is just very little funding interest from Government funding agencies to apply them to c. elegans. David Hall is one of the few who is pushing this kind of mapping work in c. elegans forward. What confuses this debate is that unless you study neuroscience deeply it is hard to tell the "known unknowns" apart from the "unknown unknowns". Biology isn't solved, so there are a lot of "unknown unknowns". Even with that, there are plenty of funded efforts in biology and neuroscience to do simulations. However, in c. elegans there are likely to be many fewer "unknown unknowns" because we have a lot more comprehensive data about its biology than we do for any other species. Building simulations of biological systems helps to assemble what you know, but can also allow you to rationally work with the "known unknowns". 
The "signs of synapses" is an example of known unknowns -- we can fit those into a simulation engine without precise answers today and fill them in tomorrow. The statement that no one should start simulating the worm based on the current data has no merit when you consider that there is a lot to be done just to get to a framework that has the capacity to organize the "known unknowns" so that we can actually do something useful with them once they have them. More importantly, it makes the gaps a lot more clear. Right now, in the absence of any c. elegans simulations, data are being generated without a focused purpose of feeding into a global computational framework of understanding c. elegans behavior. I would argue that the field would be much better off collecting data in the context of adding to the gaps of a simulation, rather than everyone working at cross purposes. That's why we are working on this challenge of building not just a c. elegans simulations, but a general framework for doing so, over at the Open Worm project.

With the people currently working on this, I think we'll probably have a nematode simulation in about ten years. People have been working on this for at least 15 years [3], so that would be 25 years for a nematode simulation. The amount of discovery and innovation needed to simulate a nematode seems maybe 1/100th as much as for a person. [4] Naively this would say 100 * (15+10) or 2500 years for human whole brain emulation. More people would probably work on this if we had initial successes and it looked practical, though, giving us maybe a 10x boost? Which still is (100/10) * (15+10) or 250 years. This might go faster if we have some sort of intelligence amplification or other changes in how research works. Or all progress might stop as we run out of cheap fossil fuels. There are huge uncertainties here, but I don't think we'll be uploading anyone in this century.
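The back-of-the-envelope estimate above is easy to play with as a few lines of Python. This is only a sketch of the same naive scaling, with the function and parameter names invented for the example; the inputs are the rough guesses from the paragraph, not measurements.

```python
def naive_years(years_so_far, years_remaining, difficulty_ratio, speedup=1.0):
    """Scale the nematode timeline up to a human brain.

    difficulty_ratio: how many times more discovery a human needs
                      than a nematode (guessed at 100 above).
    speedup:          factor by which extra researchers speed things
                      up (guessed at 10 above).
    """
    nematode_total = years_so_far + years_remaining
    return difficulty_ratio / speedup * nematode_total


print(naive_years(15, 10, 100))              # 100 * (15+10)
print(naive_years(15, 10, 100, speedup=10))  # (100/10) * (15+10)
```

Changing any one guess by a factor of a few moves the answer by the same factor, which is a fair picture of how uncertain the estimate is.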

(I previously made a lesswrong discussion post on this.)



[1] The whole brain emulation roadmap estimates 10^22 flops to simulate at the level of electrophysiology. The current fastest supercomputer runs at nearly 10^16 flops. So we'd need computers about a million times faster than we have today. That's twenty doublings, so if Moore's law keeps us doubling every eighteen months for another thirty years we'll be there. I have trouble imagining Moore's law not breaking down before then, though.
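The arithmetic in this footnote can be checked with a short Python sketch. The function name is made up for the example, and the inputs are the round figures quoted above, with a steady eighteen month doubling time taken as an assumption.

```python
import math


def years_until(required_flops, current_flops, doubling_years):
    """Return (speedup needed, doublings needed, years at a fixed doubling time)."""
    speedup = required_flops / current_flops
    doublings = math.log2(speedup)
    return speedup, doublings, doublings * doubling_years


speedup, doublings, years = years_until(1e22, 1e16, 1.5)
print(f"{speedup:.0e}x faster, {doublings:.1f} doublings, {years:.0f} years")
```

A factor of a million is slightly less than 2^20, so this comes out just under twenty doublings and just under thirty years.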

[2] A neuron firing can either excite or inhibit another neuron, and by a variable amount. This can be modeled as a single weight: negative for inhibitory, positive for excitatory, with the magnitude of the number representing the connection strength.

[3] Possibly as long as 25: the connectome for C. elegans was published in 1986.

[4] I'm not counting improvements in computer speed and storage or in scanning technology in here, because these seem to be moving along quickly on their own.