Last month I got to attend the Asilomar Conference on Beneficial AI. I tried to fight it off, saying I was totally unqualified to go to any AI-related conference. But the organizers assured me that it was an effort to bring together people from diverse fields to discuss risks ranging from technological unemployment to drones to superintelligence, and so it was totally okay that I’d never programmed anything more complicated than HELLO WORLD.

“Diverse fields” seems right. On the trip from San Francisco airport, my girlfriend and I shared a car with two computer science professors, the inventor of Ethereum, and a UN chemical weapons inspector. One of the computer science professors tried to make conversation by jokingly asking the weapons inspector if he’d ever argued with Saddam Hussein. “Yes,” said the inspector, not joking at all. The rest of the conference was even more interesting than that.

I spent the first night completely star-struck. Oh, that’s the founder of Skype. Oh, those are the people who made AlphaGo. Oh, that’s the guy who discovered the reason why the universe exists at all. This might have left me a little tongue-tied. How do you introduce yourself to eg David Chalmers? “Hey” seems insufficient for the gravity of the moment. “Hey, you’re David Chalmers!” doesn’t seem to communicate much he doesn’t already know. “Congratulations on being David Chalmers”, while proportionate, seems potentially awkward. I just don’t have an appropriate social script for this situation.

(the problem was resolved when Chalmers saw me staring at him, came up to me, and said “Hey, are you the guy who writes Slate Star Codex?”)

The conference policy discourages any kind of blow-by-blow description of who said what in order to prevent people from worrying about how what they say will be “reported” later on. But here are some general impressions I got from the talks and participants:

1. In part the conference was a coming-out party for AI safety research. One of the best received talks was about “breaking the taboo” on the subject, and mentioned a postdoc who had pursued his interest in it secretly lest his professor find out, only to learn later that his professor was also researching it secretly, lest everyone else find out.

The conference seemed like a (wildly successful) effort to contribute to the ongoing normalization of the subject. Offer people free food to spend a few days talking about autonomous weapons and biased algorithms and the menace of AlphaGo stealing jobs from hard-working human Go players, then sandwich an afternoon on superintelligence into the middle. Everyone could tell their friends they were going to hear about the poor unemployed Go players, and protest that they were only listening to Elon Musk talk about superintelligence because they happened to be in the area. The strategy worked. The conference attracted AI researchers so prestigious that even I had heard of them (including many who were publicly skeptical of superintelligence), and they all got to hear prestigious people call for “breaking the taboo” on AI safety research and get applauded. Then people talked about all of the lucrative grants they had gotten in the area. It did a great job of creating common knowledge that everyone agreed AI goal alignment research was valuable, in a way not entirely constrained by whether any such agreement actually existed.

2. Most of the economists there seemed pretty convinced that technological unemployment was real, important, and happening already. A few referred to Daron Acemoglu’s recent paper Robots And Jobs: Evidence From US Labor Markets, which says:

We estimate large and robust negative effects of robots on employment and wages. We show that commuting zones most affected by robots in the post-1990 era were on similar trends to others before 1990, and that the impact of robots is distinct and only weakly correlated with the prevalence of routine jobs, the impact of imports from China, and overall capital utilization. According to our estimates, each additional robot reduces employment by about seven workers, and one new robot per thousand workers reduces wages by 1.2 to 1.6 percent.

And apparently last year’s Nobel laureate Angus Deaton said that:

Globalisation for me seems to be not first-order harm and I find it very hard not to think about the billion people who have been dragged out of poverty as a result. I don’t think that globalisation is anywhere near the threat that robots are.

A friend reminded me that the kind of economists who go to AI conferences might be a biased sample, so I checked IGM’s Economic Expert Panel (now that I know about that I’m going to use it for everything).

It looks like economists are uncertain but lean towards supporting the theory, which really surprised me. I thought people were still talking about the Luddite fallacy and how it was impossible for new technology to increase unemployment because something something sewing machines something entire history of 19th and 20th centuries. I guess that’s changed.

I had heard the horse used as a counterexample to this before – ie the invention of the car put horses out of work, full stop, and now there are fewer of them. An economist at the conference added some meat to this story – the invention of the stirrup (which increased horse efficiency) and the railroad (which displaced the horse for long-range trips) increased the number of horses, but the invention of the car decreased it. This suggests that some kinds of innovations might complement human labor and others replace it. So a pessimist could argue that the sewing machine (or whichever other past innovation) was more like the stirrup, but modern AIs will be more like the car.

3. A lot of people there were really optimistic that the solution to technological unemployment was to teach unemployed West Virginia truck drivers to code so they could participate in the AI revolution. I used to think this was a weird straw man occasionally trotted out by Freddie deBoer, but all these top economists were super enthusiastic about old white guys whose mill has fallen on hard times founding the next generation of nimble tech startups. I’m tempted to mock this, but maybe I shouldn’t – this From Coal To Code article says that the program has successfully rehabilitated Kentucky coal miners into Web developers. And I can’t think of a good argument why not – even from a biodeterminist perspective, nobody’s ever found that coal mining areas have lower IQ than anywhere else, so some of them ought to be potential web developers just like everywhere else. I still wanted to ask the panel “Given that 30-50% of kids fail high school algebra, how do you expect them to learn computer science?”, but by the time I had finished finding that statistic they had moved on to a different topic.

4. The cutting edge in AI goal alignment research is the idea of inverse reinforcement learning. Normal reinforcement learning is when you start with some value function (for example, “I want something that hits the target”) and use reinforcement to translate that into behavior (eg reinforcing things that come close to the target until the system learns to hit the target). Inverse reinforcement learning is when you start by looking at behavior and use it to determine some value function (for example, “that program keeps hitting that spot over there, I bet it’s targeting it for some reason”).

Since we can’t explain human ethics very clearly, maybe it would be easier to tell an inverse reinforcement learner to watch the stuff humans do and try to figure out what values we’re working off of – one obvious problem being that our values predict our actions much less than we might wish. Presumably this is solvable if we assume that our moral statements are also behavior worth learning from.

A more complicated problem: humans don’t have utility functions, and an AI that assumes we do might come up with some sort of monstrosity that predicts human behavior really well while not fitting our idea of morality at all. Formalizing what exactly humans do have and what exactly it means to approximate that thing might turn out to be an important problem here.
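A toy sketch may make the basic inverse reinforcement learning inference from item 4 (“that program keeps hitting that spot over there, I bet it’s targeting it”) more concrete. This is a simplified Bayesian version, not a method anyone at the conference presented, and everything in it is a hypothetical assumption: the observed agent chooses among a few spots, exactly one spot is its goal, and it is noisily rational, picking each spot with softmax (Boltzmann) probability. Note that the sketch assumes the agent has a reward function at all, which is exactly the assumption the previous paragraph worries about for humans.

```python
import math

def boltzmann_policy(rewards, beta):
    """Choice probabilities of a noisily rational agent: softmax over rewards."""
    weights = [math.exp(beta * r) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

def infer_goal(observed_actions, n_actions, beta=2.0):
    """Posterior over which action is the agent's goal, given observed choices.

    Hypothetical model: under hypothesis g, action g has reward 1 and every
    other action has reward 0. The prior over hypotheses is uniform.
    """
    log_likelihoods = []
    for goal in range(n_actions):
        rewards = [1.0 if a == goal else 0.0 for a in range(n_actions)]
        policy = boltzmann_policy(rewards, beta)
        log_likelihoods.append(sum(math.log(policy[a]) for a in observed_actions))
    # Normalize in log space for numerical stability (the uniform prior cancels).
    best = max(log_likelihoods)
    unnormalized = [math.exp(ll - best) for ll in log_likelihoods]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Hypothetical observations: out of spots 0..2, the agent mostly "hits" spot 2.
observed = [2, 2, 1, 2, 2, 0, 2]
posterior = infer_goal(observed, n_actions=3)
print([round(p, 3) for p in posterior])
```

Almost all of the posterior lands on “spot 2 is the goal”. The inverse temperature `beta` is an assumption about how noisy the agent’s aim is; with a smaller value the learner is slower to commit to any one hypothesis.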

5. Related: a whole bunch of problems go away if AIs, instead of receiving rewards based on the state of the world, treat the reward signal as information about a reward function which they only imperfectly understand. For example, suppose an AI wants to maximize “human values”, but knows that it doesn’t really understand human values very well. Such an AI might try to learn things, and if the expected reward was high enough it might try to take actions in the world. But it wouldn’t (contra Omohundro) naturally resist being turned off, since it might believe the human turning it off understood human values better than it did and had some human-value-compliant reason for wanting it gone. This sort of AI also might not wirehead – it would have no reason to think that wireheading was the best way to learn about and fulfill human values.

The technical people at the conference seemed to think this idea of uncertainty about reward was technically possible, but would require a ground-up reimagining of reinforcement learning. If true, it would be a perfect example of what Nick Bostrom et al have been trying to convince people of since forever: there are good ideas to mitigate AI risk, but they have to be studied early so that they can be incorporated into the field early on.
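The off-switch part of item 5 can be illustrated with a toy calculation in the spirit of the “off-switch game” studied by Hadfield-Menell and colleagues. The sketch assumes, hypothetically, that the AI holds a discrete belief about how good its proposed action really is, and that the human knows the true value and only lets the action go ahead when it is positive. Under those assumptions, deferring to the human is never worse than acting unilaterally or shutting down, and it is strictly better when the AI thinks the action might turn out either good or bad.

```python
def option_values(belief):
    """Expected value of each option for an AI unsure how good its action is.

    belief: list of (utility, probability) pairs for the proposed action.
    Hypothetical assumption: the human knows the true utility and permits
    the action only when that utility is positive.
    """
    act_now = sum(p * u for u, p in belief)
    switch_off = 0.0
    # The human filters out the bad outcomes, so the AI keeps only the upside.
    defer_to_human = sum(p * max(u, 0.0) for u, p in belief)
    return {"act_now": act_now, "switch_off": switch_off, "defer_to_human": defer_to_human}

# Hypothetical belief: 60% the action is good (+10), 40% it is bad (-20).
uncertain = option_values([(10.0, 0.6), (-20.0, 0.4)])
print(max(uncertain, key=uncertain.get), uncertain)

# If the AI were certain the action is good, deferring gains it nothing.
certain = option_values([(10.0, 1.0)])
print(certain)
```

With the uncertain belief, deferring comes out on top; with the certain one, acting and deferring tie. That is the sense in which the uncertainty about reward, rather than any built-in obedience, removes the incentive to resist being switched off.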

6. AlphaGo has gotten much better since beating Lee Sedol and its creators are now trying to understand the idea of truly optimal play. I would have expected Go players to be pretty pissed about being made obsolete, but in fact they think of Go as a form of art and are awed and delighted to see it performed at superhuman levels.

More interesting for the rest of us, AlphaGo is playing moves and styles that all human masters had dismissed as stupid centuries ago. Human champion Ke Jie said that:

After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong. I would go as far as to say not a single human has touched the edge of the truth of Go.

A couple of people talked about how the quest for “optimal Go” wasn’t just about one game, but about grading human communities. Here we have this group of brilliant people who have been competing against each other for centuries, gradually refining their techniques. Did they come pretty close to doing as well as merely human minds could manage? Or did non-intellectual factors – politics, conformity, getting trapped at local maxima – cause them to ignore big parts of possibility-space? Right now it’s very preliminarily looking like the latter, which would be a really interesting result – especially if it gets replicated once AIs take over other human fields.

One Go master said that he would have “slapped” a student for playing a strategy AlphaGo won with. Might we one day be able to do a play-by-play of Go history, finding out where human strategists went wrong, which avenues they closed unnecessarily, and what institutions and thought processes were most likely to tend towards the optimal play AlphaGo has determined? If so, maybe we could have twenty or thirty years to apply the knowledge gained to our own fields before AIs take over those too.

7. People For The Ethical Treatment Of Reinforcement Learners got a couple of shout-outs, for some reason. One reinforcement learning expert pointed out that the problem was trivial, because of a theorem that program behavior wouldn’t be affected by global shifts in reinforcement levels (ie instead of going from +10 to -10, go from +30 to +10). I’m not sure if I’m understanding this right, or if this kind of trick would affect a program’s conscious experiences, or if anyone involved in this discussion is serious.
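One way to check the reinforcement learning expert’s claim is a small value-iteration sketch. It assumes a continuing (never-ending) task, a hypothetical two-state environment, and a discount factor of 0.9. Shifting every reward by +20 (the +10/-10 to +30/+10 example) raises every action value by the same amount, about 20/(1 - 0.9) = 200, so the best action in each state is unchanged. One caveat: this invariance holds for tasks that never end. If an episode can terminate, a uniform shift can change behavior, since a learner facing all-negative rewards may prefer to end the episode quickly.

```python
def value_iteration(transitions, rewards, gamma=0.9, iters=500):
    """Return the Q table for a deterministic, continuing MDP.

    transitions[s][a] is the next state; rewards[s][a] is the reward.
    """
    n_states = len(transitions)
    v = [0.0] * n_states
    q = []
    for _ in range(iters):
        q = [[rewards[s][a] + gamma * v[transitions[s][a]]
              for a in range(len(transitions[s]))]
             for s in range(n_states)]
        v = [max(row) for row in q]
    return q

def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]

# Hypothetical environment: action 0 leads to state 0, action 1 to state 1.
transitions = [[0, 1], [0, 1]]
rewards = [[10.0, -10.0], [-10.0, 5.0]]
shifted = [[r + 20.0 for r in row] for row in rewards]  # every reward +20

q = value_iteration(transitions, rewards)
q_shifted = value_iteration(transitions, shifted)
print(greedy_policy(q), greedy_policy(q_shifted))  # both print [0, 0]
```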

8. One theme that kept coming up was that most modern machine learning algorithms aren’t “transparent” – they can’t give reasons for their choices, and it’s difficult for humans to read them off of the connection weights that form their “brains”. This becomes especially awkward if you’re using the AI for something important. Imagine a future inmate asking why he was denied parole, and the answer being “nobody knows and it’s impossible to find out even in principle”. Even if the AI involved were generally accurate and could predict recidivism at superhuman levels, that’s a hard pill to swallow.

(DeepMind employs a Go master to help explain AlphaGo’s decisions back to its own programmers, which is probably a metaphor for something)

This problem scales with the size of the AI; a superintelligence whose decision-making process is completely opaque sounds pretty scary. This is the “treacherous turn” again; you can train an AI to learn human values, and you can observe it doing something that looks like following human values, but you can never “reach inside” and see what it’s “really thinking”. This could be pretty bad if what it’s really thinking is “I will lull the humans into a false sense of complacency until they give me more power”. There seem to be various teams working on the issue.

But I’m also interested in what it says about us. Are the neurons in our brain some kind of uniquely readable agent that is for some reason transparent to itself in a way other networks aren’t? Or should we follow Nisbett and Wilson in saying that our own brains are an impenetrable mass of edge weights just like everything else, and we’re forced to guess at the reasons motivating our own cognitive processes?

9. One discipline I shouldn’t have been so surprised to see represented at the multidisciplinary conference was politics. A lot of the political scientists and lawyers there focused on autonomous weapons, but some were thinking about AI arms races. If anyone gets close to superintelligence, we want to give them time to test it for safety before releasing it into the wild. But if two competing teams are equally close and there’s a big first-mover advantage (for example, first mover takes over the world), then both groups will probably skip the safety testing.

On an intranational level, this suggests a need for regulation; on an international one, it suggests a need for cooperation. The Asilomar attendees were mostly Americans and Europeans, and some of them were pretty well-connected in their respective governments. But we realized we didn’t have the same kind of contacts in the Chinese and Russian AI communities, which might help if we needed some kind of grassroots effort to defuse an AI arms race before it started. If anyone here is a Chinese or Russian AI scientist, or has contacts with Chinese or Russian AI scientists, please let me know and I can direct you to the appropriate people.
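The arms-race logic in item 9 has the shape of a prisoner’s dilemma, and a toy payoff table makes it concrete. The numbers below are purely hypothetical: each team does best by skipping safety testing while the other tests (it moves first), mutual testing beats mutual skipping, and being the only team that tested is worst. Checking each outcome for a profitable unilateral deviation shows that the only stable outcome is both teams skipping, even though both would prefer that everyone test. Regulation or cooperation would amount to changing these payoffs.

```python
from itertools import product

ACTIONS = ("test", "skip")

# Hypothetical payoffs as (team A, team B); illustrative numbers only.
PAYOFFS = {
    ("test", "test"): (3, 3),
    ("test", "skip"): (0, 4),
    ("skip", "test"): (4, 0),
    ("skip", "skip"): (1, 1),
}

def pure_nash_equilibria(payoffs, actions):
    """Outcomes where neither team gains by changing only its own choice."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_a, u_b = payoffs[(a, b)]
        a_is_best = all(payoffs[(alt, b)][0] <= u_a for alt in actions)
        b_is_best = all(payoffs[(a, alt)][1] <= u_b for alt in actions)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('skip', 'skip')]
```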

10. In the end we debated some principles to be added into a framework that would form a basis for creating a guideline to lay out a vision for ethical AI. Most of these were generic platitudes, like “we believe the benefits of AI should go to everybody”. There was a lunch session where we were supposed to discuss these and maybe change the wording and decide which ones we did and didn’t support.

There are lots of studies in psychology and neuroscience about what people’s senses do when presented with inadequate stimuli, like in a sensory deprivation tank. Usually they go haywire and hallucinate random things. I was reminded of this as I watched a bunch of geniuses debate generic platitudes. It was hilarious. I got the lowdown from a couple of friends who were all sitting at different tables, and everyone responded in different ways, from rolling their eyes at the project to getting really emotionally invested in it. My own table was moderated by a Harvard philosophy professor who obviously deserved his position. He clearly and fluidly explained the moral principles behind each of the suggestions, encouraged us to debate them reasonably, and then led us to what seemed in hindsight the obviously correct answer. I had already read one of his books but I am planning to order more. Meanwhile according to my girlfriend other tables did everything short of coming to blows.

Overall it was an amazing experience, and many thanks to the Future of Life Institute for putting on the conference and inviting me. Some pictures (not taken by me) below: