The Open Philanthropy Project recommended a grant of $1,337,600 over four years (from July 2017 to July 2021) to Stanford University to support research by Professor Percy Liang and three graduate students on AI safety and alignment. The funds will be split approximately evenly across the four years (i.e. roughly $320,000 to $350,000 per year).

This is one of a number of grants we plan to recommend to support work by top AI researchers on AI safety and alignment issues, with the goals of a) building a pipeline for younger researchers, b) making progress on technical problems, and c) further establishing AI safety research as a field.

Background

This grant falls within our work on potential risks from advanced artificial intelligence, one of our focus areas within global catastrophic risks.

In March 2017, we recommended a $25,000 planning grant to Professor Liang so that he could spend substantial time engaging with our process for deciding whether to proceed with this larger funding recommendation.

About the grant

Proposed activities

We asked Professor Liang to submit a research description of problems he currently plans to work on, why he finds these problems important, and how he thinks he might make progress. In broad terms, Professor Liang initially plans to focus on a subset of the following topics:

Robustness against adversarial attacks on machine learning (ML) systems

Verification of the implementation of ML systems

“Knowing what you don’t know,” i.e. calibrated / uncertainty-aware ML

“Natural language” supervision, i.e. using compositional, abstract, underspecified languages (e.g. English or some engineered language) in dialogue to specify rewards and goals

Professor Liang thinks that it is possible to make empirically verifiable progress on these topics, and that the general principles developed in this way are reasonably likely to be relevant for addressing global catastrophic risks from advanced AI (though this latter impact will be much harder to evaluate). Professor Liang also thinks that work on these topics will clarify how existing research on reliable ML relates to potential risks from advanced AI, which could increase the number and quality of researchers working on those risks.

Professor Liang plans to spend about 20% of his overall research time on the agenda supported by this grant. This grant will also support about three graduate students.

Risks and reservations

We have some disagreements with Professor Liang about which problems and approaches in this area are most important and promising. These disagreements stem largely from differing intuitions about research directions, and we can easily imagine later agreeing with Professor Liang on these issues. In one past instance, Daniel Dewey (our Program Officer for Potential Risks from Advanced Artificial Intelligence, “Daniel” throughout this page) was persuaded by Professor Liang of the likely usefulness of a line of research about which he had initially been skeptical.

Rather than trying to resolve our disagreements and settle on a fixed research agenda for this grant now, we expect it to be more valuable to keep Professor Liang’s potential research directions relatively open. We plan to facilitate discussion of these issues among Professor Liang, our technical advisors, other Open Philanthropy grantees, and other AI research organizations, with the aim of resolving our disagreements over time.

Overall, we are highly confident that Professor Liang understands and shares our interests and values in this space.

Plans for learning and follow-up

Key questions for follow-up

How is the research going overall?

Has Professor Liang’s team formed any new perspectives on research problems they investigate?

Have there been any updates to the team’s research priorities?

Are there other ways in which Open Philanthropy could help?

Follow-up expectations

We plan to check in with Professor Liang roughly every six months for the duration of the grant to get in-depth updates on his results so far and plans for the future. We may also have less comprehensive, more informal discussions with Professor Liang roughly once a month (if both we and Professor Liang have time and think it would be beneficial).

At the end of the grant period, we will decide whether to renew our support based on our technical advisors’ evaluation of Professor Liang’s work so far, his proposed next steps, and our assessment of how well his research program has served as a pipeline for students entering the field. We are optimistic about the chances of renewing our support. We think the most likely reason we might choose not to renew would be if Professor Liang decides that AI alignment research isn’t a good fit for him or for his students.

Our process

Two of Open Philanthropy’s technical advisors reviewed Professor Liang’s research proposal. Both felt largely positive about the proposed research directions and recommended to Daniel that Open Philanthropy make this grant, despite some disagreements with Professor Liang (and with each other) about the likely value of some specific components of the proposal (see above).