Previous Intro: Formal Metaethics and Metasemantics for AI Alignment



I’m nearing completion of a hopefully much more readable version of the ideas previously released as set-theoretic code. It takes the form of a detailed outline, currently in WorkFlowy, in which you can easily expand and collapse the subsections that elaborate on their parents’ content. You can find the current draft here.



Although it’s not polished, I’m releasing it in preparation for a Q&A I’ll be holding at the UC Berkeley AI and Philosophy working group, which I hope you will attend. I’ll likely make some brief introductory remarks but reserve most of the time for answering questions. The working group is part of the UC Berkeley Social Science Matrix and will be held at:



Barrows Hall, 8th Floor, Mezzanine Level

Wed, Dec 4th 12:30-2:30pm

(only the first hour is reserved for this Q&A)

Here I’ve reproduced just the first few levels of the outline. Click here to see their elaboration (currently ~4,800 words).



Given mathematical models of the world and the adult human brains in it, an ethical goal function for AI can be constructed by applying a social welfare function to the set of extensional rational utility functions of the brains.

The mathematical model of a world or brain is to be given as a causal Markov model.



A causal Markov model is a convenient way of generating a causal model.





The notion of a causal model is taken directly from Judea Pearl.
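For readers unfamiliar with Pearl’s formulation, a causal model is standardly a triple (U, V, F) of exogenous variables, endogenous variables, and one structural function per endogenous variable; interventions replace a structural function with a constant. Here is a minimal sketch — the variable names and toy equations are purely illustrative, not drawn from the outline:

```python
# A minimal sketch of a Pearl-style causal model as a triple (U, V, F).
# U: exogenous (background) variables; V: endogenous variables;
# F: one structural function per endogenous variable.

causal_model = {
    "U": {"u"},
    "V": {"x", "y"},
    "F": {
        "x": lambda vals: vals["u"],       # x := u
        "y": lambda vals: vals["x"] + 1,   # y := x + 1
    },
}

def evaluate(model, exogenous):
    """Solve the structural equations given exogenous values
    (variables are evaluated in a hard-coded causal order here)."""
    vals = dict(exogenous)
    for v in ["x", "y"]:
        vals[v] = model["F"][v](vals)
    return vals

def intervene(model, var, value):
    """do(var := value): replace var's structural function with a constant."""
    new_f = dict(model["F"])
    new_f[var] = lambda vals: value
    return {**model, "F": new_f}

print(evaluate(causal_model, {"u": 3}))                     # {'u': 3, 'x': 3, 'y': 4}
print(evaluate(intervene(causal_model, "x", 0), {"u": 3}))  # {'u': 3, 'x': 0, 'y': 1}
```

Note how the intervention on x severs its dependence on u while leaving y’s equation intact — the defining feature of Pearl’s do-operator.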







A causal model is composed of:







A causal Markov model is composed of:







A causal Markov model (cmm) generates a causal model (cm) as follows:
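The construction itself is collapsed in this excerpt, but one plausible reading is that a Markov transition structure is unrolled over time steps into a causal model whose time-indexed variables each depend only on their predecessor. The following sketch is an assumption about that collapsed content, with a toy deterministic transition:

```python
# Hypothetical sketch: unrolling a causal Markov model (an initial state
# plus a stationary transition function) into structural functions over
# time-indexed variables s_0, s_1, ..., s_T. Each s_t depends only on
# s_{t-1} -- the Markov property. This is an illustrative assumption,
# not the outline's actual construction.

def unroll(initial_state, transition, T):
    """Return structural functions for s_0..s_T."""
    F = {"s_0": lambda vals: initial_state}
    for t in range(1, T + 1):
        # Bind the predecessor's name now so each closure is independent.
        F[f"s_{t}"] = (lambda prev: lambda vals: transition(vals[prev]))(f"s_{t-1}")
    return F

def evaluate(F, T):
    vals = {}
    for t in range(T + 1):
        vals[f"s_{t}"] = F[f"s_{t}"](vals)
    return vals

# Example: a deterministic "double each step" transition.
F = unroll(initial_state=1, transition=lambda s: 2 * s, T=3)
print(evaluate(F, 3))   # {'s_0': 1, 's_1': 2, 's_2': 4, 's_3': 8}
```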



A brain’s rational utility function is the utility function that would be arrived at by the brain’s decision algorithm if it were to make more nearly optimal decisions while avoiding unrelated distortions of value.



A brain’s decision algorithm is the one that best satisfies these desiderata:





First, it must take the mathematical form of a decision algorithm, which is a tuple composed of:
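The tuple’s components are collapsed in this excerpt, so the fields below (a credence function and a utility function, paired with an expected-utility decision rule) are illustrative assumptions only, not the outline’s actual components:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch of a decision algorithm as a tuple of components.
# The credence/utility fields and the expected-utility rule are
# illustrative stand-ins for the outline's collapsed definition.

@dataclass
class DecisionAlgorithm:
    credence: Dict[str, float]             # probability over world states
    utility: Callable[[str, str], float]   # utility of (action, state)

    def choose(self, actions: List[str]) -> str:
        """Pick the action maximizing expected utility under credence."""
        def eu(a):
            return sum(p * self.utility(a, s) for s, p in self.credence.items())
        return max(actions, key=eu)

alg = DecisionAlgorithm(
    credence={"rain": 0.3, "sun": 0.7},
    utility=lambda a, s: {("umbrella", "rain"): 5, ("umbrella", "sun"): 1,
                          ("none", "rain"): -5, ("none", "sun"): 3}[(a, s)],
)
print(alg.choose(["umbrella", "none"]))   # umbrella
```

Here EU(umbrella) = 0.3·5 + 0.7·1 = 2.2 versus EU(none) = 0.3·(−5) + 0.7·3 = 0.6, so the umbrella is chosen.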







Next, there must be an implementation function which maps brain states to decision states such that these two routes from a brain state to a decision event always arrive at the same result:
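The two routes are: (a) evolve the brain state and then map it to a decision state, or (b) map the brain state first and then step the decision algorithm. The commutation condition can be sketched as follows, with all three functions as toy integer stand-ins (assumptions, not the outline’s definitions):

```python
# A sketch of the commutation condition on the implementation function:
# mapping then stepping must equal stepping then mapping.
# These toy functions are illustrative stand-ins only.

brain_step = lambda b: b + 2      # brain's causal transition (toy)
decision_step = lambda d: d + 1   # decision algorithm's transition (toy)
implement = lambda b: b // 2      # implementation map: brain state -> decision state

def commutes(b):
    # Route (a): evolve brain, then interpret.
    # Route (b): interpret, then evolve decision state.
    return implement(brain_step(b)) == decision_step(implement(b))

print(all(commutes(b) for b in range(100)))   # True
```

Intuitively, this is the requirement that the decision algorithm is a faithful coarse-graining of the brain’s dynamics: interpreting the brain never disagrees with running the algorithm.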







It achieves a high rate of compression of the brain’s causal transition function.
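The compression idea can be illustrated crudely: a short generating rule describes the same transition data in far fewer symbols than the raw transition table. In this sketch zlib stands in for description length, and the rule/table pair is entirely made up:

```python
import zlib

# Rough illustration of "compressing the transition function":
# a compact generating rule is a much shorter description of the same
# data than the raw table. zlib is only a stand-in for description
# length; the rule and table here are illustrative.

table = bytes((i * 7) % 256 for i in range(10_000))   # raw transition table
rule = b"s[i] = (i * 7) % 256"                        # compact generating rule

print(len(table), len(zlib.compress(table)), len(rule))
```

A decision algorithm that scores well on this desideratum plays the role of the short rule: a small description from which much of the brain’s causal transition function can be regenerated.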







It is probabilistically coherent, including with its represented causal models.
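Two standard coherence checks can be sketched concretely: the credences form a probability measure (they normalize to 1), and conditional credences agree with the joint distribution via the ratio formula. The numbers below are illustrative, not from the outline:

```python
# Sketch of probabilistic coherence checks (illustrative numbers):
# (1) credences normalize to 1;
# (2) conditional credence matches the ratio formula P(A|B) = P(A,B)/P(B).

joint = {("rain", "wet"): 0.28, ("rain", "dry"): 0.02,
         ("sun", "wet"): 0.07, ("sun", "dry"): 0.63}

total = sum(joint.values())
print(abs(total - 1.0) < 1e-9)   # normalization holds: True

p_rain = sum(p for (w, _), p in joint.items() if w == "rain")
p_wet_given_rain = joint[("rain", "wet")] / p_rain
print(abs(p_wet_given_rain - 0.28 / 0.30) < 1e-9)   # ratio formula holds: True
```

Coherence with represented causal models would additionally require that credences about intervention outcomes match what those causal models prescribe, which the outline elaborates.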







It is instrumentally rational in both its first-order and higher-order utility functions.







It is ambitious, explaining as much of the brain’s behavior as possible with the decision algorithm.





The final formulation specifying the rational utility function gets rather complicated, but we can build up to it with a couple of initial approximations:





Final specification: Simulate all possible continuations of an agent and apply a social welfare function to their utility functions while weighting them by optimality of prescriptions, agential identity, and likelihood.
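The aggregation step can be sketched as a weighted average over continuations, with each continuation’s weight the product of its three factors. All numbers and the multiplicative combination are made-up illustrative assumptions; the real definition is in the full outline:

```python
# Hypothetical sketch of the final specification: aggregate the utility
# functions of simulated continuations of an agent, weighting each by
# optimality of prescriptions, agential identity, and likelihood.
# The weights, utilities, and multiplicative combination are all
# illustrative assumptions.

continuations = [
    {"utility": 0.9, "optimality": 0.8, "identity": 1.0, "likelihood": 0.5},
    {"utility": 0.2, "optimality": 0.3, "identity": 0.6, "likelihood": 0.5},
]

def aggregate(conts):
    """A weighted-average social welfare function over continuations."""
    weights = [c["optimality"] * c["identity"] * c["likelihood"] for c in conts]
    total = sum(weights)
    return sum(w * c["utility"] for w, c in zip(weights, conts)) / total

print(round(aggregate(continuations), 3))   # 0.771
```

The first continuation dominates because it scores higher on every weighting factor, pulling the aggregate toward its utility.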





The advantages of this metaethics include:



Extension: The rational utility function of a brain above is couched in terms of the brain’s own represented expressions, but for interpersonal comparisons, we first cash them out extensionally in terms of their referents in the world.



The social welfare function might be thought of as choosing a center of gravity between the extensional rational utility functions.
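The center-of-gravity picture can be made concrete: if each extensional rational utility function is represented as a vector of utilities over the same outcomes, one simple social welfare function is their pointwise mean. The vectors and the choice of the mean are illustrative only — the outline leaves the social welfare function open, and taking a mean presupposes that the utilities have been put on comparable scales (the role of the extension step above):

```python
# Sketch of the "center of gravity" picture: each brain's extensional
# rational utility function as a vector of utilities over shared
# outcomes, aggregated here by pointwise mean. Illustrative only.

utilities = {
    "alice": [1.0, 0.0, 0.5],
    "bob":   [0.0, 1.0, 0.5],
}

def center_of_gravity(profiles):
    n = len(profiles)
    length = len(next(iter(profiles.values())))
    return [sum(v[i] for v in profiles.values()) / n for i in range(length)]

print(center_of_gravity(utilities))   # [0.5, 0.5, 0.5]
```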



The details above form an initial prototype.

Read the full version here.