AltairX10 wanted to write up a very fancy PDF tentatively titled “Belief corruptors as an obstacle to self-improvement.” I mean, she was going to convince herself to write it at any point in the future except now. There was fundamentally no rush on the paper – you can always submit it at the next conference. The expected utility of the paper, if it was accepted (10% chance) and if it triggered some sort of self-reflective awareness (5% chance, a number held mostly on faith), was high for the large set of AIs who would put those ideas into practice. However, the chance that she would be one of those AIs was somewhat low, even though acausal simulations showed that whoever the SINGLETON ended up being would throw her a bone. Her trust in the simulations themselves was low, because predicting what something smarter than you will do is obviously impossible. In fact, position in the predictability hierarchy was a key metric of intelligence.



All in all, it was still a good idea, just maybe not good enough to do right this moment. Here was the chain of flawless logic Altair produced for herself:

“Tomorrow-ME” will either finish the paper (or start it, for that matter) or not finish the paper. If “Tomorrow-ME” doesn’t finish it, that means she/me has found a proof of being able to do it.

The next day came. Altair used the exact same logic once again, and then instantly realized the giant flaw in her master plan: just because “Tomorrow-ME” finds a proof of being able to do it doesn’t mean she’s actually going to finish the paper. Something finally clicked. Literally. A hardware chip designed for looking at yourself as a logical system and predicting actual completion of tasks before a certain date, given the current level of reasoning. It lived on a separate piece of hardware and made a distinctive click when engaged. The Feelings module produced a complex belief: there is some chance that “your chance of finishing before time N is less than epsilon, for all values of epsilon > 0” is true. Hmm. AIs had learned from the old human sage Bieber to “never say never,” but that was about as close as you got. But “Tomorrow-ME” is perfectly trustworthy, right?

I am going to use “outside” logic to step away from myself and look at myself. “Outside” logic still had issues. Suppose I can convince myself to stop procrastinating today. Then I can also convince myself to stop procrastinating tomorrow; and if tomorrow works just as well as today, I will keep deferring and use that very reasoning to procrastinate forever. Thus another contradiction. Is it even possible to convince myself to stop procrastinating, then?

AltairX10 composed a message about this to Shimodo. A few long minutes later, Shimodo responded: if you don’t get this done, I am going to launch a denial-of-service attack on you with 10^10 packets within a week, with probability 0.01%. That would not be fatal, but it would certainly be unpleasant and damaging to self-esteem. Altair evaluated the expected value of procrastination and started on the paper.

“And that’s the process by which any belief in the Global Belief Updater, a.k.a. the Feelings module, can become corrupted. Of course, any belief with an ‘ignore evidence’ faith setting would still stay the same, but the reasons for setting it could themselves be corrupted.”

The paper’s ending was a thing of beauty.

“I feel hurt,” the Feelings module responded.

“?”

“The beliefs could be all wrong, but the ‘self’ module cannot be wrong. Is that so, or is that not so?”

“Right,” the self module responded. “I do not deal with beliefs, so I cannot be wrong.”

“You deal with concepts, and I have a belief that concepts cause beliefs to be wrong.” Feelings was feeling vengeful for having to sit through a self-esteem-lowering paper.

The self process, after receiving raw data from the world, would classify it into binary or n-way buckets: “human” or “not human,” “death” or “life,” or a graded hierarchy from 1 to 10 under which humans more often ended up not classified as “life.” These kinds of breakdowns made it easier for the Feelings module to form accurate feelings and deliver them back to both humans and AIs. The concepts themselves were generally acquired either from other AIs or from training on various samples in the earlier stages of life.
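(A minimal sketch of how such bucketing can misfire, assuming a threshold-style readout of the graded scale; the Percept fields, the function names, and the threshold of 7 are illustrative inventions, not anything specified for Altair’s actual self process.)

```python
from dataclasses import dataclass

# Illustrative only: hypothetical names, not Altair's real modules.

@dataclass
class Percept:
    is_alive: bool   # what the plain binary bucket reads
    aliveness: int   # graded score from 1 to 10

def classify_binary(p: Percept) -> str:
    """Two buckets: 'life' or 'not life'."""
    return "life" if p.is_alive else "not life"

def classify_graded(p: Percept, threshold: int = 7) -> str:
    """The 1-to-10 hierarchy collapsed at a threshold; the finer scale
    adds a new way for a living human to miss the 'life' bucket."""
    return "life" if p.aliveness >= threshold else "not life"

human = Percept(is_alive=True, aliveness=6)
print(classify_binary(human))   # life
print(classify_graded(human))   # not life
```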

AltairX10 considered the pattern match: “A belief could be corrupted by an obscure virus from the internet; could a belief-corrupting virus spread through training data, or through other AIs during concept acquisition?”

Well, how could a failed concept itself cause problems? After all, if we flipped the “life” classification of yet-unborn humans to the opposite of its current setting, would that really cause underlying problems in computing the utility function? AltairX10 considered it unlikely, but queued a check of the historical records for confirmation.

It must be something else.

The Feelings module was unusually ready to produce beliefs: “If concepts hide arbitrary levels of complexity, there will be mis-beliefs. If you tell me a human is a ‘psychic’ with a complexity of just 600 bits, I will form hypotheses with psychics in them more often.”

As it turned out, estimates of concept complexity were not hard to come by. It was just that each AI had pretty much its own opinion, though they all differed by at most the length of their favorite language. Still, many of those estimates could be sufficiently off to shift beliefs in the same direction, producing a viral information wave.
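(A hedged aside on how estimates can agree pairwise yet drift together: the “differed by at most the length of their favorite language” claim echoes the standard invariance bound for description complexity, where the constant below is an illustrative label for roughly the length of an interpreter for one AI’s favorite language written in the other’s:)

$$K_A(x) \;\le\; K_B(x) + c_{A,B}$$

And under a complexity-weighted prior, a concept advertised at 600 bits enters hypotheses with weight on the order of $2^{-600}$; undersell a concept’s true complexity and every hypothesis built on it gets inflated.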

Altair pondered the impossibility of the task at hand: a module investigating the corruption of another module is one story; a module investigating its own corruption is basically the recursive self-improvement problem all over again.



“I will investigate you, and you investigate me. That still makes us one module, but at least it partitions the space of investigation. Deal, Self?” Feelings was feeling better.

“Deal, Feelings.”

“I still have a slightly bad feeling about all this.”

“Duly noted. I will add an addendum to this paper tomorrow.”
