I have decided to not release my model, and explain why below. I have also written a small addendum answering some questions about my model and its quality. I would like to thank every single person that engaged with me and helped me come to these conclusions. I have learned so much and could never have done it on my own. Thank you.

A few days ago, I trained a 1.5bn parameter language model with the goal of replicating GPT2 (though subsequent analysis showed that, although my model is identical in most technical details, its performance characteristics are significantly worse) and wrote an extended essay on why I wanted to release it to the public. (I find it amusing that some have called it a “manifesto”.) Since then, I have received a huge number of thoughtful, relevant arguments and a great deal of information, along with a decent amount of much less helpful feedback, all of which has made me rethink my arguments and views very carefully. Instead of replying to all the points and comments as they arose, I wanted to take my time and formulate my thoughts properly. Not communicating clearly enough or leaving too much open to interpretation can be as harmful as not communicating at all.

I was presented with many arguments that have made me reevaluate and weaken my belief in some of the arguments I presented in my last essay. Many people, maybe even a majority, were also in full support of me. Overall I still stand by most of what I said. There are some things in that essay I should address, and some of the arguments need to be revised and expanded upon. But this is not the point of this essay. Maybe I’ll write more on that topic in the future.

Because, despite still thinking that all those arguments were mostly right,

I was still wrong.

And I want to tell you the story of why I was wrong, how I realized it, and what we all can learn from my strange little journey.

The hacker enters the world

To explain how this all happened, and what we can learn from it, I think it’s important to learn a little bit more about my personality and with what kind of attitude and world model I came into this situation.

When I started this, I was consumed by the usual curious hacker modus operandi: “This is within my reach and it’s cool, I’m doing it!” In the back of my mind I also always had this thought that I wouldn’t actually succeed, that surely there was some misunderstanding with Google and they wouldn’t actually give me the hardware, or I’d be unable to write the code, or something else. But as things started to really come together, and I realized this was actually happening, I took a step back and started to think.

To be rational, you have to understand that you’re not rational. No human is or could be perfectly rational. That would take, literally or figuratively, infinite computing power to achieve. And so the best we can do is recognize our biases and correct for them as well as we can. There are some human biases we all share, like confirmation bias, but some are personality-specific. Some people are too optimistic, some people too pessimistic, and so on. I understood this and had already identified one of my greatest personal biases: my curiosity. This is the reason why I thought writing the Curious Hacker section in my previous post was so important. It was me grappling with this personality bias I knew I had and needed to correct for.

But I have other personal biases that I now believe I may not have corrected for enough. I have a depressive/paranoid streak, and tend to assume the worst until proven otherwise.

I think this is important to know in order to understand why I acted the way I did in the early times of this situation. At the time I made my first twitter post, it seemed completely plausible in my mind that no one, OpenAI or otherwise, would care or even notice me. Or, even worse, that they would antagonize me.

This was completely and utterly false, and this is the first important lesson I have learned. The people at OpenAI and the wider AI community have been incredibly helpful, open and thoughtful in their responses to me. I owe to them everything I have learned. OpenAI reached out to me almost immediately to talk and they were nothing but respectful and understanding. The same applies to Buck Shlegeris from MIRI and many other thoughtful and open people, and I am truly thankful for their help.

I expected a hostile world of skepticism and competition, and there was some of that to be sure. But overall, the AI community was open in ways I did not anticipate. In my mind, I couldn’t imagine people from OpenAI, or MIRI, or anywhere else actually wanting to talk to me. But I found that was wrong.

So this is the first lesson: The world of AI is full of smart, good-natured and open people that I shouldn’t be afraid of, and neither should you.

The hacker meets the mentors

After making it publicly known what I had done, I was quickly approached by a range of smart people with good arguments. Many of them helped me update my beliefs in light of new evidence, and I’d love to thank each of them personally here, but that would take us too far afield. But be assured, I read every single comment, email and message I received, even if I wasn’t able to respond to all of them.

The day after my announcement, I got to talk to Jack Clark, Alec Radford and Jeff Wu from OpenAI. We had a nice hour-long discussion, where I explained where I was coming from, and they helped me to refine my beliefs. They didn’t come in accusing me in any way; they were very clear in saying they wanted to help me gain more insight into the wider situation. For this open and respectful attitude I will always be grateful. Large entities like OpenAI often seem like behemoths to outsiders, but it was during this chat that it really hit me that they were people just like me, and curious hackers to boot.

I quickly began to understand nuances of the situation I wasn’t aware of. OpenAI had a lot more internal discussion than their blog post made it seem. And I found this reassuring. Jack in particular also gave me a lot of valuable information about the possible dangers of the model, and a bit of insight into the workings of governments and intelligence agencies.

After our discussion, I had a lot to think about. But I still wasn’t really convinced to not release. Even some people inside OpenAI were still discussing the not-release policy. So while I definitely had things to consider, I was still mostly set on releasing.

And then, I talked to Buck.

The hacker comes tumbling down

I had no idea what to expect going into my conversation with Buck. From what I have gathered, people generally think that MIRI researchers are either among the smartest people on the planet or absolutely, utterly insane, with little middle ground. I am generally of the former camp; my respect for MIRI is immeasurable and they have always been something I’ve aspired towards.

Talking to Buck was immediately a pleasant experience. He had a kind of relaxed and irreverent way of speaking that I found appealing. He allowed me to explain where my current thoughts were, what I had discussed with OpenAI etc, and then began to make his case.

I was prepared, all my arguments built up into this nice fort of step-by-step logic. I was ready for him to test my arguments, present new information about some of the assumptions I was making, and I was happy and ready to incorporate that new information. I even had a new suite of counterarguments to the Unilateralist’s Curse, which I was sure he would mention.

But instead of attacking my fort, he pulled the rug right out from under my feet and everything came tumbling down.

His point was simple: The model may or may not be dangerous, we don’t know for sure. I’ve heard incredibly convincing arguments that go both ways. But that does not matter.

Because this isn’t just about GPT2. What matters is that at some point in the future, someone will create something truly dangerous and there need to be commonly accepted safety norms before that happens.

And this is also the view OpenAI was trying to communicate in their initial blog post, but somehow the message was lost in translation (at least to me, and to a significant number of other people as well, it seems).

AI has enormous potential. Sometime in the future we will have reached a point where the consequences of our research are beyond what we can discover in a one-week evaluation cycle. And given my recent experiences with GPT2, we might already be there. The more complex and powerful our technology becomes, the more time we should be willing to spend in evaluating its consequences. And if we have doubts about safety, we should default to caution.

We tend to live in an ever accelerating world. Both the industrial and academic R&D cycles have grown only faster over the decades. Everyone wants “the next big thing” as fast as possible. And with the way our culture is now, it can be hard to resist the pressures to adapt to this accelerating pace. Your career can depend on being the first to publish a result, as can your market share.

We as a community and society need to combat this trend, and create a healthy cultural environment that allows researchers to take their time. They shouldn’t have to fear repercussions or ridicule for delaying release. Postponing a release because of added evaluation should be the norm rather than the exception. We need to make it commonly accepted that we as a community respect others’ safety concerns and don’t penalize them for having such concerns, even if they ultimately turn out to be wrong. If we don’t do this, it will be a race to the bottom in terms of safety precautions.

The hacker learns

And it was here that I understood the implications of what I was doing. I acted the way I did because I genuinely thought at the time that it was the greatest good I could do to advance AI safety. But I was wrong. Whether or not GPT2 was dangerous, and whether or not my model was even good, releasing it would have set a social precedent.

I want to be clear that I’m not under any illusion about my own importance here. I know it’s perfectly possible that this will all be forgotten in two weeks and no one will ever care about what I did again. But there is a non-zero chance that people won’t forget. So I have a chance to share the valuable lessons I have learned with others who are following this situation.

We shouldn’t be angry with OpenAI for what they did. We should applaud them for making a point before it becomes a true problem. Prophylaxis is much better than treatment. I still disagree with some of the things OpenAI did and how they communicated them, but I now understand that sending a message that it is ok, even celebrated, for a lone individual to unilaterally go against reasonable safety concerns of other researchers is not a good message to send. I want to support OpenAI’s message. So, while it might be a small, mostly symbolic gesture, I will not be releasing my model.

Some day, someone like me may be in a situation just like mine, but it won’t be GPT2. It might be something much, much more dangerous. And that is the person I am trying to talk to here.

Hard moral choices come unannounced. Who could have predicted I’d one day suddenly get access to all this hardware? Who’d predict that I of all people would be the first one to publicly do something like this? I didn’t expect this, I was just some dude.

What I hope for most out of all of this is that someday we remember it as a happy coincidence that I was the first. Because I learned a lesson. I learned that caution was the right choice, both for OpenAI and for me. If I had just released this right away (and I really wanted to), then whether or not the model was dangerous, it could have sent a terrible message that security concerns can be dismissed by default. I couldn’t have learned this lesson if I hadn’t waited and opened myself up to others changing my mind. I was as careful as I could have been, and I still almost did something stupid.

What I have learned is that we need to trust. The AI community is full of smart, good-hearted people who truly want the best for the world. Are there exceptions? Sure. But for the most part, that’s how the world is. And as my little contribution to this, I pledge to remain available indefinitely at my email thecurioushacker@outlook.com. If you feel uncomfortable about contacting “big” people like OpenAI, first of all, you shouldn’t be. The people at OpenAI are nice and open, and they’ve personally reassured me that they’re always happy to talk. But if you still feel hesitant, talk to me. I’m just a guy, just a curious hacker like you. I’ll talk, I’ll help. We can work together.

We as a community of researchers and humans need to trust one another and respect when one of us has safety concerns. We need to extend understanding and offer help, rather than get caught in a race to the bottom. And this isn’t easy, because we’re curious hackers. Doing cool things fast is what we do.

But not only should we be better, we can be better.