When OpenAI's Universe came out, and various articles suggested that even games like Grand Theft Auto 5 were ready to go, I was very excited to check it out. Then, somewhat mysteriously, GTA V was removed from Universe with no explanation whatsoever.

I gave up and forgot about it for a while, but the idea still seemed exciting. Finally, I decided to put some more mental energy into it, and I questioned whether I even needed OpenAI at all for a task like this. Sure, Universe is nice for simpler games that can be run en masse, so you can train thousands of iterations in moments, but with something like GTA V, that's not really going to be an option anyway.

Just in case it's not obvious: why GTA V? At least for me, Grand Theft Auto 5 is a great environment to practice in, for a variety of reasons. It's an open world with endless things you can do, but let's consider just a simple one: self-driving cars. With GTA V, we can control the time of day, weather, traffic, speeds, what happens when we crash... all kinds of things (mainly using mods, but mods aren't absolutely required). It's a completely customizable environment.

Some of my tutorials are fully planned, others partially, and some not at all. This one is not planned at all; it's going to be me working through the problem. I realize not everyone has Grand Theft Auto 5, but I expect you have SOME similar game for the tasks we're going to be working on, and this method should work on a variety of games. Because you may have to translate some things and tweak others to get everything working on your end, this is probably not going to be a beginner-friendly series.

My initial goal is just to create a sort of self-driving car. Any game with lanes and cars should be fine for following along. The method I will use to access the game should be doable on almost any game, and a simpler game will likely make for a much simpler task. Things like sun glare in GTA V will make a vision-only approach much more challenging, but also more realistic.

I may also try other games with this method, since I think we can teach an AI to play games by simply showing it how to play for a bit, training a Convolutional Neural Network on that recorded gameplay, and then letting the AI poke around.

Here are my initial thoughts:

Even without a pre-packaged solution in Python:

We can certainly access frames from the screen, and we can mimic key presses (sendkeys, pyautogui... and probably many other options).
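A minimal sketch of those two pieces, assuming Pillow's `ImageGrab` for screen capture and `pyautogui` for key presses. Both are third-party packages and are imported lazily, so the loop itself can be exercised without a display. The capture region `(0, 40, 800, 640)` assumes an 800x600 windowed game in the top-left corner of the screen, and the helper names are hypothetical. Some games read input at a lower level and may ignore keys sent this way, so `press_key` is something to verify on your own setup.

```python
from typing import Any, Callable, Optional


def grab_frame(bbox=(0, 40, 800, 640)):
    """Capture one screen region, bbox = (left, top, right, bottom)."""
    # Assumption: Pillow is installed; ImageGrab works on Windows and macOS.
    from PIL import ImageGrab
    return ImageGrab.grab(bbox=bbox)


def press_key(key: str) -> None:
    """Send a single key press (down + up) to the focused window."""
    # Assumption: pyautogui is installed; some games may ignore these events.
    import pyautogui
    pyautogui.press(key)


def run_loop(
    decide: Callable[[Any], Optional[str]],
    grab: Callable[[], Any] = grab_frame,
    act: Callable[[str], None] = press_key,
    n_frames: int = 100,
) -> int:
    """Hypothetical perception-action loop: grab a frame, pick a key, send it.

    Returns the number of key presses that were sent.
    """
    presses = 0
    for _ in range(n_frames):
        frame = grab()
        key = decide(frame)  # None means "do nothing this frame"
        if key is not None:
            act(key)
            presses += 1
    return presses
```

For example, `run_loop(lambda frame: "w")` would tap W (accelerate in GTA V's default controls) once per captured frame. Passing `grab` and `act` as parameters keeps the decision logic separate from the platform-specific input and output code.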

This is already enough for more rudimentary tasks, but what about something like deep learning? Really, the only extra thing we might want is something that can also log various events from the game world. That said, since most games are played almost entirely visually, the screen frames already cover most of that, and we can also track mouse position and key presses. Pairing each frame with the input at that moment gives us labeled data, which is what we need for deep learning.
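To turn a play session into training data, each captured frame can be stored alongside the input that was active when it was captured. The sketch below assumes a fixed set of driving keys (`W`, `A`, `S`, `D`) encoded as a multi-hot list, plus the mouse position. The key order, the `Sample` type, and the helper names are illustrative choices rather than a fixed format. Where the set of currently pressed keys comes from is left open (it depends on which key-state API you use), and the mouse position could come from `pyautogui.position()`.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple

# Hypothetical key order for the label vector; adjust it for your game.
KEY_ORDER = ("w", "a", "s", "d")


def keys_to_label(pressed: Iterable[str], key_order=KEY_ORDER) -> List[int]:
    """Multi-hot encode the pressed keys, e.g. {'w', 'a'} -> [1, 1, 0, 0]."""
    pressed_set = {k.lower() for k in pressed}
    return [1 if k in pressed_set else 0 for k in key_order]


@dataclass
class Sample:
    frame: Any              # e.g. a downscaled screenshot array
    keys: List[int]         # multi-hot label produced by keys_to_label
    mouse: Tuple[int, int]  # (x, y) screen coordinates


def record_sample(buffer: List[Sample], frame: Any, pressed: Iterable[str],
                  mouse: Tuple[int, int]) -> Sample:
    """Append one (frame, input) pair to an in-memory training buffer."""
    sample = Sample(frame=frame, keys=keys_to_label(pressed), mouse=mouse)
    buffer.append(sample)
    return sample
```

A CNN could then be trained to predict `keys` from `frame`, which is one way to realize the "show it how to play for a bit" idea described earlier.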

I doubt this will be all sunshine and rainbows, but I think it's at least possible, and it will make for a great, or at least interesting, project. My main concern is processing everything fast enough, but I think we can do it, and it's at least worth a shot.
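One cheap way to keep an eye on the speed concern is to time the per-frame work and report frames per second. This is a minimal sketch using only the standard library's `time.perf_counter`; `step` stands for whatever you do each frame (grab, resize, predict), and `measure_fps` is a hypothetical helper name.

```python
import time
from typing import Callable


def measure_fps(step: Callable[[], object], n_frames: int = 50) -> float:
    """Run step() n_frames times and return the average frames per second."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    start = time.perf_counter()
    for _ in range(n_frames):
        step()
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on extremely fast steps.
    return n_frames / elapsed if elapsed > 0 else float("inf")
```

Timing the screen capture on its own first gives a ceiling before any model is involved. If that number is already low, capturing a smaller region is an easy first thing to try.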

So this is quite a large project; if we don't break it down and take some baby steps, we're going to be overwhelmed. The way I see it, we need to do the bare minimum first. Thus, the initial goals are: