I had been working on this run for multiple months and had started to lose interest in the concept when AGDQ 2017 came around, and I saw the amazing TASBot block, which showed a similar concept, but with a different focus, more suitable for a live event than a submission. After a brief defeated I-was-beaten-to-the-punch feeling, I realized that it was now or never: get this done. So thanks to everyone involved in that TASBot block; without it I might have dragged this out for many more months.

I've always wanted to explore arbitrary code execution (ACE), but it's not easy to come up with something meaningful to do with it. Since you have the potential to do literally anything, you are held to a high standard, and it can easily seem like you're wasting people's time by making them sit through the setup for what feels like an unsatisfying effect, even if the run is technically excellent (unfortunate example).

This is also an exploration of the limits of the Gameboy hardware. I try to do things that have not been done before on a Gameboy, things that it was clearly not designed for, and things that seem impossible at first glance. I'm not only exploiting the game, but the very hardware it runs on.

The aim of this run is to show more aspects of what you can do with ACE. I feel it is often misunderstood as "you can skip to the end" or "you can cause crazy effects in the game"; the concept of being able to do literally anything is hard to grasp, perpetuated by the fact that in most applications what you want to do is fairly limited by the goal you have set within the game.

This realization was the key to this run, as it opened many more possibilities: The source of the A/V doesn't need to be another game. It could be a hack of a game. Or the mash-up of different games. Or from a different system. Or literally arbitrary A/V. This made the run not about running a game inside another game, but about pushing the limits of the Gameboy hardware and seeing what is possible.

But this is going to be a predefined input file anyway, so I don't need to run all of the actual code; I just need to run equivalent code that produces the same audio-visual effect as the original. At first I thought about streamlining the original code by cutting out unneeded code paths and priming it so that it produced the predetermined results I wanted (basically pre-computing the emulation and only running the resulting instructions), but then I came up with an even more radical idea: I realized that the only instructions that really matter are those that put tiles on the screen or play sounds. So all I need to do is emulate the actual audio-visual output of the game with the right timing, without any internal game state.

But there were more problems. As a bit of background, GB game cartridges are not just ROM storage for the game, they also have their own controller on them and additional hardware pieces that vary by game. For example most games have writable storage built in to hold saved games or high scores. Gen II Pokémon games also have a battery-backed real-time clock in their cartridge, which is used to track real time in game. Gen I games don't have that (they have an entirely different controller in fact), which is a serious problem, meaning that Gen II code can't just be run on Gen I cartridges and work, even if we had a way to get the code on there somehow.

Specifically, I looked at running Gen II Pokémon games inside a Gen I game. The reasoning was that I already had lots of experience with the Gameboy system and these games in particular from my previous runs (or so I thought). Gen I has easy and quick ACE setups, so you can get to the meat quickly, instead of wasting the majority of the time on the setup. I quickly realized that Yellow would be my base game of choice, because it is the only one with Gameboy Color support, which is essential for Gen II.

This run began with a simple idea: to play another game within a game, using ACE exploits. This has been done before with toy examples, but I was aiming for a full-fledged existing game, in a way that is indistinguishable from the real thing. Obviously, it should be two games from the same system; you don't want to write emulators in ACE, plus using a powerful system (e.g. Wii) to emulate a much less powerful one (e.g. GB/NES) seems cheap.

The base game is not all that important, since the main point of this run is to showcase ACE. I chose Pokémon Yellow because it has a very fast ACE setup and it has GBC capabilities, but any other game with an ACE exploit would work just fine.

When running your own code, being able to do as many inputs per frame as you want can be exploited to vastly increase the data throughput when injecting data using the joypad. Since the joypad is the only source of data you have, this dramatically speeds up the setup times, and allows for the real-time playback seen in this movie, with thousands of inputs each frame.

An even better solution, which is used in lsnes, is to allow a different input every time the game polls the joypad. This way you ensure that you can definitely do any sequence of inputs that are possible on an actual console. It's still kind of awkward though, because it has an arbitrary concept of "frame" baked into the input file, so while you can define a different input for each poll of the game, you still need to know which arbitrary frame this input occurs at, instead of just having a list of inputs, once per poll, as they occur. When doing many inputs per frame, this becomes a problem, because you need to pretty much know exactly to the cycle where the input frame boundary will be, in order to assign them to the correct input frame.
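The bookkeeping problem described above can be sketched in a few lines (the function and data are made up for illustration): to store per-poll inputs in a frame-indexed file, you must already know exactly how many polls fall into each frame.

```python
# Hypothetical sketch: distributing a flat list of per-poll inputs
# into frame-indexed records, as a subframe-input file requires.
# Names and data are made up for illustration.

def group_by_frame(inputs_per_poll, polls_per_frame):
    # polls_per_frame must be known exactly in advance (down to where
    # each frame boundary falls), which is the awkward part
    frames, i = [], 0
    for count in polls_per_frame:
        frames.append(inputs_per_poll[i:i + count])
        i += count
    return frames

# 5 polls total: 3 land in the first frame, 2 in the second
frames = group_by_frame([0x01, 0x02, 0x04, 0x08, 0x10], [3, 2])
```

A flat one-input-per-poll list would not need the `polls_per_frame` argument at all, which is the simpler format the text argues for.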

It's actually not good enough, since the concept of what a "frame" is is arbitrary too, and the game loop frames and the input frames often don't align properly, so you can miss out on inputs you could otherwise make on a real console. It would need to be at least two inputs per "frame" to work reliably; you can kind of see it as an unusual application of the Nyquist–Shannon sampling theorem (with no actual connection to it), with an expected maximum frequency of one poll per frame.

This run uses lsnes, because unlike the other preferred emulator BizHawk, it supports sub-frame inputs. Games can poll the joypad inputs at arbitrary times, as frequently as they like. However, most emulators arbitrarily limit your input capabilities to one input per frame, meaning that every time the game polls the inputs in that frame, this same one input is used. This is often seen as "good enough", since most game loops run only once per frame as well.

The first stage is 9 bytes long, and written using item manipulation, which costs multiple seconds per byte. The second stage is 13 bytes long, and written using the first stage at one byte per frame. After that, the second stage can write many bytes each frame, effectively making the rest of the setup instantaneous.

It first calls GBFadeOutToWhite from Yellow's original code, which does a smooth screen transition to white. This is not at all necessary for the exploit to work, but helps with providing a smooth transition between the game and the ACE-controlled scenes that follow. After the transition it disables the screen (this is important to be able to access certain memory areas and be able to control the exact frame timing), and puts the system into double-speed mode. Double-speed mode is a feature introduced with the GBC that increases the clock speed from 4MHz to 8MHz, effectively doubling the amount of computation you can do in the same amount of time (there are some caveats).

The third stage has no concern for its size anymore since the second stage can write it very quickly, so it is focused on finishing the setup and putting the right bits into the right places for the payload to run.

The main advantage of this stage over the first one is that it is able to run many times each frame, so it can potentially write more than 1000 bytes each frame, not just 1.

The final "xor e" is only used for the exit condition. Zero is an important byte to be able to write and therefore a bad exit condition; xoring with $5d makes $5d the exit condition instead, which happens to be an expendable value.
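A hedged model of that exit test, with plain Python standing in for the GB assembly (the loop structure is illustrative, not the actual payload):

```python
# Illustrative model of the exit condition: each incoming byte is
# xored with $5d ("xor e" with e = $5d); the loop only stops when the
# result is zero, i.e. when the byte itself is $5d. A plain zero byte
# would be a bad terminator, since zero must remain writable as data.

def write_loop(stream):
    written = []
    for b in stream:
        if b ^ 0x5d == 0:  # the "xor e" exit test
            break
        written.append(b)
    return written
```

Here `write_loop([0x00, 0xff, 0x5d])` writes $00 and $ff and then stops: zero passes through as ordinary data.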

In a Gameboy, the inputs are not all read at once: you can only read half of the inputs at a time, either the directional keys or the buttons, 4 bits each. The other half of the byte you receive is static garbage data. In order to read a full byte of data, the joypad is therefore polled twice, and the results are combined using xor, which ensures that for each byte you want to produce there is a combination of two inputs that produces it.
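A sketch of how two 4-bit reads can be combined into any byte. The garbage nibble value and helper names are assumptions; on real hardware the upper bits are the select/unused bits and pressed buttons read as 0, which this model glosses over:

```python
GARBAGE = 0b1101  # assumed constant value of the non-input nibble

def read_joypad(nibble):
    # model: the register returns the fixed garbage in the upper
    # nibble and the 4 input bits in the lower nibble
    return (GARBAGE << 4) | nibble

def combine(first, second):
    # mirrors the payload's "ld a,[joy] / swap a / xor [joy]" pattern:
    # swap the first read's nibbles, then xor with the second read
    swapped = ((first << 4) | (first >> 4)) & 0xFF
    return swapped ^ second

def inputs_for(target):
    # the garbage is constant and known, so it cancels out:
    # pick the two nibbles that xor it away
    return ((target >> 4) ^ GARBAGE) & 0xF, (target ^ GARBAGE) & 0xF
```

Because the garbage appears in both reads, every target byte is reachable: for any `target`, `combine(read_joypad(n1), read_joypad(n2))` with `n1, n2 = inputs_for(target)` reproduces it.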

The initial ACE setup used in this run is very similar to FractalFusion's Pi movie, with only minor improvements. It spells out the same code, but uses menuing improvements to get there a bit faster (e.g. using swapping of 0-stacks as a faster way of throwing away items). The code effectively allows writing one byte per frame starting at $d350, and after each byte everything is immediately executed. See FractalFusion's post for detailed information on how it works.

So why not only use one and load the final payload right away? You are very limited at first in what you can manipulate, and it can take a long time, so you're often better off only creating a very simple program that is slightly more powerful than the current one, but can be built quickly, and let it do the rest of the work at a much faster speed.

How Gameboy graphics work

Note: This is only an overview of the relevant parts of how a Gameboy works; a more in-depth description can be found in the Pan Docs and the Gameboy CPU Manual, which were instrumental in figuring all of this out.

All graphics are based on 8x8 pixel tiles with 2bpp depth (i.e. 4 colors). These tiles can be rendered on the screen in three different ways: Background, Window and Sprites. The Background is a 32x32 tile grid (actually two of them that you can choose one of to use) that can be smoothly scrolled around on, and is often used for background images. The Window uses the same tile grids as the background, but is not scrollable and is rendered over the background. It is often used for menus, dialogs, splashscreens, etc. Lastly, the Sprites are either single (8x8) or double (8x16) tiles that can be placed anywhere on the screen and can be semi-transparent. They are used for anything that moves over the background. In addition to tiles, there are color palettes, which define which of the 4 colors of a tile corresponds to which RGB color (15-bit color depth). Palettes are not bound to individual tiles, but to the place in the background, window or sprite where they are used, so a single tile can be used with different palettes in different places.

The Gameboy renders its screen line by line, one at a time. Each line is largely treated independently from the others. The screen has 144 lines, with 160 pixels each. The time spent on each line is constant, exactly 912 cycles each (all listed cycle counts assume double-speed mode; single-speed cycle counts are halved). These 912 cycles are split up into 3 phases, called Modes. The first phase is Mode 2, in which the LCD controller searches through the sprites to render, and which lasts for 160 cycles. It is followed by Mode 3, in which the data to render is sent to the LCD controller, and which can take anywhere from ~344 to ~592 cycles, depending on a lot of factors, like the number of sprites on that line. The rest of the time is spent in Mode 0 (also called HBlank), in which the LCD is inactive. After all 144 lines are rendered that way, 10 more sets of 912 cycles are spent with the LCD inactive in Mode 1 (also called VBlank). That makes a total of 912*154 = 140448 cycles spent per frame, resulting in a frame rate of 8388608 Hz/140448 = ~59.72 fps.

While the LCD controller is accessing data, it is inaccessible to the CPU. That means that tiles can only be written and the background and window changed in Modes 0-2, and sprites can only be written in Modes 0 and 1. Gameboy games usually handle this by using the time while the screen is rendered to execute their game logic, and use the VBlank period to do all the graphics updates preparing for the next frame.
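The timing figures above can be sanity-checked with a few lines (using the stated double-speed clock of 8388608 Hz and 912 cycles per line):

```python
# Check the per-frame cycle budget and frame rate quoted above
# (double-speed mode: 8388608 Hz clock, 912 cycles per line).
CLOCK = 8388608
CYCLES_PER_LINE = 912
LINES = 144 + 10          # 144 visible lines + 10 VBlank lines

cycles_per_frame = CYCLES_PER_LINE * LINES   # 140448
fps = CLOCK / cycles_per_frame               # ~59.72

# Mode budget within one line: Mode 2 is 160 cycles, Mode 3 takes
# ~344 to ~592 cycles, Mode 0 (HBlank) gets whatever remains
hblank_min = CYCLES_PER_LINE - 160 - 592     # shortest possible HBlank
hblank_max = CYCLES_PER_LINE - 160 - 344     # longest possible HBlank
```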

How the playback of Gameboy content is done

The main part of what this run does is provide a framework that allows the playback of arbitrary Gameboy footage in real-time. To achieve this, it takes several processing steps:

1. The source footage is played and all relevant writes to memory are logged, resulting in a log containing at which cycle which value was written to which address in memory.
2. From this log, you can determine the value of every address at any given cycle throughout the whole execution. This is used to determine, for each line of each frame, which tiles were rendered on that line in the background, window and sprites.
3. Having gathered this information for all frames, you can work out which tiles and palettes are needed at which times, and when the background, window and sprite tiles need to be set to which value. The end result is a collection of actions, each with a range of cycles during which it needs to happen, which when executed have the same effect as the original footage.

This results in a way of rendering a scene to look the same as the original footage, but is generally more efficient, because it uses several optimizations that the original game doesn't use. For one, it only renders a tile if it will actually be visible on the screen at some point during the scene, whereas games often render tiles that happen to end up off screen or covered by other tiles. Also, most games load and override tiles and palettes in chunks ("tilesets"), even if only some of them end up actually getting used, whereas the generated scene only loads a tile or palette if it ends up getting rendered, and tries to keep it loaded if it will be needed again later, so that most tiles and palettes are only loaded exactly once throughout the entire movie, even across different games. Additionally, tiles can be mirrored, allowing the same tile to be re-used when two tiles only differ by mirroring, so even fewer distinct tiles are used. Having full knowledge about the scene beforehand also means that you can load the necessary tiles and palettes spread out at convenient times, even long before they are actually needed.

In order to execute the actions that reproduce the scene, the list of actions needs to be serialized into a sequence of commands that can be executed one after the other so that each action happens at the right time. This is a scheduling problem with lots of constraints, since each action not only has a different range of cycles it needs to be executed in, but also takes a different amount of time based on the type of action (e.g. loading a tile takes longer than setting a tile on the background grid). Also, different actions can only be executed at specific times when their memory regions are accessible (i.e. when they are not used by the LCD controller).

The commands used are hand-crafted assembly functions that are loaded as part of the ACE payload and perform specific tasks (e.g. load a tile into memory), reading all necessary information (e.g. the pixels of the tile, the location where it should be stored) from the joypad. For each command, I know precisely how many cycles it takes, at which cycles it reads joypad inputs, and at which cycles it writes its output. This information is crucial to be able to schedule the commands properly: at each point you need to know exactly where in the rendering of the frame the Gameboy is, to avoid the times when the required memory is inaccessible. The whole execution is planned precisely down to the CPU cycle.

An example command used in this movie, which writes a single byte to HRAM:

    WriteHByteDirect:: ; 88 cycles, 4 inputs at cycles (12,28,40,56), output at cycle 64
        ld hl, $ff00    ; 12
        ld a, [hl]      ; 8
        swap a          ; 8
        xor [hl]        ; 8
        ld c, a         ; 4
        ld a, [hl]      ; 8
        swap a          ; 8
        xor [hl]        ; 8
        ld [$ff00+c], a ; 8
        ret             ; 16

In order to define the order in which the individual commands are executed, one of the commands pushes function pointers of the commands that should be executed onto the stack (again, read from the joypad). It is the first command to be executed after the payload has been loaded in the ACE initialization, and the last function pointer put onto the command stack is always the function itself, so that after the commands have been executed, we are ready to write a new command stack and keep going. Writing the new command stacks is interspersed in regular intervals between the commands that do the actual playback, since the stack has only limited capacity.

Game audio is handled in a similar way to the graphics: The log contains all memory writes to the sound subsystem, so by writing the same values we can recreate the same sound. Audio is not bound to any video frame, and its memory is always accessible. Writes that happened in short succession in the original footage are batched up, and are replayed at approximately the same time (± some thousand cycles). In the end they are actions that are sequenced into commands together with the graphics actions.
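The command-stack dispatch can be sketched in Python, with callables standing in for the assembly routines' function pointers (all names and data are made up for illustration):

```python
# Toy model of the dispatch described above: a stack of command
# "pointers" is executed in order, and the last entry of every batch
# is the refill command itself, so playback continues indefinitely
# with a bounded stack. Callables stand in for GB function pointers.

executed = []

def load_tile():
    executed.append("load_tile")

def set_bg():
    executed.append("set_bg")

COMMANDS = {"load_tile": load_tile, "set_bg": set_bg}

# batches of command names, as they would be read from the joypad;
# an empty batch ends playback
batches = [
    ["load_tile", "set_bg", "refill"],
    ["set_bg", "refill"],
    [],
]

def refill(stack):
    # push the next batch in reverse, so it pops off in order
    for name in reversed(batches.pop(0)):
        stack.append(name)

stack = []
refill(stack)                 # initial command stack from the ACE init
while stack:
    name = stack.pop()
    if name == "refill":
        refill(stack)
    else:
        COMMANDS[name]()
```

The essential property is that `refill` is always the last pointer of a batch, so the stack never runs dry until an empty batch is supplied.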

Portal credits

After the success of playing back GB game content using ACE, where the sound was merely a side aspect, I wondered how capable the sound hardware is, and what you can do with it.

Sound in a Gameboy turns out to be very limited in its abilities. It has 4 sound generating channels that can be connected to two output terminals. The first two channels generate square waves of different frequencies and amplitudes, with limited control over frequency and amplitude over time, and the last channel produces static noise. Only the third channel is interesting, as it allows arbitrary wave patterns to be played. However, the RAM that holds the wave pattern only contains 32 samples that are repeated over and over, with only 4 bits per sample (i.e. 16 different possible values). It was clearly not designed for complex sounds like voice, but rather as an alternative way of creating waves with unusual shapes. You can hear this clearly in the title screen of Pokémon Yellow, with the very crude sound they achieved by overlaying multiple waves: You can hear the words, but it's not pleasant.
However, you can use the third channel to play longer pieces of arbitrary audio, by managing to update the wave RAM while the sound is playing. This of course requires perfect precision in when to update the samples, to ensure each is played once and only once. The sound can only be played at very specific frequencies of 2097152/x Hz, where x is an integer between 1 and 2048. For this to line up nicely with the Gameboy's frames, only specific values of x work, namely multiples of 57. All arbitrary sounds in this movie use x=114, which results in exactly 2 samples played every 912 cycles, so it lines up perfectly with the line timings of the screen, resulting in a sample frequency of ~18396 Hz.

Still, the problem remains that there are only 4 bits available per sample, not nearly enough to produce acceptable-quality sound. But there's one more audio control we can abuse: the volume control. The volume control provides a linear scaling of the audio with 8 discrete levels. By adjusting the volume for each sample, we can use it to increase the resolution of different amplitudes that can be achieved, from 16 to ~100 (some sample/volume combinations result in the same effective amplitude). These effectively possible amplitudes are not evenly distributed though; there are more values available for the small amplitudes than for the large ones (which is actually exactly what you want).

So, what this movie does to produce high quality sounds (for a GB, that is), is write the wave RAM at exactly 2 samples every 912 cycles to update the sample data, while also rapidly adjusting the volume control at exactly the right times to tweak the resulting amplitudes. These processes need to be time-shifted by 32 samples, meaning that the volume control affects the currently played sample, while a newly written sample is only played 32 samples into the future.
This requires a lot of precision and cycle counting, and is performed by a special assembly function that is loaded with the initial payload, and fed the sound data using the joypad inputs as usual. In the idle times between two audio samples, it updates the tiles on the screen to render the accompanying text and pictograms, so it also needs to be synced up with the LCD operations to only write when the memory is accessible.
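These numbers can be checked under some stated assumptions — that the wave DAC output is centered and the 8 volume levels scale linearly, which is a simplification of the real hardware:

```python
# Sample rate: 2 samples per 912-cycle line at the 8388608 Hz
# double-speed clock, equivalently 2097152/114 Hz
sample_rate = 8388608 / 912 * 2        # ~18396 Hz

# Distinct effective amplitudes from 16 sample values x 8 volume
# levels, assuming a centered DAC and linear volume scaling
# (an assumption about the mapping, for illustration only)
amplitudes = {(s - 7.5) * (v + 1) for s in range(16) for v in range(8)}
```

Under this model the 128 sample/volume combinations collapse to roughly a hundred distinct amplitudes, matching the "from 16 to ~100" figure above.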

SpongeBob video sequence

For the ending, I wanted to go all-out, and see how good of an A/V experience you could produce on Gameboy hardware using only the joypad inputs. Part of it was that I wanted to show off so-called HiColor graphics. The Gameboy only has space to store 8 palettes each background tile can choose from, with 4 colors each, so the maximum amount of colors you can use on each frame is usually 32, and each 8x8 tile area can only use 4 of them at a time (plus some extra colors for the sprites which draw from different palettes, but they're not useful for this purpose). The so-called HiColor technique allows you to use significantly more colors in an image, by changing the palettes for each rendered line. This way, each line could use its own colors, even within the same 8x8 tile. This technique was not originally intended in the Gameboy's design, but it was actually used in some commercial Gameboy games. The problem with it is that you have only a very small time window to update the palettes before the next line is rendered. It is impossible to update all 8 palettes each line, so most games only update some of them, mostly 4, resulting in a total of 2304 possible colors each frame. However, there are still a lot of limitations (e.g. while you change the colors of the palettes, all tiles still point to the same palette indices, so the configuration of which tile uses which palette is constant for each line of 8x8 tiles), and it requires a lot of precision to do the palette change at exactly the right time, prohibiting the game from doing much else in the mean time. Moreover, the whole palette-swapping procedure needs to be repeated each frame, even if the screen content isn't changing at all, so it is a significant battery drain. Part of it was that I wanted to show off so-called HiColor graphics. 
I did some calculations to find out how much quality I could put into the sequence, limited by the amount of data I could possibly push through in a given amount of time. It came down to a balance of frame quality and frame rate: If I try to refresh the whole 20*18 tile screen every video frame, that's 20*18*16 = 5760 bytes of data, costing at least 5760*36 = 207360 cycles to read from the joypad (36 cycles is a lower bound, for just loading the byte, not actually doing anything with it). Additionally, I'd need to load 4 palettes for each of the 144 lines of the image to produce the HiColor effect, costing another 4608 bytes, or 165888 cycles, to load. Meanwhile, each Gameboy frame I need to maintain the palette switching to keep the HiColor effect going, costing around 62784 cycles, meaning there are only 77664 free cycles each Gameboy frame to do something useful like loading the next video frame. This would have meant I could only show a new video frame every ~6 Gameboy frames under ideal circumstances, resulting in ~10fps video, which I deemed not good enough.
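The cycle budget can be reproduced directly. All constants are taken from the paragraph above; the 154*912-cycle frame length assumes CGB double-speed mode, consistent with the 912-cycle scanlines this run works with.

```python
# Reproducing the bandwidth estimate from the paragraph above.
TILES          = 20 * 18      # full background screen, in tiles
BYTES_PER_TILE = 16
JOYPAD_CYCLES  = 36           # lower bound to read one byte via the joypad

tile_bytes     = TILES * BYTES_PER_TILE        # 5760
tile_cycles    = tile_bytes * JOYPAD_CYCLES    # 207360

palette_bytes  = 144 * 4 * 8                   # 4 palettes/line, 8 bytes each
palette_cycles = palette_bytes * JOYPAD_CYCLES # 165888

FRAME_CYCLES = 154 * 912      # 140448 cycles per Gameboy frame (double speed)
MAINTENANCE  = 62784          # per-frame HiColor palette switching
free_cycles  = FRAME_CYCLES - MAINTENANCE      # 77664

# ~4.8 frames of pure data transfer; with the remaining per-frame work this
# lands at the ~6 Gameboy frames (~10fps) quoted above.
frames_per_video_frame = (tile_cycles + palette_cycles) / free_cycles
```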
Instead, I chose to lower the quality a bit to achieve a higher frame rate. The two compromises I made are to not update all the screen tiles, and to only update 2 of the palettes each line instead of 4, cutting down on the maintenance costs. This way I could push the frame rate up to a more acceptable 15fps, updating the video frame every 4 Gameboy frames, while maintaining a HiColor image with 960 total colors and good quality audio.
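The 15fps figure follows from the Gameboy's refresh rate (4194304 Hz CPU clock, 70224 single-speed cycles per frame):

```python
GB_FPS    = 4194304 / 70224   # ~59.73 Hz hardware refresh
video_fps = GB_FPS / 4        # one new video frame every 4 Gameboy frames
```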
Unlike the playback of other Gameboy content before, this is not assembled out of individual pieces, but is instead a single hand-crafted assembly function that coordinates everything, because of just how much precision is necessary, down to every CPU cycle. It basically uses double-buffering to show one video frame while building up the next one, switching them using the two available background tile maps. For each line that is rendered, it performs multiple operations: It updates the music samples and volume (as described above), it writes the next two palettes to update the HiColor image for the next line, it loads 1/2 of a new tile to memory for the next video frame, and it loads 3/8 of a palette to memory for the next video frame. The awkward fractions are necessary in order to squeeze everything into the 912 clock cycles that are available for each line. The VBlank period is used to load the tile attributes (i.e. the mapping of tile to palette), and to prepare the rendering of the new frame.

Preparing the source video to be in a format that is suitable to be rendered this way while still looking acceptable was a challenge in itself.
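The double-buffering can be illustrated with a small sketch (a toy model, not the run's code): the Gameboy's two background tile maps live at 0x9800 and 0x9C00, and LCDC bit 3 selects which one the PPU displays, so one map can be shown while the next frame is assembled in the other.

```python
LCDC_BG_MAP = 1 << 3   # LCDC bit 3: 0 -> map at 0x9800, 1 -> map at 0x9C00

def displayed_map(lcdc):
    """Address of the tile map the PPU is currently showing."""
    return 0x9C00 if lcdc & LCDC_BG_MAP else 0x9800

def flip(lcdc):
    """Swap front and back buffer: show the map just written, draw into the other."""
    return lcdc ^ LCDC_BG_MAP

lcdc = 0x91            # bit 3 clear: displaying 0x9800, building frame at 0x9C00
lcdc = flip(lcdc)      # now displaying 0x9C00; 0x9800 becomes the back buffer
```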
Even though there are many more colors available in HiColor mode, they are not available where you want them. Since I update only 2 palettes per line, the palette a specific 8x8 tile uses only updates every 4 lines, so there are still effectively up to 4x8 blocks of pixels which use the same 4-color palette. And since you only have 8 palettes available at a time for the 20 tiles in each line, some will need to share the same palette. Determining which palettes are best, and which blocks to use them for, turns out to be a difficult problem with many constraints. I used some known algorithms to determine a good palette for each block (Median cut, k-means clustering), used some simplifying assumptions to distribute the palettes over the blocks, and some dithering to smooth out the resulting image.

Moreover, the colors you see on the screen and the colors which a Gameboy Color produces are different, meaning that the same RGB value will produce different results on a computer screen and on a Gameboy screen. Luckily, a sneak peek into the source code of the emulator shows how it does the conversion, and all I need to do is the reverse transformation. One matrix inversion later, I got a working color transformation to convert the video colors into GB colors.
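The color-correction step can be sketched as follows. This assumes, hypothetically, that the emulator renders a GB color as a fixed linear mix of its RGB channels; the 3x3 matrix below is an illustrative placeholder, not the emulator's actual coefficients.

```python
import numpy as np

# Hypothetical GB->screen channel mixing (rows sum to 1; NOT the real values).
M = np.array([[0.82, 0.12, 0.06],
              [0.10, 0.80, 0.10],
              [0.06, 0.12, 0.82]])

M_inv = np.linalg.inv(M)   # the "one matrix inversion": screen -> GB

def screen_to_gb(rgb):
    """Find the GB color that the emulator would render as the given screen color."""
    return np.clip(M_inv @ np.asarray(rgb, dtype=float), 0.0, 1.0)
```

Feeding the target video frames through this inverse transform before quantization means the colors come out right after the emulator applies its own GB-to-screen conversion.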

Ending