Animating gAnime with StyleGAN: The Tool

In-depth tutorial for an open-source GAN research tool

Visualization of feature map 158 at a layer with resolution 64x64

0. Preface

This is a tutorial/technical blog for a research tool I’ve been working on as a personal project. While a significant portion of the blog assumes you have access to the tool while reading it, I attempted to include enough screenshots that it should be clear how it works even if you do not have time to try it out yourself.

In the tutorial we’ll interact with a trained StyleGAN model to create (the frames for) animations such as this:

Spatially isolated animation of hair, mouth, and eyes

In the animation above, the transformations that change the mouth, eyes, and hair are mostly separate. This is often preferable to other methods (that I’m aware of) for creating talking animations with GANs, which may cause side effects such as hair loss:

Another animation we’ll create to demonstrate how changes in the ‘mouth’ attribute can influence other parts of an image. Note the hair thickening and thinning along the edges.

We’ll also build simple heuristic facial feature detectors by using feature maps at various layers in the network:

Using feature maps at various layers to detect mouths, eyes, and cheeks

These detectors can then be used to automate meaningful modifications to isolated portions of images:

These images were generated in a single batch without human intervention or a labeled dataset

None of these steps require labels for the training set, but there is a bit of manual work involved.

1. Introduction

You can download a compiled version of the tool from one of the following links:

sha256: ec2a11185290c031b57c51edb08bf786298201eb36f801b26552684c43bd69c4

It comes with a model trained on an anime dataset (which a number of projects are based on). Unfortunately, due to the nature of the dataset, there is a lack of gender diversity. I’m currently trying to train a model to produce higher quality masculine images, but it will take a while to complete. The dataset also contains NSFW images, and while I’ve generated thousands of images and never encountered anything NSFW, I didn’t vet every image in the training set (the risk may increase with large modifications to feature maps in early layers). If you run into problems, you can create an issue in https://github.com/nolan-dev/GANInterface and I’ll try to respond.

This blog has two parts: a basic tutorial, and an advanced tutorial. The basic tutorial demonstrates how to use the tool and shouldn’t require much technical knowledge to complete (though it does include some technical explanations for those interested). The advanced tutorial demonstrates how to customize the tool and is more technical — you should be familiar with feature maps in convolutional neural networks, for example.

I introduced the tool in a previous blog and shared the source code, but getting it working from there is complicated and requires a model trained with my custom StyleGAN implementation. I hope supplying a compiled version of the tool and a pre-trained model makes it easier to try out. The tool is part of a larger project (including a reimplementation of StyleGAN) which the previous blog discusses, but reading it is not a prerequisite for this blog. Here’s a link if you’re interested though:

As this is a research tool, I’ve been adding and subtracting features regularly to get a better idea of how the model works and the best way to interact with it. There are a lot of minor features, but major ones include:

Modify the latent vector that produces an image in order to interpolate between images, express certain features, and trade off between quality and variation (truncation trick)

Modify feature maps to change specific locations in an image: this can be used for animation

Read and process feature maps to automatically detect meaningful features

Automate all of the above by creating batch jobs

Like the previous blog, my goal for this blog is to get others’ perspectives on the topic, detail my experience working on the project, and receive constructive criticism/corrections. The tutorial format of this blog is meant to mitigate the underdeveloped UI of the tool, and make it possible to use it without dealing with the messy source code. Unfortunately, it’s Windows-only; however, it has been tested on a free-tier Windows AWS instance (Microsoft Windows Server 2019 Base), though image generation will be slow there.

2. Basic Tutorial

A quick note before we get started: I modified the tool while writing this, so some screenshots are slightly different from the current version, but everything is in roughly the same place.

Once you’ve downloaded and opened the zip file linked above, you’ll be presented with several files/folders:

I’ll explain some of these in more detail in the advanced section (3.3), but at this point the only important file is GanStudio.exe. When you run it, click OK through the disclaimers (and hopefully don’t get any errors), and you’ll be presented with something like the following:

Due to the complexity of the UI, the first time I reference a part of the tool in this tutorial I’ll have a nearby screenshot with the relevant part outlined in red. Many of the UI elements have tooltips.

Setup:

Using this tool involves interacting with Windows Explorer, and it’s easiest to view generated files with “Large” or “Extra large” icons. Select one of these by right-clicking in an Explorer window and selecting ‘View’:

In many cases it is also helpful to sort images by date modified, also available through the right-click menu:

Generate New Image

To test out image generation, click Generate New (3). This will produce a new latent code and display the image for it. Note that generating the first image usually takes longer than subsequent images.

The image was created randomly, but was interpolated to be close to the ‘average’ image. This results in higher quality images, but reduces variation. If you have the quality slider (1) in the same place as in the above image, your image will likely be similar: a brownish-haired girl with purple and/or blue eyes.
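For the technically inclined: this quality/variation trade-off is StyleGAN’s ‘truncation trick’, where new latents are pulled toward the average latent. A minimal numpy sketch of the idea (the names are illustrative, not the tool’s internals):

import numpy as np

rng = np.random.default_rng(0)
w_avg = np.zeros(512)         # average latent (in practice, a running mean over many samples)
w = rng.standard_normal(512)  # latent for a freshly sampled image

psi = 0.7  # the quality slider, conceptually: 0 gives the average image, 1 gives full variation
w_truncated = w_avg + psi * (w - w_avg)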

Load Image

To keep images in this tutorial consistent with what the tool will produce, I’ve provided a sample image. Click ‘Import Image’ (above, 2). This will create an open file dialog box in the ‘portraits’ directory. Navigate to the directory above ‘portraits’ and select ‘tutorial_files’:

Double click on sample_01.png to load it. All images you generate are saved to the ‘portraits’ folder and you can load them again using this method.

GANs cannot normally load an arbitrary image; however, this tool appends the latent code that generated the image to every PNG file it writes to disk. The Import Image button reads the latent code that was written into the image you select. As long as the tool has loaded the model that generated the image, it will be able to recreate it.
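I won’t document the exact byte layout here, but the general trick of stashing data after a PNG’s final IEND chunk can be sketched as follows (a guess at the mechanics: it assumes the latent is appended verbatim as raw float32 bytes, which may not match the tool’s actual format):

import numpy as np

def read_appended_latent(path, latent_size=512):
    # Everything after the IEND chunk (4-byte length + b'IEND' + 4-byte CRC)
    data = open(path, 'rb').read()
    extra = data[data.rindex(b'IEND') + 8:]
    if len(extra) < latent_size * 4:
        raise ValueError('no appended latent found')
    return np.frombuffer(extra[:latent_size * 4], dtype=np.float32)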

Modify Attribute

Hover over attribute labels that have been cut off to see the full name

To start modifying attributes, select the ‘Attributes_0’ tab (above). Attributes include hair/eye color, intensity of background, accessories, and mouth state (smile/open). Moving a slider that corresponds to an attribute to the right will increase the influence of that attribute on the image; moving it to the left will decrease it. Some of them work better than others. After setting a slider position, press ‘Update This Image’ (above). Here are some examples:

Left to right/top to bottom: Positive open_mouth, positive skin_tone, negative background and negative black_hair, positive blonde_hair and negative smile
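Conceptually, each attribute slider pushes the latent code along a direction associated with that attribute. A sketch of the arithmetic with numpy (the direction here is a random placeholder, not one shipped with the tool):

import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(512)               # latent code of the current image
open_mouth_dir = rng.standard_normal(512)  # placeholder attribute direction

slider = 0.8  # slider position; negative values push the attribute the other way
z_edited = z + slider * open_mouth_dir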

One downside of modifying attributes in this way is that they are not always spatially isolated; modifying an attribute that should only influence one part of an image will also influence other parts. This is particularly problematic if we want to create animations. To see the problem in action perform the following steps (screenshot below for reference):

1. Scroll down to the ‘open_mouth’ slider
2. Move it to the right
3. Press ‘Update This Image’. The mouth should now be open slightly
4. Select Batch->Attribute->Spectrum
5. Select ‘OK’ to generate 5 images, and ‘No’ to the ‘Slide Past 0?’ prompt

This will produce 5 images with the ‘open mouth’ attribute shifting from 0 to the selected location on the slider:

Throwing these into a gif generator produces the following animation:

Frame order based on the first number in the filename: 0, 1, 2, 3, 4, 3, 2, 1

As you can see, features all over the image change even though we only selected an attribute related to the mouth.

Modifying Specific Locations

In this section we’ll make a modification to just the mouth, without altering other features. Unfortunately, at this point images imported with the Import Image button will not reflect the changes made here (only the latent code is stored with a saved image, not feature map edits).

Repeat the instructions in the Load Image section to get back to the base image (or reset the attribute sliders and update). We’ll use the ‘Spatial’ tab (below) to modify isolated parts of an image.

The UI is complicated; however, for this section we’ll only be using a couple of parts. The first thing we need to do is indicate which part of the image we want to change:

Select the ‘mouth_open’ tab. Click, hold, and drag the cursor across the mouth before letting go

This makes a selection around the mouth, and ensures that our changes to the image will only influence the area selected.

Click on locations within these squares to create them on your image, or click and drag across this portion of the image

This will produce a light green square, unless “Swap green for blue in visualization” is selected. I’ll have that option selected for this tutorial to improve visibility, and hopefully be more colorblind friendly when we start dealing with the red boxes that indicate negative influence.

If you selected a location in error, you can remove squares by holding ‘control’ when you make a selection:

Remove an undesired selection. Click and drag with ctrl held to remove multiple selections.

The following are all of the ways you can ‘draw’ on the image. Some of these aren’t needed yet, but will be useful later:

Left click to produce a box with size depending on the selected resolution (more on that later). This box indicates a positive influence at that location and will either be green or blue depending on your settings.

Right click to produce a red box with negative influence at that location

Ctrl+click to erase a box at a location

Click and drag to draw a large rectangular area when performing any of the above

If ‘No Selection Overlap’ is not checked, you can select the same location multiple times to increase the (positive or negative) magnitude of the selection. This is visualized as higher color intensity and thicker boxes.

With the mouth selected, move the slider below the ‘mouth_open’ tab to the right until the number on the bottom left is around 100. This slider is the “Feature Map Multiplier Slider”, which influences the active tab exponentially as it is moved to the right (positive influence) or left (negative influence). With the number in the bottom left of the slider set to around 100, select ‘Update This Image’:

This should produce the following image:

As the name of the tab would imply, this opened the mouth. Let’s try and animate with this method. Select Batch->Fmap->Combinatoric (I’ll expand on why it’s called that in the advanced tutorial):

Select 5 for images to generate:

Select 0 for the start point. The batch will consist of 5 images with the slider regularly spaced between the start and end points (0, 25, 50, 75, 100). Because the mouth is closed by default in this image, a start point of 0 (no influence) means closed.
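The spacing is plain linear interpolation between the two points:

import numpy as np

print(np.linspace(0, 100, 5))  # [  0.  25.  50.  75. 100.]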

This will produce 5 images with less spatial entanglement than the attribute method:

A gif generator produces the following:

This same process can be used with the other tabs, and different tabs can be combined. If you’re using the Combinatoric batch generator, you’ll need to keep the multiplier bar at zero for all tabs except the active one to avoid producing combinations of multiple tabs. This can be done by pressing ‘Set All to Zero’ before changing the active tab’s multiplier:

Here’s a list of some of the possible changes. Note that using large multipliers has a good chance of producing strange artifacts:

1. red_or_blue_eyes

With the following start and end points:

The dialog actually says Start Point; this is an old screenshot. The start point corresponds to the first image that gets generated in the batch, and the end point corresponds to the last.

Produces:

2. blush

Settings:

If the screenshot and the tool disagree, believe the tool. This is an old screenshot (should say Start Point instead of End Point)

Animation:

3. hairband

Settings:

I actually had the variable for this prompt called ‘startValue’ originally; I don’t know why I made the prompt say End Point. You may have guessed this is an old screenshot.

Animation:

4. hair

Settings:

You’d think it would make sense to start at the low number and end at the high number, but actually this is an old screenshot and End Point should be Start Point. It will start at 100 and shift down to -257

Animation:

These settings slightly influence the mouth because the effective receptive field of convolutions late in the network covers a lot of spatial locations in early layers.

5. ice_is_nice (???)

Settings:

We’re selecting the entire image here. For scenarios like this, I’d advocate for the click and drag method over clicking 4096 times with extreme precision. Also: it should say Start Point instead of End Point

Animation:

Example of a modification late in the network: details are influenced, but overall structure stays the same

Other things to try:

Batch generate new images with the quality and attribute bars at various locations:

Use ‘Set As Base Image’ (below) to make the quality bar interpolate between the current image (instead of the average image) and new images. This can be useful for finding new images that are slightly different from the current image when combined with Batch->New latents (above).

Use ‘Toggle Prev’ to quickly switch between the current image and the previous one to inspect changes.

Focus on the red box, not the blue box around ‘Toggle Show Selection’ which I clicked earlier. At the moment toggle show selection doesn’t even work.

Use Misc->Interpolate Between Two Images to generate a spectrum of images in between two existing images.
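Under the hood, interpolation is just a weighted average of the two images’ latent codes; roughly (with placeholder numpy latents):

import numpy as np

rng = np.random.default_rng(2)
z0 = rng.standard_normal(512)  # latent of the first image
z1 = rng.standard_normal(512)  # latent of the second image

# Evenly spaced blends from the first image to the second
spectrum = [(1 - t) * z0 + t * z1 for t in np.linspace(0.0, 1.0, 10)]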

3. Advanced Tutorial

This section assumes some familiarity with convolutional neural networks and StyleGAN.

3.1 Finding New Features

Tabs in the ‘spatial’ section (mouth_open, hairband, etc) correspond to values added to specific feature maps at a specific layer. Try selecting the mouth_open tab. In the combo box above the tabs, it should show a resolution: 16x16. Early layers in StyleGAN have low resolution feature maps, while later layers have high resolution feature maps (resolution regularly doubles). As the images we generate are 256x256 pixels, the layer that corresponds to 16x16 is early in the network. To view which feature maps are modified by the mouth_open tab, press ‘View All Fmap Mults’ with ‘Filter Zero’ checked and select the ‘Feature Map Input’ tab:

This means spatial locations selected in the image by clicking on it are multiplied by -2, then multiplied by the feature map multiplier slider, and the result is added to feature map 33 on the layer that has a 16x16 resolution.
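In code terms, the edit looks roughly like this (a sketch assuming numpy activations of shape (channels, height, width); the names are illustrative):

import numpy as np

rng = np.random.default_rng(3)
fmaps = rng.standard_normal((512, 16, 16))  # activations at the 16x16 layer

mask = np.zeros((16, 16))
mask[12:14, 6:10] = 1.0  # boxes drawn over the mouth (values above 1 for repeated selections)

fmap_mult = -2.0  # per-feature-map multiplier stored in the tab
slider = 100.0    # Feature Map Multiplier Slider
fmaps[33] += fmap_mult * slider * mask  # injected before the rest of the network runs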

Some tabs influence multiple feature maps:

The feature maps influenced by the ‘hairband’ tab, viewed by clicking on ‘View All Fmap Mults’

I found these multipliers manually by playing around with the tool. I used two methods:

To modify an existing attribute in an image (for example, the mouth), I use trial and error, modifying different feature maps and seeing which one produces the desired result.

To add an attribute (hairband), I see which feature maps are active when that attribute is present.

In the next two sections I’ll walk through examples for these methods.

Method 1: Modify the mouth

Given that the network can generate images with open and closed mouths (required to fool the discriminator) and that feature maps at each layer are representations of the final image, it makes sense that modifying the feature maps could be used to open or close a mouth. However, there’s no guarantee that modifying a single feature map will result in a meaningful change — it may be that we need to modify a combination of many feature maps to get the desired result. That said, single feature maps are easier to work with (at least with the current tool), so seeing how each one influences an image can serve as a starting point.

Here’s how I found feature map 33 to open/close mouths. First, I added a 16x16 tab with the ‘Add Tab’ button (below). I chose this resolution because it produces boxes that are reasonably mouth-sized. Smaller resolutions would change areas beyond the mouth, while larger resolutions often result in changes with finer granularity than opening or closing a mouth (choice of resolution is heuristic at this point). By clicking ‘View All Fmap Mults’ again we see that no feature maps are set for the new tab. Then I moved the slider to around 190, again a heuristic decision based on past experience with the model. Finally, as we did before, I selected the two boxes that contain the mouth.

Adding a new tab, which initially does not influence any feature maps

Then, I selected Batch -> Fmap -> Axis-aligned, and selected 512 images to be generated.

This will actually produce 1024 images: for each feature map, it both adds and subtracts the value specified in the multiplier bar (190 in this case) at the spatial location marked in the image (the mouth). Batch generation pops up a window that shows how many images have been generated and allows you to interrupt the process. Clicking on the counter next to ‘Generating image’ opens the directory to which they are being written:

The number prepended to the file names is the feature map that was modified
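Conceptually, the Axis-aligned sweep does something like the following (numpy only; the commented-out last step is where the tool runs the rest of the network and saves the image):

import numpy as np

rng = np.random.default_rng(4)
fmaps = rng.standard_normal((512, 16, 16))  # activations at the 16x16 layer
mask = np.zeros((16, 16))
mask[12:14, 6:10] = 1.0  # the mouth selection
value = 190.0

for i in range(fmaps.shape[0]):                  # 512 feature maps...
    for sign, tag in ((1.0, 'p'), (-1.0, 'n')):  # ...each tried both ways: 1024 images
        edited = fmaps.copy()
        edited[i] += sign * value * mask
        # generate and save as f'{i}_{tag}_sample.png'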

The sample that starts with ‘33_n_sample’ (n stands for negative) clearly has an open mouth, while ‘33_p_sample’ does not. This means that when 190 was subtracted from feature map 33 around the mouth, the mouth opened.

I set feature map 33 to -1 using the ‘Set Fmap’ box (below). This makes it so moving the slider to the right will open the mouth (which feels more intuitive than setting feature map 33 to 1 and having the tab named ‘mouth_close’), and I renamed the tab to mouth_open using the ‘Rename Tab’ button. The Save Tabs button next to Rename Tab can be used to save the tabs.

Method 2: Add a hairband

This method relies on finding existing images with the desired attribute. This requires a base of images to work off of, which can be generated with Batch->New latents. In these cases I’ll usually move the Quality bar a bit past the middle to ensure there’s a reasonable amount of variation.

It may take a few hundred samples to find several with hairbands. I added one to the tutorial_files directory which I’ll load for this tutorial (sample_02.png). Before loading it, create a 16x16 tab and make sure ‘Fmaps To Get’ is set to ‘All’ or ‘Current Tab’ (below). These options get and store extra output from the network when a new image is created: the feature maps for the current tab, or for all tabs. This can slow things down a bit, so it’s not the default option (also, as of this writing, it has something like a 0.5% chance of causing a crash, relevant for large batches).

Then, do the following:

1. Select boxes around the hairband
2. Under ‘Add View Feature Map Buttons’, select ‘Sort by Similarity To Selections’
3. Press ‘Update This Image’

This adds a bunch of buttons to the ‘Feature Map Output’ tab:

These buttons correspond to feature maps. They are sorted by magnitude of the dot product of the feature map and the selection on the image (after they’re flattened). This makes it so feature maps with large magnitudes around the hairband will show up earlier in the list of buttons.
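The sort boils down to a dot product between each flattened feature map and the flattened selection; a numpy sketch:

import numpy as np

rng = np.random.default_rng(5)
fmaps = rng.standard_normal((512, 16, 16))  # outputs stored via 'Fmaps To Get'
selection = np.zeros((16, 16))
selection[2:5, 4:12] = 1.0                  # boxes drawn around the hairband

scores = np.abs(fmaps.reshape(512, -1) @ selection.ravel())
button_order = np.argsort(-scores)  # most similar feature maps first
print(button_order[:5])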

Here’s an example of the 310th feature map. Blue corresponds to positive values, red to negative. The larger the absolute value at a location, the thicker and more saturated the box drawn there.

This feature map seems to have large positive values around the hairband, but also around the mouth. While it’s clearly used for more than just hairbands, we can try modifying it to see what the result is. Set feature map 310 to 1 for that tab, erase the selections around the hairband (ctrl+click and drag), and load sample_01.png again.

Try selecting hair, increasing the weight to around 100, and updating the image:

…not much has changed. However, as we didn’t produce any weird artifacts, it may not hurt to increase the magnitude beyond 100.

Around 600 we do get what looks like a hairband. My initial hypothesis is that we need such a large magnitude because hairbands are uncommon.

For the hairband tab included with the tool, I set several other feature maps, which were active around the hairband in example images, to 3. Setting them to a number larger than 1 helps normalize the expected range of the tabs, so that setting the multiplier to around 100 should express the desired attribute.

3.2 Automatic Feature Detection

One problem with the way we’ve been modifying attributes is that it requires manually selecting the squares we want to change. This is definitely a downside when compared to modifying the latent vector in the ‘mouth_open’ direction, which did not require us to know the mouth’s location (even though it also modifies non-mouth features). This prevents the method from scaling well; while easier than drawing, every modification still requires human intervention. However, as we saw in the previous section, some feature maps correlate with the location of attributes: maybe feature map 310 could be used to generically detect hairbands, for example. Let’s see if we can find a way to detect the mouth in an image using just a linear combination of feature maps.

First, repeat the process used to get feature maps active around a hairband, only this time select the mouth:

Like before, we can click a button to show a feature map:

This method is a bit slow, however, when it comes to viewing and comparing a large number of feature maps. Instead, type 20 into the text box below the ‘Add From Output’ button and press the button. Then, press ‘View Multiple Fmaps’.

‘Add From Output’ with 20 adds the feature maps from the first 20 buttons to the ‘Fmaps’ text box, and ‘View Multiple Fmaps’ displays them all side by side.

The first 4 (along with several others), which correspond to feature maps 234, 34, 370, and 498, all look like they could be mouth detectors. However, we don’t know whether they consistently detect the mouth in new images. To test this, we can generate several new images with the Quality bar to the right of center for decent variance. First, make sure ‘Record Feature Maps’ is checked. Use ‘Reset History’ to clear existing records. Also, make sure ‘Fmaps To Get’ is set to ‘Current Tab’ (the tool does not record history for all resolutions, just the resolution that corresponds to the current tab). Then we can generate a number of new images using Batch->New latents, and the tool will record their feature maps.

In this case I’ll generate 10 new images, which will be different from any of yours. To view the same feature map for all the images, type the feature map into the Fmaps box and select ‘View History’. I’ll do that for each of 234, 34, and 370.

234:

34:

370:

It doesn’t hurt that the mouth doesn’t change position much, but these feature maps do seem to track it reliably.

The same process can be applied to other attributes at other layers. Here are some examples:

Eyes:

Cheeks:

Background:

In many cases I get the most consistent results by combining multiple feature maps. To do this, I made Python scripts callable from the tool (which could be useful for future features as well), as I’d rather do multidimensional data processing with numpy. The tool uses the PATH environment variable to find pythonw.exe, so that will need to be set before running the tool. The scripting functionality is the newest feature in the tool and even less developed than the rest. Here’s an example:

spatial_map.py is stored in the ‘scripts’ directory, and you’ll need to install its dependencies to use it. The tool passes the path to a directory where the feature maps specified in ‘Script Feature Maps’ are written. It then combines those feature maps and outputs the result, which is read by the tool and used to make selections in the image. Here’s an example that uses some of the feature maps we found earlier which correlated with mouth location:

Moving the mouth slider to ~100 and updating the image opens the mouth like normal.

This lets us automate image creation with certain attributes.
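I won’t reproduce spatial_map.py here, but the gist of such a script might look like the sketch below. Two assumptions that may not match the real I/O contract: the tool writes each requested feature map to the passed directory as a .npy file named after its index, and the script returns a thresholded mask at the tab’s resolution.

import sys
from pathlib import Path
import numpy as np

def combine(fmap_dir, weights, out_res=16, threshold=1.0):
    # Weighted sum of the saved feature maps
    total = None
    for index, w in weights.items():
        fmap = np.load(Path(fmap_dir) / f'{index}.npy')  # assumed file layout
        total = w * fmap if total is None else total + w * fmap
    # Downsample to the tab's resolution by block-averaging (assumes divisible sizes)
    step = total.shape[0] // out_res
    small = total.reshape(out_res, step, out_res, step).mean(axis=(1, 3))
    return (small > threshold).astype(np.float32)  # 1 = select this square

# e.g. combine(sys.argv[1], {234: 1.0, 34: 1.0, 370: 0.5})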

By selecting ‘Run and Modify’ under ‘Run Script When Generating’, setting mouth_open to around 100, and generating new latents, we can ensure new images have an open mouth. By setting mouth_open to -100 and applying the settings to the directory with open mouth images we can generate the same images with closed mouths.

Applying settings to a directory can be done with Misc->Apply Current Sliders To Directory and selecting the directory with generated images:

Alternatively, this can be achieved in one go using the Batch->Fmap->New Latent Combinatoric option:

Note that this option will do a combinatoric modification based on all tabs with non-zero multipliers. For example, if the ‘hair’ tab has a non-zero multiplier and we select 3 ‘hair’ images to be generated and 2 ‘mouth_open’, it will create 6 variations of each image with the following attributes (see the sketch after this list):

short hair, closed mouth

short hair, open mouth

medium length hair, closed mouth

medium length hair, open mouth

long hair, closed mouth

long hair, open mouth
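In other words, the batch is a Cartesian product of each tab’s slider values:

from itertools import product

hair_values = [-257.0, 0.0, 100.0]  # 3 'hair' settings (illustrative numbers)
mouth_values = [0.0, 100.0]         # 2 'mouth_open' settings

combos = list(product(hair_values, mouth_values))
print(len(combos))  # 6 variations per base image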

A couple more points on scripts:

You can add a multiplier in front of feature maps to change the influence of various maps when they’re combined.

The resolution of the feature maps used to detect facial features does not need to match the resolution of the tab where selections are made. The script should resize as needed:

There are other potential uses for scripts, such as recording a representation of the feature maps for analysis after the tool is closed.

This covers most of the tool’s functionality. In the next section I discuss the architecture in more detail.

3.3 Tool Details