The faces on the left were created by a GAN in 2014; on the right are faces made in 2018. Image: Goodfellow et al.; Karras, Laine, Aila / Nvidia

The Synthetic Media Revolution

Ian Goodfellow and his colleagues gave the world generative adversarial networks (GANs) five years ago, back in 2014. They did so with fuzzy, ethereal black & white images of human faces, all generated by computers. This was by no means the start of synthetic media, but it did supercharge the field. Ever since, the realm of neural network-powered AI creativity has repeatedly brushed against mainstream attention. Yet synthetic media is still largely unknown. Certain meme-boosted applications such as deepfakes and This Person Does Not Exist notwithstanding, it’s safe to assume the average person is unaware that contemporary artificial intelligence is capable of even a fleeting level of “creativity.”

Media synthesis is an inevitable development in our progress towards artificial general intelligence: the first and truest sign of symbolic understanding in machines (though far from the thing itself- rather, the organization of proteins and sugars into the rudimentary structure of what will someday become the cells of AGI). This progress is driven by the rise of artificial neural networks (ANNs). A popular misconception holds that synthetic media offers nothing we haven’t had since the 1990s, yet what separates media synthesis from mere manipulation, retouching, and scripting is the modicum of intelligence required to accomplish these tasks. The difference between Photoshop and neural network-based deepfakes is like the difference between building a house with power tools and employing a utility robot that uses those power tools to build the house for you.

Succinctly, media synthesis is the first tangible sign of automation that most people will experience.

Accelerated Evolution to an End

Public awareness of synthetic media will steadily grow, and acceptance will likely sink to a nadir as more people learn the power of these artificial neural networks without being offered realistic debate or solutions for how to deal with them. The technology has simply come too quickly for us to prepare for, hence the seemingly hasty reactions of certain groups like OpenAI with regard to releasing new AI models.

Already, we see frightened reactions to the likes of DeepNude, an app made solely to strip women in images down to their bare bodies without their consent. The potential for abuse (especially for pedophilic purposes) is self-evident. We are plunging headlong into a new era so quickly that we are unaware of just what we are getting ourselves into. But just what are we getting into?

The best way to find out is to imagine a future point where synthetic media has matured enough to be threatening but has not yet replaced all traditional media- a world where even the most technologically illiterate are aware of the vast changes.

And I feel that shall come as soon as 2024.

A GauGAN-enhanced 30-second sketch of a mountain

Source: http://nvidia-research-mingyuliu.com/gaugan

A GauGAN-enhanced 1-minute sketch of a natural scene

Source: http://nvidia-research-mingyuliu.com/gaugan

The World of 2024

Welcome to August 2024, where the news is hysterical and the people are bemused, where decade-defining fads are generated by bots and human culture itself undergoes the beginnings of an ultracultural memetic evolution.

Synthetic media is a common feature of internet culture. Users make do with what exists and get creative to work around drawbacks and limitations, giving us an even greater idea of what lies ahead.

Let’s start with an entire industry that is likely to evaporate in very short order.

A plurality of fashion models speak out against irrelevance and the copyrighting of faces, for they are now seen as superfluous and demanding- their only hope is to sell their likenesses for what little they might be worth, handing over agency over their own identities to algorithms. Yet this proves fruitful only for those already established.

Neural networks can generate full human figures, and altering their appearance and clothing is a matter of changing a few parameters or feeding an image into the data set. Changing the clothes of someone in a picture is as easy as clicking on the piece you wish to change and swapping it with any of your choice (or removing the person’s clothes entirely). A similar scenario applies to make-up. This is not like an old online dress-up flash game where the models must be meticulously crafted by an art designer or programmer- simply give the ANN something to work with, and it will figure out all the rest. You needn’t even show it every angle or every lighting condition, for it will use common sense to figure these out as well. Such has been possible since at least 2017, though only with recent GPU advancements has it become possible to run such programs in real time.
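The “changing a few parameters” above refers to editing a generator’s latent vector: every generated figure corresponds to a point in a latent space, and nudging that point along a learned direction changes one attribute while preserving the rest. A minimal sketch of the mechanics, with a hypothetical stand-in generator and hard-coded attribute directions (real systems such as StyleGAN learn both the generator and the directions):

```python
import random

LATENT_DIM = 8

# Hypothetical "learned" attribute directions in latent space.
# In a real GAN these are discovered by analyzing the trained
# generator; here they are hard-coded so the mechanics are visible.
ATTRIBUTE_DIRECTIONS = {
    "clothing_formality": [1, 0, 0, 0, 0, 0, 0, 0],
    "hair_length":        [0, 1, 0, 0, 0, 0, 0, 0],
}

def generate(latent):
    """Stand-in generator: decode a latent vector into attribute scores.
    A real generator would decode the vector into a full image."""
    return {name: sum(z * d for z, d in zip(latent, direction))
            for name, direction in ATTRIBUTE_DIRECTIONS.items()}

def edit(latent, attribute, amount):
    """'Changing a few parameters': move the latent vector along one
    learned attribute direction, leaving everything else untouched."""
    direction = ATTRIBUTE_DIRECTIONS[attribute]
    return [z + amount * d for z, d in zip(latent, direction)]

random.seed(0)
z = [random.gauss(0, 1) for _ in range(LATENT_DIM)]
base = generate(z)
formal = generate(edit(z, "clothing_formality", 3.0))
print(round(formal["clothing_formality"] - base["clothing_formality"], 6))
```

Only the edited attribute moves; the others stay fixed, which is why the same face can be re-dressed without being regenerated from scratch.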

The unfortunate side effect is that the amateur modeling industry will be vaporized. Extremely little will be left, and the few who remain will be promoted entirely because they are fleshy & real human beings. Professional models will survive for longer, but little new blood will join their ranks. As such, it remains to be seen whether news outlets and blogs speak loudly of the sudden, unexpected automation of what was once seen as a safe, human-centric industry, or whether this goes ignored and under-reported- after all, the news used to speak of automation in terms of physical, humanoid robots taking the jobs of factory workers, fast-food burger flippers, and truck drivers, occupations that still exist en masse due to slower-than-expected rollouts of robotics and a continued lack of general AI.

We needn’t have general AI to replace jobs that can be done by disembodied digital agents. And the sudden decline & disappearance of models will be the first widespread sign of this.

TimbreTron performs musical timbre transfer, such as turning a piano into a harpsichord. It does this by “understanding” what instruments sound like and recreating that sound by manipulating waveforms

Rudimentary music generation apps will allow anyone to create the music they want, within reason. The methods range from composing melodies as MIDI files all the way to generating the raw audio waveforms of instruments. This is most obvious with “style remixing”: inputting a song into an ANN and altering parameters to suit your desires. Such remixing will range from methods currently available but technically complex for the common person- taking regular songs and removing all vocals, taking instrumental songs and adding vocals, removing entire parts of a song without the final product feeling choppy, and adding minor effects- all the way to qualitative changes in the music.

For example: changing a violin to a piano, or turning a Metallica song into a particularly aggressive and electronic Justin Bieber track (and vice versa), or taking the Star Wars theme and having it played on a theremin. This naturally includes shifting the vocals in a way that current audio programs a la Audacity are incapable of doing naturally. And it needn’t be a pure swap between two unlike songs. Rather, you can take any song and alter any individual aspect of it- change the gender of the vocalist, replace a synth with a sitar, create a new harmony between instruments, and more. What’s more, you can add lyrics that were not originally there and fit them into the overarching melody.

The memetic possibilities are nigh-endless, and by God have they been exploited. Common sense dictates that if you can do this with a music file, you ought to also be able to do this with any audio file. If it exists digitally, you can manipulate it. Give an ANN something to work with, and it will find & fill in the gaps.
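To make “manipulating waveforms” concrete, here is the crudest possible audio transform: pitch-shifting a tone by resampling. Neural systems like TimbreTron operate on learned representations rather than this naive trick (which also changes duration), but the underlying premise is the same- digital audio is just numbers you can rewrite:

```python
import math

SAMPLE_RATE = 16000

def sine_wave(freq, seconds=1.0, rate=SAMPLE_RATE):
    """Generate a mono sine tone as a list of float samples."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def resample(samples, factor):
    """Naive pitch shift: play the samples back `factor` times faster,
    using linear interpolation between neighbouring samples."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += factor
    return out

def estimate_freq(samples, rate=SAMPLE_RATE):
    """Estimate pitch by counting positive-going zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings / (len(samples) / rate)

tone = sine_wave(220.0)            # A3
octave_up = resample(tone, 2.0)    # doubling playback speed doubles pitch
print(round(estimate_freq(tone)), round(estimate_freq(octave_up)))
```

A learned model replaces this hand-written transform with one that can change timbre or remove a vocalist while leaving tempo and melody intact.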

This includes voice acting. Productions that require dubs still need voice actors, as voice synthesis is not perfect. Even with ANN mastery of intonation and timbre, context and direction are very difficult to pull off. As a result, voice actors remain on board for professional productions, while amateur works that utilize advanced text-to-speech inevitably require digital coaching and adjustments. Of course, a potential workaround is vocal style transfer- you speak into the microphone and the neural network transforms your voice into someone else’s. And while this is a godsend for budget productions and fanworks, it’s still not perfect.

Yet it need not be perfect- only good enough to be serviceable.

Artists who seek commissions as a source of income now fear what the near future may bring. Whatever material they host online is used against them, fed into neural networks to produce new works. The most they can hope for is that the Romantic will choose the authentic over the artificial. Newcomers, however, may see image synthesis & manipulation as a means of becoming commissioned artists themselves, using the still-steep difficulty curve of coaxing out a perfect dream image to charge for realizing a client’s wishes. Even the more exaggerated and cartoony styles have been achieved by the machines, making it possible to generate a true cartoon image of anyone and anything.

Freelance commission artists are not the only ones watching the progress of AI-generated art.

The extraneous hands typically involved in animation find themselves left behind. For creators, this allows for a flourishing of ideas at increasingly economical prices- indeed, on YouTube and similar video-hosting sites, there is a resurgence of indie animation. Outside of a few proof-of-concept pieces shown off by AI laboratories, there are no entirely AI-generated cartoons. Rather, labor-saving methods allow for algorithmic streamlining and automation of the processes behind animation. Unlike earlier methods that oft appeared cheap, this “intelligent” automation of animation appears natural and indistinguishable from the real thing. Algorithms are not stitching two images together with keyframe editing but rather filling in the blanks.
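The “filling in the blanks” here is in-betweening: given two keyframes, generate the intermediate frames. A toy sketch of the naive baseline- plain linear blending- shows what learned interpolators improve on, since blending cross-fades moving objects rather than actually moving them:

```python
def lerp_frames(frame_a, frame_b, t):
    """Linearly blend two frames (flat lists of pixel intensities).
    t=0 gives frame_a, t=1 gives frame_b."""
    return [a * (1 - t) + b * t for a, b in zip(frame_a, frame_b)]

def inbetween(frame_a, frame_b, n_mid):
    """Generate n_mid intermediate frames between two keyframes."""
    steps = n_mid + 1
    return [lerp_frames(frame_a, frame_b, i / steps) for i in range(1, steps)]

key1 = [0.0, 0.0, 1.0, 0.0]   # toy 4-pixel "frame"
key2 = [0.0, 1.0, 0.0, 0.0]
mids = inbetween(key1, key2, 1)
print(mids[0])  # halfway blend: [0.0, 0.5, 0.5, 0.0]
```

Note how the bright pixel fades out in one place and fades in at another instead of sliding over- exactly the artifact a neural in-betweener avoids by modeling motion rather than averaging pixels.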

While this saves money for creators and results in much more consistent products, it renders ever-larger chunks of the workforce irrelevant to the process. In the case of Japanese anime- where the majority of workers are already overworked and underpaid- the rise of synthetic media signals a major wave of obsolescence and unemployment with little restitution. It will be no better overseas, where production costs for animated works are typically higher per episode.

Yet this epoch of pain also brings extreme doubt and frustration, as these tools are not yet mature. Animation automation is a neat development, but it cannot yet be used to create entire television-ready cartoons. As with many developments in the history of AI & automation, there are sentiments that the technology has gone as far as it will go for years to come, and that newcomers and green-handed types are safe in the assumption that human-done cartooning is not going anywhere in the next 20 years.

As the 2020s are an era of computational stagnation wrought by the breakdown of Moore’s Law, with stopgaps (e.g. 3D circuitry) keeping progress moving at some slower pace, media synthesis will not develop as rapidly as some fear, though it certainly will not prove to be the fleeting trend others hope for.

This is a short story generated by GPT-2 Medium, courtesy of Talk To Transformer. The model has 345M parameters- smaller than the version released as of this writing (13 October 2019), as well as the full-sized 1.5B-parameter model

Source: https://talktotransformer.com/

GPT-X and similar language-modeling transformer networks are scattered across the internet and form the basis of modern chatbots. These text generators are capable of more than storytelling or crafting fake news- with the right input, you may receive an image or even a short MIDI piece. But most visibly, they are used to show off the capabilities of modern-day AI.

Writers will use GPT-X to generate realistic and coherent passages of description and dialogue, and some of the more meticulous will find the time to stitch together multiple passages into larger, publishable short stories. Others, such as those specializing in writing romance or video game fantasy potboilers on Amazon, can use intelligent auto-writing software to speed up the process and increase their output- most commonly via machine-generated plot outlines, setting & character descriptions, and extended prompts.

Writers who struggle with coming up with decent endings can get the computer to do it for them. Filling out everything in between the introduction and the denouement is an automated process. Even better for amateurs- state-of-the-art networks can even refine a story outline so that it follows the natural beats of storytelling. Mediocre and awful writers often do not understand how story beats work, resulting in works that seem to meander or lack basic structure. Machines can fix that. Building off tools like Scrivener and Dramatica Pro, these transformers have read millions of stories ranging from novels to drabbles to screenplays, and have gained a fundamental understanding of how stories are structured. The best ones can take a writer’s basic fleshed-out concept and find the many story beats it must follow, resulting in a stronger finished product altogether.
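GPT-X here stands in for large transformer language models, and their core loop- stripped of everything that makes them powerful- is simply “predict the next word from what came before.” A toy bigram model makes that loop concrete; a transformer replaces the lookup table below with a neural network conditioned on vastly more context:

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Record which word follows which: the crudest possible language model."""
    model = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=10, seed=0):
    """Sample a continuation one word at a time from the learned statistics."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = (
    "the hero left the village and the hero found the sword "
    "and the sword broke and the hero forged a new sword"
)
model = train_bigrams(corpus)
print(generate(model, "the"))
```

The output is locally plausible but globally aimless- which is precisely the gap that story-beat refinement tools are meant to close at the level of outlines rather than individual words.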

Blogs will use scraping to gather information and completely automate the process of publishing, editing, and promoting a new piece- fully automated blogs pop up and gain very large followings. As they are machine-written and can draw from any literary pool, they can be of the highest possible quality at all times despite needing no payment. Even human-written blog posts will contain massive stretches “professionally” edited by these machines. What’s more, these intelligent editing programs are fed the works of great fiction and non-fiction writers, as well as heavily marked-up documents comparing what works and what doesn’t- they’ll be able to turn even the most juvenile and beigely written post into something more akin to Ernest Hemingway or Roger Ebert.

Spinning articles is easier than ever. In the past, article spinning, auto-summarizing, and lazy anti-plagiarism efforts amounted to little more than running text through a thesaurus program, which had the side effect of making spun pieces accidentally unreadable.

Intelligent automation is not the same, as must be deeply stressed- if ANNs possess some level of commonsense reasoning and natural language understanding, then they won’t spin “The spirit is willing, but the flesh is weak” into “The whisky is agreeable, but the meat has gone bad.” Students find themselves able to copy from Wikipedia on an assigned subject, input the text into a neural summarizer, and receive a completely plagiarism-free rewrite.
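Even a simple extractive summarizer hints at the difference from thesaurus-spinning: it scores whole sentences by the frequency of their content words rather than swapping synonyms blindly. A minimal sketch (real neural summarizers are abstractive- they rewrite rather than extract- and far more capable; the stopword list here is an illustrative stub):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}

def summarize(text, n_sentences=1):
    """Extractive sketch: score each sentence by how many frequent
    content words it contains, keep the top scorers in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens if t not in STOPWORDS)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in ranked)

doc = ("Neural networks learn patterns from data. "
       "Cats are popular pets. "
       "Networks trained on text data can rewrite and summarize text.")
print(summarize(doc))
```

Because the scoring operates on whole sentences, the off-topic sentence about cats is dropped while the wording of the kept sentences is never mangled- the failure mode that made old spinners unreadable.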

These People Do Not Exist: They Are Generated by AI

This person does not exist. He was generated entirely by a GAN.

Source: https://thispersondoesnotexist.com/

Is your best buddy a bot? For many people, the answer is “I don’t know.” Utilizing image and text generation, many media aggregation & social media sites (e.g. Reddit, Twitter, Facebook, LinkedIn) are awash with “artificial humans” virtually indistinguishable from the real thing. Their profile pictures are of people indistinguishable from you or me, and they engage in very coherent conversations. They have inconspicuous usernames and behave rationally. Yet there are no humans behind any of these accounts. It’s a silent mass passing of the Turing Test.

On dating sites, folk fall in love with those who have never physically existed.

On forums, internet friendships are forged between man and machine.

You can interact with many users without immediately knowing who or what they really are.

And once this becomes known, many may decide they do not care. If it felt real, even if the other party was a bundle of math equations, then that’s all that matters. But for many others, there’s a visceral sense of betrayal and distrust. Any random account may be artificial, while only the most spirited and obvious can escape doubt. Yet even this may not last forever, for media synthesis is still advancing as a field- in five years, some surmise, even if an account posts a video of themselves talking into a camera, you will doubt it is what it appears to be. What you’re seeing and what you’re hearing may never be what is actually happening.

Bots are used for more than toying with friendships and passing the Turing Test. As far back as 2019, foreign agents used AI-generated profile photos as a means of espionage. Moral guardians & political experts will raise fears of advanced bots grooming the young and elderly toward certain paths, of phishing bots run by the malevolent to gain your trust- and then your data. Such neural-enhanced bots are not obvious in their actions.

Memes and trends are created, promoted, and killed by these neural-enhanced bots. If someone with enough computational resources wants to make a dead musical genre, fashion trend, or dangerous fad a thing, they need only flood social media with ultra-realistic bots commenting upon it and wait until news organizations and popular blogs (some of which may also be bot-run) run with the story and get the ball rolling. Likewise, if someone hates a trend they feel has gone on too long, or a particular movie, or a particular band, or anything in culture in general, they may employ these bots to influence the wider collective consciousness. For all our hyper-individualistic media, humans still wish to fit in. Give us the opportunity to be whoever we want to be, and we will almost always choose to be like everyone else. Advanced bots are perfect for exploiting this mass psychological quirk. Many claim it has already been exploited in the 2016 and 2020 US presidential elections (among most others in the Western world), yet those campaigns utilized bots of an older sort- bots which lacked natural language understanding and were better suited for spam. That is, the bots with which most people are familiar.

See also: Better Language Models and Their Implications

The public announcement of GPT-2. This transformer wasn’t a technological breakthrough in itself- rather, it was an already-existing architecture made so incredibly powerful that it was deemed too dangerous to release in full at the time.

Bizarro Civilization & Ultracultural Memes

The world of 2024 sees human culture descend- or perhaps transcend- into a state of schizophrenia, having lost all sense of reality and fiction. The trivial bits of data created by ourselves are preserved and utilized by ultracultural agents that exist solely to feed upon that data. With it, they grow smarter and more capable. With it, they create more of such data. With it, they give humanity ever-greater power to fashion our own minute worlds.

2024 is much too soon to see the full results of this development. Yet it is there. People who get high off information and entertainment will choose to generate & synthesize only that which amuses them, and they will begin cocooning themselves into their own alternative realities, entirely detached from the rest of civilization. Nothing that actually exists is sacred. Everything within the digital realm exists solely for the individual to use as they please. Not only will people accept being manipulated by bots & manipulating media to serve their own desires, but they will entirely embrace it. Cultural ideas have entered a bizarro state of evolution- simultaneously, the spread of concepts & language has sped up and slowed down tremendously. Within homes, there will be families whose individual members live in separate realities. The cracks are subtle at first- merely a difference between those who prefer the synthesized vs. those who hold fast to what can be held, or perhaps those who follow bot-created trends vs. those who create their own little trends that exist only for themselves and whatever personal bots they pretend are human.

There is no Matrix- yet. The real and hand-made still persist and thrive. And those who’ve never cared for the internet or digital technology may wonder in bemusement what those sitting on the bleeding edge even talk about when they claim that artificial intelligence is causing an ultracultural breakdown. 2024-era AI is still fairly narrow, despite certain generalized networks. We are still far from true general AI. And there are still models, animators, writers, bloggers, and believers in a common truth. To them, this all seems overblown, and that’s all it’ll ever be until long after they’re gone.

And perhaps those advanced bots will be sure to create such a technoskeptical meme to foster these doubts.

Snap Back to Reality

The field of synthetic media is so inconceivably wide and filled with potential, with surprises and joys, dangers and threats, that I can’t do it justice with a single blog post- indeed, due to this post’s increasing length, I could not touch upon the most famous branch of synthetic media: deepfakes. Nor could I go into detail on the minutiae of media synthesis’s effects on various industries, such as the animation & comic book industry, or how auto-blogs might foster the development of fully automatic propaganda sites that feed into each other, or how the democratization of skill may invoke a great flourishing of the arts akin to the rise of behavioral modernity. The sudden twist into fashioning 2024 as a Metal Gear Solid 2-esque memetic dystopia also was not intentional, nor is it particularly grounded. Indeed, the reality of life in 2024 will be a juxtaposition of continued mundanity, failed AI training runs discarded to show off the few successful ones, and ongoing developments in and public demos of synthetic media. If you took from this post that I believe all creativity will be automated by 2024, I apologize. If anything, the 2020s will be a fantastic peak of human creativity enabled by combining human and machine imagination- soured only by controversies over eroding social trust, the use of digital manipulation tools for social control or sexually explicit purposes, and the looming threat of widespread automation.

I will return to discussing the future of individual branches in later posts.

Don’t accept what I’ve said at face value: question everything I’ve written, expand upon it to find uses and abuses I did not think of, and run with it in your own way to find a future different from the one I’ve envisioned. Better yet, talk to someone in the field of machine learning to gain a better understanding of the present-day capabilities of generative adversarial networks and the many other kinds of artificial neural networks.

After all, the most likely reason why OpenAI did not release the full version of GPT-2 was to give media platforms the time needed to figure out if they’re ready for what’s coming. This is good advice for us all.