I suspect that the issue here is that though simple SGD & Adafactor worked fine when running on a single GPU, in the scaled-up asynchronous swarm setting, they have especially poor gradient estimates and make slow progress; the loss spike comes from the optimizer state being reset on startup, causing early gradients to be poorly estimated & destabilizing training, requiring thousands of iterations to gradually recover. Adam then improves over Adafactor by estimating true variance/momentum, overcoming gradient noise to make faster progress. If so, the spike issue could be fixed several ways:

One issue worth noting was the problem of regularly restarting the swarm due to preemption & TPUs expiring, which caused large loss spikes on startup that would waste hours of training as it recovered; as a compromise between simple SGD and full Adam, we were using Adafactor (as most Transformer projects do, like Connor or Gokaslan’s GPT-2 replications), and we speculate that the loss spike is related to losing optimizer state and bad initial variance estimates. Simple SGD avoided the loss spike, but at the cost of making no discernible progress regardless of LR; Adafactor made slow progress, but wasted a substantial fraction of available training time; we tried to avoid Adam because the memory overhead of tracking momentum for all variables (as opposed to Adafactor’s simplified approximation of momentum) would reduce minibatch size, but when we tried Adam on the full swarm, despite the initial loss spike, it made much more rapid progress than Adafactor did.

While sampling, we noticed double and single quotes were being replaced by mojibake gibberish. This appeared to be due to Unicode curly quotes (“”’) in the original text dataset. The GPT-1 paper mentions using the ftfy Python library to clean up mojibake & Unicode in their crawl data, and ftfy converts Unicode quotes to ASCII straight quotes, so presumably GPT-2 does as well and it (or its BPE encoding) is confused by their presence, causing the mojibake output. It was late in training, but we updated the PG+PF dataset to replace the quotes (ftfy.fix_text('foo') etc).
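
To make the quote cleanup concrete, here is a minimal stdlib-only sketch of the substitution involved. It is a stand-in for illustration, not ftfy's implementation (ftfy does a great deal more than this), and the helper name `straighten_quotes` is hypothetical:

```python
# Stand-in for the curly-quote handling only; ftfy itself fixes far more.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace Unicode curly quotes with their ASCII equivalents."""
    return text.translate(CURLY_TO_STRAIGHT)


sample = "\u201cThou art no Poet\u2014may\u2019st not tell thy dreams?\u201d"
cleaned = straighten_quotes(sample)
print(cleaned)
```

Other non-ASCII characters, such as the dash in the sample, pass through untouched; only the four quote characters are mapped.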

By 2019-12-11, after applying for additional credits because we were coming up on the TFRC deadline on the 14th, we’d gotten the loss down to ~2.15. After switching to Adam and scaling the swarm further to ~95 TPUs on the 13th, we reached a loss of 1.61, matching or beating the GPT-2-117M record on the combined dataset. A further 5 days (interrupted by swarm preemption and occasional tweaks/experiments) brought the loss down to <0.6 on 2019-12-18. I had expected flagrant plagiarism/overfitting well before a loss of 0.6, perhaps ~1.2, but regularly inspecting unconditional samples and searching initial lines or generated titles/authors, I found little & they didn’t read like plagiarism, so we kept training to see how far it could go. (Prompting with lines from famous poems would’ve almost surely elicited plagiarism, but I am less concerned with that, since GPT-2-1.5b is so big it can easily memorize famous poems without compromising its general poetry abilities.) I suspect that GPT-2-1.5b is not really >3× better than GPT-2-117M, and that GPT-2-117M could have been trained to <1.6 loss if we had used similar amounts of compute, so the actual benefit from scaling up GPT-2 is smaller—but why bother with training GPT-2-117M to a better convergence when we can use GPT-2-1.5b?

Presser began the long and painful process of debugging the swarm and all its problems… The halts were never quite fixed but we kept scaling.

A swarm was too expensive for me, so we applied for TensorFlow Research Cloud credits. I wasn’t expecting anything to come of it, but the form was easy to fill out in a minute (it’s not much more than an email address), and to my surprise, within 3 hours we had been approved for 1 month of credits, covering several on-demand TPUv2–3s, and 100 preemptible TPUv2s (but no TPU pods).

Presser worked around the bandwidth with an asynchronous approach somewhat like the old Hogwild training method: instead of every node copying its entire model at a fixed timestep and waiting for all the other nodes, the nodes are constantly communicating a fraction of their latest model with the master and receiving an updated fraction back, regardless of how many iterations other nodes have run. So every node is running a hybrid & partially-out-of-date model and sending stale gradients out, but gradient descent is robust enough that this will still work and will scale up easily. After enough slices have been sent, a node will have sent an equivalent of a full model, and caught up partially, and the ‘swarm’ will hopefully be able to make progress by training on a large amount of hardware and be faster than using just a few TPUs synchronously.
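
The slice-exchange idea can be illustrated with a toy simulation. This is a sketch under loud assumptions, not Presser's code: the "model" is a plain list of floats, the loss is a quadratic, the master blends incoming slices by simple averaging (the text does not say what rule the real swarm used), and every name here is hypothetical.

```python
import random


def loss(params, target):
    """Mean squared distance to the target: a stand-in for a real training loss."""
    return sum((p - t) ** 2 for p, t in zip(params, target)) / len(params)


def local_sgd_step(params, target, lr, rng, noise=0.1):
    """One noisy gradient step on the toy quadratic loss."""
    return [p - lr * (2 * (p - t) + rng.gauss(0, noise))
            for p, t in zip(params, target)]


def exchange_slice(worker, master, start, width):
    """Blend one slice of the worker into the master, then copy the result back.

    Averaging is an assumption made for this sketch, not the swarm's actual rule.
    """
    for i in range(start, min(start + width, len(master))):
        master[i] = 0.5 * (master[i] + worker[i])
        worker[i] = master[i]


def run_swarm(n_workers=8, dim=64, slice_width=8, ticks=400, lr=0.05, seed=0):
    rng = random.Random(seed)
    target = [rng.uniform(-1, 1) for _ in range(dim)]
    master = [0.0] * dim
    workers = [list(master) for _ in range(n_workers)]
    cursors = [0] * n_workers          # which slice each worker sends next
    start_loss = loss(master, target)
    for _ in range(ticks):
        # Any worker may act at any time; nobody waits at a global barrier.
        w = rng.randrange(n_workers)
        workers[w] = local_sgd_step(workers[w], target, lr, rng)
        exchange_slice(workers[w], master, cursors[w], slice_width)
        cursors[w] = (cursors[w] + slice_width) % dim
    return start_loss, loss(master, target)


start_loss, end_loss = run_swarm()
print(f"master loss: {start_loss:.4f} -> {end_loss:.4f}")
```

Each worker holds a model that is partly fresh and partly stale, yet the master still improves. In this toy, after `dim / slice_width` exchanges a worker has sent the equivalent of one full model.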

The main limit for distributed TPU training is the network bandwidth: copying around the latest version of the multi-gigabyte model to and from the central VM uses up all the bandwidth available. In the simple synchronous case, which most closely approximates training on a single GPU, if the entire cluster has to stop and wait for every node to copy its updates to the master, the master to do a single batch update, and the master to sync back out to each node, the cluster will spend most of its time just waiting on the network to copy everything. (And what happens when one or more TPUs inevitably freeze?)

Presser decided to press on and after further optimizing work to ensure we used the full TPU RAM and more of the cores with a minibatch n=4, began experimenting with support for multiple TPUs. Since each TPU is a separate computer inside Google’s network, and not ‘attached’ to a VM like a GPU, there was in theory little limit to how many TPUs our 1 VM could orchestrate. We could, in theory, create an equivalent of the expensive TPU ‘pods’ by simply connecting to a bunch of TPUs at once.

After setting up on GCP and figuring out the details like needing to set an environment variable with the target TPU name, we discovered… the TPUs kept freezing anyway. This was on top of the standard preempting of TPUs, since we were using preemptibles to save money as is standard in cloud deep learning. This remained a mystery. Was there some undocumented heartbeat that was required? Was Presser’s TF code, which avoided the standard TPUEstimator approach which it seems everyone else uses, triggering some sort of problem? (We were warned in vague terms that TPUs do not like loops or reshaping operations.) Even more irritatingly, our on-demand TPUs turned out to preempt anyway! But at least the checkpoints were fast, so now the watchdogs worked better. But on the gripping hand, the TPU was not fast and was performing far below what we thought it should based on its nominal specs, and we seemed to be using barely a third of the cores.

A preemptible TPUv2 costs $1.35/hour, which is not too bad… A week of training would cost >$226 & I’d never used GCP before, but I was too curious what a fully-trained GPT-2-1.5b would generate. The net cost for November 2019, due to all the experiments and costs not covered by the TFRC research credits, was $321.47, primarily for high-RAM instances, and then network egress bandwidth fees—for cross-zone traffic with the TPUs, apparently. Optimizing for GPT-2-1.5b-poetry, we got December 2019’s cost down to $199.78, and trained GPT-2-1.5b-poetry and an IRC logs model. We spent in January 2020 an additional $408.22 on a number of projects: the chess, Subreddit Simulator, Archive Of Our Own, and video game walkthrough GPT-2-1.5b models; the 30k context window GPT-2-117M ABC/MIDI model; ImageNet resnet benchmarking; and StyleGAN 2 prototyping for training on Danbooru2019.
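
The weekly figure is just the hourly rate multiplied out. A small sketch of that arithmetic follows; it covers one TPU running continuously and deliberately ignores the VM, storage, and egress fees that made up most of the real bills. The function name is hypothetical.

```python
HOURLY_RATE_USD = 1.35   # preemptible TPUv2 price quoted in the text
HOURS_PER_DAY = 24


def tpu_cost(days: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost in USD of running one TPU continuously for `days` days."""
    return days * HOURS_PER_DAY * rate


week = tpu_cost(7)
month = tpu_cost(30)
print(f"1 week:  ${week:.2f}")
print(f"30 days: ${month:.2f}")
```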

This was not going to work for weeks of training. Presser again modified the codebase & notebook to add ‘watchdog’ processes which would watch for an apparently hung TensorFlow process due to a TPU freeze, and kill it and restart. But the lost time was a serious issue: we couldn’t checkpoint too often because then we’d waste all our time checkpointing, but not checkpointing meant we’d lose minutes or hours of training. We couldn’t find any information about why TPUs would freeze and figured it was some sort of Colab issue, so I decided to bite the bullet and pay for a GCP VM & TPU.
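
The watchdog idea can be sketched in a few lines. This is an illustration of the pattern, not the code from Presser's fork: the class, its callbacks, and the 300-second timeout are all assumptions, and a fake clock stands in for real time so the example runs instantly.

```python
import time


class Watchdog:
    """Restart a job when it stops reporting progress for too long."""

    def __init__(self, start_job, stop_job, timeout_s, clock=time.monotonic):
        self.start_job = start_job    # callable that launches training
        self.stop_job = stop_job      # callable that kills the hung process
        self.timeout_s = timeout_s
        self.clock = clock
        self.restarts = 0
        self.last_progress = clock()
        self.start_job()

    def report_progress(self):
        """Called by whatever observes progress, e.g. a new training-log line."""
        self.last_progress = self.clock()

    def poll(self):
        """Kill and relaunch the job if it has been silent past the timeout."""
        if self.clock() - self.last_progress > self.timeout_s:
            self.stop_job()
            self.start_job()
            self.restarts += 1
            self.last_progress = self.clock()
            return True
        return False


# Demo with a fake clock (hypothetical timings).
now = [0.0]
events = []
dog = Watchdog(lambda: events.append("start"), lambda: events.append("kill"),
               timeout_s=300, clock=lambda: now[0])
now[0] = 200
dog.report_progress()      # a training step was logged at t=200
now[0] = 400
first = dog.poll()         # 200s of silence: under the timeout, leave it alone
now[0] = 600
second = dog.poll()        # 400s of silence: presumed frozen, restart
print(events, dog.restarts)
```

A restart only recovers to the last checkpoint, which is why the checkpoint interval and the watchdog timeout have to be tuned together.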

Exacerbating the problem, TPUs on Colab appear to randomly ‘freeze’, an issue unrelated to Colab notebooks timing out after a day or so; manually interrupting the training process and restarting fixes it, but at the cost of any progress made since the last (slow) checkpoint, and it required constant babysitting; I calculated that one would have to checkpoint every hour to optimize the tradeoff between freezes & checkpoints! At one point I was dealing with TPU freezes every half hour. It was already giving decent poetry samples despite a loss >3, but we wanted to train to convergence, which ought to be <1.6 (the final combined-117M loss).

Presser tried out curriculum learning/progressive growing by setting the context window to a small window like k=50 BPE tokens, with the idea that it could be gradually annealed to the original k=1024 over the course of training. (Because of how Transformers scale, k=50 uses far less memory & compute than k=1024, so it fits much larger, faster minibatches.) This seemed to be working to some degree, but it was no silver bullet.
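
A rough sense of the savings: the self-attention score matrix grows with the square of the context length, while most of the rest of the network grows linearly with it. The sketch below only computes those two ratios; it is back-of-the-envelope arithmetic, not a measurement of our runs, and the function names are hypothetical.

```python
def attention_cost_ratio(k_small: int, k_large: int) -> float:
    """How many times larger the k-by-k attention-score matrix is at k_large."""
    return (k_large / k_small) ** 2


def per_token_cost_ratio(k_small: int, k_large: int) -> float:
    """Ratio for the parts of the network whose cost grows linearly in k."""
    return k_large / k_small


quadratic = attention_cost_ratio(50, 1024)
linear = per_token_cost_ratio(50, 1024)
print(f"attention scores: {quadratic:.0f}x")
print(f"linear parts:     {linear:.2f}x")
```

The real saving falls somewhere between the two ratios, depending on how much of the memory and compute goes to attention at a given model size.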

Unfortunately, Colab notebooks are still limited in system RAM and disk space, so training GPT-2-1.5b then encountered the surprising problem of running out of RAM & crashing, running out of disk space, and saving to disk being extremely slow due to slow TensorFlow serialization of the model checkpoint. (The TPU-based serialization code would have been far faster using the standard TF way, but it would also have required the user to create & manage a Google Cloud bucket; we were still hoping to create an easy works-out-of-the-box Colab notebook to let anyone do GPT-2-1.5b-finetuning. If there was a faster way to do it, Presser didn’t know about it.) This was partially solved by saving few checkpoints, figuring out how to attach a Google Drive folder (after paying $2/month for an upgrade to ~100GB of additional space, since the default 15GB Google Drive is perilously small), and further work on optimizing the serializing. Training was slow—1 minute per minibatch, initially—but did work. An example:

How to get a TPU? Fortunately, Google Colab did just enable free TPUs by default… So Presser enhanced his fork to support TPUs, and we started training.

The only solution here seemed to be to abandon my 1080tis and upgrade to TPUs. TPUs may not be any faster, but they have far more RAM and can train a GPT-2-1.5b with no problem.

Using Shawn Presser’s fork of nshepperd’s fork, we experimented with alternatives like using reduced-precision, truncating parameters to FP16. This caused serious errors. After fixing those errors, and reducing the context window by half (potentially hamstringing it), we could train GPT-2-1.5b on a 1080ti, but our naive conversion to FP16 appears to have seriously damaged the model and it emitted only garbage. We then tried using a different floating-point format, bfloat16, which in theory is much better suited to NN models than FP16 & natively supported on TPUs, but it trained extremely slowly on my Nvidia 1080ti GPU. Given the daunting expected training time, bfloat16 was not a solution.
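
One commonly cited difference between the two formats is dynamic range: FP16 has 5 exponent bits, while bfloat16 keeps float32's 8 exponent bits and gives up mantissa precision instead. The stdlib-only sketch below illustrates that difference. It does not establish what actually damaged our model, the test values are hypothetical, and bfloat16 is emulated by truncating a float32 (real hardware typically rounds).

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision (5 exponent, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32.

    Truncation is a simplification; hardware usually rounds to nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


tiny = 1e-8   # hypothetical small weight or gradient value
print("fp16:    ", to_fp16(tiny))
print("bfloat16:", to_bfloat16(tiny))

try:
    to_fp16(70000.0)   # above the largest finite FP16 value (65504)
except (OverflowError, struct.error) as err:
    print("fp16 cannot hold 70000.0:", err)
```

The small value is flushed to zero in FP16 but survives in bfloat16 with reduced precision, which is the usual argument for bfloat16 being friendlier to NN training.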

We began trying to train on the combined PG+PF corpus, like the GPT-2-117M model I trained for RL preference learning, but turning on all the options in the nshepperd repo doesn’t fix the memory problems. (FeepingCreature was able to train on his new AMD GPU, which has 16GB RAM, so a few more gigabytes would’ve done the trick.)

Training is a different story. 345M took ~7 days to train, and GPT-2-1.5b is 4.4× larger, so that alone implies a training time of a month. Worse, where GPT-2-345M fits reasonably in a 1080ti’s 11GB VRAM & 774M just barely fits, GPT-2-1.5b does not fit at all.

We were able to train it to ~1 loss, but it appeared to have overfit in some fashion, as sample quality became increasingly worse by the time we halted ~2019-12-20, so we settled for iteration #500,522. Download:

From November–December 2019, Shawn Presser & I worked on finetuning GPT-2-1.5b on the combined PG+PF poetry dataset from above. The 1080ti GPU approach failed, so we switched to Google Colab to use the free TPUs. Colab worked, but constant failures made it painful to contemplate multi-week training runs, and so we switched to GCP to use TPUs directly. Direct TPU use is much faster, but the errors remained, so we began working on a distributed TPU approach, to work around individual TPU errors. Eventually, using Google TFRC research credits to pay for TPUs, we began running TPU ‘swarms’ of <60 TPUs (since scaled to <200). These produced meaningful training progress and we reached a loss of ~2 by 2019-12-12.

In keeping with its gradual rollout plan, observing no particular misuse in the wild (aside from a few anecdotes about content mills), OpenAI released the final and largest model, GPT-2-1.5b, in November 2019 along with detection tools (paper). The model was an easy upgrade for services like Talk To Transformer which simply sample from the original model, since it still fits easily onto commodity GPUs.

“Who alive can say,

‘Thou art no Poet—may’st not tell thy dreams?’” John Keats, The Fall of Hyperion: A Dream I

“More than iron, more than lead, more than gold I need electricity.

I need it more than I need lamb or pork or lettuce or cucumber.

I need it for my dreams.” The Policeman’s Beard is Half-Constructed, RACTER & William Chamberlain 1983

Loss: 2.6

Partway through, having reached a loss of ~2.6 (down ~0.5 from the Colab model), we experimented with training our model on a P100 GPU, halving the context window to make it fit, to informally compare its training speed with the swarm. The P100 made little training progress, but it did generate some fun poetry samples (we had disabled the training sample generation for the swarm because generating samples is so slow). The samples strike me as good, perhaps even better than GPT-2-117M, despite the loss being much worse (2.6 rather than 1.6). Why might that be? I hypothesize it reflects a weakness of the likelihood loss in terms of perceptual quality: humans are more sensitive to long-range correlations and text degenerating into gibberish than we are to local details like exact use of particles or to slightly better modeling of spelling (which is why stylometrics works). The original OA GPT-2-1.5b achieves much better modeling of long-range correlations and producing coherent text than the GPT-2-117M did, of course. What happens when they are both trained on a poetry dataset? It is the tale of the tortoise & the hare, or the bias–variance tradeoff: the GPT-2-117M is weak, bad at long-range modeling because of its small parameter count & shallow layers, but the benefit is that it can learn quickly about local details like spelling, and, achieving good prediction there, converge to that 1.6 loss; GPT-2-1.5b starts off good at long-range modeling and good at short-range modeling, and must trade off learning both from its limited training, thereby achieving mediocre performance on local correlations and thus mediocre loss, even though humans reading it are impressed by the thematic consistency and relative lack of ‘gibberish’ (locally but not globally consistent text).
An additional issue here is that the GPT-2 models are not fully trained: as the GPT-2 paper notes, “All models still underfit WebText and held-out perplexity has as of yet improved given more training time.” (The difficulty of training such powerful LMs to convergence was also noted by the MegatronLM researchers, whose MegatronLM-8.3b model was still learning rapidly when they ended the training—despite use of NVIDIA’s DGX SuperPOD with 512 GPUs.) So some of the finetuning here may also be finishing the GPT-2 training.

I selected the following from its training log. A commentary on Oda Nobunaga’s mysterious betrayal by a loyal general? "Oda on the Pavement", by Mark Bibbins [Activities, Jobs & Working, Social Commentaries] The first time I saw this sign someone's blood streaming down the pavement, I didn't know what to do. I walked as quickly as I could toward the body and almost didn't see the second because it was going by so quickly I had to blink and it was gone. But this is what I saw when I got there, a corpse laid out flat on its back, its legs bent at the knee, with a blue tattered blanket crammed in the small of its back. I said to my wife sitting next to me, don't you think this is awful? She said, Mark, it looks like just a trick he's done. I said, it looks awful real to me. The man has his back turned to us, like he is not even thinking. As I am, I said to my wife, is this what you have been waiting for? Is this what you have been waiting for? The man was lying there not even looking up at us. In the Golden Age, when the people of the Yellow Valley were instructed by the sages of antiquity: ...That the earth would then not open (O, but to hear our fathers speak!), The angel of the waters spake. For in that time the race of man Sought the groves, each his own dwelling, Where were then fruits to satisfy, Where were trees in plenty springing To furnish them with fuel, And so the forest grew, And o'er it all was running The river that we see.
Then did those ancient sages Order their days as was fitting; And whatsoe'er they looked On was ordered rightly; all Taught each what was fit for living And working for self; And now we have come to the end, But this is what their teaching Said for our instruction,-- That we know and do not know,-- And yet we know and do not know! When we have found that which is right The will to act and not to act To be the chief and not the end, The day and not the hour, The joy and not the pain, When shall we find this, then at last? Why then at last at last? There can be no yesterday If such be the age of wisdom; There can be no time or place If such be life and work for man; There can be no man or man's child In which he has no friend or friend's child; Nor yet a man or man's son, Who to his work does not go. Thus the sage's order is And what the sage does--so he does. Romance? ...I know my father taught me, To count your blessings and forgive The people who are wrong; But I think, when my day is done And life is all in strife, Perhaps the one good man who isn't yet Will find the way to see Why men die, the reason lies In the people, never the least in them. To-morrow it may have been a woman's hand Caught his in the doorway, And I may laugh to-night at a stranger's sigh, Just as I did yesterday. But what if he and I are dead? What matters it? For God's sake, listen to me! If God finds out That I am not as I should be, Then I shall surely pay the price-- That is if He will be just. I think my father loved me, And was glad I was his son. I say to-night I see it every day. I say to-night I know it all. And why should I fear What my son thinks, what he does? I should be mad to doubt it, For aught my father says. I shall find out when I grow old, And say to my soul, "Be glad That things that are may be, For if things were worse, why, we'd have things worse." And, knowing what my soul knows, I shall be glad for the things I shall miss-- For God's sake, I shall be glad! 
A man's true worth is revealed to you When, by the light of your own soul's light, And God's, you look at his; You can feel them both shine through you Like a glory you could shatter Just as a crown could shatter you Into thousand pieces that were bright And perfect, and eternal. And so no other can you see and know Except his soul. As the rose of him Sends back in glory back its hue, So does the soul of one reveal The beauty of another. Be mine the glory to know it best When my soul shall look at its own. O great and blessed Lord, Who made the earth, the heaven, and the sea. I who am standing in these holy places, Be grateful A nice descriptive piece: ...The sun sets and rises in its glory And sets in its glory only. And the stars are all that can be seen up there, That set and rise and come to the earth in their glory. And the moon, too, sets and rises only, And comes to the earth in its glory. The sun sets and rises in its glory, And when it sets, the night is black, And when it sets, the day is dark. The sky is all up against us and over us, With its starry light, And all the earth beneath us and above us, Under the shining sky, Is black with all the stars. An elegy: ..."To the Youth", by Susan Greenfield [Death, Growing Old, Sorrow & Grieving, Relationships, Family & Ancestors] For the youth, who, long ago, Came up the long and winding way Beneath my father's roof, in sorrow--- Sorrow that I would not bless With his very tears. Oh, My son the sorrowing, Sorrow's child. God keep thy head, Where it is dim with age, Gentle in her death! And though the years should bear him low Into the world's dark embrace, He shall feel more fondly there Than he did in my bosom then. Even now that heart, so wild, Is as a child's, sweet, and calm; And shall His voice forbid me From the shadow of the grave To forget, to forget Him? And shall He care to keep Him As I do him here in this place? 
Or has the power, that I have learned Him of the world to fade, Him of the world to go away? Haltest to the shadow of the grave, My son, my son! Love lost: ...Since that fatal day we parted, I have lost, I think, my best part. It was a friend of mine once gave me breath, And I went as far out as he dared to go, And ever since, poor lad, I've lived by drink. But if I had no more friends, I had friends Such as the world could never let me have. 'Tis true, I have never had occasion to think, But when the sun is high, and the sky is green, And the whole world is merry till the break of day, I sit and dream of her and of the joys of yesterday. And I go wandering off on some old-world morn Whose very first light of hope and youth Is the light of looking on her face through the green. Her eyes where they were, as I tell you, Seem fixed with a bright, fixed gaze; They're half my heaven, when they are there! They are as clear and bright as she herself, And she knows, too, the way that she will look When I can tell her what she is like And where all will end for me in her. Now, how would the world like her if it knew, And what would 'twere if each one did know; For, ah, she is a world of beauty! But then, ah, her eyes are even as clear As the light that makes them bright and wise. And they are a light out of some dream That has power to brighten and cheer; They're a bright sign to turn and turn again Even in the days of storm and pain; They're my world, like her bright suns of heaven, Which I have loved even to the death! An attempt at nonsense verse, apparently: ...I had a little hen, I had a little hen, It chirped at her dish. She chirped at her dish, "Come in, come in," The dish clattered out, The dish opened wide, There were three small cats, "What is the meaning of this, Why are all my dishes Spread open for you?" "Sit in the corner." "Here is a bowl for you!" She drank the bowl clear, She drank the bowl all up, And cried with an unhappy sound, "I do not want it! 
I am quite sufficient, I shall have enough, And just be done with it, And not have more!" But I have a different tale, Which perhaps you will think is odd. It is, 'tis no use repeating. But here, it may be the same. I had a little hen, I had a little hen, She had a little curl, That with her beak she curled, In her plumage she had, Such as a dog's must have. Thus I had three little dogs. I fed them with bread. And they all looked out upon the street Where our little Polly went by. The wreck of a ship: ...And as the sun sways to and fro With his eternal sway of air They roll in slumberful rest. So still and calm are they, So still and clear, No whisper of the wind dares steal The tranquil atmosphere of night, And like bright angels sleep. The sea is their only theme, The ship is a dream That they arise to leave at even. The moon is far, All glistening with a silver shine. It is her pale that lights The livid face of the sea That in its white and sullen face Is pale with pain. O grey and grey it grows, And the sea grows wild As the black clouds in a night-time ride, The storm is on the sea, The wild winds rush in their strong wrath-- The great waves roar and beat The heart of the ship with their fury-- In its heart's core, cold and dead At one bound, is bound Like a rock in the heart of a lake. The storm is loud And wild it is blowing-- The storm is far away-- The wild winds, wilder still, Are beating far and wide. The wild sea winds that were blowing Have stopped forever there alone Where the ship has gone down, And now the moon rides low Where the sea now rests in calm; On the deep, dark midnight The ship is lost. GPT-2-1.5b can apparently do meta-fiction and break the fourth wall‽ ...He had the look that I can't forget; It was as if at once I knew What I'd never had before; It was like a rush of light on water-- And something about it seemed queer. 
And the light would come and go As if the clouds were about to fall, And the rain came and came in showers And went in ditto fashion; And it's just as if I were blindfold-- I couldn't hear a thing. I'm sure it isn't my nerves; It's just as if I had the palsy. I don't want to think of not being there For something I can't be sure I was. I'm sure my nerves are fine in my life; And the only thing bothering me Is that there isn't any rain. It's all quite natural as I said; I've seen men die of natural causes. Now it's only a pity it happened As it is done to me. But, as I say, There's no harm in dying of dropsy. Though not the death I'd choose to die in-- It's much better than to get laid flat. This is not the way for my poem To end, I know--I'll make it up to you In a later place; but here's a verse As you may understand. After this line There's a space, and after that a line About two inches, in which I meant To tell you that you had done well to try To stay alive. But, as I said, this is Just a matter of chance, it cannot Be helped; I must allow that this verse Was written with the fingers you saw--and then You were all to blame, if I died before You came in and saw the writing. Now you know I'm dying--go on--you will not die Before I finish off my work. It was a little moon that watched In this place, I don't know why I chose To watch it, it had its night when I woke; And I, who was tired of the earth and men, Watched it like a man watches a clock at noon, Just because I chose to and did as it said... Another shipwreck: ...So the sun sank and the sky grew dark-- There was silence in the world. The man was a-waiting. He had come To watch the sea and the moon that rose. The sky grew dark. He saw the storm-clouds pour-- The skies grew dark, with dark, dusky glooms-- Down upon the world like a flood of spears, That struck the men from their feet in their flight. A wind sprang and lifted the wind-signs higher In the west wind's battle-hurling fury. The sky grew dark. 
He saw the wind-swept leaves Fall from the trees into the sea, and sink Darkly down beneath the seething waters dark Like a storm's descent. He heard the falling rain Come thundering down upon the earth's dappled hills Like an avalanche of boulders from the mountains. He saw the wild white-winding ships that fled Into the storm--hurling great leaden shadows Into the maelstrom and away Into the night that was growing black With leaden shadows that swept the ocean Like a sea-monster that had lost her breath. He saw the wild white-winding ships that fled Into the wind's windless vengeance. He heard Hurried voices--"She is gone!" Hurried voices--"They are off!" A sudden flash of flame that split away-- The great white ships were no more; They were caught in the wind's wild wings; the wind Was stilled in the windless hush of a rest Tangled like one green tangled mass of peace. He heard And knew they were dead. He would not hear The winds that told a tale of sea and wind. The wind had blown him across the world wide, And into the heart of the night wind's wrath He saw the faces of all dead men Hid in the hush of wind and sea, And knew the faces and watched him well. So the man who was weary with a quest, Who turned from the world's endless sorrow To seek the light that was lost as the sea's Flood in a wind-silent hush. A surprisingly coherent piece on a trapped upper-class wife: ...There is not a single house in all These beautiful gardens that I do not know. I know the houses and gardens where I sit In the evening with my husband and my son, And I sit at the dinner table there too. The house where my husband and my son live is the one furthest away. 
The people come and go through these gardens, all day long; And I see their feet pass along the paths, And I hear the talk they have all that day, from one end of the town to the other; I see the carts and waggons of the farmers, The teams and horses of the tradesman, men on foot, and the gaiters swinging Upon their saddles by the way side; And every day, at morning, the same number of carts and waggons I see, And every morning, in the great daytime as soon as I wake, I see their number still greater, still greater. Then to one side they go Among the flower beds and in the wood, And I never see them more; And their voices float on the rising wind Like the voices of the dead, And their faces light upon their breasts, like lights, Like the faces of bright children, Like the faces of handsome men in the street, And the faces of friends, and the faces of lovers, And of all strangers, all faces of home-brethren With its memories and its griefs, And my own face that is always changing, Wrought by itself in the dark, With the face of the dead and the face of the living, And the face of a youth that grows old In the shadow of love and the light, In the shadow of the grave. I know not what they mean by their words, Those people that pass me in the garden, In the little town that is in the garden; I only know that, on many afternoons, Through a gap in the trees and between the stones I see their faces and hear their voices. The curse of immortality: ...And my God, as is right, Lets no earthly voice Thy gracious soul acclaim. 
There's an old man in a desert land, No town nor city to be seen, The skies are o'er him and the hills His only companions in the air: The only sound is his own dashing feet, The earth beneath him, unseen, is brown: His only food has he no need to eat, His only company is the birds above, The sands are white beneath his restless feet: His dwelling is an osier cage, A circle of dry sand and bark, And there the old man builds his dwelling-place, The only house-holder is the wind. But I have seen the ancient sands bare, I have seen the grey old desert sun Go down upon a world of old And wither and perish in the noon; I also have seen the grey old sun Return with life and day in its arms, And night come too and bring the light: But never before has any bird Looked on the old man living in his dreary house, His house that is builded by the wind; The birds that do not love to roost Have neither wings nor nests of them, And never any bird has come To visit the lonely old man lying down; The sun never sets in the old man's eyes, And therefore he has neither home nor friends. O, God, that I had not been born to live A thousand years, a million years, And see another sun rise and set, And other stars in the other sky, And other life and other death; But had my soul been like some child That, when it thinks upon its joy, And its grief, and its play, and its tears, And the happy years that it shall have, And the years that it shall forget, The thoughts it hath of these things, And the thoughts that are yet to be, Might be a child that had no thought, And have nor speech nor feeling, That it might feel it was a child. Perhaps the most striking of them all is this existential horror piece: ..."The World of the Dead", by Peter Stearns [The Body, Nature, Philosophy] When they come, they carry Your limbs, your life, In their mouth and arm. I think they swallow. I know it. The others know. 
My body will be like theirs, As the river, the sea, Will be like the one on which it runs, If the ocean rises And swallows the land. It will be hard to survive. To heal, Some of this will have to come off. That's what they say. They say it Many times a day. They say it To each other. They mean to save us. They just can't stop us From becoming what we are. I must live inside you. That's what they say. They say it.

Loss: 1.6

The expanded TPU swarm & Adam LR tuning allowed rapid training, and we reached a loss of 1.6 overnight, matching our previous best on the combined PG+PF poetry dataset. I generated a dump of samples with top-p=0.90, and read through ~5% of it to select some interesting samples:

All-sweet and tall, and frail of limb, Lay there in raiment new begun; Her moveless-rest were never seen, She ever so much as bent her knee. Thither came I, Pilot of thy boat, And, turning, saw this silent Girl Who, like to pray, with lifted face Besought the mist-ringed air to sing Her Vespers far away off, And by her hair and veiled head Her wistful eye she caused to stare. It seemed to twinkle between the branches high, And on her shoulder lean by piece and piece Of glimmering spangles that lightly floated down. I saw, or fancied I did, Her lovely head recline Upon her humble robe's narrow shaggy hood, That, like the light of day, Was moon-fringe dark and dim; Her pale mouth, that evermore Spread smiles in damp and drizzle; Her gleaming teeth, whose polished white Seemed mouldering honey of the midnight blep Of the dry, dusty pass! And in one hand, all rippled with A silken flute of gold, She played a hushing pipe, Dora's toy, to play or sing. Deep through the wintry sky there sped Through golden vapours as of shape A dawn that never had a dawn, A sudden dawn, with breath Of mist and with a smile to kill. 'Look!' the wind whispered, 'here's Our Lady of the Skies, from her bright throne, Like to the smiling of a summer sea To-night in the lost wind's dark retreat, Hailed with the deep, seething, dour, wild Midnight: who have wept for her The heaving of the waiting years, Who have wept for her In wild harangues of the foggy fen And hollow monotone of the fen.
She shines and smiles to see the tears Of all the rain-stricken towns and ships And all the rainy days and nights On all the hard, the ragged places That wind had beaten hard, and night Nigh ready to close, to close, to close Against the brain of all the face Of all the over-ill-gotten men, She shines and laughs to feel the cold Of all the tears of all the brave men killed And mad as they. "A Knowledge of the Dead", by Mary Wiencki [Living, Death, Life Choices, The Mind, Time & Brevity, Religion, The Spiritual, Social Commentaries, Crime & Punishment, Popular Culture] I see you there, Stu, striding half a mile down the road, arms raised up over your head, head bent slightly. I imagine you hold both those in your inmost heart, and that you must learn, along with anything else, how to turn off a brain that has somehow learned to hold whatever memory is stored in it. For the mind, like any organ, is where the trouble is; an organ can fail with its stored knowledge, or if the memory be great, so great that it will bring the brain to its knees. And then the knee is a joint only partly conscious; if the heart should stop pumping, we are thrown off balance as if it had been only the legs that moved you. So I ask you, were you looking at your watch when you left for that solitary walk? Or waiting for the medicine you wanted to take with you before starting on your way? A look of mild impatience conveys a point as surely as humor, though somewhat dead. It is painful, this wait, I am sure. You have worked long and hard for your knowledge of time and of this place. And now you have it. And time, and all the woe it took to give that power. You have so much of this world left to discover, paths to retrace. You find your way into a park, its benches occupied and visible and free of talk of the dayâ€™s events, at its center a girl... "The Sphinx and the Social Commentaries", by W. D. 
Snodawa [Nature, Seas, Rivers, & Streams, Social Commentaries, History & Politics] We were rising over the hill of which the tip is the sphinx. There were palms in the palms. We were rising over the hill like the tip of a sphinx, circling the palm that was there growing straight like the spine of a sphinx and a crimson palm leaf grew over the palm as if flowering over a sphinx and my knot was a knot. It was night. It was tinder in our guts to see them like this. Climbing alone it was like the tip of a sphinx to see them like this, growing even higher than a sphinx, a knot, to see them not quite touching like the tip of a sphinx over a palm that was there to see over the night like tinder. Suddenly the knot caught in my throat, my hands stopped and spun. Thatâ€™s the way a Sphinx talks. The palm became a mask and that scared me. I had to have been looking for the mask underneath it. My knot was on my neck. My knot that was spinning like a rope. I was staring down at my own knot and what it was pulling at. I let go of the thought of knot and opened my eyes and saw the moon. I was standing in my own moon and I knew I wasnâ€™t going to see it so I let go of the knot and looked at it and it was disappearing. "April Moon", by E. E. Brown [Love, Break-ups & Unthankfulness, Religion, Buddhism, Faith & Doubt] Awakeâ€"with you I meditated and thus renewed my doubts; But, awakeâ€"with you I sin, and thus my conscience put me to bed. Awakeâ€"with you I suffer, and thus my doubt took wings. Awakeâ€"with you I play the hypocrite, And thus my conscience fires my lash, and thus I scorn you. Awakeâ€"with you I fly from faith, and thus through your face I stab myself. Awakeâ€"with you I remain benighted, and thus my conscience rots me at my heart. What man would draw a sword, If he'd had no forethought, That so he might prevent The danger; but with blade What e'er man can know? How many lives at least, Have been lost, and how much blood On all our limbs been shed! 
And yet--so Providence be credited-- There's an end still of life's dismay, And 't would be glad indeed to lie Why does every one such pass As this, without any which he do not pluck, But with arms for life's defence clad? Alike of you all the brave Rage of the lance, The guerdon of some crown, Whose shield was never pledged in fight. The watery lion's with us yet, By eunuch tightened, And springing on his prey, not fierce to yield Though thrice thy foe hath been in peril to see: Yet, though our quarrels past, Life may be fresh in them, This of fighting, and this of feeding. Beset with peril, beaten to the fence, And each to prate with prattling foppish grace: Their song, 'Huzzate!' fuddles young, You may hear their ends in Oxford-street, Or in their inn-bred domes When they climb like larks their wings again. But we, we live on' other plan; The Shepherd did but teach us, We, the delight of life and take delight. Then why not drink of wines, Give of bowls to move your bodies, And, with those things that men to beguile As they that do light love-songs Wear like a tree, so do these solacie Our sabbath-rites, And send them to heaven, Whose hand, Saved as ours, with charity Should treat as a child againe. That we do not work on earth for hire, Why we do doome as we list. If man did wight battle, God should not such things read, As he, of some sinful men To make him cheat, And carry, and gluttony to all. What to your Peers, or how you view Our Acts and us, let not your selves, But let the Stars, that watch the skies The Barns that shelter you, Which your great Cannibal went the way, Pull down, And let one Concord solace every State. Then look there yet, in that part Where you will see an abyss profound, Rays leaping out of darkness, Snatched with strangest beams at the visage hewed. 
When the minaret of the masque is lit And the caryatid gleams bright Of four stars that shudder and wane In chance-to-be to the light that is In the letter of the crown, Take her, the zodiac, for 'Tis her sign, 'tis the way she brings The order of the rhodesian seasons In careful letters for the rest of the year. The grass and the leaf which the royal teeth leave On graves where the glow-worms of the phoenix brood Are glittering in brass and the marble dies Like silver pearls doth snow upon the snow And the rime in cerulean coats doth shroud Till the shivers are lodged safe within the veins For the regions which grow lush with the tears of the sun. Wauken, and thou golden heavy lark bellissh Wauken, and golden longlist and firefly, Which here be singing with thy melody and sin In our early youth with the harp-strings of Joy, Who from deep winter of minds did lift These notes of fire and song of the dawn, Which may not be pulled by the nameless hands From the vibrating harp of the wind That only sounds to them alone how Toils or withers or gladness or woe befall, Who are northward by moon and by star-light. O Hesperides of the wakening day! Whence came the dawn, what did we find, In this lone land of the sunrise? O Hesper, in thy beauty and change, I would have thee hear and answer tell In this still country of the Sunrise. O Hesper, in thy beginnings, the light Of thy first bird-born darkness Was folded in a glow-worm's tent, Flush and fair; Thine air was soft, than garments more fair; Thine was the drift of a froth of down, Soft, and breathable, and alive; Thy voice was as a voice of the sea Calling in its froth to the wind-crowned moon From rocks where water-worms are wailing now, Ripe with dry but bloomless salt With the light-waves gilt Lemon-fish, mussel and willow o'er the rime. 
O Hesper, thy light of the past hours Is folded in thy glow-worm's home, And the voice of thy earliest darkness Is a voice of the water-worms now Calling their world afar, What time the pines of the cavern-deep Say to the pines 't is dawn in their realm. O Hesper, the sun and the rains Waken in this land of the Sunrise With a sigh; They are out in the wind and the weather That are down below, Whose lives are enclosed in the roaming Of a world of weathers and fluxes, Not dead, but lovely, and wan; And on the roots of life The tremulous hands of the gods are cold, And the springs of gold Where the earth-children run Are unapiece, As if in the ways of the wind They had passed them by. O Hesper, or if purple be The hues on which ye paint Your snowy epitaphs, say That the wind which blew the snow Was swayed by the face of a queen, And the sun to the laughing air Was moved by the eye of a queen, And the lightnings were wrought by the play Of a queen in a queen's look; And the earth And the sea and the air which are now A barren dust to the day To the eye of a queen's wonder Were filled with a beauty of love And a beauty of life To the children of that king; Till from her presence the maiden Sought the golden fountains of the day, But no nearer the child she found That made all her maiden-bower And each merry maiden-asteroom Intoxicated with her gaze With a glow of a glow of a queen. 
For as the flower till its spring, Like the flower till its nectar, may Grow lovelier till in no fire, So in the yellow waves of earth Than the child was born and could stand For the queen of each word, And her hands were like angels' hands, And her feet were the eyes of angels, "The Philosopherâ€™s Plane", by James Taggart [Activities, School & Learning] for John Millikoper The philosopherâ€™s plane, imagined by Calippus, rests on a red disk of dawn close to the body We flop into the blue below our feet, into the astral horizon, that whose dots our lives keep shifting over the edge of empty space into the orange of earth And beyond into blue well into the empty page of thought Where we can embrace a little while of our desired end and then flow back into the world of time "Map of Our Land", by Eavan Bolger [Living, Time & Brevity, Nature, Landscapes & Pastorals] The stars are born in night. The ground is made up Of tales untold. The cracks are our story. The piles of leaves are our life. The river that we lie At dusk is alive. The buried Grass beneath us Mosquito, Mosquito, Mosquito, Mosquito, Mud-stump, Mud-stump, Mud-stump, Mud-stump, Oromoctotecological teacher, henchman, loomworm, toad-man, German accent. Not what one would be expected to hear. Oromoctotecological teacher. Far superior animal to what one would be expected to be expecting. Far superior animal to what one would expect to be expecting. Not what one would be expected to hear. Oromoctotecological teacher, henchman, loomworm. Not what one would be expecting. Better than what one would be expecting. Better than what one would expect to be expecting, better than expected. Not what one would expect to hear. Oromoctotecological teacher, henchman, toad-man. Far superior animal to what one would be expecting. Far superior animal to what one would expect to be expecting, better than expected. Not what one would be expecting to hear. Oromoctotecological teacher, teacher, toad-man. 
Higher in intellect than what one would have expected. Higher in intellect than what one would expect to be expecting. Higher in intellect than what one would expect to be expecting. Better in intellect than what one would expect to be expecting. "Language is not the Draft", by Å½eviÃ±a M. Branko [Social Commentaries, Crime & Punishment, History & Politics, War & Conflict] No don't be angry Don't be angry, it's fine don't be angry No don't be angry It's fine it's fine don't be angry because every one of them died Let us go to sleep, then; And, being haunted by an angel's kiss, Lay them down to die. Oh! night, oh! sleep, with all thy gifts The dearer far! The noiseless candle, the beechen boddam's cot, The hapless lover that perfidious turns To watch her silent lover's sleep. Oh! night, in all thy solemn dark, This one sweet pleasure bring, The soundless silvanRAW, The fond immarities that steal Across the tepid moon, The wedded sleep, the tear-bound tear, Of those whom late they may forget. Oh! night! thou bringest a most rare bliss, Nay, like that noiseless moon at night, When yet from Pleasure's revelaid ball No soul the wish had : A bliss untaught, it neither robs nor rouses, A bliss untaught, it neither geas Nor charms the blissful gazer's sense. Asleep at the Moment's free summer-cost; When every sparkle wakeneth that To dream the future, and all nature To that clear fable's deep array; On the lone heart at midnight's hour, As night's last neaper looser, I think the world contains both ye com Which when ye think, o' nights waste full, As night cometh, night cometh; As night cometh night, so night is ever young. "Poetry is a Hoax", by Jane Kenyon [Arts & Sciences, Humor & Satire, Poetry & Poets] We are in the midst of the greatest creative era in our nation's history, but poets who deserve record invitations to appear at next month's Folio can't get paid, or even printed for that matter. Poetry books are selling at a discount to the fool's silver match. 
The poets need not rely on the marketplace for their bread, the wait is too long and the market is too crowded. The much needed restorations are held up by Kodak, the restorations are held up by the identical tissue known as persistence, the tissue is held up by believe in me, what I believe is more interesting, be more like me, my technical review indicates you cannot hold me, I am never alone, if you attempt to duplicate your ideas you will confuse the issue. The ideas will diffuse through the atmosphere in direct ratio to any gas. Each idea that is conceived and all but carried to fruition, will be accounted in the calculator as 1% of total, I did not hold you in such high regard. I apologize for being so alarming. And so began the siege of New Amsterdam, In which, by Providence, only three days ended; When, by direction of Ms Frisbie, the heroes two For their advance, together, took their way. The two fellows, whose mission it was to guard The city gate, took place in the greater army; While those two dukes who should avenge the town Sent all their force to put the place to rout. And, as the late oak, covered with boughs, Has done its work, ere its starving spike is struck, And this great tree sinks as it had never been By any human pains, nor would be now, But for her first son's interposing, So, falling foul of their first heart's delight, The Dutch no more wept for New Amsterdam. When a star fall down, the winter's coming With the snows returned upon the trees; When a boy runneth that has fled; When a lad standeth by a lash, When the father findeth the wealth, When the son dealeth away the long Hand shaken by Fate, When the boy standeth by a lash. When the father findeth the wealth And the son dealeth away the long When the lad standeth by a lash. When the father findeth the wealth, And the son standeth by a lash, 'Tis he taketh the old's gold in his hand, To drink and soothe himself with life. 
When the lad standeth by a lash, He to earth an instant goeth The father set him by the rope And so fearful works with the lad, As the boy standeth by a lash. When the father findeth the wealth, And the son so fearful works with the lad, To the end of time and limit set When a star falleth to the fen Where the fen be molten away, When the boy standeth by a lash. "Eating a Waterfall", by Francis Lau [Living, Nature, Seas, Rivers, & Streams, Mythology & Folklore] The map tells you this cave was where the water must have descended, for a hundred feet thick, from the floor of the cave. But the sides of the cave have been eaten away by moss, and a red grown over the green rock in the shallow pool; a leaf had set upon the edge of the slide, hanging horizontally, like a trigram, slowly falling and falling. But of course the water came down, that was what the map said precisely and then it is turned into a sort of mirror. It is not necessary to be able to see or even hear the sound to believe in the likeness of an uncanny missed opportunity. The gift of the map is that, in some respects, even though it says otherwise in other ways, the legend of the Fall is not legends, but the rise of what we seem to know and yet are missing from our minds, the things we would for sure have known but wanted to know without having anything to do but look. "Freedom of Consciousness", by Steve Rotherham [Living, Health & Illness, Arts & Sciences, Philosophy, Poetry & Poets] Going along with it That's the problem These objects do not aspire to be loved. --W.H. Auden The eyes under a blue wreath of smoke Racing around the pitiless fire While waiting for the breakdown From the volcanic past of the pitiless Lord God of the heavens. The pitiless fire Sweeps to its born stars above. A slit of fire that watches all. Dips into the fiery pit And smiles. The fingers of a sieve Desire the smoke in its fruit. Both cold and hot in one. It is. It is. It is. 
Beating the flames with smoke Of blazing admiration. The hands of a man in a shop Seem to grasp in vain A pack of matches. There is no end of the fire. No way out of the fire Though some wood and smoke Could stop it. Your fire will do. And you will go along Because the world's gone mad. The marvels are there For you to seize And stoop to. Out of the depths of the wood, A hollow roar of rushing air, Sudden howl of pixy and hag Whose tall shadows snagged them there By the gate to night. Holding the gates of the damned. A fiery slick of a kind of smoke Waiting and glowering to be. But how else to be. The life of the mad. An ever pushing out and in Of the upons. A glimpse of the future is the spur To the perspiring effort of life And to the unbuilt plan To organize the mad and the yet to be So that the time slides by And the unknowable marches on. Though the yet to be and the unforeseen May blind the senses, not us. Life is scar. It is scarred and hard. The mind, all its fire out. I am the bartender of Kuzela; There are others of like mind, Speaking the same language, Speaking to us in the language of smiles. What have we done together? What have we done together? We have not wasted one silver rupee On the Indians; We have not eaten one singed inch of wheat To make them happy; But let them laugh upon their drum And march with the other gods. You bring us bread for our turning, But can we make you merry? You have given us toys for our children, But can we mirthously dance? Then may your clock be telling time, And our neighbours be told lying. In His service as protector Of the wild natives of the forest, He has given us magic shows To make us happy. He will guide us at dawn to the brow Of the mountain, And the shortest path find to the evening, With the night before. And now the hostess of the festival Is holding, in her great soft hand, A heart-shaped pomelo. Why does she hold her hands so still? 
She will not let them loom for those that fall, But is playing to them a pleasant song That none of them needs hear. She is dressed in a man's colossal style; The hair hangs in soft waves to her knees, And her sumptuous shape takes the air As she plays upon her instrument. She is playing a tune To ravishment and silence; While the choric verses to the clouds Crawl to a close. Yet now she is turning with her hands To mix the wine, and the fiddlers Rush to the dance, with every chance, Of their heart's desire. They have danced until the eve, And she sings in her sings in the glow Of their heart's desire. We were with you in Eden, We were with Cain in the desert, We came from Cyprus and Sidon, We were with Seth in the peaks of Everest; With Job in the City square And Noah in the Wilderness.