As the Vegas festivities wrap up, I once again have an opportunity to reflect on the year’s biggest CTF and the culmination of my time as an undergraduate with the Plaid Parliament of Pwning. I’m looking forward to playing with them as an alumnus, but now is a good time for me to share some of my thoughts with the rest of the community and to hear what everyone else is thinking.

With that in mind, let’s talk DEF CON!

It’s hard for me to believe, but this post marks the fourth DEF CON retrospective I’ve done. In the past I’ve mostly talked about the experience without going into much depth on the problems, but I got some feedback this year that people wanted more technical detail. As a result, this year’s writeup is a lot longer (and a little bit later), but covers every problem in varying degrees of technicality.

What is DEF CON

DEF CON CTF is the premier security competition. Occurring annually at the same time as the eponymous conference, the competition tasks qualifying teams with hacking into vulnerable services for the purposes of extracting hidden pieces of data called “flags”. For this reason, this competition and others like it are called capture the flag competitions (CTFs).

What separates DEF CON from other similar competitions is primarily the importance placed on it by the community, and the work that goes into organizing it. Not only is it highly contested by many of the world’s best hackers, but it employs an attack/defense style of gameplay that pits competitors directly against each other in a frenetic and chaotic manner. As with most such competitions, not only are players allowed to attack each other’s services, they are also expected to protect their own by removing the bugs.

For the past two years, DEF CON has also included a third type of gameplay called King of the Hill (KotH). This game style is somewhat unique to the current organizers, the Order of the Overflow, who introduced it when they took over the competition from its previous organizers, the Legitimate Business Syndicate. As the name suggests, King of the Hill differs from traditional attacking in that it challenges teams to outperform their opponents in some sort of hacking-related competition. As a result, for any given DEF CON round, teams can do any of the following actions:

- Steal a flag (once per service per opponent)
- Upload a patch (once per service)
- Attempt the King of the Hill

Consequently these are also the three ways that teams may score points.

The exact scoring scheme can be found on the OOO’s website, but the rough idea is that the total scores for each of the three categories are normalized within the category, and then multiplied by:

- 400 for Attack
- 400 for Defense
- 200 for King of the Hill

Thus, a team that performed best in every category would have a perfect score of 1000.
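To make the weighting concrete, here is a minimal sketch of the idea. The exact normalization is defined on the OOO's site and is not reproduced here, so this sketch assumes "normalized" simply means dividing by the best raw score in each category. The team names and raw point values are made up for illustration.

```python
# Category weights as described above.
WEIGHTS = {"attack": 400, "defense": 400, "koth": 200}


def final_scores(raw):
    """Map team -> weighted total, given team -> {category: raw points}.

    ASSUMPTION: normalization divides by the best raw score in the category.
    The real scheme may differ.
    """
    best = {cat: max(points[cat] for points in raw.values()) for cat in WEIGHTS}
    totals = {}
    for team, points in raw.items():
        total = 0.0
        for cat, weight in WEIGHTS.items():
            if best[cat] > 0:  # avoid dividing by zero if nobody scored
                total += weight * points[cat] / best[cat]
        totals[team] = total
    return totals


# Hypothetical raw scores for two teams.
example = {
    "team_a": {"attack": 50, "defense": 30, "koth": 10},
    "team_b": {"attack": 25, "defense": 30, "koth": 5},
}
scores = final_scores(example)
```

Under that assumption, a team that leads every category lands on exactly 1000, and every other team gets a proportional share of each category's weight.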

This gameplay style is largely unchanged from last year, so for a deeper dive into how well it works, check out last year’s writeup.

I play with the Plaid Parliament of Pwning (PPP), a hacking team that originated at Carnegie Mellon University. With this year’s victory, we have competed in 9 separate DEF CON CTFs, and won 5 of them.

Lying in Wait (and Bed)

Every year I look forward to DEF CON not only for the fun challenges and intense competition, but also because it’s one of the few times that our whole team comes together in person. This year had been a bit abnormal because last year’s competition had shown us that most of the preparation we were accustomed to doing was no longer helpful. Because the OOO is incredibly hesitant to release network traffic to teams, and we don’t know anything about the style and structure of programs ahead of time, we no longer have much to do in the weeks and months leading up to it.

Problem 0

As an engineer at heart, this was pretty disappointing to me, and so the Thursday before the competition started I found myself sitting in my hotel room aimlessly waiting for my teammates to fly in. I probably would have stayed like this for most of the day, but we were gifted a pre-DEF CON challenge in the form of our teammate’s laptop failing. Spectacularly. As it happens, it’s pretty hard to hack without a functional computer, and so all of us who were not in-flight dropped everything to figure out what happened.

I’ll save you the nitty-gritty, but the short version is that while he was upgrading his 8-year-old Mac, Mojave decided to change his filesystem to APFS without updating the partition table. As a result, macOS was trying to read an APFS drive as HFS+ and failing miserably. Fortunately, all of the data survived, and by manually updating the partition table we were able to restore it to its former glory.

With our first catastrophe behind us, we arrived on the DEF CON floor at 9:00 am the next day. The CTF has historically been co-located with the main conference, but this year the powers that be opted to cast us out to an adjacent casino. It was not as hectic as in years past, which meant that we had far less noise and far fewer crowds to contend with, and that actually made it significantly nicer.

The teams had an hour to prepare, after which our Overlord-in-Chief Zardus announced:

“Attention hackers, we’re having trouble with the internet. But, we’re going to start the game anyway. Welcome to DEF CON…”

I’m paraphrasing, but not by much. At some point, you just learn to accept it all.

Pew Pew 👉 👉

The first challenge released was also one of the most clever. A KotH problem, ROPShip challenged teams to automatically generate a ROP chain (from a 500 MB segment of random data) that would control their spaceship in a sci-fi battle royale. More specifically, each team spawned a ship on the map, and every tick the ship could output one of:

- u: Accelerate up
- d: Accelerate down
- l: Accelerate left
- r: Accelerate right
- a: Attack
- s: Shield
- n: Pass (no-op)

For each kill, you received 400 points, and if you died, you lost 1000. This continued for 1000 ticks each round. Although the problem is conceptually very straightforward (at least by DEF CON standards), it still took teams a while to get even basic strategies working. By the time teams finally started moving around the board and shooting, over an hour had passed and no one had any strategy other than firing rapidly.

Eventually teams progressed from firing straight ahead to firing in a circle to slightly more complex strategies that involved retreating to the edge of the battlefield and firing at enemies who would be at predictable locations. However, since everything had to be encoded as a ROP chain, progress was very slow, and for the majority of the game all strategies were static. By the time we were finally able to write a smarter strategy that took the state of the game into account when making its decision, the problem was retired.

On the one hand, we were disappointed that the problem was removed just as we were finally able to do something interesting with it. On the other hand, part of why we were disappointed was that it was such a fun challenge. It’s easy to dismiss ROPShip as a silly problem because it’s not grounded in any kind of realistic scenario, but I thought it was a really good example of when KotH goes well. It was fun, clever, and forced teams to think on their feet. On top of that, it was easy to visualize and show off to onlookers, a task that often seems impossible during CTFs.

[Figures: a responsive strategy designed to beat players sticking to the edge; an early round (37); candidate strategy: "Baby Shark"]

Can You Hear Me Now?

At the same time as ROPShip was released, the first Attack/Defense challenge was opened. This one, called Telooogram, was later billed as the “world’s first iOS attack/defense problem.”

Is this true? I certainly have no evidence against it.

Regardless of its primacy, Telooogram was a unique challenge wherein a custom messaging application was being virtualized with the help of Corellium. Sadly, having not spent any time on it, I don’t know many of the details, but the general gist of the problem was that teams connected to each other’s clients and sent them messages as a peer. Additionally, each app had an open network socket that was presented as an audio service of some kind.

As conceptually interesting as the problem was, it suffered from several issues. The first is that it was resource intensive to run. At times, the challenge servers were difficult to reach making exploits less stable. Secondly, the problem itself suffered from bugs that were either too shallow or too deep. The first bug was fairly straightforward once found. By exploiting a directory traversal issue with the user avatars, teams could read the flag directly without any memory corruption.
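For readers who have not seen a directory traversal before, here is a generic sketch of the bug class. This is not Telooogram's code, which I never looked at closely. The storage directory, function names, and file names are all hypothetical and only illustrate how an unchecked client-supplied file name can escape the directory it is meant to live in.

```python
import posixpath

AVATAR_ROOT = "/app/avatars"  # hypothetical storage directory


def naive_avatar_path(name):
    # Vulnerable pattern: the client-supplied name is joined without any checks.
    return posixpath.normpath(posixpath.join(AVATAR_ROOT, name))


def safe_avatar_path(name):
    # Resolve the path first, then refuse anything outside the avatar directory.
    path = posixpath.normpath(posixpath.join(AVATAR_ROOT, name))
    if not path.startswith(AVATAR_ROOT + "/"):
        raise ValueError("avatar path escapes the avatar directory")
    return path


# "../" components walk back out of the avatar directory.
escaped = naive_avatar_path("../../flag")
```

The fix in this sketch resolves the path before checking it, which is the usual way to close this kind of hole without touching anything memory-related.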

Sadly, this was the only bug that any team was able to exploit (to the best of our understanding). The deeper bugs were a memory corruption issue in the audio decoder and a serialization bug in NSCoder, a la a recent Project Zero report. However, the latter appeared to be unsolvable because the vulnerable code was optimized out, and the former was sufficiently tricky that no teams were able to get it in time.

Specifically, we were given two versions of the code: an emulatable version and a live version, running on x86 and ARM64 respectively. While the bug was present in the x86 version, it was not in the ARM64 one.

In a surprising move designed to tease out any of the harder bugs, the OOO ended up patching the binary partway through so that the trivial exploit was unusable. However, in doing so they also patched the audio-related bug, meaning that anyone who had previously been working on an exploit for it would be unable to use it.

While it did not end up affecting us in any way, this impromptu patching seems a little bit problematic because it means that the organizers get to dictate which approaches to the problem they want to permit moving forward. For instance, if one team was close to using the audio packet bug, and another team was close to using the NSCoder bug, this patch would grant the second team the opportunity to score points while rendering the first team’s work useless.

Weird Flex but OK

While I spent some time working on ROPShip throughout this process, we found it not particularly parallelizable, and so when a new attack/defense problem was released a couple of hours into the first day, I jumped on it. This problem, called aoool, was hinted to be an nginx clone of some variety. Secretly hoping that it would turn out to be a web challenge, I downloaded it, chmod +x’ed it, and… watched as it died immediately. A quick strace showed that it was trying to read /aoool/etc/default, which was noticeably not provided. Furthermore, any dummy configs that we tried to give it failed immediately as well.

Disappointed, although not entirely surprised, a few others and I began poking around at the binary itself. After looking through it for a couple of hours, it became evident that the lexer and parser were non-trivial to decode, and so in order to make any progress at all on the problem, we began considering other approaches.

The one that finally helped us to move forward was a rather sneaky technique wherein we “patched” our binary to have a secret backdoor. Under the assumption that there was a valid config on our server, we added an endpoint to the HTTP code that allowed us to download it. When we got it working, we were greeted with the following config:

    main {
        server {
            server_name "aoool.ng";
            location ".*\.png" {
                root "main/static/";
            }
            location ".*" {
                root "main/html/";
            }
        }
    }

Simply having a working configuration allowed us to move significantly faster, and by the early evening, we had made progress on several fronts.

There were several interesting things in this problem that we had to understand before we could solve it, but the general overview is this: embedded in the application was a lexer and parser, mentioned earlier. This lexer/parser pair was used not only for parsing the config file, but also in two other locations. The first of these was a custom HTTP method called UC that allowed one to update the remote configuration on the server. As before, the parser was used primarily for extracting the configuration format. The second way it was used was a little bit harder to understand, but it ultimately ended up being a small embedded scripting language that would actually JIT your code.

In order to be able to reach this codepath, we needed to understand the config file better. This in turn meant that we needed to have a better understanding of how the lexer and parser worked. For those unfamiliar with lexers and parsers, these programs are often long and incredibly complex, representing a large graph of possible states that the parser can be in. As such, it is common to produce them with a lexer/parser generator. These tools take in a grammar written in a standardized format and convert it into source code that can be compiled into your project.

In this case, we were able to quickly determine that the lexer it was using was Flex. Flex uses a very dense representation of its state graph that is almost unreadable in binary form. However, after looking at some example Flex projects, I was able to write a fairly straightforward BFS that would search the graph for an instance of each lexical class. The full script can be found here, but the relevant code is this:

    # The yy_* tables, valid_chars, yy_current_limit, and ps() come from the full script.
    def search(end_class, start_class=1):
        # Every state that appears in the tables and has not been reached yet.
        visitableStates = set([v for v in yy_nxt + yy_def + yy_base if v > 0])
        # BFS queue of (state, characters consumed so far).
        availableStates = [(start_class, [])]
        visitedConfigurations = []
        while len(availableStates) > 0:
            (state, path) = availableStates.pop(0)
            if yy_accept[state] == end_class:
                return ps(path)
            for yy_cp in valid_chars + [0]:
                # Map the character to its equivalence class.
                yy_c = yy_ec[yy_cp]
                next_state = state
                # Follow the default chain, as the generated Flex scanner does.
                while yy_chk[yy_base[next_state] + yy_c] != next_state:
                    if next_state == 0:
                        return None
                    next_state = yy_def[next_state]
                    if next_state >= yy_current_limit:
                        yy_c = yy_meta[yy_c]
                next_state = yy_nxt[yy_base[next_state] + yy_c]
                if next_state in visitableStates:
                    visitableStates.remove(next_state)
                    availableStates.append((next_state, path + [yy_cp]))

This performs a search over the state graph. Although this just prints out a single entrant in the corresponding lex class, it turns out that’s generally sufficient to have a good idea of what the other elements might be. You can see this for yourself in the output.

    0: ''
    1: 'main'
    2: 'root'
    3: 'log'
    4: 'mode'
    5: 'text'
    6: 'osl'
    7: 'server_name'
    8: 'server'
    9: 'location'
    10: 'print'
    11: 'del'
    12: '""'
    13: 'A'
    14: '('
    15: ')'
    16: '{'
    17: '}'
    18: '='
    19: '+'
    20: '-'
    21: '*'
    22: ';'
    23: ','
    24: '\t'
    25: '0'
    26: '"'
    28: '\x00'
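The search above depends on Flex's compressed tables, which makes it hard to follow in isolation. As a rough, self-contained illustration of the same idea (a breadth-first search for the shortest input that reaches an accepting state of a given lexical class), here is a sketch over a tiny hand-written automaton. The transition table below is invented for the example and has nothing to do with aoool's real tables. Only the class numbers 16 and 11 are borrowed from the listing above.

```python
from collections import deque


def shortest_example(transitions, accept, start, token_class):
    """Breadth-first search for the shortest input that reaches a state
    accepting `token_class`. Returns None if no such state is reachable."""
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        state, text = queue.popleft()
        if accept.get(state) == token_class:
            return text
        for char, nxt in sorted(transitions.get(state, {}).items()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, text + char))
    return None


# Toy automaton (hypothetical): state 0 is the start state.
TRANSITIONS = {0: {"{": 1, "d": 2}, 2: {"e": 3}, 3: {"l": 4}}
ACCEPT = {1: 16, 4: 11}  # accepting state -> lexical class

brace = shortest_example(TRANSITIONS, ACCEPT, 0, 16)
keyword = shortest_example(TRANSITIONS, ACCEPT, 0, 11)
```

Because the search is breadth-first, the first hit for each class is also a shortest one, which is why a single example per class tends to be readable.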

After extracting the lexical classes, one would normally then need to figure out a similar algorithm for the parser. However, we ended up guessing that one could reach the JIT by using mode osl.

Once we were able to start running code, it was not much longer before we had some functional exploits. JIT compilers such as this one are designed to produce x86 code from the original script according to a set of safe assumptions. However, many of the assumptions made by this JIT were faulty, and so it was fairly easy to find examples of code that crashed.

Some fun examples of this are

    a="AAAAAAAA"; b="BBBBBBBB"; c="CCCCCCCC"; d="DDDDDDDD"; c=93824992231424; print c;

    a="AAAAAAAA"; del a; print a;

    A = "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\xeb\x12"; B = "\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"; AA = 1099511108585 * 1;

Most of these relied on two bugs, the first of which involved a heap overflow when variable names were longer than one character, and the second of which was a use-after-free with the del keyword.

This “bug” was a little odd: the JIT memory held a table that mapped variable names to indices, but the function that computed the mapping for a new variable name returned -1 for variable names with length greater than 1. As a result, we were able to write out of bounds with it.
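A toy model may help show why a -1 index matters. Everything in this sketch is assumed for illustration: the slot size, the table offset, the number of slots, and the rule that maps single letters to slots. The only behavior taken from the description above is that names longer than one character map to -1 and that the index is then used without a bounds check.

```python
SLOT_SIZE = 8    # bytes per variable slot (assumed)
TABLE_BASE = 64  # offset of the variable table inside the modeled region (assumed)
NUM_SLOTS = 26   # one slot per single-letter name (assumed)


def slot_index(name):
    # Mirrors the behavior described above: long names map to -1.
    if len(name) == 1 and "A" <= name.upper() <= "Z":
        return ord(name.upper()) - ord("A")
    return -1


def store(memory, name, value):
    # No bounds check on the index, as in the buggy JIT.
    offset = TABLE_BASE + slot_index(name) * SLOT_SIZE
    memory[offset:offset + SLOT_SIZE] = value.to_bytes(SLOT_SIZE, "little")
    return offset


memory = bytearray(TABLE_BASE + NUM_SLOTS * SLOT_SIZE)
in_bounds = store(memory, "A", 0x4141414141414141)
out_of_bounds = store(memory, "AA", 0x4242424242424242)
```

In this model the write for "AA" lands in the eight bytes just before the table, clobbering whatever lives there.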

There was also another bug that we missed because we didn’t read the parser all of the way through, but that HITCON found. Specifically, the print log directive allowed them to have an arbitrary file read on nearly every single team when they launched it the following day, whereas our silly bugs from before only allowed us to hit about half. Moral of the story? Find all of the bugs.

Carbonited ML

Shortly before the end of the competition’s first day, after both ROPShip and Telooogram had been retired, the lights in the room dimmed. The game’s visualization faded away as a familiar tune began playing. On the screen, we watched as Han Solo was frozen in carbonite. A new problem was out: AI Han Solo.

This problem was a spiritual successor to DEF CON Quals 2018’s Flagsifier, a problem wherein you were given a network that had been trained to classify the flag, and you had to determine what that original flag was. AI Han Solo, however, was an attack/defense problem and not a jeopardy problem, so there was a bit of a twist. The network itself was trained on 256 different classes. The first 16 were just a single letter repeated 16 times (e.g. 0000000000000000 or AAAAAAAAAAAAAAAA), whereas the 17th class onward were a derivative of the flag (more on this later). Additionally, you were allowed to “patch” your network by retraining it on a new secret, which you chose and sent to the server along with your network. As long as the network classified each of the 256 classes with greater than 99.8% accuracy, the patch was accepted.

So what do we mean when we say that the classes beyond 16 were a derivative of the flag? Well, instead of just classifying the flag itself, the classifiers had to classify a hash chain of the flag. For example, if your flag was stored in flag, the 17th class would need to accept flag, the 18th would need to accept sha256(flag)[:16], the 19th sha256(sha256(flag)[:16])[:16], and so on for 240 iterations. As a result of this scheme, with high likelihood the full entropy of the flag was stored in the model; however, as you got further up the chain it became harder to extract.
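The chain itself is easy to write down. In this sketch the encoding details are assumptions: I take sha256(x)[:16] to mean the first 16 characters of the uppercase hex digest (which matches the 0-9/A-F alphabet of the first 16 classes), and the example flag is a placeholder rather than a real one.

```python
import hashlib


def hash_chain(flag, length=240):
    """Return [flag, H(flag), H(H(flag)), ...] with `length` entries, where
    H(x) is the first 16 characters of sha256(x).

    ASSUMPTIONS: hex digest, uppercased, ASCII-encoded input.
    """
    chain = [flag]
    while len(chain) < length:
        digest = hashlib.sha256(chain[-1].encode("ascii")).hexdigest().upper()
        chain.append(digest[:16])
    return chain


# Placeholder flag: 16 hex characters.
chain = hash_chain("0123456789ABCDEF")
```

Each entry is fully determined by the one before it, so recovering any element lets an attacker check guesses for its neighbors: hash a candidate and compare against what the model accepts for the next class.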

For flag extraction, our technique was nothing particularly special. We did a basic gradient descent to determine which digits the model was most likely to accept for each of the classes, and then brute-force twiddled bits until the chain worked out. When we began throwing this attack the following day, we had moderate success — we were able to hit roughly a quarter of the teams, second only to HITCON who was hitting about a half.

For flag patching, on the other hand, our technique was a little more devious. If you paid really close attention to what I said earlier, you might have noticed something odd.

As long as the network classified each of the 256 classes with greater than 99.8% accuracy, the patch was accepted.

Specifically, the verification was only testing for the existence of positive results, and completely ignoring negative results. This meant that we could, oh I don’t know, only train each class on the first half of the word and fill in static for the second half. As a result, every example from that class would pass trivially, but any attempt to reverse the contents of the original word would need to bruteforce 32 bits for each of the elements of the hash chain.
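As a sketch of what that training data could look like, assume each class is a 16-character hex string (the real inputs to the network may have been encoded differently). The helper below keeps the first half of a word and fills the second half with random hex characters. The function name and parameters are made up for this example.

```python
import secrets

HEX = "0123456789ABCDEF"


def noisy_samples(word, count, keep=8):
    """Training strings that keep the first `keep` characters of `word`
    and replace the rest with random hex characters."""
    tail_len = len(word) - keep
    return [
        word[:keep] + "".join(secrets.choice(HEX) for _ in range(tail_len))
        for _ in range(count)
    ]


samples = noisy_samples("0123456789ABCDEF", count=4)

# 8 hidden hex characters at 4 bits each leaves 32 unknown bits per word.
hidden_bits = 4 * 8
search_space = 16 ** 8
```

A network trained this way still accepts the true word, since its first half matches, but the model carries no information about the hidden half.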

This patch was incredibly effective, preventing any successful attacks against our network. However, this begins to reveal some of the issues with AI Han Solo as an attack/defense problem. Firstly, a single patch like ours rendered the entire problem unsolvable without substantial work from us. Although it seemed as though not many other teams thought of this, it is concerning that there was effectively no way around our solution, for instance via an actual bug.

I’m comparing this problem to what I consider a more standard attack/defense problem, wherein there are a number of disjoint bugs at varying levels of difficulty, such that even if a team patches many of the bugs, it’s very hard for them to find and patch every single one until close to the end of the problem’s lifetime.

Moreover, much like poool from last year, this problem suffered from a moar core issue. Specifically, even if teams did not have our “impenetrable” patch, they could still rent out a small training machine to retrain the model every round. At that point, the other teams would also need a large amount of compute in order to be able to keep up with their patches. Ultimately, while the concept was fun, this problem made for a rather frustrating attack/defense problem and seemed better suited to its jeopardy form.

By the time we had our aoool and AI Han Solo exploits written, the competition had long since been suspended and we had gone into the next morning. However, it was only about 2:00 AM, unusually early for us during a DEF CON. As opposed to many previous years, we were only given two problems to work on overnight, and neither of them seemed to go any deeper than we had already reached. It was a relief to know that we would be getting close to 6 hours of sleep for the next day, and I hope that future competitions have the same rough amount of work.

DOOOM, Eternal

When we arrived at the competition the next day for setup, we were greeted with an unexpected present. With 30 minutes before the start, our team captain returned from the leaders’ meeting holding, of all things, an original xbox (xbooox?). We were not told what it would be used for, other than to plug it into our network and wait for the competition to start.

As that finally happened, we learned exactly what the organizers meant. Specifically, it turns out that the OOO has a brutal sense of irony, and the xbox was for our next King of the Hill, DOOOM. As the name would suggest, the xbox was not running any existing xbox game, but was instead running a custom version of the original Doom that was hotloaded from their staging server at boot, and would connect to the game server that they hosted. Once the game began, our King of the Hill problem became a literal King of the Hill, as there were specific territories in the game that would grant you points toward that round if you stood on them.

Although this sounds straightforward enough, there was a catch (duh). Every player connected with the name sheep. In order to earn points, your name had to be prefixed with your team id in brackets. For us specifically, that meant we needed our name to start with [07]. At this point you’re probably thinking,

“Ok, Zach. Just man-in-the-middle the connection and replace sheep with like [07]p. Easy, peasy”.

This is, of course, completely correct. However, as I learned the hard way, we are not very good at man-in-the-middling. In fact, I would go so far as to say we are completely terrible at it. To save you trouble, here is an abbreviated list of things we tried that did not work.

- Linux: NetworkManager was having none of it.
- Ettercap: We got ettercap running in filter mode, but despite the fact that it thought it was correctly rewriting the packets, it was in fact doing nothing. ¯\_(ツ)_/¯
- Bettercap: I’ll be honest, I gave up on that one pretty quickly.
- PF: I came close on this one. I was correctly intercepting traffic from my machine going to the staging server, but for some reason I couldn’t get the xbox, which was on a bridged network with my mac, to follow suit.
- IPTables: We finally got it connected to a linux VM so that we could set up an IPtables rule to do what PF wasn’t, but we couldn’t get that working either.

At this point, you’re probably thinking either “Well, if none of those worked, what possibly could?” or “You’re an idiot, there’s clearly a much simpler solution that you didn’t even try.”

Sadly, the people thinking the latter are correct. After wasting about 6 hours of our time, we finally realized that we could connect the xbox to a VM which would impersonate the staging server, and then forward connections to the host machine to actually talk to the game server. Once we decided to try this, it took all of 20 minutes to get it working.

At this point I would love to be able to say that we were able to make up for lost time and exploit all of the bugs in the binary to win each of the remaining rounds. However, what we did was make two changes to the binary, one that changed our name so we could score, and one that re-enabled shooting. Then we found our best Doom player and had them play each of the remaining 10 rounds until the challenge was taken down.
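The name change is the kind of patch that can be done as a same-length byte substitution, so that nothing else in the image moves. The sketch below is only an illustration of that idea, not the patch we actually applied. The helper name is made up, and the byte string it runs on is a stand-in rather than the real Doom binary.

```python
def patch_name(image, old=b"sheep", new=b"[07]p"):
    """Replace every occurrence of `old` with `new`, which must be the same
    length so that no other offsets in the image shift.

    Returns the patched bytes and the number of occurrences replaced.
    """
    if len(old) != len(new):
        raise ValueError("replacement must be the same length")
    return image.replace(old, new), image.count(old)


blob = b"\x00\x01name=sheep\x00\xff"  # stand-in bytes, not the real binary
patched, hits = patch_name(blob)
```

Keeping the replacement at exactly five bytes is what makes "[07]p" a convenient choice: the required prefix fits, and one character of the original name is left over.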

Surprisingly, we actually managed to get second place in most of the rounds that followed by simply playing the game. We don’t know if that’s because no other teams found the bugs that Zardus alluded to after the fact, or because waituck is a pro-level Doom player, but it was enough to get us a few points before it was taken away.

Ah well, at least next time we’ll be better prepared.

Fairest of them All

While I had been bashing my head against TCP, another problem was released, this time introduced by Evil Spock of the mirror universe. Apparently in the mirror universe, lisp machines have become the standard for computing.

Hey, that doesn’t sound evil…

In keeping with the OOO’s hidden theme of web problems (!!!), this Lisp machine program ran a web server that exposed some barebones functionality. After some reversing, we figured out that there was an additional endpoint for http:web-admin that was reachable by going to /λ.

Yes, that is an actual lambda character encoded via the MIT lisp encoding. I guess someone should add that to dirbuster.

That endpoint wasn’t sufficient to get you a flag, unfortunately. Before it would let you access the flag, you had to authenticate as the burnham user. The only problem? burnham was not a valid result of the get-users function which is how one actually authenticated.

At this point, we dove into the gruntwork of meticulously scanning the binary for any hint of bugs, undefined behaviour, or other broken logic that would let us bypass this check. Several people worked on this for a couple of hours, but came up basically empty. The only exception to this was the presence of some unreadable assembly in the get-users function. This code could be reached by having the client pass in a non-hex value for the auth-code HTTP header. We stared at it for a while trying to understand what it did, but to no avail.

At this point, we of course pulled out our whiteboards and notepads, and began dissecting every single line of the suspect assembly.

Kidding! We actually just sent the server an HTTP request with the word burnham in everywhere. Host? “burnham”. Content-Encoding? “burnham”. Body? “burnham=burnham”. Much to everyone’s surprise, this worked. Specifically, when we set the query string to burnham=burnham and sent the header auth-code: burnham, we were able to bypass the check and access the eval that followed it.

For the rest of the day, we were able to successfully exploit this against all other teams without having any idea how it worked. By all accounts, it shouldn’t have, and yet somehow it did. As the competition wound down for the day, the team that had been working on it began to develop a better understanding of what happened. Specifically, they were encountering a failure situation in which an exception was being raised with a local variable attached to it. The local variable’s context was exiting, which caused it to be a pointer to an unused area on the stack. This region would later be filled in with the "burnham" from our query string, and so when that variable was being checked against "burnham" , the check would succeed. This allowed us access to the eval that followed, and granted us an obscene number of flags.

This problem was clearly designed to be one of the hardest of the competition and I imagine that the organizers did not intend for us to get it so early. When we were throwing it the following day, we had expected several other teams to be throwing it against us too, and to have patched out the issue. However, we were still able to score against most teams with — prior to the pcaps being released — only one or two other teams throwing against us.

I think fundamentally this was a decent problem; however, it suffered from many of the same issues as other problems. There was really only one path to exploitation, and for many teams who didn’t find it, all of that time spent staring at the binary was wasted. It ended up working out very well for us, but it could have just as easily hurt us if it had not occurred to us to try that technique.

Babi is You

Released at roughly the same time as Mirror, Babi (explicitly pronounced babby) was a Rust binary that appeared to be a standard memory corruption style of attack/defense problem. The only problem, of course, being that it was Rust. For those lucky souls who have never had to reverse one, Rust binaries are incredibly annoying. It’s much more challenging to lift Rust-generated x86 back into something reasonable, and Rust is a fairly safe language, so bugs are generally more difficult to find.

Unfortunately, Babi ended up being mostly a grunt-work problem with very little payout. We found a partial leak fairly early on, but it took a while to turn that into a full leak. From there, we were able to find some basic heap corruption. Although it took a long time, we were able to turn that corruption into a full-blown exploit, but it was sufficiently fragile that nearly any change to the binary would break it. As a result, when it came time to throw the following day, we were only able to hit a small number of teams with it.

Problems like these are a bit annoying because there really is not a whole lot you can do about it. To our knowledge, there was no alternative exploit that would have been stable against these patches, and without consensus patching there was no way to determine the new offsets that people were using (if they were even still vulnerable). The result of this was that we took three of our best players out of the game for a long while with little payout.

There was also a slight communications mishap with regard to Babi patching. Specifically, the rules stated that you could only patch babi::* functions; however, that did not include babi::main. As this was not explicitly mentioned, it cost us a few rounds of delay on Sunday morning as we updated our patches.

Proof by Exhaustion

Although I was excited to jump onto a Lisp problem after suffering with Dooom for so long, the team working on it solved it only minutes after Dooom was taken down. I could have jumped onto Babi, but I really did not want to look at a Rust binary. Fortunately for me, not long after Dooom was taken down, a new King of the Hill went up: the Bitflip Conjecture.

The aforementioned conjecture stated this: There exists a sequence of fewer than 200 bytes that can have any single bit (or no bits at all) flipped such that it still prints out the string "I am invincible!" and exits with code 0. This was scored simply as 10000 - {number of bits that when flipped fail the condition}. When you connected to the server, it also gave you a few additional rules. Specifically, you could choose how you wanted your registers initialized. They could either be all 0’s, initialized to the middle of a large rwx page, or left as they were.

So the rwx page option did not actually work as described. We’re not entirely sure how this happened, but in the binary they gave us it was clear that they meant it to: they assigned r8 to be the result of an mmap call, and then later assigned it to all the registers with an offset. The problem was that between those two things they called several other functions (open, read, etc). Since r8 is not callee-saved, it was being overwritten with garbage. This ended up biting us because we thought we had a solution that would have worked if this were valid, but alas it was not.
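To make the scoring rule concrete, here is a small sketch that enumerates the unmodified input plus every single-bit flip and counts how many variants fail. The base of 10000 is inferred from the scores reported in this post (9975 with 25 vulnerable bits), and the still_works callback stands in for actually running the shellcode. Both are assumptions rather than the organizers' implementation.

```python
def single_bit_variants(code):
    """Yield the unmodified bytes, then every copy with exactly one bit flipped."""
    yield bytes(code)
    for byte_index in range(len(code)):
        for bit in range(8):
            flipped = bytearray(code)
            flipped[byte_index] ^= 1 << bit
            yield bytes(flipped)


def score(code, still_works, max_score=10000):
    """max_score minus the number of variants for which still_works is False.

    ASSUMPTION: max_score=10000 is inferred from the reported scores.
    """
    failures = sum(1 for variant in single_bit_variants(code) if not still_works(variant))
    return max_score - failures


# Toy predicate: pretend only the first byte matters.
toy_score = score(b"\x90\x90\x90", lambda variant: variant[0] == 0x90)
```

In the toy case, the eight flips that touch the first byte are the only failures, which is the same accounting that makes redundant copies of the shellcode so valuable.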

Because of how the scoring worked, we were able to get an early lead on this problem by just golfing down our shellcode to be pretty short ™. With this strategy, we were trading first place back and forth with a couple of other teams. Within 45 minutes, we had developed a more robust version that had two copies of the shellcode available and looked at rdx (which had previously been assigned the address of the flipped byte) to see which half was corrupted and then jump to the other one. Our initial version of this allowed us to have only 25 vulnerable bits, for a score of 9975.

```asm
    lea rax, [rip+copy2]
    cmp rdx, rax
    jb copy2
copy1:
    mov eax, 1
    mov edi, 1
    call after
string:
    .ascii "I am Invincible!"
after:
    pop rsi
    mov edx, 16
    syscall
    xor edi, edi
    mov eax, 60
    syscall
copy2:
    mov eax, 1
    mov edi, 1
    call after2
string2:
    .ascii "I am Invincible!"
after2:
    pop rsi
    mov edx, 16
    syscall
    xor edi, edi
    mov eax, 60
    syscall
```
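The redundancy idea can be modelled in a few lines of Python. This is a toy sketch, not a disassembly of our payload: the header and copy lengths are made-up parameters, and it pessimistically treats every byte of the dispatch header as exposed, whereas in practice many header bit flips turned out to be harmless.

```python
def run_target(flip_offset: int, copy2_offset: int) -> str:
    """Mirror of `cmp rdx, rax; jb copy2`: run whichever copy the flip missed."""
    return "copy2" if flip_offset < copy2_offset else "copy1"


def unprotected_bytes(header_len: int, copy_len: int) -> list[int]:
    """Byte offsets where a flip lands in code that still has to execute."""
    copy1_offset = header_len
    copy2_offset = header_len + copy_len
    total = header_len + 2 * copy_len
    exposed = []
    for off in range(total):
        target = run_target(off, copy2_offset)
        in_header = off < copy1_offset
        in_copy1 = copy1_offset <= off < copy2_offset
        in_copy2 = off >= copy2_offset
        # A flip only matters if it hits the header or the copy we end up running.
        if in_header or (in_copy1 and target == "copy1") or (in_copy2 and target == "copy2"):
            exposed.append(off)
    return exposed
```

In this model the only exposed bytes are the header itself, which is why nearly all of the later golfing effort went into making the first few instructions tolerant of corruption.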

After golfing this for another 20 minutes or so, we were able to get it down to only 7 vulnerable bits, for a score of 9993 that kept us in first through the rest of the day’s competition.

The rest of the evening for us was just continually shrinking down the number of errors by using a combination of nop padding between the two copies of shellcode (so that the comparison could be fuzzy and still work) and by adding nops to the vulnerable section so that when they were corrupted, they would be corrupted into something harmless that would be ignored. Ultimately, we were able to get it down to only a single error using the header sub al, dl; js copy2. The error was on the 14th bit, which would change our prologue to sub BYTE PTR [rax-0x3fcead88], dl; dec ecx, which caused a segfault. The final shellcode looked like this:

```asm
.org 50, 0xcc ; To account for the 50 bytes they prepend for setup
    sub al, dl
    js copy2
copy1:
    xor eax, eax
    dec ecx
    xor eax, eax
    inc eax
    xor eax, eax
    xor eax, eax
    xor eax, eax
    xor eax, eax
    xor eax, eax
    xor eax, eax
    inc eax
    mov edi, eax
    call after
    .ascii "I am Invincible!"
after:
    pop rsi
    lea edx, [eax+15]
    syscall
    dec edi
    mov al, 60
    syscall
.org 136, 0x90
copy2:
.org 200, 0x90
    xor eax, eax
    inc eax
    mov edi, eax
    call after2
    .ascii "I am Invincible!"
after2:
    pop rsi
    lea edx, [eax+15]
    syscall
    dec edi
    mov al, 60
    syscall
```

For most of the next day, we were tied for first with a number of other teams who had used basically the same strategy. Eventually, about halfway through the day, Tea Deliverers managed to prove the conjecture, and submitted a perfect shellcode that withstood all corruption. We found out later that they had used an absurd amount of compute power to brute force the last bits of their shellcode. I’m annoyed that we were not able to do it by hand, but I am certainly impressed by Tea Deliverers’ feat. It’s easy to argue that shellcode golf is not really a good hacking challenge, but I really enjoyed it, just as I enjoyed last year’s. I definitely hope OOO continues doing it.

Every Web’s Here!

As I alluded to earlier, this had been an especially web heavy competition. As the final day wrapped up, it was clear that it would continue to be. With only minutes before the day’s end, the Order released a new challenge entitled “Super Smash OOOs”. The problem itself was not made available, but they did provide you with a download that included a wasm file, and a basic web server.

Despite what you might expect, this problem was not Super Smash implemented in Wasm. Only an idiot would write a wasm game for a CTF. Instead, it was a payment processing platform that verified your credit card so that you could watch an SSB pay-per-view match.

This sounds, at the outset, like a pretty standard wasm reversing problem. Unfortunately, this problem decided to subvert all of our expectations, and not necessarily in a good way. Remember how I said that we only got a web server and a wasm file? Well, this was sorely lacking in the things-needed-to-run-the-problem-locally department. More specifically, all of the glue needed to connect the wasm to the server, and to provide the wasm with basic syscall emulation, was not provided.

We were able to slowly restore this functionality, but it was painful and ended up taking too long to really be of any use. Concurrently, a couple people were working on reversing the wasm itself. Much to everyone’s surprise, most of the core logic was not done in the wasm itself, but was actually done on a remote server whose address we were not given. In all of our time reversing it, the only thing we found was a slight heap overflow that was not obviously reachable and would have been miserable to exploit.

When the next day came around and we were ready to get the missing files, we were surprised to learn that the rest of the problem was not going to be given to us: what we had was all we were going to get. We continued to push forward on it, not really sure where we were going, until about halfway through the day when two things happened simultaneously.

Mhackeroni started scoring on the problem

They decided to give us that missing glue

Both of these were a little odd in their own right: it was surprising that anyone could solve this, and given that someone could solve it, it was surprising to see more info being released. Yet, despite this new information and how hard we worked on it, we got nowhere.

Afterward, we learned how Mhackeroni had been earning flags. It’s not that they were masters at wasm reversing. Instead, they had found a SQL injection in the service it was talking to.

SQL injections, in my wasm problem? It’s more likely than you think.

Needless to say, this was a little demotivating to hear, especially because it could not have been found overnight while the problem was down. Furthermore, to the best of our knowledge, no one was able to score any points via an exploit in the wasm itself. Assuming there was one, it is very strange that OOO would not release that glue at the outset. By the time they released it, there was so little time left in the competition that figuring all of that out ourselves was just unnecessarily tedious. I doubt this is the case, but if there was no exploit in the wasm, they should not have given it to us overnight, knowing full well that we would spend many hours looking for a bug.

I especially like wasm problems, but this one needed to be more focused on the wasm reversing and exploitation, instead of getting it running and surprise SQL.

Unlike the previous night, our Saturday did not end until well after 4:00 am. Coupled with the competition starting an hour earlier on Sunday, this meant that we only had somewhere between 2 and 3 hours to sleep. Fortunately, Sunday tends to be a lot of throwing what you have and not a lot of thinking, but we had been warned ahead of time that there would be at least one problem released.

I threw a red bull in the fridge, and headed off to sleep.

For some reason, DEF CON is the only thing that can convince me to drink that crap.

FeverDream.js

After everyone got settled the next day, throwing exploits and deploying patches, it was time for the promised final problem, jtaste. In a move that should surprise no one at this point, it was a web problem. However, unlike the previous problems, which were more web-adjacent, this one was an honest-to-goodness run-the-gauntlet web problem.

The only issue with it? It made absolutely no sense.

Without writing any code, let me describe how this problem worked when used correctly.

Users were shown a 5x5 grid, a clear button, and a submit button.

When you hovered over a grid element, it added a corresponding number to a chain shown at the top, and sent that number and its “signature” to the server.

The server verified that the signature was correct, then appended just the signature to a session variable called verified.

When you clicked submit, it would send the values shown at the top to the server. The server would then check the length of what you submitted against the length of verified. If they didn’t match, it would complain at you. If they did match, however, it would set the session variable counter to be the array of numbers you uploaded.

Then, if you hit the /persistent endpoint, it would:

Filter out all of the 46s and 47s from the array (ASCII codes for "." and "/" respectively)

Prepend an array that had the character codes for "./public"

Call unidecode on every element of the resulting array

Convert that into a string

Read the contents of the file referenced by that string

Write the stringified original array to that file

Return the original contents to the user

…what?

I wish I could give you a nice interpretation of what this program was supposed to be simulating, but honestly it made no sense to me whatsoever. Fortunately, just from that description it should be obvious what the solution is.

Simply create a string for the relative path you want to read, such as ../../../../flag, and convert all of the characters to their char codes. Replace all of the 46s with 8228s (“one dot leader”) and the 47s with 1793s (“syriac supralinear full stop”), both of which survive the filter but are turned back into "." and "/" by unidecode. Then, for each character in the path, send any number and signature from your session to the server so that the lengths match, submit the array you produced from your path, and call /persistent.
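As a rough illustration, the encoding step and the server's path handling can be sketched in Python. Everything here is a stand-in: `encode_path` and `server_path` are hypothetical names, the two-entry `TRANSLIT` table only mimics the two unidecode mappings mentioned above rather than calling the real library, and the "./public" prefix is taken verbatim from the description, so the real server may differ in details such as a trailing separator.

```python
DOT_LEADER = 8228   # U+2024 ONE DOT LEADER, transliterated to "."
SYRIAC_STOP = 1793  # U+0701 SYRIAC SUPRALINEAR FULL STOP, transliterated to "/"

# Stand-in for unidecode, covering only the two code points used here.
TRANSLIT = {DOT_LEADER: ".", SYRIAC_STOP: "/"}


def encode_path(path: str) -> list[int]:
    """Turn a path into char codes, swapping '.' and '/' for lookalikes."""
    swap = {46: DOT_LEADER, 47: SYRIAC_STOP}
    return [swap.get(ord(ch), ord(ch)) for ch in path]


def server_path(counter: list[int], prefix: str = "./public") -> str:
    """Toy model of the server: filter 46/47, prepend prefix, transliterate."""
    filtered = [c for c in counter if c not in (46, 47)]
    codes = [ord(ch) for ch in prefix] + filtered
    return "".join(TRANSLIT.get(c, chr(c)) for c in codes)


payload = encode_path("../../../../flag")
```

Feeding the plain char codes of a path through `server_path` strips every dot and slash, while the encoded `payload` passes the filter untouched and only becomes a traversal after transliteration.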

This is also trivial to patch, because you can simply wait until the flag is being read, then replace the flag value with a boring string (in our case xxx), and no one can solve the problem with this method.

This solution got us a few points while people were still understanding how it worked, but everyone shut it down pretty quickly. Depending on how people patched it, however, we were still able to use it to leak their server code, allowing us to read their patches. In a few other cases, they were just preventing you from reading /flag, so we could get around this by reading /proc/self/root/flag. While this allowed us to still get 3 teams every round, it was only a drop in the bucket.
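I don’t know exactly how those teams filtered paths, but a purely string-based check along the following lines would behave the way we observed. The function name and the blocked path are hypothetical; the point is only that normalising a path as text never follows symlinks such as /proc/self/root, which on Linux points back at the process’s root directory.

```python
import posixpath

BLOCKED = "/flag"  # hypothetical denylist entry


def naive_is_blocked(path: str) -> bool:
    """String-only check: collapses '.' and '..' but never follows symlinks."""
    return posixpath.normpath(path) == BLOCKED


direct = naive_is_blocked("/flag")
dotted = naive_is_blocked("/tmp/../flag")
via_proc = naive_is_blocked("/proc/self/root/flag")
```

Here `direct` and `dotted` are caught while `via_proc` slips through. A more robust patch would resolve the path on the filesystem first (for example with `os.path.realpath`) before comparing, or simply swap out the flag contents as described above.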

The problem was using webpack-hot-reload, which seems like it was supposed to be an alternate solution path wherein you overwrite one of the webpack’d source files to get it to include the flag from elsewhere. However, I was not able to use this setup to perform any kind of write outside of the /public directory. As such, I could find no way of getting webpack to pick up any of the files I had written.

While I generally like the fact that there was an easy web problem to finish the game, this one did not make much sense to me as a problem, and seemed a little bit too easy to be anything more than just something to keep people busy.

Good Vibes

As the competition wound down, and before I started to clean everything up, I had a little bit of time to reflect on the year’s competition. Overall, despite some problematic problems, and some bad networking on my part, I found it to be one of the most fun DEF CONs I had played in. In part, this was because I broke from my normal routine of just playing defense. While useful, especially given our propensity for it, defense is rarely the most interesting role to play.

Yet, a lot of the fun this year came from the fact that the problems were just, well, fun. Even jtaste, the silly web problem released at the end, was low stakes and amusing enough to be enjoyable. Additionally, either due to the space we had, or OOO’s benevolence, the CTF atmosphere was significantly calmer this year. It’s hard for me to overstate how much of an impact this has on my ability to enjoy the CTF. As cool as GoogleCTF-style visuals and sound effects are, when you’re in the middle of a stressful competition, all they do is elevate that stress. Even the loud memes that were on display last year really did not make for a pleasant workspace. Fortunately, this year’s combination of CTF-related visuals, relatively calm music and videos, and the pleasant white noise of people talking around you made for an overwhelmingly better environment.

Another thing that made this DEF CON more pleasant was that the OOO’s infrastructure and problem development had clearly matured. While there were certainly some issues, they were far less impactful and frustrating than in last year’s DEF CON. It’s great to see the way that they are learning from each competition, and improving it.

Likely helping here was their decision to keep the format the same as last year’s. I understand that not everyone is a fan of King of the Hill, but I think they made the right call in keeping with a familiar gameplay style so that they could focus on the less exciting but just as important aspects of the game. Knowing them, I’m sure they’ll want to experiment further next year, yet as nervous as that makes me, I’m much less worried having seen how things went this year.

Addressing the Future

Although this year’s game went very well, there are still a few things that can be improved upon by future hosts.

The biggest concern we had was the rate limiting method employed. Inherently, we’re not opposed to rate limiting. Ensuring that no one abuses the infrastructure is crucial to a smooth game. However, the policy of “if you exceed your quota, we immediately disconnect you” leads to a lot of trepidation when approaching new problems. A prime example of this was Super Smash OOOs. As mentioned prior, there was a SQL injection in the remote server that you had to find blind. One of the reasons we did not find this is that none of our members were willing to poke around the production instance of the problem for fear of disconnecting us from the game. If the policy had been softer (e.g. temporarily dropped packets or even public shaming), it is far more likely that we would have explored that avenue.

Another issue that we struggled with this year was understanding the problem scopes. As mentioned, there was the issue with SQL injection in a wasm problem, but there was also a lot of time spent elsewhere. For instance, in the shellcoding problem, we tried a few things to attack the problem runner itself instead of solving the original problem. For that one, Shellphish claims they actually had an exploit running against buggy server code before the organizers patched it. While I understand the value of having players look for alternate solutions to problems (thinking outside the box, as it were), in a competition like DEF CON it’s important to have a well-defined scope for problems.

The only other feedback I have is largely unchanged from last year. The problem health indicators, which have 5-ish states ranging from good to bad, still feel really arbitrary. Problems sometimes jump multiple states at once, and knowing something is at ok, for instance, does not actually give competitors a good feel for how much longer it will be up. Arbitrary problem lifetimes are certainly not an issue unique to the OOO’s games, but they are more obvious as a result of the health indicators. I think the indicators are a great idea, but they need a little bit more consistency to actually be helpful.

I can’t say this enough, but thank you to the Order of the Overflow for all of the sacrifices they make in order for DEF CON CTF to be successful. In a similar vein, congratulations to all of the other teams who played this year. Coming to DEF CON is so much fun in part because it’s an opportunity to see all of you and enjoy the greater CTF community. In particular, congratulations to both HITCON⚔Bfkinesis and Tea Deliverers, who played phenomenally this year. It’s incredible to see just how many brilliant hackers there are in this community.

Final Thoughts

This writeup has gone far longer than I had intended it to, and I imagine not many people will make it this far. But I wanted to take this opportunity to pose a few questions for the community as we move into this new season.

What is the purpose of CTF? Do we play it to make ourselves better hackers, to push the boundaries of what we can do, or simply for the love of the game?

What’s keeping people away from it, and what can we do about it? A couple of teams opted not to attend DEF CON this year, which is always sad to hear.

What crazy things do you secretly hope that OOO does next year? How do they make it work well?

I’m always curious what people are thinking, so feel free to comment, write your own long blog post, or tweet me. Thanks!