Transcript

Klabnik: I'm Steve [Klabnik], and this is Rust's Journey to Async/Await. There has been this theme. I've given a lot of talks about Rust at QCon. It seems like I'm always talking about something that's happening three months in the future. This continues on that trend of things. As mentioned, this is about how we arrived at async/await and the final couple of little steps that are happening as well. I'm giving this talk because I'm on the Rust team in general. I'm on the Rust core team and in charge of the documentation team. I work at Cloudflare, although not on Rust stuff. I'm doing two workshops tomorrow, one on Rust and helping out with one on Cloudflare workers.

While I am on the Rust team, I want to emphasize that I was not directly involved in all of this work. I am primarily doing this in a reporting capacity. I like to think of myself as an amateur historian of Rust developments. I like to keep track of how the language has evolved over time and deal with these major topics. I was not directly involved in the design; I just read all the comments and have talked to people over the years. I'm giving you my semi-outside perspective on how this went down. Most importantly, a lot of people were involved in making async/await in Rust happen. A special shout out is deserved for withoutboats, who is the person who has been primarily driving the design over the last couple of years. Boats is awesome and wonderful.

What is Async?

To start off, to figure out how we actually got to async/await in Rust, we have to talk about what asynchronous programming actually is. It turns out that this is really hard to talk about because there are so many different options in the space of trying to make a computer do more than one thing at once. The start of this talk is going to involve a lot of definitions just to make sure we're all on the same page. We're going to start off there.

People talk about asynchronous computing along with these other two things, which are parallel computing and concurrent computing. People get these three definitions mixed up all the time. Just to be clear, parallel computing is when you're able to do multiple things at once. Concurrent programming is when you're able to do multiple things, but not at once. Asynchronous programming is actually unrelated to those two things. It's a totally different axis. Let's get into this a little bit more.

Briefly, there's also this other term called a task. This is a generic term for some sort of computation running inside of a parallel or concurrent system. This may mean a thread, this may not mean a thread, this can mean anything. It's just a generic catch all term for "I'm doing something inside of an asynchronous or parallel or concurrent system." You'll hear me say task a bunch; it's a very generic word.

Parallel computing. Technically, things can only be parallel if you have multiple CPU cores or multiple CPUs. The slide shows a lightning bolt for a computational task, just because it seemed cool. We have this timeline of execution. You have these two things, and they're truly executing independently. This is why this is only possible with multiple cores or multiple CPUs: because you fundamentally can't make one processor do two things at once because of physics; it's physically doing a thing. If you have multiple of them, then they can be doing two independent things in one system. You get parallelism this way.

The second one is concurrent programming. Concurrency means that you pretend that you have multiple cores, but you don't actually have multiple cores. You do a little bit of work on one thing, a little bit of work on another thing, and a little bit of work on a third thing. All of these options can also be combined. For example, in a lot of systems, you have different parallel tasks that are executing concurrent things inside of them. There are all sorts of ways you can fit these things together.

When we talk about asynchronous programming, we're talking about language features that let you write code that is able to be executed in a parallel or concurrent way. That's because you need the ability to do things simultaneously. If you think about the way that code normally executes, you have the first thing and the second thing and the third thing, and they follow down in lines in most forms of programming, not all. You need some way to say, "Actually, execute these things at the same time." That's what asynchronous APIs are.

I'm almost done with terms, I swear. Then we can actually productively talk about this: cooperative and preemptive multitasking. Cooperative multitasking is when all of your tasks have to cooperate to operate asynchronously. Each task decides when it is willing to give up its particular resource and have another task executed instead. This is the way that I originally learned about this, because I think it was Windows 95 which came out and had preemptive multitasking, or Mac OS 8 instead of 7, or whatever. Preemptive multitasking is when the system decides when you yield to other tasks.

This means that any computation can be interrupted in the middle and handed off to something else. This is a very interesting trade off because in a system with cooperative multitasking, if you have a bad actor, they can really mess everything up. What happens if they never decide to give up control of the situation? Then they just hog all the resources. In an untrusted system, you often want preemptive multitasking but being able to do that also adds some overhead and some complexity and some other things. If you have a system where you know that you have only good actors, then cooperative multitasking can be a little bit better. It just really depends. These two things are at odds.
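
To make the cooperative idea concrete, here's a tiny toy scheduler in Rust. This is a hypothetical sketch, not any real runtime's API: each task does one step of work and then voluntarily returns control, and the scheduler round-robins over the tasks until every one of them is finished.

```rust
// A task is just a callable that does one step of work per call and
// returns true when it is finished. Returning at all is the "yield".
fn make_task(mut steps: u32) -> Box<dyn FnMut() -> bool> {
    Box::new(move || {
        if steps == 0 {
            true // finished; a well-behaved task gives up its slot
        } else {
            steps -= 1;
            false // not done yet, but we still yield back to the scheduler
        }
    })
}

// Toy cooperative scheduler: round-robin over the tasks, giving each
// one step per round, until the task list is empty.
fn run_cooperative(mut tasks: Vec<Box<dyn FnMut() -> bool>>) -> usize {
    let mut rounds = 0;
    while !tasks.is_empty() {
        rounds += 1;
        // Run every remaining task for one step; drop the finished ones.
        tasks.retain_mut(|task| !task());
    }
    rounds
}

fn main() {
    let tasks = vec![make_task(2), make_task(0), make_task(1)];
    println!("finished in {} rounds", run_cooperative(tasks));
}
```

Note that a misbehaving task (one that never returns from its step) would hang this scheduler forever, which is exactly the bad-actor problem described above.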

Then, finally in my big terminology block, native versus green threading. This one gets even more complicated. I'm not going into all the differences here. You also hear things like coroutines and stackless or stackful coroutines and all these other things. At its core, there's a dichotomy between native threads, which are sometimes called 1:1 threading, and green threads. Native threads are tasks that your operating system provides as an API. If you're running a process, you're able to say, "Within that process, I would like to execute some tasks." It's called 1:1 threading because you get one task for one operating system thread. The opposite is green threads, which come next. These are also called N:M or M:N threading; I don't see it used consistently. Sometimes people put the N first and sometimes the M first.

These are when your programming language or its runtime provides you with a task abstraction. It maps a certain number of its own tasks on to the operating system's threads. You get N program threads to M system threads. That's why it's N:M and why the other one is 1:1, because there's a correspondence. A lot of different programming languages are using green threads and have for a long time. Both of these things are interesting. This particular set of things really gets to the core of why Rust made the decisions that it made in order to do async, specifically around IO.

There are some advantages and disadvantages to both of them. Everything is a tradeoff, there's no free lunch. Native threads have the advantage that they're a part of your system. The operating system is able to handle its scheduling. That means that it can see every task that it's executing on the system and make the appropriate choices to do the scheduling because it has total visibility into everything that's running. They're also very straightforward and well-understood. Threads are an old technology and have been used for a long time. They're hard to use sometimes, but in terms of an API, they're not that complicated. They're super well understood because they have existed for a long time.

However, they also have some disadvantages. These are also system dependent. If you look at Windows versus Linux, for example, you'll get some small differences here. I'm not going to go super deep into those details, but the defaults around native threads can be kind of heavy. Your operating system needs to keep track of all these different threads, and it has its own structures that it uses for bookkeeping. The primary resource that people are concerned with for threads is the stack size: the amount of memory each thread takes. By default, that's relatively large on Linux and Windows. On top of the bookkeeping overhead every time you create a thread, each thread needs its own stack. Given that each thread takes so much memory and you only have so much memory, you can only create so many of them.
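
In Rust terms, a native thread is just `std::thread`, and the stack-size concern is visible right in the standard API: `std::thread::Builder` lets you request a smaller stack than the platform default. The 64 KiB figure below is only an illustration; defaults are typically megabytes.

```rust
use std::thread;

// Sketch: spawning a native (1:1) thread with an explicit stack size.
// The OS provides the thread; we just ask the standard library for it.
fn sum_on_small_stack() -> u64 {
    let handle = thread::Builder::new()
        .name("worker".to_string())
        .stack_size(64 * 1024) // request far less than the multi-MiB default
        .spawn(|| (1..=100u64).sum()) // the thread's work: sum 1..=100
        .expect("failed to spawn thread");
    handle.join().expect("worker panicked")
}

fn main() {
    println!("{}", sum_on_small_stack());
}
```

Shrinking the stack this way is exactly the knob that limits how many threads you can afford: the default sizes are what make "one native thread per connection" expensive.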

Green threads, on the other hand, their advantage is that they're not part of the overall system, which is hilarious given that I just listed being part of the system as an advantage. It's also a disadvantage. Since your runtime handles scheduling, it can know and prioritize the details about your program and how the bits of your program operate, but the weird downside of that is now you've obscured this aspect from the overall system. You might get into a battle between your operating system's scheduler and your program's scheduler. That can get weird and complicated.

The major advantage that most people talk about is that they're significantly lighter weight. Usually green threads start off with a stack size that's a 10th to a 100th of the size of a native thread's, which means you can create tens to hundreds of times more of them. It's not uncommon in systems that are built around green threads to start up a million threads; that's totally reasonable. It's not something you would want to do with operating system threads because you would run out of RAM.

There are some disadvantages there, largely around the fact that these are not part of the native system anymore. If your green thread runs out of that very small amount of stack, then it has to do something. There are many different ways to handle this, and this is largely where the different implementations of green threads differ. It needs to grow and change the size of that stack, and then copy the old stack over and do all this stuff. That can cause some issues. Then also, when you're calling into C code, C expects a real, actual stack. This can introduce some overhead when you have to switch between the stack from your green thread and the stack for the native code. That can be a problem. Which one of these is appropriate depends entirely on what you're trying to do. Largely, the story that I'm going to be telling you today is about how Rust thought that one of these things was the right way to do things, and then painfully over the years slowly switched to the other one.

Why do we care about any of this stuff at all? This is the end of my giant definitions block. There are just so many things here that you have to piece together to really form a coherent story about this. I needed to start off by doing that. There's this old thing called the C10K problem. I took this screenshot; this is the website that first introduced me to this problem. It's a little old, if you can't tell. "It's time for web servers to handle 10,000 clients simultaneously, don't you think? After all, the web is a big place now. Computers are big too. You can buy a gigahertz machine with two gigabytes of RAM and a gigabit Ethernet card for $1,200." Then you keep going into this thing. "In 1999, one of the busiest FTP sites, cdrom.com, actually handled 10,000 clients simultaneously." I just love that sentence because it really positions you in this late '90s kind of mindset. FTP sites are not something people use much anymore. CD-ROMs and cdrom.com just really spell out 1999 for me.

Ultimately, the point here is that we need to be able to handle as many users as possible, especially when we're building web services. This is the thing that as the internet grew became more and more important. We had to start thinking about this. This is where Rust's async story kicks off. Even though Rust did not exist in 1999, these problems have been happening over the last 20 years, to some degree. Back in those days, we had this web server called Apache. You can still use Apache today, obviously. I want to make a point that it is not like this today, although I think you can still technically use it like this, but I don't know why you would.

This is Apache with its pre-fork behavior. You would have this control process that would listen for a connection, somebody hits your website, and then a connection would come in and it would spawn a child process to handle that connection. You'd be able to do this with that process's resources. This is very heavyweight; it's using a whole process to handle each connection. Then you move into other options. Back in the day, this was fine, but it was too resource intensive, and so what they did was they moved to this worker model.

Instead of spawning a whole process for each incoming connection, a child process would have its own thread pool with its own individual threads running inside that process. These would be operating system threads, 1:1 threads. They would hand off connections to these thread pools. This got us much farther. We were able to do more with less resources because we were no longer using the extremely heavyweight process abstraction to handle more and more traffic on the web.
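
The worker model can be sketched with standard-library threads and channels. This is a toy illustration, not Apache's actual design: a fixed pool of threads pulls jobs off a shared queue instead of paying for a new process (or thread) per job.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A fixed pool of worker threads drains a shared job queue. Here a
// "job" is just a number and the "work" is doubling it, standing in
// for handling a connection.
fn run_pool(jobs: Vec<u64>, workers: usize) -> u64 {
    let (job_tx, job_rx) = mpsc::channel::<u64>();
    let job_rx = Arc::new(Mutex::new(job_rx)); // share one receiver
    let (result_tx, result_rx) = mpsc::channel::<u64>();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue tells the workers to shut down

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&job_rx);
            let tx = result_tx.clone();
            thread::spawn(move || loop {
                // Take the next job, or stop when the queue is closed.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(n) => tx.send(n * 2).unwrap(), // stand-in for real work
                    Err(_) => break,
                }
            })
        })
        .collect();
    drop(result_tx);

    let total: u64 = result_rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

fn main() {
    println!("{}", run_pool(vec![1, 2, 3, 4, 5], 2));
}
```

The key property is that the number of threads is fixed up front, regardless of how many jobs arrive, which is exactly what made the worker model cheaper than pre-fork.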

Let’s Talk about Rust

I'm going to stop talking about the past now; let's talk a little bit about Rust. Rust was originally built by Mozilla. The idea was that it would be enhancing Firefox, which is an HTTP client, not an HTTP server. This does mean that we care about IO, but the situation is very different. Your particular instance of Firefox is not handling 10,000 requests at the same time, or a million requests. It's the one making those requests to a server. Having efficient IO was important because you do need to download web pages faster, but figuring out immediately how to handle massive scale was not necessarily an extreme priority, because it was primarily going to be operating in this client capacity.

This is a screenshot of the old documentation for Rust. Rust originally had this module called green. This was a scheduling library with green threads - as I pointed out, M:N threading. Original Rust was very influenced by Erlang and actually looked much more like today's Go than today's Rust does; they were much more similar way back in the day. Rust was like, "Ok, we're just going to give you these green threads," because the idea is that if you want to be efficient at this at all, you spin up a zillion green threads and handle one per connection. That's totally fine.

They also had this net library that existed. This was synchronous but non-blocking network IO. For some reason, it repeats twice. I don't know what this little bug was in the docs. If you see this module, it says the same thing two times. Anyway, you were able to do very basic TCP and UDP things in the standard library. It was all non-blocking, and therefore also synchronous, which is weird. Wait a minute, I thought the whole point of async was to be non-blocking, and that synchronous meant blocking? It turns out there's yet another thing that we didn't talk about in this space; so many things. Isn't this a contradiction? Actually, it's not. There's this grid where only three out of the four combinations exist. You have these two dichotomies: synchronous versus asynchronous IO, and blocking versus non-blocking IO. They're actually technically different.

Synchronous blocking IO is what the original Ruby interpreter did, and what the original everything interpreter did, basically: you write your code in a way that looks like it blocks, and every time you call an IO function, it blocks everything. This is very slow, but it worked really well for a while. Asynchronous and blocking doesn't really make any sense, because why would you write your code in a way that enables you to not block and then block on it? I don't know any languages that actually do this because it's nonsense. Non-blocking IO, though, actually comes in both synchronous and asynchronous versions. Go and Ruby today actually offer synchronous non-blocking IO. What that means is when you write Go code, you're not writing it in a way that looks async; you're writing it in a way that goes through step by step.

Inside the runtime, it actually transparently does non-blocking IO and it will pause your task there, whereas if you look at Node.js, it's asynchronous and non-blocking. You write all your code in an async manner, and it's also non-blocking. There are a ton of options in this space and there are different reasons why different languages pick things in this tradeoff. We for Rust had to figure out which one of these things makes any sense. Originally, we picked synchronous and non-blocking. The advantage of synchronous and blocking is it's very straightforward, but it's not performant at all, so that was never really going to be an option. Synchronous and non-blocking is nice because you don't have to write your code in a different way. You just write your code the way that you would normally write it, but you get some performance benefits because under the hood, the runtime does non-blocking things.

This is also really common in languages that start off with synchronous and blocking IO and want more performance, because it doesn't require your code to change in order to get faster. You can do it later. I mentioned Ruby because I was familiar with Ruby beforehand, and it was the same thing. Eventually, they added non-blocking IO internally, no one's Ruby programs needed to change, and everything just got faster. With asynchronous non-blocking, your code looks like it doesn't block, and it doesn't block, but it's harder to write because you have to deal with asynchronicity in some fashion.

Not All Was Well in Rust-Land

We had this synchronous but non-blocking IO in Rust. Unfortunately, that wasn't good enough. See, this is what the Rust website used to look like a long time ago. It describes Rust as a systems programming language that runs blazingly fast, prevents all crashes, and eliminates data races. When I told you before about green threads versus native threads - green threads are provided by your runtime and native threads are provided by your OS - what does it mean to be a systems programming language if you don't offer your users the ability to use system threads? It's a fundamental API of the OS, but we're not going to let you use it? That seems really incoherent. We didn't think that green threads alone were really the answer for Rust ultimately. This became one of the first big battles in Rust design. There were actually people who threatened to fork the language over the situation.

What ended up happening was - you'll see a couple of releases later, the docs are a little fancier now - we provided this native package that would give you 1:1 threading alongside the green package that would give you the M:N threading. The idea was that you'd get to choose which thing was right for your use case. You'd write some code. They're both threads. We could just offer an API where sometimes you would use green threads if you wanted, and sometimes you would use native threads if you wanted, and everything should be totally fine and dandy. You get all the options.

It turns out that it doesn't actually really work. We had this whole runtime interface: basically, you would write your code against that interface and then pick which implementation you wanted. That was also not good. The battles continued, people still threatened to actually fork Rust. Someone did, and then five people faved it on GitHub, and nobody used the fork anyway. It was a serious problem; we weren't happy with the situation. This is an example of an RFC where we eventually decided to remove the runtime. Rust has this open design process where you can submit improvements to the language. This is RFC 230, which is the one that actually removed green threads entirely. What we decided was that offering both doesn't actually work. The reason it didn't work was that in order to build the abstraction that sat above both of them, you added so much bookkeeping to the green threads that they were not actually very lightweight anymore. They were just a crappier version of 1:1 threads. They weren't actually faster, so there was no reason you would pick them. If you gave people the ability to use both operating system threads and green threads, it just got way too complicated and it was just bad.

Rust 1.0 Was Approaching

With 1.0 looming, we had to decide what we were actually going to do here, because whatever we shipped, we would be freezing that interface for a long time. It was clear that one approach was not working, and it was clear that the other approach was not working either. We decided: operating system threads are part of the operating system, we're a systems language, it's fine. Let's just do it. Rust 1.0 shipped with a very minimal, straightforward operating system thread API. Everything is blocking, it's just OS threads, and you don't get anything fancy.
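
That 1.0 model still exists in Rust's standard library today. Here's a minimal sketch of blocking, thread-per-connection style IO: the server side blocks on accept and read on its own native thread, and the client blocks on its reply. The single-exchange echo here is just for illustration.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Blocking OS-thread IO, Rust 1.0 style: one native thread handles one
// connection, and every read/write blocks that thread until data is ready.
fn echo_once() -> String {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // In a real server you would spawn one of these per connection.
    let server = thread::spawn(move || {
        let (mut stream, _) = listener.accept().unwrap(); // blocks
        let mut buf = [0u8; 5];
        stream.read_exact(&mut buf).unwrap(); // blocks
        stream.write_all(&buf).unwrap(); // echo it back
    });

    let mut client = TcpStream::connect(addr).unwrap();
    client.write_all(b"hello").unwrap();
    let mut reply = [0u8; 5];
    client.read_exact(&mut reply).unwrap(); // blocks until echoed
    server.join().unwrap();
    String::from_utf8_lossy(&reply).into_owned()
}

fn main() {
    println!("{}", echo_once());
}
```

Simple and straightforward, exactly as described, and also exactly the model that runs into trouble at 10,000 concurrent connections.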

That was great, it let us get Rust 1.0 out the door. That was awesome. This was in May of 2015. You can see the story started in 2011, 2012. We're up to 2015 now; we've still got four years to go. Rust 1.0 happened, but still everything wasn't actually totally great. See, people really loved Rust and that was really awesome, but that meant they used it more. When people start using things more, they start wanting more stuff from you. If Rust had faded into obscurity, we probably would have stuck with only the thread API and that would have been totally it. That would have been fine.

It turns out that people wanted to build more and more stuff. Specifically, they started to really want to be able to build network services in Rust. This is where that history with Apache comes crashing back in; people actually didn't want to just build clients and Firefox and stuff like that. They actually wanted to be able to build network services to do stuff. They started building them with the existing APIs. Rust is supposed to be a high-performance programming language, but the IO model felt like it was out of 1999 - like old-school Apache, spinning up all these big heavyweight threads. It definitely wasn't performing as well as people wanted it to, even though Rust is supposed to be a super high-performance language. How could we claim that we're as fast as C while not literally being as fast as C because IO is a problem? This eventually led us to have to come up with a totally different solution as well.

The Big Problem with Native Threads for IO

The real problem here is that native threads are a bad fit for IO. That's because of one more dichotomy: CPU bound versus IO bound. These are ways to characterize different kinds of computation. A CPU bound computation is when your processor is doing a lot of active work. How fast you can do this work depends on how fast your CPU is able to crunch all those numbers and get that work done, whereas an IO bound computation is one where you're doing a lot of networking, and that often means the speed of completing the task is based on doing a lot of IO. The problem with IO is you're often waiting on other people. If I'm making a request to a server somewhere, it doesn't matter if I have the fastest CPU that exists or a 486.

If it takes the web server one second to respond, I'm going to be waiting one second, because it's not about my CPU; it's about the other thing in the system. That's the real problem here, that when doing a lot of IO, you're actually mostly just waiting around for other people. You're not making use of your CPU's resources to do that kind of stuff. You're just waiting and doing nothing. This is why it scales actually, because it turns out that the fastest thing to do is nothing at all. If you have a ton of people who are all trying to access your website at the same time, but you are waiting on most of them, you can handle even more people because you're not actively doing work for most of your visitors.

This is why asynchronous stuff really scales up well when you're doing IO. It turns out that parallelism is generally better for things that are CPU bound. You still do want both, but native threads and that whole model don't really work very well in this situation, because when you're doing a lot of waiting, if you make a thread for every single incoming connection, you're tying up all the system resources and literally doing nothing with them. It doesn't work.

Some other programming languages have come along which have solved these problems. I'm using Go here because it's popular and people are familiar with it. Erlang does the same thing; Go took this from Erlang. You may recognize this from the earlier part of Rust's story. It has non-blocking but synchronous IO with green threads. What happens is you write your code, and when you do an IO call, the runtime itself will put that task to sleep, and you can map a bunch of goroutines onto a single operating system thread and life is good. You get higher performance. This is totally a super battle-tested, well-worn way to do this kind of thing.

However, previously in this presentation, you may remember this slide: green threads did have a big disadvantage. When you call into C code, it's slower. This is totally fine for Go, because Go is trying to pull everything into Go. That's also totally fine for Erlang because it's not that big of a deal; you're not using Erlang to crunch numbers. But Rust is supposed to be a systems programming language. If we did green threads, what would a systems programming language where it's more expensive to call into C and operating system code even mean? That's also incoherent. We couldn't really copy this model. I want to emphasize that that's not because doing the other thing is the wrong thing; it just turns out that when you're building different kinds of languages, you have different constraints.

Luckily There is Another Way

Our constraint is that we have no overhead calling into C. We designed the language itself around this kind of thing. If we had to pay that cost for IO, that'd be terrible. This is also part of why we got rid of the green threads API that existed a long time ago too. This is a second reason that it didn't work, but also why we couldn't really return to it.

Luckily, there's actually another way. There's this other web server that became popular eventually. That's called Nginx. Nginx did asynchronous IO with this thing called an event loop.

The idea here is this thing called evented IO, where you have these events and you register handlers for each event, and then it fires off the handler whenever the event actually occurs. This means you get really good performance. You also don't need multiple threads because the event loop is one single thread that handles running all these things. Most of the tasks are sleeping, so it's totally fine. You're not spinning up new threads for them. It all just works. This guy named Ryan Dahl back in 2009 released this thing called Node.js. You may have heard of it. This is from the original JSConf talk he gave. It was the first one and it was 2009. The AV setup was not great.
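
A stripped-down model of evented dispatch might look like this in Rust. This is a toy sketch: the event names and handlers are made up, and a real event loop (Nginx's, Node's) sits on top of OS facilities like epoll rather than a pre-filled queue. The shape is the same, though: handlers are registered per event, and one thread dispatches each event as it arrives.

```rust
use std::collections::HashMap;

// Toy single-threaded event loop: register a handler per event name,
// then process a queue of (event, payload) pairs one at a time.
fn dispatch(events: &[(&str, &str)]) -> Vec<String> {
    let mut handlers: HashMap<&str, fn(&str) -> String> = HashMap::new();
    handlers.insert("connect", |who| format!("hello, {}", who));
    handlers.insert("disconnect", |who| format!("goodbye, {}", who));

    let mut log = Vec::new();
    for &(event, payload) in events {
        // One thread, one handler at a time; nothing else is running
        // while a handler executes, which is why handlers must not block.
        if let Some(handler) = handlers.get(event) {
            log.push(handler(payload));
        }
    }
    log
}

fn main() {
    for line in dispatch(&[("connect", "world"), ("disconnect", "world")]) {
        println!("{}", line);
    }
}
```

The comment in the loop is the crucial constraint: because everything shares the single thread, a handler that blocks stalls the whole loop, which is the point made next.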

It says there, "Evented non-blocking IO." This is really important. That's the whole reason. The third bullet point is what Node.js is: server-side JavaScript, built on V8, evented non-blocking IO. The rest of the presentation goes into why IO, and handling IO, is the reason that Node existed. Ryan [Dahl] cared a lot about this; enabling this kind of thing in a programming language was important to him.

Evented IO Requires Non-Blocking APIs

The issue with evented IO is that it really requires non-blocking APIs, because when you combine blocking code and non-blocking code together, things get awkward. That's a whole separate topic that I cannot get into. Fundamentally, the way this works in Node is, if you have a blocking API like this top one, you call fs.readFileSync. This blocks until the file is read and the data comes back. Since we need to do this asynchronously, and we're registering those events so the event loop can process them, this means you have callbacks. Here's the equivalent asynchronous example. You pass a closure into readFile, and you check things inside of there when the closure runs. This is a minor transformation; it looks totally fine. The problem comes when you start building bigger things.

You get this: callback hell. This is from the website callbackhell.com. This is an example that they have of writing JavaScript in this style. The point is, when you start writing this callback stuff, your code starts drifting to the right. You can't really do a whole lot about it; it's awkward. JavaScript people are crafty, and they came up with this thing called promises. A promise is an object that gives you a value that's not necessarily known when the promise is built, the definition says. A promise is either pending, which means that the computation isn't finished yet; fulfilled, meaning that it succeeded; or rejected, meaning the operation failed.

This is fine. This lets you write code that looks like this instead. Here's an example from MDN where you create a promise and then after a certain period of time, 250 milliseconds, you resolve it with success. Then, instead of chaining nested callbacks, you can say myPromise.then and do another thing. We've removed that rightward drift because now we just chain then calls in a line. Now we have a zillion closures vertically instead of a zillion closures indented the whole way, which is better, admittedly, but still not great for a number of other reasons.

The folks over at Twitter also were dealing with scale. They had originally built their stuff in Ruby on Rails and it was not scaling, in part because of IO issues. They ended up rewriting everything on the JVM and becoming really invested in Scala. There was this paper called "Your Server as a Function" that came out. It was talking about how they built high-performance network services at Twitter. It introduces this concept called a future, among other things. A future is like a promise, but different. It's a little tough. Basically, they represent the exact same ideas, but they have a different API. I'm not going to get into that at this particular moment. We saw this happening in the Scala world and we were, "This seems awesome." We knew that callbacks are not great because we'd learned from Node, and promises are better. Futures are a different version of promises and they seemed to fit a little bit better. Scala is also statically typed; we're statically typed, let's do what they're doing.

In August of 2016, we announced zero-cost futures in Rust. We wanted the ability to do this asynchronous programming in a better way. You'll notice that date there, 2016; still chugging along. Here's what futures looked like at 0.1 in 2016. You have this trait called Future. I don't know if all of you are Rust programmers or not, but basically a trait is an interface. This interface has two required types: an item type and an error type. There's this function called poll, which you would call, and it would return either the item or an error. There are also a ton of other convenience functions on here, like and_then, so you get this code at the bottom that looks very similar to that promise code, except it says and_then instead of then, because you can't name everything the same way across languages. That would be too hard.
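
Here's a dependency-free sketch of roughly what that futures 0.1 interface looked like, with a trivial always-ready future. This is a simplification: the real crate's poll also relied on an implicit task context for wakeups, and all the combinators like and_then are omitted.

```rust
// Simplified shape of futures 0.1: poll returns Ready, NotReady, or an error.
enum Async<T> {
    Ready(T),
    NotReady,
}

type Poll<T, E> = Result<Async<T>, E>;

// The interface (trait) itself: an item type, an error type, and poll.
trait Future {
    type Item;
    type Error;
    fn poll(&mut self) -> Poll<Self::Item, Self::Error>;
}

// A trivial future that is ready the first time it is polled.
struct Ready(Option<u32>);

impl Future for Ready {
    type Item = u32;
    type Error = ();
    fn poll(&mut self) -> Poll<u32, ()> {
        Ok(Async::Ready(self.0.take().expect("polled after completion")))
    }
}

fn poll_to_completion() -> u32 {
    let mut f = Ready(Some(42));
    match f.poll() {
        Ok(Async::Ready(v)) => v,
        _ => unreachable!("this future is always immediately ready"),
    }
}

fn main() {
    println!("{}", poll_to_completion());
}
```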

You get the same idea of, "Ok, we can write these things. This is asynchronous computation that processes these things in this order." You build up these giant chains of futures. Then that chain of futures becomes a task. You submit it to the asynchronous system, and it works. Things were totally fine. As I mentioned, promises and futures are a little different. One of the other major differences between promises and futures, other than the interface aspect, is that promises are built into the JavaScript language. JavaScript as a language has a runtime, which means, among other things, that a promise can start executing when you make it, because it's a construct the language knows and it's able to spin stuff up in the background, because JavaScript assumes that everything is asynchronous. Nothing happens in the order that you expect.

This is simpler in a lot of ways than the other model I'm about to describe to you. One of the problems is, if you start executing your thing immediately, you need an allocation for every single thing that you've chained on, because you don't know the full shape of the chain at the time you're building it. It's already running before you've finished putting it together, in some cases. It works, but it's not super efficient. Futures, on the other hand, are not built into Rust, the language; they're entirely a library, because, again, we can't have a runtime with all this built-in stuff. We need to be as low-level as C. We can't afford to do that.

That also means that unlike promises in JavaScript, you build up this chain of futures and nothing actually happens with them at all until you explicitly hand them over to an executor object and say, "Please run this for me in the background." They don't do anything until the poll method is called. This is a little more complicated, but it's also extremely efficient, because you know how big the task is going to be when you create it. Unlike green threads, where you make a small thing and then have to reallocate when it grows too big, we can know at compile time exactly how big a task needs to be, allocate exactly the right amount of memory, and never grow or shrink that stack. You always get a perfectly sized allocation, which is really cool.

It also means you only have one allocation. If you're willing to put certain limits on it, you can even do it without dynamic allocation, entirely statically; this is hyper-efficient. Also, because we have a compiler, and compilers are awesome and do everything at compile time, the futures actually compile down into the state machine you would write by hand if you were doing the evented IO yourself. It becomes an extremely efficient way to do this kind of asynchronous computation. It is also easier to write, because it's not a bunch of nested callbacks.

Here's an example of using futures 0.1. This is using a package called Tokio, which is an executor. You have an IP address and you create a listener. Then listener.incoming returns the incoming connections as a stream of futures. On each of those, you do nothing. Then if there's an error, you print it out. This doesn't do anything useful; it just accepts connections. Then at the bottom here, you'll see Tokio run; you're passing that server, this chain of futures, into the executor, and it executes. That's the model, the way that it worked with futures 0.1.

We Used Futures 0.1 to Build Stuff

We need to know if it's good or not, so we built a bunch of stuff on top of it. That was really cool, but it also had some problems, so there was a thing called futures 0.2. You can see the big difference between 0.1 and 0.2 here: there's this context argument, a task context. Futures 0.1 relied on thread-local storage for some bookkeeping inside the futures package, and relying on thread-local storage means you couldn't use it on things like bare metal. That became a problem. The intention was: what if we made this context explicit? That would help alleviate the situation.

That was ok, but we knew it wasn't perfect, and we were going to do more stuff. Someone asked, "Do you suggest that the ecosystem should break between one and two? Or should we wait for three?" Aaron, who wrote this blog post, said, "You shouldn't move everything over. Two is a good enough snapshot. We're going to have three coming out in a couple of months or less." Then people said, "That seems a little bad, because you need people to try it out in order to get feedback, but if none of your libraries update, how are you supposed to try it out? And if it's going to change again, why would anyone try it out in the first place?"

This person was smart, because, as you can see, one year ago we said it should be coming in a couple of months. That's been the situation. Going back to JavaScript for a moment: async/await. Here are three examples of writing the same code. We have the original callback style: you make a request, you pass in a callback. Then there are promises, where it's similar, but you chain then. It looks about the same in this small example, but it avoids that right-hand drift I showed you before. Finally - C# actually conceived this feature originally, but since I was using JavaScript earlier, I decided to keep it in JavaScript - you say you have an async function instead. Then you write await request. This actually does a transformation: to do the same thing as the code above, we get to write it in a way that looks like it's blocking. It goes back to writing code in a normal fashion instead of a weird, nested chain of things. Yes, C# actually invented this first. The JavaScript folks said, "This will save us from a million terrible promises." Life is good over there, because it lets you write code that looks synchronous but is actually asynchronous.

Async/Await is More Important in Rust Than in Other Languages

Here's the thing that's interesting, which we didn't actually realize until we played with futures 0.2 even more: async/await is actually more significant in Rust than in other languages. In C# and JavaScript it's mostly a convenience feature, because writing promises by hand is a little terrible, but it does work. In Rust, there's no garbage collector. Remember that thing about how, in the promise world, you create all these individually allocated things everywhere? We don't do that with futures; instead, you're keeping track of how one future's state feeds into the next future in the chain, and it turns out that's really hard: Rust's compiler didn't understand it at all. Here's an example of a synchronous API that does a read. This is a function that takes self and a buffer and returns how many bytes were read, or an error. It would look like this: you have the buffer, you have a cursor, you read in 124-byte chunks and you put stuff in the cursor. It's a very straightforward synchronous API.

How do we turn this into futures? We have this great design for futures; how do we replace the synchronous version with the async version? The type signature turns into this instead. It turns out that all the code you need is actually too big to fit on the slide, so I didn't even include it. The reason is that once you start chaining these futures together, the Rust compiler doesn't understand how you're using memory anymore. It gives you the most restrictive thing possible, which means that your life is not good. It actually is really terrible. The root cause is that you create the thing first, and then you give it to the executor, and then it executes.

The constraints for checking that all the memory works the right way are different when you make the future than when you actually hand it off to be executed. That change is the thing the compiler had no idea how to handle whatsoever. It meant we needed to introduce more allocations and more runtime checking for a thing that does not actually need them, just to satisfy the compiler. This is similar to the green threads versus native threads story, where the green threads got to be too expensive: this supposedly hyper-efficient thing didn't actually deliver for any real-world code. It turned out the code you had to write to use the hyper-efficient thing was very inefficient. We were back to a place where this is not fast and it's hard to write. Life is terrible.

Async/await to the rescue. With async/await, the compiler and the borrow checker can understand the way that these things interact with each other. Your life becomes good again. Here's that same API, but with async/await. You'll notice the only difference is an async block wrapped around it and a .await there. Otherwise, it looks exactly the same as the original blocking code. We've now solved these ergonomic issues through async/await. But there were other problems; there are always more problems. It turns out that not all futures can error. Because we had conceived of futures as being an IO thing, we baked the error type into the future itself. But it's not just an IO thing, because once you start using futures, they infect your entire code base, and not all of the things you're doing can actually fail. This means you add a ton of boilerplate for things that are infallible, which is terrible because you're making easy things more complicated than they need to be.

The final design that we landed on for futures, and the one that is going into the standard library soon, is the standard Future. It now has a single output type. The signature has changed in a bunch of other interesting ways. There's this pin thing; we still have our context, but now we just return the output type. Pin is part of how async/await teaches the borrow checker how the code works. I'm not going to get into it, because frankly, I could give you a 45-minute talk just on pin itself. It's complicated, and it doesn't matter here. The only people who ever interact with this particular part of the API are people writing low-level libraries in Rust; you as a user do not have to think about it. I'm just going to ignore that it exists, other than to say that it's what taught the borrow checker how to do these things.

Then finally, if you need a future that can error, you just use a regular old Result type as the output type. Futures that don't have errors are fine. Futures that do have errors still work. Life is good, isn't it? No, it turns out there are more problems. What syntax should async/await use? You would think this is a simple question, but when you're designing a programming language, you have to think about edge cases, and there are a lot of edge cases specific to Rust that other languages didn't have to deal with. JavaScript and C# both do "await", space, value. Nice and straightforward. Everything is fine.

We have this question mark thing for error handling. You're going to use it a lot, because async IO has errors. It's IO; it happens. What happens if you were to copy that syntax, but you needed to use the question mark for error handling? Does "await value?" parenthesize like this, or like this? That is, does the question mark apply to the future itself, or to the result of the await? When you're writing this, what you want is almost always the second one, but the syntax implies the first one. It was, "Should we change the precedence of the question mark operator, but only when await exists?" Some people said, "That does not seem like a good idea." Other people said, "That's the only possible choice."

We did what most good programmers do in this situation. One more thing: you also end up chaining these awaits together, and with the space-based syntax that gets nasty really quickly. This still happens in JavaScript sometimes, in my understanding; I'm not a super big JavaScript person. Anyway, we did what everyone does on the internet. We argued, and we argued, and then we argued some more. People proposed every possible syntax for this. People were saying, "Maybe we should use an upside-down question mark for the one that has the different precedence." Every possible aspect of this design was fought over. It's not a joke. These are multiple threads on our discourse instance: four hundred and seventy-eight comments on one of those, 212 on the other one. Somebody else posted this screenshot of their email inbox. You can see those numbers - 25, 35, 100, 67, 8 - almost all of these are about async/await, all individual threads. These are just the top two. There are actually more threads than this, like 168 and 133. People really like talking about syntax, as it turns out. The problem was that everybody hated all the options that were not their favorite, but there was no clear winner, because that's just what happens a lot of times with syntax. Everything has drawbacks, everything has positives.

At some point, we decided, "We can't let this incredibly important feature for Rust stay stuck on a syntax decision." We picked one. By "we," I mean other people. I didn't have to do that, thank God. As you'll notice, we have this .await here; you may have noticed that it's different from the C# thing. With no errors, you just write .await. No parentheses, because it's not a method call; it's a keyword. With errors, you just add the question mark. Everything looks great. Some people really viscerally hate this; other people say, "It seems weird at first, and then 15 minutes later, I forgot about it. It's fine."

I think that in the end, that's really how it's going to shake out. We ended up with a weird await syntax. It just happens. It's the best choice for Rust, I think, in the end. It was not my preferred option, but I just want to be able to use the feature, and I think it's good after using it for a while. There's other stuff we're doing around async/await as well that brings additional ergonomic improvements. This is an example of a crate called runtime. This is a UDP example that just repeats what you say to it. There's this runtime main annotation on the main function. You know how you have to submit your futures to an executor? It turns out this is a macro that does that for you. You get code that looks like the Node example, where you don't submit anything, but it turns out it rewrites the code so that you do, and that's kind of cool. We end up looking more and more similar to Node. This has no syntax highlighting, so it gets a little messy, but it looks better in an editor.

One really wild thing is that you can use async/await in WebAssembly with Rust. This is an example of a program that I wrote recently with async/await; it fetches some information from a database, and then there's some actual processing here that I chopped out. What it does is return that information as part of the response. You'll notice here this wasm entry file returns a promise. That's a JavaScript promise bound to Rust. There's a call to get in here, data.get. That is a call to a database that returns a promise - a JavaScript promise, because it's a JavaScript API to the database - and you need to do stuff with it. So you turn it into a future with JsFuture::from; this is a specific kind of future that knows how to wrap a JavaScript promise and turn it into a Rust future.

Then you can call await on it. At the end, there's this future_to_promise, which will take any future you have and convert it back into a JavaScript promise. Yes, this code in wasm is a JavaScript promise wrapped up in a Rust future, wrapped up in a JavaScript promise, and then returned through Rust back into WebAssembly back into JavaScript. This works, it's totally fine - depending on your definition of fine, I guess. If you go to my website right now, this code will execute. It's a thing. You can do all sorts of fun stuff with compilers. They're great.

Finally Landing in Rust 1.37, or Maybe 1.38

As I wrap up this long and tortured story about all these different options, and how we went through every single possible thing that could have existed and figured out that all of them were terrible until our current situation: this is finally landing in Rust 1.37, or maybe Rust 1.38, depending on that last little tiny issue I was talking about, which is fine. However, I wrote this slide before this morning, when I woke up, was getting ready to go to the conference, looked at Twitter, and found that it is actually wrong now. It turns out that earlier this morning, boats tweeted that we are actually stabilizing async/await for real, and that means it will land in Rust 1.38 for sure.

Finally, that little tiny thing doesn't matter. We figured it out; it's cool. Rust 1.38 is landing on, I think, the 15th of August, so roughly in August. We did it. Years later - in some sense starting in the late '90s, but that's before Rust; technically starting around the 2010, 2011 era - eight or nine years later, we finally figured out how to do IO in Rust efficiently, and it works. If you look at the latest round of the TechEmpower benchmarks - a thing that benchmarks different web frameworks - well, this is not the latest official report; this is a build that happened last night or last week. Actix is one of the entries, written in Rust. You can see that it is dominating the benchmarks. There are a whole bunch of other frameworks below it. The next closest thing runs at 62% of its speed. We're able to get really high performance out of Rust network services, thanks to spending all this time sorting through all these examples. Working through every possible thing led us to the fastest thing in the end, which is also reasonably ergonomic to write.

Two lessons to take away in my last 30 seconds here: first, it turns out that a world-class IO system implementation takes years. You can't just make it happen. Second, different languages have different constraints. This means that you can't really argue that the Go or Erlang style is better or worse than the Rust style; they have different pros and cons in all directions. Just because you like a thing in one language and you move to a different language does not mean that the other language did it a different way for a wrong reason. We couldn't always just reuse things from other languages, because we had different constraints in our designs.