In this blog post I will describe how to port hyper to use may for asynchronous IO. You will see how easy it is to port thread-based code to coroutine-based code, and also get the high performance that may delivers.

About Hyper

hyper is Rust's most important HTTP library. It is widely used and aims to give users the best asynchronous IO experience. Currently hyper uses tokio for async IO, but it also has a v0.10.x branch that is thread based and is used by the Rocket project.

I will port the thread-based version of hyper to a coroutine-based version.

The test server is the simple hello example in hyper, and I will list all the bench results as well.

Bench Settings

Machine Specs:

Logical Cores: 4 (4 cores x 1 thread)

Memory: 4GB ECC DDR3 @ 1600MHz

Processor: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz

Operating System: Ubuntu VirtualBox guest

Bench client

wrk, invoked as `wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200` (a 10-second run with 2 wrk threads and 200 concurrent connections)

The master branch, using tokio

Suppose you have cloned the hyper repo locally. Just check out the master branch and run the following commands to start the server:

```
$ git checkout origin/master
$ cargo run --example=hello --release
......
Listening on http://127.0.0.1:3000 with 1 thread.
```



and the bench result is

```
$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.30ms    0.92ms  18.82ms   93.04%
    Req/Sec    44.99k     7.35k   51.85k    88.50%
  895392 requests in 10.03s, 110.15MB read
Requests/sec:  89283.87
Transfer/sec:     10.98MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200  1.72s user 5.70s system 73% cpu 10.046 total
```



The v0.10.x branch, using threads

Check out the thread-based code and run the server. By default it will spawn enough threads to keep all CPUs busy.

```
$ git checkout origin/0.10.x
$ cargo run --example=hello --release
......
Listening on http://127.0.0.1:3000
```



and the bench result is:

```
$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    52.54us   56.39us  12.14ms   99.36%
    Req/Sec    74.67k     7.90k   91.51k    65.00%
  742924 requests in 10.04s, 62.35MB read
Requests/sec:  73998.86
Transfer/sec:      6.21MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200  0.85s user 5.70s system 65% cpu 10.043 total
```



You can see that the performance is not as good as the previous run, because the thread-based model doesn't support async IO.

The coroutine-based version

Now apply the patch to the thread-based version. The patch contains only a few changes: it replaces the necessary std APIs with may APIs and turns the thread pool into 1000 coroutines.

```
$ git checkout -b coroutine
$ git remote add may https://github.com/Xudong-Huang/hyper
$ git fetch may
$ git cherry-pick may/master
$ git show HEAD --stat
......
 Cargo.toml             |  1 +
 examples/hello.rs      |  2 ++
 src/client/pool.rs     |  3 ++-
 src/lib.rs             |  1 +
 src/net.rs             |  4 ++--
 src/server/listener.rs | 19 +++++++++++--------
 src/server/mod.rs      |  7 ++++---
 src/server/response.rs |  7 ++++---
 8 files changed, 27 insertions(+), 17 deletions(-)
```
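The diffstat is small because may deliberately mirrors std's blocking-style networking API. A hypothetical before/after sketch of the pattern (not the actual patch; `handle` and `stream` are stand-ins):

```rust
// Before (0.10.x): std types, one OS thread per worker.
//     use std::net::TcpListener;
//     std::thread::spawn(move || handle(stream));
//
// After: may's drop-in types, one coroutine per worker.
//     use may::net::TcpListener;
//     may::go!(move || handle(stream));
//
// Since may's TcpStream exposes the same blocking-style Read/Write
// interface, the request-handling code itself does not change; only
// the imports and the spawn sites do.
```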



First, modify examples/hello.rs to use only one IO worker thread:

```rust
may::config().set_io_workers(1).set_stack_size(0x2000);
```
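For context, a hedged reading of that one-liner, based on may's builder-style config API (the interpretation is mine; the post itself doesn't spell it out):

```rust
may::config()
    .set_io_workers(1)       // number of OS threads polling IO events
    .set_stack_size(0x2000); // per-coroutine stack: 0x2000 = 8 KiB,
                             // versus megabytes for a default OS thread
                             // stack, which is what makes 1000
                             // coroutines cheap to keep around
```

One IO worker makes this run comparable to the tokio example, which listens with 1 thread.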



and the test result is

```
$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.26ms    6.48ms 105.13ms   98.35%
    Req/Sec    38.84k     3.21k   45.75k    85.00%
  773544 requests in 10.05s, 64.92MB read
Requests/sec:  76956.66
Transfer/sec:      6.46MB
```



OK, it's almost the same as the thread version.

Now change the IO workers to 3 and see what happens:

```rust
may::config().set_io_workers(3).set_stack_size(0x2000);
```



bench result is:

```
$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.50ms    2.10ms  29.28ms   89.43%
    Req/Sec    95.76k    19.05k  139.73k    58.50%
  1910441 requests in 10.07s, 160.33MB read
Requests/sec: 189790.65
Transfer/sec:     15.93MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200  1.86s user 10.68s system 124% cpu 10.071 total
```



Much better now!

Conclusion