Playing with bhyve

Here’s a look at Gea’s popular All-in-one design, which allows VMware to run on top of ZFS on a single box using a virtual 10GbE storage network. The design requires an HBA and a CPU that supports VT-d so that the storage can be passed directly to a guest VM running a ZFS server (such as OmniOS or FreeNAS). A virtual storage network is then used to share the storage back to VMware.

VMware and ZFS: All-In-One Design

bhyve can simplify this design: since it runs under FreeBSD, it already has a ZFS server built in. This not only simplifies the design, it could also allow a hypervisor to run on simpler, less expensive hardware. The same design under bhyve eliminates the need for a dedicated HBA and a CPU that supports VT-d.
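As a rough sketch of what that buys you, the guest’s disk becomes a local zvol instead of an NFS export looped back through a virtual switch. Pool, guest, and device names below are made up, and the guest kernel must first be loaded (with bhyveload for FreeBSD guests, or grub-bhyve for others):

```shell
# A zvol serves as the guest's disk directly; no NFS, no VT-d passthrough
zfs create -V 20G tank/vm/guest0

# Run the guest with virtio block and network devices
bhyve -c 2 -m 2G -A -H -P \
  -s 0:0,hostbridge \
  -s 1:0,virtio-blk,/dev/zvol/tank/vm/guest0 \
  -s 2:0,virtio-net,tap0 \
  -s 31,lpc -l com1,stdio \
  guest0
```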

Simpler bhyve design

I’ve never understood the advantage of Type-1 hypervisors (such as VMware and Xen) over Type-2 hypervisors (like KVM and bhyve). Type-1 proponents say the hypervisor runs on bare metal instead of on an OS… I’m not sure how VMware isn’t considered an OS, except that it’s purpose-built and probably smaller. It seems you could take a Linux distribution running KVM and strip away features until, at some point, it becomes a Type-1 hypervisor. Which is all fine, but it could actually be a disadvantage if you wanted some of those features (like ZFS). A Type-2 hypervisor that supports ZFS appears to have a clear advantage, at least theoretically, over a Type-1 for this kind of setup.

In fact, FreeBSD may end up becoming the best all-in-one virtualization/storage platform. You get ZFS and bhyve, and also jails. You really only need to run bhyve when virtualizing a different OS.

bhyve is still pretty young, but I thought I’d run some tests to see where it’s at…

Environments

This is running on my X10SDV-F Datacenter in a Box Build.

In all environments the following parameters were used:

Supermicro X10SDV-F

Xeon D-1540

32GB ECC DDR4 memory

IBM ServeRAID M1015 flashed to IT mode.

4 x HGST Ultrastar 7K300 2TB enterprise drives in RAID-Z.

One Intel DC S3700 100GB, over-provisioned to 8GB, used as the log device.

No L2ARC.

Compression = LZ4

Sync = standard (unless specified).

Guest (where tests are run): Ubuntu 14.04 LTS, 16GB disk, 4 cores, 1GB memory.

OS defaults were left as-is; I didn’t try to tweak the number of NFS servers, sd.conf, etc.

My tests fit inside the ARC. I ran each test 5 times on each platform to warm up the ARC; the results are the average of the next 5 runs.

I only tested an Ubuntu guest because it’s the only distribution I run (in quantity, anyway) in addition to FreeBSD; a more thorough test would include other operating systems.
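The pool parameters above can be reproduced with a few commands; the pool name and device names below are examples, not the author’s actual layout:

```shell
# 4-disk RAID-Z pool with a separate log (SLOG) device
zpool create tank raidz da0 da1 da2 da3 log da4

# Match the dataset settings listed above
zfs set compression=lz4 tank
zfs set sync=standard tank   # the ZFS default; one test environment flips this to always
```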

The environments were setup as follows:

1 – VM under ESXi 6 using NFS storage from FreeNAS 9.3 VM via VT-d

FreeNAS 9.3 installed under ESXi.

FreeNAS is given 24GB memory.

HBA is passed to it via VT-d.

Storage shared with VMware via NFSv3, virtual storage network on VMXNET3.

Ubuntu guest is given VMware para-virtual drivers

2 – VM under ESXi 6 using NFS storage from OmniOS VM via VT-d

OmniOS r151014 LTS installed under ESXi.

OmniOS is given 24GB memory.

HBA is passed to it via VT-d.

Storage shared with VMware via NFSv3, virtual storage network on VMXNET3.

Ubuntu guest is given VMware para-virtual drivers

3 – VM under FreeBSD bhyve

bhyve running on FreeBSD 10.1-RELEASE

Guest storage is file image on ZFS dataset.

4 – VM under FreeBSD bhyve sync always

bhyve running on FreeBSD 10.1-RELEASE

Guest storage is file image on ZFS dataset.

Sync=always
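Environments 3 and 4 can be sketched roughly as follows. Paths, names, and sizes are placeholders; since the guest is Linux, the loader step uses the sysutils/grub2-bhyve port rather than bhyveload (which only boots FreeBSD guests):

```shell
# A file image on a ZFS dataset backs the guest disk
zfs create tank/vm
truncate -s 16G /tank/vm/ubuntu0.img

# Environment 4 only: force every write to stable storage
zfs set sync=always tank/vm

# Boot the Linux guest's kernel with grub-bhyve, then run it
kldload vmm
printf '(hd0) /tank/vm/ubuntu0.img\n' > /tank/vm/device.map
grub-bhyve -m /tank/vm/device.map -r hd0,msdos1 -M 1024 ubuntu0
bhyve -c 4 -m 1G -A -H -P \
  -s 0:0,hostbridge \
  -s 1:0,virtio-blk,/tank/vm/ubuntu0.img \
  -s 2:0,virtio-net,tap0 \
  -s 31,lpc -l com1,stdio \
  ubuntu0
```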

Benchmark Results

MariaDB OLTP Load

This test is a mix of CPU and storage I/O. bhyve (yellow) pulls ahead in the 2-thread test, probably because it doesn’t have to issue a sync after each write. However, it falls behind on the 4-thread test even with that advantage, probably because it isn’t as efficient at CPU-bound work as VMware (see the next chart on finding primes).
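The post doesn’t say which load generator produced the OLTP mix; sysbench’s MySQL OLTP test is a common choice and would look roughly like this (database name, credentials, and sizes are hypothetical):

```shell
# Hypothetical sysbench OLTP run against the guest's MariaDB instance
sysbench --test=oltp \
  --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest \
  --oltp-table-size=1000000 prepare

sysbench --test=oltp \
  --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest \
  --oltp-table-size=1000000 \
  --num-threads=4 --max-time=60 --max-requests=0 run
```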



Finding Primes

Finding prime numbers with a VM under VMware is significantly faster than under bhyve.

Random Read

bhyve has an advantage, probably because it has direct access to ZFS.

Random Write

With sync=standard, bhyve has a clear advantage. I’m not sure why VMware outperforms bhyve at sync=always. I’m merely speculating, but I wonder if VMware over NFS coalesces smaller writes into larger blocks (maybe 64K or 128K) before sending them to the NFS server.

Random Read/Write

Sequential Read

Sequential reads are faster with bhyve’s direct storage access.

Sequential Write

What not having to sync every write will gain you…

Sequential Rewrite

Summary

VMware is a very fine virtualization platform that’s been well tuned. All the overhead of VT-d, virtual 10GbE switches for the storage network, VM storage over NFS, etc. doesn’t hurt its performance, except perhaps on sequential reads.

For as young as bhyve is, I’m happy with its performance compared to VMware, though it appears to be slower on the CPU-intensive tests. I didn’t set out to compare CPU performance, so I haven’t run a wide enough variety of tests to see exactly where the difference lies, but VMware appears to have the advantage there.

One thing that is not clear to me is how safe running sync=standard is on bhyve. The ideal scenario would be honoring fsync requests from the guest; however, I’m not sure whether bhyve has that kind of insight into the guest. The worst case under sync=standard is probably losing the last 5 seconds of writes, and even that risk can be mitigated with a battery backup. With standard sync there’s a lot of performance to be gained over VMware with NFS. And even if you run bhyve with sync=always, it does not perform badly; it even outperforms the VMware all-in-one design on some tests.
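If the fsync-passthrough question is a concern, the exposure can be confined by forcing synchronous semantics on just the datasets that back VMs; the dataset name below is an example:

```shell
# Trade performance for safety on the VM dataset only
zfs set sync=always tank/vm

# Verify; child datasets inherit the setting
zfs get -r sync tank/vm
```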

The upcoming FreeNAS 10 may be an interesting hypervisor + storage platform, especially if it provides a GUI to manage bhyve.