We have done a lot of performance testing of OVN over time, but one major thing missing has been an apples-to-apples comparison with the current OVS-based OpenStack Neutron backend (ML2+OVS). I’ve been working with a group of people to compare the two OpenStack Neutron backends. This is the first piece of those results: the control plane. Later posts will discuss data plane performance.

Control Plane Differences

The ML2+OVS control plane follows a pattern seen throughout OpenStack: a series of agents written in Python. The Neutron server communicates with these agents using an RPC mechanism built on top of AMQP (RabbitMQ in most deployments, including our tests).

OVN takes a distributed, database-driven approach. Configuration and state are managed through two databases: the OVN northbound and southbound databases. Both are currently based on OVSDB. Instead of receiving updates via RPC, components watch the relevant portions of the database for changes and apply them locally. More detail about these components can be found in my post about the first release of OVN, and even more in the ovn-architecture document.
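The watch-and-apply pattern can be illustrated with a small stdlib-only Python sketch. To be clear, this is a toy illustration of the pattern, not the actual OVSDB monitor/notify protocol or IDL API that OVN uses:

```python
import threading

class WatchedTable:
    """Toy stand-in for a database table that notifies subscribers on
    every change. Illustrative only -- real OVN components use OVSDB's
    monitor/notify mechanism, not this class."""

    def __init__(self):
        self._rows = {}
        self._watchers = []
        self._lock = threading.Lock()

    def watch(self, callback):
        # Register a callback, analogous to monitoring a table.
        self._watchers.append(callback)

    def update(self, key, value):
        # Write a row, then notify every watcher of the change.
        with self._lock:
            old = self._rows.get(key)
            self._rows[key] = value
        for cb in self._watchers:
            cb(key, old, value)

# A hypervisor-local component (in the role of ovn-controller) reacts
# to changes it observes in the database rather than waiting for RPC.
applied = []
table = WatchedTable()
table.watch(lambda key, old, new: applied.append((key, new)))
table.update("logical_port_1", {"up": True})
print(applied)  # [('logical_port_1', {'up': True})]
```

The key design point is that the writer (the Neutron plugin updating the northbound database) never addresses any consumer directly; each consumer pulls the state it cares about and converges on it locally.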

OVN does not make use of any of the Neutron agents. Instead, all required functionality is implemented by ovn-controller and OVS flows, including security groups, DHCP, L3 routing, and NAT.

Hardware and Software

Our testing was done in a lab using 13 machines which were allocated to the following functions:

1 OpenStack TripleO Undercloud for provisioning

3 Controllers (OpenStack and OVN control plane services)

9 Compute Nodes (Hypervisors)

The hardware had the following specs:

2x Intel Xeon E5-2620 v2 (12 total cores, 24 total threads)

64 GB RAM

4x 1 TB SATA

1x Intel X520 dual-port 10G



Software:

CentOS 7.2

OpenStack, OVS, and OVN from their master branches (early December, 2016)

Neutron configuration notes:

(OVN) 6 API workers and 1 RPC worker for neutron-server on each controller (x3) — RPC is not used, but Neutron requires at least 1 RPC worker

(ML2+OVS) 6 API workers and 6 RPC workers for neutron-server on each controller (x3)

(ML2+OVS) DVR was enabled
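For reference, the worker counts above correspond to standard neutron.conf options. A sketch of the OVN-side settings on each controller might look like this (the values are from our test configuration; the option names are the standard Neutron ones):

```ini
[DEFAULT]
# OVN backend: RPC is unused, but neutron-server requires at least 1 RPC worker
api_workers = 6
rpc_workers = 1
```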



Test Configuration

The tests were run using OpenStack Rally. We used the Browbeat project to easily set up, configure, and run the tests, as well as to store, analyze, and compare results. The Rally portion of the Browbeat configuration was:

rerun: 3
...
rally:
  enabled: true
  sleep_before: 5
  sleep_after: 5
  venv: /home/stack/rally-venv/bin/activate
  plugins:
    - netcreate-boot: rally/rally-plugins/netcreate-boot
    - subnet-router-create: rally/rally-plugins/subnet-router-create
    - neutron-securitygroup-port: rally/rally-plugins/neutron-securitygroup-port
  benchmarks:
    - name: neutron
      enabled: true
      concurrency:
        - 8
        - 16
        - 32
      times: 500
      scenarios:
        - name: create-list-network
          enabled: true
          file: rally/neutron/neutron-create-list-network-cc.yml
        - name: create-list-port
          enabled: true
          file: rally/neutron/neutron-create-list-port-cc.yml
        - name: create-list-router
          enabled: true
          file: rally/neutron/neutron-create-list-router-cc.yml
        - name: create-list-security-group
          enabled: true
          file: rally/neutron/neutron-create-list-security-group-cc.yml
        - name: create-list-subnet
          enabled: true
          file: rally/neutron/neutron-create-list-subnet-cc.yml
    - name: plugins
      enabled: true
      concurrency:
        - 8
        - 16
        - 32
      times: 500
      scenarios:
        - name: netcreate-boot
          enabled: true
          image_name: cirros
          flavor_name: m1.xtiny
          file: rally/rally-plugins/netcreate-boot/netcreate_boot.yml
        - name: subnet-router-create
          enabled: true
          num_networks: 10
          file: rally/rally-plugins/subnet-router-create/subnet-router-create.yml
        - name: neutron-securitygroup-port
          enabled: true
          file: rally/rally-plugins/neutron-securitygroup-port/neutron-securitygroup-port.yml

This configuration defines several scenarios to run. Each one is set to run 500 times, at three different concurrency levels. Finally, “rerun: 3” at the beginning says we run the entire configuration 3 times. This is a bit confusing, so let’s look at one example.

The “netcreate-boot” scenario creates a network and boots a VM on that network. The configuration results in the following execution:

Run 1

Create 500 VMs, each on its own network, 8 at a time, and then clean up

Create 500 VMs, each on its own network, 16 at a time, and then clean up

Create 500 VMs, each on its own network, 32 at a time, and then clean up

Run 2

Create 500 VMs, each on its own network, 8 at a time, and then clean up

Create 500 VMs, each on its own network, 16 at a time, and then clean up

Create 500 VMs, each on its own network, 32 at a time, and then clean up

Run 3

Create 500 VMs, each on its own network, 8 at a time, and then clean up

Create 500 VMs, each on its own network, 16 at a time, and then clean up

Create 500 VMs, each on its own network, 32 at a time, and then clean up



In total, we will have created 4500 VMs.
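The total follows directly from the configuration: times × concurrency levels × reruns. A quick sanity check:

```python
times = 500             # "times: 500" in the Rally config
concurrency_levels = 3  # 8, 16, and 32
reruns = 3              # "rerun: 3"

total_vms = times * concurrency_levels * reruns
print(total_vms)  # 4500
```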

Results

Browbeat includes the ability to store all Rally test results in Elasticsearch and display them using Kibana. A live dashboard of these results is available at elk.browbeatproject.org.

The following tables show the average, 95th percentile, maximum, and minimum times, in seconds, for all APIs executed throughout the test scenarios.

| API | ML2+OVS Average | OVN Average | % improvement |
| --- | --- | --- | --- |
| nova.boot_server | 80.672 | 23.45 | 70.93% |
| neutron.list_ports | 6.296 | 6.478 | -2.89% |
| neutron.list_subnets | 5.129 | 3.826 | 25.40% |
| neutron.add_interface_router | 4.156 | 3.509 | 15.57% |
| neutron.list_routers | 4.292 | 3.089 | 28.03% |
| neutron.list_networks | 2.596 | 2.628 | -1.23% |
| neutron.list_security_groups | 2.518 | 2.518 | 0.00% |
| neutron.remove_interface_router | 3.679 | 2.353 | 36.04% |
| neutron.create_port | 2.096 | 2.136 | -1.91% |
| neutron.create_subnet | 1.775 | 1.543 | 13.07% |
| neutron.delete_port | 1.592 | 1.517 | 4.71% |
| neutron.create_security_group | 1.287 | 1.372 | -6.60% |
| neutron.create_network | 1.352 | 1.285 | 4.96% |
| neutron.create_router | 1.181 | 0.845 | 28.45% |
| neutron.delete_security_group | 0.763 | 0.793 | -3.93% |
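The “% improvement” column is the relative reduction in time, (ML2+OVS − OVN) / ML2+OVS × 100, so negative values mean OVN was slower. For example, using the average times above:

```python
def improvement(ml2_ovs, ovn):
    """Relative time reduction of OVN vs. ML2+OVS, as a percentage."""
    return (ml2_ovs - ovn) / ml2_ovs * 100

# nova.boot_server average times in seconds, from the table above
print(round(improvement(80.672, 23.45), 2))  # 70.93

# A negative result means OVN took longer, e.g. neutron.list_ports
print(round(improvement(6.296, 6.478), 2))   # -2.89
```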

| API | ML2+OVS 95% | OVN 95% | % improvement |
| --- | --- | --- | --- |
| nova.boot_server | 163.2 | 35.336 | 78.35% |
| neutron.list_ports | 11.038 | 11.401 | -3.29% |
| neutron.list_subnets | 10.064 | 6.886 | 31.58% |
| neutron.add_interface_router | 7.908 | 6.367 | 19.49% |
| neutron.list_routers | 8.374 | 5.321 | 36.46% |
| neutron.list_networks | 5.343 | 5.171 | 3.22% |
| neutron.list_security_groups | 5.648 | 5.556 | 1.63% |
| neutron.remove_interface_router | 6.917 | 4.078 | 41.04% |
| neutron.create_port | 5.521 | 4.968 | 10.02% |
| neutron.create_subnet | 4.041 | 3.091 | 23.51% |
| neutron.delete_port | 2.865 | 2.598 | 9.32% |
| neutron.create_security_group | 3.245 | 3.547 | -9.31% |
| neutron.create_network | 3.089 | 2.917 | 5.57% |
| neutron.create_router | 2.893 | 1.92 | 33.63% |
| neutron.delete_security_group | 1.776 | 1.72 | 3.15% |

| API | ML2+OVS Maximum | OVN Maximum | % improvement |
| --- | --- | --- | --- |
| nova.boot_server | 221.877 | 47.827 | 78.44% |
| neutron.list_ports | 29.233 | 32.279 | -10.42% |
| neutron.list_subnets | 35.996 | 17.54 | 51.27% |
| neutron.add_interface_router | 29.591 | 22.951 | 22.44% |
| neutron.list_routers | 19.332 | 13.975 | 27.71% |
| neutron.list_networks | 12.516 | 13.765 | -9.98% |
| neutron.list_security_groups | 14.577 | 13.092 | 10.19% |
| neutron.remove_interface_router | 35.546 | 9.391 | 73.58% |
| neutron.create_port | 53.663 | 40.059 | 25.35% |
| neutron.create_subnet | 46.058 | 26.472 | 42.52% |
| neutron.delete_port | 5.121 | 5.149 | -0.55% |
| neutron.create_security_group | 14.243 | 13.206 | 7.28% |
| neutron.create_network | 32.804 | 32.566 | 0.73% |
| neutron.create_router | 14.594 | 6.452 | 55.79% |
| neutron.delete_security_group | 4.249 | 3.746 | 11.84% |

| API | ML2+OVS Minimum | OVN Minimum | % improvement |
| --- | --- | --- | --- |
| nova.boot_server | 18.665 | 3.761 | 79.85% |
| neutron.list_ports | 0.195 | 0.22 | -12.82% |
| neutron.list_subnets | 0.252 | 0.187 | 25.79% |
| neutron.add_interface_router | 1.698 | 1.556 | 8.36% |
| neutron.list_routers | 0.185 | 0.147 | 20.54% |
| neutron.list_networks | 0.21 | 0.174 | 17.14% |
| neutron.list_security_groups | 0.132 | 0.184 | -39.39% |
| neutron.remove_interface_router | 1.557 | 1.057 | 32.11% |
| neutron.create_port | 0.58 | 0.614 | -5.86% |
| neutron.create_subnet | 0.42 | 0.416 | 0.95% |
| neutron.delete_port | 0.464 | 0.46 | 0.86% |
| neutron.create_security_group | 0.081 | 0.094 | -16.05% |
| neutron.create_network | 0.113 | 0.179 | -58.41% |
| neutron.create_router | 0.077 | 0.053 | 31.17% |
| neutron.delete_security_group | 0.092 | 0.104 | -13.04% |

Analysis

The most drastic difference in results is for “nova.boot_server”. It is also the one piece of these tests that actually measures the time to provision the network, rather than just the time to load Neutron with configuration.

When Nova boots a server, it blocks waiting for an event from Neutron indicating that a port is ready before it sets the server state to ACTIVE and powers on the VM. Both ML2+OVS and OVN implement this mechanism. Our test scenario measured the time it took for servers to become ACTIVE.

Further tests on ML2+OVS showed that disabling this synchronization between Nova and Neutron brought the results back on par with the OVN results. This confirmed that the extra time was indeed spent waiting for Neutron to report that ports were ready.

To be clear, you should not disable this synchronization. The only reason it can be disabled at all is that not all Neutron backends support it (both ML2+OVS and OVN do). The mechanism was put in place to avoid a race condition: it ensures that the network is actually ready for use before a VM boots. The real issue is how long Neutron takes to provision the network. Further analysis is needed to break down where Neutron (ML2+OVS) spends most of its time in the provisioning process.
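For completeness, the synchronization in question is controlled by standard Nova configuration options on the compute nodes. Disabling it for a test like ours looks roughly like the following nova.conf fragment (again, do not do this in production):

```ini
[DEFAULT]
# WARNING: test-only settings. These disable the Nova/Neutron port-ready
# synchronization and reintroduce the race condition described above.
vif_plugging_is_fatal = False
vif_plugging_timeout = 0
```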