Wow, that took a LOT of digging to find the issue. Along the way I also solved a performance problem I was having: it turns out that Horizon doesn’t work very well at all with Google Chrome. Firefox it is from now on! With the slow console issue solved, I was able to start digging with tcpdump to isolate my problem.

After looking at traffic flow both on an instance (I used NTS since it had all the network tools I needed in a live disk) and on the compute host, I discovered that the compute host was actually seeing the traffic from the instance, but it was not passing it to or from the provider interface. That meant that all traffic was dying at the compute node, so instances could not get DHCP leases and their ICMP and ARP requests went unanswered. A total network blackout, even between instances with static addresses configured.

The solution ended up being twofold. The big issue at hand was that the provider network interface must be in promiscuous mode to be able to accept traffic destined for instances. This is because the MAC addresses of the instances are different from the MAC address of the provider network interface, and unless it is in promiscuous mode the interface will ignore all packets not addressed to itself.
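Here is a minimal sketch for checking whether an interface is actually in promiscuous mode. It assumes a Linux host, where the kernel exposes the interface flags as a hex value under /sys/class/net/<interface>/flags and the promiscuous bit (IFF_PROMISC) is 0x100. The helper names are my own and ens6 is just the provider interface from my setup, so adjust to match yours.

```python
from pathlib import Path

IFF_PROMISC = 0x100  # promiscuous-mode bit, as defined in Linux <linux/if.h>


def is_promiscuous(flags):
    """Return True if the IFF_PROMISC bit is set in an interface flags word."""
    return bool(flags & IFF_PROMISC)


def read_flags(interface, sysfs_root="/sys/class/net"):
    """Read the kernel's flags word for an interface from sysfs (Linux only)."""
    text = Path(sysfs_root, interface, "flags").read_text().strip()
    return int(text, 16)  # sysfs prints the value in hex, e.g. "0x1103"


if __name__ == "__main__":
    iface = "ens6"  # assumption: the provider interface name from this post
    try:
        state = "on" if is_promiscuous(read_flags(iface)) else "off"
        print(f"{iface}: promiscuous mode is {state}")
    except OSError:
        print(f"{iface}: could not read interface flags on this host")
```

If it reports "off", promiscuous mode can be turned on with `ip link set dev ens6 promisc on` (as root). That setting does not survive a reboot on its own, so it also needs to go into whatever manages your interface configuration.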

The second half of the solution was a misconfiguration of my network. In trying to find a solution I had deleted all my networks and created them over again, sometimes multiple times, to see if it was just a glitch. However, the last time I created the network I left out a critical argument: --provider-physical-network. This argument corresponds to the physical_interface_mappings option under [linux_bridge] in /etc/neutron/plugins/ml2/linuxbridge_agent.ini. In my case there is only one mapping, provider:ens6. When creating the network, the physical network option should be ‘provider’, which then tells neutron to use the interface ens6.
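To make the relationship concrete, here is a small sketch of how a physical_interface_mappings value resolves a physical network name to an interface. This is only an illustration of the name:interface format, not neutron's actual parser, and the function name is made up.

```python
def parse_interface_mappings(value):
    """Turn 'physnet:iface,physnet:iface' into a {physnet: iface} dict."""
    mappings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        physnet, sep, interface = entry.partition(":")
        physnet, interface = physnet.strip(), interface.strip()
        if not sep or not physnet or not interface:
            raise ValueError(f"malformed mapping entry: {entry!r}")
        mappings[physnet] = interface
    return mappings


# The single mapping from my linuxbridge_agent.ini
mappings = parse_interface_mappings("provider:ens6")
print(mappings["provider"])  # prints: ens6

# A physical network name that is not in the mapping has no interface behind it
print("public" in mappings)  # prints: False
```

The point is that the name passed to --provider-physical-network has to match the left-hand side of a mapping exactly. The interface name itself never appears on the command line.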

For example, you could have multiple physical networks, say public1 and public2, each on a separate interface. To configure that, my physical_interface_mappings would look something like ‘physical_interface_mappings = public1:eth1,public2:eth2’. I would then be able to configure two networks in neutron, one with a physical network of public1 and the other with public2.

Unfortunately there’s no mention of promiscuous mode being a requirement in the CentOS install guide. I’m not sure if this is due to the guide being written for a slightly older version of CentOS where promiscuous mode isn’t required, or if it was just an oversight. Either way I went ahead and submitted a bug report to openstack-manuals so hopefully the install guide will be updated.

This has been a huge pain, but solving the issue feels so good. I’m now confident that I have a fully functioning OpenStack deployment! There are still a lot of things I want to tune up, and of course I still want to add block services, but that will have to wait for a bit. The biggest thing I want to fix now is that each compute node is only providing 50GB of ephemeral space total, so it’s not possible for me to launch many instances at the moment. This is a problem because one of my hosts has a whole 1TB disk that is basically empty. I’d like to figure out how to expand the available local storage so I can start more instances, but I haven’t had any luck with that so far.

I really hope this information can help someone else! Catch you next time!