At the company I work for we like to use the newest technologies, with varying degrees of success, but hey! We’re trying! Sometimes it can be quite difficult, since we are bound to our Proxmox cluster, which compared to AWS or OpenStack is very static and a bit dated. On the other hand, we use Ansible for an Infrastructure as Code approach, provisioning and managing everything with it. This approach had one issue, though: since we use Prometheus as our monitoring solution, we had no way (apart from re-running an Ansible playbook against that host) to update its targets when we provisioned a new VM. We also didn’t have any centralized configuration store or dynamic service discovery solution; what we had was a barely working DHCP setup based on dnsmasq and plans to add a DNS server. So why not use it?

Episode zero: That One which answers the question: What do we have?

Before we start implementing anything, I need to tell you a little about our setup. As you know, we use a Proxmox cluster (newest version, provisioned with Ansible ofc), and we create VMs using the “linked clone” option, since Proxmox doesn’t have cloud-init. Our golden image consists of some security hardening options, a simple networking configuration (every interface using DHCP), user configuration, and node_exporter pre-installed. After cloning, we assign an FQDN as the hostname, as this allows us to easily use SSH bastion hosts and avoid name collisions.

Episode one: That One with DNS

As I mentioned, we were using dnsmasq as our DHCP and DNS server, since it gave us everything we needed, so it was obvious to use it for this task too. However, after a lot of research and much trial and error, it turned out that we needed to reach for bigger guns. We decided to use dedicated software for DNS and for DHCP. For DNS we went with the newest kid on the block: CoreDNS. It is an open-source DNS server written in Go, packaged as a single binary, and easily configurable through a variety of plugins. For our purposes we chose one of those plugins, the hosts plugin, which allows serving DNS zone data from an /etc/hosts style file like:

192.168.1.1 node01.example.org

192.168.1.2 node02.example.org

Moreover, CoreDNS can act as an authoritative DNS server for multiple domains, something a single dnsmasq instance couldn’t do. All of this fits into one simple and very readable Corefile, which in our case looked something like this:
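A minimal sketch of such a Corefile, serving three domains from hosts-style files and forwarding everything else (the /dns_hosts paths match the container volume used later; the upstream resolver addresses are assumptions):

```
example.org:53 {
    hosts /dns_hosts/example.org
    log
}

example.net:53 {
    hosts /dns_hosts/example.net
    log
}

example.com:53 {
    hosts /dns_hosts/example.com
    log
}

.:53 {
    forward . 8.8.8.8 1.1.1.1
    log
}
```

Each server block is authoritative for one domain; the final catch-all block forwards everything that didn’t match to upstream resolvers.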

That configuration serves three different zone files (all /etc/hosts style, ofc) for example.org, example.net, and example.com respectively, and forwards DNS queries that don’t match those domains. As I am a big fan of containerizing everything, the DNS server runs as a Docker container (with two volumes: one for the configuration, the other for zone data), started with the following command:

$ docker run --net=host --name dns -v "/opt/Corefile":/Corefile -v "/opt/dns_hosts":/dns_hosts coredns/coredns

Episode two: That One with Service Discovery

I’ve already mentioned that dnsmasq served us as a DHCP server, but that was mostly because of the simplicity of its configuration and the promise of using it later as a DNS server. Since I had abandoned the latter, I decided to switch to the ISC DHCP server, which gives me more configuration options and a more familiar environment. What’s more, dnsmasq strips the domain part of the client hostname, which is critical for a DNS-based service discovery solution.

Like the DNS server, my DHCP server runs in a Docker container, this time with three volumes: the first for configuration, the second for scripts, and the third for DNS zone data.

$ docker run --net=host --name dhcp -v "/opt/dhcp":/data -v "/opt/dhcp_scripts":/dhcp_scripts -v "/opt/dns_hosts":/dns_hosts eth0

What scripts, and why zone data? Ha! Here comes the part where our magic happens! Whenever a lease is granted, released, or expires, I want to update the DNS A records with the client’s FQDN so other hosts can reach it by FQDN instead of by IP address. After many hours spent reading docs, stracing applications, and debugging network issues, I ended up with a dhcpd.conf file containing the following rules, which execute my script:
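The rules rely on dhcpd’s `on commit` / `on release` / `on expiry` event statements together with `execute()`. A sketch of what they could look like (the /dhcp_scripts path matches the container volume above; the exact variable handling is an assumption):

```
on commit {
    set clip = binary-to-ascii(10, 8, ".", leased-address);
    set clhn = option host-name;
    execute("/dhcp_scripts/update_dns.sh", "add", clip, clhn);
}

on release {
    set clip = binary-to-ascii(10, 8, ".", leased-address);
    set clhn = option host-name;
    execute("/dhcp_scripts/update_dns.sh", "del", clip, clhn);
}

on expiry {
    set clip = binary-to-ascii(10, 8, ".", leased-address);
    set clhn = option host-name;
    execute("/dhcp_scripts/update_dns.sh", "del", clip, clhn);
}
```

`binary-to-ascii(10, 8, ".", leased-address)` turns the binary lease address into a dotted-quad string before it is handed to the script.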

Whenever something happens to a lease, the update_dns.sh script is invoked with the following parameters:

“add”/“del”: a string indicating whether we want to add or delete a specific DNS entry

clip: a string containing the leased IP address

option host-name: a string containing the hostname of the machine holding the lease

As I said before, the domain part of the client hostname is quite important. In fact, a minimum of configuration is needed on the client side to make this setup work properly. First of all, the client machine must use a fully-qualified domain name as its hostname.

Second of all, you have to ensure that the DHCP client sends the FQDN as its host-name. This can be done by making sure the following line is present in the /etc/dhcp/dhclient.conf file:

send host-name = gethostname();

The next part of this network magic happens in a simple bash script I called update_dns.sh (which you can see below). Basically it does two things: it can update a specific DNS zone file with entries like 192.168.1.34 node.example.net, and it can remove such pairs from those files. As a security measure, it verifies whether the domain sent by a client is whitelisted.
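A minimal sketch of such a script, assuming the zone files live in /dns_hosts (as mounted into both containers) and that the whitelist covers the three domains served by CoreDNS; paths and the whitelist are assumptions:

```shell
#!/usr/bin/env bash
# update_dns.sh -- hypothetical reconstruction, not the author's exact script.
# Called by dhcpd as: update_dns.sh add|del <ip> <fqdn>
set -euo pipefail

ZONE_DIR="${ZONE_DIR:-/dns_hosts}"                       # hosts-style zone files served by CoreDNS
ALLOWED_DOMAINS="example.org example.net example.com"    # whitelist of domains we manage

update_dns() {
  local action="$1" ip="$2" fqdn="$3"
  local domain="${fqdn#*.}"                              # strip the host part, keep the domain

  # Security check: refuse domains we don't serve
  case " $ALLOWED_DOMAINS " in
    *" $domain "*) ;;
    *) echo "domain '$domain' not whitelisted" >&2; return 1 ;;
  esac

  local zone_file="$ZONE_DIR/$domain"
  # Drop any stale pair for this IP or hostname first, so renewals don't duplicate
  sed -i -e "/^${ip} /d" -e "/ ${fqdn}\$/d" "$zone_file"
  if [ "$action" = "add" ]; then
    echo "$ip $fqdn" >> "$zone_file"
  fi
}

# When run as the dhcpd hook, dispatch on the arguments:
#   update_dns "$@"
```

Because the script rewrites the stale entry before appending, a renewed lease with a new address simply replaces the old A record.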

Since CoreDNS watches its files for changes and reloads them quite quickly, I can easily serve dynamically updated A/AAAA and PTR records corresponding to the DHCP clients in my network.

Episode three: That One with Prometheus

Now that I have something called DNS Service Discovery working, why not use it? It would be nice to have VMs automatically register themselves in the monitoring system, right? Fortunately we use Prometheus (installed with Ansible from the cloudalchemy suite, more about it here: https://demo.cloudalchemy.org), which comes with multiple service discovery options. The one we use most frequently is file_sd (file-based service discovery), which needs a YAML file listing the targets Prometheus should scrape. As targets we use node_exporter, which comes bundled in our base VM image. Now I just need to generate a simple YAML file based on information from the DNS server, so I wrote another script (this time in Python):
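A sketch of such a generator, assuming node_exporter’s default port 9100 and a file_sd directory of /etc/prometheus/file_sd (both paths, and the pyinotify wiring left as a comment, are assumptions; the YAML is rendered by hand to keep the core stdlib-only):

```python
#!/usr/bin/env python3
"""Hypothetical sketch: turn hosts-style zone files into a Prometheus file_sd target list."""
import os

ZONE_DIR = "/opt/dns_hosts"                          # zone files maintained by update_dns.sh
TARGETS_FILE = "/etc/prometheus/file_sd/targets.yml"
NODE_EXPORTER_PORT = 9100                            # node_exporter's default port

def read_zone_hosts(zone_dir):
    """Collect every FQDN from every zone file in the directory."""
    hosts = []
    for name in sorted(os.listdir(zone_dir)):
        with open(os.path.join(zone_dir, name)) as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 2 and not line.startswith("#"):
                    hosts.append(parts[1])   # "<ip> <fqdn>" -> keep the fqdn
    return hosts

def render_targets(hosts, port=NODE_EXPORTER_PORT):
    """Render a minimal file_sd YAML document by hand."""
    lines = ["- targets:"]
    for h in hosts:
        lines.append("    - '%s:%d'" % (h, port))
    return "\n".join(lines) + "\n"

def regenerate():
    """Rewrite the targets file atomically so Prometheus never reads a partial file."""
    tmp = TARGETS_FILE + ".tmp"
    with open(tmp, "w") as f:
        f.write(render_targets(read_zone_hosts(ZONE_DIR)))
    os.replace(tmp, TARGETS_FILE)

if __name__ == "__main__" and os.path.isdir(ZONE_DIR):
    # The original watched the zone files with pyinotify, roughly:
    #   import pyinotify
    #   wm = pyinotify.WatchManager()
    #   ... call regenerate() on IN_CLOSE_WRITE / IN_MOVED_TO events ...
    regenerate()
```

The atomic `os.replace` matters: Prometheus re-reads file_sd files on change, so a half-written YAML file would otherwise drop targets for a scrape cycle.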

I used pyinotify to monitor all changes in the zone files. When a change is detected, a new targets.yml file is generated and Prometheus starts scraping the new targets.
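On the Prometheus side, wiring this up only takes a file_sd_configs entry in the scrape configuration; a sketch, assuming the generated file lands in /etc/prometheus/file_sd/targets.yml:

```yaml
scrape_configs:
  - job_name: node
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/targets.yml
        refresh_interval: 30s
```

Prometheus also picks up changes to the file via inotify on its own, so the refresh_interval is just a fallback.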

Epilogue

You could probably ask: “why didn’t you use etcd/consul/other_tech if you just wanted service discovery?”. But why would I? Service discovery isn’t a new concept; it has been done before, and DNS even comes with a special record type just for this purpose: the SRV record (yeah, I know, I used an A record in my setup, but SRV is coming!). And since you basically cannot use the internet without a DNS server, why not use it for other purposes too? And if it breaks, you can just recite the old SysAdmin mantra: “it was DNS!”.