I honestly think most people simply are unaware of how much personal data they leak on a daily basis as they use their computers. Even if they have some inkling along those lines, I still imagine many think of the data they leak only in terms of individual facts, such as their name or where they ate lunch. What many people don't realize is how revealing all of those individual, innocent facts are when they are combined, filtered and analyzed.

Cell-phone metadata (who you called, who called you, the length of the call and what time the call happened) falls under this category, as do all of the search queries you enter on the Internet.

For this article, I discuss a common but often overlooked source of data that is far too revealing: your DNS data. You see, although you may give an awful lot of personal marketing data to Google with every search query you type, that still doesn't capture all of the sites you visit outside Google searches either directly, via RSS readers or via links your friends send you. That's why the implementation of Google's free DNS service on 8.8.8.8 and 8.8.4.4 is so genius—search queries are revealing, but when you capture all of someone's DNS traffic, you get the complete picture of every site they visit on the Internet and beyond that, even every non-Web service (e-mail, FTP, P2P traffic and VoIP), provided that the service uses hostnames instead of IP addresses.

Let me back up a bit. DNS is one of the core services that runs on the Internet, and its job is to convert a hostname, like www.linuxjournal.com, into an IP address, such as 76.74.252.198. Without DNS, the Internet as we know it today would cease to function, because basically every site we visit in a Web browser, and indeed, just about every service we use on the Internet, we get to via its hostname and not its IP. That said, the only way we actually can reach a host on the Internet is via its IP address, so when you decide to visit a site, its hostname is converted into an IP address to which your browser then opens up a connection. Note that via DNS caching and TTL (Time To Live) settings, you may not have to send out a DNS query every time you visit a site. All the same, these days TTLs are short enough (often ranging between one minute to an hour or two—www.linuxjournal.com's TTL is 30 minutes) that if I captured all your DNS traffic for a day, I'd be able to tell you every Web site you visited along with the first time that day you visited it. If the TTL is short enough, I probably could tell you every time you went there.

Most people tend to use whatever DNS servers they have been provided. On a corporate network, you are likely to get a set of DNS servers over DHCP when you connect to the network. This is important because many corporate networks have internal resources and internal hostnames that you would be able to resolve only if you talked to an internal name server.

Although many people assume very little privacy at work, home is a different matter. At home, you are most likely to use the DNS servers your ISP provided you, while others use Google's DNS servers because the IPs are easy to remember. This means even if others can't intercept your traffic (maybe you are sending it through a VPN, or maybe that kind of line tapping simply requires more legal standing), if they can get access to your DNS logs (I could see some arguing that this qualifies as metadata), they would have a fairly complete view of all the sites you visit without your ever knowing.

This is not just valuable data from a surveillance standpoint, or a privacy standpoint, but also from a marketing standpoint. Even if you may be fine with the government knowing what porn sites you browse, where you shop, where you get your news and what e-mail provider you use, you may not want a marketing firm to have that data.

Recursive DNS vs. DNS Caching

The key to owning your DNS data and keeping it private is to run your own DNS server and use it for all of your outbound DNS queries. Although many people already run some sort of DNS caching programs, such as dnsmasq to speed up DNS queries, what you want isn't simply a DNS cache, but something that can function as a recursive DNS resolver. In the case of dnsmasq, it is configured to use upstream recursive DNS servers to do all of the DNS heavy lifting (the documentation recommends you use whatever DNS servers you currently have in /etc/resolv.conf). Thus, all of your DNS queries for www.linuxjournal.com go to your DNS caching software and then are directed to, for instance, your ISP's DNS servers before they do the traditional recursive DNS procedure of starting at root name servers, then going to com, then finally to the name servers for linuxjournal.com. So, all of your queries still get logged at the external recursive DNS server.

What you want is a local DNS service that can do the complete recursive DNS query for you. In the case of a request for www.linuxjournal.com, it would communicate with the root, com and linuxjournal.com name servers directly without an intermediary and ultimately cache the results like any other DNS caching server. For outside parties to capture all of your DNS logs, they either would have to compromise your local, personal DNS server on your home network, set up a tap to collect all of your Internet traffic or set up a tap at all the root name servers. All three of these options are either illegal or require substantial court oversight.

Install and Configure Your DNS Server

So, even when you rule out pure DNS caching software, there still are a number of different DNS servers you can choose from, including BIND, djbdns and unbound, among others. I personally have the most experience with BIND, so that's what I prefer, but any of those would do the job. The nice thing about BIND, particularly in the case of the Debian and Ubuntu packages, is that all you need to do is run:

$ sudo apt-get install bind9

and after the software installs, BIND automatically is configured to act as a local recursive DNS server for your internal network. The procedure also would be the same if you were to set this up on a spare Raspberry Pi running the Raspbian distribution. On other Linux distributions, the package may just be called bind.

If BIND isn't automatically configured as a local recursive DNS server on your particular Linux distribution and doesn't appear to work out of the box, just locate the options section of your BIND config (often in /etc/bind/named.conf, /etc/bind/named.conf.options or /etc/named/named.conf, depending on the distribution), and if you can't seem to perform recursive queries, add the following line under the options{} section:

options { allow-recursion { 10/8; 172.16/12; 192.168/16; 127.0.0.1; }; . . . }

This change allows any hosts on those networks (internal RFC1918 IP addresses) to perform recursive queries on your name server without allowing the world to do so.

Once you have BIND installed, you'll want to test it. If you installed BIND on your local machine, you could test this out with the dig command:

$ dig @localhost www.linuxjournal.com ; <<>> DiG 9.8.1-P1 <<>> @localhost www.linuxjournal.com ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17485 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0 ;; QUESTION SECTION: ;www.linuxjournal.com. IN A ;; ANSWER SECTION: www.linuxjournal.com. 1800 IN A 76.74.252.198 ;; AUTHORITY SECTION: linuxjournal.com. 30479 IN NS ns66.domaincontrol.com. linuxjournal.com. 30479 IN NS ns65.domaincontrol.com. ;; Query time: 31 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Wed Dec 18 09:37:13 2013 ;; MSG SIZE rcvd: 106

Otherwise, replace localhost with the IP address of your Raspberry Pi or whatever machine on which you installed BIND. To use this name server for all of your requests, update your /etc/resolv.conf file so that it contains:

nameserver 127.0.0.1