What is ads.txt?

Ads.txt is an IAB Tech Lab project that was created to fight inventory fraud in the digital advertising industry. The idea is simple; publishers put a file on their server that says exactly which companies they sell their inventory through. The file lists partners by name, but also includes the publisher’s account ID. This is the same ID buyers see in a bid request, which they can use as a key for campaign targeting.
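To make the format concrete, here is a minimal Python sketch of reading the records in an ads.txt file. It assumes the comma-separated layout described in the IAB spec (ad system domain, publisher account ID, relationship type, and an optional certification authority ID). The domains and IDs in the sample are made up for illustration, and `AdsTxtRecord` and `parse_ads_txt` are hypothetical names, not part of any official tool.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system: str                            # domain of the exchange / SSP
    account_id: str                           # publisher's account ID, as seen in bid requests
    relationship: str                         # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str] = None   # optional fourth field


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse ads.txt content into records, skipping comments and non-record lines."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # not a data record (for example a "contact=" variable line)
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], fields[2].upper(), cert)
        )
    return records


# Hypothetical file contents for a made-up publisher.
SAMPLE = """\
# ads.txt for a made-up publisher
example-exchange.com, 12345, DIRECT, abc123
example-ssp.com, pub-9876, reseller
contact=adops@example.com
"""

for record in parse_ads_txt(SAMPLE):
    print(record)
```

The sketch normalizes the domain to lowercase and the relationship type to uppercase so that later comparisons don’t depend on how a publisher happened to type them.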

Buyers use a web crawler to download all the ads.txt files on a regular basis and use the information in them to target their campaigns. This means buyers know that if they bid on a request that comes from an authorized ID, it’s coming from a source the publisher trusts or has control over. Buyers seem to be taking the idea seriously, too. Just a week ago Digitas published an open letter on Digiday saying they won’t buy from any publisher without an ads.txt file.
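On the buy side, the check boils down to a lookup. The sketch below assumes the buyer has already crawled ads.txt files into an index keyed by publisher domain. The `AUTHORIZED` index, the `is_authorized` function, and all domains and IDs are hypothetical, and a real bidder would read the site and seller ID from the bid request.

```python
from typing import Dict, Set, Tuple

# Hypothetical index built from crawled ads.txt files:
# publisher domain -> set of (ad system domain, account ID) pairs it authorizes.
AUTHORIZED: Dict[str, Set[Tuple[str, str]]] = {
    "news.example": {
        ("example-exchange.com", "12345"),
        ("example-ssp.com", "777"),
    },
}


def is_authorized(site_domain: str, ad_system: str, seller_id: str) -> bool:
    """True if the site's ads.txt lists this seller ID for this ad system."""
    listed = AUTHORIZED.get(site_domain.lower(), set())
    return (ad_system.lower(), seller_id) in listed


print(is_authorized("news.example", "example-exchange.com", "12345"))  # True
print(is_authorized("news.example", "example-exchange.com", "99999"))  # False
```

Note that this sketch treats a publisher with no ads.txt file as unauthorized. Whether to bid on such inventory is a policy choice each buyer has to make.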

Ads.txt isn’t a silver bullet for all inventory quality woes, but it is a dead simple solution. You’d be stupid not to lock the door to your house, even if it’s not a guarantee of safety, right? The important bit is that for the first time publishers have a tool against inventory fraud instead of relying on the programmatic tech alone.

Are you a developer or a patient person? Then try the ads.txt crawler yourself

As part of the program’s release, Neal Richter, a longtime ad technology veteran and one of the authors of the ads.txt spec, wrote a simple web crawler in Python. The script takes a list of domains and parses the expected ads.txt format into a database, along with some error handling.
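For a sense of what such a crawler does, here is a rough sketch of the same idea. This is not Neal’s script: the table layout, function names, and the stub fetcher are all assumptions made for illustration, and it uses only the Python standard library.

```python
import sqlite3
import urllib.request
from typing import Callable, Iterable


def fetch_ads_txt(domain: str, timeout: float = 5.0) -> str:
    """Download http://<domain>/ads.txt and return its text (raises on failure)."""
    url = f"http://{domain}/ads.txt"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(domains: Iterable[str], db: sqlite3.Connection,
          fetch: Callable[[str], str] = fetch_ads_txt) -> int:
    """Fetch each domain's ads.txt and store its records; return rows written."""
    db.execute("CREATE TABLE IF NOT EXISTS adstxt "
               "(site TEXT, ad_system TEXT, account_id TEXT, relationship TEXT)")
    written = 0
    for domain in domains:
        try:
            text = fetch(domain)
        except Exception:
            continue  # unreachable site or missing file: skip it
        for line in text.splitlines():
            fields = [f.strip() for f in line.split("#", 1)[0].split(",")]
            if len(fields) >= 3 and all(fields[:3]):
                db.execute("INSERT INTO adstxt VALUES (?, ?, ?, ?)",
                           (domain, fields[0].lower(), fields[1], fields[2].upper()))
                written += 1
    db.commit()
    return written


# Stub fetcher so the example runs offline; pass nothing for `fetch`
# to use the real HTTP download instead.
def fake_fetch(domain: str) -> str:
    if domain == "no-file.example":
        raise OSError("404")
    return "example-exchange.com, 12345, DIRECT\nexample-ssp.com, 777, RESELLER\n"


conn = sqlite3.connect(":memory:")
print(crawl(["news.example", "no-file.example"], conn, fetch=fake_fetch))
```

Passing the fetcher in as a parameter keeps the parsing and storage logic testable without touching the network.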

Developers will probably find it a piece of cake to use, and non-developers will struggle a bit, like I did. That said, I got it running after pushing through some initial frustration and researching how to get a small database running on my computer. For anyone interested, I wrote a detailed tutorial / overview of how to get it working in a separate post.

12.8% of publishers have an ads.txt file

At least, among the Alexa 10K global domains that sell advertising. To get this stat, I took the Alexa top 10,000 domains, removed everything owned by Google, Amazon, and Facebook – which don’t sell their inventory through 3rd parties and therefore don’t need an ads.txt file – and removed the obvious pornography sites. After filtering, I had 9,572 domains to crawl. I sent all those through Neal’s crawler and found 1,930 domains selling ads, and 248 with an ads.txt file. 248 / 1,930 = 12.8%, voila!
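The arithmetic behind the headline figure is easy to reproduce. The numbers below come straight from the paragraph above; the function name is just an illustrative choice.

```python
def adoption_rate(domains_with_ads_txt: int, domains_selling_ads: int) -> float:
    """Share of ad-selling domains that publish an ads.txt file, as a percentage."""
    return 100.0 * domains_with_ads_txt / domains_selling_ads


rate = adoption_rate(248, 1930)
print(f"{rate:.1f}%")  # 12.8%
```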

Update: Nov 1, 2017

In the six weeks or so since I published my first analysis, ads.txt adoption has continued to mushroom and now stands at 44%. I’m astonished to see the adoption rate more than triple over such a short time frame, and I have to imagine this sets a record for publisher embrace of any IAB standard. So what’s driving it? In my opinion, the primary driver is the announcement from Google’s DoubleClick Bid Manager that they’d stop buying unauthorized supply paths at the end of October, which has led to a big grassroots push by the major exchanges among their publisher clients.

Adoption is happening across all sizes and geos

The good news is that while adoption remains thin, it is accelerating quickly. My figure of 12.8% is a solid improvement from GetIntent’s figure of 1.2% posted about a month after the IAB released the ads.txt spec. While I couldn’t use the same methodology as them, I expect their top 1K domains would all be in the Alexa 10K list.

From a publisher perspective, major corporations like CBS, Hearst, Meredith, NYTimes, TimeWarner, Disney, Univision and others have posted ads.txt files for their domains. As it turns out though, most domains using ads.txt are not in the Alexa 10,000, so I also crawled the 3,511 domains Neal listed as participating on the ads.txt GitHub repo as of 9/11/2017.

I didn’t find all these sites to actually have a file, but I was able to successfully add another 2,448 domains from that list. This left me a total dataset of 2,696 domains, which I’m using for the rest of the data and analysis below.

Google is the most popular ads.txt partner, followed by the major SSPs

Within the 248 domains in the Alexa top 10K I found that had an ads.txt file, there were 114 unique ad tech partners listed. Across the entire dataset I found 137 ad tech partners.

Not surprisingly, Google was by far the most common, present in 97% of the Alexa 10K cohort and 76% of all domains. The major ad exchanges (Rubicon Project, AppNexus, Index Exchange, and OpenX) are the next most common partners and are the only other companies with greater than 50% penetration among the Alexa 10K, but they aren’t nearly as pervasive overall. Google is the only partner in the full dataset with more than 50% penetration.

On average, ad tech partners were listed on 96 domains, or 3% penetration. However, because the data is relatively stacked toward major players, the median figure of 16 domains per partner is more qualitatively accurate in my view.
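To show why the mean and median diverge when a few big players dominate, here is a small sketch with invented data. The site names, partner names, and counts are hypothetical and are not drawn from my dataset.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy data: site -> set of ad systems listed in its ads.txt.
sites = {
    "site-a.example": {"google.com", "example-ssp.com"},
    "site-b.example": {"google.com"},
    "site-c.example": {"google.com", "example-ssp.com", "tiny-network.example"},
    "site-d.example": {"google.com"},
}


def partner_penetration(sites):
    """Return a Counter mapping each partner to the number of sites listing it."""
    counts = Counter()
    for partners in sites.values():
        counts.update(partners)
    return counts


counts = partner_penetration(sites)
for partner, n in counts.most_common():
    print(f"{partner}: {n} sites ({100 * n / len(sites):.0f}% penetration)")

per_partner = list(counts.values())
print("mean sites per partner:", mean(per_partner))
print("median sites per partner:", median(per_partner))
```

In this toy example one partner appears on every site and pulls the mean above the median, which is the same skew described above, just at a much smaller scale.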

Publishers often list ad tech partners multiple times

One of the most surprising items for me is how often publishers are listing the same partner multiple times with multiple IDs in their file, especially among the Alexa 10K cohort. In fact, some partners like Rubicon Project, ContextWeb, Amazon, FreeWheel, TripleLift and others are more often listed many times than once.

It’s not clear to me if this is at all suspicious, but I doubt it. Rather, I’d guess that these partners pass some kind of child ID in the bid request rather than a top-level account ID, and had to go this route to comply with the ads.txt spec without changing the structure of their bid request. Perhaps someone from one of those companies will clarify in the comments what’s behind this. In terms of AppNexus at least, publishers would have many listings if they work with any ad tech company that uses AppNexus as a backend technology.
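One way to measure this pattern is to count distinct account IDs per site and partner. The following is a sketch under the assumption that the crawl results are available as (site, ad system, account ID) tuples; the records and function names are invented for illustration.

```python
from collections import defaultdict


def ids_per_listing(records):
    """Map (site, ad_system) -> number of distinct account IDs listed."""
    ids = defaultdict(set)
    for site, ad_system, account_id in records:
        ids[(site, ad_system)].add(account_id)
    return {key: len(accounts) for key, accounts in ids.items()}


def share_multi_listed(records, ad_system):
    """Fraction of sites listing ad_system that list it under more than one ID."""
    counts = [n for (_, system), n in ids_per_listing(records).items()
              if system == ad_system]
    return sum(n > 1 for n in counts) / len(counts) if counts else 0.0


# Hypothetical crawl output.
records = [
    ("site-a.example", "example-ssp.com", "100"),
    ("site-a.example", "example-ssp.com", "101"),
    ("site-a.example", "example-exchange.com", "5"),
    ("site-b.example", "example-ssp.com", "200"),
    ("site-b.example", "example-ssp.com", "201"),
    ("site-c.example", "example-ssp.com", "300"),
]

print(share_multi_listed(records, "example-ssp.com"))
print(share_multi_listed(records, "example-exchange.com"))
```

A partner whose share is above 0.5 is, in the terms used above, more often listed many times than once.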

Most publishers have direct & reseller partners

The ads.txt spec allows publishers to specify what kind of relationship they have with a particular partner – either direct or reseller. Direct means the publisher controls the account themselves, while reseller means they’ve authorized another company to sell their inventory and that company controls the listed account.

I don’t expect this matters much from a trust perspective, but when a publisher’s supply is available on many exchanges, buyers could use this flag to decide which supply path to favor. Buyers will most likely prefer direct supply paths to reseller paths, as there is one less company in the value chain and fees are therefore likely lower through direct paths.
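As a sketch of how a buyer might act on that preference, the snippet below orders a publisher’s authorized paths so direct ones come first. The tuple layout and the `preferred_paths` name are assumptions, and real supply path optimization would weigh fees, win rates, and more.

```python
def preferred_paths(paths):
    """Sort (ad_system, account_id, relationship) tuples, DIRECT before RESELLER."""
    rank = {"DIRECT": 0, "RESELLER": 1}
    # Unknown relationship values sort last; sorted() keeps ties in input order.
    return sorted(paths, key=lambda p: rank.get(p[2].upper(), 2))


# Hypothetical authorized paths for one publisher.
paths = [
    ("example-ssp.com", "777", "RESELLER"),
    ("example-exchange.com", "12345", "DIRECT"),
    ("other-exchange.example", "42", "reseller"),
]

for path in preferred_paths(paths):
    print(path)
```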

The average publisher has 8.7 accounts with 4.8 partners

The median number of accounts is 4.5, while the median number of partners is 2. The Alexa 10K cohort is much different though; those sites have an average of 23.7 accounts listed with 10.4 partners. The median number of accounts for this group is 12 across 8 partners.

This just means larger sites list nearly three times as many accounts, with two to four times as many partners depending on whether you compare averages or medians. A plausible reading is that those sites have more demand on their inventory and better yields as a result. It pays to be at the top!

But there are some extreme ads.txt files out there

I’m not sure what I expected the average ads.txt file to look like. Obviously in a post-header-bidding world publishers are selling their inventory through more exchanges than ever before, but some of these files are shockingly long. Wonderhowto.com, Gadgethacks.com, iMore.com, and WindowsCentral.com share the record for longest ads.txt file in my dataset, listing 146 different accounts from 39 different vendors. On all these sites, Google has over half the accounts – 78 listed in total.

The CBSi websites weren’t far behind though, with sites like cbsnews.com and cnet.com listing 116 accounts across 38 vendors. CBS accounts are a bit more evenly distributed across a longer list of partners here, though some like Google, AppNexus, and OpenX all have more than 10 IDs listed.

Ads.txt would benefit from a couple enhancements

First and perhaps easiest is to create an ads.txt validation tool. Publishers need a way to know if they have errors in their ads.txt file before they post it. A validation tool could help address mistakes like putting the type value in the tag value, misspelling the type value (receller / resseller / reseler), and misspelling the exchange names. Publishers are also surely screwing up their account IDs as well, but that’s a problem for each exchange to address. Edit: Shortly after I published this article, AppNexus took up the charge and released a free ads.txt validation tool.
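To illustrate the kind of checks such a validator could run, here is a minimal sketch. It is not the AppNexus tool, and the rules, messages, and names are my own assumptions; it only covers the mistakes mentioned above, using `difflib` from the standard library to suggest a correction for misspelled type values.

```python
import difflib

VALID_TYPES = ("DIRECT", "RESELLER")


def validate_line(line):
    """Return a list of human-readable problems found in one ads.txt line."""
    body = line.split("#", 1)[0].strip()
    if not body or ("=" in body and "," not in body):
        return []  # blank, comment, or variable line such as contact=...
    fields = [f.strip() for f in body.split(",")]
    if len(fields) < 3:
        return ["expected at least 3 comma-separated fields"]
    problems = []
    ad_system, account_id, rel_type = fields[:3]
    if "." not in ad_system:
        problems.append(f"'{ad_system}' does not look like a domain")
    if not account_id:
        problems.append("account ID is empty")
    if rel_type.upper() not in VALID_TYPES:
        if len(fields) > 3 and fields[3].upper() in VALID_TYPES:
            problems.append("relationship type is in the 4th field instead of the 3rd")
        else:
            guess = difflib.get_close_matches(rel_type.upper(), VALID_TYPES, n=1)
            hint = f"; did you mean {guess[0]}?" if guess else ""
            problems.append(f"unknown relationship type '{rel_type}'{hint}")
    return problems


# Hypothetical lines a publisher might be about to post.
for line in [
    "example-exchange.com, 12345, DIRECT",
    "example-ssp.com, 777, receller",
    "example-exchange.com, 12345, abc123, DIRECT",
]:
    print(line, "->", validate_line(line) or "ok")
```

Checking exchange names against a list of known domains would follow the same pattern, but it needs a maintained registry, so it is left out here.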

A more formal structure would address key use cases for large, complex publishers

The txt approach seems designed for accessibility, but in many cases it actually makes the file harder to understand than XML / JSON would. For example, what if a publisher uses some partners for their US inventory but not their international inventory? What if they use one partner for large impact formats like interstitials but don’t allow that partner to sell IAB standard sizes? The current format doesn’t allow publishers to provide this level of detail.

Many platforms also specialize in one media type or another – think native / mobile / video – but generalists serve those ads as well. As written, the spec leaves the door open to these vagaries and prevents it from being as effective as it could.
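As a thought experiment, a structured version might look something like the sketch below. The schema is entirely hypothetical: none of these keys exist in the ads.txt spec, and the `geos` and `formats` fields are simply one way to express the geography and media type restrictions described above.

```python
import json

# Entirely hypothetical schema; not part of the ads.txt spec.
ADS_JSON = json.loads("""
{
  "sellers": [
    {"ad_system": "example-exchange.com", "account_id": "12345",
     "relationship": "DIRECT", "geos": ["US"], "formats": ["display", "video"]},
    {"ad_system": "example-ssp.com", "account_id": "777",
     "relationship": "RESELLER", "geos": ["*"], "formats": ["interstitial"]}
  ]
}
""")


def authorized_sellers(doc, geo, ad_format):
    """Return sellers allowed to sell the given format in the given geo."""
    return [
        seller for seller in doc["sellers"]
        if (geo in seller["geos"] or "*" in seller["geos"])
        and ad_format in seller["formats"]
    ]


print([s["ad_system"] for s in authorized_sellers(ADS_JSON, "US", "display")])
print([s["ad_system"] for s in authorized_sellers(ADS_JSON, "DE", "interstitial")])
```

With a structure like this, a buyer could answer the question of who may sell a given format in a given country with one filter instead of guessing from free-text comments.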

Shifting to IDs instead of text could help eliminate sloppy errors

Next, I’d like to see the spec use an ID based approach instead of text for values like type / exchange given the rampant misspelling issues. The text helps validate publisher intent, but should serve as an optional note versus a true dependency.

The IAB should assign IDs to each member company and list them out in the spec people are already reading to structure their file. Google = 101, AppNexus = 202 and so forth. They could create a blanket non-member ID as well to enable companies outside the IAB to operate, but identify them as such. Or, there could be some kind of nominal fee to acquire an ads.txt exchange ID, which would be an additional barrier for fraudsters.

Same thing for the type values; Direct = 1, Reseller = 2, something like that. And likewise for Channel / Media Type which are often listed in the notes but aren’t yet a required field. They should be made a required field and again be based on IDs.
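Here is how an ID-based record could be read, using the illustrative numbers from above (Google = 101, AppNexus = 202, Direct = 1, Reseller = 2). No such registry exists today, so the record layout, the `EXCHANGE_IDS` table, and the function name are all hypothetical.

```python
from enum import IntEnum


class Relationship(IntEnum):
    DIRECT = 1
    RESELLER = 2


# Hypothetical registry using the illustrative IDs from the text.
EXCHANGE_IDS = {101: "Google", 202: "AppNexus"}


def parse_id_record(line):
    """Parse '<exchange id>, <account id>, <type id>' into readable values."""
    exchange_id, account_id, type_id = [f.strip() for f in line.split(",")[:3]]
    exchange = EXCHANGE_IDS[int(exchange_id)]   # KeyError for an unregistered ID
    relationship = Relationship(int(type_id))   # ValueError for an unknown type
    return exchange, account_id, relationship.name


print(parse_id_record("101, pub-0000, 1"))  # ('Google', 'pub-0000', 'DIRECT')
```

Because every value is looked up rather than matched as text, a typo produces a hard error at parse time instead of a silently unrecognized partner.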

To sum up: adding a validator, moving to an XML / JSON structure, adding support for media types, and migrating from text values to ID values would make the IAB’s good idea even more effective.

The Data

As promised, here is my full dataset, available for anyone who wants to explore it.

Download Ad Ops Insider’s Ads.txt dataset