It is with great pleasure that I announce the newest addition to the Mixnode data family. Starting today, you can use Mixnode to run SQL queries over hundreds of thousands of ads.txt files from all around the web.

What is ads.txt?

Authorized Digital Sellers (also known as ads.txt) is a simple yet effective standard for increasing transparency in the online advertising ecosystem. Inspired by the robots.txt standard and developed by IAB Tech Lab, the advertising industry's technology and standards body, ads.txt can remove the financial incentive for selling counterfeit inventory.

The ads.txt standard allows content owners to declare authorized advertising platforms and resellers by listing them in a text file named 'ads.txt'. The ads.txt file is then placed at the root of the website belonging to the content owner. For example, the list of authorized direct ad platforms and resellers for reuters.com is available at https://www.reuters.com/ads.txt and contains the following data:

exponential.com,163960,DIRECT,afac06385c445926
tribalfusion.com,163960,DIRECT,afac06385c445926
indexexchange.com,176280,RESELLER,50b1c356f2c5c8fc
google.com,pub-2051007210431666,RESELLER,f08c47fec0942fa0
google.com,pub-3746578658400510,RESELLER,f08c47fec0942fa0
indexexchange.com,185292,RESELLER
smaato.com,1100036918,DIRECT
google.com,pub-8200574565762874,DIRECT,f08c47fec0942fa0
spotxchange.com,152279,DIRECT,7842df1d2fe2db34
Spotx.tv,152279,DIRECT,7842df1d2fe2db34
rubiconproject.com,11384,DIRECT,0bfd66d529a55807
openx.com,537136463,DIRECT,6a698e2ec38604c6
openx.com,538986825,DIRECT,6a698e2ec38604c6
openx.com,537146938,DIRECT,6a698e2ec38604c6
Indexexchange.com,184971,DIRECT,50b1c356f2c5c8fc
rhythmone.com,301820969,DIRECT,a670c89d4a324e47
rhythmone.com,1575427301,DIRECT,a670c89d4a324e47
yieldmo.com,Reuters.com,DIRECT
...

You can find more details about the ads.txt standard in the official Authorized Digital Sellers specification.
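
To make the record layout in the sample above concrete, here is a minimal Python sketch of a parser. It assumes the common layout of comma-separated fields (advertising system domain, publisher account ID, relationship type, and an optional certification authority ID) with `#` starting a comment. The names `AdsTxtRecord` and `parse_ads_txt` are hypothetical, and the sketch does not attempt to cover every rule in the specification.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    domain: str                       # advertising system domain, lowercased
    account_id: str                   # publisher's account ID in that system
    relationship: str                 # "DIRECT" or "RESELLER", uppercased
    cert_authority_id: Optional[str]  # optional fourth field, None if absent


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse ads.txt content into records, skipping comments and blank lines."""
    records = []
    for raw_line in text.splitlines():
        # Drop anything after '#', then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            # Not a data record in this simplified model; skip it.
            continue
        domain, account_id, relationship = fields[:3]
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(domain.lower(), account_id, relationship.upper(), cert)
        )
    return records


# Usage with three lines taken from the sample above.
sample = (
    "exponential.com,163960,DIRECT,afac06385c445926\n"
    "Spotx.tv,152279,DIRECT,7842df1d2fe2db34\n"
    "indexexchange.com,185292,RESELLER\n"
)
for record in parse_ads_txt(sample):
    print(record)
```

Lowercasing the domain is a design choice of this sketch: it lets entries such as Indexexchange.com and indexexchange.com be grouped together during analysis.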

An alternative to ads.txt crawling

Aggregating ads.txt files from all around the web is a common practice in anti-fraud applications and a way to gain insight into the world of publishers, advertising platforms, and resellers; however, it often requires a significant investment in building a massive-scale web crawler that can run across a cluster of machines. Additionally, because of politeness and rate-limiting considerations, the crawler must be operated with near-perfect precision to avoid overloading websites or accessing them without authorization.

The new Mixnode adstxt table is designed to be a simple, fast, and cost-effective alternative to building and running your own ads.txt crawler and aggregator. Rather than building the infrastructure required to extract ads.txt files from all around the web, you can simply write standard SQL queries against the adstxt table to find trends, insights, and patterns.

Using SQL you can have a bird's-eye view of the entire ads.txt ecosystem in the wild. For example, the following query demonstrates how you can find all websites using the AppNexus exchange in a matter of seconds:

select url_host from adstxt where content like '%appnexus.com%'

Similarly, if you need to find all websites using AppNexus and OpenX, you can simply extend your query with another LIKE condition:

select url_host from adstxt where content like '%appnexus.com%' and content like '%openx.com%'
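
The two queries above can be tried locally against a stand-in table. The following Python sketch uses an in-memory SQLite database with only the two columns that appear in the queries (`url_host` and `content`). It is not the Mixnode API; the sample rows, the `.example` hosts, and the `hosts_listing` helper are all hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory stand-in for the adstxt table (an assumption, not Mixnode itself).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adstxt (url_host TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO adstxt VALUES (?, ?)",
    [
        # Made-up rows for illustration only.
        ("site-a.example", "appnexus.com,1001,DIRECT\nopenx.com,2002,RESELLER"),
        ("site-b.example", "appnexus.com,3003,DIRECT"),
        ("site-c.example", "openx.com,4004,DIRECT"),
    ],
)


def hosts_listing(conn, *domains):
    """Return hosts whose ads.txt content mentions every given domain."""
    if not domains:
        raise ValueError("at least one domain is required")
    # One LIKE condition per domain, combined with AND as in the queries above.
    where = " AND ".join("content LIKE ?" for _ in domains)
    sql = f"SELECT url_host FROM adstxt WHERE {where} ORDER BY url_host"
    params = [f"%{domain}%" for domain in domains]
    return [row[0] for row in conn.execute(sql, params)]


print(hosts_listing(conn, "appnexus.com"))
print(hosts_listing(conn, "appnexus.com", "openx.com"))
```

Note that a `LIKE '%...%'` condition is a plain substring match, so it also matches a domain that appears inside a comment or as part of a longer domain name. SQLite's `LIKE` is case-insensitive for ASCII characters by default, which may differ from other SQL engines.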