I have been suspended from my job at Google for saying in an interview that I believe News and Search results have a political bias. I want to explore this question in a series of posts, using data science, with only publicly available information and tools.

We begin by replicating and extending an experiment originally run by Paula Bolyard. I scraped Google News for the query “donald trump” once a minute, 5000 times. Each scrape returned 105 stories on average.

Power-Law Distribution Over Sites

We first look at the distribution of publications (or web-sites) that make up our new Google/Trump corpus. In particular, we look at the probability that a randomly selected story comes from each news site. The results are depicted here:

Note the power-law (or 80/20, or rich-get-richer) shape of the distribution. The most-used site, CNN, accounts for 20% of all articles! In other words, even with the millions of sites on the Internet, 1 out of every 5 stories about “donald trump” from Google News is from CNN.
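For readers who want to reproduce this kind of per-site probability on their own scrape, here is a minimal sketch. It assumes each scraped story has already been reduced to the name of its source site; the function name `site_shares` and the toy `sample` list are illustrative only and are not the real corpus or the code used for this post.

```python
from collections import Counter


def site_shares(story_sites):
    """Return (site, share) pairs, most frequent site first.

    `story_sites` holds one entry per scraped story: the site it came from.
    The share is the probability that a randomly selected story is from that site.
    """
    counts = Counter(story_sites)
    total = sum(counts.values())
    return [(site, n / total) for site, n in counts.most_common()]


# Hypothetical toy data, NOT the real Google/Trump corpus.
sample = ["cnn.com"] * 4 + ["nytimes.com"] * 3 + ["politico.com"] * 2 + ["example.org"]

for site, share in site_shares(sample):
    print(f"{site:15s} {share:.0%}")
```

Sorting by frequency first makes the rich-get-richer shape easy to see: the shares drop off quickly after the first few sites.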

Cumulative Distribution

In power-law style, 50% of all stories come from the top 5 sites (CNN, USA Today, NYT, Politico, Guardian), and 83% of all stories come from the top 20.
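The cumulative figures can be computed the same way. The sketch below again assumes one source-site label per story; `cumulative_shares`, `top_k_share`, and the toy `sample` list are hypothetical names and data used only for illustration.

```python
from collections import Counter
from itertools import accumulate


def cumulative_shares(story_sites):
    """Cumulative fraction of stories covered by the top 1, 2, 3, ... sites."""
    counts = [n for _, n in Counter(story_sites).most_common()]
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


def top_k_share(story_sites, k):
    """Fraction of all stories that come from the k most frequent sites."""
    cum = cumulative_shares(story_sites)
    if not cum or k <= 0:
        return 0.0
    # If k exceeds the number of distinct sites, every story is covered.
    return cum[min(k, len(cum)) - 1]


# Hypothetical toy data, NOT the real Google/Trump corpus.
sample = ["cnn.com"] * 4 + ["nytimes.com"] * 3 + ["politico.com"] * 2 + ["example.org"]

print(f"top 2 sites cover {top_k_share(sample, 2):.0%} of stories")
```

On the real corpus, `top_k_share(stories, 5)` and `top_k_share(stories, 20)` would correspond to the 50% and 83% figures quoted above.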

To be continued…

Does this list of web-sites look politically neutral to you? We’ll explore further in a future post!