Since August 2017, the Internet Archive's Television News Archive has extracted the chyrons of CNN, MSNBC, Fox News and BBC News by OCR'ing a small bounding box at the bottom of the screen once per second. Earlier this month we began reprocessing this data into a research chyron dataset. Using this new dataset we can explore powerful questions, such as how similar the chyron text is across stations.

For example, using a single SQL query in BigQuery we can split each minute of chyron text into individual words and collapse by day into a daily histogram, removing words that appeared in fewer than 10 different one-minute intervals. In the same query we can then run a set of Pearson correlations comparing the per-station histograms by day, charting their similarity over time.
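To make the steps above concrete, here is a minimal Python sketch of the same idea. It is an illustration, not the BigQuery query itself. The row layout (station, day, minute, text), the function names, the choice to count each word by the number of distinct minutes it appeared in, and the choice to compare two stations over the union of their words are all our assumptions for this example.

```python
from collections import defaultdict
from math import sqrt

MIN_MINUTES = 10  # threshold from the text: drop words seen in fewer than 10 one-minute intervals


def daily_histograms(rows, min_minutes=MIN_MINUTES):
    """Build per-station, per-day word histograms.

    rows: iterable of (station, day, minute, text) tuples (hypothetical layout).
    Returns {(station, day): {word: number_of_distinct_minutes}}.
    """
    minutes_seen = defaultdict(set)  # (station, day, word) -> set of minutes
    for station, day, minute, text in rows:
        for word in text.lower().split():
            minutes_seen[(station, day, word)].add(minute)

    hist = defaultdict(dict)
    for (station, day, word), minutes in minutes_seen.items():
        if len(minutes) >= min_minutes:  # keep only sufficiently common words
            hist[(station, day)][word] = len(minutes)
    return dict(hist)


def pearson(hist_a, hist_b):
    """Pearson correlation of two word histograms over the union of their words.

    Words missing from one histogram count as 0 (an assumption of this sketch).
    Returns None when the correlation is undefined.
    """
    words = sorted(set(hist_a) | set(hist_b))
    n = len(words)
    if n < 2:
        return None
    xs = [hist_a.get(w, 0) for w in words]
    ys = [hist_b.get(w, 0) for w in words]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant histogram has no defined correlation
    return cov / sqrt(var_x * var_y)
```

Calling `pearson(hist[("CNN", day)], hist[("MSNBC", day)])` for each day would then yield one point per day on a similarity timeline for that pair of stations.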

The final timeline, using 7-day smoothing, can be seen below:

The bar chart below shows their median correlations over the past three years and also compares against BBC News:

What about those sharp dips in the timeline above? Those represent days when the three stations' chyrons noticeably diverged. The timeline below zooms in on November 2019.

Here the sharp dips can be seen to correspond with the myriad impeachment developments of Nov. 6 and the impeachment testimony of Nov. 19-21, suggesting that plotting chyron correlation over time can be a powerful way of detecting particularly partisan news days.
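For readers who want to post-process a series of daily correlations the same way, the 7-day smoothing and the median summary mentioned earlier can be sketched in a few lines of Python. This is only a rough illustration: we assume a trailing window (the original charts may use a centered one), and the function and variable names are our own.

```python
from statistics import mean, median


def trailing_mean(values, window=7):
    """Smooth a daily series with a trailing moving average.

    Each output point averages the current day and up to window-1 preceding
    days, so the first few points use a shorter window.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(mean(chunk))
    return smoothed


def median_correlation(daily_correlations):
    """Summarize a station pair by the median of its daily correlations."""
    return median(daily_correlations)
```

A sharp one-day dip will look shallower and wider after smoothing, so it can be worth checking the unsmoothed daily values when pinpointing the exact date of a divergence.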

TECHNICAL DETAIL

Despite its complexity and the number of steps, the entire analysis above was completed with a single SQL query in BigQuery.