Beyond Speaking Time: An Analysis of Democratic Presidential Debates

Data preparation and feature engineering for predictive modeling using real-world data

Source: This LA Times article

As we head into the new year, reeling from a climate of deep political polarization, I realized that I’ve never paid as much attention to this country’s politics as I have this past year. I now listen to The Daily podcast every morning, and I even watched two or three of the live Democratic primary debates in their entirety.

The day after each debate, my newsfeed gets flooded with commentary, footage, and analysis of what the candidates talked about and even how much they talked. But I noticed a dearth of information on how they talked on the debate stage.

As we raise our glasses to the final year of the 2010s and to the year’s eighth and final Democratic presidential debate, scheduled for December 19th, I thought it would be nice to close that gap with a little data exploration and analysis.

In this post, I will:

Provide a quick overview of how I preprocessed the raw transcripts scraped from the internet.

Present a comparative analysis of the candidates’ utterances using pandas, looking at features such as who was the most egocentric, who was the most future-oriented, and who spoke in the most flowery way.
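To make the plan above a little more concrete, here is a minimal sketch of how parsing and scoring could work with pandas. It assumes each transcript turn looks like "SPEAKER: text", and it uses simple proxies chosen only for illustration: the share of first-person singular pronouns for egocentrism, the share of future markers such as "will" and "’ll" contractions for future orientation, and mean word length as a crude stand-in for flowery language. The function names, word lists, and sample lines are hypothetical; they are not the definitions or data used later in this post.

```python
import re

import pandas as pd

# Made-up sample in an assumed "SPEAKER: text" format (not real debate quotes)
RAW_TRANSCRIPT = """\
SMITH: I have a plan for that, and I will fight for it.
JONES: We will win this election.
And we will transform this economy.
SMITH: My plan is simple.
"""

# Assumed speaker tag: an upper-case name (2+ characters) followed by a colon
SPEAKER_TAG = re.compile(r"^([A-Z][A-Z'. -]*[A-Z]):\s*(.*)$")
# Lower-case words, keeping contractions such as "i'll" or "won't" whole
TOKEN_PATTERN = r"[a-z]+(?:'[a-z]+)?"

# Illustrative (hypothetical) word lists, not the definitions used in this post
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'll", "i'd"}
FUTURE_MARKERS = {"will", "won't", "shall", "gonna"}


def parse_transcript(raw):
    """Turn raw transcript text into a DataFrame with one row per speaking turn."""
    rows = []
    for line in raw.splitlines():
        line = line.strip().replace("\u2019", "'")  # normalize curly apostrophes
        if not line:
            continue
        match = SPEAKER_TAG.match(line)
        if match:
            rows.append({"speaker": match.group(1).title(), "text": match.group(2)})
        elif rows:
            # Untagged line: treat it as a continuation of the previous turn
            rows[-1]["text"] += " " + line
    return pd.DataFrame(rows, columns=["speaker", "text"])


def candidate_features(turns):
    """Compute per-speaker word-level rates from a parse_transcript() DataFrame."""
    tokens = turns["text"].str.lower().str.findall(TOKEN_PATTERN)
    counts = pd.DataFrame({
        "speaker": turns["speaker"],
        "n_words": tokens.apply(len),
        "n_chars": tokens.apply(lambda ws: sum(len(w) for w in ws)),
        "n_first_person": tokens.apply(
            lambda ws: sum(w in FIRST_PERSON_SINGULAR for w in ws)),
        "n_future": tokens.apply(
            lambda ws: sum(w in FUTURE_MARKERS or w.endswith("'ll") for w in ws)),
    })
    # Sum raw counts per speaker first, so long turns weigh more than short ones
    totals = counts.groupby("speaker").sum()
    return pd.DataFrame({
        "egocentric": totals["n_first_person"] / totals["n_words"],
        "future_oriented": totals["n_future"] / totals["n_words"],
        "flowery": totals["n_chars"] / totals["n_words"],  # mean word length
    })


if __name__ == "__main__":
    print(candidate_features(parse_transcript(RAW_TRANSCRIPT)).round(3))
```

Summing counts per speaker before dividing gives word-level rates; averaging per-turn rates instead would let a candidate’s many one-line interjections count as much as their long answers, which is a design choice worth making deliberately.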

After December 12th, the deadline for candidates to meet the donor and polling requirements for the next debate, I will have all the information I need to: