METHOD

We used the Event Registry API to collect news articles about Millennials. The query returned news articles published between June 15, 2015 and June 15, 2019 whose headline contained the word “Millennials”, “millennials”, “Millennial”, or “millennial”. This query yielded nearly 38,000 articles. For each article, we obtained metadata including the URL, title, body, and publishing date. Sometimes, multiple news outlets in the same media family publish the same article; removing these duplicates left a total of 26,565 articles.
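The headline filter and the duplicate removal can be illustrated with a short sketch. It is not the Event Registry query itself: it assumes the articles are already in memory as dictionaries with hypothetical `url`, `title`, `body`, and `date` keys, and it assumes that two articles count as duplicates when their normalized title and body match, a rule the description above does not spell out.

```python
import re
from datetime import date

# The four headline forms named in the query: Millennial(s) / millennial(s).
HEADLINE_PATTERN = re.compile(r"\b[Mm]illennials?\b")
START, END = date(2015, 6, 15), date(2019, 6, 15)


def matches_query(article):
    """True if the headline mentions Millennials and the date is in range."""
    in_window = START <= article["date"] <= END
    return in_window and bool(HEADLINE_PATTERN.search(article["title"]))


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def drop_duplicates(articles):
    """Keep the first article for each (title, body) pair.

    Assumption: syndicated copies share the same title and body text.
    """
    seen = set()
    unique = []
    for article in articles:
        key = (normalize(article["title"]), normalize(article["body"]))
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

Applying `matches_query` first and `drop_duplicates` second mirrors the order described above: query, then de-duplicate.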

We used the spaCy Python package to part-of-speech tag the headline text. Part-of-speech tagging identifies each word’s part of speech in the sentence (e.g., a noun versus a verb versus an adverb). We kept only article headlines in which Millennials perform an action (“Millennials are killing the napkin industry”, for instance). Narrowing our focus this way made it easier to identify the target of their love and/or destruction. Using the newly tagged headlines, we subsetted the main dataset to headlines where “millennials” is the subject noun of the sentence, yielding 12,500 articles. From these, we also removed articles with fewer than five sentences in the body.
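A minimal sketch of the subject filter and the body-length filter follows. It assumes the subject was identified from spaCy's dependency labels (`nsubj` attached to a verb), which goes slightly beyond part-of-speech tags alone; the function names and the `en_core_web_sm` model choice are illustrative, not taken from the original analysis.

```python
MILLENNIAL_FORMS = {"millennial", "millennials"}


def millennials_is_subject(doc):
    """True if a form of "millennial" is the nominal subject of a verb.

    Assumption: "perform an action" means the token is labeled `nsubj`
    and its syntactic head is tagged VERB.
    """
    for token in doc:
        if (
            token.lower_ in MILLENNIAL_FORMS
            and token.dep_ == "nsubj"
            and token.head.pos_ == "VERB"
        ):
            return True
    return False


def has_min_sentences(doc, minimum=5):
    """True if the parsed article body has at least `minimum` sentences."""
    return sum(1 for _ in doc.sents) >= minimum


def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline (model name is an assumption)."""
    import spacy  # imported here so the helpers above have no hard dependency

    return spacy.load(model_name)
```

With a pipeline from `nlp = load_pipeline()`, an article would be kept when both `millennials_is_subject(nlp(title))` and `has_min_sentences(nlp(body))` hold.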

The objects you can explore are the noun chunks that spaCy identified as the first direct object in each headline. We opted to look at noun chunks instead of single nouns to get a more complete picture of the items Millennials are interacting with. A noun chunk is a noun together with its modifiers, such as adjectives: “second home” instead of “home”. This method left us with about 4,000 unique nouns and 2,000 unique verbs.
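The extraction of the first direct object can be sketched as below. The sketch assumes each headline has already been parsed into a spaCy `Doc`, that direct objects carry the `dobj` dependency label used by spaCy's English models, and that the verb of interest is the direct object's syntactic head; the function names are hypothetical.

```python
def first_direct_object(doc):
    """Return (verb lemma, noun chunk text) for the first direct object.

    `doc.noun_chunks` yields chunks in document order, so the first chunk
    whose root is labeled `dobj` is the first direct object. Returns None
    when the headline has no direct object.
    """
    for chunk in doc.noun_chunks:
        if chunk.root.dep_ == "dobj":
            return chunk.root.head.lemma_, chunk.text
    return None


def collect_verbs_and_objects(docs):
    """Gather the unique verbs and object noun chunks across headlines."""
    verbs, objects = set(), set()
    for doc in docs:
        pair = first_direct_object(doc)
        if pair is not None:
            verb, obj = pair
            verbs.add(verb.lower())
            objects.add(obj.lower())
    return verbs, objects
```

Note that spaCy's noun chunks can also include a leading determiner (“a second home”), so matching the “second home” form above may require stripping determiners as an extra step.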