What terms go hand-in-hand with #digitaljournalism?

Since both my personal and professional interests have a lot to do with new technologies, journalism and innovation, I decided to work more closely with the Twitter dataset for #digitaljournalism. In particular, I set out to visualize the relationships between #digitaljournalism and other hashtags.



The data (file: digitaljournalism-hashtagEdges) were retrieved on September 12 and consisted of 76 records from the previous seven days.

What caught my attention from the beginning was that the search term “digitaljournalism” was spelled in three different ways: #digitaljournalism, #digitalJournalism and #DigitalJournalism. Although all three refer to the same thing, my guess was that a data visualization tool like Gephi would treat them as separate entities.

As it turned out, there were indeed three clusters, each centered on one of the three spellings of #digitaljournalism. The most common spelling was the all-lowercase one, and I assume that with the complete dataset even more nodes would have connected to it. There may also have been a further spelling variation, which would have meant at least one more cluster.

Fortunately, there is a fairly simple way to strip all search terms of their capitalization (TextWrangler to the rescue!) and merge the three variations into one. Below is the resulting visualization:

My main takeaway from this exercise was a practical one: I now know how to apply different filters in Gephi to get a clear and comprehensive visualization.
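For anyone who would rather script the lowercasing step than do it by hand in TextWrangler, here is a minimal sketch in Python. It assumes, hypothetically, that the edge file is a CSV with Gephi-style `Source`, `Target` and `Weight` columns; the actual layout of digitaljournalism-hashtagEdges may differ, and the function names are illustrative only. Every hashtag is lowercased, and edges that become identical are merged by summing their weights, so the three spellings collapse into a single node.

```python
import csv
import io
from collections import defaultdict


def merge_case_variants(edges):
    """Lowercase every hashtag and merge edges that become identical.

    `edges` is an iterable of (source, target, weight) tuples; the weights
    of edges that collapse into one are summed. Edge direction is kept.
    """
    merged = defaultdict(float)
    for source, target, weight in edges:
        key = (source.lower(), target.lower())
        merged[key] += float(weight)
    return dict(merged)


def normalize_edge_csv(csv_text):
    """Read an edge list with (assumed) Source,Target,Weight columns and
    return CSV text in which case variants of a hashtag are merged.
    A missing Weight value is treated as 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    edges = (
        (row["Source"], row["Target"], row.get("Weight") or 1)
        for row in reader
    )
    merged = merge_case_variants(edges)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(merged.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample rows, not taken from the real dataset.
    sample = (
        "Source,Target,Weight\n"
        "#digitaljournalism,#media,1\n"
        "#DigitalJournalism,#media,2\n"
        "#digitalJournalism,#journalism,1\n"
    )
    print(normalize_edge_csv(sample))
```

On the made-up sample, the two `#media` edges merge into one edge with weight 3.0. Because direction is preserved, an edge A→B and an edge B→A stay separate; for an undirected co-occurrence network one could sort each pair before using it as a key.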