Exploring Telenovela with DBpedia, R and Gephi

Today I discovered telenovelas. Telenovelas are short, limited-run programs similar to soap operas; they are popular in Spanish-language countries and they are serious business. I stumbled across a clip on YouTube and was instantly hooked. Check this out:

I headed to Wikipedia to find out more, only to find that telenovelas are a very large phenomenon, with so much information that I didn't know where to start. So I headed to DBpedia to do some basic exploring.

One of the first Wikipedia pages I stumbled across was about a popular children's telenovela, Chiquititas. I think checking an article in DBpedia is always a good place to start, so I looked up the resource by its URI using DBpedia's install of Virtuoso here: http://dbpedia.org/page/Chiquititas. I noticed it had the property dbpedia-owl:genre with value dbpedia:Telenovela.

Using this, it's quite easy to write a SPARQL query that pulls back a list of all telenovela articles in Wikipedia.
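A minimal sketch of what such a query could look like, taken from the first query in the final code with the country patterns left out. The explicit PREFIX declarations are an addition (they spell out the standard DBpedia and FOAF namespaces so the query does not depend on the endpoint's predefined prefixes), and the call that actually hits the endpoint is left commented out because it needs the SPARQL package and network access.

```r
# Sketch: list every resource whose genre is Telenovela, with its name.
# The PREFIX lines are an addition; the triple patterns come from the
# first query in the final code.
prefixes <- paste(
  "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>",
  "PREFIX dbpedia: <http://dbpedia.org/resource/>",
  "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
  sep = "\n"
)

query <- paste(
  prefixes,
  "SELECT ?name WHERE {",
  "  ?show dbpedia-owl:genre dbpedia:Telenovela .",
  "  ?show foaf:name ?name .",
  "}",
  sep = "\n"
)

cat(query, "\n")

# Running it needs the SPARQL package and network access:
# library("SPARQL")
# qd <- SPARQL("http://dbpedia.org/sparql", query)
# nrow(qd$results)
```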

This gave me a list of all the programs in the genre in Wikipedia. There are 139 of them apparently, which seems a very low number, but we don't know how many articles are missing the structured information the query relies on.

I wondered if we could work out which country had the most of them, so I extended the query to include:

?show dbpprop:country ?country .

?country foaf:name ?countryname

which I could then plot with the following:
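The plotting step boils down to counting rows per country and handing the counts to barplot(). Here is a small sketch of that step on its own; the data frame is made-up placeholder data standing in for qd$results, and sorting the bars is an optional extra that the final code does not do.

```r
# Sketch of the plotting step. The rows below are made-up placeholders,
# not DBpedia results; in the real run df is qd$results.
df <- data.frame(
  name = c("Show A", "Show B", "Show C", "Show D"),
  countryname = c("Brazil", "Brazil", "Mexico", "Argentina"),
  stringsAsFactors = FALSE
)

# Count shows per country and sort so the largest bar comes first.
counts <- sort(table(df$countryname), decreasing = TRUE)

barplot(counts,
        main = "Telenovela articles in Wikipedia by Country",
        xlab = "Country",
        las = 2)  # rotate labels so long country names stay readable
```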

To get:

So Brazilians can’t get enough of the stuff. I’m not surprised either judging from the earlier clip I saw.

I decided to take a look at the cast members of each telenovela series. The results have to be taken with a pinch of salt, as Wikipedia is short on this information. The SPARQL query looked like this:
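For reference, here is the cast query from the final code laid out one triple pattern per line. The PREFIX declarations are again an addition, and the endpoint call is commented out since it needs the SPARQL package and network access.

```r
# Sketch: every (person name, show name) pair where the person is listed
# under dbpprop:starring for a show whose genre is Telenovela.
# The PREFIX lines are an addition; the patterns come from the final code.
cast_query <- paste(
  "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>",
  "PREFIX dbpedia: <http://dbpedia.org/resource/>",
  "PREFIX dbpprop: <http://dbpedia.org/property/>",
  "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
  "SELECT ?name ?showname WHERE {",
  "  ?person foaf:name ?name .",
  "  ?show dbpprop:starring ?person .",
  "  ?show dbpedia-owl:genre dbpedia:Telenovela .",
  "  ?show foaf:name ?showname .",
  "}",
  sep = "\n"
)

cat(cast_query, "\n")

# library("SPARQL")
# qd <- SPARQL("http://dbpedia.org/sparql", cast_query)
# df <- qd$results   # one row per person/show pair
```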

This pulls back 645 names and the show each person was in. Some of the names are repeated because they have appeared in more than one show. I created a matrix of who has been in shows with whom and then exported it as GraphML so I could explore the data in Gephi.
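The matrix step is easier to follow on a tiny example. The sketch below uses hypothetical actor and show names: table() gives an actor-by-show matrix, and multiplying it by its own transpose gives an actor-by-actor matrix whose entries count shared shows. The igraph part uses graph_from_adjacency_matrix() with weighted = TRUE, which is a different route to edge weights than the count.multiple()/simplify() calls in the final code, and it only runs if igraph is installed.

```r
# Placeholder actor/show pairs (hypothetical, not DBpedia data).
df <- data.frame(
  name = c("Actor A", "Actor B", "Actor B", "Actor C", "Actor A"),
  showname = c("Show 1", "Show 1", "Show 2", "Show 2", "Show 3"),
  stringsAsFactors = FALSE
)

# Actor-by-show incidence matrix: 1 where the actor appeared in the show.
M <- as.matrix(table(df))

# Actor-by-actor matrix: entry [i, j] counts the shows i and j share.
co <- M %*% t(M)
diag(co) <- 0  # drop self-links (an actor's own show count)
print(co)

# Optional: write the weighted graph as GraphML for Gephi.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(co, mode = "undirected",
                                           weighted = TRUE)
  out_file <- file.path(tempdir(), "graph.graphml")
  igraph::write_graph(g, out_file, format = "graphml")
}
```

In this toy data Actor A and Actor B share Show 1, Actor B and Actor C share Show 2, and Actor A and Actor C never appear together, so the graph has two edges.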

The final result is a graph; zoom in for names. It seems Sabine Moussier, Elaine Giardine and Tony Ramos are all big names central to telenovela. The big blue circle at the top is from an American soap called Hollywood High.

The final code:

[codesyntax lang=”text”]

library("SPARQL")
library("igraph")

endpoint = "http://dbpedia.org/sparql"

# Telenovelas and the country each one comes from
query = "SELECT ?name ?countryname {
  ?show dbpedia-owl:genre dbpedia:Telenovela .
  ?show foaf:name ?name .
  ?show dbpprop:country ?country .
  ?country foaf:name ?countryname
}"
qd = SPARQL(endpoint, query)
df = qd$results

# Number of shows per country
counts <- table(df$countryname)
barplot(counts, main="Telenovela articles in Wikipedia by Country", xlab="Country")

# Cast members and the telenovelas they starred in
query = "SELECT ?name ?showname where {
  ?person foaf:name ?name .
  ?show dbpprop:starring ?person .
  ?show dbpedia-owl:genre dbpedia:Telenovela .
  ?show foaf:name ?showname .
}"
qd = SPARQL(endpoint, query)
df = qd$results

# Person-by-show incidence matrix
M = as.matrix( table(df) )
# Who has been in shows with whom. Mrow was never defined in the listing
# as posted; M %*% t(M) is assumed here.
Mrow = M %*% t(M)

# Build the graph, fold repeated edges into a weight, and export for Gephi
iMrow = graph.adjacency(Mrow, mode = "undirected")
E(iMrow)$weight <- count.multiple(iMrow)
iMrow <- simplify(iMrow)
write.graph(iMrow, file="graph.graphml", format="graphml")

[/codesyntax]