6 min read

This much delayed blog post deals with analyzing revenue across media franchises, with the data presented by the Tidy Tuesday project.

This was a really fun analysis, mainly due to the brainstorming that took place due to the fact that there aren’t a lot of variables in the dataset present. I also used stringr fairly frequently here to parse and manipulate the character data present in the dataset.

As always, I load in my preferred packages, after which I briefly glimpse() the data.

library(tidyverse) #for a streamlined analysis library(patchwork) #for combined plots library(igraph) #for network graphs library(ggraph) #for plotting network graphs using the grammar of graphics library(RColorBrewer) #for colour palettes theme_set(theme_light()) #preferred theme choice media_franchises <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-02/media_franchises.csv") #reading in the data media_franchises %>% glimpse()

## Observations: 321 ## Variables: 7 ## $ franchise <chr> "A Song of Ice and Fire / Game of Thrones", ... ## $ revenue_category <chr> "Book sales", "Box Office", "Home Video/Enter... ## $ revenue <dbl> 0.900, 0.001, 0.280, 4.000, 0.132, 0.760, 1.0... ## $ year_created <dbl> 1996, 1996, 1996, 1996, 1996, 1992, 1992, 199... ## $ original_media <chr> "Novel", "Novel", "Novel", "Novel", "Novel", ... ## $ creators <chr> "George R. R. Martin", "George R. R. Martin",... ## $ owners <chr> "Random House WarnerMedia (AT&T)", "Random Ho...

media_franchises %>% head()

## # A tibble: 6 x 7 ## franchise revenue_category revenue year_created original_media creators ## <chr> <chr> <dbl> <dbl> <chr> <chr> ## 1 A Song o~ Book sales 0.9 1996 Novel George ~ ## 2 A Song o~ Box Office 0.001 1996 Novel George ~ ## 3 A Song o~ Home Video/Ente~ 0.28 1996 Novel George ~ ## 4 A Song o~ TV 4 1996 Novel George ~ ## 5 A Song o~ Video Games/Gam~ 0.132 1996 Novel George ~ ## 6 Aladdin Box Office 0.76 1992 Animated film Walt Di~ ## # ... with 1 more variable: owners <chr>

media_franchises_processed <- media_franchises %>% separate_rows(owners, sep = "\\) ") #separating multiple owners into their own rows

I run through a few count() commands in order to get a better look at the categorical data present in my dataset, since I have a hunch that they are what I will be mainly looking at.

media_franchises_processed %>% count(revenue_category, sort = TRUE)

## # A tibble: 8 x 2 ## revenue_category n ## <chr> <int> ## 1 Box Office 104 ## 2 Merchandise, Licensing & Retail 101 ## 3 Home Video/Entertainment 91 ## 4 Video Games/Games 73 ## 5 Comic or Manga 46 ## 6 Music 19 ## 7 TV 16 ## 8 Book sales 11

media_franchises_processed %>% count(owners, sort = TRUE)

## # A tibble: 97 x 2 ## owners n ## <chr> <int> ## 1 Shueisha (Hitotsubashi Group 25 ## 2 The Walt Disney Company 25 ## 3 Shueisha (Hitotsubashi Group) 16 ## 4 (manga 13 ## 5 DC Entertainment (AT&T) 13 ## 6 (films) 11 ## 7 (franchise 10 ## 8 Square Enix 10 ## 9 Pierrot 9 ## 10 Bandai Namco (games) 8 ## # ... with 87 more rows

media_franchises_processed %>% count(creators, owners, sort = TRUE)

## # A tibble: 132 x 3 ## creators owners n ## <chr> <chr> <int> ## 1 Stan Lee Steve Ditko (franchise 7 ## 2 Stan Lee Steve Ditko Marvel Entertainment (The Wal~ 7 ## 3 Stan Lee Steve Ditko Sony (films) 7 ## 4 Akira Toriyama (manga 6 ## 5 Akira Toriyama Bandai Namco (games) 6 ## 6 Akira Toriyama Bird Studio Shueisha (Hitots~ 6 ## 7 Akira Toriyama Toei Animation (anime 6 ## 8 George Lucas Lucasfilm (The Walt Disney Co~ 6 ## 9 Hideaki Anno Gainax Tatsunoko Prod~ Khara[dc][279][280] 6 ## 10 Hironobu Sakaguchi Hiromichi Tanak~ Square Enix 6 ## # ... with 122 more rows

Now comes the exciting part: plotting!

I don’t do anything I haven’t done before here, but what makes it refreshing is the use of the patchwork package that lets me create a truly beautiful combined plot, in an extremely readable syntax!

For using patchwork , I store my revenue plots in 3 separate variables.

#Boxplots of revenue across categories p1 <- media_franchises_processed %>% mutate(revenue_category = fct_reorder(revenue_category, revenue, median)) %>% ggplot(aes(revenue_category, revenue)) + geom_boxplot(aes(fill = revenue_category)) + coord_flip() + labs(x = "Revenue category", y = "Revenue", title = "Total revenue across categories", subtitle = "In billions (USD)") + guides(fill = FALSE) #Revenue timelines, split across categories p2 <- media_franchises_processed %>% ggplot(aes(year_created, revenue)) + geom_smooth(method = "lm", aes(color = revenue_category)) + facet_wrap(~revenue_category, ncol = 2) + #facetting across categories guides(color = FALSE) + labs(title = "Revenue trends over the

years, per category", subtitle = "Generally consistent,

drop in merchandise revenue", x = "Year created", y = "Revenue (in billion USD)") + theme(axis.text.x = element_text(angle = 45)) #for readability

#Storing the names of the highest earning franchises topFranchises <- media_franchises_processed %>% group_by(franchise) %>% summarise(totalRev=sum(revenue)) %>% arrange(desc(totalRev)) %>% head(8) %>% #for readability pull(franchise) #Creating a custom colour palette custColors <- colorRampPalette(brewer.pal(8,"Set1"))(10) #Stacked bar plot of highest earning franchises p3 <- media_franchises_processed %>% filter(franchise %in% topFranchises) %>% mutate(franchise = fct_reorder(franchise, revenue, sum)) %>% ggplot(aes(franchise, revenue)) + geom_col(aes(fill = revenue_category)) + coord_flip() + scale_fill_manual(values = custColors) + labs(title = "Highest earning franchises", subtitle = "Sorted via total revenue", x = "Franchise name", y = "Revenue", caption = "In billions of dollars", fill = "Revenue category") + theme(legend.position = "bottom") (p1/p3)|p2 #patchwork!

This is where the beauty of patchwork kicks in: | for adding plots horizontally, and / for adding them vertically. Brilliant and simple. For more functionality be sure to check out the official repository!

Delving into a little detail, we can see that on average books create the most revenue for franchises, although merchandise are very close behind.

Now comes the kicker: I’ve always wanted to somehow know the number of franchises a media powerhouse like, say, The Walt Disney company owns, since they own a lot of big name franchises present today (the MCU and Star Wars come to mind). I visualize relationships between the original creators and their present day owners using the igraph and ggraph packages, as follows:

set.seed(100) #for reproducibility #Defining an arrow a <- grid::arrow(type = "closed", length = unit(.15, "inches"), angle = 15) media_franchises_processed %>% count(creators, owners, sort = TRUE) %>% filter(!str_detect(owners, "[\\(\\)]")) %>% #removing characters within brackets filter(n > 2) %>% graph_from_data_frame() %>% #creating a network graph from the dataframe ggraph(layout = 'fr') + #plotting it using the grammar of graphics geom_edge_link(aes(edge_colour = n), arrow = a, end_cap = circle(.07, 'inches')) + geom_node_point() + geom_node_text(aes(label = name), repel = TRUE, size = 2) + scale_edge_color_distiller(palette = "YlOrRd", direction = 1) + labs(title = "Relationships between creators and owners", subtitle = "The Walt Disney Company seems to own quite a few creations", caption = "Arrows point from creator to owner") + theme_void() + theme(legend.position = "bottom", legend.box = "vertical")

A lot of names stick out to me. We can see arrows from Masashi Kishimoto (creator and writer of Naruto) and Tite Kubo (creator and writer of Bleach) pointing towards Studio Pierrot, which makes sense, as this studio animated the shows Naruto and Bleach, two shows with an absolutely massive fan following. We can also see arrows from Pixar to The Walt Disney Company, for example.