About Me

I am a data analyst currently working in the XR space (augmented reality/virtual reality/spatial computing), where I analyze social media data, emerging market trends, and other sources of consumer data daily. I previously worked on a Webby Award-winning video game research project for the Entertainment Software Association, for which I compiled the largest public database of active video game companies in the country. That database was used to assess economic growth in the American video game market.

Methodology and Further Reading

Data Collection, Cleaning, and Analysis

Sample Twitter Data Collection and Cleaning

All data was collected using Python scripts from GitHub that leveraged Twitter’s API. A separate tool with Enterprise-level API access was used only to verify data counts. Data was pulled for research purposes and was collected in the event that it could be leveraged to defend against slander, defamation, or excessive hate or violence against the Reylo community.

All data is a sample. It does not represent every user who ever discussed a specific ship using any term for that ship (e.g. this data set does not include all tweets mentioning “reylo” OR “rey and kylo”), nor every user who discussed a specific ship using that ship’s name only (e.g. this data set does not include all tweets mentioning “reylo”). This is for the following reasons:

Data was collected using easily comparable search terms only. All data associated with “reylo” in this article includes only tweets mentioning “reylo” and NOT “rey and kylo.” Likewise, all data associated with “finnrey” in this article includes only tweets mentioning “finnrey” and NOT “finn and rey.” (Please note, however, that the code pulled tweets containing “reylos” and “finnreys.”) Simple search terms were chosen to limit errors in data collection, which are more likely to occur when searching for a phrase (e.g. “rey and kylo”) than for one word (e.g. “reylo”).

In shipping circles, the ship name is commonly used over the names of the characters, and it is likely to be used at least once by a member of that shipping community. For this reason, the ship name is justified in representing a sample of all users and tweets discussing the ship within the given time frame. This is also why this article focuses more on account volume than on actual tweet volume.

Data does not include tweets from locked accounts or tweets that had been deleted at the time of the data pull.
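
To illustrate the difference between tweet volume and account volume, here is a minimal Python sketch. It is not the code used for this article. The `(username, text)` layout and the `volume_summary` helper are assumptions made for illustration, and the tweets are made up.

```python
from collections import Counter

# Made-up (username, tweet text) pairs for illustration only.
sample_tweets = [
    ("fan_one", "reylo is endgame"),
    ("fan_one", "more reylo thoughts"),
    ("Fan_One", "reylos assemble"),
    ("fan_two", "not a reylo fan"),
]

def volume_summary(tweets):
    """Return (tweet volume, account volume, tweets per account)."""
    # Usernames are lowercased so the same account is not counted twice.
    per_account = Counter(user.lower() for user, _text in tweets)
    return sum(per_account.values()), len(per_account), per_account

tweet_volume, account_volume, per_account = volume_summary(sample_tweets)
print(tweet_volume, account_volume)  # 4 tweets from 2 accounts
```

Counting accounts rather than tweets keeps one very active account from dominating the picture.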

This analysis leveraged three datasets: 1) a historical sample of Twitter data where search term = “reylo” and where time period = 12/23/2015 to 12/28/2019 (total sample set = 433,210 tweets); 2) a historical sample of Twitter data where search term = “finnrey” and where time period = 12/23/2015 to 12/28/2019 (total sample set = 25,505 tweets); and 3) a sample of Twitter data where search term = “reylo” and where time period = 12/31/2019 to 1/3/2020 (total sample set = ~48,000 tweets before cleaning; the sample set utilized in this article totaled 25,012 tweets).

Data for the sample set assessing tweets mentioning “reylo” between 12/31/2019 and 1/3/2020 was cleaned by matching usernames against the follower lists of pro-Reylo accounts via a VLOOKUP. These follower lists were mostly gathered manually. Accounts were also deleted based on keywords commonly used by pro-Reylo accounts, including “#BenSoloDeservesBetter,” “#BenSolo,” “Ben Solo,” “#BenSoloDeservedBetter,” “canon,” “antis,” “uwu,” and “force bond.”
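
The cleaning step above was done with VLOOKUP in Excel. A rough Python equivalent might look like the sketch below. The record layout, the `remove_flagged_accounts` helper, and the sample data are hypothetical. The sketch also assumes the keywords were matched against tweet text with simple case-insensitive substring checks, which may differ from how the matching was actually done.

```python
# Keywords listed in the article, lowercased for case-insensitive matching.
PRO_REYLO_KEYWORDS = [
    "#bensolodeservesbetter", "#bensolo", "ben solo",
    "#bensolodeservedbetter", "canon", "antis", "uwu", "force bond",
]

def remove_flagged_accounts(tweets, follower_usernames, keywords=PRO_REYLO_KEYWORDS):
    """Drop every tweet from accounts that match the follower list or use a keyword."""
    followers = {name.lower() for name in follower_usernames}
    flagged = set()
    for tweet in tweets:
        user = tweet["user"].lower()
        text = tweet["text"].lower()
        if user in followers or any(keyword in text for keyword in keywords):
            flagged.add(user)
    # A flagged account loses all of its tweets, not only the matching one.
    return [t for t in tweets if t["user"].lower() not in flagged]

# Made-up example data.
tweets = [
    {"user": "A_Fan", "text": "Ben Solo deserved better"},
    {"user": "A_Fan", "text": "good morning"},
    {"user": "listed_user", "text": "reylo is trending"},
    {"user": "other_user", "text": "reylo discourse again"},
]
kept = remove_flagged_accounts(tweets, follower_usernames=["Listed_User"])
print([t["user"] for t in kept])  # ['other_user']
```

Plain substring matching is blunt: “canon” would also match “canonical,” for example, so a real pass would likely need manual review.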

All historical data pulls occurred over non-consecutive intervals. Data was matched after each pull to remove duplicates. Data was also cleaned of any large bot accounts. Excluded accounts include @botreylo (30,459 tweets) and @whyshipreylo (7,638 tweets).
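
As an illustration of the de-duplication and bot removal described above, here is a minimal Python sketch. It assumes each tweet carries a unique `id` and a `user` field. That layout, the `merge_pulls` helper, and the sample data are hypothetical and are not the original workflow.

```python
# The two bot accounts named in the article.
BOT_ACCOUNTS = {"botreylo", "whyshipreylo"}

def merge_pulls(pulls, excluded_users=BOT_ACCOUNTS):
    """Combine several pulls, keeping one copy of each tweet id and skipping excluded accounts."""
    seen_ids = set()
    merged = []
    for pull in pulls:
        for tweet in pull:
            user = tweet["user"].lstrip("@").lower()
            if user in excluded_users or tweet["id"] in seen_ids:
                continue
            seen_ids.add(tweet["id"])
            merged.append(tweet)
    return merged

# Made-up example data: two overlapping pulls.
pull_a = [
    {"id": 1, "user": "fan_one", "text": "reylo forever"},
    {"id": 2, "user": "botreylo", "text": "reylo reylo"},
]
pull_b = [
    {"id": 1, "user": "fan_one", "text": "reylo forever"},  # duplicate of pull_a
    {"id": 3, "user": "fan_two", "text": "I like reylo"},
]
cleaned = merge_pulls([pull_a, pull_b])
print([t["id"] for t in cleaned])  # [1, 3]
```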

Chart visualizes all accounts mentioning “reylo” before bot removal. The two largest circles, “botreylo” and “whyshipreylo,” are bots that were removed from the dataset.

To avoid major data discrepancies during periods where mention volume was higher (e.g. after a Star Wars film release), I triangulated the numbers in the data set in the following ways:

Tweet volume was verified using a social-listening tool with Enterprise-level API access. This tool was used only to verify the number of tweets, to make sure the sample was representative of the population. No data from this tool is visualized in this article.

Tweet volume was matched with other fandom metrics, such as the number of fanfictions created on the fanfic website Archive of Our Own, to verify activity of the Finnrey and Reylo shipping communities, respectively (links are in the article).

Tweet volume was matched with other publicly accessible trend metrics, including Google Trends data. When comparing the search terms “Reylo” vs “Finnrey” in worldwide Google Trends, the data is similar to the trend of the Twitter data visualized earlier in this article, with spikes following the release of The Last Jedi in December 2017 and The Rise of Skywalker in December 2019.

Google Trends Worldwide snapshot where search terms = “reylo” v “finnrey” and time period = 12/23/15 to 12/28/19

Data Analysis

All data was analyzed and visualized in Excel and Tableau. Data sets were matched utilizing simple VLOOKUP functions between data sources.

Major themes from the data were collected after manually reading a random sample of about 7,000 tweets. This allowed me to gain context and understand the ways in which top themes and terms were being used.
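
Drawing a random sample for manual reading could be done along these lines. This is only a sketch: the `draw_random_sample` helper, the seed value, and the placeholder tweets are assumptions, and the article does not state how its sample was drawn.

```python
import random

def draw_random_sample(tweets, sample_size, seed=2020):
    """Return up to sample_size tweets chosen without replacement."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(tweets, min(sample_size, len(tweets)))

population = [f"tweet {i}" for i in range(100)]  # placeholder tweets
reading_sample = draw_random_sample(population, 10)
print(len(reading_sample))  # 10
```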

The major themes were verified in a text mining analysis in R using the tm, stringr, and wordcloud libraries. All stop words were removed from the analysis, along with “John,” “Boyega,” “Star,” “Wars,” “Reylos,” “Reylo,” “Twitter,” “Youtube,” “Instagram,” “Rey,” “www,” “com,” and “https.”
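
The analysis itself was done in R. For readers who prefer Python, the same idea (lowercase the text, drop stop words and the custom terms, count what remains) can be sketched as follows. The short stop word list here is a stand-in for illustration and is not the list used by the tm package. The `word_counts` helper, its `min_count` parameter, and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Custom terms removed in the article, lowercased.
CUSTOM_REMOVALS = {
    "john", "boyega", "star", "wars", "reylos", "reylo", "twitter",
    "youtube", "instagram", "rey", "www", "com", "https",
}
# Tiny illustrative stop word list (an assumption, not tm's list).
BASIC_STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in", "so", "this"}

def word_counts(texts, min_count=1):
    """Count words across texts, skipping stop words and custom removals."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in BASIC_STOP_WORDS and word not in CUSTOM_REMOVALS:
                counts[word] += 1
    # Keep only words that appear at least min_count times.
    return {word: n for word, n in counts.items() if n >= min_count}

# Made-up example tweets.
texts = ["Reylo Twitter is so toxic", "This is toxic and abusive"]
print(word_counts(texts))               # {'toxic': 2, 'abusive': 1}
print(word_counts(texts, min_count=2))  # {'toxic': 2}
```

The `min_count` parameter plays the role of a frequency cutoff like the one applied to a word cloud.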

Word cloud of Twitter data from 12/31/2019 to 1/3/2020 created in R with library(wordcloud)

The word cloud above includes all words mentioned a minimum of 250 times. Counts for specific words of interest are as follows:

Fuck: 1,457

Shit: 1,128

Hate: 1,074

Racist: 920

Fucking: 896

Mad: 865

Toxic: 817

Abusive: 646

Sexist: 468

The “keyword groups” were created using a combination of my manual analysis and the text mining completed in R. They were assembled from the words that appeared most frequently in my manual analysis. Total mentions for each keyword group were determined using the SUMPRODUCT function. The keyword groups used to determine the total number of mentions featured in this article are as follows:

Mental=[mental, manic, bedlam, satanism, sickness, disease, idiotic, loony, sociopath, lunatic, psychopath, insane, asylum, unhinged, crazy, nuts]

Racism=[racist, racism, black, racially, racial]

SJW=[SJW, progressive, liberal, leftist, woke, Antifa]

Twilight=[twilight, shades]

Sex=[porn, porno, erotica, sexualization, objectification, sexual, sexuality, horny, racy, smut, lust, sex]

Hate=[troll, trolling, bully, bullying, tears, crying, therapy, pain, cry, hate, hating] (Note: “harassing” and “harassment” were omitted since they were typically associated with Reylos “harassing” Boyega rather than Boyega/antis “harassing” Reylos)

Abuse=[abuse, abusive, toxic, rape]
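
The group totals were computed with SUMPRODUCT in Excel. One way to express the same roll-up in Python is sketched below, assuming a group's total is the sum of the per-word counts of its keywords. The exact spreadsheet formula is not given in this article, so treat that as an assumption. Only three of the groups are shown, and `group_totals` and the counts are made up for illustration.

```python
# Three of the keyword groups listed above.
KEYWORD_GROUPS = {
    "Racism": ["racist", "racism", "black", "racially", "racial"],
    "Twilight": ["twilight", "shades"],
    "Abuse": ["abuse", "abusive", "toxic", "rape"],
}

def group_totals(word_counts, groups=KEYWORD_GROUPS):
    """Sum the per-word counts of every keyword in each group."""
    return {
        name: sum(word_counts.get(word.lower(), 0) for word in words)
        for name, words in groups.items()
    }

# Made-up word counts, not figures from the article.
example_counts = {"racist": 3, "racial": 1, "toxic": 4, "abusive": 2, "twilight": 5}
totals = group_totals(example_counts)
print(totals)  # {'Racism': 4, 'Twilight': 5, 'Abuse': 6}
```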

Screenshot Collection

Screenshots were collected from the following sources:

My own saved files of screenshots taken over time between 2017 and 2019.

Twitter Advanced Search, using phrases from tweets and usernames found within the data sets.

Twitter Advanced Search, testing specific search phrases.

Outside parties, who provided additional screenshots from Tumblr and Twitter from 2016 to 2019 corroborating the data and analysis.

Other Reylo accounts, whose screenshots from Tumblr and Twitter were posted to Twitter between 2018 and 2019.

EDIT: There has been confusion surrounding which names were redacted and which usernames were kept in this article, so I am expanding upon my methodology here.

Twitter is a public platform and all tweets can be easily identified within a native search tool. That being said, redacting names is best practice and I followed that practice with the following exceptions:

The username was essential to show patterns between the data sets (e.g. accounts that were considered “top accounts” across both the Dec 2015–2019 data set and the Dec 2019-Jan 2020 data set). This was to show the validity of the data and allow this methodology to be replicated.

The username was essential to show patterns over time. For example, certain usernames were revealed to show participation in multiple online incidents targeted towards women.

The username could be used as an example of the reach of the harassment (e.g. verified accounts, which typically have a larger follower base).

Further Reading

“Acceptance of Self, the Female Gaze, and Removing Myth from Man: Thoughts on The Last Jedi” by Jessica (known online as Saturnine Stardust)

“Reylo, a Manifesto (or why we need to stop letting people define us as “just a ‘ship”)” by Nat (known online as ashesforfoxes)

“Reylo, A manifesto Pt II: Welcome to the Rollercoaster Ride to Hell” by Nat (known online as ashesforfoxes)

“The Life of Female Fans” by Women of the Whills