Internet and social media participation open doors to a plethora of positive opportunities for the general public. However, in addition to these positive aspects, digital technology also provides an effective medium for spreading hateful content in the form of cyberbullying, bigotry, hateful ideologies, and harassment of individuals and groups. This research aims to investigate the growing body of online hate research (OHR) by mapping general research indices, prevalent themes of research, research hotspots, and influential stakeholders such as organizations and contributing regions. For this, we use scientometric techniques and collect research papers from the Web of Science core database published through March 2019. We apply a predefined search strategy to retrieve peer-reviewed OHR and analyze the data using CiteSpace software, identifying influential papers, themes of research, and collaborating institutions. Our results show that higher-income countries contribute most to OHR, with Western countries accounting for most of the publications, funded by North American and European funding agencies. We also observed increased research activity post-2005, growing from more than 50 publications per year to more than 550 in 2018; this growth applies to the number of publications as well as citations. The hotbeds of OHR focus on cyberbullying, social media platforms, co-morbid mental disorders, and profiling of aggressors and victims. Moreover, we identified four main clusters of OHR: (1) cyberbullying, (2) sexual solicitation and intimate partner violence, (3) deep learning and automation, and (4) extremist and online hate groups, which highlight the cross-disciplinary and multifaceted nature of OHR as a field of research. The research has implications for researchers and policymakers engaged in OHR and its associated problems for individuals and society.

Copyright: © 2019 Waqas et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

However, none of the previous work, to the authors' knowledge, has focused on mapping general research indices, prevalent themes of research, research hotspots, and influential stakeholders such as organizations and contributing regions in OHR. This undertaking is essential, as such analyses help evaluate the field-specific impact of scholarly research, as well as the impact of scientists, collaborative networks, and institutes. Therefore, we set out to map OHR using scientometric analysis, defined as the “quantitative study of science, communication in science, and science policy” [24]. Most importantly, scientometrics helps identify the influential research studies driving the progress and evolution of a specific field of science [24]. By using reproducible statistical techniques, stakeholders can quantify the research output, citation rates, and influential funding agencies, journals, scientists, institutes, and regions involved in the progress of a scientific discipline [24]. By mapping these trends, researchers, policymakers, and funding agencies can determine areas where an increase or restriction in research work and funding is required [25–27]. Therefore, this investigation aims to address this paucity of data using advanced scientometric techniques.

However, online hate is a complex phenomenon, with its definition depending on theoretical paradigms, disciplines, and forms of victimization [1,15]. Due to this complexity, online hate research (OHR) is a fragmented field with a growing number of research papers across disciplines, as the adverse effects of online hate are more widely recognized in society and as new disciplines (e.g., computer science, psychology) introduce their own approaches to studying and solving the associated problems. Given this growing body of research, there is a need for literature analyses that map the current state of OHR. While several evidence-synthesis approaches have attempted to summarize and critically review the literature on online hate, these tend to be based on heterogeneous methodologies and restricted to a particular discipline or field of study [9,10,13,16–23]. For example, an elaborate effort by the British Institute of Human Rights sought to systematically map studies about initiatives against cyberbullying and inform legislative efforts by the European Union [21]. A qualitative approach by Awan sought to provide evidence regarding the use of social media platforms by ISIS by examining 100 Facebook pages and 50 Twitter users [13]. Country-specific efforts included Gagliardone et al.'s work mapping politically driven online hate in Ethiopia by reviewing relevant Facebook profiles, pages, and groups with more than 100 followers [23]; it provided a framework for analyzing online hate speech and explored the continuum between freedom of expression and hate speech [23]. Cyberbullying has also attracted attention from public health and mental health professionals. The most influential and most-cited work in this domain is attributed to Tokunaga, who critically reviewed and synthesized evidence on cyberbullying victimization [20].

Online hate has also emerged as a tool for politically motivated bigotry, xenophobia, homophobia, and excessive nationalism [9–12]. An example can be seen in the 2016 US elections: the narrative of “Make America Great Again” has empirically been shown to have amplified the online presence of white supremacists [9]. Social media platforms have given new momentum to radical nationalist groups, including Klansmen and Neo-Nazis, by ensuring anonymity or pseudonymity (i.e., disguised identity), ease of discussion, and the spread of radical ideologies [1]. Moreover, social media and online forums have provided hate-driven terrorist groups with a medium for launching propaganda to radicalize youth globally [13]. These groups use images and Internet videos to communicate their hateful intent, to trigger panic, and to cause psychological harm to the general public [14]. As a prime example of cyberterrorism, the Islamic State of Iraq & Syria (ISIS) effectively used social media to recruit youngsters from Europe to participate in the Syrian conflict [12]. Their social media campaigns led to at least 750 British youngsters joining Jihadi groups in Syria [13]. Overall, these real-world phenomena highlight the very real negative impact of spreading online hate and suggest that online hate is a major public concern.

The advent of the modern Internet opens doors to a plethora of positive opportunities for the general public. These opportunities span equity in education and general access to knowledge, modes of entertainment, consumerism, and e-participation. However, in addition to these positive aspects, digital technology also provides an effective medium for spreading hateful content in the form of bigotry and hateful ideologies, as well as cyberbullying and harassment of individuals and groups on social media platforms [1,2]. Online hate, albeit conducted in the virtual world, may have dire real-life consequences at both individual and population levels. For example, the links between cyberbullying among youth and student populations and poor mental health, depression, trauma, substance misuse, and a higher risk of suicide are well documented [3–6]. Recent estimates place exposure to online hate at between 31% and 67% across different study samples [7]. Among New Zealanders, for example, 11% of adults have been personally targeted by online hate [1], whereas in the US, 41% of adults have experienced online hate speech and harassment [8]. Online hate has been shown to predominantly target and influence minorities, young age groups, people with disabilities, and the LGBTQ (Lesbian, Gay, Bisexual, Transgender, Queer) community [1].

Citation bursts were defined as articles attracting significant research activity in a given period. Clusters and themes of research in this field were identified by running a cluster analysis on the publication records cited by specific sets of publications, and the clusters were named using labeling algorithms including TF*IDF, Mutual Information (MI), and Log-Likelihood Ratio (LLR) [25–27]. Each cluster was also assigned a year representing the mean publication year of all included research studies. Of these methods, LLR has been shown to be the most accurate [25–27]. The first method, TF*IDF, weights terms by their term frequency (TF) multiplied by their inverse document frequency (IDF) [25–27]. Log-likelihood ratio tests choose the most appropriate cluster label by assessing the strength of the association between a term and the cluster [25–27]; generally, the higher the LLR, the stronger the evidence. Lastly, the mutual information method is commonly used for feature selection in machine learning, but it works better with larger datasets [25–27].
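The two main labeling scores described above can be sketched in a few lines of Python. The documents and counts below are hypothetical toy data, not CiteSpace's actual implementation, which operates on noun phrases extracted from titles and abstracts:

```python
import math

def tf_idf(term, cluster_docs, all_docs):
    """TF*IDF label score: term frequency within the cluster's documents
    multiplied by the inverse document frequency across the whole corpus."""
    tf = sum(doc.count(term) for doc in cluster_docs)
    df = sum(1 for doc in all_docs if term in doc)
    return tf * math.log(len(all_docs) / df) if df else 0.0

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table:
    k11 = occurrences of the term inside the cluster, k12 = outside,
    k21/k22 = occurrences of all other terms inside/outside."""
    def xlogx(x):
        return x * math.log(x) if x > 0 else 0.0
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    return 2 * (cells - rows - cols + xlogx(k11 + k12 + k21 + k22))

# Hypothetical tokenized titles; the first two form one cluster.
docs = [["cyberbullying", "adolescents"],
        ["cyberbullying", "victimization"],
        ["network", "security"]]
print(tf_idf("cyberbullying", docs[:2], docs))  # positive: frequent in cluster
print(llr(10, 0, 0, 10))                        # strong term-cluster association
```

A candidate term that appears only inside the cluster yields a large LLR, while an evenly distributed term yields an LLR near zero, matching the rule of thumb that higher LLR means stronger evidence for the label.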

After that, network analysis was run using pathfinder network scaling while allowing for the pruning of sliced networks [25–27]. All bibliographic data were then visualized as merged and static networks/clusters. Articles were represented as nodes, while relationships between nodes were visualized as lines or edges. Two important metrics were used to describe the overall structural properties of the network: modularity and silhouette value. A modularity value close to 1 corresponds to a well-structured network that divides into loosely coupled clusters, and a high silhouette score represents an appropriately homogeneous cluster. This technique allowed for the visualization of important publications in a collaborative network based on their centrality values, with each publication depicted as a tree ring representing its history of citations and year-wise patterns [25–27]. New theories and landmark studies with high betweenness centrality were identified by purple rings, while citation bursts were visualized as red tree rings [25–27].
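As a concrete illustration of the modularity metric, the following minimal sketch computes Newman's Q for an undirected network from an edge list and a cluster assignment. The graph is a hypothetical toy example, not the actual co-citation network:

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted network.
    edges: list of (u, v) pairs; community: dict mapping node -> cluster id."""
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # edges whose endpoints share a cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    deg_sum = defaultdict(int)  # total degree per cluster
    for node, d in degree.items():
        deg_sum[community[node]] += d
    # Q = sum over clusters of (fraction of intra edges) - (expected fraction)
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)

# Two triangles joined by a single bridge edge, split into two clusters.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, community))  # 5/14 ≈ 0.357
```

A partition that cuts through the triangles instead of along the bridge would score lower, which is why values closer to 1 indicate loosely coupled, well-separated clusters.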

Co-citation analyses were performed using CiteSpace software (version 4.0, Drexel University, Philadelphia, PA, US). The bibliographic records retrieved from WOS were fed into the CiteSpace software and “sliced” into three-year slices, with each slice represented by the 50 most frequently cited documents. Titles, abstracts, and keywords were used as term sources, while cited references were used as nodes.

In the first phase, data curated from the Web of Science core database (WOS) were used for knowledge mapping based on the theory of document co-citation. According to this theory, when two documents are cited together by a third document, they are connected in a co-citation relationship [31].
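Counting such co-citation links is straightforward: every unordered pair of references inside one citing paper contributes one co-citation. The sketch below uses hypothetical reference lists purely for illustration:

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of documents is cited together.
    reference_lists: iterable of reference lists, one per citing paper.
    Returns a Counter mapping sorted (doc_a, doc_b) pairs to counts."""
    counts = Counter()
    for refs in reference_lists:
        # Each unordered pair of references in one paper is one co-citation.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citing papers and their reference lists.
papers = [
    ["Tokunaga2010", "Kowalski2007", "Smith2008"],
    ["Tokunaga2010", "Kowalski2007"],
    ["Tokunaga2010", "Hinduja2010"],
]
links = cocitation_counts(papers)
print(links[("Kowalski2007", "Tokunaga2010")])  # 2
```

Documents that are repeatedly cited together accumulate strong links and end up in the same cluster of the co-citation network.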

The present mapping study is a broad overview of OHR. In line with our objectives, a broad interpretation of online hate was preferred, covering all forms of expression that spread, incite, promote, or justify hate against groups or individuals [21]. This interpretation was adapted from the framework for online hate proposed by the British Institute of Human Rights [21]. At the macro level, all forms of expression including racial hatred, xenophobia, anti-Semitism, aggressive nationalism, and hatred against minorities and migrants were included. At the individual level, various forms of expression were included, for instance, partner abuse and cyberbullying against school children owing to their racial, ethnic, or sexual background or to disabilities [21]. We acknowledge that there are alternative definitions of online hate and online toxicity, the latter of which can be defined as rude, disrespectful, or unreasonable commenting that is likely to make one leave a discussion [29,30]. Most of these definitions perceive online hate as a conceptually broad phenomenon that touches many stakeholder groups. For that reason, we consider broad inclusion criteria to be relevant for this research.

The search process resulted in a total of 3,371 research articles for scientometric analysis. The data curated from the Web of Science (core database) included citation characteristics, citation counts, and cited references. The Web of Science core database is one of the most frequently used databases for scientometric analyses. It was chosen primarily because it indexes detailed citations and full records of cited references, which help in elucidating co-citation relationships between related documents [28].

In addition to operationalizing the concepts in Table 1 as search terms, we defined a list of popular social media platforms that were also used as search terms, as several studies focus on hate taking place in a specific social media platform. Using the Web of Science core database, an electronic search was conducted to retrieve peer-reviewed research studies (published through March 2019) pertaining to online hate. Overall, this search strategy encompassed important concepts pertaining to online hate and popular platforms: “TS = (Hate OR toxicity OR cyberbullying OR bullying OR harass* OR firestorm* OR abuse OR abusive OR ‘abusive language’ OR maltreat* OR oppress* OR persecut* OR taunt* OR bully* OR bullies OR victim* OR ‘hate speech’) AND TI = (Online OR ‘social media’ OR web OR virtual OR cyber OR Orkut OR Twitter OR facebook OR Reddit OR Instagram OR snapchat OR youtube OR whatsapp OR wechat OR QQ OR Tumblr OR linkedin OR pinterest)”. As mentioned, this search strategy was formulated based on an initial reading of the literature and identifying commonly emerging terms in the studies about online hate. No restrictions were applied for year of publication or language.
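The structure of this query, an OR-joined topic (TS) field combined with an OR-joined title (TI) field, can be assembled programmatically. The sketch below uses an abbreviated, hypothetical subset of the term lists:

```python
def build_wos_query(topic_terms, title_terms):
    """Assemble a Web of Science advanced-search string:
    TS = topic field, TI = title field, OR-joined within each field
    and AND-joined across fields."""
    ts = " OR ".join(topic_terms)
    ti = " OR ".join(title_terms)
    return f"TS = ({ts}) AND TI = ({ti})"

# Abbreviated, illustrative term lists (the full lists are given above).
query = build_wos_query(
    ["Hate", "cyberbullying", "harass*", '"hate speech"'],
    ["Online", '"social media"', "Twitter", "facebook"],
)
print(query)
```

Keeping the term lists as data makes the strategy easy to audit and to rerun when new concepts or platforms are added.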

In the course of exploring the definitions, we compiled a list of keywords for the electronic search carried out to identify the body of research about OHR (see Table 1 ).

We defined the focal topic of study as online hate. We identified several definitions from the prior literature that helped us understand the nature of the phenomenon and collect a list of concepts reflecting the multifaceted nature of OHR. Definitions of online hate vary, but a unifying factor is the use of technology for expressions that are harmful to individuals, groups, or society as a whole. An example of a definition that encompasses this duality is that of Kaakinen et al., according to whom online hate has two defining characteristics: it is technology-mediated, and it intends to offend, discriminate against, and abuse a person or a group based on group-defining characteristics such as gender, race, nationality, ethnicity, disability, or sexual orientation [7].

Results

Top sources Top sources included Computers in Human Behavior, Lecture Notes in Computer Science, Cyberpsychology, Behavior, and Social Networking, Journal of Medical Internet Research, Journal of Adolescent Health, Journal of Youth and Adolescence, Procedia Social and Behavioral Sciences, PLOS ONE, New Media & Society, and Child Abuse & Neglect. The most frequent conference proceedings were published by the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, the Annual International Conference on Education Research and Innovation, the International Conference on World Wide Web, the ACM Conference on Computer Supported Cooperative Work and Social Computing, the Saudi Computer Society National Computer Conference, the IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), the ACM SIGSAC Conference on Computer and Communications Security, and the International Conference on Intelligence and Security Informatics, Cybersecurity and Big Data. Frequencies of publications by top sources are presented in Table 2.

Fields of publication The top ten fields of publication included computer science information systems (n = 325), computer science theory and methods (n = 282), criminology (n = 263), communication (n = 221), multidisciplinary psychology (n = 193), electrical/electronic engineering (n = 187), computer science interdisciplinary applications (n = 183), educational research (n = 180), psychiatry (n = 168), and clinical psychology (n = 154).

Top papers based on centrality in respective clusters Top papers were judged based on their centrality values, where a value of 0.1 indicates a central publication. In a collaborative and co-cited network of publications, a high centrality value reflects highly significant research studies. In this analysis, however, none of the studies reached a centrality value of 0.1, indicating no central publication in the respective clusters. The highest centrality values (> 0.01) were nevertheless achieved by 14 studies (Table 3 and Fig 5). The majority of these papers focused on cyberbullying among adolescents. Tokunaga RS (2010) and Kowalski RM (2007) were the most central publications, with centrality values of 0.04.
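The betweenness centrality underlying these values measures how often a publication lies on shortest paths between other publications in the network. A minimal pure-Python version of Brandes' algorithm is sketched below; the graph is a toy example, not the actual co-citation network:

```python
from collections import deque, defaultdict

def betweenness(adj):
    """Brandes' betweenness centrality for an undirected, unweighted graph
    given as {node: set(neighbors)}; scores are normalized to [0, 1]."""
    nodes = list(adj)
    n = len(nodes)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s, tracking shortest-path counts and predecessors.
        sigma = dict.fromkeys(nodes, 0); sigma[s] = 1
        dist = dict.fromkeys(nodes, -1); dist[s] = 0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    scale = (n - 1) * (n - 2)  # undirected: each pair is counted twice
    return {v: bc[v] / scale for v in nodes} if scale > 0 else bc

# Toy network: node 1 bridges the two endpoints and gets the maximum score.
print(betweenness({0: {1}, 1: {0, 2}, 2: {1}}))
```

In a path graph the middle node scores 1.0 and the endpoints 0.0, which is the intuition behind treating high-betweenness publications as bridges between research themes.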


Table 3. Top articles based on centrality values. https://doi.org/10.1371/journal.pone.0222194.t003


Fig 5. Influential authors in online hate. https://doi.org/10.1371/journal.pone.0222194.g005

Six publications, including Raskauskas and Stoltz [32] and the work of Kowalski and Limber as well as Smith et al. from 2007 to 2008 [5,33], were among the earliest studies noting the prevalence and nature of electronic bullying, victimization, and perpetration among American pupils [5,32,33]. Dehue et al. [34] focused on youngsters' experience of cyberbullying as well as their parents' perceptions of it. They found that parents do set rules for their children's Internet use but are not conscious of their children's perpetrating behavior and also underestimate their victimization experiences [34]. Slonje and Smith reported four types of cyberbullying (by text message, email, phone call, and video clip), emphasized that bullying by video clip is perceived as most negative by society, and found that most pupils tell their school friends, rather than their parents, about their experiences [35]. Erdur-Baker examined risky Internet use and its association with cyberbullying in Turkey, one of the rarer studies conducted outside the US [36]. Tokunaga synthesized and critically reviewed evidence on cyberbullying, provided an integrative definition that differentiates it from traditional bullying, and linked it with serious psychosocial and affective problems [20]. His work also outlined areas of concern in cyberbullying research and provided a framework for future research [20]. In a similar vein, Juvonen and Gross [37] reported patterns of cyberbullying and their association with social anxiety among school-going children [37]. Hinduja and Patchin provided the earliest link between cyber-aggression and increased risk of suicide [4]. Ybarra et al. [38] linked cyberbullying to rule-breaking behavior and aggression in real life in a dose-dependent manner [38].
Two studies focused on the development of the most widely used psychometric questionnaires in cyberbullying. Calvete et al.'s [39] work was the earliest to develop and validate a cyberbullying questionnaire for profiling aggressors and cyberbullies [39]. They also reported that proactive aggression, justification of violence, exposure to violence, and lower perceived social support from friends were prevalent among cyberbullies [39]. A cyber-dating abuse questionnaire assessed two latent constructs: direct aggression between romantic partners and monitoring control, such as the use of a partner's personal passwords [40]. Two further studies reported teen dating abuse through online media and online sexual solicitation in chat rooms, along with risk factors including chat-room use, using the Internet on a cell phone, talking with people met online, sending personal information to people met online, talking about sex online, and experiencing offline physical or sexual abuse [41,42].

Clusters on cyberbullying Five clusters focused on the theme of cyberbullying. The first meaningful cluster (n = 48, silhouette value = 0.91, mean publication year 2006) was labeled "social networking site" by the TF*IDF method and "cyberbullying, internet harassment and sexual harassment" and "cyberbullying experience" by the MI method. In other words, there were 48 research articles with a similar theme that could be presented under the cluster title "social networking site" by the TF*IDF method. These 48 articles were placed in this cluster because all of them were cited by a similar group of publications, thus representing a co-citation relationship. The most cited of this group was Mishna [43], who investigated cyberbullying behaviors among Canadian adolescents. They reported that bullying perpetrators perceived themselves as funny, popular, and powerful, albeit feeling guilty as well [43]. The second meaningful cluster included 48 studies with a silhouette value of 0.88 and a mean year of 2011. It was labeled "general strain theory" (TF*IDF), "cyberaggression" (LLR), and "Australian youth" (MI). The most active citer was Kowalski et al. [44], who reported cyberbullying behavior among college students across multiple domains of life [44]. Cyberbullying and the use of routine activity theory were discussed in the seventh cluster, with 15 members, a silhouette value of 0.99, and a mean year of 2004. It was termed "social networking site" by the TF*IDF method; "internet user", "utilizing routine activity theory", and "potential factor" by the LLR method; and "case study" by the MI method. The most active citer of this cluster was Marcum et al. [45], who provided causal reasoning for cyber-victimization using the framework of routine activity theory [45]. This theory posits that victimization requires three factors: the presence of a likely offender, a suitable target, and the absence of a capable guardian [45].
The 12th cluster focused on the association of time spent in online communities (TF*IDF) with adolescent mental health and caregiver-child relationships (LLR and MI). This cluster included seven papers with a silhouette value of 1.00 and a mean year of 2000. The most active citer of this group was Ybarra et al. (2004), who focused on Internet harassment and its association with the quality of the child-caregiver relationship [46]. The 16th cluster comprised papers on an educational and artistic intervention to prevent cyberbullying. It was termed "virtual drama", "emergent narrative approach", and "anti-bullying education" (TF*IDF, MI, LLR) and emerged in 2005 [47]. The most active citer, Aylett et al. [47], presented evidence for virtual educational software to prevent cyberbullying.

Clusters of sexual solicitation and intimate partner violence A total of three important clusters focused on the theme of sexual solicitation, dating abuse, and intimate partner violence. The third cluster, which included 44 papers, focused on "social support" (TF*IDF) and on "sexual solicitation via electronic mail", "seeking human service", and "social support" (LLR and MI). The most active citer was Finn (2000), who described the dangers women face when seeking human services on the Internet [48]. This cluster emerged in 1998, highlighting the early years of research. Sexual solicitation was the focus of another cluster, with 17 papers and a silhouette value of 0.94, emerging in 2012. It was termed "extent" and "situational factor" (LSI), and "hate speech", "network site", and "online sexual solicitation" (LLR, MI). It focused on the abuse of minors as well as online exposure among the youth, as evident from its most active citers [49]. The tenth cluster focused on intimate partner violence through the lens of routine activity theory, comprising ten papers with a mean year of 2011 and a silhouette value of 0.99. It was labeled "information security" and "the extent of cyberbullying behavior" (TF*IDF), and "cyber partner abuse", "systematic review", "routine activities theory", and "empirical study" (LLR, MI). The most active citer for this cluster was Arntfield (2015), who proposed a new framework for understanding cyber-victimology using routine activity theory [50]. The author stressed the role of victims as both facilitators of and factors in predation [50]. The terms "systematic review" and "empirical study" refer to the study designs used by studies in these clusters.

Clusters on deep learning & automation Deep learning and automation were studied in two important clusters. The fourth cluster focused on "cyber defense" (TF*IDF) and "adaptive use" and "network-centric mechanism" (LLR) and emerged in 2000. The most active citer was Atighetchi (2000), whose work focused on defending against network-based attacks and on developing technologies that augment an application's resilience against hackers [51]. The 20th cluster, with a silhouette value of 1.0 and emerging in 2016, revealed deep learning models and text classification as viable approaches for identifying hate speech in Facebook groups. The papers by Agrawal et al. [52] and Pitsilis et al. [53] were the most active citers of this cluster. Pitsilis et al. [53] proposed recurrent neural network models to discern hateful content on social media using user-related information such as a user's tendency toward racism or sexism [53], while Agrawal et al. [52] showed that previous cyberbullying-detection algorithms face three bottlenecks: restriction to a specific platform, restriction to a specific topic of bullying, and reliance on handcrafted features of the data. They proposed that deep learning models remain viable in all of these situations [52].
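The recurrent models described here require a deep-learning framework to reproduce. As a deliberately simplified, framework-free stand-in, the sketch below trains a perceptron over bag-of-words features, a much weaker technique than an RNN, on a handful of hypothetical labeled snippets. It only illustrates the supervised text-classification setup that such detectors share:

```python
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def train_perceptron(examples, epochs=20):
    """Toy linear classifier over bag-of-words features. This is a
    simplification of the recurrent models discussed above.
    examples: list of (text, label) with label 1 = hateful, 0 = benign."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            score = sum(w[t] for t in tokenize(text))
            pred = 1 if score > 0 else 0
            if pred != label:  # perceptron update on mistakes only
                for t in tokenize(text):
                    w[t] += 1.0 if label == 1 else -1.0
    return w

def predict(w, text):
    return 1 if sum(w[t] for t in tokenize(text)) > 0 else 0

# Hypothetical training snippets (toy data, not a real corpus).
data = [
    ("i hate you all", 1),
    ("you are worthless trash", 1),
    ("have a nice day", 0),
    ("great photo thanks for sharing", 0),
]
w = train_perceptron(data)
print(predict(w, "you are trash"))  # 1
```

The platform, topic, and feature-engineering bottlenecks noted by Agrawal et al. are visible even here: the model only knows the exact tokens it was trained on, which is precisely the limitation that learned representations in deep models aim to remove.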