Fake or manipulated images propagated through the Web and social media have the capacity to deceive, emotionally distress, and influence public opinions and actions. Yet few studies have examined how individuals evaluate the authenticity of images that accompany online stories. This article details a 6-batch large-scale online experiment using Amazon Mechanical Turk that probes how people evaluate image credibility across online platforms. In each batch, participants were randomly assigned to 1 of 28 news-source mockups featuring a forged image, and they evaluated the credibility of the images based on several features. We found that participants’ Internet skills, photo-editing experience, and social media use were significant predictors of image credibility evaluation, while most social and heuristic cues of online credibility (e.g. source trustworthiness, bandwagon, intermediary trustworthiness) had no significant impact. Viewers’ attitude toward a depicted issue also positively influenced their credibility evaluation.

The ubiquitous availability of easy-to-use software for editing digital images, brought about by rapid technological advances of the 21st century, has dramatically decreased the time, cost, effort, and skill required to fabricate convincing visual forgeries. Often distributed, perhaps unknowingly, through trusted sources such as mass media outlets, these manipulated images propagate across social media with growing frequency and sophistication. Moreover, the technology for manipulating or generating realistic-appearing images has far outpaced the development of methods for detecting fake imagery; even experts often cannot rely on visual inspection to distinguish authentic digital images from forgeries. Bad actors can thus easily publish manipulated visual content to deceive viewers, inflicting cognitive stress, exploiting prior beliefs, or influencing individuals' decisions and actions.

Although it is difficult to say how prevalent undetected occurrences of fake imagery are, numerous examples have been exposed in which manipulated images caused substantial harm at individual, organizational, and societal levels. For instance, an image of Senator John Kerry and Jane Fonda sharing the stage at a Vietnam-era antiwar rally emerged during the 2004 presidential primaries as Senator Kerry was campaigning for the Democratic nomination. The accompanying caption stated, "Actress and Anti-War Activist Jane Fonda Speaks to a crowd of Vietnam Veterans as Activist and Former Vietnam Vet John Kerry (LEFT) listens and prepares to speak next concerning the war in Vietnam (AP Photo)." The forged photograph, however, was created by compositing two separate photos, one depicting Kerry and the other Fonda. The edited image showing them together gave the false impression that Kerry shared Fonda's controversial antiwar views (Light, 2004). In a more recent example, in January 2014 the Associated Press news agency fired its Pulitzer Prize-winning photographer Narciso Contreras for digitally removing an object from one of his widely distributed photographs of the Syrian civil war (The Guardian, 2014). The case stirred an ongoing and contested discussion about the authenticity of digital photographs, the potential repercussions of image manipulation, and the ethics code of photojournalism. Numerous other examples exist in which fake imagery has been used to distort the truth and manipulate viewers (for more examples, see http://pth.izitru.com/), and it remains unclear how many such manipulations go undetected.

The damage done by manipulated imagery is real, substantial, and persistent. Studies suggest that manipulated images can distort viewers' memory (Wade et al., 2002), thereby further enhancing the credibility of these images, and even influence decision-making behaviors such as voting (Bailenson et al., 2008; Nash et al., 2009). Moreover, even when individuals do become aware of the true nature of a forgery, the harmful impact of misinformation on their perception, memory, emotions, viewpoints, and attitude toward a topic can linger (Sacchi et al., 2007). Quite often the distribution of fake images will far surpass the distribution of any correction or attempt to expose the forgery (Friggeri et al., 2014). These factors combine to make image manipulation an extremely effective and difficult-to-combat method of manipulation.

While there is a growing awareness that images should no longer be automatically assumed to be credible, authentic, or reliable sources of information, the general public remains vulnerable to visual deception. Due to the scope and speed of information dissemination across social media websites, the potential for ill-intentioned actors to inflict emotional distress or to purposefully influence opinions, attitudes, and actions through visual misinformation poses a severe and growing societal risk. Yet we know distressingly little about how online viewers assess the credibility of online images. This article details a large-scale online experiment on image credibility that seeks to understand how individuals evaluate manipulated images that accompany online stories, and what features (image-related and non-image-related) impact their credibility judgments. The images tested in this study were altered using common manipulation techniques: composition, elimination, and retouching (identified in Kasra et al., 2018).

The research design was informed by earlier research on social and heuristic approaches to credibility judgment as well as by our previous exploratory findings on online image credibility (Kasra et al., 2018). Previous research in this area has predominantly focused either on fake image detection using machine-learning approaches (Gupta et al., 2013) or on the credibility of textual information, such as websites and blogs (Allcott and Gentzkow, 2017; Morris et al., 2012; Wineburg and McGrew, 2016). These studies tend to assume that individuals make credibility evaluations on their own, without considering that such decisions are heavily influenced by one's social networks. Our study is among the first to test the social and cognitive heuristics of information credibility evaluation in the context of image authenticity.

Results

There were slightly more men (N = 1902, 54.72%) than women (N = 1548, 44.53%) among those who completed the study. Participants were between 20 and 87 years old (one participant reported being 11 years old and was removed, as all participants were required to be 18 or older to enter the study), with a mean age of 34.71 years (SD = 11.16). The largest household income category was less than 30,000 US dollars annually. Participants were well educated, with 89.8% reporting at least some college education or above. Detailed demographic statistics are reported in Table 2.

Table 2. Descriptive statistics of participant demographics (N = 3476).

Overall, we observed significant differences in the average credibility judgment of the six images, as expected. The mean credibility ratings on a 7-point scale for the images were: 4.65 (SD = 1.19, bridge collapse), 3.86 (SD = 1.74, gay couple adopting children), 1.83 (SD = 0.96, genetically modified mouse), 3.08 (SD = 1.66, a school in Africa), 4.06 (SD = 1.35, Syrian bombing), and 2.29 (SD = 1.32, Hispanic politician). The descriptive statistics and correlations are reported in Table 3.

Table 3. Correlations of continuous variables (N = 3476).

To test all hypotheses and answer the research questions, we ran an analysis of covariance (ANCOVA) for all participants (N = 3476), with all four experimental factors (source trustworthiness, source and media type, intermediary, and bandwagon). The participant's sex and the image tested were included as fixed factors. The covariates were the participant's age, digital imaging experience, Internet skills, and favorable attitude toward the issue. An interaction term between source trustworthiness and intermediary was also included.

H1 predicted that images from highly trustworthy sources are evaluated as more credible than those from less trustworthy sources. H1 was not supported, as source trustworthiness did not have a significant main effect in the whole model, F(1, 3449) = 1.64, p = .20.

H2a predicted that images from news organizations are perceived as more credible than those from individuals. H2b predicted that images from an organization's official website are perceived as more credible than those from its social media accounts. We tested the main effect of source and media type, as well as planned contrasts between the three levels within the factor (news organization website, news organization social media, individual social media). The main effect was nonsignificant for the whole sample, F(2, 3449) = 1.75, p = .17, as were the planned contrasts. As a result, H2a and H2b were not supported.

H3 predicted that images from more credible intermediaries will be perceived as more credible than those from less credible intermediaries, while RQ1 asked whether having an intermediary affects image credibility. The intermediary factor did not have a significant main effect, F(2, 3449) = 0.97, p = .38, and subsequent planned contrasts among the three levels (no intermediary, low trustworthiness, and high trustworthiness) were nonsignificant. Therefore, H3 was not supported, and the answer to RQ1 was negative. RQ2 explored the potential interaction between source trustworthiness and intermediary; we found no significant interaction in any of the models.

H4 predicted that images with higher levels of bandwagon cues, such as shares and favorites, will be perceived as more credible than those with lower levels of bandwagon cues.
The main effect of bandwagon cues was nonsignificant in the whole model, F(1, 3449) = 0.04, p = .85, as well as in both subsamples. H4 was therefore not supported.

H5a predicted that people with greater amounts of photography experience and digital imaging skills will perceive images as less credible compared to people with less skill or experience. This hypothesis was supported in the whole model, F(1, 3449) = 12.38, p < .001. H5b predicted that people with greater levels of Internet skills will perceive images as less credible compared to people with lower skills. This prediction was also supported, F(1, 3449) = 6.79, p = .01.

To investigate further how Facebook and Twitter use in particular may play a role in credibility judgments of online images, we divided the participants into two subsamples based on whether they saw a Twitter or Facebook mockup: the Twitter sample comprises people who were exposed to Figure 1, 3, or 6 (bridge collapse in China, genetically modified mouse, Hispanic politician), where the Twitter interface was used in the mockup; the Facebook sample comprises people who were exposed to Figure 2, 4, or 5 (gay couple, African school, and Syrian bombing). We ran separate ANCOVAs on both samples, using the same design as the whole model, first adding whether participants have a Facebook/Twitter account (binary variable) and then, for those who do, their Facebook/Twitter use intensity measure, respectively. This resulted in two ANCOVA models for the Facebook subsample and two ANCOVA models for the Twitter subsample (Table 4).

Table 4. ANCOVA predicting image credibility.

H6a predicted that people who use Facebook more will perceive images as less credible compared to people who use Facebook less. This hypothesis did not receive support, as Facebook use intensity was not associated with credibility ratings, Model 3: F(1, 1535) = 0.86, p = .35. H6b predicted that people who use Twitter more will perceive images as less credible compared to people who use Twitter less. This hypothesis was supported, as Twitter use intensity was significant, Model 5: F(1, 999) = 5.98, p = .02.

H7 predicted that people's support of the issue depicted in the image is positively related to their credibility rating of the image. This hypothesis received strong support in the whole sample, F(1, 3449) = 9.00, p < .001, and in the Facebook subsample, Model 2: F(1, 1701) = 10.94, p = .001; Model 3: F(1, 1535) = 8.74, p = .003.

Finally, participants' sex and age were included as controls. Age showed a strong main effect across the board, Model 1: F(1, 3449) = 44.08, p < .001. Sex was significant in the Facebook subsample, Model 2: F(1, 1701) = 39.61, p < .001; Model 3: F(1, 1535) = 36.29, p < .001, but not in the whole sample, Model 1: F(1, 3449) = 2.63, p = .105, or in the Twitter subsample.
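
For readers who want to see the model structure concretely, the following is a minimal sketch of how the whole-sample ANCOVA described above could be specified in Python with statsmodels. This is our illustration rather than the authors' analysis code; the data file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data file: one row per participant, with the composite
# credibility rating, the experimental factor assignments, and covariates.
df = pd.read_csv("image_credibility.csv")

model = smf.ols(
    "credibility ~ C(source_trust) * C(intermediary)"   # factors plus their interaction
    " + C(source_media_type) + C(bandwagon)"             # remaining experimental factors
    " + C(sex) + C(image)"                               # fixed factors
    " + age + imaging_experience + internet_skills + issue_attitude",  # covariates
    data=df,
).fit()

# F-tests for each factor and covariate, analogous to those reported above.
print(sm.stats.anova_lm(model, typ=2))
```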

Discussion

As tools for creating and manipulating digital images become increasingly commonplace and easy to use, fake images continue to propagate across social media platforms and the contemporary media environment, influencing viewers and posing a significant sociopolitical threat around the world. It is thus imperative to better understand how people evaluate the credibility of online images. This study reports the findings from a large-scale experiment on image credibility evaluation on the Web, conducted on Amazon MTurk. Based on previous work on social and cognitive heuristics for evaluating online credibility, we tested the effects of several features, such as source, intermediary, and the background and skills of the viewers, on the assessment of image credibility online. The results were consistent across all six images tested, showing that viewers' Internet skills, digital imaging experience, social media use, and pro-issue attitude are significant predictors of image credibility evaluation. However, none of the image context features tested (for example, where the image was posted or how many people liked it) had an impact on participants' credibility judgments. Our findings also reveal that credibility evaluations are far less impacted by the content of an online image; instead, they are influenced by the viewers' backgrounds, prior experiences, and digital media literacy.

This study contributes critical insights to image credibility research. Past studies reported that people are rarely capable of identifying fake images as such (Farid and Bravo, 2010), and that images are generally considered trustworthy (Kasra et al., 2018; Nightingale et al., 2017). Yet the variance in image credibility was limited in these studies, either by the measurement scale employed (a binary yes/no choice) or by the topics and contexts of the images. To ensure a large variance, our study purposefully chose six fake images depicting a wide range of issues. Each image was forged using different image-manipulation techniques (e.g. composition, elimination, retouching) and exhibited a different level of sophistication.

Contrary to what the previous studies suggested, our results show that people are not as gullible in evaluating image credibility on the Web. Our participants rated four images as fake or manipulated (below 4 on a 7-point scale), and the other two images only slightly above the midpoint. This result indicates that participants, however careless or distracted they may be, can still be discerning consumers of digital images.

Compared to previous research, our study implemented three notable changes. First, our design recognized that image consumption and evaluation on the Internet are always contextual rather than occurring in a vacuum; we therefore provided brief textual information about each fabricated image, similar to how images are usually presented and viewed online. Second, taking into account that online information is continuously shared and reshared by different sources, we explicitly manipulated and tested whether the existence and trustworthiness of an intermediary had any bearing on image credibility evaluation. Third, we adopted a measure of credibility (6 items on a 7-point scale) that is more nuanced than the binary yes/no choice used in the study by Nightingale et al. (2017). Our scale has better reliability and validity than a binary measure, which is prone to false positive and false negative results.
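
As an illustration of how such a multi-item measure is typically scored, the sketch below computes a per-participant composite from six 7-point items and the scale's internal-consistency reliability (Cronbach's alpha). This is our illustration, not the authors' code, and the ratings are simulated.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of ratings."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Simulated example: 100 respondents answering six correlated 7-point items.
rng = np.random.default_rng(0)
true_score = rng.normal(4, 1.2, size=(100, 1))        # latent credibility judgment
noise = rng.normal(0, 0.8, size=(100, 6))              # item-level measurement noise
ratings = np.clip(np.rint(true_score + noise), 1, 7)

composite = ratings.mean(axis=1)           # per-participant credibility score
print(round(cronbach_alpha(ratings), 2))   # scale reliability estimate
```
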
The most significant finding of our study is that viewers' skills and experience greatly impact their image credibility evaluations. The more knowledge of and experience with the Internet, digital imaging and photography, and online media platforms people have, the better they are at evaluating image credibility. Our results suggest that to mitigate the potential harm caused by fake images online, the best strategy is to invest in educational efforts that increase users' digital media literacy. Issue attitude had a significant effect as well. This is consistent with the confirmation bias found in many similar studies (e.g. Knobloch-Westerwick et al., 2015): people are more likely to accept an image as real if it aligns with their prior beliefs. This finding could explain why fake news spreads so readily in social media settings.

Several well-researched social and cognitive heuristic cues from online credibility research (e.g. source trustworthiness, media platform, and bandwagon cues) did not have any significant effect on image credibility. Although surprising, this result does not mean that the process of image credibility evaluation is inherently different from the process of judging other online information, nor does it mean that people use very different heuristics. We speculate that workers on MTurk might have rushed through the experiment without paying enough attention to the various source, intermediary, and bandwagon cues. MTurk participants were certainly not motivated to pay attention (Antin and Shaw, 2012), as they were compensated upon completion of the task regardless of response quality. We included a few quality-control mechanisms, such as a 30-second minimum stay time on the image page before participants could advance to the next page.2 However, it can be argued that a rushed, careless scan without motivation to consider various cues is indeed how people consume news and images in today's media environment. In this regard, the MTurk workers' behaviors may be representative of people's actual behaviors online.

Limitations and future research

This study has a number of limitations. We recruited participants from MTurk, a widely used platform for social science studies like ours. Although the platform allowed us to recruit a reliable, inexpensive, and demographically diverse sample, workers on MTurk are not representative of the general population. We found them to have good self-reported Internet skills (M = 4.04 on a 5-point scale) compared to a student sample in our study and to levels reported previously (Hargittai and Hsieh, 2012; Hargittai and Shaw, 2015), which is perhaps unsurprising given that they participate in an online labor marketplace. Still, research has found that MTurk samples are slightly more diverse demographically than standard Internet samples, and considerably more diverse than American college samples (Buhrmester et al., 2011).

The second limitation is the lack of manipulation checks in our design. As a result, we do not know for sure whether the observed results stemmed from a lack of attention (e.g. participants did not notice that the purported source of an image was the New York Times) or from the factor itself (e.g. participants did not think the New York Times was a credible source). Given the pretest results, we believe the former is more likely than the latter, yet this remains a minor threat to validity.
Third, our study was cross-sectional, so causality cannot be ascertained, although we believe most variables capture pre-existing states and habits (e.g. Internet skills, digital imaging experience) that are unlikely to change as a result of the image evaluation tasks.

In addition, we purposefully included only fake images in the credibility evaluation task and excluded unaltered and/or misattributed original images. Although this approach eliminated the risk of participants already being familiar with the stimuli, it only tested participants' suspicion when confronted with an image that was actually fake. In other words, for those participants who rated fake images as less credible, we could not determine whether they were truly capable of evaluating image credibility or simply more skeptical in general. However, given the amount of misinformation and disinformation in today's media environment, being skeptical is arguably the crucial first step in any credibility evaluation task. As Rheingold (2012) argues, "the heuristic for crap detection is to make skepticism your default" (p. 77). Meanwhile, the fake images used in this study were all forgeries of sufficient quality so as not to be immediately distinguishable from authentic images. Previous work showed that participants failed to identify these images as fake and, even when told that the images were fake, failed to correctly identify which areas had been manipulated (Kasra et al., 2018). This inability to distinguish compelling fakes from authentic images implies that the results would likely be similar regardless of whether the study used authentic images, compelling fakes, or a mixture. Nevertheless, future research should test users' ability to evaluate unaltered and misattributed images as well as fakes of varying quality.

Even though our experiment aimed to be comprehensive, it left out a few potentially important factors. For example, we did not manipulate the recency of the images posted (all images were presented with random 2015 dates), which could influence credibility judgments (Westerman et al., 2014). Furthermore, considering that people are more likely to consume news on social media sites than through traditional channels, and that their networks of "friends" play an important role in information diffusion, a productive future research direction is to examine credibility judgment in participants' naturalistic social network environments. This would allow a study to factor in the endorsements and aggregate ratings from participants' self-curated networks of friends and contacts. We also focused only on image consumption on the desktop, while people increasingly access news on mobile devices (Fedeli and Matsa, 2018). How the parameters of mobile devices may impact credibility judgment remains a fruitful future direction.

Operationally, our study design could be further improved by swapping the order of the Internet and digital skills questions with the fake image to eliminate potential priming effects. Forcing participants to stay on the image page for 30 seconds was also a less than ideal approach to ensuring sufficient evaluation time. Furthermore, participants' Internet and digital photography skills were self-reported rather than measured objectively, although evidence shows that the likelihood of participants misreporting their Internet skills is low (Hargittai, 2009). Future research is encouraged to address these operational concerns to further improve validity.

Conclusion

In the age of fake news and alternative facts, the risks and dangers associated with ill-intentioned individuals or groups easily routing forged visual information through computer and social networks to deceive, cause emotional distress, or purposefully influence opinions, attitudes, and actions have never been more severe. This article details an online experiment probing how people respond to and evaluate the credibility of images in online environments. Through a series of between-subjects factorial experiments that randomly assigned participants on Mechanical Turk to rate the credibility of fake image mockups, we found that contextual characteristics of an image, such as where it was published and how many people shared it, do not matter. Instead, participants' Internet skills, digital imaging experience, social media use, and pro-issue attitude are significant predictors of the credibility evaluation of online images.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by National Science Foundation grants CNS-1444840 and CNS-1444861.

Notes

1. We ran the same ANCOVA analyses on the student sample and found little difference from the MTurk sample.

2. We also ran models including the total time spent on the image page as a covariate, which did not change results.

ORCID iD

Cuihua Shen https://orcid.org/0000-0003-1645-8211