Methods

Subjects and design

A total of 659 subjects (age: M = 25.5 years, SD = 8.2, range = 13–70; 362 male, 283 female, 14 declined to respond) completed the study online. A further 32 subjects were excluded from the analyses because they had missing response time data for at least one response on the detection or location task. As in Experiment 1, subjects did not receive payment for taking part but were given feedback on their performance at the end of the study. We stopped collecting data once we reached 100 responses per photo. The design was similar to that of Experiment 1.

Stimuli

We took our own photos in RAW format at a resolution of 3008 × 2000 pixels and converted them to PNGs with a resolution of 1600 × 1064 pixels prior to any digital editing. We checked the photos to ensure there were no spatial distortions caused by the lens, such as barrel or pincushion distortion. The photo manipulation process was the same as in Experiment 1. We applied the five manipulation techniques to six different photos to create a total of 30 manipulated photos. We used the non-manipulated version of these six photos and another four non-manipulated photos to give a total of ten original photos. Thus, the total number of photos was 40. As in Experiment 1, we ran two independent saliency models to check whether our manipulations had influenced the salience of the region where the manipulation had been made. See Additional file 2 for details of the saliency analyses. Similar to Experiment 1, our manipulations made little difference to the salience of the regions of the image.

Procedure

The procedure was similar to that used in Experiment 1, except for the following two changes. First, subjects were asked to locate the manipulation regardless of their response in the detection task. Second, subjects were asked to click on one of 12, rather than nine, regions on the photo to locate the manipulation. We increased the number of regions on the grid to ensure that the manipulations in the photos spanned two regions, on average, as per Experiment 1.

Results and discussion

As in Experiment 1, subjects spent a reasonable amount of time examining the photos. In the detection task, the mean response time per photo was 57.8 s (SD = 271.5 s) and the median 24.3 s (interquartile range = 17.3 to 37.4 s). In the location task, the mean response time was 10.9 s (SD = 27.0 s) and the median 8.2 s (interquartile range = 6.1 to 11.2 s).

Overall accuracy on the detection task and the location task

Overall accuracy in the detection task was slightly lower than that observed in Experiment 1, but still above chance: Subjects correctly classified 62% of the photos as being original or manipulated (cf. 66% in Experiment 1), 95% CI [60%, 63%]. Subjects had some ability to discriminate between original (58% correct) and manipulated (65% correct) photos, d' = 0.56, 95% CI [0.50, 0.62], replicating the results from Experiment 1. Again, this provides some support for the prediction that the match or mismatch between the information in the photo and people’s expectation of what real-world scenes look like might help people to identify original and manipulated real-world photos. In contrast to Experiment 1, however, subjects did not show a bias towards saying that photos were authentic: c = −0.07, 95% CI [−0.10, −0.04]. It is possible that asking all subjects to search for evidence of a manipulation—the location task—regardless of their answer in the detection task, prompted a more careful consideration of the scene. In line with this account, subjects in Experiment 2 spent a mean of 14 s longer per photo on the detection task than those in Experiment 1.
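As a rough illustration of how the signal-detection estimates relate to the accuracy rates above, the following Python sketch computes d' and c from the aggregate hit rate (65% of manipulated photos correctly classified) and false-alarm rate (42%, that is, 100% minus the 58% of original photos correctly classified). It assumes the standard equal-variance Gaussian formulas, and the function name is hypothetical. The reported values were presumably averaged over per-subject estimates, so the aggregate figures computed here only approximate them.

```python
from statistics import NormalDist


def sdt_estimates(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion_c) under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa               # sensitivity
    criterion_c = -0.5 * (z_hit + z_fa)  # response bias; negative = tendency to say "manipulated"
    return d_prime, criterion_c


# Aggregate rates taken from the text: 65% hits, 100% - 58% = 42% false alarms.
d_prime, criterion_c = sdt_estimates(0.65, 1 - 0.58)
print(f"d' = {d_prime:.2f}, c = {criterion_c:.2f}")
```

The aggregate estimates come out at roughly 0.59 and −0.09, close to but not identical with the reported d' = 0.56 and c = −0.07.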

Recall that the results from Experiment 1 suggested that subjects found the location task difficult, even when they correctly detected the photo as manipulated. Yet, we were unable to say conclusively that location was more difficult than detection because we did not have location data for the manipulated photo trials that subjects failed to detect. In Experiment 2 we gathered those data, but before we could directly compare subjects’ ability to detect manipulated photos with their ability to locate the manipulations within them, we had to correct for guessing. For the detection task, chance performance was the same as in Experiment 1, 50%. For the location task, however, there were two differences from Experiment 1. First, subjects were asked to select one of 12, rather than one of nine, image regions. Second, we used a new image set; thus, the number of regions manipulated for each image and manipulation type changed. Accordingly, we ran a separate Monte Carlo simulation to determine the chance rate of selecting the correct region. Table 3 shows that overall chance performance in the location task was 17%.
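A Monte Carlo estimate of this kind can be sketched as follows. This is a minimal illustration, not the simulation actually used: it assumes that each of the 12 regions is equally likely to be clicked, and the function name and the example region indices are hypothetical.

```python
import random


def simulate_chance(manipulated_regions, n_regions=12, n_trials=200_000, seed=0):
    """Estimate the probability that a uniformly random click hits a manipulated region."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        clicked = rng.randrange(n_regions)  # every region equally likely
        if clicked in manipulated_regions:
            hits += 1
    return hits / n_trials


# Hypothetical photo whose manipulation spans regions 4 and 5 of the 12-region grid.
chance = simulate_chance({4, 5})
print(f"Estimated chance rate: {chance:.3f}")
```

For a manipulation spanning two regions, the estimate converges on 2/12, or about 17%, which is in line with the overall chance rate given that manipulations spanned two regions on average.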

Table 3 Mean number of regions (out of a possible 12) containing manipulation and results of Monte Carlo simulation to determine chance performance in location task by manipulation type and overall

Subjects performed better than chance on the location task: a mean 56% of the manipulations were accurately located, 95% CI [55%, 58%]. Given that a mean 62% of the manipulated images were accurately detected and a mean 56% of the manipulations located, it seems that performance was roughly similar on the two tasks. But this interpretation does not take into account how subjects would perform by chance alone. A fairer approach is to compare subjects’ performance on the detection and location tasks with chance performance on those two tasks. For the detection task, subjects detected a mean 12% more manipulated images than would be expected by chance alone, 95% CI [10%, 13%]. Yet, somewhat surprisingly, subjects located a mean 39% more of the manipulations than would be expected by chance alone, 95% CI [38%, 41%]. This finding suggests that people are better at the more direct task of locating manipulations than the more generic one of detecting whether a photo has been manipulated or not. Although this potential distinction between people’s ability to detect and locate manipulations is an interesting finding, the reason for it is not immediately apparent. One possibility is that our assumption that each of the 12 image regions has an equal chance of being picked is too simplistic: perhaps certain image regions never get picked (e.g., a relatively featureless area of the sky). If so, including these never-picked regions in our chance calculation might make subjects’ performance on the location task seem artificially high. To check this possibility, we ran a second chance performance calculation.

In Experiment 2, even when subjects did not think that the image had been manipulated, they still attempted to guess the region that had been changed. Therefore, we can use these localization decisions in the original (non-manipulated) versions of the six critical photos to determine chance performance in the task. This analysis allows us to calculate chance based on the regions (of non-manipulated images) that people actually selected when guessing rather than assuming each of the 12 regions has an equal chance of being picked. Using this approach, Table 4 shows that overall chance performance in the location task was 23%. Therefore, even based on this chance localization level, subjects still located a mean 33% more of the locations than would be expected by chance alone, 95% CI [32%, 35%]. This finding supports the idea that subjects are better at the more direct task of locating manipulations than detecting whether a photo has been manipulated or not.
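The response-based chance calculation can be illustrated with a short Python sketch. The click counts below are invented purely for illustration, and the function names are hypothetical. The idea is simply to take the share of clicks on the original photo that happened to fall in the regions that are altered in the manipulated versions.

```python
def empirical_chance(region_counts, manipulated_regions):
    """Share of clicks on the original photo that fell in the to-be-manipulated regions."""
    total = sum(region_counts.values())
    hits = sum(n for region, n in region_counts.items() if region in manipulated_regions)
    return hits / total


def above_chance(observed, chance):
    """Advantage over chance, expressed as a difference in proportions."""
    return observed - chance


# Hypothetical click counts per region (0-11) for one original photo; not real data.
counts = {0: 2, 1: 3, 2: 1, 3: 0, 4: 14, 5: 9, 6: 20, 7: 25, 8: 10, 9: 8, 10: 6, 11: 2}
chance = empirical_chance(counts, {4, 5})
print(f"Empirical chance: {chance:.2f}")
print(f"Advantage over chance: {above_chance(0.56, chance):.2f}")
```

Because rarely chosen regions contribute little to the total, this estimate of chance is higher than one based on equal probabilities whenever the manipulated regions are ones that subjects tend to click anyway.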

Table 4 Chance performance in location task by manipulation type and overall based on mean number of subjects choosing the manipulated region in the original version of the image

Ability to detect and locate manipulations

On the manipulated photo trials, asking subjects to locate the manipulation regardless of whether they correctly detected it allowed us to segment accuracy in the following ways: (i) accurately detected and accurately located (hereafter, DL), (ii) accurately detected but not accurately located (DnL), (iii) inaccurately detected but accurately located (nDL), or (iv) inaccurately detected and inaccurately located (nDnL). Intuitively, it seems most practical to consider the more conservative accuracy—DL—as correct, especially in certain contexts, such as the legal domain, where it is crucial to know not only that an image has been manipulated, but precisely what about it is fake. That said, it might be possible to learn from the DnL and nDL cases to try to better understand how people process manipulated images.
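The four-way segmentation can be expressed compactly in code. The sketch below is illustrative only: the trial outcomes are made up, and the function name is hypothetical.

```python
from collections import Counter


def classify_trial(detected, located):
    """Label a manipulated-photo trial as DL, DnL, nDL, or nDnL."""
    return ("D" if detected else "nD") + ("L" if located else "nL")


# Hypothetical (detected, located) outcomes for five manipulated-photo trials.
trials = [(True, True), (True, False), (False, True), (False, False), (True, True)]
labels = Counter(classify_trial(d, l) for d, l in trials)
proportions = {label: n / len(trials) for label, n in labels.items()}
print(proportions)
```

Under the liberal criteria, detection accuracy sums the DL and DnL proportions and location accuracy sums the DL and nDL proportions; the stringent criterion counts DL alone.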

Figure 7 shows the proportion of DL, DnL, nDL, and nDnL responses for each of the manipulation types. The most common outcomes were for subjects to both accurately detect and accurately locate manipulations, or both inaccurately detect and inaccurately locate manipulations. It is interesting, however, that on almost a fifth (18%) of the manipulated photo trials, subjects accurately detected the photo as manipulated yet failed to locate the alteration. For 10% of the manipulated trials, subjects failed to detect but went on to successfully locate the manipulation. Subjects infrequently managed to detect and locate airbrushing manipulations; in fact, it was more likely that subjects made DnL or nDL responses. Although this fits with our prediction that plausible manipulations would be more difficult to identify than implausible ones, the pattern of results for geometrical inconsistency, shadow inconsistency, and addition or subtraction does not support our prediction. Subjects made more DL responses on the plausible addition or subtraction manipulation photos than on either of the implausible types, geometrical manipulations and shadow manipulations. Why, then, are subjects performing better than expected by either of the chance measures on the addition or subtraction manipulations and worse than expected on the airbrushing ones? One possibility is that people’s ability to detect image manipulations has less to do with the plausibility of the change and more to do with the amount of physical change caused by the manipulation. We now look at this hypothesis in more detail by exploring the relationship between the image metrics and people’s ability to identify manipulated photos.

Fig. 7 Mean proportion of manipulated photos accurately detected and accurately located (DL), accurately detected, inaccurately located (DnL), inaccurately detected, accurately located (nDL), and inaccurately detected, inaccurately located (nDnL) by manipulation type. The dotted horizontal lines on the bars represent chance performance for each manipulation type from the results of the Monte Carlo simulation. The full horizontal lines on the bars represent chance performance for each manipulation type based on subjects’ responses on the original image trials. Error bars represent 95% CIs.

Image metrics and accuracy

Recall that the results from Experiment 1 suggested a relationship between the correct detection and location of image manipulations and the amount of disruption the manipulations had caused to the underlying structure of the pixels. Yet, the JPEG format of the images used in Experiment 1 created some (re-compression) noise in the Delta-E measurements between different images; thus, we wanted to test whether the same finding held with the lossless image format used in Experiment 2. As shown in Fig. 8, we found that the Delta-E measure was positively correlated with the proportion of photos that subjects correctly detected as manipulated (r(28) = 0.80, p < 0.001) and the proportion of manipulations that were correctly located (r(28) = 0.73, p < 0.001). These Pearson correlation coefficients are larger than those in Experiment 1 (cf. detect r = 0.34 and locate r = 0.41 in Experiment 1). It is possible that the re-compression noise in the JPEG images in Experiment 1 obscured the relationship between Delta-E and detection and localization performance. To check whether there was a stronger relationship between Delta-E and people’s ability to detect and locate image manipulations in Experiment 2 than in Experiment 1, we converted the correlation coefficients to z values using Fisher’s transformation. There was a significantly stronger correlation between Delta-E and detection in Experiment 2 than in Experiment 1: z = −2.74, p = 0.01. Yet because we had good reason to predict a stronger relationship in Experiment 2 than in Experiment 1 (based on the JPEG re-compression noise), it might be fairer to consider the p value associated with a one-tailed test, p = 0.003. The correlation between Delta-E and accurate localization was not significantly stronger in Experiment 2 than in Experiment 1 based on a two-tailed test (z = −1.81, p = 0.07), but it was significantly stronger based on a one-tailed test (p = 0.04). Therefore, it is possible that the global (re-compression) noise in the Delta-E values in Experiment 1 weakened the association between the amount of change and people’s ability to identify manipulations. This finding suggests that Delta-E is a more useful measure for local, discrete changes to an image than it is for global image changes, such as applying a filter.
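The comparison of correlation coefficients can be reproduced approximately with the standard Fisher z test for two independent correlations. The sketch below is an illustration rather than the exact analysis script. It assumes 30 manipulated photos in each experiment (the sample size for Experiment 1 is an assumption here), and the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent Pearson correlations."""
    z_diff = atanh(r1) - atanh(r2)          # Fisher's r-to-z transformation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of the difference
    z = z_diff / se
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    # Halving is only appropriate when the difference lies in the predicted direction.
    p_one_tailed = p_two_tailed / 2
    return z, p_two_tailed, p_one_tailed


# Assumption: n = 30 manipulated photos in both experiments.
z_detect, p2_detect, p1_detect = compare_correlations(0.34, 30, 0.80, 30)
z_locate, p2_locate, p1_locate = compare_correlations(0.41, 30, 0.73, 30)
print(f"detect: z = {z_detect:.2f}, two-tailed p = {p2_detect:.3f}, one-tailed p = {p1_detect:.3f}")
print(f"locate: z = {z_locate:.2f}, two-tailed p = {p2_locate:.3f}, one-tailed p = {p1_locate:.3f}")
```

Under these assumptions the z values come out at about −2.74 for detection and −1.81 for location, matching the figures reported above.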

Fig. 8 Mean proportion of correctly detected (a) and located (b) image manipulations by extent of pixel distortion as measured by Delta-E. The graphs show individual data points for each of the 30 manipulated images.

Of course, the whole point of manipulating images is to fool observers, to make them believe that something fake is in fact true. Therefore, it might not be particularly surprising to learn that people find it difficult to spot high quality image manipulations. Yet it is surprising to learn that, even though our subjects never saw the same image more than once, this ability might be dependent on the amount of disruption between the original and manipulated image. The positive relationship between the accurate detection and location of manipulations and Delta-E suggests that it might be possible to develop a metric that allows for a graded prediction about people’s ability to detect and locate image manipulations. The possibility that a metric could be used to predict people’s ability to identify image manipulations is an exciting prospect; however, further research is needed to check that this finding generalizes across a wider variety of images and manipulation types. Our findings suggest that manipulation type and the technique used to create the manipulation, for instance, cloning or scaling, might be less important than the extent to which the change affects the underlying pixel structure of the image. To test this possibility, we next consider the relationship between the Delta-E values and the proportion of (a) correct detection and (b) location responses by the category of manipulation type.

Our findings in Experiments 1 and 2 show that subjects’ ability to detect and locate image manipulations varied by manipulation type; yet in Experiment 2, the differences were not adequately explained by the plausibility of the manipulation. That is, subjects accurately detected and located more of the addition or subtraction manipulations than the geometry, shadow, or airbrushing manipulations. One possibility is that the five categories of manipulation type introduced different amounts of change between the original and manipulated versions of the images. If so, we might expect these differences in amount of change to help explain the differences in subjects’ detection and localization rates across these categories.

To check this, we calculated the mean proportion of correct detections, the mean proportion of correct localizations, and the mean Delta-E value for each of the five categories of manipulation type. As Fig. 9 shows, there was a positive correlation between the amount of change and the proportion of correct detections (r(3) = 0.92, p = 0.03) and the proportion of correct localizations (r(3) = 0.95, p = 0.01). These results suggest that the differences in detection and localization rates across the five manipulation types are better accounted for by the extent of the physical change to the image caused by the manipulation than by the plausibility of that manipulation. Yet, given that subjects did not have the opportunity to compare the manipulated and original version of the scene, it is not entirely obvious why amount of change predicts accuracy.
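With only five category means, each correlation has three degrees of freedom, so even large coefficients yield modest p values. The following sketch shows how a p value can be recovered from r in this special case. It relies on the closed-form Student t distribution function for exactly three degrees of freedom, and both function names are hypothetical.

```python
from math import atan, pi, sqrt


def t_from_r(r, df):
    """Convert a Pearson r with df = n - 2 into the equivalent t statistic."""
    return r * sqrt(df / (1 - r * r))


def two_tailed_p_df3(t):
    """Two-tailed p value using the closed-form Student t CDF for exactly 3 df."""
    x = abs(t) / sqrt(3)
    cdf = 0.5 + (x / (1 + x * x) + atan(x)) / pi
    return 2 * (1 - cdf)


for label, r in [("detect", 0.92), ("locate", 0.95)]:
    t = t_from_r(r, df=3)
    print(f"{label}: r = {r}, t(3) = {t:.2f}, p = {two_tailed_p_df3(t):.3f}")
```

The resulting p values round to 0.03 and 0.01, in agreement with those reported. For other degrees of freedom a general t distribution routine would be needed.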

Fig. 9 Mean proportion of correctly detected (a) and located (b) image manipulations by extent of pixel distortion as measured by Delta-E. The graphs show the mean values for each of the five categories of manipulation type.

Our results suggest that the amount of change between the original and manipulated versions of an image is an important factor in explaining the detectability and localization of manipulations. Next we considered whether any individual factors are associated with improved ability to detect or locate manipulations.

Factors that mediate the ability to detect and locate manipulations

Using GEE analyses, we again explored various factors that might affect people’s ability to detect and locate manipulations. As discussed, we were able to use liberal or stringent criteria for our classification of detection and location accuracy on the manipulated image trials. Accordingly, we ran three models: the first two used the liberal classification for accuracy (and replicated the models we ran in Experiment 1), and the third examined the more stringent classification, DL. As in Experiment 1, for the detection task, we also ran two repeated measures linear regression GEE models to explore the effect of the predictor variables on signal-detection estimates d' and c. We included the same factors used in the GEE models in Experiment 1. The results of the GEE analyses are shown in Table 5.

Table 5 Results of the GEE binary logistic and linear regression models to determine variables that predict accuracy in the detect and locate tasks

Using the more liberal accuracy classification, that is, both DL and DnL responses for detection, we found that three factors had an effect on likelihood to respond correctly: response time, general beliefs about the prevalence of photo manipulation, and interest in photography. As in Experiment 1, faster responses were more likely to be correct than slower responses. Also replicating the finding in Experiment 1, those who believe a greater percentage of photos are digitally manipulated were slightly more likely to correctly identify manipulated photos than those who believe a lower percentage of photos are digitally manipulated. Additionally, in Experiment 2, those interested in photography were slightly more likely to identify image manipulations correctly than those who are not interested in photography. For the location task, using the more liberal accuracy classification, that is, both DL and nDL responses, we found that two factors had an effect on likelihood to respond correctly. Again there was an effect of response time: In the location task, faster responses were more likely to be correct than slower responses. Also those with an interest in photography were slightly more likely to correctly locate the manipulation within the photo than those without an interest. Next we considered whether any factors affected our more stringent accuracy classification, that is, being correct on both the detection and location tasks (DL). The results revealed an effect for two factors on likelihood to respond correctly. Specifically, there was an effect of response time with shorter response times being associated with greater accuracy. There was also an effect of interest in photography, with those interested more likely to correctly make DL responses than those not interested.

Our GEE models in both Experiments 1 and 2 revealed that shorter response times were linked with more correct responses on both tasks. As in Experiment 1, this association might be explained by several models of perceptual decision making; however, determining which of these models best accounts for our data is beyond the scope of the current paper.