Online votes or ratings can assist internet users in evaluating the credibility and appeal of the information they encounter. For example, aggregator websites such as Reddit allow users to up-vote submitted content to make it more prominent, and down-vote content to make it less prominent. Here we argue that decisions over what to up- or down-vote may be guided by evolved features of human cognition. We predict that internet users should be more likely to up-vote content that others have also up-voted (social influence), content that has been submitted by particularly liked or respected users (model-based bias), content that constitutes evolutionarily salient or relevant information (content bias), and content that follows group norms and, in particular, prosocial norms. A total of 489 respondents from the online social voting community Reddit rated the extent to which they felt different traits influenced their voting. Statistical analyses confirmed that norm-following and prosociality, as well as various content biases such as emotional content and originality, were rated as important motivators of voting. Social influence had a smaller effect than expected, while attitudes towards the submitter had little effect. This exploratory empirical investigation suggests that online voting communities can provide an important test-bed for evolutionary theories of human social information use, and that evolved features of human cognition may guide online behaviour just as they guide behaviour in the offline world.

We took a two-stage approach to our survey. To avoid guiding our respondents towards the predictions above, we first took advantage of a subreddit called “Theory of Reddit” where frequent voters post their motivations for up- and down-voting content. From these posts we assembled a list of 29 commonly-stated reasons, which contained the predicted reasons but also a range of others. We then surveyed 489 Redditors to see which of these 29 were most important, how they clustered together, and whether the characteristics of Redditors (e.g. their age, gender, or time on the site) predicted their evaluation of each.

To summarise, our aim in this study is to examine whether people report their online voting to be motivated by (i) the votes of others via social influence, (ii) the informational content of posts, (iii) the characteristics of the poster, via model-based bias, (iv) the enforcement of group norms, and (v) the enforcement of prosociality. All of these predictions derive from prior models and lab experiments that aim to identify the adaptive design features of human information use. Some predictions have already been tested in an online voting context (e.g. [ 22 , 33 ]), although not using self-report surveys. While self-report surveys have many weaknesses, and people may well not be aware of the actual reasons behind their voting decisions [ 49 ], it is instructive to know the explicit motivations of online voters, and whether these match with the results of online experiments [ 22 , 23 ] and corpus analyses [ 33 ].

The occurrence of anti-social behaviour on the internet is well known (e.g. ‘trolling’). As noted above, Reddit is an established and successful online community, so we might expect Redditors to possess norms designed to prevent anti-social posts from disrupting that community. Our final prediction is therefore that Redditors will up-vote content that exhibits prosocial sentiments (e.g. praise or helpfulness), and down-vote content that exhibits anti-social sentiments (e.g. personal abuse).

Although in theory any behaviour can be stabilised as a social norm [ 44 ], there has been much interest in prosocial (or altruistic) norms which entail a cost to the individual and a benefit to the group [ 40 ]. Prosocial norms have been detected using economic games in which people willingly punish others who under-contribute to public goods [ 45 ]. This altruistic punishment has been observed across all human societies that have been studied [ 46 , 47 ], and especially in the Western societies from which the majority of our respondents come. One explanation for the widespread existence of prosocial norms is cultural group selection [ 6 , 16 , 40 ], wherein throughout human history groups with stable prosocial norms out-competed and replaced groups without such prosocial norms (although other theories based on purely individual benefit have also been proposed: [ 48 ]).

Voting on Reddit takes place within an online community that can by itself instil particular shared norms in its users. Redditors are encouraged to adhere to an informal set of values known as the “reddiquette”, in addition to various moderated rules that guide user behaviour inside particular subreddits (subreddits are lists of submitted comments and links on specific topics, such as ‘worldnews’, ‘politics’ or ‘movies’). Given the possibility that people have evolved to enforce the behavioural standards of their community [ 40 ], it seems reasonable to expect that Redditors may internalise and try to enforce their community values by up-voting or down-voting content on the basis of its compliance with Reddit norms. We therefore predict that compliance with Reddit norms will be an important influence on Redditors’ voting decisions.

Norms are defined as “learned behavioral standards shared and enforced by a community” [ 40 ] p.218. Social psychologists have long demonstrated the powerful role that norms play in directing human behaviour and judgement [ 41 , 42 ], while recent developmental psychology studies show that from a very early age children internalise and follow norms for even arbitrary behaviours [ 43 ]. Chudek and Henrich [ 40 ] argue that culture-gene coevolution has resulted in this powerful norm-psychology given that there were likely fitness benefits to both coordinating group behaviour, and adopting majority behaviours that represent the accumulated wisdom of previous generations.

The fitness benefits of social learning can be further improved through selective learning from evidently skilful, successful or prestigious individuals, as they may be especially likely to possess useful information [ 12 , 35 ]. This has again been supported by experimental studies where participants preferentially copy successful individuals in various tasks [ 18 , 19 , 36 , 37 ] as well as copy individuals who others have looked at or shown deference to in some way [ 38 , 39 ]. Here, we might predict that Redditors will report preferentially up-voting content that is submitted by individuals who they like and/or respect, and down-voting content that is submitted by individuals who they dislike and/or do not respect.

Evolutionary models predict that people should preferentially copy and transmit useful or relevant information [ 6 , 16 , 25 , 26 ]. Empirical studies have identified various criteria for ‘usefulness’ or ‘relevance’ such as that the information concerns social interactions [ 27 , 28 ], contains supernatural or non-intuitive concepts [ 29 , 30 ], or elicits emotional reactions of disgust [ 31 , 32 ]. For our purposes, we might predict that Redditors up-vote content that has characteristics such as wide anticipated appeal or uniqueness. One recent study conducted textual analysis of user reviews on a retail website [ 33 ], finding that the rated helpfulness of a review is predicted by the review’s length, detail and understandability (e.g. use of short rather than long words). Our study represents a test of whether Redditors are aware of these effects. We might also expect Redditors to up-vote content that they personally agree with, and down-vote content that they personally disagree with, given evidence that people typically transform information to fit pre-existing beliefs [ 14 , 34 ].

Given that it can be difficult to determine from its content alone whether a submission should be up- or down-voted, and also given that social information concerning other people’s voting decisions is freely available, we predict that Redditors should report using previous voting decisions as a guide to their voting decisions. A recent large-scale randomised experiment on a news aggregation website similar to Reddit [ 22 ] supports this prediction, finding that artificially up-voting a comment significantly increased the likelihood that actual users would subsequently up-vote that comment. Another study used replicate cultural markets to show that song preferences are susceptible to social influence, given that different songs became popular in different markets [ 23 , 24 ]. Our survey provides a test of whether Redditors are aware of the effect of social influence or not.

Evolutionary models have formally examined the adaptiveness of social learning, defined as copying the knowledge or behaviour of other individuals, relative to asocial learning, defined as personally evaluating behaviours or knowledge with no influence from others [ 15 ]. One prediction of these models is that social learning should be used when asocial learning is particularly costly or ambiguous [ 16 , 17 ]. This prediction has received empirical support in both cultural evolution [ 18 , 19 ] and social psychology [ 20 ] experiments. Note that while this is a generally adaptive strategy, it can in certain cases lead to maladaptive ‘informational cascades’ where individuals copy inappropriate information from others without directly evaluating its effectiveness [ 21 ].

We are particularly interested in applying the novel framework of cultural evolutionary theory to online voting decisions. The field of cultural evolution [ 5 – 11 ] concerns (i) the way in which human cognition has biologically evolved to acquire, process and transmit information in an evolutionarily adaptive manner, such as by preferentially learning from successful individuals or by copying others only when one’s personal information is unreliable; and (ii) how these ‘social learning strategies’ [ 12 , 13 ] affect long-term cultural dynamics, i.e. how they influence change and variation in attitudes, beliefs, knowledge and other forms of culturally-transmitted information. Given that online voting is fundamentally concerned with the evaluation of information that originates from other people, and with the decision to transmit that information to others (by making it more or less prominent), we think it is plausible that psychological mechanisms that have evolved to serve these functions in the offline world may also apply to the online world. Note that there is strong overlap between cultural evolution and social psychology [ 14 ], and many of the same findings and predictions can be found in each tradition.

This raises the question of how users decide to up-vote or down-vote content on such websites. In this exploratory empirical study we surveyed 489 Reddit users (known on the site as ‘Redditors’) asking them about their reasons for up- or down-voting content. Reddit was chosen because it is one of the most popular and prominent aggregator sites. It is currently in the top 50 of the world’s most visited websites [ 3 ], and it has over 100 million unique visitors per month, of which just over 3 million are registered users who cast over 20 million votes [ 4 ].

Consequently, many internet users rely on aggregator websites, or aggregation mechanisms built into other websites, to make decisions about what to read, view, buy, visit or endorse [ 1 , 2 ]. Dedicated aggregator websites such as Reddit ( www.reddit.com ) allow users to submit comments and posts that typically contain links to news stories, articles, images and videos found on other websites. These submissions can be up-voted or down-voted by other users to influence the content’s position and subsequent visibility. Similarly, retail websites such as Amazon ( www.amazon.com ) ask customers to evaluate other customers’ product reviews, and preferentially display reviews that have been rated the most helpful. Websites such as Reddit and Amazon are therefore using bottom-up, user-driven aggregated evaluations to filter information that would otherwise be overwhelming. This is in contrast to, say, traditional news sources such as print newspapers, which rely on editors to select content to present to readers in a top-down fashion.

The internet is becoming increasingly central to people’s lives. For many people who live in industrialised countries it is now a major means of acquiring and transmitting information, of social interaction and communication, of entertainment, and of buying and selling goods. With this increased use has inevitably come a proliferation of information. Internet users are faced with a barrage of news stories, articles, products, services and other content that they could never directly and exhaustively evaluate for themselves.

Factor scores were created for each individual by calculating the mean of the raw scores corresponding to all items loading on a factor, as recommended for exploratory studies such as this one [ 53 ]. Consequently, factor scores retained the scale metric of the original Likert items, allowing for easier interpretation in subsequent analyses. These factor scores were then used as dependent variables in a series of ordinal logistic regression models to see if the importance scores of different voting influences varied depending on the voters' personal traits. Ordinal logistic regression was used because the dependent variables were 5-point Likert responses and were therefore not normally distributed. We used the clm and clmm functions in the R package ordinal [ 54 ] (clm gave identical results to the more popular polr function for ordinal logistic regression, but also allowed us to model random effects via the clmm function). Thirteen predictors were included in the models. Continuous predictors were the frequency of the respondent’s Reddit (i) visits, (ii) votes and (iii) contributions, (iv) their age, and the extent to which respondents report their voting being influenced by (v) their emotional reaction to the post, (vi) their objective evaluation of the post, (vii) the poster’s reputation, and (viii) the membership duration of the poster. Categorical predictors were the respondents' (ix) location country, (x) gender, (xi) Reddit membership duration, and whether they take more notice of (xii) up-voted content and (xiii) gold-badged content. Country was entered as a random effect given that shared location may generate non-independence in responses. For gender, ‘male’ served as the baseline against which ‘female’ and ‘other’ were compared. For Reddit membership duration, ‘1–6 months’ served as the baseline against which ‘7–12 months’, ‘1–2 years’, ‘2–5 years’ and ‘more than 5 years’ were compared. 
To avoid inflated Type I error rates associated with stepwise regression [ 55 ], we ran an initial full model with all thirteen predictors and then removed predictors with p>0.05. Model comparison was then used to check that all remaining predictors significantly improved model fit; where they did not they were removed. The remaining best-fit models are presented here. The full data file is available as S1 Dataset .
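As an illustration of the factor-score calculation described above, the following Python sketch averages the raw Likert ratings of the items assigned to each factor. The item names, the item-to-factor assignment and the ratings are hypothetical placeholders rather than the survey's actual items or loadings, and the analyses reported here were run in R, not with this code.

```python
from statistics import mean

# Hypothetical item-to-factor assignment; in the study this comes from the PCA loadings.
FACTOR_ITEMS = {
    "Prosociality": ["rude", "inconsiderate", "socially_damaging"],
    "Social Influence": ["many_upvotes", "many_downvotes"],
}


def factor_scores(ratings, factor_items=FACTOR_ITEMS):
    """Return the mean raw rating (1-5) of the items belonging to each factor."""
    return {
        factor: mean(ratings[item] for item in items)
        for factor, items in factor_items.items()
    }


# Hypothetical ratings from one respondent on the 5-point importance scale.
example_ratings = {
    "rude": 5,
    "inconsiderate": 4,
    "socially_damaging": 3,
    "many_upvotes": 2,
    "many_downvotes": 1,
}
example_scores = factor_scores(example_ratings)
```

Because each score is a plain mean of the raw ratings, it stays on the original 1 to 5 metric, which is the property the text relies on for interpretation.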

All analyses were conducted using R version 2.8.0 [ 51 ]. First, a correlation matrix was calculated for the 29 variables concerned with different content characteristics as influences on voting. Principal Components Analysis (PCA) was used to explore whether these responses were organised in ways that reflected a smaller number of broad underlying forces, using the ‘principal’ function in the R package psych [ 52 ]. The number of extracted factors was based on eigenvalues greater than 1, and orthogonal (varimax) rotation was applied to better identify each item with a single factor.

In addition to questions about voting motivations, respondents were asked to state their age, gender and Reddit membership duration category (1–6 months, 7–12 months, 1–2 years, 2–5 years, more than 5 years). They were also asked to state their frequency of Reddit visits, votes on posts, votes on comments, submissions of posts and submissions of comments on a four-point scale (daily, weekly, monthly, few times a year or less). Two variables were then created to represent the mean frequency of post and comment votes, and the mean frequency of post and comment submissions. The respondents’ location country was provided by SurveyGizmo. Respondents were asked to state the extent to which their overall voting decisions are influenced by emotional reactions to content, objective evaluations of the content’s quality, online reputation of the poster, and the length of time the poster had been a Redditor, where answers were given on a 5-point Likert scale ranging from “not at all” to “a lot”. Finally, there were two yes/no questions that asked respondents if they take more notice of highly up-voted content and content that is accompanied by Gold badges, which can act as indicators of commendation or popularity on Reddit.

During the first phase, the search term “upvote downvote” was used inside the subreddit “Theory of Reddit” to look for posts where Redditors previously discussed their motivations for voting. Search results were sorted by relevance and four posts whose titles referred to motivations for voting were opened. One of the researchers (MP) then read and analysed these posts and their response comments, which included a total of 180 comments made by Reddit users. A dramaturgical qualitative coding framework described in [ 50 ] was used to consider the users’ objectives, conflicts, tactics, attitudes, emotions and subtexts when analysing their written motivations for voting. This preliminary investigation identified 29 recurring themes upon which the quantitative questions in the survey were based. These characteristics encompassed traits that were necessary to address the predictions (e.g. content sounding intelligent, agreement or disagreement, aggression or consideration for other people), as well as other unanticipated qualities. Participants were asked to rate the importance of these 29 characteristics as influences on upvotes and downvotes, placing their answers on a 5-point Likert scale ranging from “not important” to “very important”.

We distributed a survey to each participant that contained qualitative and quantitative questions in order to gain an insight into Redditors' motivations for voting, as well as brief demographic questions (see S1 File for full survey text). The first qualitative part of the survey asked participants to vote on a sample of 15 Reddit comments and then to write brief explanations for these decisions. The quantitative part of the survey contained generic questions about what influences the respondents’ usual voting behaviour on Reddit. This paper focuses only on the quantitative results from the second part of the survey.

Respondents were recruited through notices posted in subreddits that are concerned with how Reddit communities work ( www.reddit.com/r/TheoryOfReddit ), anthropology ( www.reddit.com/r/Anthropology ) and surveys ( www.reddit.com/r/SampleSize ). These subreddits were chosen because their users were expected to be interested in being part of the study. Participants were invited to follow a link to an online survey, which was hosted from 12th to 20th November 2013 using SurveyGizmo ( www.surveygizmo.com ) survey software. The sample consisted of 489 Redditors (236 females, 248 males, 5 gendered as “other”). The participants had a mean age of 26 years (s.d. = 7.78) with an age range of 18 to 64. The majority of participants were located in the United States (68.1%), Canada (8.0%), United Kingdom (5.9%) or Australia (3.5%), with the remaining 14.5% coming from 33 countries, none of which individually exceeded 1.6% (8 participants).

Results

The full list of 29 content characteristics selected for inclusion in the survey can be seen in the left-hand column of Table 1. These were all deemed appropriate for PCA, since the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.8 and Bartlett’s test of sphericity was highly significant (p<0.001). Eight principal components were extracted and they explained 61.85% of the total variance, as shown in Table 1. We labelled these factors Reddit Norms (whether content follows the rules of Reddit), Empathy / Humour (whether content elicits empathy, agreement or humour), Social Influence (the content’s existing number of up-votes or down-votes), Prosociality (whether content is considered socially damaging, rude or inconsiderate), Intelligence / Uniqueness (whether the content sounds intelligent, interesting or unique), Unshared Experiences / Bad Memories, Unoriginality, and Attitude Towards User (whether the content is posted by a liked or disliked user). Two variables remained unfactored, Disagreement Of Opinion and Wish Others To See (whether the voter thinks that the content should be viewed by other users). After the calculation of mean importance scores for each factor (see S1 Table), the factor Unshared Experiences / Bad Memories was dropped from further analyses due to low importance. Fig 1 shows the mean importance scores for the seven remaining factors and the two unfactored variables.

We then regressed factor scores on our individual difference variables, to better understand which kinds of Reddit users valued each factor. Table 2 shows significant predictors from the best-fitting regression model for each factor. We provide odds ratios (ORs) and their confidence intervals rather than beta coefficients as ORs are the most easily interpreted effect size measure from ordinal logistic regressions. ORs indicate the relative change in the odds of different outcomes occurring per unit change in a predictor. An OR = 1 indicates no change, and thus no effect. An OR = 1.10 for, say, the age predictor indicates that for every one-unit increase in age (by one year), the odds of choosing one level of the Likert-scale outcome variable (e.g. 5 = “Very important”) are 1.10 times the odds of choosing any lower level (e.g. 1 = “Not important”, 2, 3 or 4). Note that we cannot directly compare ORs for predictors in the same model that have different scales. For example, age is continuous ranging from 18–64, while gender has three categories, so a one-unit increase in age is different to a shift from male to female or other.
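The interpretation of an OR given above can be checked numerically. The Python sketch below assumes the usual cumulative-logit (proportional odds) form, logit P(Y ≤ j) = θ_j − βx, with OR = exp(β). The threshold values θ_j and the two ages compared are hypothetical, chosen only to show that the odds of responding above any cut-point are multiplied by the same OR for each one-unit increase in the predictor.

```python
import math


def prob_at_or_below(threshold, beta, x):
    """P(Y <= j) under logit(P(Y <= j)) = threshold_j - beta * x."""
    return 1.0 / (1.0 + math.exp(-(threshold - beta * x)))


def odds_above(threshold, beta, x):
    """Odds of responding above category j, i.e. P(Y > j) / P(Y <= j)."""
    p_below = prob_at_or_below(threshold, beta, x)
    return (1.0 - p_below) / p_below


# Hypothetical cut-points between the five Likert categories (1|2, 2|3, 3|4, 4|5).
thresholds = [-2.0, -0.5, 0.5, 2.0]

# An OR of 1.10 per year of age, as in the worked example in the text.
beta = math.log(1.10)

# Ratio of odds at age 31 versus age 30, evaluated at every cut-point.
odds_ratios = [
    odds_above(t, beta, 31) / odds_above(t, beta, 30) for t in thresholds
]
```

Every entry of `odds_ratios` comes out at approximately 1.10 regardless of the cut-point, which is the proportional odds property that allows a single OR to be reported per predictor.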

The country variable was entered as a random effect in all models. However, none of these multi-level models showed significantly better fit compared to a model without country as a random effect. This suggests that country had little detectable influence on responses, most likely because the majority of our respondents were in Western countries (see Methods). The models in Table 2 therefore do not contain country as a random effect.

Table 2 shows that Reddit Norms was rated as more important by younger respondents, women, longer-term Reddit members, respondents who evaluated posts based on objective criteria, more frequent voters, and more frequent contributors. Empathy / Humour was rated higher by women, by respondents who report evaluating posts based on emotional reaction, and more frequent voters. Prosociality was rated as more important by more frequent voters, and by respondents who use both objective evaluations and emotional reactions. Emotional reaction (OR = 1.78, 95% CI [1.53, 2.08]) had a larger effect than objective evaluation (OR = 1.23, 95% CI [1.03, 1.46]), with non-overlapping confidence intervals (note that these predictors are comparable because they were measured on the same scales). Intelligence / Uniqueness was rated as more important by more frequent contributors and by respondents who use both objective evaluation and emotional reaction. Respondents reporting their gender as “other” appear to give lower importance compared to males, but this finding is unlikely to be reliable as the number of respondents who identified with this gender was very small (n = 5). Objective evaluation (OR = 1.70, 95% CI [1.43, 2.03]) had a larger effect than emotional reaction (OR = 1.30, 95% CI [1.11, 1.52]), although with slightly overlapping confidence intervals. Unoriginality was rated more important by younger respondents, longer-term Reddit members, more frequent voters, by respondents who use objective evaluation, by respondents who took notice of gold-badged content, and by respondents who use the membership duration of the poster. Social Influence was rated more important by younger respondents, respondents who use emotional reaction, more frequent voters and posters, respondents who notice upvoted content, and respondents who use the reputation of the poster. 
Wish Others To See was rated as more important by respondents who use objective evaluation and emotional reaction (with similar effect sizes), and more frequent voters. Disagreement Of Opinion was rated as more important only by respondents who use emotional reaction. Finally, Attitude Towards User was rated as more important by respondents who use emotional reaction, more frequent posters, and respondents who use the reputation and membership duration of poster. Of the latter, reputation (OR = 2.60, 95% CI [2.07, 3.30]) had a larger, non-overlapping effect size than membership duration (OR = 1.49, 95% CI [1.14, 1.95]).
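The overlap statements about the confidence intervals reported above can be verified with a short check. This Python sketch uses only the interval bounds quoted in the text; the helper name `intervals_overlap` is introduced here for illustration.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals for the odds ratios quoted in the text.
prosociality = {"emotional": (1.53, 2.08), "objective": (1.03, 1.46)}
intelligence = {"objective": (1.43, 2.03), "emotional": (1.11, 1.52)}
attitude = {"reputation": (2.07, 3.30), "membership": (1.14, 1.95)}

prosociality_overlap = intervals_overlap(
    prosociality["emotional"], prosociality["objective"]
)
intelligence_overlap = intervals_overlap(
    intelligence["objective"], intelligence["emotional"]
)
attitude_overlap = intervals_overlap(attitude["reputation"], attitude["membership"])
```

Only the Intelligence / Uniqueness pair overlaps (1.43 to 1.52), consistent with the description of that overlap as slight.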

Following the discovery of a significant relationship between age and Social Influence, additional tests were used to see if there were significant age differences in other questions associated with this variable. Mann-Whitney U tests showed that people who answered “yes” to “Do you take more notice of highly upvoted content?” were significantly younger (mean rank = 239.08) than those who answered “no” (mean rank = 288.11; z = -2.51, p < .05). Similarly, people who answered “yes” to “Do you take more notice of content accompanied by Gold badges?” were significantly younger (mean rank = 223.63) than those who answered “no” (mean rank = 266.64; z = -3.37, p < .05). For other possible markers of social influence, Spearman's Rank Order correlations showed that correlations between age and the perceived influence of the poster’s reputation (r_s = -.071, p > .05) and the poster’s membership duration (r_s = -.011, p > .05) were non-significant.
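To make the mean ranks reported for the Mann-Whitney U tests concrete, the following Python sketch ranks the pooled ages of two groups (averaging ranks across ties) and returns each group's mean rank together with the U statistic for the first group. The ages are hypothetical and far fewer than in the survey; a lower mean rank corresponds to a younger group.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mean_ranks_and_u(group_a, group_b):
    """Return (mean rank of a, mean rank of b, U statistic for a) on the pooled data."""
    ranks = average_ranks(list(group_a) + list(group_b))
    ranks_a = ranks[: len(group_a)]
    ranks_b = ranks[len(group_a):]
    u_a = sum(ranks_a) - len(group_a) * (len(group_a) + 1) / 2
    return sum(ranks_a) / len(ranks_a), sum(ranks_b) / len(ranks_b), u_a


# Hypothetical ages of respondents answering "yes" and "no".
ages_yes = [19, 21, 22, 24, 30]
ages_no = [23, 27, 31, 35, 40]
mean_rank_yes, mean_rank_no, u_yes = mean_ranks_and_u(ages_yes, ages_no)
```

The sketch stops at the descriptive quantities; the z and p values in the text additionally require the normal approximation (or exact distribution) of U, which is not implemented here.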

Descriptive analyses showed that content creators’ reputation and Reddit membership duration had little perceived influence on the participants’ voting decisions, eliciting mean influence scores of 1.42 (s.d. = 0.78) and 1.25 (s.d. = 0.65) respectively on a 5-point Likert scale.