Donations data collection and processing

The GoFundMe platform is a popular website launched in 2010 which allows anyone to create campaign pages describing requests for charitable financial donations. The types of campaigns on GoFundMe include charitable causes such as disaster relief funds and assistance for medical bills. All categories of campaigns in our data can be seen in Supplementary Fig. 2 along with the distribution of male and female contributions to each campaign category. Medical campaigns are the most common type in our dataset. People could voluntarily browse these pages and make donations to requesters. Records of years of donations were publicly displayed on the GoFundMe website at the time of this writing. Donors could (and often did) leave public messages along with their donations.

When donors were browsing a requester’s page, they saw on that page the requester’s goal amount, the amount of money donated so far toward that goal, a picture posted by the requester, a description of the request posted by the requester, and the names, donation amounts, and messages of the last (up to) 10 people to make a donation to that campaign. An example of a requester’s campaign page is shown in Supplementary Fig. 1.

The data were downloaded from the GoFundMe website during May and June of 2016. Information from the campaign pages as well as all donation amounts to each campaign were downloaded from the publicly accessible GoFundMe website using the statistical software R.

In the self-identified donor analyses (hypotheses 2 through 6) we removed anonymous donations and also donations with names that described more than one person (e.g. “John and Mary Smith” or “Alice & Deborah”), as our hypotheses focus on the gender of a single donor behind each transaction, not a group. In total, 5.3% of donations were removed for indicating multiple donors. Donation transactions for which the gender of the donor could not be confidently estimated were also excluded from these analyses.
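The name-based exclusion described above can be sketched as follows. This is only an illustration under stated assumptions: the connector pattern (“and”, “&”, “+”) and the literal label “anonymous” are hypothetical choices made for this example, not the exact rules used in the analysis.

```python
import re

# Hypothetical pattern: a name is treated as describing several people
# when it contains "and", "&", or "+" surrounded by whitespace.
_MULTI_DONOR = re.compile(r"\s(?:and|&|\+)\s", re.IGNORECASE)


def is_single_named_donor(name):
    """Return True if the name is non-anonymous and appears to describe one person."""
    cleaned = name.strip()
    # Assumption: anonymous donations appear as an empty name or "Anonymous".
    if not cleaned or cleaned.lower() == "anonymous":
        return False
    return _MULTI_DONOR.search(cleaned) is None


donor_names = ["John and Mary Smith", "Alice & Deborah", "Anonymous", "Sandra Lee"]
kept = [n for n in donor_names if is_single_named_donor(n)]
```

Requiring whitespace around the connector keeps names such as “Andrew Anderson” from being flagged merely because they contain the letters “and”.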

Gender estimation

The gender of each requester and donor was estimated from their first name using name frequency data from the 1990 U.S. Census. The U.S. Census program provides the percentages of people with commonly occurring first names that are male or female. Thus, for any relatively common first name the census data provide an empirical probability that the person is male or female given his or her name. We used these probabilities to estimate the genders of requesters and donors by comparing the probability that a person with that first name is female to the probability that a person with that name is male. Our conservative gender estimation algorithm is provided in Supplementary Methods 2.1. We only labeled a participant as male or female if he or she was more than 10 times more likely to be one gender than the other given the gender frequencies associated with his or her first name. This gender estimation procedure confidently estimated the gender of 76% of the participants in our database who provided their names (and did not indicate multiple donors). Supplementary Table 1 shows a random sample of the gender detection output given user-provided names.
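The decision rule can be illustrated with a minimal sketch. The full algorithm is in Supplementary Methods 2.1; the function names and the small probability table below are hypothetical placeholders, not census values, and the threshold is implemented here as a simple probability ratio.

```python
def estimate_gender(p_female, p_male, ratio=10.0):
    """Return 'female' or 'male' only if one gender is more than `ratio`
    times as probable as the other; otherwise return None (not coded)."""
    if p_female > ratio * p_male:
        return "female"
    if p_male > ratio * p_female:
        return "male"
    return None


# Hypothetical (P(female | name), P(male | name)) values for illustration only.
NAME_PROBS = {
    "mary": (0.996, 0.004),
    "john": (0.004, 0.996),
    "jordan": (0.30, 0.70),
}


def gender_from_full_name(full_name):
    """Look up the first token of the name and apply the ratio rule."""
    tokens = full_name.strip().split()
    if not tokens:
        return None
    probs = NAME_PROBS.get(tokens[0].lower())
    if probs is None:
        return None  # name too rare to appear in the table
    return estimate_gender(*probs)
```

Under this rule an ambiguous name such as the hypothetical “jordan” entry stays uncoded, which is what makes the procedure conservative.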

We coded the genders of the campaign creators with the same algorithm used for coding the names of donors. Out of the original set of campaigns, 6,100 campaigns were successfully coded for the gender of their creators and are included in the main regression analysis.

Survey data collection

We recruited 331 participants through Amazon Mechanical Turk. Participants were compensated at the equivalent of $15 per hour. In the summary of the survey we explained that it was only for people who had previously donated on GoFundMe. We also asked at the beginning of the survey if they had donated to a campaign on GoFundMe before and prevented participants who answered “No” from continuing with the study. Prior to analysis we removed 16 participants for failing an attention check question. We reviewed the open-ended responses to identify unacceptable responses and removed 10 participants for leaving illegitimate (clearly automated or incomprehensible) responses or ones that indicated they donated on a different crowdfunding platform than GoFundMe. This left a sample of 305 responses. 47% of participants were male and the sample had a mean age of 33 (SD = 11). The modal interval of amounts donated was $16-$30. In collecting our human subjects data we complied with all relevant ethical regulations for work with human participants. Informed consent was obtained from participants. This study was approved by the Princeton IRB under protocol #11464.

We asked all participants a battery of questions regarding their motivations for making their most recent donation. These questions were designed to address all potential egoistic motivations including seeking self-rewards (e.g. warm glow or positive view of self), avoiding self-punishment (e.g. guilt), seeking social rewards, avoiding social punishments, costly signaling, tax incentives, reciprocity, and directly benefitting from the campaign. Respondents who had given a donation anonymously then answered the same set of questions, along with additional ones, regarding their most recent anonymous donation. More details on this survey can be found in Supplementary Methods 2.11.

Statistical framework

We implement a mixed-effects regression to evaluate hypotheses 2 to 5. The outcome variable is the amount of each donation in US dollars. Donor gender is modeled as a dummy variable (donor gender) with 0 representing female and 1 representing male.

There were some extreme outliers in our data due to a small number of donors giving massive amounts, so we excluded observations where the amount donated was three or more standard deviations above the average donation amount or where the mean visible donation on the page at the time of donating was three or more standard deviations above the average mean visible donation. When we run the same regression with no outliers excluded, the pattern of significant findings is essentially the same and the coefficients all increase in magnitude. As a robustness check, we also ran the same regression analysis with cutoff thresholds of 2 and 1 standard deviations. The effects are virtually the same under these different cutoff levels. Both sets of results can be seen in Supplementary Table 7. Statistical comparison of mean amounts of donations by men and women was performed using a two-sample (two-tailed) t-test allowing for unequal variances across groups. The mean donation for males was $84.98 and for females was $59.80.
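As a rough illustration of the two procedures in this paragraph, the sketch below applies a standard-deviation cutoff and computes an unequal-variance (Welch) t statistic with its Welch-Satterthwaite degrees of freedom. The function names are hypothetical, the sample standard deviation is an assumed choice, and the p-value is left to a statistics package.

```python
import math
from statistics import mean, stdev, variance


def exclude_high_outliers(values, n_sd=3.0):
    """Drop values that lie n_sd or more sample standard deviations above the mean."""
    cutoff = mean(values) + n_sd * stdev(values)
    return [v for v in values if v < cutoff]


def welch_t(x, y):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Toy data: twenty $10 donations and one very large donation.
amounts = [10.0] * 20 + [10000.0]
trimmed = exclude_high_outliers(amounts)  # the $10,000 donation is dropped
```

The same filter can be applied to the mean visible donation, and rerun with `n_sd=2.0` or `n_sd=1.0` to mirror the robustness checks.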

In our regression analysis, the campaign each donation was made to and the category of each campaign were both modeled as random intercepts. Including effects for campaigns in the model addresses the fact that different campaigns inherently have different baseline amounts that are appropriate for donations. For example, donating to a couple’s honeymoon fund likely does not warrant the same donation amount as donating to a fund for an urgently needed surgery. Modeling a random intercept for each campaign controls for potential campaign-level confounds such as the popularity of causes or the socioeconomic status of geographic areas where campaigns originated. Similarly, including effects for campaign categories accounts for baseline differences across categories.

We note that by including the recipient-oriented variables in the same regression as the donor-oriented variables, we reduce the sample size notably. We include only the full model regression results in the paper for simplicity, as the results even with a smaller sample size are almost identical to the results with the recipient-oriented variables excluded. Supplementary Table 5 provides the regression output with the recipient-oriented variables excluded.

Same name analyses

Regarding Hypothesis 2, we note that only about 1% of donations were made to recipients with the same publicly listed last names as the donors. This is a small percentage, but still left >1400 transactions made to an apparent relative for each gender. We do not know how many donor-recipient pairs were relatives that we could not identify as relatives, so our data cannot be used for inference regarding the frequency of donations to relatives versus non-relatives.

By using a shared last name as a proxy for a family tie, we are assuming that when a donor and a recipient have the same last name it is likely that they are related. To get a sense of how likely donors and recipients are to share a last name by chance rather than through familial ties, we evaluated the empirical probability of this using our own dataset. We find that around 2% of cases where donors and recipients share the same last name in our dataset are likely due to chance rather than familial ties. In Supplementary Methods 2.6 we provide a more detailed account of this robustness check. Also in Supplementary Methods 2.6 we provide calculations demonstrating that with a small probability of random last name matches between donors and recipients (on the order of 2%) it is unlikely that random matches are affecting our results in a substantial way.

Instances where a person donated to the campaign of someone with the same last name who was not a family member could add noise to the analysis and attenuate the effect size, but should not increase the probability of a type I error, assuming there is no substantial effect on donations between people who share a last name by chance. If there were such an effect, it would affect our results only slightly, given the low probability of chance matches described above. Instances where a donor gave to a recipient who is a family member but did not share the same last name (e.g., donating to a sibling who married and took her spouse’s last name) will attenuate the estimated effect, but would not result in a type I error.

Men and women have different probabilities of sharing last names with family members due to conventions of name-taking in marriages. Therefore we do not attempt to ascertain gender differences in kin generosity with this variable.

Proportion of visible females

Testing our fourth hypothesis involved calculating the proportion of visible females. On a GoFundMe campaign page a prospective donor could see the names and donation amounts of the last 10 people to donate to the campaign. Since we know the order of the donations, we are able to determine which past donors and donations were shown on the page at the time of each contribution. If one of the past donations seen on the page was anonymous, could not be gender-coded, or was from more than one person, then that past donor was not included in the calculation of the female proportion of visible donors. In other words, the proportion of visible females variable represents the proportion of gender-identifiable donors visible on the page that were female. If all visible donors on the page were female then proportion of visible females would be equal to 1. If half of the visible donors were female it would be equal to 0.5.
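A minimal sketch of this calculation is given below. It assumes each prior donation has already been labeled 'female', 'male', or None (anonymous, not gender-coded, or multiple donors); the function names and the choice to return None when no visible donor is gender-identifiable are assumptions made for illustration.

```python
def proportion_visible_females(prior_genders, window=10):
    """Proportion of gender-identifiable donors among the last `window`
    prior donations that were female, or None if none are identifiable."""
    visible = prior_genders[-window:] if window > 0 else []
    coded = [g for g in visible if g in ("female", "male")]
    if not coded:
        return None
    return sum(g == "female" for g in coded) / len(coded)


def visible_female_series(genders_in_order, window=10):
    """For each donation in chronological order, compute the proportion
    based only on the donations that preceded it."""
    return [
        proportion_visible_females(genders_in_order[:i], window)
        for i in range(len(genders_in_order))
    ]
```

For a campaign whose first three donors are coded female, male, male, `visible_female_series` yields None for the first donation (nothing visible yet), 1.0 for the second, and 0.5 for the third.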

Since donor gender is modeled as a dummy variable (0 = female, 1 = male) and the model includes an interaction between donor gender and proportion of visible females, the coefficient for proportion of visible females can be interpreted as the effect of the proportion of visible females for female donors. Donor gender is equal to 0 for females, so the Donor gender:Proportion of visible females coefficient is multiplied by zero and drops out of the equation.
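This interpretation can be written out explicitly. The symbols are illustrative labels rather than notation from the fitted model: D_i is the donor gender dummy, P_i is the proportion of visible females, and the ellipsis stands for the remaining predictors, random intercepts, and error term.

```latex
\begin{aligned}
\text{amount}_i &= \beta_0 + \beta_1 D_i + \beta_2 P_i + \beta_3 (D_i \times P_i) + \dots \\
D_i = 0 \ \text{(female)}: \quad \frac{\partial\, \text{amount}_i}{\partial P_i} &= \beta_2 \\
D_i = 1 \ \text{(male)}: \quad \frac{\partial\, \text{amount}_i}{\partial P_i} &= \beta_2 + \beta_3
\end{aligned}
```

The coefficient on the proportion of visible females is therefore the slope for female donors, and the interaction coefficient is the difference between the male and female slopes.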

We note that the costly signaling effect for female donors did not remain significant in one robustness check where we excluded outliers above one standard deviation from the mean donation and mean visible amount. These results can be seen in Supplementary Table 7.

Mean visible donations

Since females tend to give less per donation, the variable mean visible donation is correlated with the variable proportion of visible females. The more females on the page, the lower the average donation shown tends to be. By modeling both of these variables simultaneously in one regression model, we address the concern that only one of these effects is truly at work: each explains a unique portion of the variance (i.e. both have significant effects).

It is plausible that time could be a confound in the relationship between the mean visible donations and the amounts given by each donor. That is, if there are concentrated periods in time when donations increase across donors and campaigns, such as on a holiday that encourages generosity, this may act as a third variable and increase the average visible donations and the individual donations simultaneously. This would give the impression of donations being influenced by prior donations because they would be correlated. We investigated this possibility by looking at the correlations between each donation and the past 20 donations. Since only the past 10 were shown on the screen, if the effect we see is due to visibility then there should be a stark drop-off in correlations after the tenth. This is the pattern we find, which can be seen in Supplementary Figs. 3 and 4.
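The lag check can be sketched as below for a single campaign's donations in chronological order. This is an assumed simplification: the function names are hypothetical, and how correlations were pooled across campaigns is not specified here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(amounts, max_lag=20):
    """Correlate each donation with the donation made `lag` positions earlier,
    for lag = 1..max_lag. Lags with fewer than three pairs return None."""
    result = {}
    for lag in range(1, max_lag + 1):
        current = amounts[lag:]
        previous = amounts[:-lag]
        result[lag] = pearson(current, previous) if len(current) >= 3 else None
    return result
```

If visibility drives the effect, the values for lags 1 through 10 should be clearly higher than those for lags 11 through 20.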

Empathy coding

In order to assess the content of messages for expressions of empathy, we developed a short list of empathic phrases such as “I empathize…”, “… feel your…”, and “… heartfelt…”. The full algorithm including all key phrases can be seen in Supplementary Methods 2.5. We automatically coded for the presence of these phrases in the messages left by donors as a binary indication of each message expressing empathy or not. We implemented a permutation-based robustness check to ensure the validity of this method which can be seen in more detail in Supplementary Methods 2.5. Standard errors for percent of transactions expressing empathy were calculated using the normal approximation for binomial standard errors.
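A minimal sketch of the coding step and the standard error calculation follows. Only the three example phrases quoted above are used; the full phrase list and matching rules are in Supplementary Methods 2.5, and the simple case-insensitive substring match and function names here are assumptions.

```python
import math

# Only the example phrases quoted in the text; the full list is in
# Supplementary Methods 2.5.
EMPATHY_PHRASES = ("i empathize", "feel your", "heartfelt")


def expresses_empathy(message):
    """Binary code: True if the message contains any listed phrase."""
    text = message.lower()
    return any(phrase in text for phrase in EMPATHY_PHRASES)


def proportion_with_se(flags):
    """Proportion of True flags and its binomial standard error
    under the normal approximation: sqrt(p * (1 - p) / n)."""
    n = len(flags)
    p = sum(flags) / n
    return p, math.sqrt(p * (1 - p) / n)


messages = ["I feel your pain", "Good luck!", "Stay strong", "Best wishes"]
flags = [expresses_empathy(m) for m in messages]
p, se = proportion_with_se(flags)
```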

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.