Credit: Melissa Simone

Melissa Simone, a quantitative psychologist at the University of Minnesota in Minneapolis, uses surveys to study eating behaviours in people from sexual and gender minorities (LGBT+). Less than 24 hours after she began promoting one such survey on Twitter, she had received 386 responses. But in most cases there was nobody at the keyboard. Simone’s survey had been attacked by ‘bots’, automated online mischief-makers created by people who were probably targeting her survey for the US$15 reward she offered.

To Simone, bots are “like fake Twitter accounts”. The fact that they might be deployed to sabotage scientific studies, she says, is “mind-blowing”. She spent 200–300 hours developing a battery of tests to ferret out the false responses, culling her data set to just 11 responses.

Simone shared her findings on Twitter in September 2019. In November, she relaunched her survey, this time recruiting participants directly rather than on social media. Nature asked her about her experience.

How did you know your survey had been attacked?

My survey had a lot of open-ended questions. As I was scrolling through the answers, I noticed a response that used Latin words. I thought, “Huh, that’s weird,” and kept scrolling. I saw that exact response again and again. That convinced me there was something wrong.

How did you identify suspicious responses?

We used ‘skip logic’ to personalize the surveys. If the user clicks “yes, I am transgender”, they should see questions about their experience as a trans person. But because bots are following the underlying code rather than the logic of the survey, they will click that they’re cisgender (someone who identifies with the gender they were assigned at birth), and still answer questions about holding a transgender identity.
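The skip-logic check described above could be sketched roughly as follows. This is a minimal illustration, not Simone's actual procedure: the field names (`gender_identity`, `trans_experience_q1`, `trans_experience_q2`) and the sample responses are invented for the example, and a real survey export will use its own column names.

```python
# Hypothetical mapping: gating answer -> fields that should only be
# filled in when that answer was chosen. All names are assumptions.
GATED_FIELDS = {
    "transgender": ["trans_experience_q1", "trans_experience_q2"],
}


def violates_skip_logic(response: dict) -> bool:
    """Return True if the response answers questions that its own
    gender_identity answer should have hidden."""
    chosen = response.get("gender_identity")
    for gate_value, fields in GATED_FIELDS.items():
        if chosen == gate_value:
            continue  # these questions were legitimately shown
        # Any non-empty answer in a hidden field is inconsistent.
        if any(response.get(f) not in (None, "") for f in fields):
            return True
    return False


# Invented sample data for illustration only.
responses = [
    {"id": 1, "gender_identity": "transgender", "trans_experience_q1": "Supportive clinic"},
    {"id": 2, "gender_identity": "cisgender", "trans_experience_q1": "Lorem ipsum"},
    {"id": 3, "gender_identity": "cisgender"},
]

flagged = [r["id"] for r in responses if violates_skip_logic(r)]
print(flagged)  # [2]
```

A flag of this kind is best treated as a reason to inspect a response by hand rather than as proof that it came from a bot.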

Bots were also able to skip questions that were required for all participants, and produce bundles of responses that were identical across all survey fields. Some bots started and stopped the study at the exact same time, amounting to 488 questions answered in just 7 minutes. That’s basically impossible to achieve with real respondents. It’s pretty unlikely that this many people started the study at the exact same minute and finished it exactly 7 minutes later.
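The three warning signs mentioned here (skipped required questions, identical bundles of answers, and shared start and stop times) could be screened for with something like the sketch below. The field names, the list of required questions, the timestamps and the `max_same_window` threshold are all assumptions made for illustration; none of them comes from Simone's study.

```python
from collections import Counter
from datetime import datetime

REQUIRED = ["age", "consent"]            # hypothetical required questions
METADATA = ("id", "started", "finished")  # fields that are not answers


def fingerprint(response: dict) -> tuple:
    """Hashable summary of every answer, ignoring metadata fields."""
    return tuple(sorted((k, v) for k, v in response.items() if k not in METADATA))


def minute(ts: str) -> datetime:
    """Truncate an ISO-format timestamp to the minute."""
    return datetime.fromisoformat(ts).replace(second=0, microsecond=0)


def time_window(response: dict) -> tuple:
    return (minute(response["started"]), minute(response["finished"]))


def flag_suspicious(responses: list, max_same_window: int = 2) -> dict:
    """Map response id -> list of reasons it looks automated."""
    content_counts = Counter(fingerprint(r) for r in responses)
    window_counts = Counter(time_window(r) for r in responses)
    flags = {}
    for r in responses:
        reasons = []
        if any(r.get(f) in (None, "") for f in REQUIRED):
            reasons.append("missing_required")
        if content_counts[fingerprint(r)] > 1:
            reasons.append("duplicate_answers")
        if window_counts[time_window(r)] > max_same_window:
            reasons.append("shared_time_window")
        if reasons:
            flags[r["id"]] = reasons
    return flags


# Invented sample data: three identical 7-minute runs, one plausible
# respondent, and one response that skipped a required question.
responses = [
    {"id": 1, "started": "2020-01-01T10:00:05", "finished": "2020-01-01T10:07:10", "age": "25", "consent": "yes", "q1": "lorem"},
    {"id": 2, "started": "2020-01-01T10:00:40", "finished": "2020-01-01T10:07:30", "age": "25", "consent": "yes", "q1": "lorem"},
    {"id": 3, "started": "2020-01-01T10:00:59", "finished": "2020-01-01T10:07:01", "age": "25", "consent": "yes", "q1": "lorem"},
    {"id": 4, "started": "2020-01-01T11:15:00", "finished": "2020-01-01T11:52:00", "age": "31", "consent": "yes", "q1": "I cook at home"},
    {"id": 5, "started": "2020-01-01T12:00:00", "finished": "2020-01-01T12:40:00", "age": "", "consent": "yes", "q1": "varies"},
]

flags = flag_suspicious(responses)
print(flags)
```

The threshold for how many responses may share a time window is arbitrary here and would need tuning to the size and pace of a real study.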

How can other researchers protect their own surveys?

First, never use a public survey link. Unique, personalized links can prevent people from using the same IP address to submit hundreds of responses.
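One way to issue unique, single-use links is sketched below, assuming the researcher controls the page that receives the link. The `InviteRegistry` class and the `survey.example.org` address are placeholders invented for this example; hosted survey platforms typically provide their own personalized-link features, which would be used instead.

```python
import secrets

BASE_URL = "https://survey.example.org/s/"  # placeholder address


class InviteRegistry:
    """Hypothetical in-memory record of issued tokens and whether
    each one has already been used."""

    def __init__(self) -> None:
        self._used = {}  # token -> True once redeemed

    def issue(self) -> str:
        """Create an unguessable token and return the personal link."""
        token = secrets.token_urlsafe(16)
        self._used[token] = False
        return BASE_URL + token

    def redeem(self, token: str) -> bool:
        """Accept a token once; reject unknown or reused tokens."""
        if token not in self._used or self._used[token]:
            return False
        self._used[token] = True
        return True


registry = InviteRegistry()
link = registry.issue()
token = link[len(BASE_URL):]

print(registry.redeem(token))      # True: first use is accepted
print(registry.redeem(token))      # False: the link cannot be reused
print(registry.redeem("made-up"))  # False: token was never issued
```

Because each link works only once, a single leaked link yields at most one extra response instead of hundreds.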

Second, use ‘honeypots’ — questions that no person should be able to see, which resemble ‘real’ questions. For instance, I added a question about marital status directly after one regarding relationship status, and then hid the marital-status question from human participants. A bot wouldn’t realize it should skip that question and would answer it instead.
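On the analysis side, a honeypot check can be very short. The sketch below assumes the hidden marital-status question is exported under the made-up field name `marital_status_hidden`; any response that contains an answer there is flagged, because no human participant should have seen the question.

```python
# Hypothetical name of the field hidden from human participants.
HONEYPOT_FIELDS = ["marital_status_hidden"]


def tripped_honeypot(response: dict) -> bool:
    """Return True if any hidden question received a non-blank answer."""
    for field in HONEYPOT_FIELDS:
        value = response.get(field)
        if value is not None and str(value).strip():
            return True
    return False


# Invented sample data for illustration only.
responses = [
    {"id": 1, "relationship_status": "partnered", "marital_status_hidden": ""},
    {"id": 2, "relationship_status": "single", "marital_status_hidden": "married"},
    {"id": 3, "relationship_status": "single"},
]

caught = [r["id"] for r in responses if tripped_honeypot(r)]
print(caught)  # [2]
```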

Include open-ended questions, because that’s a really good way to detect suspicious patterns. And continuously check your data throughout the lifetime of your survey. It used to be acceptable to check your data every couple of days to ensure its integrity. It’s evident from my experience that that’s not enough.
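The kind of pattern that open-ended questions expose, the same text pasted in again and again, can be looked for automatically each time the data are checked. The following is a minimal sketch: the field name `open_q1`, the `min_repeats` threshold and the sample answers are assumptions for illustration, and short genuine answers such as "no" will naturally repeat, so matches still need a human look.

```python
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def repeated_free_text(responses: list, field: str, min_repeats: int = 3) -> dict:
    """Map each repeated answer to the ids of the responses that gave it."""
    seen = defaultdict(list)
    for r in responses:
        text = normalise(r.get(field) or "")
        if text:
            seen[text].append(r["id"])
    return {text: ids for text, ids in seen.items() if len(ids) >= min_repeats}


# Invented sample data for illustration only.
responses = [
    {"id": 1, "open_q1": "Lorem ipsum dolor"},
    {"id": 2, "open_q1": "lorem  ipsum dolor "},
    {"id": 3, "open_q1": "I mostly eat with my housemates."},
    {"id": 4, "open_q1": "LOREM IPSUM DOLOR"},
]

repeats = repeated_free_text(responses, "open_q1")
print(repeats)  # {'lorem ipsum dolor': [1, 2, 4]}
```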

How has the survey relaunch gone?

We’ve had some bumps along the way, but things are moving along quite well. We are no longer advertising our study through Twitter or on public pages. Instead, we recruit participants exclusively from college and university campuses, through queer-specific programmes and groups. We did run into a bit of a problem when one of our partners shared our study advert on Twitter. Minutes after posting we had about 100 false responses from bots. After they removed the post, the fake responses trickled in for a day or two before things returned to normal.