A lot of neural networks are black boxes. We know they can successfully categorize things (images with cats, X-rays with cancer, and so on), but for many of them, we can't understand what they use to reach that conclusion. That doesn't mean, however, that people can't infer the rules a network uses to sort things into categories. And that creates a problem for companies like Facebook, which hopes to use AI to get rid of accounts that abuse its terms of service.

Most spammers and scammers create accounts in bulk, and they can easily look for differences between the ones that get banned and the ones that slip under the radar. Those differences can allow them to evade automated algorithms by structuring new accounts to avoid the features that trigger bans. The end result is an arms race between the algorithms and the spammers and scammers trying to guess their rules.

Facebook thinks it has found a way to avoid getting involved in this arms race while still using automated tools to police its users, and this week, it decided to tell the press about it. The result was an interesting window into how to keep AI-based moderation useful in the face of adversarial behavior, an approach that could be applicable well beyond Facebook.

The problem

Facebook has billions of monthly active users, and only a small fraction of them fall into the category the company terms abusive: fake and compromised accounts, spammers, and those using the social network to run scams. So while the company can (and does) use human moderators, the problem is simply too large for them to catch everything. That means some sort of automated system is necessary if the service doesn't want to be swamped by content it doesn't want to host.

Facebook (or any other social network operator) will obviously have access to lots of data that an automated system can use: an account's posting history, details provided at sign-up, friend networks, and so on. An algorithm could easily use that data to identify problematic accounts; one option is a neural network trained on the data along with a human-curated list of problematic and acceptable behavior.

The problem, as mentioned above, is that the people running abusive accounts also have access to all this data and can potentially figure out the features that are causing accounts to be banned. Alternatively, they can change their behavior enough to avoid triggering suspicion. This raises the risk of an arms race, with the scammers perpetually getting a step ahead of the algorithms that are intended to catch them.
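To make the adversary's side concrete, here is a minimal sketch of how someone running many accounts might guess which feature is getting them banned. The feature names and numbers are hypothetical, and the standardized difference of means is just one crude way to compare the two groups; nothing here reflects how Facebook's classifiers actually work.

```python
from statistics import mean, pstdev

# Hypothetical features an operator can observe about accounts they control.
banned = [
    {"posts_per_day": 40, "friend_requests_per_day": 90, "has_profile_photo": 0},
    {"posts_per_day": 35, "friend_requests_per_day": 80, "has_profile_photo": 1},
    {"posts_per_day": 50, "friend_requests_per_day": 95, "has_profile_photo": 0},
]
survived = [
    {"posts_per_day": 38, "friend_requests_per_day": 10, "has_profile_photo": 1},
    {"posts_per_day": 45, "friend_requests_per_day": 12, "has_profile_photo": 0},
    {"posts_per_day": 42, "friend_requests_per_day": 8, "has_profile_photo": 1},
]


def rank_suspect_features(banned, survived):
    """Rank features by how well they separate banned from surviving accounts."""
    scores = {}
    for name in banned[0]:
        b = [acct[name] for acct in banned]
        s = [acct[name] for acct in survived]
        # Scale the gap between group means by the overall spread; guard against zero spread.
        spread = pstdev(b + s) or 1.0
        scores[name] = abs(mean(b) - mean(s)) / spread
    return sorted(scores, key=scores.get, reverse=True)


print(rank_suspect_features(banned, survived))
# ['friend_requests_per_day', 'has_profile_photo', 'posts_per_day']
```

Once the operator sees that friend-request rate is the giveaway, they can throttle it on the next batch of accounts, and the classifier has to find a new signal.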

To avoid this, Facebook's researchers have shifted from using account data to what might be called account metadata. Rather than using the number of posts a given account makes, the system looks at the number of posts a typical friend's account makes. Similar values can be generated for the average number of friends that the account's friends are connected to, how often friend requests are sent, and so on. A series of values like this is combined into a profile that the company's researchers are calling a "deep entity."
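A minimal sketch of the idea, assuming a toy friend graph and a few hypothetical per-account statistics: each feature describes an account's friends rather than the account itself. Only averages are shown, and the feature names are illustrative; the article doesn't list the actual features Facebook aggregates.

```python
from statistics import mean

# Hypothetical toy data: a friend graph and each account's own activity counts.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol", "dave"],
    "carol": ["alice", "bob"],
    "dave": ["bob"],
}
own_stats = {
    "alice": {"posts": 12, "friend_requests_sent": 3},
    "bob": {"posts": 30, "friend_requests_sent": 5},
    "carol": {"posts": 6, "friend_requests_sent": 1},
    "dave": {"posts": 0, "friend_requests_sent": 200},
}


def deep_entity_features(account):
    """Profile an account by averaging its friends' behavior; its own stats are never read."""
    pals = friends.get(account, [])
    if not pals:
        return {"friend_posts_mean": 0.0, "friend_degree_mean": 0.0,
                "friend_requests_mean": 0.0}
    return {
        # How much does a typical friend post?
        "friend_posts_mean": mean(own_stats[f]["posts"] for f in pals),
        # How many friends does a typical friend have?
        "friend_degree_mean": mean(len(friends.get(f, [])) for f in pals),
        # How often does a typical friend send friend requests?
        "friend_requests_mean": mean(own_stats[f]["friend_requests_sent"] for f in pals),
    }


print(deep_entity_features("alice"))
```

Here dave's own 200 friend requests never enter his profile, so a spammer who tweaks their own account's behavior leaves these numbers unchanged; moving them would require changing how the account's friends behave.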

The assumption here is that the typical account will establish relationships with accounts that are also closer to typical. Meanwhile, a spammer will probably have fewer connections with genuine accounts and more with things like bot accounts, which also display unusual patterns of behavior and connections. The deep entity profile captures these differences in aggregate and provides two key advantages: it's much harder for abusive account owners to understand what aspects of a deep entity are being used by an algorithm, and it's much harder for the account owners to change this, even if they could understand.

Deep-entity classification

Building a deep-entity profile is relatively simple, if a bit compute-intensive: it involves crawling the network graph around a given user and aggregating data from all of that user's connections. Where things get more interesting from a computer-science perspective is in how these profiles are used to actually identify problematic accounts.
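The crawl-and-aggregate step can be sketched as a breadth-first search that stops at a fixed number of hops, followed by per-hop averages. The two-hop limit, the graph, and the `posts` field are assumptions for illustration; the article doesn't say how far Facebook's crawl reaches. The number of accounts visited can grow roughly as the typical friend count raised to the number of hops, which is where the compute cost comes from.

```python
from collections import deque


def crawl_neighborhood(graph, start, max_hops=2):
    """Breadth-first crawl from start; returns {account: hop_distance}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def aggregate_by_hop(graph, stats, start, field, max_hops=2):
    """Average one per-account field separately at each hop distance (start excluded)."""
    by_hop = {}
    for account, hop in crawl_neighborhood(graph, start, max_hops).items():
        if hop > 0:
            by_hop.setdefault(hop, []).append(stats[account][field])
    return {hop: sum(vals) / len(vals) for hop, vals in sorted(by_hop.items())}


# Toy graph (hypothetical): "e" sits three hops away from "u".
graph = {
    "u": ["a", "b"],
    "a": ["u", "c"],
    "b": ["u", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b"],
    "e": ["c"],
}
stats = {name: {"posts": n} for name, n in
         [("u", 0), ("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 1000)]}

print(aggregate_by_hop(graph, stats, "u", "posts"))  # {1: 15.0, 2: 35.0}
```

Account "e" falls outside the two-hop limit, so its unusual post count doesn't affect the profile of "u".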

Facebook engineers decided to use a neural network to perform the classification. That requires the network to have training data: deep-entity profiles that are tagged with indications of whether the account is problematic or not. Here, the engineers had two options. Work with other classification algorithms had produced a large volume of relatively uncertain data that flagged different accounts as problematic or not. Meanwhile, human moderators had gone through a much smaller collection of accounts but made much higher-quality calls regarding whether the account was abusive.

The folks at Facebook naturally decided to use both, producing a two-tier system. In the first tier, a multi-layer neural network used the low-quality training data to identify accounts with deep-entity profiles that were typically associated with odd behavior. While this neural network would naturally process the data until it arrived at a binary decision (abusive or not), the researchers actually stopped the analysis at the layer just short of that binary decision.

By this point, the network had processed the original deep-entity information into a limited number of values that it would use to determine whether an account's connections are unusual. These values could be extracted as a 32-number vector that captures the features typically associated with unusual accounts.
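A minimal sketch of that first tier, written with NumPy: a small network trained on noisy labels, whose penultimate layer is read out as a 32-number embedding instead of using the final abusive-or-not output. Everything except the 32-unit width is an assumption: the 100 input features, the 64-unit hidden layer, ReLU activations, plain gradient descent, and synthetic data with 20 percent of labels flipped to mimic low-quality training data.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, HIDDEN, EMBED = 100, 64, 32  # only EMBED = 32 comes from the article


def init_params():
    def layer(n_in, n_out):
        return [rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)]
    return [layer(N_FEATURES, HIDDEN), layer(HIDDEN, EMBED), layer(EMBED, 1)]


def forward(params, x):
    """Return hidden activations, the 32-d embedding, and P(abusive)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = np.maximum(x @ w1 + b1, 0.0)
    emb = np.maximum(h1 @ w2 + b2, 0.0)        # penultimate layer
    logit = emb @ w3 + b3
    prob = 0.5 * (1.0 + np.tanh(0.5 * logit))  # numerically stable sigmoid
    return h1, emb, prob


def bce(prob, y):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    p = np.clip(prob, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def train(x, y, steps=300, lr=0.05):
    params = init_params()
    losses = []
    for _ in range(steps):
        h1, emb, prob = forward(params, x)
        losses.append(bce(prob, y))
        (w1, b1), (w2, b2), (w3, b3) = params
        d_logit = (prob - y) / len(x)  # gradient of mean BCE w.r.t. the logits
        d_emb = (d_logit @ w3.T) * (emb > 0)
        d_h1 = (d_emb @ w2.T) * (h1 > 0)
        grads = [(x.T @ d_h1, d_h1.sum(0)),
                 (h1.T @ d_emb, d_emb.sum(0)),
                 (emb.T @ d_logit, d_logit.sum(0))]
        for layer, (gw, gb) in zip(params, grads):
            layer[0] -= lr * gw
            layer[1] -= lr * gb
    return params, losses


def embed(params, x):
    """Stop one layer short of the binary decision and return the 32-d vectors."""
    return forward(params, x)[1]


# Synthetic deep-entity profiles with noisy (20% flipped) abuse labels.
x = rng.normal(size=(500, N_FEATURES))
clean = (x[:, 0] + x[:, 1] > 0).astype(float)
flip = rng.random(500) < 0.2
y = np.where(flip, 1.0 - clean, clean).reshape(-1, 1)

params, losses = train(x, y)
vectors = embed(params, x)
print(vectors.shape, round(losses[0], 3), round(losses[-1], 3))
```

After training, only `embed` matters in this scheme: the sigmoid output is discarded, and the 32-number vectors are what get handed to the next stage.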

These values were then fed into a second stage of processing that uses a machine-learning approach called a decision tree. This decision tree was trained using human-labeled account data. Critically, the Facebook engineers trained multiple decision trees: one for spammers, one for hijacked accounts, and so on. These decision trees make the final call about whether an account represents a problem and needs to be deactivated.
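That second tier can be sketched with a tiny hand-written classification tree (Gini-impurity splits, depth-limited), trained separately for each abuse type on a small set of human-labeled embeddings. The synthetic 32-number vectors, the labeling rules, the depth limit, and the helper names are all hypothetical; a production system would more plausibly use a library implementation or an ensemble of trees than this toy.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are 0 or 1."""
    majority = int(2 * sum(labels) >= len(labels))
    pure = len(set(labels)) <= 1
    split = None if depth >= max_depth or pure else best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    go_left = [r[f] <= t for r in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [y for y, g in zip(labels, go_left) if g], depth + 1, max_depth)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(labels, go_left) if not g], depth + 1, max_depth)
    return (f, t, left, right)


def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical human-labeled set: 100 embeddings, one 0/1 label per abuse type.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(100)]
human_labels = {
    "spam": [int(r[3] > 0.5) for r in X],
    "hijacked": [int(r[7] < -0.5) for r in X],
}
trees = {kind: build_tree(X, ys) for kind, ys in human_labels.items()}


def flagged_as(row):
    """Every abuse type whose tree votes yes; an empty list means leave the account alone."""
    return [kind for kind, tree in trees.items() if predict(tree, row)]


print(flagged_as([0.0] * 32))
```

Keeping one tree per abuse type means each can be retrained or tuned independently, and the final answer says not just "abusive" but which kind.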

Computer science meets policy

The system has been in production for a while now and has proven rather successful, blocking at least half a billion accounts each quarter, with a high of over 2 billion blocks in the first quarter of last year. Blocked accounts can also be used to continually retrain the system in the background, and the system can evaluate its own metrics to determine when a retrained version has improved enough to productively replace the one in production.
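The automatic swap-in step resembles what is often called champion/challenger evaluation. A minimal sketch, assuming both models are scored on the same held-out, human-labeled accounts; the promotion rule and its thresholds (at least 95 percent precision, at least one point of extra recall) are hypothetical, chosen to reflect that wrongly disabling real users is costly.

```python
def precision_recall(predictions, labels):
    """Precision and recall for 0/1 predictions against 0/1 ground-truth labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def should_promote(prod_preds, candidate_preds, labels,
                   min_precision=0.95, min_recall_gain=0.01):
    """Replace production only if the retrained model stays precise and catches more abuse."""
    _, prod_recall = precision_recall(prod_preds, labels)
    cand_precision, cand_recall = precision_recall(candidate_preds, labels)
    return cand_precision >= min_precision and cand_recall >= prod_recall + min_recall_gain


# Hypothetical held-out accounts: 1 = abusive, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
production = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # precise, but misses half the abuse
retrained = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # equally precise, catches one more
overeager = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # catches everything, but flags two real users

print(should_promote(production, retrained, labels))  # True
print(should_promote(production, overeager, labels))  # False
```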

While the system may be effective, the decision about how to deploy it (and how to integrate it into a larger strategy for acceptable content) is a matter of policy rather than computer science. Human moderators make more accurate calls about whether content is abusive, and a Facebook communications manager told Ars that the company is heavily expanding its use of human moderators. But humans can only act on content that has been reported, while the algorithms can work preventatively. So striking the right balance between investing in the two forms of moderation is going to end up being a judgment call.

The other issue raised by this technology is whether it could be deployed against accounts that spread misinformation about topics such as climate change and health, the latter issue looming larger as the coronavirus spreads unabated. Here, the company has straddled an awkward line, trying to avoid becoming, in the words of its communications manager, "the arbiter of truth," a stance that notably includes a refusal to police the factual content of political ads. Its approach of outsourcing fact-checking has drawn fire for allowing sites with a questionable history regarding facts to serve as fact-checkers.

Facebook's communications manager told Ars that specific health claims that have been debunked by the WHO or CDC can be removed. But there's no indication that groups that repeatedly make such claims will ever see their accounts suspended, even though tools such as the one described here should make identifying them much simpler. Put differently, while Facebook's engineers may have done a masterful job developing a system that can identify problematic accounts, deciding how to apply that technology remains a policy decision.