Stripe enables businesses in many countries worldwide to onboard easily so they can accept payments as quickly as possible. Stripe’s scale makes our platform a common target for payments fraud and cybercrime, so we’ve built a deep understanding of the patterns bad actors use. We take these threats seriously because they harm both our users and our ecosystem; every fraudulent transaction we circumvent keeps anyone impacted from having a bad day.

We provide our risk analysts with automated tools to make informed decisions while sifting legitimate users from potentially fraudulent accounts. One of the most useful tools we’ve developed uses machine learning to identify similar clusters of accounts created by fraudsters trying to scale their operations. Many of these attempts are easy to detect and we can reverse engineer the fingerprints they leave behind to shut them down in real-time. In turn, this allows our analysts to spend more time on sophisticated cases that have the potential to do more harm to our users.

Fraud in the payments ecosystem

Fraud can generally be separated into two large categories: transaction fraud and merchant fraud. Transaction fraud applies to individual charges (such as those protected by Radar), where a fraudster may purchase items with a stolen credit card to resell later.

Merchant fraud occurs when someone signs up for a Stripe account to later defraud cardholders. For example, a fraudster may attempt to use stolen card numbers through their account, so they’ll try to provide a valid website, account activity, and charge activity to appear legitimate. The fraudster hopes to be paid out to their bank account before Stripe finds out. Eventually, the actual cardholders will request a chargeback from their bank for the unauthorized transaction. Stripe will reimburse chargebacks to issuing banks (and by proxy, the cardholder) and attempt to debit the fraudster’s account. However, if they have already been paid out then it may be too late to recover those funds and Stripe ultimately covers those costs as fraud losses.

Fraudsters also may attempt to defraud Stripe at a larger scale by setting up a predatory or scam business. For example, the fraudster will create a Stripe account, claiming to sell expensive apparel or electronics for low prices. Unsuspecting customers think they are getting a great deal, but they never receive the product they ordered. Once again, the fraudster hopes to be paid out before they are shut down or overwhelmed with chargebacks.

Using similarity information to reduce fraud

Fraudsters tend to create Stripe accounts with reused information and attributes. Typically, low-effort fraudsters will not try to hide links to previous accounts, and this activity can be detected immediately at signup. More sophisticated fraudsters will put more work into hiding their tracks in order to prevent any association with prior fraud attempts. Some attributes like name or date of birth are trivial to fabricate, whereas others are more difficult—for example, it requires significant effort to obtain a new bank account.

Linking accounts together via shared attributes is reasonably effective at catching obvious fraud attempts, but we wanted to move from a system based on heuristics to one powered by machine learning models. While heuristics may be effective in certain cases, machine learning models are significantly more effective at learning predictive rules.

Suppose a pair of accounts are assigned a similarity score based on the number of attributes they share. This similarity score could then help predict future behavior: if an account looks similar to a known fraudulent account, there’s a significant likelihood they are more likely to also be fraudulent. The challenge here is to accurately quantify similarity. For example, two accounts who share dates of birth should have a lower similarity score than two accounts who share a bank account.

By training a machine learning model, we remove the need for guesswork and hand-constructed heuristics. Now, we can automatically retrain the model over time as we obtain more data. Automatic retraining enables our models to continually improve in accuracy, adapt to new fraud trends, and learn the signatures of particular adversarial groups.

Choosing a clustering approach

Machine learning tasks are generally classified as either supervised or unsupervised. The goal of supervised learning is to make predictions given an existing dataset of labeled examples (for example, a label that indicates whether an account is fraudulent), whereas in unsupervised learning the usual goal is learn a generative model for the raw data (in other words, to understand the underlying structure of the data). Traditionally, clustering tasks fall into the class of unsupervised learning: unlabeled data needs to be grouped into clusters that capture some understanding of similarity or likeness.

Fortunately, we’re able to use supervised models, which are generally easier to train and may be more accurate. We already have a large body of data demonstrating whether a given account has been created by a fraudster based on the downstream impact (e.g. we observe a significant number of chargebacks and fraud losses). This allows us to confidently label millions of legitimate and illegitimate businesses from our dataset.

In particular, our approach is an example of similarity learning where the objective is to learn a symmetric function based on training data. Over the years, our risk underwriting teams have manually compiled many examples of existing clusters of fraudulent accounts through our investigations of fraud rings, and we can use these reference clusters as training data to learn our similarity function. By sampling edges from these groups, we obtain a dataset consisting of pairs of accounts along with a label for each pair indicating whether or not the two accounts belong to the same cluster. We use intra-cluster edges as positive training examples and inter-cluster edges as negative training examples, where an edge denotes a pair of accounts.

We use known clusters of accounts to train our predictive models.

Now that we have the labels specified, we must decide what features to use for our model. We want to convert pairs of Stripe accounts into useful model inputs that have predictive power. The feature generation process takes two Stripe accounts and produces a number of features that are defined on the pair. Due to the rich nature of Stripe accounts and their associated data, we can construct an extensive set of features for any given pair. Some examples of the features we’d include are categorical features that store the values of common attributes such as the account’s email domain, any overlap in card numbers used on both accounts, and measures of text similarity.

Using gradient-boosted decision trees

Because of the wide variety of features we can construct from given pairs of accounts, we decided to use gradient-boosted decision trees (GBDTs) to represent our similarity model. In practice, we’ve found GBDTs strike the right balance between being easy to train, having strong predictive power and being robust despite variations in the data. When we started this project we wanted to get something out the door quickly that was effective, had well-understood properties, and was straightforward to fine-tune. The variant that we use, XGBoost, is one of the best performing off-the-shelf models for cases with structured (also known as tabular) data, and we have well-developed infrastructure to train and serve them. You can read more about the infrastructure we use to train machine learning models at Stripe in a previous post.

Now that we have a trained model, we can use it to predict fraudulent activity. Since this model operates on pairs of Stripe accounts, it’s not feasible to feed it all possible pairs of accounts and compute scores across all pairs. Instead, we first generate a candidate set of edges to be scored. We do this by taking recently created Stripe accounts and creating edges between accounts that share certain attributes. Although this isn’t an exhaustive approach, this heuristic works well in practice to prune the set of candidate edges to a reasonable number.

Once the candidate edges are scored, we then filter edges by selecting those with a similarity score above some threshold. We then compute the connected components on the resulting graph. The final output is a set of high-fidelity account clusters which we can analyze, process, or manually inspect together as a unit. In particular, a fraud analyst may want to examine clusters which contain known fraudulent accounts and investigate the remaining accounts in that cluster.

This is an iterative process; as each individual cluster grows, we can quickly identify increasing similarity as fake accounts in a fraudster’s operation are created. And the more fraud rings we detect and shutdown at Stripe, the more accurate our clustering model becomes at identifying new clusters in the future.

Each edge is weighted by a similarity score; we identify clusters by finding connected components in the resulting graph.

Benefits of the clustering system

So far, we’ve discussed the overall structure of the account clustering system. Although we have other models and systems in place to catch fraudulent accounts, using clustering information has the following advantages:

We’re even better at catching obvious fraud. It’s difficult for fraudsters to completely separate new accounts from previous accounts they’ve created in the past, or from accounts created by other fraudsters. Whether this is due to reusing basic attribute data or more complex measures of similarity, the account clustering system catches and blocks hundreds of fraudulent accounts weekly with very few false positives.

It’s difficult for fraudsters to completely separate new accounts from previous accounts they’ve created in the past, or from accounts created by other fraudsters. Whether this is due to reusing basic attribute data or more complex measures of similarity, the account clustering system catches and blocks hundreds of fraudulent accounts weekly with very few false positives. Fraudsters can only use their resources once. Whenever someone decides to defraud Stripe, they need to invest in resources such as stolen IDs and bank accounts, each of which incur monetary cost or inconvenience. In effect, by requiring fraudsters to use a new set of resources every time they create a Stripe account, we slow them down and increase the cost of defrauding Stripe. Clustering is a key tool since it invalidates resources such as bank accounts that have been previously used on fraudulent accounts.

Whenever someone decides to defraud Stripe, they need to invest in resources such as stolen IDs and bank accounts, each of which incur monetary cost or inconvenience. In effect, by requiring fraudsters to use a new set of resources every time they create a Stripe account, we slow them down and increase the cost of defrauding Stripe. Clustering is a key tool since it invalidates resources such as bank accounts that have been previously used on fraudulent accounts. Our risk analysts conduct more efficient reviews. When accounts require manual inspection by an analyst, they spend time trying to understand the intentions and motivations of the person behind the account. Analysts focus on the details of the business to sift legitimate users from a set of identified potentially fraudulent accounts. With the help of our clustering technique, analysts can easily identify common patterns and outliers and apply the same judgments to multiple accounts at once with a smaller likelihood of error.

When accounts require manual inspection by an analyst, they spend time trying to understand the intentions and motivations of the person behind the account. Analysts focus on the details of the business to sift legitimate users from a set of identified potentially fraudulent accounts. With the help of our clustering technique, analysts can easily identify common patterns and outliers and apply the same judgments to multiple accounts at once with a smaller likelihood of error. Account clusters are a building block for other systems. Understanding whether two accounts are duplicates or measuring their degree of similarity is a useful primitive that extends beyond the use cases described here. For example, we use the similarity model to expand our training sets for models which have sparse training data.

Catching fraud in action

Stripe faces a multitude of threats from fraudsters who attempt to steal money in creative and complex ways. Identifying similarities between accounts and clustering them together enhances our effectiveness and improves our ability to block fraudulent accounts. Clustering accounts together and identifying duplicate attempts to create fraudulent accounts makes life more difficult for fraudsters. One goal of our models is to change the economic model of fraud by raising the required cost for unused bank accounts, IP addresses, devices and other tools they use. This leads to a negative expected value for fraudsters, weakens the underlying supply chain for stolen credentials and user data, and disincentivizes committing fraud at scale.

We often think about fraud as an adversarial game; uncovering fraudulent clusters allows us to tip the game in our favor. Using common tools like XGBoost enabled us to quickly deploy a solution that naturally fit into our machine learning platform and allows us to easily adapt our approach over time. We’re continuing to explore new techniques to catch fraud to ensure Stripe can reliably operate a low-friction global payment network for millions of businesses.

Like this post? Join the Stripe engineering team. View openings