Since the emergence of digital banking and online shopping, it has never been easier for companies, banks, and customers to trade goods and transfer money. In this new era of mobile eCommerce, peer-to-peer (P2P) transaction platforms (e.g. PayPal, Venmo, Prosper) have emerged as attractive alternatives that bypass traditional intermediaries, which allows them to offer customers very low (sometimes even zero) transaction fees.

P2P transaction networks lie at the core of several cryptocurrency-based marketplaces. In this context, one can trade goods and cryptocurrencies on a platform that has no middleman and offers anonymity to its users. However, just like on any other trading platform, one can fall victim to a fraudster, losing both money and trust in the platform. For this reason, P2P platforms strive to prevent fraudulent transactions by letting users rate each other with perception (or trust) scores after completing a transaction. Although useful, these perception scores are not sufficient when it comes to identifying sophisticated fraudsters who disguise themselves as trustworthy sellers.

Here I discuss a project I developed during my time as an AI Fellow at Insight Data that combines Graph Theory and Deep Learning to predict fraudulent transactions in two P2P Bitcoin marketplaces. Using recently-developed methods for representation learning on graphs, I built a classifier that discriminates between honest sellers and fraudsters in order to estimate the likelihood that a seller will commit fraud in a future transaction.

1. The problem with anonymous P2P transactions

Even though anonymity allows users to keep their sensitive data private, there are serious disadvantages that relate to fraudulent transactions:

Refunds are really hard to enforce. Without a middleman or a transaction intermediary, it becomes almost impossible to trace a fraudster after a money transfer is completed.

A user (green) has fallen victim to a fraudster (orange)

Perception scores are not very informative. It is often difficult to infer the trustworthiness of a seller when the numbers of positive and negative reviews are very similar. In other words, how large should the difference between positive and negative reviews be before we conclude whether a seller is honest?

Raw perception scores might not be very informative

Perception scores can be manipulated. Since buyers can only rely on reviews to assess the trustworthiness of a seller, fraudulent sellers often inflate their own perception scores by using multiple accounts to rate themselves very positively, disguising themselves as honest sellers.

Fraudsters often manipulate their perception scores

Anonymous P2P platforms struggle to retain honest users. As a consequence of fraudulent transactions, honest users find it difficult to stay in the P2P network, which makes it harder for P2P platforms to acquire and retain users.

2. Enhancing perception scores with network theory

Given the problems described above and the fact that users are anonymous, it would seem like we can only count on perception scores to discriminate between good sellers and fraudsters. After all, unlike banks and other financial institutions that develop fraud prevention technologies, we don't have any additional information about our users, such as their credit scores or their financial history.

However, we could leverage the information given by the network itself. That is, the information implicitly contained in the transactions (edges) between users (nodes) in the P2P network. For example, a gang of fraudsters could be characterized by having very positive ratings (weights) among themselves, but very negative connections with users outside of their hub.

The structure of the transaction network contains implicit information that can help us identify fraudsters
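One way to make this intuition concrete is to compare how a suspected group rates itself against how the rest of the network rates it. The sketch below uses a made-up toy edge list and group membership purely for illustration:

```python
# Sketch: compare average ratings inside vs. outside a suspected group.
# The edge list and group membership below are made-up illustrative data.

edges = [
    # (buyer, seller, rating)
    ("a", "b", 10), ("b", "a", 9), ("a", "c", 10), ("c", "b", 8),  # intra-group
    ("d", "a", -8), ("e", "b", -10), ("f", "c", -7),               # outside views
]
suspected_group = {"a", "b", "c"}

# Ratings exchanged inside the group vs. ratings crossing the group boundary
intra = [r for u, v, r in edges if u in suspected_group and v in suspected_group]
inter = [r for u, v, r in edges if (u in suspected_group) != (v in suspected_group)]

intra_avg = sum(intra) / len(intra)
inter_avg = sum(inter) / len(inter)
print(f"intra-group avg rating: {intra_avg:.2f}, from outside: {inter_avg:.2f}")
```

A large gap between the two averages (very positive inside, very negative outside) is exactly the signature a fraudster hub would leave in the graph.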

In the following sections, I will describe the data and the methods I used to leverage both perception scores and graph representations to predict fraudulent transactions in two Bitcoin P2P marketplaces.

3. The Bitcoin P2P Marketplace data

I downloaded graph data of two Bitcoin P2P marketplaces (namely, Bitcoin OTC and Bitcoin Alpha) from the Stanford Large Network Dataset Collection. As shown below, the graph data are contained in tables listing the ratings or perception scores that buyers assigned to sellers after each transaction. The ratings lie within the range [-10,10] and only ~11% of the transactions are rated negatively. In later sections I will describe how I addressed the class imbalance in the ratings.

A sample of transaction data contained in the Bitcoin Marketplace graphs. SOURCE refers to the buyer’s user ID, TARGET is the seller’s user ID, and the RATING is the perception score assigned by SOURCE to the TARGET

Concatenating the Bitcoin OTC and Bitcoin Alpha tables results in a graph with 59,788 transactions (edges) and 9,664 users (nodes). A portion (subgraph) of the complete network is shown below.

A subgraph of the Bitcoin Marketplace network with red nodes representing users and black arrows representing transactions. Arrows point towards the seller

To simplify the analysis, I reduced the transaction ratings to two classes: honest (+1) for positive ratings and fraudulent (-1) for negative ones.
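This binarization can be sketched in a couple of lines (since no transaction is ever rated 0 in these datasets, the mapping is unambiguous):

```python
# Sketch: collapse raw ratings in [-10, 10] into two classes.
# Ratings of exactly 0 never occur in these datasets, so any non-positive
# rating is treated as fraudulent.

def binarize(rating):
    """Map a rating to +1 (honest) or -1 (fraudulent)."""
    return 1 if rating > 0 else -1

ratings = [10, 2, -5, 7, -10, 1]
labels = [binarize(r) for r in ratings]
print(labels)  # [1, 1, -1, 1, -1, 1]
```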

The ratings distribution is imbalanced with 89% of the transactions rated positively and the rest rated negatively. Note that not a single transaction is rated with a score of 0

The task is to build a classifier that will identify which sellers are more likely to commit fraud given a particular buyer, considering both the neighborhood of the users and their overall perception scores.

4. Learning user representations with Node2Vec

In order to extract user features from their locations in the transaction network, I used a Python implementation of the Node2Vec algorithm. Briefly, Node2Vec generates low-dimensional representations for each node in a graph by simulating biased random walks and optimizing a neighborhood-preserving objective. In a way, these node representations capture information about a node's function, its community, and its proximity to other nodes.

The Node2Vec algorithm generates vector representations for each node in a graph
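To give a feel for what "biased random walks" means, here is a minimal sketch of the second-order walk at the heart of Node2Vec (this is not the implementation I used; the parameters p and q follow the Node2Vec paper, and the toy graph is made up):

```python
import random

# Minimal sketch of Node2Vec's second-order biased walk.
# p controls the chance of returning to the previous node; q biases the walk
# towards (q < 1) or away from (q > 1) nodes far from the previous node.

def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = adj[cur]
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # explore away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy undirected graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(biased_walk(adj, start=0, length=5, p=2.0, q=0.5))
```

The walks collected this way are then fed to a word2vec-style objective, which is what produces the final node vectors.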

First, I trimmed the original network, keeping 80% of the total transactions for training the classifier (see section 5). Then, I generated Node2Vec representations of size 14 and used t-SNE to project them into two dimensions. As shown below, the 2D projections clearly separate the nodes into two regions, as would be expected given that the users come from two different marketplaces (Bitcoin OTC and Bitcoin Alpha).

Two dimensional projections of node representations learned with Node2Vec

To build richer node representations, I concatenated 6 additional features to each Node2Vec vector. These additional features are described in the image below.

Additional features that were concatenated with the 14-dimensional Node2Vec vectors. The meaning of each column is the following: in_degree = total incoming degree (ratings received), pos_in_edges = total positive ratings received, neg_in_edges = total negative ratings received. Similarly, out_degree, pos_out_edges, and neg_out_edges represent ratings given by the user to others.

Thus, for each node I constructed a 20-dimensional vector containing both Node2Vec features and perception scores.
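The 6 per-node rating features described above can be derived directly from the (buyer, seller, rating) edge list; a sketch with a toy edge list:

```python
from collections import defaultdict

# Sketch: derive the 6 per-node rating features from a (buyer, seller, rating)
# edge list. Feature names mirror the table above; the edges are toy data.

def node_features(edges):
    feats = defaultdict(lambda: {"in_degree": 0, "pos_in_edges": 0, "neg_in_edges": 0,
                                 "out_degree": 0, "pos_out_edges": 0, "neg_out_edges": 0})
    for buyer, seller, rating in edges:
        feats[seller]["in_degree"] += 1                                   # rating received
        feats[seller]["pos_in_edges" if rating > 0 else "neg_in_edges"] += 1
        feats[buyer]["out_degree"] += 1                                   # rating given
        feats[buyer]["pos_out_edges" if rating > 0 else "neg_out_edges"] += 1
    return dict(feats)

edges = [(1, 2, 8), (3, 2, -4), (2, 1, 10)]
feats = node_features(edges)
print(feats[2])  # node 2 received one positive and one negative rating,
                 # and gave one positive rating
```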

5. Training and validating a Neural Net for binary classification

For each transaction in the network, I concatenated the buyer's and the seller's 20-dimensional vectors into a single 40-dimensional vector representing the buyer-seller pair, and I fed these vectors into a Neural Net (NN) with the following layers:

Two hidden layers of size 128 (ReLU activation)

One hidden layer of size 64 (ReLU activation)

One hidden layer of size 32 (ReLU activation)

One hidden layer of size 16 (ReLU activation)

One output layer of size 1 (sigmoid activation)

For the purpose of training this NN with supervised learning, I labeled the fraudulent transaction class as 1 and the honest transaction class as 0. Because of the class imbalance (89% of the training examples are honest transactions, or 0's), I used bootstrapping to train the NN on a different balanced subset of training examples every 10 epochs, for a total of 150 epochs. I called the resulting model TrustKeeper. For comparison, I also trained a second NN with the same architecture but using only the perception scores as input. I tested the predictions of both TrustKeeper and this second NN on the held-out 20% of network transactions. The resulting confusion matrices are shown below.

Confusion matrix from a Neural Network model without Node2Vec embeddings

Confusion matrix from TrustKeeper

ROC curves for TrustKeeper and NN model without Node2Vec features
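The balanced-resampling scheme used during training can be sketched as follows (a simplified version that subsamples the majority class down to the minority class size; labels and class proportions here mimic the ~89/11 split in the data):

```python
import random

# Sketch: draw a balanced subset of example indices from an imbalanced
# label list, as done every 10 epochs during training.
# Labels: 1 = fraudulent transaction, 0 = honest transaction.

def balanced_subset(labels, rng=random):
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    honest_idx = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(fraud_idx), len(honest_idx))
    # Subsample the majority class down to the minority class size
    # (sampling with replacement would make this a true bootstrap).
    sampled_honest = rng.sample(honest_idx, n)
    subset = fraud_idx + sampled_honest
    rng.shuffle(subset)
    return subset

labels = [0] * 89 + [1] * 11           # mimic the ~89/11 class split
subset = balanced_subset(labels, random.Random(0))
print(len(subset))  # 22
```

Redrawing a fresh balanced subset every few epochs lets the model see many different honest-transaction examples over the course of training while never letting the majority class dominate a single training phase.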

6. Conclusion

For P2P transaction networks, even a small improvement in fraud prediction rates can make the difference between staying in business and filing for bankruptcy. Many P2P services struggle to retain existing users and attract new ones due to the inherent properties of the platform. In this project, I explored how one can leverage network properties to enhance fraud prediction models in the absence of user metadata in anonymous Bitcoin transaction marketplaces.

TrustKeeper predicts fraudulent transactions with higher accuracy than simpler models that do not leverage Node2Vec node embeddings. Thus, an immediate application of TrustKeeper would be preventing fraudulent transactions and recommending the most trustworthy sellers. Furthermore, by leveraging TrustKeeper, one could identify potential fraudster hubs and ban them from the P2P network. As a result, P2P platforms could increase user satisfaction, which would directly translate into higher retention and higher conversion rates for new users.

A full implementation of my TrustKeeper model, as well as a detailed summary of the model performance results can be found at https://github.com/Jhird/TrustKeeper.

Thanks for reading!