Finance, Machine Learning and AI

Enhancing Traditional Credit Scoring With Social Data

The Limitations of Traditional Credit Scoring

The success of any lending institution depends heavily on its ability to identify and approve as many new loans as possible, while keeping risks at a minimum. A key ingredient in this process is credit scoring. In theory, analysts at banks review applicant scores and then decide loan amounts, interest rates, loan periods and so forth. In practice, however, it’s rarely that simple. There are several gaping holes in traditional scoring systems.



One example: traditional credit scores fail to account for people who don't have credit cards. About a third of Millennials have never applied for a card, which often leaves them without a score. And as Millennials' buying power increases (they're already touted to become the generation with the most buying power by 2018), lending firms need to adjust their credit scoring processes to keep up.



It isn't just thin-file and no-file Millennials. Someone who earns a significant amount of money and spends it frequently often has none of that activity show up on his or her credit history, and neither does someone who earns a lot and spends little. Moreover, when scoring is done right, a creditworthiness report needn't serve merely as a tool for deciding whether to lend to a given individual. It can support a wide variety of applications, including estimating the equilibrium point between default rates and loan uptake.
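One way to read "equilibrium between default and loan uptake" is as the score cut-off where expected profit peaks: lowering the cut-off approves more loans but also lets in more defaults. The sketch below assumes hypothetical per-loan profit and loss figures and a made-up set of scored applicants; it is an illustration of the trade-off, not our production method.

```python
# Hypothetical toy portfolio: (score, defaulted) pairs. Illustrative only, not real data.
applicants = [
    (720, False), (690, False), (650, True), (700, False), (610, True),
    (680, False), (640, False), (600, True), (730, False), (660, True),
]

# Assumed economics per approved loan (hypothetical figures).
PROFIT_IF_REPAID = 100  # interest earned on a loan that is repaid
LOSS_IF_DEFAULT = 500   # principal lost on a loan that defaults


def evaluate_cutoff(cutoff):
    """Return (approval_rate, default_rate_among_approved, expected_profit)."""
    approved = [defaulted for score, defaulted in applicants if score >= cutoff]
    if not approved:
        return 0.0, 0.0, 0
    defaults = sum(approved)  # True counts as 1
    repaid = len(approved) - defaults
    profit = repaid * PROFIT_IF_REPAID - defaults * LOSS_IF_DEFAULT
    return len(approved) / len(applicants), defaults / len(approved), profit


# Sweep every observed score as a candidate cut-off and keep the most profitable one.
cutoffs = sorted({score for score, _ in applicants})
for c in cutoffs:
    rate, default_rate, profit = evaluate_cutoff(c)
    print(f"cutoff={c} approval={rate:.0%} default={default_rate:.0%} profit={profit}")

best = max(cutoffs, key=lambda c: evaluate_cutoff(c)[2])
print("best cutoff:", best)
```

Plotting approval rate against default rate across the sweep gives the trade-off curve; the profit figures simply pick one point on it.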



Credit Scoring Isn’t One-Size-Fits-All

Loan applicants vary in age, socioeconomic background and many other characteristics. There is also still a lot of confusion surrounding FICO, VantageScore, TransUnion and other credit scores; more often than not, applicants don't even know whether they're eligible for loans. Given these issues, and the fact that many Millennials are shying away from the financial products that feed traditional credit scoring (credit cards, mortgages, etc.), financial institutions need to move away from cookie-cutter credit scoring models and adopt a more personalized approach. We're going to explain how.



We've worked with financial institutions and products to personalize the credit scoring process by combining customers' overall social footprint with their credit scores to determine creditworthiness. We've also worked with established credit scoring providers like CoreLogic to build an additional layer of scoring intelligence, improving delinquency predictions by 5-6 percentage points and significantly increasing revenue. We'll draw on these lessons to explain how existing credit scoring models can be improved.



Enter “Enhanced Credit Score”

This method of scoring is not limited to the conventional credit score, which is a one-size-fits-all score for all consumers. Instead, it optimizes the score for each purpose and enriches the data with external data elements to enable an accurate prediction for that specific purpose. For example, a flamboyant lifestyle that includes risky behavior correlates with greater risk than a quieter lifestyle at similar income levels. We can create behavioral credit scores from two kinds of data: public (open) data and social data (from social media).
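The enrichment step can be pictured as joining each applicant's bureau record with external signals into one fixed-width feature row. The sketch below is a minimal illustration under assumed data: the source dictionaries, the applicant IDs and field names such as `owns_home` and `risky_activity_posts` are all hypothetical. Missing sources become indicator flags rather than grounds for dropping the applicant, which is what lets thin- and no-file customers be scored at all.

```python
# Hypothetical data sources keyed by applicant ID; all field names and values are
# illustrative assumptions, not a real schema.
bureau = {"a1": 710, "a2": None, "a3": 640}  # None = no bureau file
public = {"a1": {"owns_home": 1}, "a3": {"owns_home": 0}}
social = {"a2": {"risky_activity_posts": 3}, "a3": {"risky_activity_posts": 0}}


def enrich(applicant_id):
    """Merge bureau, public and social signals into one fixed-width feature row.

    Missing sources become explicit indicator flags instead of causing the
    applicant to be dropped, so thin- or no-file customers can still be scored.
    """
    score = bureau.get(applicant_id)
    pub = public.get(applicant_id, {})
    soc = social.get(applicant_id)
    return {
        "bureau_score": score if score is not None else 0,  # placeholder; see flag below
        "no_bureau_file": int(score is None),
        "owns_home": pub.get("owns_home", 0),
        "no_public_data": int(applicant_id not in public),
        "risky_activity_posts": soc["risky_activity_posts"] if soc else 0,
        "no_social_data": int(soc is None),
    }


for applicant_id in ("a1", "a2", "a3"):
    print(applicant_id, enrich(applicant_id))
```

A purpose-specific model (for example, one for credit limits and another for collections) would then be trained on rows like these, each against its own target.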



Public Data:

There are a number of public data sources that can be used to track a customer's spending patterns. In many countries, purchases such as a house or a car are a matter of public record by government mandate. In the USA, the federal government publishes a massive amount of open data through the Data.gov and USAspending.gov websites. These archives cover consumer spending activity, school enrolment, the American Housing Survey and various other data sets from which meaningful insights about spending capability can be extracted.



There have also been one-off releases of important credit-related data. For example, as part of a larger effort to improve transparency, Freddie Mac made its Single-Family Loan-Level Dataset available, and Lending Club, a leading P2P lending firm, released all of its loan data on Kaggle.
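A first pass over a loan-level file like these is usually to measure historical default rates by segment. The sketch below uses made-up rows in an in-memory CSV; the column names `grade` and `loan_status` and the status values are assumptions about the file layout, so check the data dictionary of whichever dataset you actually download.

```python
import csv
import io
from collections import defaultdict

# Made-up rows shaped like a loan-level export. Column names and status values
# are assumptions; consult the real dataset's data dictionary.
SAMPLE_CSV = """grade,loan_status
A,Fully Paid
A,Fully Paid
A,Charged Off
A,Fully Paid
B,Fully Paid
B,Charged Off
B,Charged Off
B,Current
"""


def default_rate_by_grade(f):
    """Share of resolved loans in each grade that ended in default."""
    counts = defaultdict(lambda: [0, 0])  # grade -> [defaults, resolved loans]
    for row in csv.DictReader(f):
        status = row["loan_status"]
        if status not in ("Fully Paid", "Charged Off"):
            continue  # skip loans whose outcome is not yet known
        counts[row["grade"]][1] += 1
        if status == "Charged Off":
            counts[row["grade"]][0] += 1
    return {grade: d / n for grade, (d, n) in sorted(counts.items())}


rates = default_rate_by_grade(io.StringIO(SAMPLE_CSV))
print(rates)
```

Excluding still-open loans matters: counting them as non-defaults would make recent, riskier vintages look safer than they are.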



Social Data:

Crawling social media for lifestyle choices and studying their impact on actual defaults can be one of the best ways to expand the lending base while decreasing default rates. Social media presents a diverse range of behavioral and spending signals that can be used to build a well-rounded creditworthiness profile.



An increasing number of startups are relying on social data to assess consumer creditworthiness. Social network-based credit scoring and financing practices broaden opportunities for a larger portion of the population and may benefit low-income consumers who would otherwise find it hard to obtain credit.



A Combined Behavioral Credit Score:

By combining social and public data and using machine learning algorithms for prediction, lenders can build a powerful behavioral credit profile. Behavioral credit scores of existing customers can be used to detect high-risk accounts early and enable targeted interventions, for example by proactively offering debt restructuring. Behavioral credit scores also form the basis for more accurate calculations of total consumer credit risk exposure, which can reduce bad debt provisions.
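A common way to turn per-account risk into a portfolio-level exposure figure is expected loss = PD × LGD × EAD (probability of default, loss given default, exposure at default). This formulation comes from standard credit-risk practice rather than from anything specific to our models. The sketch below assumes the PD comes from a behavioral score; the portfolio values and the early-warning threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    pd: float   # probability of default over the horizon (e.g. from a behavioral score)
    lgd: float  # loss given default, as a fraction of the exposure
    ead: float  # exposure at default, in currency units


def expected_loss(accounts):
    """Portfolio expected loss: sum of PD x LGD x EAD over all accounts."""
    return sum(a.pd * a.lgd * a.ead for a in accounts)


def flag_for_intervention(accounts, pd_threshold):
    """IDs of accounts whose PD meets an (assumed) early-warning threshold."""
    return [a.account_id for a in accounts if a.pd >= pd_threshold]


# Hypothetical portfolio; values are made up for illustration.
portfolio = [
    Account("c1", pd=0.02, lgd=0.45, ead=10_000),
    Account("c2", pd=0.10, lgd=0.45, ead=5_000),
    Account("c3", pd=0.30, lgd=0.60, ead=2_000),
]

print(f"expected loss: {expected_loss(portfolio):.2f}")
print("flag for outreach:", flag_for_intervention(portfolio, pd_threshold=0.25))
```

Sharper PD estimates feed straight into the expected-loss total, which is why better behavioral scores can justify a smaller provision.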



Similarly, enhanced scoring techniques can estimate the probability of recovering a loan that has been defaulted on. This helps lenders optimize their collection effort, focusing on maximizing recovery.
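Given a recovery probability per defaulted account, one simple way to focus collection effort is to rank accounts by expected net recovery (probability × balance, minus the cost of pursuing the account) and skip those where the expected value is negative. In the sketch below, the account data and the flat collection cost are hypothetical; in practice the probabilities would come from a recovery model.

```python
# Hypothetical defaulted accounts: (account_id, estimated recovery probability,
# outstanding balance). In practice the probability would come from a recovery model.
defaulted = [
    ("d1", 0.60, 1_000),
    ("d2", 0.20, 8_000),
    ("d3", 0.90, 300),
    ("d4", 0.05, 2_000),
]
COLLECTION_COST = 150  # assumed flat cost of working one account


def prioritize(accounts, cost):
    """Rank accounts by expected net recovery, dropping those not worth pursuing."""
    net = [(acc_id, p * balance - cost) for acc_id, p, balance in accounts]
    worth_pursuing = [(acc_id, value) for acc_id, value in net if value > 0]
    return sorted(worth_pursuing, key=lambda item: item[1], reverse=True)


for acc_id, value in prioritize(defaulted, COLLECTION_COST):
    print(f"{acc_id}: expected net recovery {value:.0f}")
```

Note that the account most likely to repay (d3) is not first in the queue: a lower probability on a much larger balance can be worth more.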



Enhanced Credit Scoring Models

So far, we've looked at where the data for enhanced credit scoring can come from. Here's a quick look at what powers our enhanced credit scoring models.



There is a variety of model types, such as scorecards, decision trees and neural networks. When you evaluate which model type is best suited to your goals, you may want to consider criteria such as how easy the model is to apply, to understand and to justify. At the same time, whatever its type, each model's predictive performance must be assessed: the accuracy of the scores it assigns to applications and the consequences of the accept/reject decisions it suggests. The best model will, therefore, be determined both by the purpose for which it will be used and by the structure of the data set it is validated on.
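The two halves of "predictive performance" can be measured separately. The sketch below uses AUC (a common ranking metric we've chosen for illustration) for score accuracy, and an accept/reject tally at a cut-off for decision consequences. It assumes higher scores mean lower risk; the scores and outcomes are made up.

```python
def auc(scores, defaulted):
    """Probability that a random non-defaulter outscores a random defaulter.

    Ties count as half a win. Assumes higher scores mean lower risk.
    """
    good = [s for s, d in zip(scores, defaulted) if not d]
    bad = [s for s, d in zip(scores, defaulted) if d]
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


def confusion_at_cutoff(scores, defaulted, cutoff):
    """Tally accept/reject outcomes when applications scoring >= cutoff are accepted."""
    counts = {"accepted_good": 0, "accepted_bad": 0, "rejected_good": 0, "rejected_bad": 0}
    for s, d in zip(scores, defaulted):
        decision = "accepted" if s >= cutoff else "rejected"
        outcome = "bad" if d else "good"
        counts[f"{decision}_{outcome}"] += 1
    return counts


# Made-up model scores and observed outcomes (True = defaulted).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
defaulted = [False, False, True, False, True, True]

print("AUC:", round(auc(scores, defaulted), 3))
print(confusion_at_cutoff(scores, defaulted, cutoff=0.6))
```

Two models with the same AUC can still produce very different accept/reject tallies at the cut-off a lender actually uses, which is why both views are worth checking.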



Scorecards:

The traditional form of a credit-scoring model is a scorecard. This is a table containing a number of questions that an applicant is asked (called characteristics) and, for each question, a list of possible answers (called attributes). One such characteristic may, for example, be the applicant's age, and its attributes are then a number of age ranges the applicant can fall into. For each answer, the applicant receives a certain number of points: more if the attribute indicates low risk, fewer if it indicates high risk. If the application's total score exceeds a specified cut-off, it is recommended for acceptance. This is less a model than a heuristic. It does not learn, and it does poorly when the goal is to maximize the amount lent without default, because it merely tries to avoid default.
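A scorecard of this kind fits in a few lines. In the sketch below, the characteristics, age and residence ranges, point values and cut-off are all hypothetical; real scorecards derive their points from historical data. It follows the text's rule that the total must exceed the cut-off.

```python
# Hypothetical scorecard: characteristic -> list of (attribute range [lo, hi), points).
# Lower-risk attributes earn more points. All values are illustrative assumptions.
SCORECARD = {
    "age": [((18, 25), 10), ((25, 40), 25), ((40, 120), 35)],
    "years_at_address": [((0, 2), 5), ((2, 5), 15), ((5, 100), 25)],
}
CUTOFF = 50  # assumed cut-off; a total must exceed it to be recommended


def score(applicant):
    """Sum the points of the matching attribute for each characteristic."""
    total = 0
    for characteristic, attributes in SCORECARD.items():
        value = applicant[characteristic]
        for (lo, hi), points in attributes:
            if lo <= value < hi:
                total += points
                break  # each characteristic contributes at most one attribute
    return total


def recommend(applicant):
    return "accept" if score(applicant) > CUTOFF else "reject"


print(recommend({"age": 30, "years_at_address": 3}))
print(recommend({"age": 45, "years_at_address": 6}))
```

Nothing in this table changes as repayments or defaults come in, which is the sense in which a scorecard "fails to learn" until someone rebuilds it.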



Regression and Decision Trees:

Regression and decision trees are two other conventional techniques adopted by lenders. However, these models share a fundamental problem: they reduce the richness of the information the organization can collect on applicants and thereby erode the basis for future modeling. Decision trees also show that a decision rule can be too easy to understand: once applicants learn where the splits lie, they can tailor their answers to land on the right side of them, which invites fraud.
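A one-level decision tree (a "stump") shows both the sharp split and why it is easy to game. The sketch below picks the income threshold that misclassifies the fewest applicants in a made-up data set. Once the threshold is known, a reported income just above it flips the decision.

```python
# Hypothetical toy data: (monthly_income, defaulted). Made up for illustration.
rows = [
    (1800, True), (2100, True), (2500, False), (2600, True),
    (3000, False), (3400, False), (4000, False),
]


def best_split(data):
    """Choose the income threshold that minimizes misclassifications, predicting
    'repays' at or above the threshold and 'defaults' below it."""
    values = sorted({income for income, _ in data})
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]

    def errors(threshold):
        # An error is predicting 'repays' for a defaulter or 'defaults' for a repayer.
        return sum((income >= threshold) == defaulted for income, defaulted in data)

    return min(candidates, key=errors)  # ties go to the first (lowest) candidate


threshold = best_split(rows)


def decide(income):
    return "accept" if income >= threshold else "reject"


print("split at", threshold)
print(decide(threshold - 1), decide(threshold + 1))
```

Two applicants whose incomes differ by a couple of units receive opposite decisions, and the rule is simple enough to state in one sentence, which is exactly what makes it easy to exploit.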



A way forward is to use a more generic model, such as a neural network, although neural networks need many more features to work well. Neural networks are extremely flexible models that combine characteristics in a variety of ways. Their predictive accuracy can, therefore, be far superior to that of scorecards, and they don't suffer from sharp 'splits' as decision trees do. However, it is virtually impossible to explain or understand the score produced for a particular application in any simple way, so it can be difficult to justify a decision made on the basis of a neural network model. A neural network with superior predictive power is therefore best suited to certain behavioral or collection scoring purposes, where the average accuracy of the prediction matters more than insight into the score for each particular case.
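To see what "combining characteristics" buys, consider an XOR-style pattern where risk depends on the interaction of two binary features. No additive scorecard (a weighted sum with a cut-off) can separate this pattern, but a small neural network can learn it. The sketch below is a from-scratch NumPy illustration with one tanh hidden layer trained by full-batch gradient descent on log loss. The data, network size, learning rate and epoch count are arbitrary choices for illustration, not a production setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two binary characteristics; risk is high when exactly one is set.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


hidden = 8
W1 = rng.normal(scale=1.0, size=(2, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=1.0, size=(hidden, 1))
b2 = np.zeros(1)


def forward(inputs):
    """Return hidden activations and predicted default probabilities."""
    h = np.tanh(inputs @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)


def log_loss(p, target):
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps)))


lr = 0.5
initial_loss = log_loss(forward(X)[1], y)
for _ in range(5000):
    h, p = forward(X)
    dz2 = (p - y) / len(X)            # gradient of mean log loss w.r.t. output logits
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = (dz2 @ W2.T) * (1 - h ** 2)  # backpropagate through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

final_loss = log_loss(forward(X)[1], y)
print(f"log loss: {initial_loss:.3f} -> {final_loss:.3f}")
print("predicted risk:", np.round(forward(X)[1].ravel(), 2))
```

The trained weights give good predictions but no simple per-applicant explanation, which is the interpretability trade-off described above.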