First, let’s see how the tool works

Here is a brief overview (partially based on IBM’s Personality Insights GitHub page and other public documentation):

Input: Personality Insights takes tweets, emails, text messages, blog posts, or anything else written by the individual whose personality is being assessed. The tool currently supports English, Spanish, Japanese, Korean, and Arabic, although according to the website, the results for Arabic and Korean are not good enough to be conclusive. You can get a result from as few as 100 words, but for the best accuracy you need around 3,000 words of input text. (The demo and IBM’s documentation go into more detail about the acceptable input formats.)

Output: After processing the input data, the tool returns the full result (in JSON or CSV format), which scores 52 personality characteristics numerically and also reports your consumption behavior. Each score is a percentile relative to the sample population. For example, if my “adventurous” score is 0.25, it means that, based on my writing, I’m more adventurous than 25% of the sample population and less adventurous than the other 75%.

Note: The sample population consists of Twitter users whose information was collected and analyzed by IBM’s Personality Insights. The sample sizes are one million users for English, 200,000 for Korean, 100,000 each for Arabic and Japanese, and 80,000 for Spanish. The demographics of the sample population (age, gender, literacy level, etc.) were not revealed.

The tool also supplies the raw scores if you want to do a custom normalization based on your own sample population (e.g. your score compared to the employees of the company you work for). More about output format and its interpretation can be found here and here.
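The custom normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `percentile_of` and the colleague scores are hypothetical, and the "fraction of the reference group scoring strictly below you" definition is one common way to compute a percentile, not necessarily the one IBM uses.

```python
from bisect import bisect_left


def percentile_of(raw_score, population_scores):
    """Return the fraction of the reference population scoring strictly below raw_score."""
    if not population_scores:
        raise ValueError("population_scores must not be empty")
    ordered = sorted(population_scores)
    # bisect_left counts how many sorted values are strictly smaller than raw_score.
    return bisect_left(ordered, raw_score) / len(ordered)


# Hypothetical raw "adventurous" scores for eight colleagues (made-up numbers).
colleagues = [0.41, 0.48, 0.52, 0.55, 0.58, 0.61, 0.66, 0.70]

my_percentile = percentile_of(0.50, colleagues)
print(f"More adventurous than {my_percentile:.0%} of the reference group")
```

With these made-up numbers, two of the eight colleagues score below 0.50, so the result is 0.25, which reads the same way as the percentile example earlier: more adventurous than 25% of the reference group.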

Model: The underlying method is based on the Open-Vocabulary approach. This method was developed by researchers at the University of Pennsylvania who analyzed the Facebook statuses of 75,000 volunteer users. On the basis of this analysis and accompanying personality questionnaires, they built models to predict an individual’s age, gender, and personality.

The infrastructure of the Open Vocabulary language analysis (source)

Earlier versions of Personality Insights, however, used the Linguistic Inquiry and Word Count (LIWC) psycholinguistic dictionary. (You can read more about the LIWC dictionary here.)

To build the Personality Insights tool, IBM researchers also conducted a set of background studies and developed different machine learning models to understand the relationship between people’s Twitter activity and their personality characteristics. For example, by studying 3,500 Twitter users, they found that people who retweet more are more likely to be rated as modest, open, and friendly. To read more about the background studies, check out this link.

In a nutshell, Personality Insights uses the open-source GloVe word embedding technique to build a vector representation of each word in the input text. It then feeds these vectors into a machine learning algorithm for training and testing. (IBM gives no further details about this algorithm. However, in a study entitled 25 Tweets to Know You: A New Model to Predict Personality with Social Media, IBM researchers combined GloVe word embedding features with Gaussian Processes regression to infer personality characteristics.)
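To make the idea of word vectors as model input more concrete, here is a minimal sketch of turning a text into a single feature vector. Everything in it is an assumption for illustration: the two-dimensional `TOY_EMBEDDINGS` table stands in for real GloVe vectors (which have many more dimensions), and averaging the word vectors (mean pooling) is just one simple way to combine them. The documentation does not say how Personality Insights actually does this.

```python
def embed_text(text, embeddings):
    """Average the vectors of the known words in text (mean pooling)."""
    # Words missing from the embedding table are simply skipped.
    vectors = [embeddings[word] for word in text.lower().split() if word in embeddings]
    if not vectors:
        raise ValueError("text contains no words from the embedding table")
    dimensions = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dimensions)]


# Hypothetical 2-dimensional embeddings, not real GloVe values.
TOY_EMBEDDINGS = {
    "love": [1.0, 0.0],
    "travel": [0.0, 1.0],
    "quiet": [-1.0, 0.0],
}

features = embed_text("I love travel", TOY_EMBEDDINGS)
print(features)
```

In a real pipeline, a feature vector like this would be the input to a regression model (Gaussian Processes regression in the study cited above) that predicts a score for each personality characteristic.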

Training: The model is trained on surveys conducted among thousands of users, along with data from their Twitter feeds. No further details are available about the demographics (age, gender, language, literacy level) of the surveyed population, but previous IBM studies mostly used Twitter data and surveys from English-speaking users to train and test their models.

Evaluation Metrics: To understand the accuracy of Personality Insights, IBM conducted a validation study by collecting survey responses and Twitter feeds of 1,500 to 2,000 participants for all languages. They then compared the survey scores with the scores derived from Personality Insights and measured the average Mean Absolute Error (MAE) and the average correlation between the two for different categories of personality characteristics. (MAE ranges from 0 to 1, where 0 means the predicted score is exactly the same as the actual (survey) score and 1 means maximum error. Correlation is on a scale of -1 to 1. The best average correlation is 0.35, which is not high. However, according to the IBM website, correlations greater than 0.2 are considered acceptable in the research literature for this domain.)
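For readers who want to see what these two metrics measure, here is a small sketch that computes both from scratch. The survey and predicted scores are made-up numbers for four imaginary participants, and the correlation shown is the standard Pearson correlation, which I am assuming is the kind IBM reports.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted scores."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up percentile scores for four imaginary participants.
survey_scores = [0.2, 0.4, 0.6, 0.8]      # from the questionnaire
predicted_scores = [0.3, 0.3, 0.7, 0.7]   # from the text-based model

mae = mean_absolute_error(survey_scores, predicted_scores)
correlation = pearson_correlation(survey_scores, predicted_scores)
print(f"MAE: {mae:.2f}, correlation: {correlation:.2f}")
```

Note that the two metrics answer different questions: MAE tells you how far off the predictions are on average, while correlation tells you whether people who score higher on the survey also tend to score higher in the predictions.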

Average Mean Absolute Error and Average Correlation by language for the IBM Personality Insights (source)

A few important points about the model: