My next immediate thought was “How can I be sure that these numbers are even remotely correct?”. The difficulty in comparing the results of Watson’s profile with the results of more conventional personality tests is that Watson gives a much finer-grained analysis of one’s personality. Conventional personality tests don’t even kid themselves into believing that they can provide such precise detail about one’s personality, so it is kinda impossible to get unified metrics on all these stats. However, in pursuit of thoroughness, I took a couple of other personality tests online in addition to asking some of my closest friends to rate me on a scale from 1–10 on some of the categories that IBM Watson scored me on [data link]. These extra measures are meant to provide a more objective way to judge Watson’s assessment of my personality. I will add the results as they come trickling in.

It is also probably worth noting that there is a distinct difference between the person that I think I am and the person that others perceive me to be, especially because the corpus of text that was used to build my personality profile was taken from my private journal entries. It would be an interesting study to compare how Watson scores my personality based on my journal entries versus a corpus of my dialogue, if a sufficiently large corpus of text could be obtained.

I took the Humanmetrics Jung Typology Test (pictured left) [results link]. It was a 72-question test that I doubted strongly until its results were so strongly corroborated by another personality test based on Myers-Briggs offered by 16 Personalities (results pictured below). The Myers-Briggs personality test is somewhat of an industry standard in measuring personality, although the test describes itself as an indicator of how people perceive and interact with the world and not necessarily of their personality (as it is classically perceived).

Results from the personality test offered by 16personalities.com.

It is difficult to map the sparse results from these sub-100-question tests onto the 52 traits that Watson rated me on, but there are a couple of points that can be compared rather directly. All tests seem to agree that I am not quite an introvert or an extravert. Watson scored me at 46% on the Extraversion scale, which aligns very closely with the results from the pseudo-Myers-Briggs tests.

Both MB tests agree on their designation of me as an ENTJ, which is a rather stringent leader type. Some of the ratings that Watson gives could be used to support this claim: high uncompromising (85%), moderate trust (59%), extremely high challenge (100%), and moderate cautiousness (54%). Others seem to provide evidence in the opposite direction: my moderately low assertiveness score (35%), high susceptibility to stress (94%), high self-consciousness (86%), and low self-discipline (6%) all seem to chip away at my ability to be a great leader. It is of course difficult to judge these two very different assessments of personality on the same scale, but differences such as these make it difficult to trust Watson’s results entirely.

As I mentioned previously, I asked some of my close friends to rate me on sixteen of the categories that Watson scored me on. The results of that test can be found here. After a quick naïve analysis of the results, it appears that the ratings given by my friends and those given by Watson are entirely uncorrelated, but after closer inspection I found that they are only mostly entirely uncorrelated. By the very nature of statistical analysis it is unsound to compare only the elements of a set that correlate closely with one another, even if a wonderful, non-mathematical rationale can be provided for why it should be so. The average difference between my friends’ ratings and Watson’s ratings is 32 points, with a standard deviation of 20.4 points. Taken together with a negligible correlation coefficient, this means that a useful comparison can’t be made. Of course the ratings provided by my friends are far more likely to be erratic and error-prone, but it is troubling that there is simply no correlation between the two sets of ratings.
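For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the three statistics mentioned above: the average difference, the standard deviation of the differences, and the Pearson correlation coefficient. It assumes the friends' 1–10 ratings have already been averaged per category and multiplied by 10 so that they sit on Watson's 0–100 scale, and it uses the absolute difference and the population standard deviation; the post does not say which variants were actually used. The function names and the numbers in the usage note are hypothetical placeholders, not my real data.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series has no defined correlation; report 0
    return cov / (sx * sy)


def compare_ratings(watson, friends):
    """Summarize how far apart two sets of 0-100 ratings are.

    Both lists must be in the same category order.
    """
    diffs = [abs(w - f) for w, f in zip(watson, friends)]
    return {
        "mean_abs_diff": mean(diffs),
        "std_diff": pstdev(diffs),  # population standard deviation (assumed)
        "pearson_r": pearson(watson, friends),
    }
```

Calling `compare_ratings([85, 59, 100, 54], [70, 80, 60, 50])` (made-up values) returns a dictionary with the three numbers. A large mean difference together with a correlation near zero is the pattern described above.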

Evolution of Emotionality

The final leg in my analysis of my personality was to track the evolution of my personality over time. I have kept a digital journal for going on four years now in which I have written a total of over 82,000 words in 308 distinct journal entries. Because the corpus is so large, I feel rather comfortable in assuming that my results will not be inaccurate if I divide it into several discrete pieces to provide the axis of time in this study.

It is probably worth mentioning that when I started journaling in 2011, I was just starting my junior year in high school. My digital record, therefore, tracks my transition from high school to college and, more importantly, follows me during a period of intense individualization. Since I started journaling, I have decided who I really wish to become and have made grand strides toward becoming that person. So this is an excellent circumstance in which to test Watson’s ability to passively discern personality.

In order to properly study my evolution, I chopped up the corpus of my journal entries by calendar year. I then fed each discrete segment into Watson’s User Modeling demo to ascertain the various levels of the measured characteristics at each stage. I published the raw data from these tests online for your viewing pleasure. It is also worth noting that only 2% of my journal was written in 2011, so that year doesn’t really offer the richest set of data points. Despite the limited scope of this segment, I do think that my writing is pretty indicative of my emotional state at the time.
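The chopping step can be sketched in a few lines. This is only an illustration under assumptions: it supposes that each journal entry is available as a `(date, text)` pair, and the names `split_by_year` and `word_share` are made up for this example. Each yearly segment would then be pasted into the Watson demo by hand; the sketch does not call any Watson API.

```python
from collections import defaultdict
from datetime import date


def split_by_year(entries):
    """Group (date, text) journal entries into one text blob per calendar year."""
    by_year = defaultdict(list)
    for entry_date, text in entries:
        by_year[entry_date.year].append(text)
    # Join each year's entries with blank lines, with years in ascending order.
    return {year: "\n\n".join(texts) for year, texts in sorted(by_year.items())}


def word_share(segments):
    """Fraction of the total word count that falls in each year's segment."""
    counts = {year: len(text.split()) for year, text in segments.items()}
    total = sum(counts.values())
    return {year: count / total for year, count in counts.items()}
```

Checking `word_share` before running the analysis is a quick way to spot thin segments, like a year that holds only a small slice of the corpus.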

I then decided to see if I could uncover any underlying trends over time. My first intuition was to plot all 52 elements over the full interval and see if I could discern any trends, but instead I was presented with the rat’s nest of lines pictured below. Even in this amalgamation of colored lines, though, it can sort of be determined that my personality profile tends to favor the extremes of the scale. I’m guessing this is an artifact of an imperfect algorithm in Watson, but it may very well be that my personality in particular favors the extremes. An accompanying conjecture is that there seem to be two basic trends away from the middle of the scale, until 2014, when the traits at the lower end of the scale fan out again.

All characteristics plotted concurrently. Notice the clustering around 100% and 0%.

With my recent failure in mind, I then wanted to see if there were actually any trends that manifested themselves over time. In order to do this, I calculated the linear correlation coefficient for each characteristic over time and then grabbed the eight with the largest decreases and increases over the interval.
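A rough sketch of that ranking step follows. It assumes the scores are stored as a dictionary mapping each trait name to a list of yearly percentages, in the same order as the list of years. The name `rank_trends` and the data layout are hypothetical, and it returns the `k` most falling and the `k` most rising traits as two separate lists, which is one possible reading of the selection described above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series has no trend; treat it as uncorrelated
    return cov / (sx * sy)


def rank_trends(scores_by_trait, years, k=8):
    """Return (most_falling, most_rising): the k traits at each end of the ranking."""
    r = {trait: pearson(years, scores) for trait, scores in scores_by_trait.items()}
    ordered = sorted(r, key=r.get)  # most negative correlation first
    return ordered[:k], ordered[::-1][:k]
```

One caveat on the design: a correlation coefficient measures how consistently a trait moves with time, not how far it moves. A trait that creeps up by one point every year scores as high as one that jumps thirty points a year, so ranking by the fitted slope instead would give a different list.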