The goal of this article is to explain at a high level the concept of machine learning. We'll say nothing about implementation details.

Let's say my name is Tom, and I start a car insurance company with 50 customers. We'll pretend this number is normal for car insurance companies.

Year 1

A Simple Average Across the Board

I charge all 50 customers an average amount. Things are going well. We lose a customer every once in a while. But that's normal. We gain one, too, every once in a while. However, over the course of the year we're becoming less and less profitable!

I look at the data and realize that our best customers (the ones that very rarely get into accidents) are the ones that are leaving, and the customers that are staying (and our new ones) are bad customers (ones that get into accidents a lot!). Yikes! Other insurance companies must have some rubric for recognizing good customers, and must be luring ours away with a better-than-average price.

Year 2

Introducing My Pre-Conceived Stereotypes

I gather all the information we know about our customers (age, type of car, city) and make a chart.

| Age | Car Type          | City       |
|-----|-------------------|------------|
| 25  | 2008 Honda Civic  | Nashville  |
| 65  | 2018 Toyota Prius | Brentwood  |
| 45  | 2001 Ford Taurus  | Coopertown |
| ... | ...               | ...        |

I think to myself: young drivers and older drivers are probably more likely to get into accidents; people living in cities probably are more likely too. And I change our prices accordingly.

Year 3

Single-purpose Input Data and Result-based Human Learning

Things have actually gotten worse. Most surprisingly, this was driven by two "bad" customers leaving, and two "good" customers joining.

I realize my conception of drivers is not accurate, and it's costing me. I must use actual results data (accidents) to derive my conception of good and bad customers. However, even my inputs carry implicit connotations. I make a new chart with fine-tuned inputs that each convey exactly one thing (no secret, secondary connotations), and add the result data as the final column.

Here is my new table I'll use to inform my learning:

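| Driver Age | Car Age | Car Self Stop | Car Blindspot Detection | City Density (population per sq mile) | Number of Accidents |
|------------|---------|---------------|-------------------------|---------------------------------------|---------------------|
| 25         | 11      | no            | yes                     | 80,000                                | 1                   |
| 65         | 1       | yes           | yes                     | 20,000                                | 0                   |
| 45         | 18      | no            | no                      | 10,000                                | 1                   |
| ...        | ...     | ...           | ...                     | ...                                   | ...                 |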

Using result data, I can start to see trends: car safety features matter a lot, age is a mixed bag, city density seems to matter... I adjust prices accordingly.
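The trend-spotting described above can be sketched in a few lines. This is only an illustration: it groups the three sample rows from the table by one input and compares the average number of accidents per group. The field names and the `accidentsBy` helper are made up for this example.

```javascript
// The three sample rows from the table above; field names are hypothetical.
const drivers = [
  { driverAge: 25, carAge: 11, selfStop: false, blindspot: true,  cityDensity: 80000, accidents: 1 },
  { driverAge: 65, carAge: 1,  selfStop: true,  blindspot: true,  cityDensity: 20000, accidents: 0 },
  { driverAge: 45, carAge: 18, selfStop: false, blindspot: false, cityDensity: 10000, accidents: 1 },
];

// Average number of accidents for each distinct value of one input.
const accidentsBy = (rows, key) => {
  const groups = new Map();
  for (const row of rows) {
    const group = groups.get(row[key]) ?? { total: 0, count: 0 };
    group.total += row.accidents;
    group.count += 1;
    groups.set(row[key], group);
  }
  const averages = {};
  for (const [value, group] of groups) {
    averages[value] = group.total / group.count;
  }
  return averages;
};

console.log(accidentsBy(drivers, "selfStop"));
console.log(accidentsBy(drivers, "blindspot"));
```

On just these three rows, drivers without self-stop average 1 accident and drivers with it average 0, while blindspot detection splits 0.5 versus 1. Three rows prove nothing, of course; the point is only the shape of the comparison I was doing by hand.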

Year 4

Getting Better Data for my Human Learning

Things have started to get better! Every few months I look at the data again (with updated results and new drivers) and immensely enjoy looking for the thread that separates good drivers from bad drivers. To my frustration, though, I keep bumping up against a limit: insufficient data. Not results data, but input data. I'll find a nice thread, only to see it disappear for some reason that isn't on my chart. Take age, for example. There are cases where it is the only factor that explains a higher percentage of accidents, but there are other cases where it seems to be a non-factor. I am missing some other input that would account for this. Maybe old age is a factor, but only for those who drive at night a lot... I am obsessed with finding the thread, and I learn about hardware that customers can put in their cars so that I can gather data about their actual driving behavior! I immediately offer a 10% discount to drivers who install it.

A month later I get the first data dump, and it is like Christmas morning!! I have average data on: speed, acceleration from stop, deceleration to stop, music level, percentage of times blinker applied when turning, and percentage of time driving with headlights on.

I add these attributes to my data:

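| Driver Age | Car Age | Car Self Stop | Car Blindspot Detection | City Density (population per sq mile) | Avg. Speed | Avg. Accel | Avg. Decel | Avg. Music Level | Pct. Blinker When Turning | Pct. Headlights | Number of Accidents |
|------------|---------|---------------|-------------------------|---------------------------------------|------------|------------|------------|------------------|---------------------------|-----------------|---------------------|
| 25         | 11      | no            | yes                     | 80,000                                | 25         | 1.4        | 1.2        | 40               | 88                        | 22              | 1                   |
| 65         | 1       | yes           | yes                     | 20,000                                | 45         | .8         | 1.9        | 30               | 92                        | 2               | 0                   |
| 45         | 18      | no            | no                      | 10,000                                | 55         | 1          | 1          | 32               | 97                        | 30              | 1                   |
| ...        | ...     | ...           | ...                     | ...                                   | ...        | ...        | ...        | ...              | ...                       | ...             | ...                 |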

Wow! I spend days and days in my office trying to find the thread... elderly drivers who drive at night a lot and decelerate quickly, young drivers who play loud music and don't use their blinkers when turning... these make sense in my mind (perhaps vision problems and distraction, respectively). But it bothers me that while each is a very strong indicator, there are notable exceptions in each case. Using the previous data, there was a weak indication with many outliers. Now it is a stronger indication (75%!) with fewer outliers, but it feels like I'm missing a third or even fourth or fifth factor for each of these cases!

Year 5

The Crossroads

My company is doing great again, even with a model that is not as accurate as I want it to be. But I find myself at a crossroads. I've learned of two paths I could go down.

The first approach would be to create a feedback mechanism in the car hardware. Based on the learnings from my data, I can have the hardware "beep" at the driver to indicate they are in a bad situation. For example, the young driver can have loud music (23%), or occasionally miss applying a blinker (31%), but if both happen together (75%), the risk crosses my chosen 70% threshold and the hardware will beep. It seems like this would help drivers be safer, and turn bad drivers into good ones! This would help our company!
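The beep rule can be written down almost literally. The sketch below uses the percentages from the young-driver example; the function names, and the assumption that a situation with no known risk factor scores 0, are mine.

```javascript
// Risk estimates for a young driver, taken from the example above.
// They are looked up, not computed: "both" (75%) is not derived from 23% and 31%.
const youngDriverRisk = { loudMusic: 0.23, missedBlinker: 0.31, both: 0.75 };
const BEEP_THRESHOLD = 0.70;

// Hypothetical helper: pick the risk estimate for the current situation.
const currentRisk = ({ loudMusic, missedBlinker }) => {
  if (loudMusic && missedBlinker) return youngDriverRisk.both;
  if (loudMusic) return youngDriverRisk.loudMusic;
  if (missedBlinker) return youngDriverRisk.missedBlinker;
  return 0; // assumption: no known risk factor is active
};

// Beep only when the estimated risk crosses the chosen threshold.
const shouldBeep = (situation) => currentRisk(situation) > BEEP_THRESHOLD;

console.log(shouldBeep({ loudMusic: true, missedBlinker: false })); // false
console.log(shouldBeep({ loudMusic: true, missedBlinker: true }));  // true
```

The rule is simple enough that a driver hearing the beep could plausibly work out what caused it, which is the whole appeal of this path.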

The second approach is to apply Machine Learning to the data. With machine learning, an algorithm would look at the data inputs and result (accidents) and decide for itself which inputs (by themselves, but more likely coupled with any number of other inputs) are factors. I read that whereas humans have a difficult time finding deeply nested, related factors, machines can do this easily. The tradeoff of this approach is that I no longer know which inputs are the ones that result in the projected output. Even if I did, the relationship might be so intertwined and nuanced (maybe even touching every input under different circumstances) that I cannot have the hardware "beep" at the driver: it'd be far too nuanced for a human to derive backward from a simple "beep".
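To make "the algorithm decides" a little more concrete, here is a minimal sketch of one of the simplest learning algorithms, logistic regression trained by gradient descent. It is a stand-in chosen for brevity, not necessarily what I would use in practice: it learns one weight per input and so cannot capture the coupled factors described above. The data is the three normalized sample rows from the Year 6 table; the learning rate and epoch count are arbitrary.

```javascript
// The three normalized sample rows from the Year 6 table (11 inputs each).
const inputs = [
  [0, 0.6, 0, 1, 1, 0, 1, 0.2, 1, 0, 0.7],
  [1, 0, 1, 1, 0.1, 0.7, 0, 1, 0, 0.4, 0],
  [0.5, 1, 0, 0, 0, 1, 0.3, 0, 0.2, 1, 1],
];
const accidents = [1, 0, 1]; // the result column

const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Projected probability of an accident for one row.
const predict = (weights, bias, row) =>
  sigmoid(row.reduce((sum, x, i) => sum + weights[i] * x, bias));

// Batch gradient descent on the logistic loss.
// learningRate and epochs are arbitrary choices for this sketch.
const train = (rows, labels, learningRate = 0.1, epochs = 5000) => {
  const weights = new Array(rows[0].length).fill(0);
  let bias = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    const gradW = new Array(weights.length).fill(0);
    let gradB = 0;
    rows.forEach((row, r) => {
      const error = predict(weights, bias, row) - labels[r];
      row.forEach((x, i) => { gradW[i] += error * x; });
      gradB += error;
    });
    gradW.forEach((g, i) => { weights[i] -= learningRate * g; });
    bias -= learningRate * gradB;
  }
  return { weights, bias };
};

const model = train(inputs, accidents);
// One learned weight per input: nobody told the algorithm which inputs matter.
console.log(model.weights.map((w) => w.toFixed(2)));
```

Even in this tiny case the output is just a list of numbers. With a richer model and many interacting inputs, reading a human-friendly rule back out of those numbers gets much harder, which is the tradeoff described above.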

I've seen so much promise with my Human Learning model that I decide to go down the second path and see if a machine can push it further.

Year 6

Machine Learning

I spend some time "normalizing" my inputs for machine learning. I already have each input expressing only one thing, but now I need to adjust all values to be between 0 and 1.



// Min-max scaling: maps val from the range [min, max] onto [0, 1].
const normalize = (val, min, max) => (val - min) / (max - min);

I assume a min/max based on the data I have, not on theoretical limits (e.g., min age is 25, max is 65). Whenever I get new data, everything is recomputed anyway.
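Taking the min and max from the data itself might look like the sketch below. `normalizeColumn` is a hypothetical helper, and mapping a column with no variation to 0 is my own assumption to avoid dividing by zero. The `normalize` function is repeated so the snippet runs on its own.

```javascript
const normalize = (val, min, max) => (val - min) / (max - min);

// Normalize one column using the min and max observed in the data itself.
const normalizeColumn = (values) => {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Assumption: a column where every value is identical carries no signal, so map it to 0.
  if (max === min) return values.map(() => 0);
  return values.map((v) => normalize(v, min, max));
};

// Driver Age and Car Age columns from the three sample rows.
const driverAges = [25, 65, 45];
const carAges = [11, 1, 18];

console.log(normalizeColumn(driverAges)); // [ 0, 1, 0.5 ]
console.log(normalizeColumn(carAges).map((v) => Number(v.toFixed(1)))); // [ 0.6, 0, 1 ]
```

Yes/no inputs such as Car Self Stop simply become 1 and 0.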

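| Driver Age | Car Age | Car Self Stop | Car Blindspot Detection | City Density (population per sq mile) | Avg. Speed | Avg. Accel | Avg. Decel | Avg. Music Level | Pct. Blinker When Turning | Pct. Headlights | Number of Accidents |
|------------|---------|---------------|-------------------------|---------------------------------------|------------|------------|------------|------------------|---------------------------|-----------------|---------------------|
| 0          | .6      | 0             | 1                       | 1                                     | 0          | 1          | .2         | 1                | 0                         | .7              | 1                   |
| 1          | 0       | 1             | 1                       | .1                                    | .7         | 0          | 1          | 0                | .4                        | 0               | 0                   |
| .5         | 1       | 0             | 0                       | 0                                     | 1          | .3         | 0          | .2               | 1                         | 1               | 1                   |
| ...        | ...     | ...           | ...                     | ...                                   | ...        | ...        | ...        | ...              | ...                       | ...             | ...                 |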

I am curious how the machine does! So, like before, I omit one record at random, perform the learning (the machine's instead of my own), and check the one omitted record. The record has an accident, and the machine correctly projects it! I do this over and over, and find that it projects correctly 92% of the time! Not just under a favorable situation (my "old age" logic accounting for elderly drivers at night was 85%, but that was my cherry-picked best scenario!) but as an average across any driver data.
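The omit-one-and-check routine is easy to automate. The sketch below holds out each record in turn rather than picking at random, and it plugs in a deliberately simple stand-in learner (copy the result of the most similar known record). Both choices are assumptions made for illustration, and the record shape is invented.

```javascript
// The three normalized sample rows from the table above, with the result split out.
const records = [
  { inputs: [0, 0.6, 0, 1, 1, 0, 1, 0.2, 1, 0, 0.7], accident: 1 },
  { inputs: [1, 0, 1, 1, 0.1, 0.7, 0, 1, 0, 0.4, 0], accident: 0 },
  { inputs: [0.5, 1, 0, 0, 0, 1, 0.3, 0, 0.2, 1, 1], accident: 1 },
];

// Stand-in learner (an assumption): project the result of the most similar known record.
const squaredDistance = (a, b) => a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
const predictNearest = (known, inputs) =>
  known.reduce((best, rec) =>
    squaredDistance(rec.inputs, inputs) < squaredDistance(best.inputs, inputs) ? rec : best
  ).accident;

// Hold each record out in turn, learn from the rest, and check the held-out record.
const leaveOneOut = (all, predictFn) => {
  let correct = 0;
  all.forEach((heldOut, i) => {
    const rest = all.filter((_, j) => j !== i);
    if (predictFn(rest, heldOut.inputs) === heldOut.accident) correct += 1;
  });
  return correct / all.length;
};

console.log(leaveOneOut(records, predictNearest));
```

With only the three sample rows, this stand-in gets two of three held-out records right. The 92% figure comes from the full data set and the real learning step, not from anything this small.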

I am incredulous, and spend the year changing my business model.

Year 7

Use Machine Learning to Upset the Market

I create hardware similar to the data-gathering hardware I used before, but which transmits the data in real time to our servers. Customers pay based on their real-time risk. There are no 6-month plans or contracts of any type. We don't run background checks or offer any pre-set pricing. You incur cost as you drive, commensurate with your real-time driving behavior.
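One possible reading of "you incur cost as you drive" is sketched below. Everything in it is hypothetical: the shape of the telemetry readings, the base rate, and the rule that each minute costs the base rate scaled by the projected risk.

```javascript
// Hypothetical telemetry: minutes driven at a given machine-projected risk (0 to 1).
const readings = [
  { minutes: 10, risk: 0.2 },
  { minutes: 5, risk: 0.8 },
];

// Assumed pricing rule: each minute costs the base rate scaled by the current risk.
const BASE_RATE_PER_MINUTE = 0.05; // made-up figure, in dollars

const tripCost = (trip, baseRate = BASE_RATE_PER_MINUTE) =>
  trip.reduce((total, { minutes, risk }) => total + minutes * risk * baseRate, 0);

console.log(tripCost(readings).toFixed(2)); // 0.30
```

Five risky minutes cost twice as much here as ten calm ones, which is the incentive the business model relies on.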

By year end, we upset the coverage industry and have almost half the market. The following year we eat the whole pie.

Reflections from the Caymans

As I sip a Mojito, retired on a beach, I think back through my years.

My first year neglected that patterns exist: I charged every driver the same.

The next few years I struggled to identify the patterns.

The final years I realized the futility of the human mind trying to maintain a large number of interconnected factors (A if a lot of B and no C, coupled with D if a lot of A otherwise E, negated if F, always disregarding the popular G as a red herring, etc, etc). Human minds cannot handle many factors; machines can; and so I had machines making these decisions.

I think to myself how cruel the human condition is: receiving an overload of inputs in life, and not having the mechanism to correctly understand the true patterns. How thousands of years of history have seen humans erecting systems of understanding based on low-hanging fruit, not deep and nuanced learnings, systems that unfairly classify and discriminate, that mistake the important issues (and solutions) for popular red herrings. How stereotypes exist as the reduction of the true number of factors constituting a reality.

But what do I know? I'm just a car insurance guy drunk on a beach.