Many articles have been written about the top machine learning algorithms: click here and here for instance. Most of them seem to define top as oldest, and thus most used, ignoring modern, efficient algorithms fit for big data, such as indexation, attribution modeling, collaborative filtering, or recommendation engines used by companies such as Amazon, Google, or Facebook.

This morning I received an advertisement for a (self-published) book called Master Machine Learning Algorithms, and I could not resist posting the author's list of top 10 machine learning algorithms:

Linear Algorithms:

Algorithm 1: Linear Regression

Algorithm 2: Logistic Regression

Algorithm 3: Linear Discriminant Analysis

Nonlinear Algorithms:

Algorithm 4: Classification and Regression Trees

Algorithm 5: Naive Bayes

Algorithm 6: K-Nearest Neighbors

Algorithm 7: Learning Vector Quantization

Algorithm 8: Support Vector Machines

Ensemble Algorithms:

Algorithm 9: Bagged Decision Trees and Random Forest

Algorithm 10: Boosting and AdaBoost

Bonus #1: Gradient Descent

The Gradient Descent algorithm is also covered, as it is used as the optimization algorithm at the core of so many machine learning algorithms.

You can check the book here.
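For readers unfamiliar with the bonus item, here is a minimal sketch of gradient descent used as the optimizer behind a linear regression fit. It is not taken from the book: the function name `fit_line_gd`, the learning rate, the step count, and the toy data are all assumptions chosen for illustration.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and steps are arbitrary choices
    that happen to converge on the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every data point.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated exactly from y = 2x + 1 (an assumption, not real data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gd(xs, ys)
print(round(w, 4), round(b, 4))
```

On noiseless data like this the loop recovers the slope and intercept; on real data the same update rule applies, but the learning rate usually needs tuning.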

Some of these techniques, such as Naive Bayes (variables are almost never independent), Linear Discriminant Analysis (clusters are almost never separated by hyperplanes), or Linear Regression (numerous model assumptions, including linearity, are almost always violated in real data), have been so abused that I would hesitate to teach them. This is not a criticism of the book; most textbooks cover pretty much the same algorithms, and this one even skips all graph-related algorithms. Even k-Nearest Neighbors has modern, fast implementations not covered in traditional books; we are working on this topic and expect to publish an article about it shortly.
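To make the Naive Bayes remark above concrete, here is a small sketch, with made-up probabilities, of what happens when the independence assumption fails in the most extreme way: the same feature is counted three times. The function name `naive_bayes_posterior` and all numbers are hypothetical.

```python
def naive_bayes_posterior(prior_pos, likelihoods_pos, likelihoods_neg):
    """Return P(positive | features) for a two-class Naive Bayes model.

    Each likelihood is P(feature observed | class); the features are
    multiplied together as if they were independent given the class.
    """
    score_pos = prior_pos
    score_neg = 1.0 - prior_pos
    for lp, ln in zip(likelihoods_pos, likelihoods_neg):
        score_pos *= lp
        score_neg *= ln
    return score_pos / (score_pos + score_neg)


# One feature: P(f | pos) = 0.8, P(f | neg) = 0.4, prior = 0.5 (made-up values).
single = naive_bayes_posterior(0.5, [0.8], [0.4])

# The same feature entered three times, i.e. three perfectly correlated copies.
tripled = naive_bayes_posterior(0.5, [0.8] * 3, [0.4] * 3)

print(single, tripled)
```

The copies carry no new information, yet the posterior moves from 2/3 to 8/9. Correlated variables in real data produce a milder version of the same overconfidence.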

If anything, it shows that modern techniques take a long time to reach the classroom and the textbooks. You might have to attend classes taught by real practitioners (people who have worked for big data solutions vendors) to learn modern tools that will give you a competitive edge on the job market, though you can discover a lot of this free "hidden knowledge" on our website, using our data science search engine. A publisher such as O'Reilly, as well as some universities with an applied data science department, provides good education on these state-of-the-art techniques, with case studies. My upcoming book Data Science 2.0 will cover much of the topic, and my previous Wiley book is a good starting point. You can also learn quite a bit from our apprenticeship (for self-learners only at this time).
