This was the subject of a question asked on Quora: What are the top 10 data mining or machine learning algorithms?

Some modern algorithms, such as collaborative filtering, recommendation engines, segmentation, or attribution modeling, are missing from the lists below. Algorithms from graph theory (to find the shortest path in a graph, or to detect connected components), from operations research (the simplex algorithm, to optimize the supply chain), or from time series analysis are not listed either. And I could not find MCMC (Markov Chain Monte Carlo) and related algorithms used to fit hierarchical, spatio-temporal, and other Bayesian models. What else is missing?

In 2006, the IEEE International Conference on Data Mining (ICDM) identified the top 10 data mining algorithms as:

1. C4.5 (decision trees)
2. k-Means (clustering)
3. Support Vector Machines (SVM)
4. Apriori
5. Expectation Maximization (EM)
6. PageRank
7. AdaBoost
8. k-Nearest Neighbors (kNN)
9. Naive Bayes
10. Classification and Regression Trees (CART)
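Several of the algorithms above are remarkably compact when written from scratch. As an illustration, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-D points, using only the standard library; the function name, iteration count, and seed are my own choices for the example, not part of any cited reference:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Alternates two steps: assign each point to its nearest center,
    then move each center to the mean of its assigned points.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                + (p[1] - centers[i][1]) ** 2)
            groups[j].append(p)
        # Update step: move each center to the mean of its group.
        for i, g in enumerate(groups):
            if g:
                centers[i] = (sum(p[0] for p in g) / len(g),
                              sum(p[1] for p in g) / len(g))
    return centers
```

On two well-separated blobs such as `[(0, 0), (0, 1), (10, 10), (10, 11)]` with `k=2`, the centers converge to the two cluster means within a few iterations.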

A 2011 answer to the Quora question lists the following as potential candidates or additions:

- Kernel Density Estimation and Non-parametric Bayes Classifier
- K-Means
- Kernel Principal Components Analysis
- Linear Regression
- Neighbors (Nearest, Farthest, Range, k, Classification)
- Non-Negative Matrix Factorization
- Support Vector Machines
- Dimensionality Reduction
- Fast Singular Value Decomposition
- Decision Tree
- Bootstrapped SVM
- Decision Tree
- Gaussian Processes
- Logistic Regression
- Logit Boost
- Model Tree
- Naïve Bayes
- Nearest Neighbors
- PLS
- Random Forest
- Ridge Regression
- Support Vector Machine
- Classification: logistic regression, naïve Bayes, SVM, decision tree
- Regression: multiple regression, SVM
- Attribute importance: MDL
- Anomaly detection: one-class SVM
- Clustering: k-means, orthogonal partitioning
- Association: Apriori
- Feature extraction: NNMF

And a 2015 answer provides the following:

- Linear regression
- Logistic regression
- k-means
- SVMs
- Random Forests
- Matrix Factorization/SVD
- Gradient Boosted Decision Trees/Machines
- Naive Bayes
- Artificial Neural Networks

For the last one, I'd let you pick one of the following:

- Bayesian Networks
- Elastic Nets
- Any other clustering algo besides k-means
- LDA
- Conditional Random Fields
- HDPs or other Bayesian non-parametric model

My point of view is of course biased, but I would also like to add some algorithms developed or re-developed at Data Science Central's research lab:

- Jackknife regression
- Feature extraction / selection (mentioned above, but this version is very different)
- Hidden decision trees
- Indexation and tagging algorithms

These algorithms are described in the article What you won't learn in statistics classes.

Regarding the indexation algorithm (see Part 2 after clicking on this link): it must be at least 20 years old. It is an incredibly fast clustering technique: it does not require n x n memory storage, only n, where n is the number of observations. It is also easy to implement in distributed Map-Reduce or Hadoop environments. It is a fundamental algorithm: the core algorithm used to build taxonomies, catalogs (see this article about Amazon), search engines, and enterprise search solutions. DSC used it successfully in numerous contexts, including IoT and automated growth hacking for digital publishing, to categorize articles and boost them depending (among other things) on category, for maximum efficiency. Here's another illustration.
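The O(n) property comes from never computing pairwise distances: each observation is assigned to a cluster keyed by its tags or keywords, in a single pass over the data. The following sketch illustrates the idea; the keyword rule (all words minus a stopword list) and the function name are simplifications of my own, not DSC's actual algorithm:

```python
from collections import defaultdict

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "in", "for", "with", "of", "and"}

def index_cluster(documents):
    """One-pass clustering by keyword indexation.

    Each document is keyed by its set of keywords (words minus
    stopwords); documents sharing a key fall into the same cluster.
    Memory is O(n): no n x n distance matrix is ever built. The loop
    body is independent per document, so it map-reduces naturally
    (map: doc -> (key, doc_id); reduce: group doc_ids by key).
    """
    clusters = defaultdict(list)
    for doc_id, text in enumerate(documents):
        key = frozenset(w for w in text.lower().split()
                        if w not in STOPWORDS)
        clusters[key].append(doc_id)
    return dict(clusters)
```

For example, "machine learning algorithms for clustering" and "clustering algorithms in machine learning" share the keyword set {machine, learning, algorithms, clustering} and land in the same cluster, while "supply chain optimization" gets its own.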
