By Gregory Piatetsky, KDnuggets.

Boosting, up 40% to 32.8% share in 2016 from 23.5% share in 2011

Text Mining, up 30% to 35.9% from 27.7%

Visualization, up 27% to 48.7% from 38.3%

Time series/Sequence analysis, up 25% to 37.0% from 29.6%

Anomaly/Deviation detection, up 19% to 19.5% from 16.4%

Ensemble methods, up 19% to 33.6% from 28.3%

SVM, up 18% to 33.6% from 28.6%

Regression, up 16% to 67.1% from 57.9%

K-nearest neighbors, 46% share

PCA, 43%

Random Forests, 38%

Optimization, 24%

Neural networks - Deep Learning, 19%

Singular Value Decomposition, 16%

Association rules, down 47% to 15.3% from 28.6%

Uplift modeling, down 36% to 3.1% from 4.8% (that is a surprise, given strong results published)

Factor Analysis, down 24% to 14.2% from 18.6%

Survival Analysis, down 15% to 7.9% from 9.3%
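The relative changes quoted in these lists follow the measure (pct2016 / pct2011 - 1) used in this post. The sketch below recomputes a few of them from the published shares; the function and variable names are illustrative assumptions, not part of the poll analysis. Because the published shares are rounded, a recomputed value can differ from the quoted one by about a percentage point.

```python
def relative_change(pct_new: float, pct_old: float) -> float:
    """Relative change in share: pct_new / pct_old - 1."""
    return pct_new / pct_old - 1.0


# (2016 share, 2011 share) in percent, copied from the lists above.
shares = {
    "Boosting": (32.8, 23.5),
    "Text Mining": (35.9, 27.7),
    "Association rules": (15.3, 28.6),
    "Survival Analysis": (7.9, 9.3),
}

changes = {
    name: relative_change(new, old) for name, (new, old) in shares.items()
}

# Print from the largest increase to the largest decline.
for name, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {change:+.0%}")
```

Note that this is a relative change in share, not a difference in percentage points: Boosting gained 9.3 points, which is roughly 40% of its 2011 share.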

| Employment Type | % Voters | Avg Num Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta | % Used Other Methods |
|---|---|---|---|---|---|---|
| Industry | 59% | 8.4 | 94% | 81% | 55% | 83% |
| Government/Non-profit | 4.1% | 9.5 | 91% | 89% | 49% | 89% |
| Student | 16% | 8.1 | 94% | 76% | 47% | 77% |
| Academia | 12% | 7.2 | 95% | 81% | 44% | 77% |
| All | | 8.3 | 94% | 82% | 48% | 81% |

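| Algorithm | Industry | Government/Non-profit | Academia | Student | All |
|---|---|---|---|---|---|
| Regression | 71% | 63% | 51% | 64% | 67% |
| Clustering | 58% | 63% | 51% | 58% | 57% |
| Decision Trees/Rules | 59% | 63% | 38% | 57% | 55% |
| Visualization | 55% | 71% | 28% | 47% | 49% |
| K-NN | 46% | 54% | 48% | 47% | 46% |
| PCA | 43% | 57% | 48% | 40% | 43% |
| Statistics | 47% | 49% | 37% | 36% | 43% |
| Random Forests | 40% | 40% | 29% | 36% | 38% |
| Time series | 42% | 54% | 26% | 24% | 37% |
| Text Mining | 36% | 40% | 33% | 38% | 36% |
| Deep Learning | 18% | 9% | 24% | 19% | 19% |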
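This post defines the bias of an employment type toward an algorithm as Bias(Alg,Type) = Usage(Alg,Type) / Usage(Alg,All) - 1. As a minimal sketch, the code below applies that formula to three rows of the table above. The `USAGE` dictionary and the `bias` helper are illustrative names, and the inputs are the rounded percentages from the table, so the results are approximate.

```python
# Usage in percent by employment type, copied from three rows of the table.
USAGE = {
    "Visualization": {
        "Industry": 55, "Government/Non-profit": 71,
        "Academia": 28, "Student": 47, "All": 49,
    },
    "PCA": {
        "Industry": 43, "Government/Non-profit": 57,
        "Academia": 48, "Student": 40, "All": 43,
    },
    "Deep Learning": {
        "Industry": 18, "Government/Non-profit": 9,
        "Academia": 24, "Student": 19, "All": 19,
    },
}


def bias(usage: dict, alg: str, emp_type: str) -> float:
    """Bias(Alg,Type) = Usage(Alg,Type) / Usage(Alg,All) - 1."""
    row = usage[alg]
    return row[emp_type] / row["All"] - 1.0


# Bias for every (algorithm, employment type) pair, skipping the "All" column.
biases = {
    (alg, emp_type): bias(USAGE, alg, emp_type)
    for alg, row in USAGE.items()
    for emp_type in row
    if emp_type != "All"
}

for (alg, emp_type), value in biases.items():
    print(f"{alg:15s} {emp_type:22s} {value:+.0%}")
```

A positive bias means the employment type uses the algorithm more than the average respondent; a negative bias means it uses it less.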

The latest KDnuggets Poll asked which algorithms and methods Data Scientists use. Here are the results, based on 844 voters.

The top 10 algorithms (and methods) and their share of voters are Regression (67%), Clustering (57%), Decision Trees/Rules (55%), Visualization (49%), K-nearest neighbors (46%), PCA (43%), Statistics (43%), Random Forests (38%), Time series (37%), and Text Mining (36%). See full table of all algorithms and methods at the end of the post.

(Note: The goal of the poll was to find the top tools used by Data Scientists, but the word "tools" is ambiguous, so for simplicity I originally called this table top 10 "algorithms". Of course, as many of you justifiably pointed out, Statistics or Visualization (and several other options) are not algorithms, but can be better described as methods or approaches. I stand corrected and renamed this post to "Top 10 algorithms".)

The average respondent used 8.1 algorithms/methods, a big increase vs a similar poll in 2011.

Comparing with the 2011 poll, "Algorithms for data analysis / data mining", we note that the top methods are still Regression, Clustering, Decision Trees/Rules, and Visualization. The biggest relative increases, measured by (pct2016 / pct2011 - 1), are for Boosting, Text Mining, Visualization, Time series/Sequence analysis, Anomaly/Deviation detection, Ensemble methods, SVM, and Regression.

Most popular among new options added in 2016 are K-nearest neighbors, PCA, Random Forests, Optimization, Neural networks - Deep Learning, and Singular Value Decomposition.

The biggest declines are for Association rules, Uplift modeling, Factor Analysis, and Survival Analysis.

The first table above shows usage of different algorithm types (Supervised, Unsupervised, Meta, and Other) by employment type. We excluded NA (4.5%) and Other (3%) employment types.

We note that almost all respondents used supervised algorithms (91% to 95% across employment types). Government and Industry Data Scientists used more algorithms on average than students or academic researchers, and Industry respondents were the most likely to use meta-algorithms (55%).

Next, we analyzed the usage of the top 10 algorithms + Deep Learning by employment type; the results are in the second table above. To make the differences easier to see, we compute the algorithm bias for a particular employment type relative to average algorithm usage as Bias(Alg,Type) = Usage(Alg,Type) / Usage(Alg,All) - 1.

We note that Industry Data Scientists are more likely to use Regression, Visualization, Statistics, Random Forests, and Time Series. Government/non-profit respondents are more likely to use Visualization, PCA, and Time Series. Academic researchers are more likely to use PCA and Deep Learning. Students generally use fewer algorithms, but do more text mining and Deep Learning.

Next, we look at regional participation, which was representative of overall KDnuggets visitors.