By Gregory Piatetsky, KDnuggets.

Top Analytics, Data Science, Machine Learning Tools

Software        2018 % share    % change 2018 vs 2017
Python          65.6%           11%
RapidMiner      52.7%           65%
R               48.5%           -14%
SQL             39.6%           1%
Excel           39.1%           24%
Anaconda        33.4%           37%
Tensorflow      29.9%           32%
Tableau         26.4%           21%
scikit-learn    24.4%           11%
Keras           22.2%           108%

Python eats away at R

RapidMiner surges

"Like many vendors, RapidMiner promotes the KDnuggets survey to users through a number of channels, including sending a few emails to people who have used our product in the past 12 months. We've done the same promotion before, but two different things happened this year. First we received a much better response. Over 400 users personally replied to my email expressing how happy they were to help us out. But more importantly, we've seen a 300% increase in monthly active RapidMiner users over the past year, so we emailed more people than in prior years. We're humbled to have such an engaged and loyal user community."

SQL is steady

Trends

Tool                                       % change    2018 % share    2017 % share
Keras                                      108%        22.2%           10.7%
PyTorch                                    92%         6.4%            3.4%
Amazon Machine Learning                    74%         3.3%            1.9%
RapidMiner                                 65%         52.7%           31.9%
Other free analytics/data mining tools     53%         8.3%            5.4%
DeepLearning4J                             39%         3.4%            2.4%
Anaconda                                   37%         33.4%           24.3%
PyCharm                                    33%         13.5%           10.1%
Tensorflow                                 32%         29.9%           22.7%
Excel                                      24%         39.1%           31.5%
Tableau                                    21%         26.4%           21.8%
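The % change column in the table above appears to be the relative change in share between the two polls. The following is a minimal Python sketch of that arithmetic, together with the inclusion rule the article states for this table (growth of 20% or more and at least 3% share in 2018). The function names `pct_change` and `is_trending` are hypothetical, and the formula is an assumption inferred from the published numbers. Because the shares shown here are rounded to one decimal, recomputed values can differ from the published ones by a point or more, especially for tools with small shares.

```python
# Shares (% of voters) copied from the table above: tool -> (2018, 2017).
shares = {
    "Keras": (22.2, 10.7),
    "RapidMiner": (52.7, 31.9),
    "Anaconda": (33.4, 24.3),
    "Tableau": (26.4, 21.8),
}


def pct_change(share_2018, share_2017):
    """Relative change in share, in percent (assumed definition)."""
    return (share_2018 - share_2017) / share_2017 * 100


def is_trending(share_2018, share_2017, min_growth=20.0, min_share=3.0):
    """Apply the stated rule: grew 20% or more and reached at least 3% share."""
    return pct_change(share_2018, share_2017) >= min_growth and share_2018 >= min_share


for tool, (s18, s17) in shares.items():
    # Print the recomputed change next to the trending flag for each tool.
    print(f"{tool}: {pct_change(s18, s17):+.1f}% trending={is_trending(s18, s17)}")
```

The same function applied to a declining tool gives a negative value, which matches the sign convention used in the other tables.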

Consolidation

Tool                                                   % change    2018 % share    2017 % share
Caffe                                                  -58%        1.5%            3.5%
Microsoft Machine Learning Server (former R Server)    -57%        2.1%            4.9%
IBM Data Science Experience                            -55%        1.4%            3.2%
KNIME                                                  -41%        12.3%           21.0%
IBM Watson / Watson Analytics                          -35%        3.1%            4.8%
Hadoop: Open Source Tools                              -35%        11.0%           16.8%
Hadoop: Commercial Tools                               -33%        5.7%            8.5%
SAS Enterprise Miner                                   -30%        4.3%            6.2%
IBM SPSS Modeler                                       -29%        4.9%            6.9%
Scala                                                  -29%        5.9%            8.3%
SAS Base                                               -29%        5.5%            7.7%
Alteryx                                                -28%        4.0%            5.7%
MLlib                                                  -26%        3.8%            5.1%
Theano                                                 -25%        4.9%            6.5%

Deep Learning Tools

Tensorflow, 29.9%

Keras, 22.2%

PyTorch, 6.4%

Theano, 4.9%

Other Deep Learning Tools, 4.9%

DeepLearning4J, 3.4%

Microsoft Cognitive Toolkit (Prev. CNTK), 3.0%

Apache MXnet, 1.5%

Caffe, 1.5%

Caffe2, 1.2%

TFLearn, 1.1%

Torch, 1.0%

Lasagne, 0.3%

Big Data Tools: Hadoop Drops

Tool                          % change    2018 % share    2017 % share
Apache Spark                  -15%        21.5%           25.5%
Spark SQL                     new         11.7%           -
Hadoop: Open Source Tools     -35%        11.0%           16.8%
SQL on Hadoop tools           -12%        10.2%           11.6%
Hadoop: Commercial Tools      -33%        5.7%            8.5%

Programming Languages

Python, 65.6% (was 59.0% in 2017), 11% up

R, 48.5% (was 56.6%), 14% down

SQL, 39.6% (was 39.2%), 1% up

Java, 15.1% (was 15.5%), 3% down

Unix, shell/awk/gawk, 9.2% (was 10.8%), 15% down

Other programming and data languages, 6.9% (was 7.6%), 9% down

C/C++, 6.8% (was 7.1%), 3% down

Scala, 5.9% (was 8.3%), 29% down

Perl, 1.0% (was 1.9%), 46% down

Julia, 0.7% (was 1.2%), 45% down

Lisp, 0.3% (was 0.4%), 25% down

Clojure, 0.2% (was 0.3%), 38% down

F#, 0.1% (was 0.5%), 73% down

The 19th annual KDnuggets Software Poll had over 2,300 voters, somewhat fewer than in 2017, perhaps because only one vendor, RapidMiner, ran a very active campaign to vote in the KDnuggets poll. On average, a participant selected about 7 different tools, so votes with just one tool selected stood out. We removed about 260 such "lone" votes (mainly from RapidMiner) because, even if they represented legitimate users of that tool, their experience was very atypical and would skew the results. To compare "apples" to "apples", I also removed such lone votes from the 2016 and 2017 data (about 11% in 2017 and 12% in 2016), so the 2017 percentages in this blog will be slightly higher for most tools than what was reported in the 2017 post.

Here is my initial analysis, based on 2,052 participants, after "lone" voters were removed. More detailed association analysis and anonymized data will be published in about 2 weeks.

(* For a more valid comparison, we recomputed the results of the 2016 and 2017 polls to exclude "lone" votes.)

The top 11 tools all had at least 20% share. In the table of top tools, % share is the % of voters who used this tool, and % change is the change in share vs the 2017 Software Poll, with green and red highlighting changes up and down of 10% or more.

The average number of tools per respondent was 7.0, slightly higher than 6.75 in the 2017 Poll (also excluding 1-tool responses).

Compared to the 2017 Software Poll, the one new entry is Keras.
Knime dropped from the top 11, perhaps because this year they did not have a campaign among their users to vote.

Here are some observations.

Python already had over 50% share in 2017 and increased its share to 66%, while R's share decreased for the first time since we started this poll and dropped below 50%.

RapidMiner, which was the top Data Science platform in the past several polls, dramatically increased its share to about 53%, up from about 32% in 2017. What part of this is due to user growth, and what part to vendor promotion? I asked RapidMiner what they did to encourage their users; the quoted response under "RapidMiner surges" is from Ingo Mierswa, RapidMiner founder and president. For the record, I note that RapidMiner is not a current advertiser on KDnuggets.

SQL, including Spark SQL and SQL on Hadoop tools, has had a share of about 40% in each of the last 3 polls. So, if you are an aspiring Data Scientist, learn SQL; it will likely be useful for a long while!

The only new entry in the poll with over 2% share of usage was Spark SQL, with 11.7% share.

The Trends table lists the tools that have grown 20% or more in share and reached at least 3% share in 2018.

We note that among 56 tools with 2% or higher share in 2017, 19 (only about one third) have increased share in 2018, while 37 have dropped in share.
This, along with recent acquisitions (Datawatch buying Angoss, Minitab buying Salford), suggests that consolidation of Data Science platforms is on the way.

Tools that had at least 3% share in 2017 and declined 25% or more in their share in 2018 are in the Consolidation table.

The share of voters who used Deep Learning tools remained stable at 33% of voters, vs 32% in 2017 and 18% in 2016.

Google Tensorflow is by far the dominant platform, but Keras emerged as a very popular wrapper on top of Tensorflow. The top Deep Learning tools are listed under "Deep Learning Tools".

In 2018, about 33% used Big Data tools, either Hadoop or Spark, about the same as in 2017, but Hadoop usage has markedly declined, by about 30%. The details are in the Big Data Tools table.

Python seems to swallow not only R, but also most other languages, except for SQL, Java, and C/C++, which remained at about the same level. R has declined for the first time since we have run this survey. Other languages have also declined.

The main programming languages are listed under "Programming Languages", sorted by popularity.

Next page shows full 3-year results, regional participation, and links to past polls.
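The "lone" vote filtering described in the methodology can be sketched roughly as follows. This is an illustration only, not the actual poll-processing code: the ballots are made-up data, and the helper names `drop_lone_votes` and `tool_shares` are hypothetical. It assumes a tool's share is the percentage of remaining voters who selected it.

```python
from collections import Counter

# Made-up ballots: each is the set of tools one voter selected.
ballots = [
    {"Python", "SQL", "Tensorflow"},
    {"RapidMiner"},  # a "lone" vote: exactly one tool selected
    {"R", "SQL", "Excel"},
    {"Python", "R"},
    {"RapidMiner"},  # another "lone" vote
]


def drop_lone_votes(ballots):
    """Keep only ballots that selected more than one tool."""
    return [b for b in ballots if len(b) > 1]


def tool_shares(ballots):
    """Percentage of the given ballots that include each tool."""
    counts = Counter(tool for b in ballots for tool in b)
    total = len(ballots)
    return {tool: 100 * n / total for tool, n in counts.items()}


kept = drop_lone_votes(ballots)
shares = tool_shares(kept)

# Average number of tools per remaining respondent.
avg_tools = sum(len(b) for b in kept) / len(kept)

for tool, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{tool}: {share:.1f}%")
```

Applying the same filter to each year's ballots before computing shares is what makes the year-over-year comparison like for like.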