Machine learning hasn't been commoditized yet, but that doesn't mean you need a PhD

This post has been translated into Chinese here.

Here’s the latest installment of my Ask-A-Data-Scientist advice column. Please email your data science related quandaries to [email protected]. Note that questions are edited for clarity and brevity. Other posts include:

In the last week I received two questions with diametrically opposed premises: one was excited that machine learning is now automated, the other was concerned that machine learning takes too many years of study. Here are the questions:

Q1: I heard that Google Cloud announced that entrepreneurs can easily and quickly build on top of ML/NLP APIs. Is this statement true: “The future of ML and data post Google Cloud - the future is here, NLP and speech advancements have been figured out by Google and are accessible by API. The secret sauce has been commoditized so you can build your secret sauce on top of it. The time to secret sauce is getting shorter to shorter”?

Q2: Is it true that in order to work in machine learning, you need a PhD in the field? Is it true that before you can even begin studying machine learning, you must start by studying math, take “boring university level full length courses in calculus, linear algebra, and probability/statistics, and then learn C/C++ and parallel and distributed programming (CUDA, MPI, OpenMP, etc). According to this top rated comment on a Hacker News post, even after doing all that, we must then implement Machine Learning algorithms from scratch first in plain C, next in MPI or CUDA, and then in Numpy, before implementing them in Theano or TensorFlow.

A: It’s totally understandable that many people are having trouble navigating the hype, and the “AI is an exclusive tool for geniuses” warnings. AI is a hard topic for journalists to cover, and sadly many misrepresentations are spread. See for instance this great post for a recent case study of how DeepCoder was misconstrued in the media.

The answer to both of these questions is: NO. On the surface, they sound like opposite extremes. However, they have a common thread–many of those working in machine learning have an interest in either:

Convincing you to buy their general purpose machine learning API (none of which have been good for anything other than getting acqui-hired). Convincing you that what they’re doing is so complicated, hard, and exclusive, that us mere mortals have no chance of understanding their sorcery. (This is such a common theme that recently a reddit parody of it was voted to the top of the machine learning page: A super harsh guide to machine learning)

Yes, advancements in machine learning are coming rapidly, but for now, you need to be able to code to effectively use the technology. We’ve found from our free online course Practical Deep Learning for Coders that it takes about 70 hours of study to become an effective deep learning practitioner.

Why “Machine Learning As A Service” (MLaaS) is such a disappointment in practice

A general purpose machine learning API seems like a great idea, but the technology is simply not there yet. Existing APIs are too overly specified to be widely useful, or attempt to be very general and have unacceptably poor performance. I agree with Bradford Cross, former founder of Flightcaster and Prismatic and partner at Data Collective VC, who recently wrote about the failure of many AI companies to try to build products that customers need and would pay for: “It’s the attitude that those working in and around AI are now responsible for shepherding all human progress just because we’re working on something that matters. This haze of hubris blinds people to the fact that they are stuck in an echo chamber where everyone is talking about the tech trend rather than the customer needs and the economics of the businesses.” (emphasis mine)

Cross continues, “Machine Learning as a Service is an idea we’ve been seeing for nearly 10 years and it’s been failing the whole time. The bottom line on why it doesn’t work: the people that know what they’re doing just use open source, and the people that don’t will not get anything to work, ever, even with APIs. Many very smart friends have fallen into this tarpit. Those who’ve been gobbled up by bigcos as a way to beef up ML teams include Alchemy API by IBM, Saffron by Intel, and Metamind by Salesforce. Nevertheless, the allure of easy money from sticking an ML model up behind an API function doesn’t fail to continue attracting lost souls. Amazon, Google, and Microsoft are all trying to sell an MLaaS layer as a component of their cloud strategy. I’ve yet to see startups or big companies use these APIs in the wild, and I see a lot of AI usage in the wild so its doubtful that its due to the small sample size of my observations.”

Is Google Cloud the answer?

Google is very poorly positioned to help democratize the field of deep learning. It’s not because of bad intentions– it’s just that they have way too many servers, way too much cash, and way too much data to appreciate the challenges the majority of the world faces in how to make the most of limited GPUs, on a limited budget (those AWS bills add up quickly!), and with limited size data sets. Google Brain is so deeply technical as to be out of touch with the average coder.

For instance, TensorFlow is a low level language, but Google seemed unaware of this when they released it and in how they marketed it. The designers of TensorFlow could have used a more standard Object-Oriented approach (like the excellent PyTorch), but instead they kept with the fine Google tradition of inventing new conventions just for Google.

So if Google can’t even design a library that is easily usable by sophisticated data scientists, how likely is it that they can create something that regular people can use to solve their real-world problems?

The Hacker News plan: “Implement algorithms in plain C, then CUDA, and finally plain Numpy/MATLAB”

Why do Hacker News contributors regularly give such awful advice on machine learning? While the theory behind machine learning draws on a lot of advanced math, that is very different from the practical knowledge needed to use machine learning in practice. As a math PhD, knowing the math has been less helpful than you might expect in building practical, working models.

The line of thinking espoused in that Hacker News comment is harmful for a number of reasons:

It’s totally wrong

Good education motivates the study of underlying concepts. To borrow an analogy from Paul Lockhart’s Mathmatician’s Lament, kids would quit music if you made them study music theory for years before they were ever allowed to sing or touch an instrument

Good education doesn’t overly complicate the material. If you truly understand something, you can explain it in an accessible way. After weeks of work, in Practical Deep Learning for Coders, Jeremy Howard implemented different modern optimization techniques (often considered a complex topic) in Excel to make it clearer how they work.

As I wrote a few months ago, it is “far better to take a domain expert within your organization and teach them deep learning, than it is to take a deep learning expert and throw them into your organization. Deep learning PhD graduates are very unlikely to have the wide range of relevent experiences that you value in your most effective employees, and are much more likely to be interested in solving fun engineering problems, instead of keeping a razor-sharp focus on the most commercially important problems.

“In our experiences across many industries and many years of applying machine learning to a range of problems, we’ve consistently seen organizations under-appreciate and under invest in their existing in-house talent. In the days of the big data fad, this meant companies spent their money on external consultants. And in these days of the false ‘deep learning exclusivity’ meme, it means searching for those unicorn deep learning experts, often including paying vastly inflated sums for failing deep learning startups.”

Cutting through the hype (when you’re not an ML researcher)

Computational linguist Dan Simonson wrote a handy guide of questions for the to ask to evaluate NLP, ML, and AI and identify snake oil:

Is there existing training data? If not, how do they plan on getting it?

Do they have an evaluation procedure built into their application development process?

Does their proposed application rely on unprecedentedly high performance on specific AI components?

Do the proposed solutions rely on attested, reliable phenomena?

If using pre-packaged AI components, do they have a clear plan on how they will go from using those components to having meaningful application output?

As an NLP researcher, Simonson is excited about the current advances in AI, but points out that the whole field is harmed when people exploit the gap in knowledge between practiotioners and the public.

Deep learning researcher Stephen Merity (of Salesforce/Metamind) has an aptly titled post It’s ML, not magic: simple questions you should ask to help reduce AI hype. His questions include:

How much training data is required?

Can this work unsupervised (= without labelling the examples)?

Can the system predict out of vocabulary names? (i.e. Imagine if I said “My friend Rudinyard was mean to me” - many AI systems would never be able to answer “Who was mean to me?” as Rudinyard is out of its vocabulary)

How much does the accuracy fall as the input story gets longer?

How stable is the model’s performance over time?

Merity also provides the reminder that models are often evaluated on highly processed, contrived, or limited datasets that don’t accurately reflect the real data you are working with.

What does this mean for you?

If you are an aspiring machine learning practitioner: Good news! You don’t need a PhD, you don’t need to code algorithms from scratch in CUDA or MPI. If you have a year of coding experience, we recommend that you try Practical Deep Learning for Coders, or consider my additional advice about how to become a data scientist.

You work in tech and want to build a business that uses ML: Good news! You don’t need to hire one of those highly elusive, highly expensive AI PhDs away from OpenAI. Give your coders the resources and time they need to get up to speed. Focus on a specific domain (together with experts from that domain) and build a product that people in that domain need and could use.