
What it's about

Just knowing about techniques is akin to knowing the animals in a zoo -- you can name them, describe their properties, perhaps identify them in the wild.

Understanding when to use them, and being able to formulate, build, test, and deploy working mathematical models within an application area while avoiding the pitfalls: these are, in my opinion, the skills that set a practitioner apart.

The emphasis should be on the science: applying a systematic, scientific approach to business, industrial, and commercial problems. But this requires skills broader than data mining and machine learning, as Robin Bloor argues persuasively in "A Data Science Rant".

So what can one do?

Application areas: learn about various application areas close to your interests or those of your employer. The particular area is often less important than understanding how the model was built and how it was used to add value in that area. Models that succeed in one area can often be transplanted to other areas that work in similar ways.

Competitions: try the data mining competition site Kaggle, preferably as part of a team. (Kaggle is a platform for predictive modeling competitions: companies, governments, and researchers post datasets and problems, and data scientists compete to produce the best solutions.)

Fundamentals: there are four: (1) a solid grounding in statistics, (2) reasonably good programming skills, (3) an understanding of how to structure complex data queries, and (4) the ability to build data models. If any of these is weak, that is an important place to start.

A few quotes in this respect:

``I learned very early the difference between knowing the name of something and knowing something. You can know the name of a bird in all the languages of the world, but when you're finished, you'll know absolutely nothing whatever about the bird... So let's look at the bird and see what it's doing -- that's what counts.'' -- Richard Feynman, "The Making of a Scientist", p. 14 in What Do You Care What Other People Think?, 1988

Keep in mind:

``The combination of skills required to carry out these business science [data science] projects rarely reside in one person. Someone could indeed have attained extensive knowledge in the triple areas of (i) what the business does, (ii) how to use statistics, and (iii) how to manage data and data flows. If so, he or she could indeed claim to be a business scientist (a.k.a., “data scientist”) in a given sector. But such individuals are almost as rare as hen’s teeth.'' -- Robin Bloor, A Data Science Rant, Aug 2013, Inside Analysis

And finally:

``The Map is Not the Territory.'' -- Alfred Korzybski, 1933, Science & Sanity.

Most real, applied problems are not accessible solely from ``the map''. To do practical things with mathematical modelling one must be willing to get grubby with details, subtleties, and exceptions. Nothing can substitute for knowing the territory first-hand.