It’s nothing like being at a big, mature company.

This’ll probably be an unbounded series of posts, all spawned from this question that came up in the awesome community that is the data-nerd Twitter cluster:

Some Background

I’ve spent almost 12 years now at companies of 15 to 150 people, wearing the various hats of data analyst, engineer, and occasionally scientist. Wandering into mega-corp Google Cloud as a UX researcher is a bit out of character, but with new products constantly being churned out, it feels like a startup in terms of questions and chaos, despite the billions in gross revenue involved.

I’ve worked in a mix of businesses: interior design (office design), ad tech, social networks, link shortening, e-commerce, and now enterprise cloud. I come from a social science background, with a dash of NLP, applied math, and business administration. I mostly live in back-end systems, logs, and SQL.

In a word, I’m a generalist, the sort that people seem to recommend for startups, and that’s where I’ve thrived my whole career.

Those are my biases, and the experiences I draw from. If you’re in a startup where AI/ML is literally the core foundation of the business, I’m likely irrelevant to your needs. YMMV.

Being the 1st “Data Person”

I’ll go ahead and say it: small companies don’t need a data scientist; they need a “data person.” They might call the job “data scientist/engineer/analyst/ninja,” whatever.

My experience is that somewhere between 20 and 60 employees, there are enough customers, enough accumulated data, and enough role specialization that hiring someone who can turn data into useful business insight starts to justify the cost. Until then, companies make do with the skills they have.

The job title can be almost anything, but the job descriptions tend to be various mixes of:

Make sense of the data we have
Help build out our data systems
Help us be data driven/run experiments
Grow the business
Education/certifications that may or may not be relevant to anything

Quite often, they don’t fully understand what they need. There’s just a general sense of “we have data, it seems useful, but we don’t have anyone with the skills to make it useful.”

In practical terms, there are two big things someone taking this position needs to do in parallel:

Help the company succeed TODAY

Set up the company to be data driven TOMORROW

Helping the Company Succeed TODAY

Startups are surrounded by uncertainty. Their production systems can be dodgy. They’re not sure who their customers are or what those customers are doing with the product. They don’t know whether the data they have is useful, or how to make decisions with it.

Smart answers to these questions lead to smarter decisions and, hopefully, that mythical hockey-stick growth everyone dreams of. The problem is that most of these questions don’t lend themselves to fancy methods. The useful methods are often a century old and/or qualitative rather than quantitative.

Most DS methods are at their most powerful when optimizing an existing process: they’ll get you 5%, 10%, even 25% growth on things like customer acquisition, conversion, retention, and spend. A/B testing, recommendation systems, ML classifiers: all of them help you optimize. The gains are real, quantifiable, and can be significant, but early on there are likely bigger fish to fry.

The biggest impacts early on usually come from insight, the kind that changes what the company fundamentally does. Insights come from very pedestrian things, like user research into preferences and behavior that uncovers a new marketing concept for the sales folks, or helping product teams realize that the feature most vocally hated on Twitter is actually used by 90% of paying customers, so they shouldn’t drop it for no reason.

My view of this “help the company now” role is that the data person is a force multiplier. People within the business have problems; the job is to help them solve those problems.

Being the 1st “Data Person” = Being a “Scientist with data”

To me, being a scientist means that you have a problem, a research question, and you use whatever methods you can to reach a solid answer to that question.

As data scientists, we tend to answer questions using quantitative methods and data collected from systems, but that’s not the only path to insight. Sometimes you flat out observe or ask users (qualitative methods), or you go out and collect data (experiments and surveys), or you watch others (competitive analysis). A good scientist doesn’t define themselves by their methods, and neither should the 1st data person (or any subsequent data person).

The goal is to answer the pressing business questions: “Why is no one using our product?” “How come our returns are so high?” “Should we be running this expensive sale or not?” “What drives customer churn?” “What’s a customer’s lifetime value, and what drives it?”

Becoming Data Driven TOMORROW

A common trap I see is people coming out of data science programs into these positions expecting to use sexy things like Spark and apply RNNs to their work. Sadly, what they want to do sits on top of a mountain of foundational work that needs to be done first, from both an engineering standpoint and a cultural one. The mismatch is brutal.