Data Visualization, a key skill of the For Product function

Data science is a broad term and includes job roles with many different functions within organizations. This post presents a way of categorizing different job functions in the data science landscape, and identifies useful skills for each of these specializations.

Here are the four key functions of data science that I’ve experienced in my career, with job titles often associated with these roles:

For Product (Inference Scientist)

As Product (Applied Scientist)

For Operations (Systems Scientist)

As Operations (ML Engineer)

There are two dimensions that I’m using to classify job functions:

“For” vs “As”: Is the data scientist supporting a team that is building something (For), or building something themselves (As)?

Product vs Operations: Is the data scientist building something that is customer facing (Product) or a backend system that is critical to running the business (Operations)?

The key distinction for the first dimension is whether or not the data science team is building data products. If a data science team owns part of a system, then the team is responsible for implementation of findings. If the team works with another group that performs the implementation of findings, then data science is in a support role. The second dimension differentiates whether the output of the data science team is directly visible to customers. For example, the Netflix recommendation system is customer facing, since it impacts the titles and artwork shown to users, while a fraud detection system may be critical to running an online game, but not visible to users.

I’ve worked in many of these different functions myself: at Twitch I was embedded on the mobile product team, and had a product analytics focused role (For Product), at Windfall Data I have an applied science role focused on building customer-facing data products (As Product), and at Twitch I supervised a scientist focused on forecasting operational metrics of the platform, such as page-load times (For Operations). I haven’t worked in the As Operations function yet, but the most common example I’m aware of is ad bidding systems that companies such as Quantcast and Pinterest use.

Data Science for Product

This is the most common category of data science roles that I’ve experienced in the gaming industry. At Daybreak Games, EA, and Twitch, many data scientists had analytics focused roles that supported product managers or game producers. Many of these data science teams aspired to build data products, but didn’t have the tooling and infrastructure in place to own data products themselves. I’ve also seen this type of role referred to as inference data scientist or decision scientist.

One of the key responsibilities of this role is to provide insights to teams, which are then used to improve products and company roadmaps. This can include high-level analysis around strategy, or more tactical analysis on the performance of a specific product. Performing well in this role usually requires the following skills:

Exploratory Analysis: This involves using scripting and SQL to explore and summarize data sets and answer questions such as: can we identify which behavior is important to track for monitoring product health, and can we identify which factors are correlated with this behavior?

Experimentation: If the product team makes a change, how do you evaluate the impact? This can include A/B testing and staged rollouts.

Influence: If the data science team is constantly working on ad-hoc questions about the data, rather than having some autonomy to find useful insights, this role can become more of a business intelligence function. Successful data scientists in this role are able to get buy-in from teams to operationalize their findings in products.
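
To make the experimentation skill concrete, here is a minimal sketch of evaluating an A/B test with a two-proportion z-test. The function name, the conversion counts, and the choice of a z-test are my own assumptions for illustration; real experiments often need more care (sample-size planning, multiple metrics, sequential peeking).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) comparing conversion rates of B vs A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical counts: 1,000 users per group, control (A) vs. changed product (B).
lift, z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"lift={lift:.3f} z={z:.2f} p={p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but the decision to ship still depends on the size of the lift and on what the product team cares about.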

Having strong written and verbal communication is also important for all of these data science functions. It’s particularly useful for the product support function, in order to build influence with other teams.

Data Science as Product

This is another data science role focused on improving products, but the distinction from the previous function is that one of the key outputs is data products that power customer-facing products. At Twitch, the applied science team fit this function, and used machine learning to build products such as the Champion Detector for League of Legends.

Job titles for this function may include applied scientist or machine learning engineer. It’s also a role that often reports into an engineering manager rather than an analytics or science manager. Here are some of the skills that are useful for this type of role:

Machine Learning: While predictive modeling is a prerequisite for all data science functions, this role requires more hands-on experience working with different types of data sources including text, images, and video. It also requires knowledge of how to scale these predictive models.

Prototyping: It’s useful to be able to build MVPs of data products before allocating a significant portion of resources to building out a system.

Software Engineering: Building data products that scale requires knowledge of system programming languages that can be deployed in distributed environments. Code for data products needs to be robust and maintainable.

Data products are usually live systems, and data scientists in this function need the knowledge to be able to scale predictive models.

Data Science for Operations

This is a function that was in its infancy while I was at Twitch. The key responsibility of this position was to understand how different factors influence operational metrics of our products, such as page-load times. We labeled this role as a systems scientist, because it required building a deep understanding of our infrastructure and the various factors that could influence various system metrics.

This particular role was focused on root-cause analysis of degradations to system performance, but the broader focus of this function is building models to better understand how various internal and external factors impact systems. It requires the following skills:

System Infrastructure: Understanding how different factors influence operational metrics requires intimate knowledge of the systems and infrastructure used to build products. For example, tracking page loads requires knowledge of CDNs, caching, and API call dependencies.

Forecasting: In order to detect anomalies in metrics, it’s necessary to establish baselines and expected behavior. Forecasting can be used to model the factors that influence system behavior.

Alerting: This role may also be responsible for identifying when to alert other teams about anomalous system behavior. It’s important to be able to set thresholds for when to alert teams, without too many false positives.
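
As a rough illustration of how baselines and alert thresholds fit together, the sketch below flags a metric value when it strays more than k standard deviations from a trailing average. The trailing mean is a deliberately naive stand-in for a real forecast, and the function name, window size, threshold, and page-load numbers are all assumptions made for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, k=3.0):
    """Return indices whose value deviates from the trailing baseline by more than k standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # Skip perfectly flat histories, where any change would exceed a zero spread.
        if spread > 0 and abs(values[i] - baseline) > k * spread:
            anomalies.append(i)
    return anomalies


# Hypothetical page-load times in milliseconds, with one obvious spike.
load_times = [200, 205, 198, 202, 201, 203, 199, 450, 204, 200]
print(find_anomalies(load_times))
```

Raising k or widening the window reduces false positives at the cost of slower or missed detections, which is the trade-off the alerting skill is about. Note that in this simple version a spike also inflates the baseline for the points that follow it.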

Systems science is a newer job function, and requires much more knowledge of infrastructure than other data science roles.

Data Science as Operations

This is a data science role that is usually part of an engineering team, where the goal is to build data products that are required to run the business but are not customer facing. Building automated ad bidding systems is one example of this role, and building fraud detection systems is another. The main difference from the Data Science as Product function is that these systems tend to be much more automated. For example, an ad bidding system may use the same system for training and production, due to the scale of data and real-time requirements, while customer-facing data products can often be prototyped and iterated on at a smaller scale.

Here are some of the skills that are useful for this function:

Distributed Systems: This function requires knowledge of building distributed systems, which may include Spark or other cloud technologies for scaling out processes. At Windfall we use Cloud DataFlow.

Online Learning: The real-time requirements of these systems usually mean that batch learning processes are not appropriate, and instead online methods for updating models need to be leveraged.

DevOps: Building data products that run business functions means maintaining these systems, and this is the data science role with the most DevOps responsibility.
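
To show what an online method looks like in contrast to batch learning, here is a minimal sketch of logistic regression updated one event at a time with stochastic gradient descent. The class name, learning rate, and toy event stream are hypothetical, and a production system would likely rely on an established library rather than hand-rolled updates.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression that learns from one labeled event at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        # Clamp the score so math.exp cannot overflow on extreme inputs.
        score = max(-30.0, min(30.0, score))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """Apply one stochastic gradient step for a single (features, label) event."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Toy stream: a positive feature value tends to mean label 1, a negative one label 0.
model = OnlineLogisticRegression(n_features=1)
for _ in range(200):
    model.update([1.0], 1)
    model.update([-1.0], 0)
print(model.predict_proba([1.0]), model.predict_proba([-1.0]))
```

The model never sees the full data set at once; each event nudges the weights and is then discarded, which is what makes this style of update workable under real-time constraints.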

This is a function I don’t have experience with, but I view it as one of the most valuable roles in data science.

Conclusion

Data science teams may perform a variety of different functions. It’s important to have a clear charter for the team, so that you’re able to hire appropriately and support the needs of the organization. These functions require different skill sets, and a data scientist’s preferred function may change over the course of their career.