Trust is a funny thing. One day you’re on top of the world, the next you can do no right. The recent election has obviously shaken trust in the analytics establishment. While there are many reasons why this might have happened, an untrusted analysis is an unused one, regardless of its quality. So how does one go about building, or rebuilding (as the case may be), trust in the face of challenges and failure?



If you’ve been following us at Clover, you’ve probably noticed that we approach data science as a human problem. Our goal is to help humans at every level make better decisions — either through analysis or data products. And what’s the cornerstone of solving human problems? Trust. Whether it’s a result generated by a team member, our team as a whole, or a system we’ve designed — all of our data consumers, from executive leaders, nurse practitioners and wellness managers, to our call center agents and provider services reps — need to trust in the output. If they don’t, they won’t use it.



So how do we build trust? The easy answer is by producing high quality work. The hard part is how you get there.



In our experience, high quality work is iterative. If you’re doing things right, the quality you strive for is never where you’d like it to be at the beginning, but improves over time. In order to get the quality where you need it, you must have processes that enable continuous improvement, both in your people and in your systems. We do this with the three Ts: Training, Testing, and Transparency.

First, the Failures

And we’ve had our share:

We miscalculated a core business metric over and over again. The metric in question had a ton of edge cases and many unknowns in the calculation. We made every error in the book, from misinterpreting data to typos in the code to copy-and-paste mistakes. Needless to say, our executive team was not impressed.

A model we shipped in one of our internal applications was generating results that were counterintuitive to our team members who were using it. Essentially, there was a mismatch between what the data predicted and what they could actually achieve in the field. The team’s response to this was to rebel against using the software.

We reversed the rank ordering of a queue in one of our workflow tools due to some faulty assumptions at the interface between the data pipeline loading the work list and the application where the workflow was embedded. Our operations team tried to flag symptoms of this in several ways but weren’t using the level of specificity our technical teams were used to. The underlying problem persisted for 7 months before discovery (even though the users knew it was broken!).

In the first case, we eventually got it right after much pain. In the second case, we rolled the model back. After the third… well, we had a giant hole to dig ourselves out of. All of these situations made it clear that we needed to do better. So how did we do it?

Training

No one is a finished product. From new junior hires to experienced leads, we all have things we need to work on. Creating space and processes for improvement and skills development for our team members is one element of our trust-building strategy. People may need improvement along several dimensions:

Communication skills, meaning a person’s ability to translate their thoughts to others. Our company is home to a large group of smart, diverse, motivated people. While we are all trying to do right by the company and our members, there are many strong opinions on how to execute against this. Aligning personalities and getting everyone marching in the same direction means that our team needs to be good at many communication styles and pathways.

Methodological skills, meaning a person’s ability to translate their thoughts into math. As our operations matured, we reached the peak of what simple counts could give us (and they got us pretty far!), and started operating closer to the margins. More difficult distributional and model-based thinking became the key to solving posed problems, especially when examining thorny questions around skill ranking and experimental analysis with low sample sizes. Our team was tenacious at understanding data and extracting bulk value from it, but we didn’t all have the skill set needed to push to the next level.

Programming skills, meaning a person’s ability to translate their math into code. As our membership grew, so did our data — in both size and complexity. Edge cases we hadn’t considered started to appear, causing failures in our pipelines. Queries and calculations that had been running in a few minutes started to take hours. More and more code was duplicated in our repos. We don’t focus on technical skill in our hiring process, so none of this was a surprise to us, but we realized we needed to help our team members up-skill explicitly in this area.

We’ve developed and embarked on a long skills-development process, investing in our team members as they’ve invested in us. We produce many “How to” and “Data Scientists’ Guides to” documents, to give people background in concepts they may not have thought much about, such as versioning and working in branches, deploying code, and data modeling. We use code reviews as a tool to help improve the implementations in our code bases. We run a “lunch-and-learn” program, where we cover elements like unit testing, modeling methodologies, and pipeline implementations. And we work directly with individuals to help them hone their skill base in the dimensions they are looking to improve. Most importantly, we acknowledge that there is no forcing function as powerful as necessity, and so we ensure that team members consistently work on projects that stretch their skills and help them grow.



Seeing us put effort into improvement helps other teams build trust in us. By striving to be better we show others that we care about how our work affects their efforts and outcomes. So when something does go wrong, they know we’ll do what it takes to prevent that same mistake from occurring again.

Testing

Training improves our skills, but can’t solve all problems. Our data, and therefore our transforms, calculations, and algorithms, are highly complex. It is unreasonable to ask humans to reason about all of this complexity and expect nothing to ever break, no matter how talented they may be. Fortunately, this is a place where computers excel.



We’ll be writing about this in more detail in an upcoming post, but over the past year we’ve added over 3,000 automated tests to our analytics pipelines. These include consistency tests, validation tests, and functional tests. At a high level, our tests allow us to build confidence that our system is working as expected:

Consistency tests check that the tables that underlie all of our infrastructure have a consistent dependency structure and that the schema will build. These tests prevent typos in column names or inconsistent table definitions from being added to a production run, which would prevent data from propagating through the production systems. (Kudos to everyone who caught the typo in the initial version of this: test everything!)

Validation tests check that the data in the production tables make sense, relative to acceptance conditions we’ve decided upon. They throw errors when data flowing through the production pipelines looks corrupted.

Functional tests check that each calculation or data transform is doing what we designed it to do. Akin to unit tests, these are the most labor intensive, as they require that the data scientist reason about the edge cases that could cause the calculation to fail and generate test data that encapsulate those edge cases. Over time, these catch regressions in the code as implementations are changed or assumptions violated.

We use pytest and testing.postgresql along with some custom structures put together by our data engineers to make this happen. All of our tests are run as part of a continuous integration process.
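A validation test follows the same shape. The sketch below is self-contained, using Python’s built-in sqlite3 so it runs anywhere; in a real suite a pytest fixture would instead hand the test a throwaway Postgres instance via testing.postgresql, loaded by the pipeline. The table, columns, and acceptance conditions here are hypothetical:

```python
import sqlite3


def fetch_validation_failures(conn):
    """Return rows violating our (hypothetical) acceptance conditions:
    every member must have a non-null id and a risk score in [0, 1]."""
    return conn.execute(
        """
        SELECT member_id, risk_score FROM member_scores
        WHERE member_id IS NULL
           OR risk_score < 0.0
           OR risk_score > 1.0
        """
    ).fetchall()


def test_member_scores_pass_acceptance_conditions():
    # Build a tiny in-memory table inline; a real fixture would provide
    # a temporary database populated by the production pipeline code.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE member_scores (member_id INTEGER, risk_score REAL)")
    conn.executemany(
        "INSERT INTO member_scores VALUES (?, ?)",
        [(1, 0.12), (2, 0.87), (3, 0.5)],
    )
    # The test fails loudly if any row looks corrupted.
    assert fetch_validation_failures(conn) == []
```

Running checks like this on every pipeline run is what lets corrupted data throw an error before it reaches a consumer.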



Through our tests, we know very quickly if something has broken in the course of our development efforts or through corrupt data. Copy-and-paste errors and typo bugs have been virtually eliminated, and we can see that our system as a whole is improving daily. This solves the critical problem of confirming that what we’ve built actually works. It has also reduced the number of failures in our production systems over time, which is critical for the teams that rely on them for their work.

Transparency

And yet, despite our best efforts, failures will still occur. In fact, precisely because of these efforts, the severity of these failures increases over time: the more robust our processes, the more systemic an error must be to get through. Thus the third leg of our trust-building strategy is transparency. Transparency around:

Who we are: our strengths and challenges, what we are focusing on improving, and how to best work with us.

Why we decided to build something and where it fits in the company roadmap.

What we have built and how it works: the assumptions that went into our efforts, the mechanics that produce the end result, and the validations that show us the feature is working.

And when a problem happens:

Where the failure occurred and what was affected.

Why the failure occurred, who will fix it, and how we will mitigate the risk of future such events.

We do retros on all of our projects, even the successful ones, and share what we’ve learned. We dig into our failures to understand where our processes (not our people) have broken, and make sure that the company knows what was discovered. Mistakes and errors will happen, especially in a young company trying to do difficult things. Transparency helps our partner teams trust that we won’t hide when things go wrong, that we will ensure they are aware of system issues that affect their work, and that we will identify places where we need to improve as a team and work to make them better.

Our Results

We have seen a marked improvement in the quality of our work since we started emphasizing these programs. Pipelines fail less often, errors are caught earlier, and regressions have all but disappeared. Moreover, we have forged relationships across many different departments by delivering value more consistently, which has helped everyone to understand our point of view.



As always, your mileage may vary :-). And if this approach appeals to you, we’re hiring!