Tidepool is a nonprofit organization focused on delivering high-quality diabetes software, with a mission to make diabetes data more accessible, meaningful, and actionable. Tidepool believes in being radically transparent, so all of its work is done in the open: the Quality Management System, the Employee Handbook, FDA pre-submission meeting minutes, and, most importantly, the open source code are all available to the public.

Tidepool open sources all of its software releases in public GitHub repositories and encourages other companies to do the same, in the belief that this improves public health and safety by enabling broader community inspection of, and improvement to, the operation and security of the software.

What Brought You to Tidepool

I’ve spent over 30 years in tech in various technical leadership roles, most recently as the CEO of a cloud-native consultancy. I joined Tidepool because I believe in its mission: too many people suffer needlessly with this horrendous disease, and Tidepool gives them tools to manage it. In this role, I am responsible for the design and implementation of all aspects of the migration, including architectural design, selection of components, and integration.

The Tidepool team

Challenge

A little over five years ago, Tidepool.org launched our first service, enabling people with diabetes to upload and visualize their diabetes data, including insulin dosing, food intake, and blood glucose levels. Our legacy stack consists of a dozen or so microservices written in Node.js and Go. These microservices are deployed on Amazon EC2 instances that are constructed using AWS CloudFormation and managed with Ansible and SSH access. That infrastructure also includes custom-built components such as an API gateway, a service discovery system, and a load balancer.

We are currently developing a new product that will drive even more load to the existing backend system. To deploy that service, we need a backend that supports the users we have today and is stable, maintainable, and scalable enough to handle the new services and the users of those services. The existing backend was out of date, and because the original engineers who wrote the tools are no longer with the organization, adding new features and improving scalability or high availability had become effectively impossible.

Selecting the Solution Components

In the last five years, Google open-sourced Kubernetes, and many open source projects emerged to augment it with feature-rich, tested, and supported infrastructure tools.

To take advantage of the cloud native movement, we focused our investigation and evaluation on the following criteria:

Kubernetes native

Open source

Well maintained and adopted project

Low maintenance requirement

As a nonprofit organization, we make it a priority to find best-of-breed yet inexpensive tools that do not require a lot of resources to maintain.

Tidepool Solution

We began preparing our application services before 2019, and in January 2019 we began replatforming the environment. Our technology selections within each area of the environment are shown in the image below:

Tidepool platform stack before and after

Migrating to Kubernetes

Our project began before 2019 with the containerization of our dozen Node.js and Go microservices to familiarize our developers with containers. We started with Docker Compose and then shifted to Tilt for its flexibility and its support for very fast cycle times: source code changes are deployed quickly into an existing, local Kubernetes environment.

In January 2019, we brought up our microservices in a local Kubernetes (minikube) environment. Once we had the system up and running, we looked at our options for running Kubernetes in the cloud. We selected Amazon EKS to host our Kubernetes cluster and manage its control plane, alongside our existing infrastructure already hosted on AWS. We then added Weaveworks eksctl as the CLI tool to create and destroy our EKS clusters.

In this transition we adopted Weaveworks GitOps and Flux, a GitOps controller for Kubernetes that has recently become a CNCF sandbox project. This did a couple of things for us. First, we were able to eliminate the need for Ansible. Second, because we deploy several Kubernetes clusters, each cluster is managed with its own GitOps configuration repo.
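As an illustration only, the core idea behind a GitOps controller can be sketched as a reconciliation step: compare the desired state declared in Git with what is running in the cluster, and derive the actions needed to converge. This is not Flux's implementation; the `plan_reconcile` function, the dictionary shapes, and the image tags are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> container image tag.
State = Dict[str, str]


def plan_reconcile(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to make `live` match `desired`."""
    actions: List[Tuple[str, str]] = []
    # Anything declared in Git but missing or different in the cluster
    # must be created or updated.
    for name, image in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != image:
            actions.append(("update", name))
    # Anything running in the cluster but no longer declared in Git is removed.
    for name in sorted(live):
        if name not in desired:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"api": "api:1.4.0", "auth": "auth:2.1.0"}  # from the config repo
    live = {"api": "api:1.3.9", "legacy-gateway": "gateway:0.9"}  # from the cluster
    for action, name in plan_reconcile(desired, live):
        print(action, name)
```

Because the config repo is the single source of truth, one repo per cluster keeps each cluster's desired state independently reviewable and revertible.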

In our legacy system, secrets are stored in plaintext in a private GitHub repo. When we deploy a service with our custom provisioning tool, it grabs the secrets from the repo and copies them to a VM running our microservices. We eliminated this in our migration to Kubernetes and needed another mechanism, preferably one that we would not have to invent. In keeping with our focus on the sustainability of the new platform and on cost, we chose GoDaddy External Secrets with AWS Secrets Manager, so that we can also share secrets with non-Kubernetes services running in our AWS infrastructure.
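The general pattern behind an external-secrets controller can be sketched as follows: read a value from an external store and render it as a Kubernetes Secret manifest, whose `data` values are base64-encoded. This is a minimal sketch, not the GoDaddy project's code; the in-memory `store` dictionary stands in for AWS Secrets Manager, and the function names, secret names, and keys are hypothetical.

```python
import base64
import json
from typing import Dict, Mapping


def render_secret(name: str, namespace: str, values: Mapping[str, str]) -> Dict:
    """Build a Kubernetes Secret manifest; `data` values are base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


def sync_secret(store: Mapping[str, str], store_key: str,
                name: str, namespace: str) -> Dict:
    """Look up a JSON-encoded secret in the stand-in store and render it."""
    values = json.loads(store[store_key])
    return render_secret(name, namespace, values)


if __name__ == "__main__":
    # Assumption: the external store holds each secret as a JSON object.
    store = {"prod/auth": json.dumps({"password": "s3cr3t"})}
    print(json.dumps(sync_secret(store, "prod/auth", "auth", "tidepool"), indent=2))
```

Keeping the source of truth in the external store is what lets non-Kubernetes services read the same secrets without going through the cluster.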

Connect and Secure with an API Gateway and Service Mesh

During this process we also looked at how we handled application traffic, access, and security, and replaced many homegrown tools and manual processes with new tools.

We replaced our custom-built API gateway with the Gloo API Gateway from Solo.io. We experimented with other solutions as well but chose Gloo for its elegant API, its exceptionally clean implementation, and the outstanding responsiveness of the Solo.io team to issues and requests for enhancements.

Specifically, Gloo gave us a way to simplify our environment and meet HIPAA compliance requirements:

Gloo allowed us to simplify our DNS and Amazon ELB setup: instead of multiple DNS entries and multiple ELBs for a single virtual service, we now use one DNS entry and one ELB to support multiple virtual services. Gloo is one of the few gateways to support HTTP method-based routing, which avoids conflicts that we would otherwise have had with path-based routing alone and enabled us to eliminate the use of multiple domain names.

Gloo allowed us to replace our inconsistent application-generated logs with the Gloo/Envoy access logs to meet HIPAA compliance requirements for our environment.

Additionally, for HIPAA compliance of the communication within the cluster, we shifted from using a single static TLS certificate for all inter-service communication to a Linkerd service mesh with mutual TLS, which provides a separate, dynamically generated TLS certificate for each microservice. The Linkerd service mesh on AWS EKS was also integrated with our Gloo API Gateway. We chose Linkerd for its simplicity and stability.
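To illustrate why method-based routing removes conflicts that path-based routing alone cannot, here is a minimal sketch of a router that matches on both the path prefix and the HTTP method. It is not Gloo's API or configuration format; the `Route` type, the paths, and the upstream names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Route(NamedTuple):
    path_prefix: str
    methods: Optional[FrozenSet[str]]  # None means "any method"
    upstream: str


def match(routes: List[Route], method: str, path: str) -> Optional[str]:
    """Return the upstream of the first route matching both path and method."""
    for route in routes:
        if not path.startswith(route.path_prefix):
            continue
        if route.methods is not None and method.upper() not in route.methods:
            continue
        return route.upstream
    return None


# Two services share the same path prefix and are told apart by method,
# so they can live under one domain name behind one load balancer.
ROUTES = [
    Route("/data", frozenset({"POST", "PUT"}), "upload-service"),
    Route("/data", frozenset({"GET"}), "query-service"),
    Route("/", None, "web-frontend"),
]

if __name__ == "__main__":
    print(match(ROUTES, "GET", "/data/123"))
    print(match(ROUTES, "POST", "/data/123"))
```

With path-only matching, the two `/data` routes would collide and one of the services would need its own path or domain.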
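For readers unfamiliar with mutual TLS, the sketch below shows what the "mutual" part means on the server side, using Python's standard `ssl` module: the server not only presents its own certificate but also refuses clients that do not present a valid one. With a service mesh such as Linkerd this is handled by the sidecar proxies rather than by application code, so this is purely illustrative; the function name and file paths are hypothetical, and the paths are optional only so the sketch can run without certificate files.

```python
import ssl
from typing import Optional


def mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Mutual TLS: the handshake fails unless the client presents a
    # certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # This service's own identity (one certificate per service).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA used to validate peer certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


if __name__ == "__main__":
    ctx = mtls_server_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED)
```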

As we rolled the services out, we continued with the following changes to the environment:

Address latent issues with microservices inter-dependencies

Replace custom load balancer and service discovery with Kubernetes services

Introduce feature flags

Implement logging and dig into our noisy and extra-quiet services

Continual refinement of the infrastructure with distributed tracing

New authentication and authorization system
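On the second item above, replacing custom service discovery with Kubernetes services means a caller only needs a stable DNS name, which Kubernetes resolves to the service. The sketch below builds that in-cluster name using the standard `<service>.<namespace>.svc.<cluster-domain>` form; the helper name, the service name, the namespace, and the port are hypothetical.

```python
def service_url(service: str, namespace: str = "default", port: int = 80,
                cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster URL for a Kubernetes Service.

    Kubernetes DNS resolves <service>.<namespace>.svc.<cluster_domain>
    to the Service, which then balances across its healthy pods.
    """
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


if __name__ == "__main__":
    print(service_url("auth", namespace="tidepool", port=8080))
```

No registry lookup or client-side load balancing is needed in application code, which is what makes the custom components unnecessary.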

The Future of Diabetes Management with Technology

Before the migration, there were literally dozens of issues in our tracking system that we had labelled technical debt and that no one realistically expected to be resolved. Many of those problems are now addressed by our Kubernetes efforts, and many more will be easy to address with our open source infrastructure. As we modify our API, rolling out those changes will be straightforward with Kubernetes and Gloo. We also plan to investigate Flagger for progressive delivery, a technique that would have been prohibitively cumbersome to implement without Kubernetes.
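Progressive delivery can be sketched as a loop that shifts traffic to a new version in small steps and rolls back as soon as a health metric degrades. The following is a minimal illustration of that idea, not Flagger's API or configuration; the function name, the step size, the weight ceiling, and the success-rate threshold are all hypothetical.

```python
from typing import Callable, List


def run_canary(success_rate: Callable[[int], float], step: int = 10,
               max_weight: int = 50, threshold: float = 0.99) -> List[str]:
    """Shift traffic to the canary in steps; roll back if the metric drops.

    `success_rate(weight)` stands in for a metrics query made while the
    canary receives `weight` percent of traffic.
    """
    log: List[str] = []
    weight = 0
    while weight < max_weight:
        weight = min(weight + step, max_weight)
        rate = success_rate(weight)
        if rate < threshold:
            log.append(f"rollback at {weight}% (success rate {rate:.3f})")
            return log
        log.append(f"advance to {weight}%")
    log.append("promote")
    return log


if __name__ == "__main__":
    # A healthy release is promoted; a failing one is rolled back early.
    print(run_canary(lambda w: 1.0, step=25))
    print(run_canary(lambda w: 0.9 if w >= 20 else 1.0))
```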

These savings may seem incidental or minor, but they have a major impact on productivity. For a nonprofit healthcare startup with HIPAA and FDA quality requirements, productivity is already hard enough to achieve! But among the most important achievements of this project is our ability to scale quickly when the need arises.

We are especially pleased with the outstanding support and responsiveness we have received from Solo.io. Our infrastructure depends on the Gloo API Gateway, and we have submitted pull requests (#547, #734, #743, #949) that Solo.io has integrated and released faster than our own internal processes would allow! We have also submitted requests for enhancements (e.g. #1062 for the Gloo RouteTable design, #814 for access logging) and an occasional bug report, all of which were acted upon quickly and expertly. The Solo.io team is also fun and easy to work with.

We don’t know how popular the release of our next product, Tidepool Loop, will be, but because people with diabetes will be relying on our services to be available, we can’t fail. The number of people with diabetes has risen from 108 million in 1980 to 422 million in 2014. The global prevalence of diabetes among adults over 18 years of age has risen from 4.7% in 1980 to 8.5% in 2014. If we are to have a serious impact on the lives of people with diabetes, our product must work and work well.
