Executive Summary

CI/CD and DevOps efforts at Punchh, as at most organizations, grew organically with no clear roadmap or strategy. While this worked in the short term, the Punchh team realized that it wasn’t a scalable, efficient, or repeatable approach. Their CI/CD process was slow and people-dependent, with extensive automation code and manual steps holding together a fragmented toolchain.

Aditya Sanghi, the Punchh CTO, realized that he had to find an approach that would provide them with economies of scale as their customer base grew. His search for a CI/CD platform led him to Shippable, where he found a team that was willing to partner with him and provide him with a complete solution.

Using Shippable, Punchh achieved Continuous Delivery across all their services in less than 1 month, while shrinking their automation code by 80%, and increasing deployment speed by 5x.

Customer

Founded in 2010, Punchh helps restaurants and other retailers acquire and engage customers, predict consumer behavior, and increase sales by executing tailored marketing campaigns. Leading global chains in the restaurant sector rely on Punchh to grow revenue by building customer relationships at every stage, from anonymous, to known, to brand loyalists.

Punchh serves more than 100 enterprise customers, representing $12B+ in annual spend.

Challenge

For compliance and security reasons, the Punchh team deploys and operates a dedicated instance of its marketing platform for each customer. This means that for each new customer, they create a new AWS account, several regions within the account, and infrastructure for each region. They also create/update deployment pipelines for their 40+ microservices to deploy across all regions in this new account.
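To make the scaling problem concrete, here is a minimal Python sketch of how the number of deployment targets grows under this model. The account names, region lists, and service names are hypothetical placeholders; only the figure of roughly 40 microservices comes from the text.

```python
def deployment_targets(accounts, regions_by_account, services):
    """Return one (account, region, service) tuple per pipeline to maintain."""
    return [
        (account, region, service)
        for account in accounts
        for region in regions_by_account[account]
        for service in services
    ]


# Hypothetical inputs: account names and region lists are illustrative only.
accounts = ["customer-a", "customer-b"]
regions_by_account = {
    "customer-a": ["us-east-1", "us-west-2"],
    "customer-b": ["us-east-1", "eu-west-1", "ap-southeast-1"],
}
services = [f"service-{i:02d}" for i in range(1, 41)]  # 40 microservices

targets = deployment_targets(accounts, regions_by_account, services)
print(len(targets))  # (2 + 3) regions * 40 services = 200
```

Even with only two customers in this sketch, there are 200 pipelines to keep consistent, which is why per-pipeline custom work becomes costly as accounts are added.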

When they started out, the DevOps team at Punchh cobbled together a process from many different tools: Jenkins, CircleCI, and Amazon services (Lambda, ECS, and ECR) for CI and deployments, CloudFormation for infrastructure provisioning, and several homegrown bash scripts and manual steps. While this approach worked for the first few customer installations, they quickly realized that it did not scale as Punchh’s business grew rapidly, mainly for the following reasons:

Each pipeline was custom-built, with very little reusability. This meant that adding new pipelines or changing existing ones was time-consuming and prone to errors and drifts.

People-dependent manual steps led to many inadvertent mistakes that hurt reliability and increased risk of churn. It also introduced delays while waiting for manual input.

Secrets weren’t managed centrally and not abstracted from automation scripts. Adding or updating secrets was error prone and introduced a potential security risk.

Due to several tool limitations, they needed to run dedicated instances of Jenkins and maintain CloudFormation config per customer. Since these are UI-driven tools, keeping configuration in sync across accounts and regions was a nightmare to scale as number of accounts grew.

The fragmented toolchain required a steep learning curve for new developers joining the team, and there was a lot of tribal knowledge one needed to know in order to deploy anything.

Aditya Sanghi, the Punchh CTO, was frustrated that the current CI/CD process lacked economies of scale. “We lacked a clear CI/CD strategy and our automation was too people-dependent and custom as a result. Onboarding new customers or making changes to existing services required a lot of custom work and our environments and configs were prone to drifts. We needed a way to keep up with a rapidly growing customer base without burning out the team.”

Aditya realized they needed to start over and take a much more systematic approach to solving this problem. At the same time, he was worried that diverting resources to implementing CI/CD would be time-consuming and disruptive, and would slow down core product development. It was a classic catch-22.

Researching CI/CD platforms

Aditya investigated several CI/CD platforms before choosing Shippable. He wanted to find one that eliminated manual steps, required very little custom coding, and could be evolved easily to meet future requirements.

Shippable satisfied all evaluation criteria and stood out because we took the approach of “solving the problem” as opposed to just “selling a platform”. The team was willing to actively partner with Punchh, not only offering a complete solution that leveraged the Shippable platform, but also going well beyond that by providing services to help redesign and implement Punchh’s CI/CD workflows with an “everything as code” philosophy while keeping scalability, consistency, and simplicity in mind.

Solution

The Shippable team worked with Punchh to design and automate CI/CD and infrastructure provisioning workflows, as well as define best practices and processes for managing deployments. In less than 4 weeks, the team leveraged the Shippable platform to achieve the following:

Toolchain was streamlined to include Slack for chatops, Shippable for CI/CD and infrastructure automation workflows, and Terraform as the infrastructure definition language.

All CI/CD and infrastructure provisioning workflows were implemented as-code with a YAML-based configuration, which was stored in GitHub and versioned.

The Shippable team built streamlined automation templates, using common functions and minimal custom config. The Punchh team easily scaled this across AWS accounts and regions.

Secrets were managed at an organizational level and secured with RBAC. They are encrypted at-rest and in-flight and completely abstracted from automation scripts and logs.

Docker build process was improved with a two-step process that only built incremental changes.

Permissions could be configured at a workflow or job level to provide visibility across the team but allow execution access on an as-needed basis.

Automated infrastructure provisioning was implemented with Terraform templates as the definition language and Shippable as the automation engine.
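The template-driven approach described above can be sketched roughly as follows. This is an illustration only, under stated assumptions: the template keys (`job`, `account`, `region`, `secrets_ref`), the file layout, and the `render_pipelines` helper are hypothetical and are not Shippable's actual YAML schema or API. The point is that one shared template plus a few parameters yields every per-region, per-service config, and that secrets are referenced by name rather than embedded.

```python
from string import Template

# Hypothetical template: the keys are illustrative, not Shippable's real schema.
# The secret is referenced by name only; its value never appears in the config.
PIPELINE_TEMPLATE = Template(
    "job: deploy-$service-$region\n"
    "account: $account\n"
    "region: $region\n"
    "secrets_ref: $account/deploy-credentials\n"
)


def render_pipelines(account, regions, services):
    """Render one config per (region, service) pair from the shared template."""
    return {
        f"{account}/{region}/{service}.yml": PIPELINE_TEMPLATE.substitute(
            account=account, region=region, service=service
        )
        for region in regions
        for service in services
    }


# Hypothetical customer account, regions, and services.
configs = render_pipelines("customer-a", ["us-east-1", "eu-west-1"], ["api", "worker"])
print(len(configs))  # 2 regions * 2 services = 4
print(configs["customer-a/us-east-1/api.yml"])
```

Under this kind of design, onboarding a new account means adding parameters rather than writing a new pipeline, which is consistent with the reuse the team reports.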

As a result of these changes, Punchh achieved Continuous Delivery with streamlined, interconnected CI/CD workflows spanning development, deployment, and infrastructure provisioning activities.

Results

With Shippable, the Punchh team achieved zero-touch automation with a streamlined toolchain in less than 1 month. Some highlights of the new CI/CD automation workflows are:

Customer onboarding takes less than one day, during which the Punchh team can bring up a brand new instance of their service. This was reduced from 2-3 weeks of effort.

Deployment reliability has gone up 80%, and time spent in debugging release issues has reduced by 4x.

Automation code reuse has gone from ~10% to 95%.

Adding new workflows or changing existing ones takes less than half a day, down from 2-3 weeks.

Ramp up time for new developers has gone from months to less than 1 week.

Docker build performance has improved 4x, from 120 mins/day to 30 mins/day.
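The Docker build figures in the list above (120 mins/day down to 30 mins/day) can be read either as a speedup factor or as a percentage reduction in build time. The short sketch below works through both; the function names are illustrative, and only the two time figures come from the text.

```python
def speedup_factor(before, after):
    """How many times faster the new process is."""
    return before / after


def percent_reduction(before, after):
    """Share of the original time that was eliminated, in percent."""
    return (before - after) / before * 100


before_mins, after_mins = 120, 30  # minutes per day, as reported in the text

print(speedup_factor(before_mins, after_mins))     # 4.0
print(percent_reduction(before_mins, after_mins))  # 75.0
```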

The Punchh team completely owns their CI/CD automation workflows and deployments. Due to reusability, they have reduced their automation codebase by 80% and can easily scale it horizontally and vertically.

Aditya Sanghi summarized the impact of this CI/CD transformation: “I love that every service is now deployed in a consistent fashion and the complexity and quantity of automation code has gone down significantly. Anyone from our team can now deploy with confidence and we have greater visibility into our builds and deployments. CI/CD is no longer the bottleneck slowing us down.”

Conclusion

If your organization is in the process of automating CI/CD and/or infrastructure provisioning workflows, you need to take a template-focused approach that makes your workflows scalable and consistent. It is worth exploring a relationship with a vendor that can provide a holistic and systematic approach to automation that goes beyond a platform or toolchain.