Media got it wrong: HealthCare.gov failed despite agile practices

This article has been updated to correct a reference to the typical duration of a sprint.

Many websites, including the New Yorker, the Washington Post and MedCity News, have proclaimed that HealthCare.gov would not have failed if it had just used “modern” software practices, known as agile development, as commercial companies do. The convenient meme is that the poor, backward government simply is not as up to date as the commercial world.

Unfortunately, these sites must have been simply parroting the shtick of agile consultants because the HealthCare.gov front-end GUI and back-end data services hub were both developed using agile processes. It is a shame that harsh reality had to ruin a nice and convenient silver bullet.

I have seen some of the developer documentation, and it clearly discusses sprints, user stories and incremental testing — all of which are hallmarks of an agile process. A sprint is a fixed duration of time (normally two weeks) where specific work is completed. A user story is a short description of a feature from the end-user’s perspective that follows this template: “As a <type of user> I want <to perform some task> so that I can <achieve some goal/benefit/value>”.
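To make the template concrete, here is a minimal sketch of a user story as a small data structure. The class name, field names and the sample story are hypothetical illustrations, not taken from the HealthCare.gov documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """One user story, split into the three slots of the template."""
    user_type: str  # <type of user>
    task: str       # <to perform some task>
    benefit: str    # <achieve some goal/benefit/value>

    def render(self) -> str:
        # Fill the three slots into the standard sentence.
        return f"As a {self.user_type} I want {self.task} so that I can {self.benefit}"


# Hypothetical example story for a front-end feature.
story = UserStory(
    user_type="shopper",
    task="to compare health plans side by side",
    benefit="choose coverage that fits my budget",
)

if __name__ == "__main__":
    print(story.render())
```

Note that every slot assumes a human actor with a goal, which is why the format fits user-interface features so naturally.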

The bottom line is that those people who claimed that all would be well if HealthCare.gov had used an agile process are wrong. The reality is that the developers did use agile, and the project failed miserably. Before the agile practitioners, fans and consultants get in an uproar with the chant, “but they did it wrong,” let me examine some of the facts of what they did and compare them to some of my recent experiences in requirements analysis and the design of data integration hubs.

1. User stories vs. requirements decomposition. I am currently involved in a Defense-related big data project where we are analyzing and decomposing customer requirements into system requirements. The process we are using begins with a detailed set of customer requirements that include both functional requirements (what the software should do) and performance requirements (metrics it should meet). This is painstaking work, as we analyze, debate, discuss and finally decompose hundreds of end-user requirements. Sometimes this is done with the customer, and sometimes we generate requests for information to clarify a vague requirement. The goal is to generate a system requirements document and then to further decompose those system requirements into software and hardware requirements. It is a robust process, staffed by dedicated and talented engineers, and it is working well. It will take a few months, but when it is done we will know exactly what we have to build.

I have done this process many times and if it is managed well, it works well. In contrast, the data services hub documentation refers to user stories for its requirements. Let’s think about that for a moment with our designer hats on. A complex back-end data services hub — a piece of software with zero actual, living, breathing end-users — has to be described in terms of “user” stories. Does something sound off-key to you? It should, because user stories are great for user interfaces but poor, confusing and often misconstrued for non-observable behavior. Evidence of this is widely available in the many debates and questions on sites like Stack Overflow, CodeProject and blogs. Yes, I’m sure you can bend and twist user stories to address non-user based functionality, but should you? In my opinion, a software requirements document with UML (Unified Modeling Language) use-case diagrams would have served the data services hub more effectively.

2. Design documents. For the state of Florida, I led a team in designing and developing a data integration hub that recently passed its first milestone. The software design document was reviewed by the customer, Gartner, and the key software infrastructure vendor. Such due diligence has paid off in keeping us on time and within budget. In contrast, what are the agile design artifacts, and what due diligence was undertaken to ensure the HealthCare.gov developers followed best practices? Again, we see a site that failed a test of 200 to 300 people when it was supposedly designed for 50,000 simultaneous users (and even that number is a gross underestimate). Where are the design and architecture documents that show how the system was built to be scalable? Oh, agile processes don’t like design documents … hmmm, that could make due diligence difficult (or even impossible).

3. Asynchronous vs. synchronous. Recently, CGI reported that the “hub services are intermittently unavailable.” In examining some of the Business Service Description (BSD) documents, we see that key interfaces (like verify income and verify citizenship) were designed as synchronous instead of asynchronous interfaces. This is strange because many frameworks and platforms, like Google Web Toolkit, Android, AWS Flow Framework, Play and many others, promote asynchronous calls as a best practice (and some mandate it).

Additionally, modern cloud-native applications built for massive scalability and elasticity should be based on loose-coupling, messaging and asynchronous calls. Frankly, for a site like this that requires high levels of reliability and scalability, a synchronous API design for the data hub is inexcusable. There was enough time, enough money and enough political muscle (as the president’s signature achievement) to get it right (even given intransigent partners).
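The difference between the two calling styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the service names echo the interfaces mentioned above, but the latencies, function names and return values are hypothetical and say nothing about how the actual hub was built.

```python
import asyncio
import time

# Hypothetical per-call latencies in seconds; not taken from the BSD documents.
SERVICE_DELAYS = {"verify_income": 0.10, "verify_citizenship": 0.15}


def call_blocking(service: str) -> str:
    """Synchronous call: the caller is stuck until the service replies."""
    time.sleep(SERVICE_DELAYS[service])
    return f"{service}: ok"


async def call_non_blocking(service: str) -> str:
    """Asynchronous call: the caller yields control while it waits."""
    await asyncio.sleep(SERVICE_DELAYS[service])
    return f"{service}: ok"


def verify_synchronously() -> list:
    # Calls run one after another, so the waits add up.
    return [call_blocking(service) for service in SERVICE_DELAYS]


async def verify_asynchronously() -> list:
    # Calls are issued together, so the total wait is roughly the slowest one.
    calls = (call_non_blocking(service) for service in SERVICE_DELAYS)
    return list(await asyncio.gather(*calls))


def timed(fn):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    sync_result, sync_seconds = timed(verify_synchronously)
    async_result, async_seconds = timed(lambda: asyncio.run(verify_asynchronously()))
    print(f"synchronous:  {sync_seconds:.2f}s {sync_result}")
    print(f"asynchronous: {async_seconds:.2f}s {async_result}")
```

In the synchronous version one slow or unavailable service stalls everything queued behind it. In the asynchronous version the caller is free to do other work, or to time out and retry a single call, while the remaining requests proceed.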

Let me close by clarifying why I think agile is a good thing, even though I don’t agree with all its practices. Agile is part of the evolution of the software development process that gets some things right and some things wrong. I like many of the more moderate parts of agile, such as small iterations, test-driven development and refactoring. However, I advocate a more balanced, in-between approach, especially in relation to requirements and design.

I look at agile as a Stage 2 technology in accordance with Robert Heinlein’s three stages of technology: “Every technology goes through three stages: first, a crudely simple and quite unsatisfactory gadget; second, an enormously complicated group of gadgets designed to overcome the shortcomings of the original and achieving thereby somewhat satisfactory performance through extremely complex compromise; third, a final stage of smooth simplicity and efficient performance based on correct understanding of natural laws and proper design therefrom.”

The key point is that agile is a reaction to the waterfall method and, as with most reactions, the pendulum swung a bit too far. Thus we can expect a more moderate third-stage technology to get the balance right. In relation to HealthCare.gov, an agile process was implemented and the software was a national failure. This does not mean agile was the primary cause of that failure, but it is not unreasonable to assume it played a part. My hope is that we can learn from this mess and, through it, forge a better software development process that strikes the right balance between the extremes.

Michael C. Daconta is the Vice President of Advanced Technology at InCadence Strategic Solutions and the former Metadata Program Manager for the Homeland Security Department. His new book is titled The Great Cloud Migration: Your Roadmap to Cloud Computing, Big Data and Linked Data.