"How much longer do I have, doc?"

OK, I've never had a patient actually ask me that, but I think about the question frequently when I'm watching poorly scripted doctor shows on TV, applying for life insurance, or reading a medical paper that employs the near-ubiquitous Cox proportional hazards model.

If you haven't guessed, we're talking about survival today. Something that, I would wager, many clinicians (and patients) are particularly interested in. At first it seems like a simple proposition – how long does the average person live beyond a certain point? That point could be an age, a diagnosis, or even a calendar date. Of more interest, perhaps, is whether a person receiving a certain treatment will live longer than if they hadn't received that treatment.

But it turns out that survival analysis is really complicated. It's also complicated for a kind of beautiful reason, but I'll get to that.

To avoid making this article super-depressing, we're not going to look at people. We're going to look at something much more important.

TV shows.

You see, TV shows have a lifespan too. Some blaze out all too quickly (I'm looking at you, Firefly), while some struggle into a long senescence (General Hospital is still on? Really?).

To get a sense of TV lifespans, I used data that the good folks at IMDB.com make available (thanks to u/judasblue on reddit for help with this). Here is what the distribution of lifespans of all TV shows (for which I could find reliable start and end dates) looks like:

Figure 1: Lifespan of 19,031 TV Shows in the IMDB.com index. Sadly, 94% of these shows are no longer with us.

If you've been reading these posts regularly, you'll see a problem right away. This distribution is not "normal" – it's not a nice, smooth bell shape. Why not? Why isn't there some average duration of a TV show (say, 5 years), with some lasting a bit longer and some a bit shorter? It turns out, and here's the beautiful part: survivors survive.

We see the exact same pattern when we look at human survival times.

Whatever your diagnosis is (even if your diagnosis is "being born"), the longer you live, the more likely you are to live longer still. This is why the life expectancy of a U.S. infant boy born in 2015 is 83 years, but that same boy will be expected to live to 89 years once he makes it to age 70. Survivors survive.
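To make "survivors survive" concrete, here is a small Python sketch. It assumes every lifespan is fully observed (nothing still on the air) and uses a made-up list of show lifespans, not the IMDB data; the function name `expected_total_lifespan` is hypothetical. It computes the average total lifespan among shows that have already lasted past a given point.

```python
def expected_total_lifespan(lifespans, survived_to=0.0):
    """Average total lifespan among items that lasted longer than `survived_to`.

    Assumes every lifespan is fully observed (no censoring).
    """
    survivors = [t for t in lifespans if t > survived_to]
    if not survivors:
        raise ValueError("nobody in the sample lasted that long")
    return sum(survivors) / len(survivors)


# Hypothetical show lifespans in years (illustrative only, not the IMDB data)
durations = [0.5, 0.5, 1, 1, 1, 2, 2, 3, 5, 8, 12, 20]

for t in (0, 1, 5):
    print(f"Made it past year {t}: expect about {expected_total_lifespan(durations, t):.1f} years total")
```

The expected total lifespan can only rise as the cutoff moves later, because each step drops the shortest-lived shows from the average.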

Dealing with Survival Data

So survival data is weird. What that figure tells us is that a huge number of TV shows burn out early, while very few can go the distance.

Let's try to figure out how long we can expect a show to stay on, given how long it has already been on. It turns out, the insurance people had this all worked out way before the biostatisticians did. They call it a life table.

Table 1: Most shows get cancelled before their first birthday. If you can make it to year 1, you've got better than even odds of getting renewed.

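We can graph this "chance of making it" (or, flipped around, the chance of getting cancelled in a given year among shows that have made it that far) to create a picture called a hazard function: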

Figure 2: The greatest risk to a TV show is in the first few years. Make it past that point and your chance of making it one additional year is not too bad.
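A life table like Table 1, and the hazard function drawn from it, is really just counting. The Python sketch below assumes fully observed, made-up durations (again, not the IMDB data) and one-year intervals; `life_table` is a hypothetical helper name. For each year it counts the shows still on the air at the start of the year, how many were cancelled before their next birthday, and the resulting chance of being cancelled (the hazard) or renewed.

```python
def life_table(durations, n_years):
    """Simple life table with one-year intervals from fully observed durations (years)."""
    rows = []
    for year in range(n_years):
        # Shows that reached the start of this year
        at_risk = sum(1 for d in durations if d >= year)
        # Shows that ended somewhere inside [year, year + 1)
        cancelled = sum(1 for d in durations if year <= d < year + 1)
        hazard = cancelled / at_risk if at_risk else float("nan")
        rows.append((year, at_risk, cancelled, hazard, 1 - hazard))
    return rows


# Hypothetical show lifespans in years (illustrative only)
durations = [0.5, 0.5, 1, 1, 1, 2, 2, 3, 5, 8, 12, 20]

print("year  at_risk  cancelled  P(cancelled)  P(renewed)")
for year, at_risk, cancelled, hazard, renewed in life_table(durations, 6):
    print(f"{year:>4}  {at_risk:>7}  {cancelled:>9}  {hazard:>12.2f}  {renewed:>10.2f}")
```

Plotting the P(cancelled) column against year gives a discrete-time picture of the hazard function.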

Hazard functions are strange, scary, and (relevant to a statistician) hard to describe mathematically. But it turns out it doesn't matter.

I'll say it again. It doesn't matter.

A brilliant statistician named David Cox made the key realization here. If you are looking at two groups, you can express the difference in their risk of the outcome as the RATIO of their two hazard functions. What's cooler: a given intervention (say, a new drug) will often change the hazard function of a population by a fixed proportion. That is to say, the drug may reduce the risk of death by 50% at any time point. That's why we can say taking drug X has a "Hazard Ratio" of 0.5 compared with placebo. We don't even need to know the underlying hazard functions; the ratio is stable! What's more, that ratio can be interpreted just like the somewhat-less-daunting "relative risk": it's just a relative risk that keeps up with time passing (technically, that makes it a relative rate).
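Here is a small numerical check, written as a Python sketch, of why the underlying hazard doesn't matter. The baseline hazard below is an arbitrary, made-up curve (high early, falling, with a bump around year 10), and the hazard ratio of 0.5 is the drug X example from the text. Survival is computed as S(t) = exp(-H(t)), where H(t) is the hazard accumulated up to time t. Multiplying the hazard by a constant multiplies H(t) by that same constant, so the treated survival curve is always the baseline survival raised to the power of the hazard ratio, whatever shape the baseline takes.

```python
import math

HAZARD_RATIO = 0.5  # drug X vs. placebo, from the example in the text


def baseline_hazard(t):
    # Hypothetical, deliberately lumpy baseline hazard: high early, falling, with a bump near t = 10
    return 0.8 / (1 + t) + 0.1 * math.exp(-((t - 10) ** 2) / 4)


def treated_hazard(t):
    # Proportional hazards: the same curve scaled by a constant at every time point
    return HAZARD_RATIO * baseline_hazard(t)


def survival(hazard, t, steps=10_000):
    """S(t) = exp(-H(t)), with the cumulative hazard H(t) from the trapezoidal rule."""
    if t == 0:
        return 1.0
    dt = t / steps
    cumulative = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        cumulative += hazard(i * dt)
    return math.exp(-cumulative * dt)


for t in (1, 5, 15):
    s0 = survival(baseline_hazard, t)
    s1 = survival(treated_hazard, t)
    print(f"t={t:>2}  placebo S={s0:.3f}  drug S={s1:.3f}  placebo S**HR={s0 ** HAZARD_RATIO:.3f}")
```

Note that the hazard ratio is not the ratio of the two survival probabilities: the drug curve sits at the placebo survival raised to the 0.5 power, not at some fixed multiple of it.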

Let's bring this to something more practical than saving lives, though. Let's say I want to know what factors influence the longevity of a TV show. For this, I can't use our ~19,000-show database; all it contains is the start and stop dates. For this analysis, I used the Wikipedia article entitled "List of longest-running United States television series." Obviously, this is a biased sample (after all, it includes only shows that ran for a minimum of 10 years), but for illustrative purposes, it's kosher.

Here's the underlying hazard function for this dataset:

Figure 3: Again we see that the risk of being cancelled decreases over time. The data points are a bit helter-skelter here since we are examining fewer shows: only 207 in this dataset.

The data look nicer if we look at cumulative survival. This is the familiar Kaplan-Meier graph. Instead of asking how much longer a show can survive given that it survived to a particular point, it shows us what share of shows is left after a certain number of years. Since (with rare exceptions) shows can't be resurrected, the survival curve can only stay flat or step down:

Figure 4: A Kaplan-Meier curve showing the number of TV shows left on the air after a given number of years. Note that the curve is flat up to ten years. This is because of the dataset we used: only TV shows on the air for more than 10 years were considered.
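The Kaplan-Meier (product-limit) estimate can be sketched in a few lines of Python. At each time a show is cancelled, it multiplies the running survival by the fraction of shows still at risk that made it past that point. The durations below are hypothetical (loosely shaped like this dataset, with every show lasting at least 10 years), `kaplan_meier` is an illustrative name rather than a library function, and `observed=False` marks a show that was still on the air when the data were collected. Those censored shows count as at risk up to their last observed time and then leave the denominator without ever counting as cancellations.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier (product-limit) survival curve.

    durations: years each show was followed
    observed:  True if the show was cancelled at that time, False if censored
    Returns (time, survival) pairs at each distinct cancellation time.
    """
    cancellation_times = sorted({d for d, e in zip(durations, observed) if e})
    s = 1.0
    curve = []
    for t in cancellation_times:
        at_risk = sum(1 for d in durations if d >= t)  # censored shows count until they drop out
        n_cancelled = sum(1 for d, e in zip(durations, observed) if e and d == t)
        s *= 1 - n_cancelled / at_risk  # fraction of the risk set that made it past t
        curve.append((t, s))
    return curve


# Hypothetical shows that all lasted at least 10 years; False = still on the air
years = [10, 11, 11, 12, 14, 15, 18.5, 20]
observed = [True, True, False, True, True, False, False, True]

for t, s in kaplan_meier(years, observed):
    print(f"after {t:>4} years: {s:.2f} of shows still on the air")
```

Between cancellation times the estimate simply holds its last value, which is what gives Kaplan-Meier plots their staircase look.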

So let's figure out WHY shows get cancelled. What factors are associated with cancellation? That Wikipedia page doesn't give us much info, but it does give us the network each show was on. I broke the data down into CBS (46 shows), ABC (23 shows), NBC (31 shows), and other (99 shows).

Breaking the data down into four Kaplan-Meier Curves, we get this:

Figure 5: Sorry, NBC.

We can use Cox (yes, that Cox) regression to obtain the hazard ratios between the big networks and "other." We find that, compared with "other network," ABC shows have about 1.8 times the risk of cancellation, CBS shows about 2 times, and NBC shows about 2.5 times. Note that these ratios are averages over the whole time period; it might be better to be on ABC earlier in a show's life and on NBC later. A hazard ratio that changes over time like this violates the "proportional hazards" assumption, which is something you should check for in any analysis like this. A hint is that the Kaplan-Meier curves cross each other.
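For intuition about where a number like "NBC has 2.5 times the risk" comes from, here is a from-scratch Python sketch of a Cox model with a single yes/no covariate (say, "on NBC" versus "other network"). It maximizes Cox's partial likelihood with Newton-Raphson, using the Breslow approach to tied times. The data, the function name `cox_hazard_ratio`, and the fixed iteration count are all illustrative assumptions; a real analysis would use an established package (for example, `CoxPHFitter` in Python's lifelines or `coxph` in R's survival package), which also reports confidence intervals and proportional-hazards diagnostics. Notice that the baseline hazard never appears in the calculation, only the risk sets.

```python
import math


def cox_hazard_ratio(durations, observed, x, iterations=25):
    """Hazard ratio exp(beta) for x=1 vs. x=0 from a one-covariate Cox model.

    Newton-Raphson on the log partial likelihood, Breslow handling of ties.
    """
    beta = 0.0
    for _ in range(iterations):
        score = 0.0        # first derivative of the log partial likelihood
        information = 0.0  # minus the second derivative
        for t_i, event_i, x_i in zip(durations, observed, x):
            if not event_i:
                continue  # censored shows only ever appear in risk sets
            # Risk set: every show still on the air just before t_i
            s0 = s1 = 0.0
            for t_j, x_j in zip(durations, x):
                if t_j >= t_i:
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
            mean_x = s1 / s0  # weighted share of the risk set with x = 1
            score += x_i - mean_x
            information += mean_x * (1 - mean_x)  # variance of a 0/1 covariate
        beta += score / information
    return math.exp(beta)


# Hypothetical data: years on air, cancelled (False = still running), 1 = "on NBC"
years = [11, 12, 13, 14, 15, 16, 17, 18]
cancelled = [True] * 7 + [False]
on_nbc = [1, 0, 1, 1, 0, 1, 0, 0]
print(f"Hazard ratio, NBC vs. other: {cox_hazard_ratio(years, cancelled, on_nbc):.2f}")
```

Only the ordering of the times matters here: shifting every duration by the same amount leaves the risk sets, and therefore the hazard ratio, unchanged.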

Censoring

If you're still with me, you might be wondering how we handle shows that didn't start 60 years ago but still haven't been cancelled, like "The O'Reilly Factor," which (can this possibly be true?) has been running continuously for 18.5 years. Do we just pretend Bill got cancelled at the 18.5-year time point? Of course not – that's not fair, even to Bill.

In human terms, imagine we're studying a new cancer drug. We give it (or placebo) to people at "Time 0" and see how long they live. But realistically, we're not going to give 1,000 people the drug on the same day. We enroll in dribs and drabs. Some people have longer follow-up, some people have shorter – like this:

Figure 6: Time-span of some selected shows. How do we handle shows that are still going?

We want to use ALL the data, though. How do we manage?

We "censor" individuals at the end of their follow-up, whether they were lost to follow-up or the study simply ended while they were still going. That means we include the data we have, but don't assume they were cancelled. In fact, we assume that their risk of being cancelled is the same as that of shows we have more data for.

So when we censor data, the statistical models give us results as if the censored data point would have had the same risk as all the other data points, had it not been censored. In the TV example, our statistics on the rate of TV show loss assume that a censored show had the same risk of cancellation as one we continued to have data for. This is probably reasonable for TV shows. It may not always be reasonable in humans: if, say, the sickest patients are the ones who drop out of a study, treating them like everyone else makes survival look better than it really is.
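To see why censoring matters, this Python sketch compares two ways of handling shows that are still on the air, using made-up follow-up times: censoring them (counting them as at risk until their last observed year, then letting them drop out) versus pretending they were cancelled on the day the data were pulled. `survival_at` is a hypothetical helper that applies product-limit (Kaplan-Meier) logic up to a chosen time.

```python
def survival_at(t, durations, observed):
    """Product-limit estimate of the share of shows still on the air at time t.

    observed[i] is True if show i was cancelled at durations[i],
    False if it was still running then (censored).
    """
    s = 1.0
    cancellation_times = sorted({d for d, e in zip(durations, observed) if e and d <= t})
    for u in cancellation_times:
        at_risk = sum(1 for d in durations if d >= u)
        n_cancelled = sum(1 for d, e in zip(durations, observed) if e and d == u)
        s *= 1 - n_cancelled / at_risk
    return s


# Hypothetical follow-up in years; False = still on the air when the data were pulled
years = [2, 3, 5, 8, 18.5, 20, 25]
was_cancelled = [True, True, True, True, False, True, False]

censored = survival_at(19, years, was_cancelled)
pretend_cancelled = survival_at(19, years, [True] * len(years))
print(f"Share still on air at year 19, censoring handled properly: {censored:.2f}")
print(f"Share still on air at year 19, pretending everyone was cancelled: {pretend_cancelled:.2f}")
```

Counting the still-running 18.5-year show as a cancellation drags the estimate down, making long-running shows look less durable than the data actually say.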

I wrote at the beginning that survival analysis is kind of beautiful. In the end, maybe the best thing to take from this article is not the nitty-gritty details of how these analyses are done, but rather to remember, when life is tough and you're feeling low, that if you can hold on just one more day, you're that much more likely to hold on one day more. Survivors survive. So keep on surviving.

The Methods Man is F. Perry Wilson, Assistant Professor of Medicine at the Yale School of Medicine. He earned his BA from Harvard University, graduating with honors with a degree in biochemistry. He then attended Columbia College of Physicians and Surgeons in New York City. From there he moved to Philadelphia to complete his internal medicine residency and nephrology fellowship at the Hospital of the University of Pennsylvania. During his research time, Wilson also obtained a Master of Science in Clinical Epidemiology from the University of Pennsylvania. He is an accomplished author of many scientific articles and holds several NIH grants. If you'd like to see more of his work, please visit him at www.methodsman.com or follow @methodsmanmd on Twitter.