In a previous post we looked at the popular Hosmer-Lemeshow test for logistic regression, which can be viewed as assessing whether the model is well calibrated. In this post we'll look at one approach to assessing the discrimination of a fitted logistic model, via the receiver operating characteristic (ROC) curve.

Before discussing the ROC curve, let's first consider the difference between calibration and discrimination in the context of logistic regression. As in previous posts, I'll assume that we have a binary outcome Y, taking values 0 and 1, and covariates X_1, ..., X_p. The logistic regression model assumes that:

logit P(Y=1 | X_1, ..., X_p) = β_0 + β_1 X_1 + ... + β_p X_p

The model parameters are the regression coefficients β_0, β_1, ..., β_p, and these are usually estimated by the method of maximum likelihood.
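As a concrete sketch, the model can be fitted by maximum likelihood in R with the base glm function. The data and variable names below are simulated purely for illustration:

```r
# Sketch: fit a logistic regression by maximum likelihood using base R's glm.
# The data here are simulated purely for illustration.
set.seed(1234)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
# true model: logit P(Y=1) = 0 + 1*x1 - 1*x2
pr <- plogis(x1 - x2)
y <- rbinom(n, 1, pr)
mod <- glm(y ~ x1 + x2, family = binomial)
coef(mod)  # maximum likelihood estimates of beta_0, beta_1, beta_2
```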

Good calibration is not enough

For given values of the model covariates, we can obtain the predicted probability π̂ = P(Y=1 | X_1, ..., X_p). The model is said to be well calibrated if the observed risk matches the predicted risk (probability). That is, if we were to take a large group of observations which are assigned a predicted probability of 0.2, the proportion of these observations with Y=1 ought to be close to 20%. If instead the observed proportion were 80%, we would probably agree that the model is not performing well - it is underestimating risk for these observations. The comparison between predicted probabilities and observed proportions is the basis for the Hosmer-Lemeshow test.

Should we be content to use a model so long as it is well calibrated? Unfortunately not. To see why, suppose we fit a model for our outcome but without any covariates, i.e. the null model:

logit P(Y=1) = β_0

This null model assigns every observation the same predicted probability, since it does not use any covariates. The estimate of the single parameter β_0 will be the observed overall log odds of a positive outcome, so that the predicted value of P(Y=1) will be identical to the proportion of observations in the dataset with Y=1.
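This can be checked directly in R. In the sketch below (simulated data for illustration), an intercept-only model's fitted probability reproduces the observed proportion:

```r
# Sketch (simulated data): with an intercept-only logistic model, every
# observation's fitted probability equals the overall proportion with y = 1
set.seed(1234)
y <- rbinom(1000, 1, 0.1)
nullmod <- glm(y ~ 1, family = binomial)
p_hat <- predict(nullmod, type = "response")
unique(round(p_hat, 10))  # a single fitted probability for all observations
mean(y)                   # the same value: the observed proportion
```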

This rather useless model will have good calibration - in future samples the observed proportion will be close to our estimated probability. However, the model isn't really useful because it doesn't discriminate between those observations at high risk and those at low risk. The situation is analogous to a weather forecaster who, every day, says the chance of rain tomorrow is 10%. This prediction might be well calibrated, but it doesn't tell people whether it is more or less likely to rain on a given day, and so isn't really a helpful forecast!

As well as being well calibrated, we would therefore like our model to have high discrimination ability. In the binary outcome context, this means that observations with Y=1 ought to be assigned high predicted probabilities, and those with Y=0 ought to be assigned low predicted probabilities. Such a model allows us to discriminate between low and high risk observations.

Sensitivity and specificity

To explain the ROC curve, we first recall the important notions of sensitivity and specificity of a test or prediction rule. The sensitivity is defined as the probability of the prediction rule or model classifying an observation as 'positive', given that in truth it is positive (Y=1). In words, the sensitivity is the proportion of truly positive observations which are classified as such by the model or test. Conversely, the specificity is the probability of the model classifying an observation as 'negative', given that the observation is truly negative (Y=0).

Our model or prediction rule is perfect at classifying observations if it has 100% sensitivity and 100% specificity. Unfortunately in practice this is (usually) not attainable. So how can we summarize the discrimination ability of our logistic regression model? For each observation, our fitted model can be used to calculate a fitted probability π̂. On its own, this doesn't tell us how to classify observations as positive or negative. One way to create such a classification rule is to choose a cut-point c, and classify those observations with a fitted probability above c as positive and those at or below c as negative. For this particular cut-point, we can estimate the sensitivity by the proportion of observations with Y=1 which have a predicted probability above c, and similarly we can estimate the specificity by the proportion of observations with Y=0 which have a predicted probability at or below c.
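These estimates are simple to compute in R. The sketch below uses simulated data and an arbitrarily chosen cut-point of 0.5 for illustration:

```r
# Sketch (simulated data, hypothetical cut-point): estimating sensitivity and
# specificity of the rule "classify as positive if fitted probability > c"
set.seed(1234)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))
mod <- glm(y ~ x, family = binomial)
phat <- predict(mod, type = "response")
c <- 0.5
sens <- mean(phat[y == 1] > c)   # proportion of Y=1 classified as positive
spec <- mean(phat[y == 0] <= c)  # proportion of Y=0 classified as negative
c(sensitivity = sens, specificity = spec)
```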

If we increase the cut-point c, fewer observations will be classified as positive. This means that fewer of the Y=1 observations will be correctly classified as positive (reduced sensitivity), but more of the Y=0 observations will be correctly classified as negative (increased specificity). In picking the cut-point, there is thus an intrinsic trade-off between sensitivity and specificity.
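The trade-off can be seen by sweeping the cut-point over a grid of values. In this sketch (again with simulated data), estimated sensitivity falls and estimated specificity rises as the cut-point increases:

```r
# Sketch: as the cut-point increases, estimated sensitivity is non-increasing
# and estimated specificity is non-decreasing (simulated data)
set.seed(1234)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))
phat <- predict(glm(y ~ x, family = binomial), type = "response")
cuts <- seq(0.1, 0.9, by = 0.2)
sens <- sapply(cuts, function(c) mean(phat[y == 1] > c))
spec <- sapply(cuts, function(c) mean(phat[y == 0] <= c))
round(rbind(cut = cuts, sensitivity = sens, specificity = spec), 2)
```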

The receiver operating characteristic (ROC) curve

Now we come to the ROC curve, which is simply a plot of the values of sensitivity against one minus specificity, as the value of the cut-point c is increased from 0 through to 1:

A model with high discrimination ability will have high sensitivity and specificity simultaneously, leading to an ROC curve which goes close to the top left corner of the plot. A model with no discrimination ability will have an ROC curve which is the 45 degree diagonal line.
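To make this concrete, the curve can be traced out by hand by sweeping the cut-point across its full range (simulated data; the next section uses a dedicated package instead):

```r
# Sketch: trace an ROC curve manually by sweeping the cut-point from 0 to 1
# and plotting sensitivity against 1 - specificity (simulated data)
set.seed(1234)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))
phat <- predict(glm(y ~ x, family = binomial), type = "response")
cuts <- seq(0, 1, by = 0.01)
sens <- sapply(cuts, function(c) mean(phat[y == 1] > c))
spec <- sapply(cuts, function(c) mean(phat[y == 0] <= c))
plot(1 - spec, sens, type = "l",
     xlab = "1 - specificity", ylab = "sensitivity")
abline(0, 1, lty = 2)  # the no-discrimination 45 degree diagonal
```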

Plotting the ROC curve in R

There are a number of packages in R for creating ROC curves. The one I've used here is the pROC package. First, let's simulate a dataset with one predictor x: