The Sigmoid Function in Logistic Regression

When I was learning about logistic regression, I was at first confused as to why a sigmoid function was used to map from the inputs to the predicted output. I mean, sure, it's a nice function that cleanly maps any real number into the range $0$ to $1$, but where did it come from? This notebook hopes to explain.

Logistic Regression

With classification, we have a sample with some attributes (a.k.a. features), and based on those attributes, we want to know whether or not it belongs to a class, i.e. whether a binary output $y$ is 1 or 0. The probability that the output is 1 given its input can be written as:

$$P(y=1 \mid x)$$

If the data samples have $n$ features, and we think we can model this probability as some linear combination of them, we could write:

$$P(y=1 \mid x) = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n$$

The regression algorithm could fit these weights to the data it sees. However, it seems hard to map an arbitrary linear combination of inputs, each of which may range from $-\infty$ to $\infty$, to a probability value in the range of $0$ to $1$.
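To make the problem concrete, here is a minimal sketch in plain Python. The weights and samples are made-up values chosen only for illustration, and `linear_combination` is a hypothetical helper, not part of any library.

```python
def linear_combination(weights, features):
    """Return w_0 + w_1*x_1 + ... + w_n*x_n, where weights[0] is the intercept w_0."""
    intercept, coefs = weights[0], weights[1:]
    return intercept + sum(w * x for w, x in zip(coefs, features))


# Assumed (made-up) weights: w_0, w_1, w_2
weights = [0.5, 2.0, -1.0]

# Three hypothetical samples, each with two features
samples = [[0.1, 0.2], [3.0, -1.0], [-2.0, 4.0]]

outputs = [linear_combination(weights, x) for x in samples]

for x, z in zip(samples, outputs):
    # A probability must lie between 0 and 1; the raw linear output often does not
    is_valid = 0.0 <= z <= 1.0
    print(f"x={x} -> {z:.2f} (valid probability: {is_valid})")
```

With these made-up numbers, the first sample gives 0.5, while the other two give 7.5 and -7.5, neither of which can be read as a probability.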

The Odds Ratio

The odds are a concept related to probability that can help us. (Strictly speaking, an "odds ratio" compares two odds; the quantity we need here is simply the odds.) The odds equal the probability of success divided by the probability of failure, and may be familiar to you if you have ever looked at betting lines for sports matchups:

$$\text{odds}(p) = \frac{p}{1-p}$$
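As a quick numeric check of the formula above, the following sketch evaluates the odds for a few probabilities. The function name `odds` and the chosen probabilities are assumptions for illustration only.

```python
def odds(p):
    """Odds of success for a probability p, defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        # At p = 1 the denominator is zero, so the odds are undefined
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


# Evaluate the odds at a handful of illustrative probabilities
for p in [0.1, 0.5, 0.75, 0.9, 0.99]:
    print(f"p={p:<5} odds={odds(p):.2f}")
```

Even odds (`p = 0.5`) give a value of 1, and `p = 0.75` gives 3, i.e. three successes expected for each failure.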

Saying "the odds of the output being 1 given an input" still seems to capture what we're after. However, if we plot the odds function for $p$ from 0 to 1, there's still a problem: