
Logistic Regression
February 11, 2026
Concepts: what logistic regression is and how classification is evaluated
R coding labs: fitting models, average marginal effects, predicting probabilities, tuning thresholds, and reporting results
By the end of this mini-unit, you should be able to explain what logistic regression models, fit and compare logistic regressions in R, interpret coefficients and average marginal effects, predict probabilities, tune classification thresholds, and report the results.
Many problems are naturally yes/no (0/1). We want a model whose predicted values behave like probabilities, i.e., that stay between 0 and 1.
Let the linear combination of predictors be
\[ z_i = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}. \]
Logistic regression models the probability
\[ \Pr(y_i = 1 \mid x_i) = G(z_i) = \frac{\exp(z_i)}{1 + \exp(z_i)}. \] That is, the logistic function \(G(z_i)\) maps the linear combination of predictors to the probability that the outcome \(y_{i}\) is \(1\):
\[ z_{i} \rightarrow \Pr(y_i = 1 \mid x_i). \]
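As a quick numerical check of this mapping (not part of the original lab), the logistic function can be evaluated directly in base R; plogis() is the logistic CDF, so it computes exactly \(\exp(z)/(1+\exp(z))\):

```r
# The logistic function G(z) maps any real number z to a value in (0, 1).
z <- c(-4, -1, 0, 1, 4)
p_manual <- exp(z) / (1 + exp(z))  # G(z) written out by hand
p_plogis <- plogis(z)              # base R's logistic CDF gives the same values
cbind(z, p_manual, p_plogis)
```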
Interpretation: \(G(z_i)\) always lies between 0 and 1, so it can be read as a probability; large positive values of \(z_i\) push \(\Pr(y_i = 1 \mid x_i)\) toward 1, and large negative values push it toward 0.
The function \(G\) is called the logistic function because its inverse is the logit (log-odds) of the probability that the outcome \(y_{i}\) is 1. Writing \(p_i = \Pr(y_i = 1 \mid x_i)\), \[ \begin{align} G^{-1}(p_i) &\,\equiv\, \text{logit}(p_i) \,\equiv\, \log\left(\, \frac{\Pr(y_{i} = 1)}{\Pr(y_{i} = 0)} \,\right)\\ &\,=\, \beta_0 + \beta_{1}x_{1,i} + \beta_{2}x_{2,i} + \,\cdots\, + \beta_{k}x_{k,i}. \end{align} \]
In this sense, logistic regression is a linear model for the log-odds.
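A small check of the inverse relationship, again using base R functions rather than anything from the original lab: qlogis() is the logit (log-odds) and undoes plogis():

```r
# The logit is the inverse of the logistic function:
# qlogis(p) = log(p / (1 - p)), and qlogis(plogis(z)) returns z.
p <- plogis(c(-2, 0, 2))
log(p / (1 - p))   # log-odds computed by hand
qlogis(p)          # same values via the built-in logit
```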
Model fit is summarized by the deviance, \[ \text{Deviance} = -2 \log(\text{Likelihood}) + C, \] where \(C\) is a constant that we can ignore.
Estimation chooses the coefficients so that \[ G(\beta_0 + \beta_{1}x_{1,i} + \beta_{2}x_{2,i} + \,\cdots\, + \beta_{k}x_{k,i}) \] is the best possible estimate of the probability that the binary outcome \(y_i\) equals 1.
Logistic regression finds the \(\beta\) parameters that maximize the log-likelihood of the data given the model, which is equivalent to minimizing the residual deviance.
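Here is a minimal sketch of fitting a logistic regression by maximum likelihood with glm(); the data are simulated and the variable names (x1, x2, y) are made up for illustration, not taken from the lab:

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-0.5 + 1.2 * x1 - 0.8 * x2)  # true probabilities
y  <- rbinom(n, size = 1, prob = p)       # binary 0/1 outcome

fit <- glm(y ~ x1 + x2, family = binomial)
summary(fit)

# Maximizing the log-likelihood is the same as minimizing the deviance:
# for 0/1 outcomes the residual deviance equals -2 * logLik exactly.
c(deviance = deviance(fit), minus2_loglik = -2 * as.numeric(logLik(fit)))
```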
\[ \text{AIC} = -2\,\log L + 2k \qquad\qquad \text{BIC} = -2\,\log L + k\log(n), \] where \(k\) is the number of estimated parameters and \(n\) is the number of observations.
“Is the improvement in fit worth the extra complexity?”
Rule: Lower AIC/BIC is better.
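Continuing the simulated example above, AIC and BIC can be compared across nested models with the built-in AIC() and BIC() functions (the model names here are illustrative):

```r
fit_small <- glm(y ~ x1,      family = binomial)
fit_full  <- glm(y ~ x1 + x2, family = binomial)

AIC(fit_small, fit_full)  # lower is better
BIC(fit_small, fit_full)  # BIC penalizes extra parameters more heavily when n is large
```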
\[ \begin{align} \Pr(y_{i} = 1) &\,=\, G( \beta_0 + \beta_{1}x_{1,i} + \beta_{2}x_{2,i} + \,\cdots\, + \beta_{k}x_{k,i} )\\ \Leftrightarrow\qquad \log\left(\dfrac{\Pr( y_i = 1 )}{\Pr( y_i = 0 )}\right) &\,=\, \beta_0 + \beta_{1}x_{1,i} + \beta_{2}x_{2,i} + \,\cdots\, + \beta_{k}x_{k,i} \end{align} \]
A one-unit increase in \(x_k\) changes the log-odds by \(\beta_k\).
For a binary (dummy) predictor, the coefficient is the change in log-odds relative to the reference category.
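Continuing the same simulated example, coefficients come out on the log-odds scale, and exponentiating them gives odds ratios:

```r
coef(fit_full)       # change in log-odds per one-unit increase in each predictor
exp(coef(fit_full))  # multiplicative change in the odds (odds ratio)
```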
In logistic regression, the coefficient \(\beta_k\) is a constant change in the log-odds, not a constant change in the probability.
The marginal effect on the probability depends on \(z_i\):
\[ \frac{\partial p_i}{\partial x_{k,i}} = \beta_k \; p_i (1-p_i), \qquad \text{where } p_i = \Pr(y_i = 1 \mid x_i). \]
So the same \(\beta_k\) can imply different probability changes for different observations.
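A sketch of this, still using the simulated fit from above: the per-observation marginal effect of x1 is \(\beta_1 \, p_i (1 - p_i)\), and averaging it gives the average marginal effect mentioned in the lab topics:

```r
p_hat <- predict(fit_full, type = "response")        # fitted probabilities p_i
me_x1 <- coef(fit_full)["x1"] * p_hat * (1 - p_hat)  # observation-level marginal effects

summary(me_x1)  # the same coefficient implies different probability changes
mean(me_x1)     # average marginal effect of x1
```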


The receiver operating characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate (1 − specificity) across all threshold levels.
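As a hand-rolled sketch (using the simulated model above rather than the car data), an ROC curve can be traced by sweeping the threshold and recording the true and false positive rates at each value:

```r
p_hat <- predict(fit_full, type = "response")
thresholds <- seq(0, 1, by = 0.01)

roc_points <- t(sapply(thresholds, function(thr) {
  pred <- p_hat >= thr
  c(tpr = mean(pred[y == 1]),   # true positive rate (recall)
    fpr = mean(pred[y == 0]))   # false positive rate = 1 - specificity
}))

plot(roc_points[, "fpr"], roc_points[, "tpr"], type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (simulated example)")
abline(0, 1, lty = 2)  # 45-degree line: a classifier no better than chance
```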
In the car-evaluation example, consider the simple rule that predicts fail = TRUE when safety = low and fail = FALSE when safety ≠ low. In the data, fail = TRUE for all cars with safety = low, but fail is mixed for cars with safety = med or high.