Classwork 5

Logistic Regression II - Quasi-Separation & Regularization

Author

Byeong-Hak Choe

Published

February 23, 2026

Modified

March 4, 2026

0.1 Packages

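library(tidyverse)   # data wrangling (dplyr, readr) and plotting (ggplot2)
library(broom)       # tidy data frames of model output
library(stargazer)   # regression tables
library(margins)     # marginal effects
library(yardstick)   # classification metrics
library(WVPlots)     # model-diagnostic plots
library(pROC)        # ROC curves and AUC
library(glmnet)      # Lasso, Ridge, and Elastic Net regression
library(gamlr)       # gamma-lasso penalized regression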

0.2 Data

df <- read_csv('https://bcdanl.github.io/data/car-data.csv')

df |> 
  rmarkdown::paged_table()

0.3 Variable description

Variable   Description
buying     Buying price of the car (vhigh, high, med, low)
maint      Maintenance cost (vhigh, high, med, low)
doors      Number of doors (2, 3, 4, 5more)
persons    Capacity in terms of persons to carry (2, 4, more)
lug_boot   Size of luggage boot (small, med, big)
safety     Estimated safety of the car (low, med, high)
rating     Car acceptability (unacc, acc, good, vgood)
fail       TRUE if the car is unacceptable (unacc), otherwise FALSE



1 Question 1

  • Divide the df DataFrame into training and test DataFrames.
    • Use dtrain and dtest for the training and test DataFrames, respectively.
    • Assign 70% of the observations in df to dtrain and the remaining 30% to dtest.
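A minimal base-R sketch of the split, assuming a simple random 70/30 split without stratification. The helper name `split_train_test()`, the seed, and the toy data frame (a stand-in for `df`) are hypothetical choices for illustration.

```r
# Randomly assign a fraction p of rows to a training set and the rest to a test set.
split_train_test <- function(data, p = 0.7) {
  n <- nrow(data)
  train_idx <- sample(seq_len(n), size = floor(p * n))  # row indices for training
  list(
    dtrain = data[train_idx, , drop = FALSE],
    dtest  = data[-train_idx, , drop = FALSE]
  )
}

set.seed(1)                                    # hypothetical seed, for reproducibility
toy <- data.frame(id = 1:100, x = rnorm(100))  # stand-in for df
parts  <- split_train_test(toy)
dtrain <- parts$dtrain
dtest  <- parts$dtest
c(nrow(dtrain), nrow(dtest))
```

With the classwork data, `parts <- split_train_test(df)` would produce dtrain and dtest the same way. Sampling an exact number of rows (rather than flipping a 70% coin per row) guarantees the 70/30 proportions.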



2 Question 2

Fit the following regression model:

\[ \begin{aligned} &\quad\;\; \text{Prob}(\text{fail}_{i} = 1) \\ &= G\Big(\beta_{0} \\ &\qquad\quad\;\;\; \,+\, \beta_{1} \text{buying\_med}_{i} \,+\, \beta_{2} \text{buying\_high}_{i} \,+\, \beta_{3} \text{buying\_vhigh}_{i} \\ &\qquad\quad\;\;\; \,+\, \beta_{4} \text{maint\_med}_{i} \,+\, \beta_{5} \text{maint\_high}_{i} \,+\, \beta_{6} \text{maint\_vhigh}_{i} \\ &\qquad\quad\;\;\; \,+\, \beta_{7} \text{persons\_4}_{i} \,+\, \beta_{8} \text{persons\_more}_{i} \\ &\qquad\quad\;\;\; \,+\, \beta_{9} \text{lug\_boot\_med}_{i}\,+\, \beta_{10} \text{lug\_boot\_big}_{i} \\ &\qquad\quad\;\;\; \,+\, \beta_{11} \text{safety\_med}_{i}\,+\, \beta_{12} \text{safety\_high}_{i} \Big), \end{aligned} \]

where \(G(\,\cdot\,)\) is

\[ G(\,\cdot\,) = \frac{\exp(\,\cdot\,)}{1 + \exp(\,\cdot\,)}. \]
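Solving \(p_{i} = G(z_{i})\) for \(z_{i}\) shows that the model is linear in the log-odds of failure, where \(p_{i} = \text{Prob}(\text{fail}_{i} = 1)\) and \(x_{k,i}\) denotes the \(k\)-th dummy variable in the model:

\[ p_{i} = \frac{\exp(z_{i})}{1 + \exp(z_{i})} \;\Longleftrightarrow\; \exp(z_{i}) = \frac{p_{i}}{1 - p_{i}} \;\Longleftrightarrow\; \log\left(\frac{p_{i}}{1 - p_{i}}\right) = z_{i} = \beta_{0} + \sum_{k} \beta_{k} x_{k,i}. \]

Each \(\beta_{k}\) is therefore a difference in log-odds relative to the reference level, and \(\exp(\beta_{k})\) is the corresponding odds ratio, holding the other variables fixed. If every observation in some category has the same value of fail, the log-odds for that category would have to be \(\pm\infty\), so the likelihood has no finite maximizer and the estimates involving that category grow without bound. This is the quasi-separation problem in the title.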

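Provide a summary of the regression results.

  • Set the reference levels accordingly: low for buying and maint, 2 for persons, small for lug_boot, and low for safety (the categories omitted from the model above).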
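A minimal sketch of setting reference levels and fitting the logit with `glm()`. It assumes the columns are read as character strings with the labels listed in the variable description; the helper `set_reference_levels()` is hypothetical, and the toy data are simulated stand-ins, not the car data. Listing the reference level first in `factor(levels = ...)` makes it the omitted category, so the dummies match those in the model (doors is left out, as in the equation).

```r
# Put the reference level first so glm() omits it from the dummy variables.
set_reference_levels <- function(d) {
  d$buying   <- factor(d$buying,   levels = c("low", "med", "high", "vhigh"))
  d$maint    <- factor(d$maint,    levels = c("low", "med", "high", "vhigh"))
  d$persons  <- factor(d$persons,  levels = c("2", "4", "more"))
  d$lug_boot <- factor(d$lug_boot, levels = c("small", "med", "big"))
  d$safety   <- factor(d$safety,   levels = c("low", "med", "high"))
  d
}

# Simulated stand-in with the same categorical columns (NOT the car data).
set.seed(2)
toy <- expand.grid(
  buying   = c("vhigh", "high", "med", "low"),
  maint    = c("vhigh", "high", "med", "low"),
  persons  = c("2", "4", "more"),
  lug_boot = c("small", "med", "big"),
  safety   = c("low", "med", "high"),
  stringsAsFactors = FALSE
)
toy$fail <- rbinom(nrow(toy), size = 1, prob = 0.5) == 1  # random logical outcome

toy <- set_reference_levels(toy)
model <- glm(fail ~ buying + maint + persons + lug_boot + safety,
             data = toy, family = binomial(link = "logit"))
summary(model)
```

On the real data, the same steps would be `dtrain <- set_reference_levels(dtrain)` (and likewise for dtest) followed by the same `glm()` call; `broom::tidy(model)` or `stargazer(model, type = "text")` are alternative ways to display the results.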


3 Question 3

  • What do you notice about the coefficient estimates and their standard errors?
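A simulated illustration of what quasi-separation looks like in `glm()` output, assuming nothing about the car data itself: every observation in the hypothetical group "bad" has y = 1, so the maximum-likelihood estimate for that dummy does not exist. `glm()` stops iterating at a very large coefficient with an even larger standard error (in some cases also warning that fitted probabilities are numerically 0 or 1). Group labels and probabilities are made up for the demo.

```r
set.seed(3)
n_per <- c(ok = 60, mid = 60, bad = 30)
g <- factor(rep(names(n_per), times = n_per), levels = c("ok", "mid", "bad"))
p_fail <- c(ok = 0.3, mid = 0.6, bad = 1)[as.character(g)]  # "bad" always fails
y <- rbinom(length(g), size = 1, prob = p_fail)

sep_fit <- glm(y ~ g, family = binomial(link = "logit"))
est <- summary(sep_fit)$coefficients
est  # compare the gbad row with the gmid row
```

The `gmid` row looks ordinary, while `gbad` has an estimate far outside any plausible log-odds difference and a standard error orders of magnitude larger.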


4 Question 4

  • Calculate the following:
    • Confusion matrix with the appropriate threshold level.
    • Accuracy
    • Precision
    • Recall
    • Specificity
    • Average rate of failure (the share of cars with fail = TRUE)
    • Enrichment
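A minimal base-R sketch of these metrics, assuming `actual` is a logical vector (TRUE = the car fails) and `prob` holds predicted probabilities, for example from `predict(model, newdata = dtest, type = "response")`. The helper name and toy vectors are hypothetical, and the 0.5 threshold is only a placeholder, since choosing an appropriate threshold is part of the question. Enrichment is precision divided by the average rate of failure.

```r
classification_metrics <- function(actual, prob, threshold = 0.5) {
  pred <- prob > threshold          # predicted failure
  tp <- sum(pred & actual)
  fp <- sum(pred & !actual)
  fn <- sum(!pred & actual)
  tn <- sum(!pred & !actual)
  precision <- tp / (tp + fp)
  avg_rate  <- mean(actual)         # average rate of failure
  list(
    confusion   = table(actual    = factor(actual, levels = c(FALSE, TRUE)),
                        predicted = factor(pred,   levels = c(FALSE, TRUE))),
    accuracy    = (tp + tn) / length(actual),
    precision   = precision,
    recall      = tp / (tp + fn),
    specificity = tn / (tn + fp),
    avg_rate    = avg_rate,
    enrichment  = precision / avg_rate
  )
}

# Hypothetical toy vectors standing in for test-set labels and predictions
actual <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prob   <- c(0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2, 0.1)
m <- classification_metrics(actual, prob, threshold = 0.5)
m$confusion
unlist(m[-1])  # accuracy 0.8, precision 0.75, recall 0.75, specificity 0.833, avg_rate 0.4, enrichment 1.875
```

If you use yardstick instead, note that its metrics treat the first factor level as the event by default, so with levels FALSE/TRUE you would need `event_level = "second"`.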


5 Question 5

Visualize the variation in recall and enrichment across different threshold levels.
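A base-R sketch that recomputes recall and enrichment over a grid of thresholds and plots them side by side. The helper `threshold_curve()`, the threshold grid, and the toy vectors are hypothetical; enrichment is left as NA at thresholds where no car is predicted to fail, because precision is undefined there.

```r
# Recall and enrichment over a grid of classification thresholds.
threshold_curve <- function(actual, prob, thresholds = seq(0.05, 0.95, by = 0.05)) {
  avg_rate <- mean(actual)  # share of failures in the data
  rows <- lapply(thresholds, function(t) {
    pred <- prob > t
    tp <- sum(pred & actual)
    precision <- if (any(pred)) tp / sum(pred) else NA_real_  # undefined if nothing is flagged
    data.frame(threshold  = t,
               recall     = tp / sum(actual),
               enrichment = precision / avg_rate)
  })
  do.call(rbind, rows)
}

# Hypothetical toy vectors standing in for test-set labels and predictions
actual <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prob   <- c(0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2, 0.1)
curve_df <- threshold_curve(actual, prob)

# Two panels sharing the threshold axis
op <- par(mfrow = c(1, 2))
plot(curve_df$threshold, curve_df$recall, type = "l",
     xlab = "Threshold", ylab = "Recall")
plot(curve_df$threshold, curve_df$enrichment, type = "l",
     xlab = "Threshold", ylab = "Enrichment")
par(op)
```

With tidyverse loaded, reshaping `curve_df` with `pivot_longer()` and plotting it with `ggplot()` plus `facet_wrap()` gives an equivalent faceted figure.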


6 Question 6

  • Draw the receiver operating characteristic (ROC) curve.
  • Calculate the area under the curve (AUC).
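A base-R sketch: the ROC curve plots the true positive rate (recall) against the false positive rate (1 - specificity) as the threshold moves from high to low, and the AUC equals the probability that a randomly chosen failing car gets a higher predicted probability than a randomly chosen non-failing one (the rank formula below, with cross-class ties counted as one half). Helper names and toy vectors are hypothetical; the optional part cross-checks with pROC's `roc()` and `auc()`, which the classwork loads.

```r
# ROC points: (false positive rate, true positive rate) at every observed threshold
roc_points <- function(actual, prob) {
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
  data.frame(
    threshold = thresholds,
    fpr = sapply(thresholds, function(t) mean(prob[!actual] >= t)),
    tpr = sapply(thresholds, function(t) mean(prob[actual] >= t))
  )
}

# AUC via the rank (Mann-Whitney) formula; average ranks handle ties
auc_rank <- function(actual, prob) {
  n1 <- sum(actual)
  n0 <- sum(!actual)
  r <- rank(prob)
  (sum(r[actual]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

actual <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prob   <- c(0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2, 0.1)

pts <- roc_points(actual, prob)
plot(pts$fpr, pts$tpr, type = "l",
     xlab = "False positive rate (1 - specificity)", ylab = "True positive rate (recall)")
abline(0, 1, lty = 2)  # diagonal of a random classifier
auc_value <- auc_rank(actual, prob)
auc_value  # 22/24, about 0.917

# Optional cross-check with pROC, if installed
if (requireNamespace("pROC", quietly = TRUE)) {
  roc_obj <- pROC::roc(response = as.integer(actual), predictor = prob,
                       levels = c(0, 1), direction = "<")
  print(pROC::auc(roc_obj))
}
```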


7 Question 7

  • Use glmnet to fit a Lasso logistic regression.
    • Repeat Questions 2-6.
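A sketch of a Lasso logit with `cv.glmnet()` (`alpha = 1`), run on simulated data with quasi-complete separation rather than the car data (group names and probabilities are hypothetical). It shows why regularization helps here: the penalty grows with the size of the coefficient while the likelihood gain flattens out, so the Lasso estimate for the separated dummy stays finite and smaller than the unpenalized `glm()` estimate.

```r
library(glmnet)

# Simulated data (NOT the car data): every observation in group "bad" has y = 1.
set.seed(7)
n_per <- c(ok = 60, mid = 60, bad = 30)
g <- factor(rep(names(n_per), times = n_per), levels = c("ok", "mid", "bad"))
y <- rbinom(length(g), size = 1, prob = c(ok = 0.3, mid = 0.6, bad = 1)[as.character(g)])

# glmnet needs a numeric predictor matrix; drop the intercept column because
# glmnet fits its own unpenalized intercept.
x <- model.matrix(~ g)[, -1]

cv_lasso   <- cv.glmnet(x, y, family = "binomial", alpha = 1)  # alpha = 1: Lasso
lasso_coef <- as.matrix(coef(cv_lasso, s = "lambda.min"))
glm_coef   <- coef(glm(y ~ g, family = binomial))

cbind(glm = glm_coef, lasso = lasso_coef[, 1])

# Predicted probabilities to reuse for the threshold, ROC, and AUC steps
prob_lasso <- as.vector(predict(cv_lasso, newx = x, s = "lambda.min", type = "response"))
```

For the car data, x would be built with `model.matrix(fail ~ buying + maint + persons + lug_boot + safety, data = dtrain)[, -1]` after setting the reference levels (and the same way from dtest for `newx`), with `y = as.integer(dtrain$fail)`. The resulting probabilities can then go through the same calculations as in Questions 4-6.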


8 Question 8

  • Use glmnet to fit a Ridge logistic regression.
    • Repeat Questions 2-6.
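The same workflow with `alpha = 0` gives Ridge. A sketch, again on simulated data with quasi-complete separation (hypothetical, not the car data): unlike Lasso, Ridge shrinks every coefficient toward zero without setting any of them exactly to zero, and it also keeps the separated dummy's estimate finite. `plot()` on the `cv.glmnet` object shows the cross-validated deviance across lambda.

```r
library(glmnet)

# Simulated data (NOT the car data): every observation in group "bad" has y = 1.
set.seed(8)
n_per <- c(ok = 60, mid = 60, bad = 30)
g <- factor(rep(names(n_per), times = n_per), levels = c("ok", "mid", "bad"))
y <- rbinom(length(g), size = 1, prob = c(ok = 0.3, mid = 0.6, bad = 1)[as.character(g)])
x <- model.matrix(~ g)[, -1]  # dummies only; glmnet adds the intercept

cv_ridge   <- cv.glmnet(x, y, family = "binomial", alpha = 0)  # alpha = 0: Ridge
ridge_coef <- as.matrix(coef(cv_ridge, s = "lambda.min"))
ridge_coef

plot(cv_ridge)  # cross-validated binomial deviance vs log(lambda)

prob_ridge <- as.vector(predict(cv_ridge, newx = x, s = "lambda.min", type = "response"))
```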


9 Question 9

  • Use glmnet to fit an Elastic Net logistic regression.
    • Repeat Questions 2-6.
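For Elastic Net, `alpha` lies strictly between 0 and 1 and is a second tuning parameter alongside lambda. One common approach, sketched here under the assumption that cross-validated binomial deviance is the selection criterion, is to fix the fold assignment (`foldid`) so that every `alpha` in a grid is evaluated on the same folds, then keep the alpha with the lowest minimum CV deviance. The alpha grid and the simulated design (two informative predictors, three noise predictors) are hypothetical.

```r
library(glmnet)

set.seed(9)
n <- 300
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x", 1:5)))
# Only x1 and x2 affect the outcome; x3-x5 are pure noise (hypothetical design).
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x[, 1] - 1 * x[, 2]))

foldid <- sample(rep(1:10, length.out = n))  # same folds for every alpha
alphas <- c(0.25, 0.5, 0.75)

cv_fits <- lapply(alphas, function(a)
  cv.glmnet(x, y, family = "binomial", alpha = a, foldid = foldid))
cv_dev <- sapply(cv_fits, function(f) min(f$cvm))  # minimum CV deviance per alpha

best <- which.min(cv_dev)
data.frame(alpha = alphas, min_cv_deviance = cv_dev)
en_coef <- as.matrix(coef(cv_fits[[best]], s = "lambda.min"))
en_coef
```

Fixing `foldid` removes fold-to-fold randomness from the comparison, so differences in CV deviance reflect `alpha` rather than a different data split.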



10 Discussion

Welcome to our Classwork 5 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 5.

Whether you want to delve deeper into the material, share insights, or ask questions, this is the place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 5 materials or need clarification on any points, don’t hesitate to ask here.

All comments will be stored here.

Let’s collaborate and learn from each other!
