Classwork 5
Logistic Regression II - Quasi-Separation & Regularization
0.1 Packages
library(tidyverse)
library(broom)
library(stargazer)
library(margins)
library(yardstick)
library(WVPlots)
library(pROC)
library(glmnet)
library(gamlr)
0.2 Data
df <- read_csv('https://bcdanl.github.io/data/car-data.csv')
df |>
  rmarkdown::paged_table()
0.3 Variable description
| Variable | Description |
|---|---|
| buying | Buying price of the car (vhigh, high, med, low) |
| maint | Maintenance cost (vhigh, high, med, low) |
| doors | Number of doors (2, 3, 4, 5more) |
| persons | Capacity in terms of persons to carry (2, 4, more) |
| lug_boot | Size of luggage boot (small, med, big) |
| safety | Estimated safety of the car (low, med, high) |
| rating | Car acceptability (unacc, acc, good, vgood) |
| fail | TRUE if the car is unacceptable (unacc), otherwise FALSE |
1 Question 1
- Divide the df DataFrame into training and test DataFrames.
  - Use dtrain and dtest for the training and test DataFrames, respectively.
  - Assign 70% of the observations in df to dtrain; assign the rest to dtest.
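One way the 70/30 split could be sketched in base R is shown below. The stand-in df (the full grid of attribute levels built with expand.grid), the seed value, and the helper names n_train and train_idx are assumptions for illustration; in the classwork the df read from the CSV would take its place.

```r
# Stand-in for the car data: the full grid of attribute levels (1,728 rows).
# Assumption for illustration only; the real df comes from read_csv() above.
df <- expand.grid(
  buying   = c("vhigh", "high", "med", "low"),
  maint    = c("vhigh", "high", "med", "low"),
  doors    = c("2", "3", "4", "5more"),
  persons  = c("2", "4", "more"),
  lug_boot = c("small", "med", "big"),
  safety   = c("low", "med", "high"),
  stringsAsFactors = FALSE
)

set.seed(1234)                       # arbitrary seed, only for reproducibility
n_train   <- floor(0.7 * nrow(df))   # 70% of the rows, rounded down
train_idx <- sample(seq_len(nrow(df)), size = n_train)

dtrain <- df[train_idx, ]    # training DataFrame
dtest  <- df[-train_idx, ]   # test DataFrame: every row not drawn for training
```

Sampling row indices without replacement guarantees that no observation appears in both dtrain and dtest.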
2 Question 2
Fit the following regression model:
\[ \begin{align}
&\quad\;\; \text{Prob}(\text{fail}_{i} = 1) \\
&= G\Big(\beta_{0} \\
&\qquad\quad\;\;\; \,+\, \beta_{1} \text{buying\_med}_{i} \,+\, \beta_{2} \text{buying\_high}_{i} \,+\, \beta_{3} \text{buying\_vhigh}_{i} \\
&\qquad\quad\;\;\; \,+\, \beta_{4} \text{maint\_med}_{i} \,+\, \beta_{5} \text{maint\_high}_{i} \,+\, \beta_{6} \text{maint\_vhigh}_{i} \\
&\qquad\quad\;\;\; \,+\, \beta_{7} \text{persons\_4}_{i} \,+\, \beta_{8} \text{persons\_more}_{i} \\
&\qquad\quad\;\;\; \,+\, \beta_{9} \text{lug\_boot\_med}_{i} \,+\, \beta_{10} \text{lug\_boot\_big}_{i} \\
&\qquad\quad\;\;\; \,+\, \beta_{11} \text{safety\_med}_{i} \,+\, \beta_{12} \text{safety\_high}_{i} \Big),
\end{align} \]
where \(G(\,\cdot\,)\) is
\[ G(\,\cdot\,) = \frac{\exp(\,\cdot\,)}{1 + \exp(\,\cdot\,)}. \]
Provide a summary of the regression results.
- Set the reference levels so that the omitted category of each variable matches the model above.
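A minimal sketch of setting reference levels and fitting the logit model with glm() follows. The data frame d and its simulated fail column are assumptions used only so the block runs on its own; they are not the real car data. The reference levels (low, low, 2, small, low) are inferred from the categories that do not appear in the model equation.

```r
set.seed(1)

# Stand-in predictors (assumption): every combination of the five attributes.
d <- expand.grid(
  buying   = c("vhigh", "high", "med", "low"),
  maint    = c("vhigh", "high", "med", "low"),
  persons  = c("2", "4", "more"),
  lug_boot = c("small", "med", "big"),
  safety   = c("low", "med", "high"),
  stringsAsFactors = FALSE
)

# The first level of a factor is its reference (omitted) category in glm().
d$buying   <- factor(d$buying,   levels = c("low", "med", "high", "vhigh"))
d$maint    <- factor(d$maint,    levels = c("low", "med", "high", "vhigh"))
d$persons  <- factor(d$persons,  levels = c("2", "4", "more"))
d$lug_boot <- factor(d$lug_boot, levels = c("small", "med", "big"))
d$safety   <- factor(d$safety,   levels = c("low", "med", "high"))

# Simulated outcome (NOT the real fail column), so that the fit is well behaved.
p      <- plogis(0.5 + 0.6 * (d$buying == "vhigh") - 1.0 * (d$safety == "high"))
d$fail <- rbinom(nrow(d), 1, p) == 1

# Logistic regression: family = binomial with the logit link gives G() above.
model <- glm(fail ~ buying + maint + persons + lug_boot + safety,
             data = d, family = binomial(link = "logit"))
summary(model)
```

Note that glm() names the dummy coefficients by pasting the variable name and the level together (for example buyingmed), which corresponds to buying_med in the equation.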
3 Question 3
- What do you notice about the coefficient estimates?
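As a hedged illustration of what quasi-separation does to estimates, the toy example below builds data in which one category always fails. The variables and the simulated outcome are assumptions, not the car data. In such a case the maximum-likelihood estimate for that category diverges, so glm() reports a very large coefficient with an even larger standard error.

```r
set.seed(42)
n <- 200

# Toy predictor with "high" as the reference level (assumption for illustration).
safety <- factor(sample(c("high", "med", "low"), n, replace = TRUE),
                 levels = c("high", "med", "low"))

# Quasi-separation: every low-safety car fails; the others fail half the time.
fail <- ifelse(safety == "low", 1, rbinom(n, 1, 0.5))

# glm() typically warns that fitted probabilities are numerically 0 or 1 here;
# the warning is suppressed only to keep the output short.
fit <- suppressWarnings(glm(fail ~ safety, family = binomial(link = "logit")))

est <- summary(fit)$coefficients
est["safetylow", c("Estimate", "Std. Error")]
```

A coefficient of this size paired with a standard error in the hundreds or thousands is a common symptom to look for in the summary table.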
4 Question 4
- Calculate the following:
- Confusion matrix with the appropriate threshold level.
- Accuracy
- Precision
- Recall
- Specificity
- Average rate of unacceptable cars (fail is TRUE)
- Enrichment
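The metrics in this list can be sketched on a small hand-made example. The vectors actual and prob and the 0.5 threshold below are made up for illustration; in the classwork they would be the observed fail values and the predicted probabilities on dtest. Enrichment is taken here as precision divided by the average rate of positives, which is one common definition.

```r
# Made-up outcomes and predicted probabilities (assumption for illustration).
actual <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prob   <- c(0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1)

threshold <- 0.5               # illustrative threshold
pred      <- prob > threshold  # predicted class

# Confusion matrix: rows are predictions, columns are actual outcomes.
conf <- table(pred = pred, actual = actual)

TP <- conf["TRUE",  "TRUE"]    # predicted fail, actually fail
FP <- conf["TRUE",  "FALSE"]   # predicted fail, actually not
FN <- conf["FALSE", "TRUE"]    # predicted not, actually fail
TN <- conf["FALSE", "FALSE"]   # predicted not, actually not

accuracy    <- (TP + TN) / sum(conf)
precision   <- TP / (TP + FP)
recall      <- TP / (TP + FN)
specificity <- TN / (TN + FP)
avg_rate    <- mean(actual)          # average rate of positives
enrichment  <- precision / avg_rate  # precision relative to the base rate
```

An enrichment above 1 means that the cases flagged by the model contain a higher share of positives than the data as a whole.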
5 Question 5
Visualize the variation in recall and enrichment across different threshold levels.
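A minimal sketch of how recall and enrichment could be tabulated over a grid of thresholds is given below, again with made-up actual and prob vectors. The helper metrics_at and the threshold grid are hypothetical names and choices for illustration.

```r
# Made-up outcomes and predicted probabilities (assumption for illustration).
actual <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prob   <- c(0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1)

# Hypothetical helper: recall and enrichment at a single threshold.
metrics_at <- function(thr) {
  pred      <- prob > thr
  tp        <- sum(pred & actual)
  precision <- tp / sum(pred)   # NaN if nothing is predicted positive
  recall    <- tp / sum(actual)
  c(threshold = thr, recall = recall, enrichment = precision / mean(actual))
}

# Evaluate on a grid that keeps at least one positive prediction.
thresholds <- seq(0.05, 0.85, by = 0.1)
curve_df   <- as.data.frame(t(sapply(thresholds, metrics_at)))
curve_df
```

The resulting data frame has one row per threshold, so it can be reshaped to long format and passed to ggplot() with threshold on the x-axis to show the trade-off: recall falls as the threshold rises, while enrichment tends to rise.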
6 Question 6
- Draw the receiver operating characteristic (ROC) curve.
- Calculate the area under the curve (AUC).
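To make the ROC curve and AUC concrete, the base R sketch below computes both by hand on made-up data. This is an illustration of the definitions, not the classwork's intended method; in practice pROC::roc() and pROC::auc() from the packages loaded above perform the same job.

```r
# Made-up outcomes and predicted probabilities (assumption for illustration).
actual <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prob   <- c(0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1)

# ROC points: true and false positive rates at each distinct cutoff,
# starting from a cutoff (Inf) at which nothing is predicted positive.
cuts <- c(Inf, sort(unique(prob), decreasing = TRUE))
tpr  <- sapply(cuts, function(thr) mean(prob[actual]  >= thr))
fpr  <- sapply(cuts, function(thr) mean(prob[!actual] >= thr))
roc_df <- data.frame(cutoff = cuts, fpr = fpr, tpr = tpr)

# AUC by the trapezoid rule over the ROC points.
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# AUC by the rank (Mann-Whitney) formula: the share of positive-negative
# pairs in which the positive case has the higher predicted probability.
n1 <- sum(actual)
n0 <- sum(!actual)
auc_rank <- (sum(rank(prob)[actual]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```

Both calculations agree on this example: 23 of the 24 positive-negative pairs are ranked correctly. Plotting tpr against fpr from roc_df draws the curve.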
7 Question 7
- Use glmnet to fit a Lasso logistic regression.
  - Repeat Questions 2-6.
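A hedged sketch of a cross-validated Lasso logistic regression with glmnet follows. The data frame d and its simulated fail column are assumptions that stand in for dtrain, and the use of cv.glmnet() with lambda.min to choose the penalty is one reasonable choice rather than the required one. Changing alpha to 0 gives Ridge, and a value strictly between 0 and 1 gives an Elastic Net.

```r
library(glmnet)

set.seed(1)

# Stand-in predictors (assumption). expand.grid() converts strings to factors
# whose levels follow the order given here, so the first level is the reference.
d <- expand.grid(
  buying   = c("low", "med", "high", "vhigh"),
  maint    = c("low", "med", "high", "vhigh"),
  persons  = c("2", "4", "more"),
  lug_boot = c("small", "med", "big"),
  safety   = c("low", "med", "high")
)

# Simulated outcome (NOT the real fail column).
d$fail <- rbinom(nrow(d), 1,
                 plogis(-0.5 + 1.5 * (d$safety == "low") + 0.8 * (d$buying == "vhigh")))

# glmnet needs a numeric matrix: build the dummies and drop the intercept column.
X <- model.matrix(~ buying + maint + persons + lug_boot + safety, data = d)[, -1]
y <- d$fail

# alpha = 1 is the Lasso penalty; cv.glmnet() picks lambda by cross-validation.
cv_fit <- cv.glmnet(X, y, family = "binomial", alpha = 1)

beta <- coef(cv_fit, s = "lambda.min")   # sparse coefficient vector
prob <- as.numeric(predict(cv_fit, newx = X, s = "lambda.min", type = "response"))
```

The predicted probabilities in prob can then be fed into the same confusion-matrix, threshold, and ROC calculations as for the unpenalized model.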
8 Question 8
- Use glmnet to fit a Ridge logistic regression.
  - Repeat Questions 2-6.
9 Question 9
- Use glmnet to fit an Elastic Net logistic regression.
  - Repeat Questions 2-6.
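For reference, the penalized problem that glmnet solves for a logistic regression can be written as below, following the form given in the glmnet documentation. Here \(x_{i}\) collects the dummy variables from Question 2, \(\lambda \ge 0\) is the penalty weight, and \(\alpha \in [0, 1]\) is the mixing parameter; the symbols \(N\), \(x_{i}\), \(\lambda\), and \(\alpha\) are notation introduced here, not in the original model statement.

\[ \min_{\beta_{0},\, \beta} \; -\frac{1}{N} \sum_{i=1}^{N} \Big[ \text{fail}_{i} \, (\beta_{0} + x_{i}^{\top} \beta) - \log\big(1 + \exp(\beta_{0} + x_{i}^{\top} \beta)\big) \Big] \,+\, \lambda \Big[ \frac{1 - \alpha}{2} \lVert \beta \rVert_{2}^{2} + \alpha \lVert \beta \rVert_{1} \Big] \]

Setting \(\alpha = 1\) gives the Lasso, \(\alpha = 0\) gives Ridge, and any value strictly in between gives an Elastic Net. The intercept \(\beta_{0}\) is not penalized.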
10 Discussion
Welcome to our Classwork 5 Discussion Board!
This space is designed for you to engage with your classmates about the material covered in Classwork 5.
Whether you are looking to delve deeper into the content, share insights, or have questions about the content, this is the perfect place for you.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 5 materials or need clarification on any points, don't hesitate to ask here.
All comments will be stored here.
Let's collaborate and learn from each other!