Classwork 4
Logistic Regression I
0.1 Packages
library(tidyverse)
library(broom)
library(stargazer)
library(margins)
library(yardstick)
library(WVPlots)
library(pROC)
0.2 Data
titanic <- read_csv("https://bcdanl.github.io/data/titanic_details.csv")
titanic |>
  rmarkdown::paged_table()
0.3 Part A. Quick EDA (Warm-up)
0.3.1 A1) Outcome rate
- Compute the overall survival rate.
- Report it as a proportion and as a percent.
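One way to do this is a minimal `dplyr` sketch, assuming the `titanic` tibble has a 0/1 `survived` column (if it is coded `"yes"`/`"no"`, convert it first):

```r
library(tidyverse)

# Overall survival rate as a proportion and as a percent
titanic |>
  summarize(
    rate = mean(survived),       # proportion of passengers who survived
    pct  = 100 * mean(survived)  # same quantity expressed as a percent
  )
```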
0.3.2 A2) Basic group comparisons
Compute survival rates by:
- gender
- class
Question: Which group(s) appear to have higher survival rates?
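A sketch of the group comparison, again assuming 0/1 `survived` and columns named `gender` and `class`:

```r
library(tidyverse)

# Survival rate by gender
titanic |>
  group_by(gender) |>
  summarize(rate = mean(survived), n = n())

# Survival rate by class
titanic |>
  group_by(class) |>
  summarize(rate = mean(survived), n = n())
```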
0.4 Part B. Logistic Regression Models
We model:
\[ \Pr(\text{survived}=1 \mid X) = \frac{\exp(z)}{1 + \exp(z)} \] where \(z\) is a linear combination of predictors.
0.4.1 B1) Create a split
Use an 80/20 split with a fixed seed.
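A base-R sketch of the split (the seed value `1234` is an arbitrary choice; `rsample::initial_split()` is an alternative):

```r
set.seed(1234)  # fixed seed so the split is reproducible

n <- nrow(titanic)
train_idx <- sample(n, size = round(0.8 * n))  # 80% of rows for training

train <- titanic[train_idx, ]   # training set
test  <- titanic[-train_idx, ]  # held-out 20% test set
```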
0.4.2 B2) Baseline model
Fit a logistic regression model model1:
- outcome: survived
- predictors: gender + class
Questions:
- What does a positive coefficient mean in a logistic regression?
- Why is it hard to interpret coefficients directly in probability units?
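A sketch of the fit, assuming the `train` split from B1. Note that `glm()` coefficients are on the log-odds scale, which is why they are hard to read directly in probability units:

```r
# Baseline logistic regression: survived on gender and class
model1 <- glm(survived ~ gender + class,
              data = train, family = binomial)

# Tidy coefficient table (estimates are log-odds)
broom::tidy(model1)
```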
0.5 Part C. Average Marginal Effects (AME)
Recall: an AME is the average change in predicted probability when a predictor increases by 1 unit (or changes from 0 to 1 for a dummy), averaging over the sample.
0.5.1 C1) Compute AMEs
Compute AMEs for gender and class.
Questions:
- Interpret the AME for gender (be precise about which category is the reference).
- Interpret the AME for class.
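One way to compute the AMEs with the margins package, assuming `model1` from B2:

```r
library(margins)

# Average marginal effects for model1
ame1 <- margins(model1)
summary(ame1)  # the AME column is the average change in predicted probability
```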
0.5.2 C2) AMEs for a richer model
Fit a richer model model2 with the following predictors:
- gender
- class
- age
- fare
Compute AMEs for gender, class, age, fare:
Questions:
- Interpret the AME for age in percentage points per year.
- Interpret the AME for fare. Rescale your interpretation to a meaningful change (e.g., per $10 increase in fare).
- Did the AME for gender change from model1 to model2? Why might it change?
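A sketch of the richer model and its AMEs, assuming `train` from B1:

```r
library(margins)

# Richer logistic regression with four predictors
model2 <- glm(survived ~ gender + class + age + fare,
              data = train, family = binomial)

summary(margins(model2))
# To rescale: multiply the fare AME by 10 to read it as the
# percentage-point change per $10 increase in fare
```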
0.6 Part D. Prediction
Use model2 to predict probabilities on the test set.
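A sketch, assuming `model2` and the `test` split from earlier parts. `type = "response"` returns probabilities rather than log-odds:

```r
library(tidyverse)

# Predicted survival probabilities on the test set
test <- test |>
  mutate(pred_prob = predict(model2, newdata = test, type = "response"))
```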
0.7 Part E. Classification at a Threshold
We convert probabilities into class predictions:
\[ \hat{y} = \begin{cases} 1 & \text{if } \hat{p} \ge t \\\\ 0 & \text{if } \hat{p} < t \end{cases} \]
0.7.1 E1) Confusion matrix at \(t=0.5\)
Compute:
- accuracy
- sensitivity (recall for class 1)
- specificity
- precision
Questions:
- If the dataset is imbalanced, why can accuracy be misleading?
- Which error is "worse" here: a false positive or a false negative? Explain.
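A base-R sketch of the confusion matrix and the four metrics, assuming `test$pred_prob` from Part D (the yardstick package offers these metrics as well):

```r
library(tidyverse)

# Classify at threshold t = 0.5
test <- test |>
  mutate(pred_class = if_else(pred_prob >= 0.5, 1, 0))

conf <- table(truth = test$survived, prediction = test$pred_class)
conf

TP <- conf["1", "1"]; TN <- conf["0", "0"]
FP <- conf["0", "1"]; FN <- conf["1", "0"]

accuracy    <- (TP + TN) / sum(conf)
sensitivity <- TP / (TP + FN)  # recall for class 1
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)
```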
0.7.2 E2) Try different thresholds
Repeat the metrics for thresholds:
- 0.3
- 0.5
- 0.7
Questions:
- What happens to recall when you lower the threshold?
- What happens to precision when you raise the threshold?
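One way to sweep the three thresholds, reusing `test$pred_prob` (the `factor(..., levels = c(0, 1))` calls keep the table 2x2 even if a threshold produces no predictions in one class):

```r
# Recompute accuracy, recall, and precision at each threshold
for (t in c(0.3, 0.5, 0.7)) {
  pred <- as.integer(test$pred_prob >= t)
  conf <- table(factor(test$survived, levels = c(0, 1)),
                factor(pred, levels = c(0, 1)))
  TP <- conf["1", "1"]; TN <- conf["0", "0"]
  FP <- conf["0", "1"]; FN <- conf["1", "0"]
  cat(sprintf("t=%.1f  acc=%.3f  recall=%.3f  precision=%.3f\n",
              t, (TP + TN) / sum(conf), TP / (TP + FN), TP / (TP + FP)))
}
```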
0.8 Part F. ROC Curve and AUC
Plot the ROC curve for model2 on the test set and compute the AUC.
Questions:
- What does AUC measure in plain English?
- Can a model have a good AUC but poor accuracy at threshold 0.5? Why?
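A sketch with the pROC package (loaded in the Packages section), assuming `test$pred_prob` from Part D:

```r
library(pROC)

# ROC curve and AUC for model2's test-set probabilities
roc_obj <- roc(response = test$survived, predictor = test$pred_prob)
plot(roc_obj)   # ROC curve: sensitivity vs. specificity across thresholds
auc(roc_obj)    # area under the curve
```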
1 Discussion
Welcome to our Classwork 4 Discussion Board!
This space is designed for you to engage with your classmates about the material covered in Classwork 4.
Whether you are looking to delve deeper into the material, share insights, or ask questions, this is the perfect place for you.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 4 materials or need clarification on any points, don't hesitate to ask here.
All comments will be stored here.
Let's collaborate and learn from each other!