Classwork 4

Logistic Regression I

Author

Byeong-Hak Choe

Published

February 18, 2026

Modified

February 23, 2026

0.1 Packages

library(tidyverse)
library(broom)
library(stargazer)
library(margins)
library(yardstick)
library(WVPlots)
library(pROC)

0.2 Data

titanic <- read_csv("https://bcdanl.github.io/data/titanic_details.csv")

titanic |> 
  rmarkdown::paged_table()

0.3 Part A. Quick EDA (Warm-up)

0.3.1 A1) Outcome rate

  1. Compute the overall survival rate.
  2. Report it as a proportion and as a percent.
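A minimal sketch of these two steps, assuming survived is coded 0/1 in the data loaded above:

```r
# Overall survival rate, as a proportion and as a percent
titanic |>
  summarize(
    survival_rate = mean(survived, na.rm = TRUE),  # proportion in [0, 1]
    survival_pct  = 100 * survival_rate            # same number, in percent
  )
```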

0.3.2 A2) Basic group comparisons

Compute survival rates by:

  • gender
  • class

Question: Which group(s) appear to have higher survival rates?
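One way to sketch the group comparisons, again assuming survived is coded 0/1:

```r
# Survival rate within each gender
titanic |>
  group_by(gender) |>
  summarize(survival_rate = mean(survived, na.rm = TRUE))

# Survival rate within each class
titanic |>
  group_by(class) |>
  summarize(survival_rate = mean(survived, na.rm = TRUE))
```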


0.4 Part B. Logistic Regression Models

We model:

\[ \Pr(\text{survived}=1 \mid X) = \frac{\exp(z)}{1 + \exp(z)} \] where \(z\) is a linear combination of predictors.

0.4.1 B1) Create a split

Use an 80/20 split with a fixed seed.
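A base-R sketch of the split; the seed value 123 is an arbitrary choice:

```r
set.seed(123)  # fixed seed so the split is reproducible

n <- nrow(titanic)
train_idx <- sample(n, size = floor(0.8 * n))  # 80% of rows, at random

train <- titanic[train_idx, ]   # training set (80%)
test  <- titanic[-train_idx, ]  # test set (20%)
```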

0.4.2 B2) Baseline model

Fit a logistic regression model, model1:

  • outcome: survived
  • predictors: gender + class

Questions:

  1. What does a positive coefficient mean in a logistic regression?
  2. Why is it hard to interpret coefficients directly in probability units?
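A sketch of the fit, assuming train is the 80% training split from B1:

```r
# Baseline logistic regression on the training split
model1 <- glm(survived ~ gender + class,
              data = train, family = binomial)

summary(model1)      # coefficients are on the log-odds scale
broom::tidy(model1)  # same coefficients as a tidy data frame
```

A positive coefficient raises the log-odds of survival, which is why the probability change it implies depends on where you start on the logistic curve.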

0.5 Part C. Average Marginal Effects (AME)

Recall: an AME is the average change in predicted probability when a predictor increases by 1 unit (or changes 0β†’1 for a dummy), averaging over the sample.

0.5.1 C1) Compute AMEs

Compute AMEs for gender and class.

Questions:

  1. Interpret the AME for gender (be precise about which category is the reference).
  2. Interpret the AME for class.
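The margins package (loaded above) computes AMEs directly; this sketch assumes model1 was fit as in B2:

```r
# Average marginal effects for model1
ame1 <- margins(model1)
summary(ame1)  # AME column: average change in Pr(survived = 1)
               # for a one-unit (or 0 -> 1 dummy) change in each predictor
```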

0.5.2 C2) AMEs for a richer model

Fit a richer model, model2, with the following predictors:

  • gender
  • class
  • age
  • fare

Compute AMEs for gender, class, age, and fare.

Questions:

  1. Interpret the AME for age in percentage points per year.
  2. Interpret the AME for fare. Rescale your interpretation to a meaningful change (e.g., per $10 increase in fare).
  3. Did the AME for gender change from model1 to model2? Why might it change?
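A sketch of the richer fit and its AMEs, again assuming the training split from B1:

```r
# Richer model: adds age and fare to the baseline predictors
model2 <- glm(survived ~ gender + class + age + fare,
              data = train, family = binomial)

summary(margins(model2))
# The fare AME is per $1; multiply by 10 for the effect of a $10 increase.
```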

0.6 Part D. Prediction

Use model2 to predict probabilities on the test set.
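With type = "response", predict() returns probabilities rather than log-odds; this assumes test is the 20% split from B1:

```r
# Predicted probability of survival for each test-set passenger
test <- test |>
  mutate(prob = predict(model2, newdata = test, type = "response"))
```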


0.7 Part E. Classification at a Threshold

We convert probabilities into class predictions:

\[ \hat{y} = \begin{cases} 1 & \text{if } \hat{p} \ge t \\ 0 & \text{if } \hat{p} < t \end{cases} \]

0.7.1 E1) Confusion matrix at \(t=0.5\)

Compute:

  • accuracy
  • sensitivity (recall for class 1)
  • specificity
  • precision

Questions:

  1. If the dataset is imbalanced, why can accuracy be misleading?
  2. Which error is β€œworse” here: false positive or false negative? Explain.
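A base-R sketch of the confusion matrix and the four metrics, assuming test$prob holds the predicted probabilities from Part D:

```r
# Classify at t = 0.5
test <- test |>
  mutate(pred = if_else(prob >= 0.5, 1, 0))

# Confusion matrix: rows = predicted class, columns = actual class
conf <- table(pred = test$pred, actual = test$survived)
conf

TP <- conf["1", "1"]; TN <- conf["0", "0"]
FP <- conf["1", "0"]; FN <- conf["0", "1"]

accuracy    <- (TP + TN) / sum(conf)
sensitivity <- TP / (TP + FN)  # recall for class 1
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)
```

The yardstick package (loaded above) provides the same metrics as functions if you prefer not to compute them by hand.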

0.7.2 E2) Try different thresholds

Repeat the metrics for thresholds:

  • 0.3
  • 0.5
  • 0.7

Questions:

  1. What happens to recall when you lower the threshold?
  2. What happens to precision when you raise the threshold?
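One way to sweep the thresholds is a small helper function; this sketch assumes test$prob from Part D, and metrics_at is a hypothetical name:

```r
# Accuracy, sensitivity, and precision at a given threshold t
metrics_at <- function(t, prob, actual) {
  pred <- as.integer(prob >= t)
  tibble(
    threshold   = t,
    accuracy    = mean(pred == actual),
    sensitivity = sum(pred == 1 & actual == 1) / sum(actual == 1),
    precision   = sum(pred == 1 & actual == 1) / sum(pred == 1)
  )
}

# One row of metrics per threshold
map_dfr(c(0.3, 0.5, 0.7), metrics_at,
        prob = test$prob, actual = test$survived)
```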

0.8 Part F. ROC Curve and AUC

Questions:

  1. What does AUC measure in plain English?
  2. Can a model have a good AUC but poor accuracy at threshold 0.5? Why?
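The pROC package (loaded above) builds the curve from the test-set probabilities; this assumes test$prob from Part D:

```r
# ROC curve and AUC for model2 on the test set
roc_obj <- roc(test$survived, test$prob)
auc(roc_obj)   # area under the ROC curve
plot(roc_obj)  # sensitivity vs. specificity across all thresholds
```

Because AUC summarizes performance over every threshold, it can be high even when the particular cutoff t = 0.5 gives poor accuracy.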

1 Discussion

Welcome to our Classwork 4 Discussion Board! πŸ‘‹

This space is designed for you to engage with your classmates about the material covered in Classwork 4.

Whether you are looking to delve deeper into the material, share insights, or ask questions, this is the place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 4 materials or need clarification on any points, don’t hesitate to ask here.

All comments will be stored here.

Let’s collaborate and learn from each other!

Back to top