Classwork 3

Linear Regression II

Author

Byeong-Hak Choe

Published

February 4, 2026

Modified

February 4, 2026

R Packages

library(tidyverse)
library(stargazer)
library(broom)
library(skimr)

Question 1

oj <- read_csv("https://bcdanl.github.io/data/dominick_oj_feat.csv")
  • Which variables are continuous? Which are categorical?
  • For each continuous variable, provide descriptive statistics for each OJ brand.


Variable description

Variable   Description
sales      Quantity of OJ cartons sold
price      Price of OJ
brand      Brand of OJ
ad         Advertisement status
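
One possible starting point for Question 1 is sketched below in base R. It runs on a small simulated stand-in called toy, not on the real oj data: the values, the brand labels "minute.maid" and "tropicana", and the helper desc() are all hypothetical. Treating numeric columns as continuous is only a heuristic and should be checked against the variable descriptions.

```r
# Simulated stand-in for oj (hypothetical values, not the real data)
set.seed(1)
n <- 300
toy <- data.frame(
  brand = sample(c("dominicks", "minute.maid", "tropicana"), n, replace = TRUE),
  ad    = sample(c("No Ad", "Ad"), n, replace = TRUE),
  price = runif(n, min = 1, max = 4)
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))

# Heuristic: numeric columns are continuous, the others are categorical
is_cont <- sapply(toy, is.numeric)
names(toy)[is_cont]    # candidate continuous variables
names(toy)[!is_cont]   # candidate categorical variables

# Hypothetical helper: a few descriptive statistics for one numeric vector
desc <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), median = median(x), max = max(x))
}

# Apply desc() to each continuous variable within each brand
by_brand <- aggregate(cbind(sales, price) ~ brand, data = toy, FUN = desc)
by_brand
```

With the real data, the same calls would take oj in place of toy.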



Question 2

  • Divide the oj data.frame into training and test data.frames.
    • Use dtrain and dtest for the training and test data.frames, respectively.
    • Assign 70% of the observations in oj to dtrain and the rest to dtest.
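
The split described above could be sketched as follows. This is only one way to do it, shown on a simulated stand-in toy (hypothetical values); the seed 2026 is an arbitrary choice made here for reproducibility.

```r
# Simulated stand-in for the data (hypothetical values, not the real data)
set.seed(1)
n <- 300
toy <- data.frame(
  brand = sample(c("dominicks", "minute.maid", "tropicana"), n, replace = TRUE),
  ad    = sample(c("No Ad", "Ad"), n, replace = TRUE),
  price = runif(n, min = 1, max = 4)
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))

# Randomly pick 70% of the row indices for training (arbitrary seed)
set.seed(2026)
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))

dtrain <- toy[train_idx, ]    # 70% of the rows
dtest  <- toy[-train_idx, ]   # the remaining 30%

c(train = nrow(dtrain), test = nrow(dtest))
```

Sampling row indices gives an exact 70/30 split; comparing a uniform random number per row against 0.7 would give an approximate one.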



Question 3

Train the following linear regression model. Provide the summary of the regression result.

  • Set “dominicks” as the reference level for the \(\text{brand}\) variable.
  • Set “No Ad” as the reference level for the \(\text{ad\_status}\) variable.

Model 1

\[ \begin{align} \log(\text{sales}_{\text{i}}) &\,=\, \;\; b_{\text{intercept}} \,+\, b_{\,\text{mm}}\,\text{brand}_{\,\text{mm}, \text{i}} \,+\, b_{\,\text{tr}}\,\text{brand}_{\,\text{tr}, \text{i}}\\ &\quad\,+\, b_{\text{price}}\,\log(\text{price}_{\text{i}}) \,+\, e_{\text{i}},\\ \text{where}\qquad\qquad&\\ \text{brand}_{\,\text{tr}, \text{i}} &\,=\, \begin{cases} \text{1} & \text{ if an orange juice } \text{i} \text{ is } \text{Tropicana};\\\\ \text{0} & \text{otherwise}.\qquad\qquad\quad\, \end{cases}\\ \text{brand}_{\,\text{mm}, \text{i}} &\,=\, \begin{cases} \text{1} & \text{ if an orange juice } \text{i} \text{ is } \text{Minute Maid};\\\\ \text{0} & \text{otherwise}.\qquad\qquad\quad\, \end{cases} \end{align} \]

Model 2

\[ \begin{align} \log(\text{sales}_{\text{i}}) \,=\,&\;\; \quad b_{\text{intercept}} \,+\, \color{Green}{b_{\,\text{mm}}\,\text{brand}_{\,\text{mm}, \text{i}}} \,+\, \color{Blue}{b_{\,\text{tr}}\,\text{brand}_{\,\text{tr}, \text{i}}}\\ &\,+\, b_{\text{price}}\,\log(\text{price}_{\text{i}}) \\ &\, +\, b_{\text{price*mm}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}} \\ &\,+\, b_{\text{price*tr}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}} \,+\, e_{\text{i}} \end{align} \]

Model 3

\[ \begin{align} \log(\text{sales}_{\text{i}}) \,=\,\quad\;\;& b_{\text{intercept}} \,+\, \color{Green}{b_{\,\text{mm}}\,\text{brand}_{\,\text{mm}, \text{i}}} \,+\, \color{Blue}{b_{\,\text{tr}}\,\text{brand}_{\,\text{tr}, \text{i}}} \\ &\,+\; b_{\,\text{ad}}\,\color{Orange}{\text{ad}_{\,\text{i}}} \qquad\qquad\qquad\qquad\quad \\ &\,+\, b_{\text{mm*ad}}\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}}\,+\, b_{\text{tr*ad}}\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}} \\ &\,+\; b_{\text{price}}\,\log(\text{price}_{\text{i}}) \qquad\qquad\qquad\;\;\;\;\, \\ &\,+\, b_{\text{price*mm}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}}\qquad\qquad\qquad\;\, \\ &\,+\, b_{\text{price*tr}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}}\qquad\qquad\qquad\;\, \\ & \,+\, b_{\text{price*ad}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Orange}{\text{ad}_{\,\text{i}}}\qquad\qquad\qquad\;\;\, \\ &\,+\, b_{\text{price*mm*ad}}\,\log(\text{price}_{\text{i}}) \,\times\,\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}} \\ &\,+\, b_{\text{price*tr*ad}}\,\log(\text{price}_{\text{i}}) \,\times\,\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}} \,+\, e_{\text{i}} \end{align} \]
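
A minimal sketch of how the three specifications might be written as lm() formulas is given below. It uses a simulated stand-in toy (hypothetical values and brand labels), and the toy column ad plays the role of the ad status variable. In an R formula, a * b expands to both main effects plus their interaction, so brand * ad * log(price) produces the twelve coefficients of Model 3.

```r
# Simulated stand-in for the data (hypothetical values, not the real data)
set.seed(1)
n <- 300
toy <- data.frame(
  brand = sample(c("dominicks", "minute.maid", "tropicana"), n, replace = TRUE),
  ad    = sample(c("No Ad", "Ad"), n, replace = TRUE),
  price = runif(n, min = 1, max = 4)
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))

# Set the reference levels before splitting so both parts share them
toy$brand <- relevel(factor(toy$brand), ref = "dominicks")
toy$ad    <- relevel(factor(toy$ad), ref = "No Ad")

# 70/30 split (arbitrary seed)
set.seed(2026)
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))
dtrain <- toy[train_idx, ]
dtest  <- toy[-train_idx, ]

# Model 1: brand dummies and log price
model_1 <- lm(log(sales) ~ brand + log(price), data = dtrain)
# Model 2: adds brand-specific slopes on log price
model_2 <- lm(log(sales) ~ brand * log(price), data = dtrain)
# Model 3: full three-way interaction of brand, ad, and log price
model_3 <- lm(log(sales) ~ brand * ad * log(price), data = dtrain)

summary(model_1)
names(coef(model_3))
```

Because the toy data are simulated, the estimates themselves carry no meaning; only the structure of the formulas is intended to transfer.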



Question 4

For each model, make a prediction on the outcome variable using the test data.frame and the regression result from Question 3.
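
The prediction step might look like the sketch below, again on a simulated stand-in toy (hypothetical values). Since the outcome in each model is log(sales), predict() returns predictions on the log scale.

```r
# Simulated stand-in for the data (hypothetical values, not the real data)
set.seed(1)
n <- 300
toy <- data.frame(
  brand = sample(c("dominicks", "minute.maid", "tropicana"), n, replace = TRUE),
  ad    = sample(c("No Ad", "Ad"), n, replace = TRUE),
  price = runif(n, min = 1, max = 4)
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))
toy$brand <- relevel(factor(toy$brand), ref = "dominicks")
toy$ad    <- relevel(factor(toy$ad), ref = "No Ad")

# 70/30 split (arbitrary seed)
set.seed(2026)
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))
dtrain <- toy[train_idx, ]
dtest  <- toy[-train_idx, ]

# Fit the three models on the training data
models <- list(
  model_1 = lm(log(sales) ~ brand + log(price), data = dtrain),
  model_2 = lm(log(sales) ~ brand * log(price), data = dtrain),
  model_3 = lm(log(sales) ~ brand * ad * log(price), data = dtrain)
)

# One column of predicted log(sales) per model, one row per test observation
preds <- sapply(models, predict, newdata = dtest)
head(preds)
```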



Question 5

  • Across the three models, how sensitive is the percentage change in OJ purchases to the percentage change in the price of OJ for each brand?

  • How does the ad status affect this sensitivity in Model 3?
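
As a worked step, the sensitivity in question can be read off each model as the derivative of \(\log(\text{sales}_{\text{i}})\) with respect to \(\log(\text{price}_{\text{i}})\). The expressions below follow directly from the three model equations, assuming \(\text{ad}_{\text{i}}\) is a 0/1 dummy equal to 1 when the product is advertised.

\[ \begin{align} \text{Model 1:}\quad \frac{\partial \log(\text{sales}_{\text{i}})}{\partial \log(\text{price}_{\text{i}})} &\,=\, b_{\text{price}}\\ \text{Model 2:}\quad \frac{\partial \log(\text{sales}_{\text{i}})}{\partial \log(\text{price}_{\text{i}})} &\,=\, b_{\text{price}} \,+\, b_{\text{price*mm}}\,\text{brand}_{\,\text{mm}, \text{i}} \,+\, b_{\text{price*tr}}\,\text{brand}_{\,\text{tr}, \text{i}}\\ \text{Model 3:}\quad \frac{\partial \log(\text{sales}_{\text{i}})}{\partial \log(\text{price}_{\text{i}})} &\,=\, b_{\text{price}} \,+\, b_{\text{price*mm}}\,\text{brand}_{\,\text{mm}, \text{i}} \,+\, b_{\text{price*tr}}\,\text{brand}_{\,\text{tr}, \text{i}} \,+\, b_{\text{price*ad}}\,\text{ad}_{\,\text{i}}\\ &\quad\,+\, b_{\text{price*mm*ad}}\,\text{brand}_{\,\text{mm}, \text{i}}\,\times\,\text{ad}_{\,\text{i}} \,+\, b_{\text{price*tr*ad}}\,\text{brand}_{\,\text{tr}, \text{i}}\,\times\,\text{ad}_{\,\text{i}} \end{align} \]

For example, under Model 3 the expression for an advertised Minute Maid product reduces to \(b_{\text{price}} + b_{\text{price*mm}} + b_{\text{price*ad}} + b_{\text{price*mm*ad}}\), while for an unadvertised Dominick's product it is \(b_{\text{price}}\) alone.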



Question 6

  • Compare the RMSEs of the three models on the test data.frame.
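
A sketch of the comparison follows, using a simulated stand-in toy (hypothetical values) and a hypothetical helper rmse(). The RMSE here is computed on the log(sales) scale, which is an assumption; the question does not state the scale.

```r
# Simulated stand-in for the data (hypothetical values, not the real data)
set.seed(1)
n <- 300
toy <- data.frame(
  brand = sample(c("dominicks", "minute.maid", "tropicana"), n, replace = TRUE),
  ad    = sample(c("No Ad", "Ad"), n, replace = TRUE),
  price = runif(n, min = 1, max = 4)
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))
toy$brand <- relevel(factor(toy$brand), ref = "dominicks")
toy$ad    <- relevel(factor(toy$ad), ref = "No Ad")

# 70/30 split (arbitrary seed)
set.seed(2026)
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))
dtrain <- toy[train_idx, ]
dtest  <- toy[-train_idx, ]

models <- list(
  model_1 = lm(log(sales) ~ brand + log(price), data = dtrain),
  model_2 = lm(log(sales) ~ brand * log(price), data = dtrain),
  model_3 = lm(log(sales) ~ brand * ad * log(price), data = dtrain)
)

# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Test-set RMSE of each model, on the log(sales) scale
test_rmse <- sapply(models, function(m) {
  rmse(log(dtest$sales), predict(m, newdata = dtest))
})
test_rmse
```

A lower test RMSE indicates better out-of-sample fit; on simulated data the ranking is not informative.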



Question 7

  • Draw a residual plot from each of the three models.
    • On average, are the predictions correct? Are there systematic errors?
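
One way to draw the plots is sketched below with base R graphics on a simulated stand-in toy (hypothetical values). It assumes residuals are computed on the test data as actual minus predicted log(sales); using the training data instead would be an equally reasonable reading of the question.

```r
# Simulated stand-in for the data (hypothetical values, not the real data)
set.seed(1)
n <- 300
toy <- data.frame(
  brand = sample(c("dominicks", "minute.maid", "tropicana"), n, replace = TRUE),
  ad    = sample(c("No Ad", "Ad"), n, replace = TRUE),
  price = runif(n, min = 1, max = 4)
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))
toy$brand <- relevel(factor(toy$brand), ref = "dominicks")
toy$ad    <- relevel(factor(toy$ad), ref = "No Ad")

# 70/30 split (arbitrary seed)
set.seed(2026)
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))
dtrain <- toy[train_idx, ]
dtest  <- toy[-train_idx, ]

models <- list(
  "Model 1" = lm(log(sales) ~ brand + log(price), data = dtrain),
  "Model 2" = lm(log(sales) ~ brand * log(price), data = dtrain),
  "Model 3" = lm(log(sales) ~ brand * ad * log(price), data = dtrain)
)

# One panel per model: residuals against predictions
par(mfrow = c(1, 3))
resid_means <- sapply(names(models), function(nm) {
  fitted_vals <- predict(models[[nm]], newdata = dtest)
  resid_vals  <- log(dtest$sales) - fitted_vals
  plot(fitted_vals, resid_vals, main = nm,
       xlab = "Predicted log(sales)", ylab = "Residual")
  abline(h = 0, lty = 2)                               # zero-error reference line
  lines(lowess(fitted_vals, resid_vals), col = "red")  # smoothed trend
  mean(resid_vals)                                     # average error
})
resid_means
```

If predictions are correct on average, the mean residual is near zero; a smoothed trend that drifts away from the dashed line suggests systematic errors.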



Question 8

  • How would you explain different estimation results across different models?

  • Which model do you prefer? Why?



Discussion

Welcome to our Classwork 3 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 3.

Whether you are looking to delve deeper into the content, share insights, or have questions about the content, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 3 materials or need clarification on any points, don’t hesitate to ask here.

All comments will be stored here.

Let’s collaborate and learn from each other!
