Classwork 3
Linear Regression II
R Packages
library(tidyverse)
library(stargazer)
library(broom)
library(skimr)
Question 1
oj <- read_csv("https://bcdanl.github.io/data/dominick_oj_feat.csv")
- Which variables are continuous? Which are categorical?
- For each continuous variable, provide descriptive statistics for each OJ brand.
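One possible way to get per-brand descriptive statistics is sketched below in base R. The data frame `toy` is a hypothetical stand-in with made-up values, not the actual OJ data, and the brand labels other than "dominicks" are assumptions.

```r
# Synthetic stand-in for the OJ data; every value below is made up.
toy <- data.frame(
  sales = c(8000, 6500, 9100, 4200, 5100, 7300),
  price = c(1.59, 1.79, 2.09, 2.49, 2.89, 3.19),
  brand = c("dominicks", "dominicks", "minute.maid",
            "minute.maid", "tropicana", "tropicana")
)

# Mean and standard deviation of each continuous variable within each brand.
brand_means <- aggregate(cbind(sales, price) ~ brand, data = toy, FUN = mean)
brand_sds   <- aggregate(cbind(sales, price) ~ brand, data = toy, FUN = sd)

brand_means
brand_sds
```

With the packages loaded above, grouping by brand with `group_by()` and then calling `skim()` should give a richer per-brand summary in one step.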
Variable description
| Variable | Description |
|---|---|
| sales | Quantity of OJ cartons sold |
| price | Price of OJ |
| brand | Brand of OJ |
| ad | Advertisement status |
Question 2
- Divide the df data.frame into training and test data.frames.
  - Use dtrain and dtest for training and test data.frames, respectively.
  - 70% of observations in df are assigned to dtrain; the rest are assigned to dtest.
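A minimal sketch of a random 70/30 split in base R follows. Here `df` is a synthetic placeholder with made-up values (the brand labels are assumptions), and the seed is arbitrary and only there to make the split reproducible.

```r
# Synthetic stand-in for df; the values are made up for illustration.
set.seed(1)
n  <- 200
df <- data.frame(
  sales = rpois(n, lambda = 5000) + 1,
  price = runif(n, min = 1, max = 4),
  brand = sample(c("dominicks", "minute.maid", "tropicana"), n, replace = TRUE)
)

# Draw 70% of the row indices at random, without replacement.
train_idx <- sample(seq_len(nrow(df)), size = round(0.7 * nrow(df)))

dtrain <- df[train_idx, ]   # 70% of the rows
dtest  <- df[-train_idx, ]  # the remaining rows
```

Because the two sets are built from complementary row indices, no observation appears in both.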
Question 3
Train the following linear regression models. Provide a summary of each regression result.
- Set “dominicks” as the reference level for the \(\text{brand}\) variable.
- Set “No Ad” as the reference level for the \(\text{ad\_status}\) variable.
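Reference levels can be set with `relevel()` on a factor, as in the sketch below. The data frame `oj_toy` is hypothetical, and the labels "Ad", "minute.maid", and "tropicana" are assumptions about how the other levels are spelled in the data.

```r
# Tiny hypothetical data frame with the two categorical variables.
oj_toy <- data.frame(
  brand     = c("tropicana", "dominicks", "minute.maid", "dominicks"),
  ad_status = c("Ad", "No Ad", "Ad", "No Ad")
)

# Convert to factors and move the chosen level to the first position,
# which is the level lm() treats as the baseline.
oj_toy$brand     <- relevel(factor(oj_toy$brand), ref = "dominicks")
oj_toy$ad_status <- relevel(factor(oj_toy$ad_status), ref = "No Ad")

levels(oj_toy$brand)
levels(oj_toy$ad_status)
```

Setting the reference explicitly is safer than relying on the default ordering of factor levels, which is alphabetical.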
Model 1
\[ \begin{align} \log(\text{sales}_{\text{i}}) &\,=\, \;\; b_{\text{intercept}} \,+\, b_{\,\text{mm}}\,\text{brand}_{\,\text{mm}, \text{i}} \,+\, b_{\,\text{tr}}\,\text{brand}_{\,\text{tr}, \text{i}}\\ &\quad\,+\, b_{\text{price}}\,\log(\text{price}_{\text{i}}) \,+\, e_{\text{i}},\\ \text{where}\qquad\qquad&\\ \text{brand}_{\,\text{tr}, \text{i}} &\,=\, \begin{cases} \text{1} & \text{ if an orange juice } \text{i} \text{ is } \text{Tropicana};\\\\ \text{0} & \text{otherwise}.\qquad\qquad\quad\, \end{cases}\\ \text{brand}_{\,\text{mm}, \text{i}} &\,=\, \begin{cases} \text{1} & \text{ if an orange juice } \text{i} \text{ is } \text{Minute Maid};\\\\ \text{0} & \text{otherwise}.\qquad\qquad\quad\, \end{cases} \end{align} \]
Model 2
\[ \begin{align} \log(\text{sales}_{\text{i}}) \,=\,&\;\; \quad b_{\text{intercept}} \,+\, \color{Green}{b_{\,\text{mm}}\,\text{brand}_{\,\text{mm}, \text{i}}} \,+\, \color{Blue}{b_{\,\text{tr}}\,\text{brand}_{\,\text{tr}, \text{i}}}\\ &\,+\, b_{\text{price}}\,\log(\text{price}_{\text{i}}) \\ &\, +\, b_{\text{price*mm}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}} \\ &\,+\, b_{\text{price*tr}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}} \,+\, e_{\text{i}} \end{align} \]
Model 3
\[ \begin{align} \log(\text{sales}_{\text{i}}) \,=\,\quad\;\;& b_{\text{intercept}} \,+\, \color{Green}{b_{\,\text{mm}}\,\text{brand}_{\,\text{mm}, \text{i}}} \,+\, \color{Blue}{b_{\,\text{tr}}\,\text{brand}_{\,\text{tr}, \text{i}}} \\ &\,+\; b_{\,\text{ad}}\,\color{Orange}{\text{ad}_{\,\text{i}}} \qquad\qquad\qquad\qquad\quad \\ &\,+\, b_{\text{mm*ad}}\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}}\,+\, b_{\text{tr*ad}}\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}} \\ &\,+\; b_{\text{price}}\,\log(\text{price}_{\text{i}}) \qquad\qquad\qquad\;\;\;\;\, \\ &\,+\, b_{\text{price*mm}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}}\qquad\qquad\qquad\;\, \\ &\,+\, b_{\text{price*tr}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}}\qquad\qquad\qquad\;\, \\ & \,+\, b_{\text{price*ad}}\,\log(\text{price}_{\text{i}})\,\times\,\color{Orange}{\text{ad}_{\,\text{i}}}\qquad\qquad\qquad\;\;\, \\ &\,+\, b_{\text{price*mm*ad}}\,\log(\text{price}_{\text{i}}) \,\times\,\,\color{Green} {\text{brand}_{\,\text{mm}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}} \\ &\,+\, b_{\text{price*tr*ad}}\,\log(\text{price}_{\text{i}}) \,\times\,\,\color{Blue} {\text{brand}_{\,\text{tr}, \text{i}}}\,\times\, \color{Orange}{\text{ad}_{\,\text{i}}} \,+\, e_{\text{i}} \end{align} \]
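The three specifications above map onto R formulas fairly directly, since `a * b` expands to main effects plus interactions. The sketch below fits them on synthetic data: `toy`, its values, the column name `ad_status`, and the level labels other than "dominicks" and "No Ad" are all assumptions made for illustration.

```r
# Synthetic data with the same kinds of columns as the OJ data.
set.seed(42)
n <- 300
toy <- data.frame(
  price     = runif(n, 1, 4),
  brand     = factor(sample(c("dominicks", "minute.maid", "tropicana"),
                            n, replace = TRUE)),
  ad_status = factor(sample(c("No Ad", "Ad"), n, replace = TRUE),
                     levels = c("No Ad", "Ad"))
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))

# Model 1: brand dummies plus log(price).
model_1 <- lm(log(sales) ~ brand + log(price), data = toy)
# Model 2: adds brand-by-log(price) interactions.
model_2 <- lm(log(sales) ~ brand * log(price), data = toy)
# Model 3: full three-way interaction of brand, ad status, and log(price).
model_3 <- lm(log(sales) ~ brand * ad_status * log(price), data = toy)

summary(model_1)

# Number of estimated coefficients per model; this should match the
# number of b terms in each equation.
n_coef <- sapply(list(model_1, model_2, model_3),
                 function(m) length(coef(m)))
n_coef
```

Counting coefficients is a quick check that a formula reproduces the intended equation: 4, 6, and 12 terms for Models 1, 2, and 3. `broom::tidy()` or `stargazer()` can be used in place of `summary()` for a tabular view.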
Question 4
For each model, make a prediction on the outcome variable using the test data.frame and the regression result from Question 3.
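Predictions on held-out data can be obtained with `predict()` and its `newdata` argument. The sketch below uses synthetic data and only Model 1; `toy`, its values, and the column name `pred_1` are hypothetical.

```r
# Synthetic data and a 70/30 split, for illustration only.
set.seed(7)
n <- 200
toy <- data.frame(
  price = runif(n, 1, 4),
  brand = factor(sample(c("dominicks", "minute.maid", "tropicana"),
                        n, replace = TRUE))
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))

idx    <- sample(seq_len(n), size = round(0.7 * n))
dtrain <- toy[idx, ]
dtest  <- toy[-idx, ]

# Fit on the training data only.
model_1 <- lm(log(sales) ~ brand + log(price), data = dtrain)

# Predict for the test rows; the predictions are on the log(sales) scale
# because that is the outcome in the formula.
dtest$pred_1 <- predict(model_1, newdata = dtest)

head(dtest)
```

Since the outcome is `log(sales)`, the predictions should be compared against `log(dtest$sales)` rather than raw sales.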
Question 5
Across the three models, how sensitive is the percentage change in OJ purchases to the percentage change in the price of OJ for each brand?
How does ad affect this sensitivity in Model 3?
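In a log-log specification, this sensitivity (the price elasticity) is the derivative of \(\log(\text{sales})\) with respect to \(\log(\text{price})\). As a sketch, assuming \(\text{ad}_{\,\text{i}}\) is a 0/1 indicator that equals 1 when the product is advertised, differentiating each model gives:

\[ \begin{align} \text{Model 1:}\quad \frac{\partial \log(\text{sales}_{\text{i}})}{\partial \log(\text{price}_{\text{i}})} &\,=\, b_{\text{price}} \\ \text{Model 2:}\quad \frac{\partial \log(\text{sales}_{\text{i}})}{\partial \log(\text{price}_{\text{i}})} &\,=\, b_{\text{price}} \,+\, b_{\text{price*mm}}\,\text{brand}_{\,\text{mm}, \text{i}} \,+\, b_{\text{price*tr}}\,\text{brand}_{\,\text{tr}, \text{i}} \\ \text{Model 3:}\quad \frac{\partial \log(\text{sales}_{\text{i}})}{\partial \log(\text{price}_{\text{i}})} &\,=\, b_{\text{price}} \,+\, b_{\text{price*mm}}\,\text{brand}_{\,\text{mm}, \text{i}} \,+\, b_{\text{price*tr}}\,\text{brand}_{\,\text{tr}, \text{i}} \,+\, b_{\text{price*ad}}\,\text{ad}_{\,\text{i}} \\ &\quad\,+\, b_{\text{price*mm*ad}}\,\text{brand}_{\,\text{mm}, \text{i}}\,\times\,\text{ad}_{\,\text{i}} \,+\, b_{\text{price*tr*ad}}\,\text{brand}_{\,\text{tr}, \text{i}}\,\times\,\text{ad}_{\,\text{i}} \end{align} \]

For the reference brand without an ad, both brand indicators and \(\text{ad}_{\,\text{i}}\) are zero, so every expression reduces to \(b_{\text{price}}\). For Minute Maid in Model 2, the elasticity is \(b_{\text{price}} + b_{\text{price*mm}}\).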
Question 6
- Compare the RMSEs of the three models on the test data.frame.
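RMSE is the square root of the mean squared difference between actual and predicted outcomes. The sketch below defines a hypothetical helper `rmse()` and applies it to the three specifications on synthetic data; `toy`, its values, and the column name `ad_status` are assumptions.

```r
# Hypothetical helper: root mean squared error.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Synthetic data and a 70/30 split, for illustration only.
set.seed(3)
n <- 300
toy <- data.frame(
  price     = runif(n, 1, 4),
  brand     = factor(sample(c("dominicks", "minute.maid", "tropicana"),
                            n, replace = TRUE)),
  ad_status = factor(sample(c("No Ad", "Ad"), n, replace = TRUE),
                     levels = c("No Ad", "Ad"))
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))
idx    <- sample(seq_len(n), size = round(0.7 * n))
dtrain <- toy[idx, ]
dtest  <- toy[-idx, ]

formulas <- list(
  model_1 = log(sales) ~ brand + log(price),
  model_2 = log(sales) ~ brand * log(price),
  model_3 = log(sales) ~ brand * ad_status * log(price)
)

# Fit each model on dtrain, then score it on dtest in log(sales) units.
test_rmse <- sapply(formulas, function(f) {
  fit <- lm(f, data = dtrain)
  rmse(log(dtest$sales), predict(fit, newdata = dtest))
})
test_rmse
```

All three RMSEs are in units of `log(sales)`, so they are directly comparable; a lower value indicates better out-of-sample fit.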
Question 7
- Draw a residual plot from each of the three models.
- On average, are the predictions correct? Are there systematic errors?
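A residual plot puts predicted values on the horizontal axis and residuals (actual minus predicted) on the vertical axis. The sketch below does this for Model 1 with base graphics on synthetic data; `toy`, its values, and the name `res_1` are hypothetical.

```r
# Synthetic data and a 70/30 split, for illustration only.
set.seed(11)
n <- 200
toy <- data.frame(
  price = runif(n, 1, 4),
  brand = factor(sample(c("dominicks", "minute.maid", "tropicana"),
                        n, replace = TRUE))
)
toy$sales <- exp(10 - 2 * log(toy$price) + rnorm(n, sd = 0.5))
idx    <- sample(seq_len(n), size = round(0.7 * n))
dtrain <- toy[idx, ]
dtest  <- toy[-idx, ]

model_1 <- lm(log(sales) ~ brand + log(price), data = dtrain)

# Residuals on the test data, in log(sales) units.
pred_1 <- predict(model_1, newdata = dtest)
res_1  <- log(dtest$sales) - pred_1

# Points scattered evenly around the dashed zero line suggest no
# systematic over- or under-prediction.
plot(pred_1, res_1,
     xlab = "Predicted log(sales)", ylab = "Residual",
     main = "Model 1: residuals on test data")
abline(h = 0, lty = 2)

mean(res_1)  # average prediction error
```

A mean residual near zero suggests the predictions are right on average, while a curve or funnel shape in the scatter points to systematic errors.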
Question 8
How would you explain different estimation results across different models?
Which model do you prefer? Why?