Questions 11-18
Consider the following oj data.frame for Questions 11-18:
library(tidyverse)  # provides read_csv(), filter(), and ggplot()
library(skimr)      # provides skim()
oj <- read_csv("https://bcdanl.github.io/data/dominick_oj_na.csv")
Question 11
How can you filter the data.frame oj to calculate descriptive statistics (mean and standard deviation) of sales and price for tropicana, minute.maid, and dominicks, respectively?
Complete the code by filling in the blanks.
oj_tr <- oj |>
  filter(__BLANK__ "tropicana")
oj_mm <- oj |>
  filter(__BLANK__ "minute.maid")
oj_do <- oj |>
  filter(__BLANK__ "dominicks")
oj_tr_sum <- skim(oj_tr)
oj_mm_sum <- skim(oj_mm)
oj_do_sum <- skim(oj_do)
Question 12
How would you create a new data.frame, oj_no_NA, in which there are no missing values in price and sales?
Complete the code by filling in the blanks.
oj_no_NA <- oj |>
  filter(__BLANK__)
Question 13
How would you describe how the distribution of price varies by brand?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     __BLANK 3__)) +
  __BLANK 4__(show.legend = FALSE,  # `show.legend = FALSE` turns off legend
              __BLANK 5__ = 40) +
  facet_wrap(__BLANK 6__,
             __BLANK 7__ = 1)
Question 14
Provide a comment to describe how the distribution of price varies by brand.
Question 15
How would you describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     y = __BLANK 3__,
                     __BLANK 4__ = brand,
                     __BLANK 5__ = brand)) +
  geom_point(__BLANK 6__ = .1) +
  geom_smooth(__BLANK 7__)
Question 16
Provide a comment to describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand.
Question 17
How would you visualize how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand and ad_status?
Complete the code by filling in the blanks (BLANKS 1-7 are the same as the ones in Question 15).
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     y = __BLANK 3__,
                     __BLANK 4__ = brand,
                     __BLANK 5__ = brand)) +
  geom_point(__BLANK 6__ = .1) +
  geom_smooth(__BLANK 7__) +
  facet_wrap(__BLANK 8__)
Question 18
Provide a comment to describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand and ad_status.
Questions 19-20
Consider the following mlb_bat data.frame for Questions 19-20:
mlb_bat <- read_csv("https://bcdanl.github.io/data/MLB_batting.csv")
Question 19
How would you describe the yearly trends in hit percentages for each hit_type (e.g., Single, Double, Triple, and HomeRun)?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     y = __BLANK 3__,
                     color = __BLANK 4__,
                     fill = __BLANK 5__)) +
  __BLANK 6__() +
  __BLANK 7__() +
  __BLANK 8__() +
  labs(title = "Hits by Type in Major League Baseball",
       x = "Major League Baseball Season",
       y = "Percentage",
       fill = "Hit",
       color = "Hit")  # labs() allows for labeling x, y, color, fill, title, etc.
Question 20
Write a comment explaining how you would describe the yearly trends in hit percentages for each hit_type (e.g., Single, Double, Triple, and HomeRun).