Questions 11-18
Consider the following oj data.frame for Questions 11-18:
library(tidyverse) # provides read_csv(), filter(), and ggplot()
oj <- read_csv("https://bcdanl.github.io/data/dominick_oj_na.csv")
Question 11
How can you filter the data.frame oj to calculate descriptive statistics (mean and standard deviation) of sales and price separately for each of the three brands: tropicana, minute.maid, and dominicks?
Complete the code by filling in the blanks.
library(skimr) # provides skim()
oj_tr <- oj |>
  filter(__BLANK__ "tropicana")
oj_mm <- oj |>
  filter(__BLANK__ "minute.maid")
oj_do <- oj |>
  filter(__BLANK__ "dominicks")
oj_tr_sum <- skim(oj_tr)
oj_mm_sum <- skim(oj_mm)
oj_do_sum <- skim(oj_do)
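As a hedged illustration of the general pattern (keep the rows for one category, then summarize), here is a minimal sketch on a made-up data frame. The names toy, group, and value are hypothetical and do not come from oj, and summarize() is used here in place of skim() only to keep the sketch small.

```r
library(dplyr)

# Hypothetical toy data; `toy`, `group`, and `value` are made-up names
toy <- tibble(
  group = c("a", "a", "b", "b"),
  value = c(1, 3, 10, 20)
)

# Keep only the rows whose group is "a"
toy_a <- toy |>
  filter(group == "a")

# Mean and standard deviation of value within that subset
toy_a_sum <- toy_a |>
  summarize(mean_value = mean(value),
            sd_value = sd(value))
```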
Question 12
Using the oj data.frame, how would you create a new data.frame, oj_no_NA, that contains no missing values in price or sales?
Complete the code by filling in the blanks.
oj_no_NA <- oj |>
  filter(__BLANK__)
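One common way to drop rows with missing values in several columns is to combine is.na() checks inside filter(). The sketch below shows this on a made-up data frame. The names toy, x, and y are hypothetical, and this is only one of several possible approaches.

```r
library(dplyr)

# Hypothetical toy data with missing values in both columns
toy <- tibble(
  x = c(1, NA, 3, 4),
  y = c(10, 20, NA, 40)
)

# Keep only the rows where neither x nor y is missing;
# conditions separated by commas in filter() are combined with AND
toy_complete <- toy |>
  filter(!is.na(x), !is.na(y))
```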
Question 13
How would you visualize how the distribution of price varies by brand?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     __BLANK 3__)) +
  __BLANK 4__(show.legend = FALSE, # `show.legend = FALSE` turns off legend
              __BLANK 5__ = 40) +
  facet_wrap(__BLANK 6__,
             __BLANK 7__ = 1)
Question 14
Provide a comment to describe how the distribution of price varies by brand. (Please be specific.)
Question 15
How would you describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     y = __BLANK 3__,
                     __BLANK 4__ = brand,
                     __BLANK 5__ = brand)) +
  geom_point(__BLANK 6__ = .1) +
  geom_smooth(__BLANK 7__)
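For reference, a log-log scatterplot with one fitted line per group can be sketched as follows on simulated data. The data frame toy and its columns group, x, and y are hypothetical, and the linear fit (method = "lm") is an assumption made for this illustration, not necessarily the option intended above.

```r
library(ggplot2)

# Simulated toy data (made up for illustration only)
set.seed(1)
toy <- data.frame(
  group = rep(c("a", "b"), each = 50),
  x = runif(100, min = 1, max = 10)
)
# y falls with x roughly as a power law, with multiplicative noise
toy$y <- 100 * toy$x^(-2) * exp(rnorm(100, sd = 0.2))

# log10() can be applied directly inside aes()
p <- ggplot(data = toy,
            mapping = aes(x = log10(x),
                          y = log10(y),
                          color = group,
                          fill = group)) +
  geom_point(alpha = .3) +     # semi-transparent points
  geom_smooth(method = "lm")   # one straight-line fit per group
p
```

On a log-log scale, the slope of each fitted line can be read as an elasticity: the approximate percentage change in y associated with a one percent change in x.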
Question 16
Provide a comment to describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand. (Please be specific.)
Question 17
How would you visualize how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand and ad_status?
Complete the code by filling in the blanks (BLANKS 1-7 are the same as the ones in Question 15).
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     y = __BLANK 3__,
                     __BLANK 4__ = brand,
                     __BLANK 5__ = brand)) +
  geom_point(__BLANK 6__ = .1) +
  geom_smooth(__BLANK 7__) +
  facet_wrap(__BLANK 8__)
Question 18
Provide a comment to describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand and ad_status. (Please be specific.)
Questions 19-20
Consider the following mlb_bat data.frame for Questions 19-20:
mlb_bat <- read_csv("https://bcdanl.github.io/data/MLB_batting.csv")
Question 19
How would you describe the yearly trends in hit percentages for each hit_type (e.g., Single, Double, Triple, and HomeRun)?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     y = __BLANK 3__,
                     color = __BLANK 4__,
                     fill = __BLANK 5__)) +
  __BLANK 6__() +
  __BLANK 7__() +
  __BLANK 8__() +
  labs(title = "Hits by Type in Major League Baseball",
       x = "Major League Baseball Season",
       y = "Percentage",
       fill = "Hit",
       color = "Hit") # labs() allows for
                      # labeling x, y, color, fill, title, etc.
Question 20
Write a comment explaining how you would describe the yearly trends in hit percentages for each hit_type (e.g., Single, Double, Triple, and HomeRun).