Name, Email, and Session















Short-Answer Questions























Data Transformation and Visualization with R tidyverse

The following R packages are used in this homework assignment:

library(tidyverse)
library(skimr)

Questions 11-12

Consider the following oj data.frame for Questions 11-12:

oj <- read_csv("https://bcdanl.github.io/data/dominick_oj_na.csv")

Question 11

How can you filter the data.frame oj to calculate descriptive statistics (mean and standard deviation) of sales and price separately for each of the three brands: tropicana, minute.maid, and dominicks?

Complete the code by filling in the blanks.

oj_tr <- oj |> filter(__BLANK__ "tropicana")
oj_mm <- oj |> filter(__BLANK__ "minute.maid")
oj_do <- oj |> filter(__BLANK__ "dominicks")
oj_tr_sum <- skim(oj_tr)
oj_mm_sum <- skim(oj_mm)
oj_do_sum <- skim(oj_do)



Question 12

How would you create a new data.frame, oj_no_NA, that has no missing values in either price or sales?

Complete the code by filling in the blanks.

oj_no_NA <- oj |> filter(__BLANK__)



Question 13

How would you visualize how the distribution of price varies by brand?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__, __BLANK 3__)) +
  __BLANK 4__(show.legend = FALSE,  # `show.legend = FALSE` turns off the legend
              __BLANK 5__ = 40) +
  facet_wrap(__BLANK 6__, __BLANK 7__ = 1)



Question 14

Provide a comment to describe how the distribution of price varies by brand.



Question 15

How would you visualize how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__, y = __BLANK 3__,
                     __BLANK 4__ = brand, __BLANK 5__ = brand)) +
  geom_point(__BLANK 6__ = .1) +
  geom_smooth(__BLANK 7__)



Question 16

Provide a comment to describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand.



Question 17

How would you visualize how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand and ad_status?

Complete the code by filling in the blanks (BLANKS 1-7 are the same as the ones in Question 15).

ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__, y = __BLANK 3__,
                     __BLANK 4__ = brand, __BLANK 5__ = brand)) +
  geom_point(__BLANK 6__ = .1) +
  geom_smooth(__BLANK 7__) +
  facet_wrap(__BLANK 8__)



Question 18

Provide a comment to describe how the relationship between (1) the base-10 log of sales and (2) the base-10 log of price varies by brand and ad_status.



Questions 19-20

Consider the following mlb_bat data.frame for Questions 19-20:

mlb_bat <- read_csv("https://bcdanl.github.io/data/MLB_batting.csv")

Question 19

How would you visualize the yearly trends in hit percentages for each hit_type (e.g., Single, Double, Triple, and HomeRun)?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__, y = __BLANK 3__,
                     color = __BLANK 4__, fill = __BLANK 5__)) +
  __BLANK 6__() +
  __BLANK 7__() +
  __BLANK 8__() +
  labs(title = "Hits by Type in Major League Baseball",
       x = "Major League Baseball Season",
       y = "Percentage",
       fill = "Hit",
       color = "Hit")  # labs() allows for labeling x, y, color, fill, title, etc.



Question 20

Provide a comment to describe the yearly trends in hit percentages for each hit_type (e.g., Single, Double, Triple, and HomeRun).