Name, Email, and Session

Short-Answer Questions

Data Visualization with ggplot

The following R packages are used in this homework assignment:

library(tidyverse)
library(skimr)
library(ggthemes)
library(gapminder)

Questions 11-17

Consider the following titanic data.frame for Questions 11-17:

titanic <- read_csv("https://bcdanl.github.io/data/titanic_cleaned.csv")


Question 11

How would you create the following data.frame, titanic_class_survival?

  • The titanic_class_survival data.frame counts the number of passengers who survived and those who did not survive within each class in the titanic data.frame.
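
As a rough illustration of this kind of two-variable count, here is a minimal sketch on made-up data. The `toy` data.frame and its `group` and `outcome` columns are hypothetical and are not part of the titanic data.frame.

```r
library(tidyverse)

# Hypothetical toy data (an assumption for illustration only)
toy <- tibble(
  group   = c("a", "a", "a", "b", "b"),
  outcome = c("yes", "no", "yes", "no", "no")
)

# count() with two variables returns one row per observed combination
# of group and outcome, with the number of rows in a column named n
toy_counts <- toy |>
  count(group, outcome)

toy_counts
```

Each row of `toy_counts` is one combination of `group` and `outcome` that appears in the data; combinations that never occur (here, `b` with `yes`) get no row.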

Complete the code by filling in the blanks.

__BLANK 1__ <- titanic |> 
  count(__BLANK 2__)


Question 12

How would you describe the variation in the distribution of age across classes and genders?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(x = gender,
                     __BLANK 2__ = age,
                     __BLANK 3__ = gender)) +
  __BLANK 4__(show.legend = F) +
  __BLANK 5__(~class) +
  scale_fill_tableau()


Question 13

Provide a comment on the variation in the distribution of age across classes and genders.


Question 14

How would you describe the variation in the distribution of survived across classes and genders?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(__BLANK 2__ = class,
                     __BLANK 3__ = survived)) +
  __BLANK 4__() +
  __BLANK 5__(~gender) +
  labs(x = "Proportion") +
  scale_fill_tableau()


Question 15

How would you describe the variation in the distribution of survived across classes and genders?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(__BLANK 2__ = class,
                     __BLANK 3__ = survived)) +
  __BLANK 4__(position = __BLANK 5__) +
  __BLANK 6__(~gender) +
  labs(x = "Proportion") +
  scale_fill_tableau()


Question 16

How would you describe the variation in the distribution of survived across classes and genders?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(__BLANK 2__ = class,
                     __BLANK 3__ = survived)) +
  __BLANK 4__(position = __BLANK 5__) +
  __BLANK 6__(~gender) +
  scale_fill_tableau()


Question 17

Provide a comment on the variation in the distribution of survived across classes and genders.


Questions 18-20

Consider the following nyc_dogs data.frame for Questions 18-20:

nyc_dogs <- read_csv("https://bcdanl.github.io/data/nyc_dogs_cleaned.csv")
  • The nyc_dogs data.frame contains data on licensed dogs in New York City.


Question 18

How would you create the following data.frame, nyc_dogs_breeds?


  • The nyc_dogs_breeds data.frame counts the number of occurrences for each value in the breed variable in the nyc_dogs data.frame.
    • The nyc_dogs_breeds data.frame keeps observations if
      1. The number of occurrences (n) is greater than or equal to 2000;
      2. The value of breed is not missing.
    • The observations in the nyc_dogs_breeds data.frame are arranged by n in descending order.
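
The same sequence of steps (count, drop missing values, keep frequent values, sort) can be sketched on made-up data. The `toy_pets` data.frame, its `kind` column, and the threshold of 2 are all hypothetical stand-ins chosen for illustration, not values from nyc_dogs.

```r
library(tidyverse)

# Hypothetical toy data with some missing values (an assumption)
toy_pets <- tibble(
  kind = c("cat", "cat", "cat", "dog", "dog", NA, NA, NA, NA, "fish")
)

toy_kinds <- toy_pets |>
  count(kind) |>            # one row per value of kind, including NA
  filter(!is.na(kind)) |>   # drop the row where kind is missing
  filter(n >= 2) |>         # keep values that occur at least 2 times
  arrange(desc(n))          # sort by n, largest first

toy_kinds
```

Note that `count()` keeps `NA` as its own row, which is why the missing-value filter is needed even though `NA` is the most frequent value in this toy example.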

Complete the code by filling in the blanks.

__BLANK 1__ <- nyc_dogs |> 
  __BLANK 2__ |> 
  filter(__BLANK 3__(breed)) |> 
  filter(__BLANK 4__) |> 
  arrange(__BLANK 5__)


Question 19

How would you describe the distribution of breed using the nyc_dogs_breeds data.frame?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     __BLANK 3__)) +
  __BLANK 4__()


Question 20

How would you describe the distribution of breed using the nyc_dogs_breeds data.frame?

Complete the code by filling in the blanks.

ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     __BLANK 3__)) +
  __BLANK 4__() +
  labs(y = "Breed")