Data Visualization with ggplot
The following R packages are used in this homework assignment:
library(tidyverse)
library(skimr)
library(ggthemes)
library(gapminder)
Questions 11-17
Consider the following titanic data.frame for Questions 11-17:
titanic <- read_csv("https://bcdanl.github.io/data/titanic_cleaned.csv")
Question 11
How would you create the following data.frame, titanic_class_survival?
- The titanic_class_survival data.frame counts the number of passengers who survived and those who did not survive within each class in the titanic data.frame.
Complete the code by filling in the blanks.
__BLANK 1__ <- titanic |>
  count(__BLANK 2__)
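As a hint on the general technique, not a filled-in answer, the sketch below shows how count() behaves when it is given two variables. It uses a made-up toy data.frame; the names toy, group, outcome, and toy_counts are hypothetical and do not come from the titanic data.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not the titanic data)
toy <- tibble(
  group   = c("a", "a", "a", "b", "b"),
  outcome = c("yes", "no", "yes", "no", "no")
)

# count() with two variables returns one row per observed combination,
# with the number of rows in that combination stored in a column named n
toy_counts <- toy |>
  count(group, outcome)

toy_counts
```

Combinations that never occur in the data (here, group "b" with outcome "yes") do not appear in the result.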
Question 12
How would you describe the variation in the distribution of age across classes and genders?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = gender,
                     __BLANK 2__ = age,
                     __BLANK 3__ = gender)) +
  __BLANK 4__(show.legend = FALSE) +
  __BLANK 5__(~class) +
  scale_fill_tableau()
Question 13
Provide a comment on the variation in the distribution of age across classes and genders.
Question 14
How would you describe the variation in the distribution of survived across classes and genders?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(__BLANK 2__ = class,
                     __BLANK 3__ = survived)) +
  __BLANK 4__() +
  __BLANK 5__(~gender) +
  labs(x = "Proportion") +
  scale_fill_tableau()
Question 15
How would you describe the variation in the distribution of survived across classes and genders?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(__BLANK 2__ = class,
                     __BLANK 3__ = survived)) +
  __BLANK 4__(position = __BLANK 5__) +
  __BLANK 6__(~gender) +
  labs(x = "Proportion") +
  scale_fill_tableau()
Question 16
How would you describe the variation in the distribution of survived across classes and genders?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(__BLANK 2__ = class,
                     __BLANK 3__ = survived)) +
  __BLANK 4__(position = __BLANK 5__) +
  __BLANK 6__(~gender) +
  scale_fill_tableau()
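For background on the position argument that appears in the bar-chart questions, here is a minimal sketch of how three common position adjustments change a bar chart in ggplot2. It is a general illustration on made-up data, not the intended answer to any blank; the names toy, group, outcome, base, p_stack, p_fill, and p_dodge are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration, not the titanic data)
toy <- data.frame(
  group   = c("a", "a", "a", "b", "b", "b", "b"),
  outcome = c("yes", "no", "yes", "no", "no", "yes", "no")
)

# Shared base plot: one horizontal bar per group, colored by outcome
base <- ggplot(data = toy, mapping = aes(y = group, fill = outcome))

p_stack <- base + geom_bar()                    # default: segments stacked, axis shows counts
p_fill  <- base + geom_bar(position = "fill")   # each bar rescaled to total 1, axis shows proportions
p_dodge <- base + geom_bar(position = "dodge")  # segments placed side by side, axis shows counts
```

Printing each plot object in turn (for example, p_fill) shows the same counts presented in three different ways, which helps when deciding whether counts or proportions are easier to compare across groups.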
Question 17
Provide a comment on the variation in the distribution of survived across classes and genders.
Questions 18-20
Consider the following nyc_dogs data.frame for Questions 18-20:
nyc_dogs <- read_csv("https://bcdanl.github.io/data/nyc_dogs_cleaned.csv")
- The nyc_dogs data.frame contains data on licensed dogs in New York City.
Question 18
How would you create the following data.frame, nyc_dogs_breeds?
- The nyc_dogs_breeds data.frame counts the number of occurrences of each value of the breed variable in the nyc_dogs data.frame.
- The nyc_dogs_breeds data.frame keeps observations if
  - The number of occurrences (n) is greater than or equal to 2000;
  - The value of breed is not missing.
- The observations in the nyc_dogs_breeds data.frame are arranged by n in descending order.
Complete the code by filling in the blanks.
__BLANK 1__ <- nyc_dogs |>
  __BLANK 2__ |>
  filter(__BLANK 3__(breed)) |>
  filter(__BLANK 4__) |>
  arrange(__BLANK 5__)
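The count, filter, and arrange steps described above can be sketched on a small made-up data.frame. This is only an illustration of the general pattern under assumed names (toy, item, toy_top) and an assumed threshold of 2; it is not the nyc_dogs data and not necessarily the exact form the blanks expect.

```r
library(dplyr)

# Hypothetical toy data with some missing values (not the nyc_dogs data)
toy <- tibble(item = c("x", "x", "x", "y", "y", "z", NA, NA, NA, NA))

toy_top <- toy |>
  count(item) |>            # one row per distinct value, including a row for NA
  filter(!is.na(item)) |>   # drop the row that counts missing values
  filter(n >= 2) |>         # keep values that occur at least twice
  arrange(desc(n))          # sort so the largest counts come first

toy_top
```

Note that count() keeps a row for NA, so without the is.na() filter the missing values would appear in the result as if they were a category of their own.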
Question 19
How would you describe the distribution of breed using the nyc_dogs_breeds data.frame?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     __BLANK 3__)) +
  __BLANK 4__()
Question 20
How would you describe the distribution of breed using the nyc_dogs_breeds data.frame?
Complete the code by filling in the blanks.
ggplot(data = __BLANK 1__,
       mapping = aes(x = __BLANK 2__,
                     __BLANK 3__)) +
  __BLANK 4__() +
  labs(y = "Breed")