The following shows the first six variables in holiday_movies_with_genres:
| tconst (chr) | genres (chr) | title_type (chr) | primary_title (chr) | simple_title (chr) | year (dbl) |
|---|---|---|---|---|---|
| tt0020356 | Comedy | movie | Sailor's Holiday | sailors holiday | 1929 |
| tt0020823 | Drama | movie | The Devil's Holiday | the devils holiday | 1930 |
| tt0020823 | Romance | movie | The Devil's Holiday | the devils holiday | 1930 |
| tt0020985 | Comedy | movie | Holiday | holiday | 1930 |
| tt0020985 | Drama | movie | Holiday | holiday | 1930 |
| tt0021268 | Comedy | movie | Holiday of St. Jorgen | holiday of st jorgen | 1930 |
| tt0021377 | Comedy | movie | Sin Takes a Holiday | sin takes a holiday | 1930 |
| tt0021377 | Romance | movie | Sin Takes a Holiday | sin takes a holiday | 1930 |
| tt0021381 | Adventure | movie | Sinners' Holiday | sinners holiday | 1930 |
| tt0021381 | Crime | movie | Sinners' Holiday | sinners holiday | 1930 |
| tt0021381 | Romance | movie | Sinners' Holiday | sinners holiday | 1930 |
| tt0023039 | Drama | movie | Husband's Holiday | husbands holiday | 1931 |
| tt0024869 | Crime | movie | Beggar's Holiday | beggars holiday | 1934 |
| tt0024869 | Drama | movie | Beggar's Holiday | beggars holiday | 1934 |
| tt0024869 | Romance | movie | Beggar's Holiday | beggars holiday | 1934 |
| tt0025006 | Western | movie | Cowboy Holiday | cowboy holiday | 1934 |
| tt0025037 | Drama | movie | Death Takes a Holiday | death takes a holiday | 1934 |
| tt0025037 | Fantasy | movie | Death Takes a Holiday | death takes a holiday | 1934 |
| tt0025037 | Romance | movie | Death Takes a Holiday | death takes a holiday | 1934 |
| tt0027456 | Comedy | movie | College Holiday | college holiday | 1936 |
Q4b.
Provide R code using skimr::skim() to see how the summary statistics (mean, median, standard deviation, minimum, maximum, and first and third quartiles) of average_rating and num_votes vary by popular genre and title_type.
Consider only the five most popular genres, selected by the number of titles in each genre.
Remove titles of the video type when calculating the summary statistics.
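One possible way to approach this is sketched below. It is not the official answer: `toy_movies` is a hypothetical stand-in for holiday_movies_with_genres, and the title_type values "movie", "tvMovie", and "video" are assumed. The idea is to count rows per genre (one row per title-genre pair), keep the five largest genres, drop the video titles, and pass the grouped data to skimr::skim().

```r
library(dplyr)
library(skimr)

# Hypothetical toy data: one row per title-genre pair, six genres in total.
toy_movies <- tibble(
  genres = rep(
    c("Comedy", "Drama", "Romance", "Family", "Fantasy", "Western"),
    times = c(6, 5, 4, 3, 2, 1)
  ),
  title_type = rep(c("movie", "tvMovie", "video"), length.out = 21),
  average_rating = seq(5, 7, length.out = 21),
  num_votes = seq(100, 2100, by = 100)
)

# The five genres with the most titles.
popular_genres <- toy_movies |>
  count(genres, sort = TRUE) |>
  slice_head(n = 5) |>
  pull(genres)

# Keep the popular genres and drop titles of the video type.
toy_popular <- toy_movies |>
  filter(genres %in% popular_genres, title_type != "video")

# skim() respects grouping, so each genre/title_type pair gets its own summary.
toy_summary <- toy_popular |>
  group_by(genres, title_type) |>
  skim(average_rating, num_votes)

toy_summary
```

For numeric columns, skim() reports the mean and standard deviation together with p0, p25, p50, p75, and p100, which correspond to the minimum, first quartile, median, third quartile, and maximum.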
Provide R code to recreate the ggplot figure illustrating how the relationship between log10(num_votes) and average_rating varies by popular genre and title_type.
The five most popular genres are selected by the number of titles in each genre.
Titles of the video type are removed from the ggplot figure.
Provide R code to recreate the ggplot figure illustrating how the annual trend in the number of movie titles with “holiday” varies by christmas.
```r
holiday_movies_with_genres |>
  group_by(year, christmas, holiday) |>
  count() |>
  ggplot(aes(x = year, y = n, color = holiday)) +
  geom_smooth() +
  geom_point(alpha = .33) +
  facet_wrap(christmas ~ ., scales = "free")
```
Q4i.
Provide R code to recreate the ggplot figure illustrating how the mean value of num_votes varies by the popular genres for the titles with “christmas”.
TripAdvisor is an online travel research company that empowers people around the world to plan and enjoy the ideal trip.
TripAdvisor wanted to know whether promoting membership on their platform could drive engagement and bookings.
To do so, TripAdvisor had just run an experiment to explore user retention by offering a random subset of customers an easier sign-up process for membership.
The following is the data.frame, tripadvisor.
| id (dbl) | time (chr) | easier_signup (lgl) | days_visited (dbl) | became_member (lgl) | locale_en_US (lgl) |
|---|---|---|---|---|---|
| 1 | PRE | FALSE | 1 | FALSE | TRUE |
| 1 | POST | FALSE | 1 | FALSE | TRUE |
| 2 | PRE | FALSE | 10 | FALSE | FALSE |
| 2 | POST | FALSE | 15 | FALSE | FALSE |
| 3 | PRE | FALSE | 18 | FALSE | TRUE |
| 3 | POST | FALSE | 17 | FALSE | TRUE |
| 4 | PRE | FALSE | 17 | FALSE | TRUE |
| 4 | POST | FALSE | 6 | FALSE | TRUE |
| 5 | PRE | FALSE | 24 | FALSE | TRUE |
| 5 | POST | FALSE | 12 | FALSE | TRUE |
| 6 | PRE | TRUE | 11 | TRUE | TRUE |
| 6 | POST | TRUE | 16 | TRUE | TRUE |
| 7 | PRE | FALSE | 15 | FALSE | FALSE |
| 7 | POST | FALSE | 3 | FALSE | FALSE |
| 8 | PRE | FALSE | 11 | FALSE | TRUE |
| 8 | POST | FALSE | 1 | FALSE | TRUE |
| 9 | PRE | TRUE | 27 | TRUE | FALSE |
| 9 | POST | TRUE | 14 | TRUE | FALSE |
| 10 | PRE | TRUE | 0 | TRUE | FALSE |
| 10 | POST | TRUE | 1 | TRUE | FALSE |
Variable description
id: a unique identifier for a user.
time: PRE if time is before the experiment; POST if time is in the 28 days after the experiment. For each id value, there are two observations: one with time == "PRE" and the other with time == "POST".
days_visited: number of days a user visited the TripAdvisor website.
easier_signup: TRUE if a user was exposed to the easier signup process (e.g., one-click signup) during the experiment; FALSE otherwise.
became_member: TRUE if a user became a member during the experiment period; FALSE otherwise.
locale_en_US: TRUE if a user accessed the website from the US; FALSE otherwise.
os_type: Windows, Mac, or Others.
revenue_pre: amount of dollars a user spent on the website before the experiment.
Q5a.
Using the given data.frame, tripadvisor, update tripadvisor so that time is a factor-type variable whose first level is "PRE".
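A minimal sketch of this step is shown below, using a hypothetical `tripadvisor_toy` with a few rows copied from the table above rather than the full data. By default, factor() orders levels alphabetically, which would put "POST" before "PRE", so the levels are given explicitly.

```r
library(dplyr)

# Hypothetical toy rows mimicking part of tripadvisor.
tripadvisor_toy <- tibble(
  id = c(1, 1, 2, 2),
  time = c("PRE", "POST", "PRE", "POST"),
  days_visited = c(1, 1, 10, 15)
)

# Convert time to a factor with "PRE" as the first level.
tripadvisor_toy <- tripadvisor_toy |>
  mutate(time = factor(time, levels = c("PRE", "POST")))

levels(tripadvisor_toy$time)
```

With "PRE" as the first level, ggplot places PRE to the left of POST on a discrete axis, which matches the chronological order.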
Provide R code to recreate the ggplot figure illustrating how the relationship between time and days_visited varies by easier_signup and became_member.
Provide a comment to illustrate how the relationship between time and days_visited varies by easier_signup and became_member.
Q5d.
Provide R code to create the data.frame Q5d, which includes the variable diff: the difference between (1) the value of days_visited for time == "PRE" and (2) the value of days_visited for time == "POST" for each id.
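One way to compute such a per-id difference is sketched below on a hypothetical `tripadvisor_toy`. The sign convention is an assumption: here diff is POST minus PRE, so a positive value means more days visited after the experiment; swap the operands for the opposite convention.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows mimicking part of tripadvisor.
tripadvisor_toy <- tibble(
  id = c(1, 1, 2, 2, 4, 4),
  time = c("PRE", "POST", "PRE", "POST", "PRE", "POST"),
  days_visited = c(1, 1, 10, 15, 17, 6)
)

# Spread the PRE and POST rows into two columns, then take the difference.
Q5d_toy <- tripadvisor_toy |>
  pivot_wider(id_cols = id, names_from = time, values_from = days_visited) |>
  mutate(diff = POST - PRE)  # assumed sign convention: POST minus PRE

Q5d_toy
```

Because `id_cols = id` is used, other columns are dropped in the reshaped result; user-level variables such as easier_signup would have to be added to `id_cols` to keep them.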
```r
brexit <- brexit |>
  mutate(
    region = fct_relevel(region, "london", "rest_of_south", "midlands_wales", "north", "scot"),
    region = fct_recode(
      region,
      London = "london",
      `Rest of South` = "rest_of_south",
      `Midlands / Wales` = "midlands_wales",
      North = "north",
      Scotland = "scot"
    )
  )

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~ region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg",
    x = NULL, y = NULL
  ) +
  scale_fill_manual(values = c("gray", "#67a9cf", "#ef8a62")) +
  theme_minimal()
```
Q6b
Replicate the following visualization
How does the story this visualization tells differ from the story told by the plot in Q6a?
```r
ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(
    ~ region,
    scales = "free_x",
    nrow = 1,
    labeller = label_wrap_gen(width = 12)
  ) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg",
    x = NULL, y = NULL
  ) +
  scale_fill_manual(values = c(
    "Wrong" = "#ef8a62",
    "Right" = "#67a9cf",
    "Don't know" = "gray"
  )) +
  theme_minimal()
```
Q6c
First, calculate the proportion of wrong, right, and don’t know answers in each region. Then plot these proportions (rather than the counts) and improve the axis labeling.
```r
q6 <- brexit |>
  group_by(region, opinion) |>
  summarise(n = n()) |>
  mutate(
    tot = sum(n),
    prop = n / tot
  )
```
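The proportion calculation relies on summarise() dropping the last grouping variable (opinion), so the following mutate() computes totals within each region. The sketch below makes that explicit with `.groups = "drop_last"` on a hypothetical `brexit_toy`; the toy rows are invented for illustration and are not survey results.

```r
library(dplyr)

# Hypothetical toy responses, not the real survey data.
brexit_toy <- tibble(
  region = c("london", "london", "london", "north", "north"),
  opinion = c("Wrong", "Wrong", "Right", "Right", "Don't know")
)

q6_toy <- brexit_toy |>
  group_by(region, opinion) |>
  # After summarise(), the data is still grouped by region only.
  summarise(n = n(), .groups = "drop_last") |>
  # sum(n) is therefore the number of responses within each region.
  mutate(prop = n / sum(n)) |>
  ungroup()

q6_toy
```

A quick sanity check is that the proportions add up to 1 within every region.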
Replicate the following visualization
How does the story this visualization tells differ from the story told by the plot in Q6b?
```r
ggplot(q6, aes(y = opinion, x = prop, fill = opinion)) +
  geom_col() +
  facet_wrap(
    ~ region,
    nrow = 1,
    labeller = label_wrap_gen(width = 12)
  ) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg",
    x = "Percent", y = NULL
  ) +
  scale_fill_manual(values = c(
    "Wrong" = "#ef8a62",
    "Right" = "#67a9cf",
    "Don't know" = "gray"
  )) +
  scale_x_continuous(labels = scales::percent) +
  theme_minimal()
```
Q6d.
Recreate the visualization from the previous exercise, this time dodging the bars for opinion proportions within each region rather than faceting by region. Then improve the legend.
How does the story this visualization tells differ from the story the previous plot tells?
```r
ggplot(q6, aes(y = region, x = prop, fill = opinion)) +
  geom_col(position = "dodge") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg",
    x = "Percent", y = NULL, fill = "Opinion"
  ) +
  scale_fill_manual(values = c(
    "Wrong" = "#ef8a62",
    "Right" = "#67a9cf",
    "Don't know" = "gray"
  )) +
  scale_x_continuous(labels = scales::percent) +
  theme_minimal()
```
Discussion
Welcome to our Classwork 7 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 7.
Whether you want to delve deeper into the material, share insights, or ask questions, this is the perfect place for you.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 7 materials or need clarification on any points, don’t hesitate to ask here.