Classwork 2

Relationship Plots

Author

Byeong-Hak Choe

Published

January 28, 2026

Modified

February 9, 2026

R Packages

For Classwork 2, please load the tidyverse package:

library(tidyverse)

Question 1. Ice Cream Sales and Shark Attacks

Consider the data frame df, which records monthly ice cream sales and shark attacks.

df <- read_csv("http://bcdanl.github.io/data/icecream-shark-df.csv")

Part A

  • 🤖 Task 1: Fill in the blanks in the provided ggplot() code chunk.
  • 💬 Task 2: Add a brief comment describing the relationship between ice cream sales (IceCreamSales) and shark attacks (SharkAttacks).

ggplot(data = __BLANK_1__,
       mapping = aes(x = __BLANK_2__,
                     y = __BLANK_3__)) +
  geom___BLANK_4__() +
  geom___BLANK_5__()
ggplot(data = df,
       mapping = aes(x = IceCreamSales,
                     y = SharkAttacks)) +
  geom_point() +
  geom_smooth() +
  scale_y_continuous(breaks = seq(2,12,2))
  • The scatterplot shows a positive, roughly linear relationship between ice cream sales and shark attacks. The fitted line highlights this trend.

  • As ice cream sales increase, the number of shark attacks also tends to increase.


Part B

Icecream and Shark

  • ❓ Is the observed relationship one of correlation or causation? Explain your reasoning.
    • Consider the following monthly trends for IceCreamSales and SharkAttacks:

Monthly Trend of IceCreamSales


Monthly Trend of SharkAttacks

ggplot(data = df,
       mapping = aes(x = Month,
                     y = IceCreamSales)) +
  geom_point() +
  geom_line() +   # a geometric object for a line chart
  scale_x_continuous(breaks = seq(1,12,1)) +
  scale_y_continuous(breaks = seq(2, 7, 1)*10^3) 
  • The observed relationship is correlation, not causation. Although the data show that higher ice cream sales are associated with more shark attacks, this does not imply that buying more ice cream causes more shark attacks.

  • The correlation is likely due to a confounding factor, such as warmer weather, which increases both ice cream consumption and the number of people in the water, leading to more shark attacks.
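The confounding argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated rather than taken from icecream-shark-df.csv, and the variable names (temperature, ice_cream, shark_attack) and all coefficients are hypothetical. Temperature is assumed to drive both series, and neither series affects the other.

```r
# Hypothetical simulation, NOT the course data: temperature is an assumed
# confounder that drives both ice cream sales and shark attacks.
set.seed(42)
n <- 120
temperature  <- runif(n, min = 0, max = 35)
ice_cream    <- 2000 + 120 * temperature + rnorm(n, sd = 400)
shark_attack <- 2 + 0.25 * temperature + rnorm(n, sd = 1)

# Raw correlation between the two outcomes (no causal link was simulated)
raw_cor <- cor(ice_cream, shark_attack)

# Remove the part of each variable explained by temperature,
# then correlate what is left (a partial correlation).
ice_resid   <- resid(lm(ice_cream ~ temperature))
shark_resid <- resid(lm(shark_attack ~ temperature))
partial_cor <- cor(ice_resid, shark_resid)

round(c(raw = raw_cor, partial = partial_cor), 2)
```

In this setup the raw correlation is strong, while the correlation that remains after accounting for temperature should be close to zero, which is what a relationship driven purely by a confounder looks like.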



Question 2. NBC Show Data

The nbc_show dataset comes from NBC’s TV pilots, containing information about television shows, their viewership metrics, and audience engagement.

nbc_show <- read_csv("https://bcdanl.github.io/data/nbc_show.csv")
  • Gross Ratings Points (GRP):
    Measures the total viewership of a show, an indicator of its broadcast marketability.
    • 📺 A higher GRP suggests broader exposure and a more marketable program.
  • Projected Engagement (PE):
    Captures how attentive and engaged viewers were after watching a show, which makes it a more suitable measure of audience engagement.
    • 🧠 After viewing, audiences take a short quiz testing order and detail recall.
    • This reflects their level of attention and retention (for both the show and its ads).
    • High PE values indicate strong viewer engagement.


Tasks

Since GRP reflects how many people watch a show and PE reflects how engaged or attentive those viewers are, it’s reasonable to expect some connection between the two: shows that reach more people may also have higher engagement, although not always.

Our goal is to see whether greater viewership tends to coincide with stronger engagement (and how this varies by genre in Classwork 12).

  • 🤖 Task 1: Fill in the blanks in the provided ggplot() code chunks.
  • 💬 Task 2: Add a brief comment describing the relationship between GRP and PE.

(1) Scatterplot with a Non-Linear Fitted Line

ggplot(data = __BLANK_1__,
       mapping = aes(x = __BLANK_2__,
                     y = __BLANK_3__)) +
  geom_point() +
  geom___BLANK_4__()
ggplot(data = nbc_show,
       mapping = aes(x = GRP,
                     y = PE)) +
  geom_point() +
  geom_smooth()


(2) Scatterplot with a Linear Fitted Line

ggplot(data = __BLANK_1__,
       mapping = aes(x = __BLANK_2__,
                     y = __BLANK_3__)) +
  geom_point() +
  geom___BLANK_4__(method = __BLANK_5__)
ggplot(data = nbc_show,
       mapping = aes(x = GRP,
                     y = PE)) +
  geom_point() +
  geom_smooth(method = "lm")


Question 3. GDP per capita vs. Life Expectancy

For Question 3, please install the R package gapminder before starting:

install.packages("gapminder")
library(gapminder)
??gapminder

The gapminder package provides a built-in dataset named gapminder, which contains country-level data on life expectancy, GDP per capita, and population across time.

Let’s assign it to a new object called df_gapminder:

df_gapminder <- gapminder::gapminder

Tasks

  • 🤖 Task 1: Fill in the blanks in the provided ggplot() code chunks.
  • 💬 Task 2: Add a brief comment describing the relationship between GDP per capita (gdpPercap) and life expectancy (lifeExp).

(1) gdpPercap vs. lifeExp

ggplot(data = __BLANK_1__,
       mapping = aes(x = __BLANK_2__,
                     y = __BLANK_3__)) +
  geom_point(__BLANK_4__ = .1) + # Add transparency to reduce overplotting
  geom_smooth(__BLANK_5__ = "darkorange") +
  geom_smooth(__BLANK_6__)
ggplot(data = df_gapminder,
       mapping = aes(x = gdpPercap,
                     y = lifeExp)) +
  geom_point(alpha = .1) + # Add transparency to reduce overplotting
  geom_smooth(color = "darkorange") +
  geom_smooth(method = "lm")


(2) log(gdpPercap) vs. lifeExp

ggplot(data = __BLANK_1__,
       mapping = aes(x = __BLANK_2__,
                     y = __BLANK_3__)) +
  geom_point(__BLANK_4__ = .2) + # Add transparency to reduce overplotting
  geom_smooth(__BLANK_5__ = "darkorange") +
  geom_smooth(__BLANK_6__)
ggplot(data = df_gapminder,
       mapping = aes(x = log(gdpPercap),
                     y = lifeExp)) +
  geom_point(alpha = .2) + # Add transparency to reduce overplotting
  geom_smooth(color = "darkorange") +
  geom_smooth(method = "lm")
  • Taking the log of gdpPercap reduces visual clutter: the dense cluster of points seen in the untransformed plot is now spread out.
    • Additionally, the linear model now fits the data well.
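One way to check the comment above numerically is to compare how much of the variation in lifeExp each straight-line fit explains. The sketch below is not part of the original classwork; it assumes the gapminder package is installed as requested earlier, and the object names fit_raw, fit_log, r2_raw, and r2_log are introduced here only for illustration.

```r
library(gapminder)  # assumes the package is already installed

df_gapminder <- gapminder::gapminder

# Straight-line fit on the original GDP per capita scale
fit_raw <- lm(lifeExp ~ gdpPercap, data = df_gapminder)

# Straight-line fit after log-transforming GDP per capita
fit_log <- lm(lifeExp ~ log(gdpPercap), data = df_gapminder)

# R-squared: share of the variation in lifeExp explained by each fit
r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared

c(raw = r2_raw, log = r2_log)
```

A higher R-squared for the log-scale model is consistent with the linear fitted line tracking the points more closely in plot (2) than in plot (1).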


Discussion

Welcome to our Classwork 2 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 2.

Whether you are looking to delve deeper into the content, share insights, or have questions about the content, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 2 materials or need clarification on any points, don’t hesitate to ask here.

All comments will be stored here.

Let’s collaborate and learn from each other!
