Relationship Plots

Classwork 9

Author

Byeong-Hak Choe

Published

October 28, 2024

Modified

November 17, 2024

Ice Cream Sales and Drowning Incidents

Consider the data.frame, df, the dataset recording monthly ice cream sales and drowning incidents.

df <- read_csv("http://bcdanl.github.io/data/icecream-drowning.csv")

Q1a

  • Provide both ggplot() and comment to describe the relationship between ice cream sales (IceCreamSales) and drowning incidents (DrowningIncidents).

Answer:

ggplot(data = df,
       mapping = aes(x = IceCreamSales,
                     y = DrowningIncidents)) +
  geom_point() +
  geom_smooth()

  • The scatterplot along with the fitted line shows a positive linear relationship between ice cream sales and drowning incidents. This trend is highlighted by the fitted line in the plot.

  • As ice cream sales increase, the number of drowning incidents also tends to increase.


Q1b

  • Is the relationship correlation or causation? Why?

Answer:

  • The observed relationship is correlation, not causation. While the data shows that higher ice cream sales are associated with more drowning incidents, this does not imply that buying more ice cream causes more drownings.

  • This correlation could be due to a confounding factor, such as warmer weather, which increases both ice cream consumption and water-related activities, leading to more drowning incidents.

ggplot(data = df,
       mapping = aes(x = Month,
                     y = IceCreamSales)) +
  geom_point() +
  geom_line()

ggplot(data = df,
       mapping = aes(x = Month,
                     y = DrowningIncidents)) +
  geom_point() +
  geom_line()



GDP per capita and Life Expectancy

For this Section, please install the R package, gapminder:

install.packages("gapminder")
??gapminder

The gapminder package provides the data.frame object, gapminder. Let’s assign this to df_gapminder:

df_gapminder <- gapminder::gapminder

Q2a

  • Provide both ggplot() and comment to describe the relationship between GDP per capita (gdpPercap) and life expectancy (lifeExp).

Answer:

ggplot(data = df_gapminder,
       mapping = aes(x = gdpPercap,
                     y = lifeExp)) +
  geom_point(alpha = .1) + # Add transparency to reduce overplotting
  geom_smooth(color = "darkorange") +
  geom_smooth(method = "lm")

  • There is a positive association between GDP per capita and life expectancy,
    • But the relationship may not be linear. Life expectancy increases with GDP per capita at a decreasing rate.
    • Additionally, developing countries appear to cluster near relatively lower life expectancy values.
ggplot(data = df_gapminder,
       mapping = aes(x = log(gdpPercap),
                     y = lifeExp)) +
  geom_point(alpha = .2) + # Add transparency to reduce overplotting
  geom_smooth(color = "darkorange") +
  geom_smooth(method = "lm")

  • Log transformation reduces visual clutter—a highly dense cluster of points has now disappeared.
    • Additionally, the linear model now fits well into the data.


Q2b

  • Provide both ggplot() and comment to describe how the relationship between GDP per capita (gdpPercap) and life expectancy (lifeExp) varies by continent.

Answer:

ggplot(data = df_gapminder,
       mapping = aes(x = log(gdpPercap),
                     y = lifeExp,
                     color = continent)) +  # different colors are used to distinguish continents
  geom_point(alpha = .5)  # Add transparency to reduce overplotting

  • While transparency (alpha) in the scatterplot partially reduces overplotting, it does not fully address the issue, especially in dense regions.
  • This is because, in general, the mixing of overlapping transparent colors may be no longer represent the colors of the categories.
  • Adding fitted lines clarifies the differences in relationships across continents.
ggplot(data = df_gapminder,
       mapping = aes(x = log(gdpPercap),
                     y = lifeExp,
                     color = continent)) +  # different colors are used to distinguish continents
  geom_point(alpha = .5) + # Add transparency to reduce overplotting
  geom_smooth(method = "lm")

  • The different slopes of the fitted lines across continents imply that the relationship between GDP per capita and life expectancy differs by continent.
    • Continents like the Americas and Oceania display steeper slopes, indicating a stronger positive association between GDP per capita and life expectancy.
    • This suggests that for the same percentage increase in GDP per capita, the improvement in life expectancy is greater in these regions compared to others.
ggplot(data = df_gapminder,
       mapping = aes(x = log(gdpPercap),
                     y = lifeExp,
                     color = continent)) +
  geom_point(alpha = .3) +
  geom_smooth(method = "lm") +
  facet_wrap(~continent)

  • The faceted view significantly reduces overplotting and provides a more detailed look at regional differences.
    • However, using color only can make it easier to compare the slope of the fitted lines across continents.


Back to top