Lecture 10

Data Visualization - Aesthetic Mappings and Facets

Byeong-Hak Choe

SUNY Geneseo

February 29, 2024

Learning Objectives

  • Aesthetic Mappings
  • Facets

R Basics

RStudio Workflow

Must-know Quarto Shortcuts

Mac

  • option + command + I: to create a R chunk
  • option + - : the shortcut for <-
  • command + return runs a current line or selected lines
  • command + shift + return: to run the code in the R chunk
  • command + shift + C: to comment out a line
  • command + shift + K: to render a current Quarto file

Windows

  • Alt+Ctrl+I : to create a R chunk
  • Alt + - : the shortcut for <-
  • Ctrl + Enter runs a current line or selected lines
  • Ctrl + Shift + Enter : to run the code in the R chunk
  • Ctrl + Shift + C: to comment out a line
  • Ctrl + Shift + K: to render a current Quarto file

Documenting Workflow

Must-know Shortcuts

  • Use the shortcuts below whenever you edit a document, including Quarto:

Mac

  • command + ⬆️/⬇️/⬅️/➡️

  • shift + ⬆️/⬇️/⬅️/➡️

  • command + shift + ⬆️/⬇️/⬅️/➡️:

  • command + PgUp/PgDn

  • shift + PgUp/PgDn

  • command + shift + PgUp/PgDn:

Windows

  • Ctrl + ⬆️/⬇️/⬅️/➡️

  • Shift + ⬆️/⬇️/⬅️/➡️

  • Ctrl + shift + ⬆️/⬇️/⬅️/➡️:

  • Ctrl + PgUp/PgDn

  • Shift + PgUp/PgDn

  • Ctrl + Shift + PgUp/PgDn:

Data Visualization - First Steps

Graphing Template

ggplot(data = <DATA.FRAME>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
  • To make a ggplot plot, replace the bracketed sections in the code above with a data.frame, a geom function, or a collection of mappings such as x = VAR_1 and y = VAR_2.

ggplot workflow

Common problems in ggplot()

ggplot(data = mpg) 
 + geom_point( mapping = 
                 aes(x = displ, 
                     y = hwy) )
  • One common problem when creating ggplot2 graphics is to put the + in the wrong place.

Aesthetic Mappings

Aesthetic Mappings

  • In the plot above, one group of points (highlighted in red) seems to fall outside of the linear trend.

    • How can we explain these cars? Are those hybrids?

Aesthetic Mappings

  • An aesthetic is a visual property (e.g., size, shape, color) of the objects (e.g., class) in our plot.

  • We can display a point in different ways by changing the values of its aesthetic properties.

Aesthetic Mappings

Adding a color to the plot

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy, 
                   color = class) )

Aesthetic Mappings

Adding a shape to the plot

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy, 
                   shape = class) )

Aesthetic Mappings

Adding a size to the plot

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy, 
                   size = class) )

Aesthetic Mappings

Adding an alpha (transparency) to the plot

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy, 
                   alpha = class) )

Aesthetic Mappings

Adding an alpha (transparency) to the plot

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy) )

  • How many observations are in the mpg data.frame?
  • How many dots are in the scatterplot above?

Aesthetic Mappings

Overplotting problem

  • Many points overlap each other.

    • This problem is known as overplotting.
  • When points overlap, it’s hard to know how many data points are at a particular location.

  • Overplotting can obscure patterns and outliers, leading to potentially misleading conclusions.

  • We can set a transparency level (alpha) between 0 (full transparency) and 1 (no transparency).

Aesthetic Mappings

Overplotting and alpha

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy),
             alpha = .2)

Aesthetic Mappings

Specifying a color to the plot

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy), 
             color = "blue")

Aesthetic Mappings

  • To set an aesthetic manually, set the aesthetic by name as an argument of our geom_ function; i.e. it goes outside of aes().
    • We will need to pick a level that makes sense for that aesthetic:
      • The name of a color as a character string.
      • The size of a point in mm.
      • The shape of a point as a number, as shown above.

Aesthetic Mappings

Specifying a color to the plot?

ggplot(data = mpg) + 
  geom_point( mapping = 
                aes(x = displ, 
                    y = hwy, 
                    color = "blue") )

Aesthetic Mappings

Categorical vs. Continuous Variables

  • A categorical variable is a variable whose value is obtained by counting.
    • Students’ letter grade (A/A-/B+/…)
    • Student Classification (Freshmen/Sophomore/Junior/Senior)
    • Beer brands
    • US state/county
  • Categorical variables should be factor or character.
    • We can use as.factor(variable) to make a variable factor.
  • A continuous variable is a variable whose value is obtained by measuring and can have a decimal or fractional value.
    • Height/weight of students
    • Time it takes to get to school
    • Fuel efficiency of a vehicle (e.g., miles per gallon)
    • Atmospheric carbon dioxide (CO2) concentrations
  • Continuous variables should be numeric.
    • We can use as.numeric(variable) to make a variable numeric.
  • For data visualization, integer-type variables could be treated as either categorical or continuous, depending on the context of analysis.

  • If the values of an integer-type variable means an intensity or an order, the integer variable could be continuous.

    • A variable of age integers (18, 19, 20, 21, …) could be continuous.
    • A variable of integer-valued MPG (27, 28, 29, 30, …) could be continuous.
  • If not, the integer variable is categorical.

    • A variable of month integers (1, ,2, …, 12) could be categorical.

Data Visualization

Aesthetic Mappings

Facets

Facets

  • One way to add a variable, particularly useful for categorical variables, is to use facets to split our plot into facets, subplots that each display one subset of the data.

Facets

facet_wrap( VAR ~ . )

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy), 
             alpha = .5) + 
  facet_wrap( class ~ . , nrow = 2)

  • To facet our plot by a single variable, use facet_wrap().

Facets

facet_grid( VAR_ROW ~ VAR_COL )

  • To facet our plot on the combination of two variables, add facet_grid( VAR_ROW ~ VAR_COL ) to our plot call.

  • The first argument of facet_grid() is also a formula.

    • This time the formula should contain two variable names separated by a ~.

Facets

facet_grid( VAR_ROW ~ VAR_COL )

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy),
             alpha = .5) + 
  facet_grid(drv ~ cyl)

Facets

scales in Facetting

  • Option scales in facet_*() is whether scales is
    • fixed ("fixed", the default),
    • free in one dimension ("free_x", "free_y"), or
    • free in two dimensions ("free").

Facets

scales in Facetting

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy),
             alpha = .5) + 
  facet_grid(drv ~ cyl, 
             scales = "free_x")

Data Visualization

Facets

Geometric Objects

Geometric Objects

How are these two plots similar?

Geometric Objects

  • A geom_*() is the geometrical object that a plot uses to represent data.
    • Bar charts use geom_bar() or geom_col();
    • Histograms use geom_histogram() or geom_freqpoly();
    • Line charts use geom_line();
    • Boxplots use the geom_boxplot();
    • Scatterplots use the geom_point();
    • Fitted lines use the geom_smooth();
    • and many more!
  • We can use different geom_*() to plot the same data.

Geometric Objects

Scatterplot

ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy),
             alpha = .3)

Geometric Objects

Fitted lines

ggplot(data = mpg) + 
  geom_smooth(mapping = 
                aes(x = displ, 
                    y = hwy))

Geometric Objects

  • Every geom function in ggplot2 takes a mapping argument.

  • However, not every aesthetic works with every geom.

    • We could set the shape of a point, but we could not set the shape of a line;
    • We could set the linetype of a line.

Geometric Objects

geom_smooth()

ggplot( data = mpg ) + 
  geom_smooth( mapping = 
                 aes( x = displ, 
                      y = hwy, 
                      linetype = drv) )

Geometric Objects

geom_smooth(method = lm)

  • Setting method = lm manually in geom_smooth() gives a straight line that fits into data points.
ggplot( data = mpg ) + 
  geom_smooth( mapping = 
                 aes( x = displ, 
                      y = hwy),
               method = lm)

Geometric Objects

geom_smooth(group = CATEGORICAL_VAR)

  • We can set the group aesthetic to a categorical variable to draw multiple objects.
    • ggplot2 will draw a separate object for each unique value of the grouping variable.

Geometric Objects

geom_smooth(group = CATEGORICAL_VAR)

ggplot(data = mpg) +
  geom_smooth(mapping = 
                aes(x = displ, 
                    y = hwy))

Geometric Objects

geom_smooth(group = CATEGORICAL_VAR)

ggplot(data = mpg) +
  geom_smooth(mapping = 
                aes(x = displ, 
                    y = hwy, 
                    group = drv))

Geometric Objects

  • In practice, ggplot2 will automatically group the data for these geoms whenever we map an aesthetic to a categorical variable (as in the linetype example).
ggplot(data = mpg) +
  geom_smooth(
    mapping = aes(x = displ, 
                  y = hwy, 
                  color = drv),
    show.legend = FALSE
  )

Geometric Objects

  • To display multiple geometric objects in the same plot, add multiple geom_*() functions to ggplot():
ggplot(data = mpg) + 
  geom_point(mapping = 
               aes(x = displ, 
                   y = hwy),
             alpha = .3) +
  geom_smooth(mapping = 
                aes(x = displ, 
                    y = hwy)) +
  geom_smooth(mapping = 
                aes(x = displ, 
                    y = hwy), 
              method = lm, 
              color = 'red')

Geometric Objects

Layers

  • Using geom_point(), geom_smooth(), and geom_smooth(method = lm) together is an excellent option to visualize the relationship between the two variables.

Geometric Objects

Layers

  • If we place mappings in a geom function, ggplot2 will treat them as local mappings for the layer.
ggplot(data = mpg, 
       mapping = 
         aes(x = displ, 
             y = hwy)) + 
  geom_point(mapping = 
               aes(color = class),
             alpha = .3) + 
  geom_smooth()

Geometric Objects

Multiple data.frames

df_subcompact <- filter(mpg, class == "subcompact")
  • We can use the same idea to specify different data for each layer.

  • Here, our smooth line displays just a subset of the mpg data.frame, the subcompact cars.

    • filter() is the tidyverse-way to filter observations in a data.frame.
  • The local data argument in geom_smooth() overrides the global data argument in ggplot() for that layer only.

Geometric Objects

Multiple data.frames

ggplot(data = mpg, 
       mapping = 
         aes(x = displ, 
             y = hwy)) + 
  geom_point(mapping = 
               aes(color = class), 
             alpha = .3) + 
  geom_smooth(data = df_subcompact, 
              se = FALSE)

  • The standard error (se) tells us how much the predicted values from a model might differ from the actual values we’re trying to predict.

Data Visualization

Geometric Objects