Lecture 27

Histogram ggplot(); Boxplot ggplot()

Byeong-Hak Choe

SUNY Geneseo

November 8, 2024

Distribution ggplot() - Histogram

Histogram with geom_histogram()

  • Histograms are used to visualize the distribution of a numeric variable.

  • Histograms divide data into bins and count the number of observations in each bin.
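The binning step can be sketched in base R. This is a minimal illustration on a short, made-up vector of ages (hypothetical values, not the Titanic data), using equal-width bins of 10 years. ggplot2 chooses its own bin boundaries, so its counts for real data will generally differ from these.

```r
# Made-up ages for illustration only (not taken from the Titanic data)
ages <- c(2, 5, 14, 22, 25, 28, 31, 35, 38, 47, 54, 63)

# Equal-width bins of 10 years: [0,10), [10,20), ..., [60,70)
breaks <- seq(0, 70, by = 10)
bin <- cut(ages, breaks = breaks, right = FALSE)

# Count the observations that fall in each bin;
# a histogram draws one bar per bin with these counts as heights
counts <- table(bin)
print(counts)
```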

Histogram with geom_histogram()

Titanic Dataset

Histogram with geom_histogram()

library(tidyverse)  # loads readr (read_csv()) and ggplot2 (ggplot())

titanic <- 
  read_csv(
    "https://bcdanl.github.io/data/titanic_cleaned.csv")

ggplot(data = titanic,
       mapping = 
         aes(x = age)) + 
  geom_histogram()

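  • geom_histogram() creates a histogram.
    • We map the numeric variable (here, age) to the x aesthetic; the y-axis shows the count in each bin.
    • Without bins or binwidth, geom_histogram() uses 30 bins by default and prints a message suggesting a better value.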

Histogram with geom_histogram() with bins

ggplot(data = titanic,
       mapping = 
         aes(x = age)) + 
  geom_histogram(bins = 5)

  • bins: Specifies the number of bins.
  • The shape of a histogram can be sensitive to the number of bins! Too few bins can hide features of the distribution, while too many can make it look noisy.
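One way to see this sensitivity is to inspect the bins ggplot2 actually computes. This sketch uses the built-in mpg data (hwy, highway MPG) as a stand-in, since it ships with ggplot2 and needs no download; layer_data() returns the computed data behind a plot layer.

```r
library(ggplot2)

# mpg ships with ggplot2; hwy is highway miles per gallon
p_few  <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 5)
p_many <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 30)

# layer_data() returns the bins that ggplot2 computed for each plot
d_few  <- layer_data(p_few)
d_many <- layer_data(p_many)

# Compare the number of bins and the tallest bar in each version
c(bins = nrow(d_few),  tallest = max(d_few$count))
c(bins = nrow(d_many), tallest = max(d_many$count))
```

Every car lands in exactly one bin either way; only the grouping, and hence the visible shape, changes.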

Histogram with geom_histogram() with binwidth

ggplot(data = titanic,
       mapping = 
         aes(x = age)) + 
  geom_histogram(binwidth = 1)

  • binwidth: Specifies the width of each bin, in the units of the x variable (here, 1 year of age).
  • Specify either the bins option or the binwidth option, not both.
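A related option, not used in the slides above, is boundary, which geom_histogram() passes to its binning stat to fix where bin edges fall. This sketch uses the built-in mpg data as a stand-in and places 5-mpg bins so that their edges land on multiples of 5.

```r
library(ggplot2)

# binwidth = 5: each bin spans 5 mpg
# boundary = 0: bin edges are placed at multiples of 5 (..., 10, 15, 20, ...)
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5, boundary = 0)

# Inspect the left and right edges of each computed bin
d <- layer_data(p)
d[, c("xmin", "xmax", "count")]
```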

Histogram with geom_histogram()

Customizing the Aesthetics

ggplot(data = titanic,
       mapping = 
         aes(x = age)) + 
  geom_histogram(
    binwidth = 2,
    fill = 'lightblue',
    color = 'black')

  • fill: Fills the bars with a specific color.
  • color: Adds an outline of a specific color to the bars.

Distribution ggplot() - Boxplot

Boxplot with geom_boxplot()

  • Boxplots can be used to visualize how the distribution of a numeric variable varies by a categorical variable.
  • Boxplots display the median, quartiles, and potential outliers in the data.
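The quantities a boxplot draws can be computed by hand. Below is a minimal base-R sketch on a made-up set of highway MPG values (hypothetical, not from mpg). It computes the quartiles with quantile() and applies the 1.5 * IQR rule that the geom_boxplot() documentation describes for whiskers and outlier points.

```r
# Made-up highway MPG values for a single car class
hwy <- c(22, 24, 25, 26, 26, 27, 28, 29, 30, 41)

q   <- quantile(hwy, probs = c(0.25, 0.5, 0.75))  # Q1, median, Q3
iqr <- q[[3]] - q[[1]]                            # interquartile range (box height)

# Fences at 1.5 * IQR beyond the box edges
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whiskers reach the most extreme values still inside the fences
lower_whisker <- min(hwy[hwy >= lower_fence])
upper_whisker <- max(hwy[hwy <= upper_fence])

# Values outside the fences are drawn as individual points
outliers <- hwy[hwy < lower_fence | hwy > upper_fence]
outliers
```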

Boxplot with geom_boxplot()

ggplot(data = mpg,
       mapping = 
         aes(x = class,
             y = hwy)) + 
  geom_boxplot() 

  • geom_boxplot() creates a boxplot.
    • Mappings: a categorical variable (class) to the x aesthetic and a numeric variable (hwy) to the y aesthetic.
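To check what geom_boxplot() computes for each category, the same mpg plot can be passed to layer_data(). This is a sketch; the column names below (ymin, lower, middle, upper, ymax) are the ones ggplot2 uses for boxplot statistics.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# One row per class: whisker ends (ymin, ymax), box edges
# (lower, upper), and the median line (middle)
stats <- layer_data(p)[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
stats

# The middle line should match each class's median highway MPG
tapply(mpg$hwy, mpg$class, median)
```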

Boxplot with geom_boxplot()

Horizontal Boxplots

ggplot(data = mpg,
       mapping = 
         aes(x = hwy,
             y = class)) + 
  geom_boxplot() 

  • Boxplots can be horizontal or vertical.
    • Swapping the mappings (numeric hwy on x, categorical class on y) makes the boxplot horizontal.
    • A horizontal boxplot is a good option for long category names.

Boxplot with geom_boxplot()

Customizing the Aesthetics

# 1. `show.legend = FALSE` hides the legend,
#     which is redundant when y already
#     labels each class
# 2. `scale_fill_colorblind()` or
#    `scale_fill_tableau()`
#     applies a color-blind friendly 
#     palette to the `fill` aesthetic
# Both scale functions come from ggthemes:
library(ggthemes) 
ggplot(data = mpg,
       mapping = 
         aes(x = hwy,
             y = class,
             fill = class)) + 
  geom_boxplot(
    show.legend = FALSE) +
  scale_fill_tableau() 

  • fill: Maps a variable to the fill color of the boxes.
  • scale_fill_tableau(): Applies a color-blind friendly palette to the fill aesthetic.
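The code comment also names scale_fill_colorblind(), which ggthemes provides alongside scale_fill_tableau(). A sketch of the same horizontal boxplot with that palette swapped in:

```r
library(ggplot2)
library(ggthemes)  # provides scale_fill_colorblind() and scale_fill_tableau()

p <- ggplot(data = mpg,
            mapping = aes(x = hwy, y = class, fill = class)) +
  geom_boxplot(show.legend = FALSE) +
  scale_fill_colorblind()  # color-blind friendly palette for up to 8 groups

p
```

mpg has 7 classes, so each box gets its own color from this palette.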

Boxplot with geom_boxplot()

Sorted Boxplot with fct_reorder(CATEGORICAL, NUMERICAL)

# labs() can label
#   x-axis, y-axis, and more

ggplot(data = mpg,
       mapping = 
         aes(x = hwy,
             y = fct_reorder(class, hwy),
             fill = class)) + 
  geom_boxplot(
    show.legend = FALSE) +
  scale_fill_tableau() +
  labs(x = "Highway MPG",
       y = "Class") 

  • fct_reorder(CATEGORICAL, NUMERICAL): Reorders the levels of CATEGORICAL by the median of NUMERICAL within each category, so the boxes appear sorted by their medians.
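The reordering can be inspected directly on the factor, outside of any plot. This sketch uses forcats on the mpg columns; the .fun and .desc arguments shown are fct_reorder() options for changing the summary statistic and reversing the order.

```r
library(ggplot2)  # for the mpg data
library(forcats)

# Default: order the classes by their median highway MPG (lowest first)
by_median <- fct_reorder(mpg$class, mpg$hwy)
levels(by_median)

# .fun changes the summary; .desc = TRUE reverses the order
by_mean_desc <- fct_reorder(mpg$class, mpg$hwy, .fun = mean, .desc = TRUE)
levels(by_mean_desc)
```

In a horizontal boxplot, the first level is drawn at the bottom of the y-axis.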