Data Visualization

Author

YOUR NAME

Published

January 25, 2026

1 Getting Started

This document introduces the ggplot2 workflow for building clear and professional visualizations in R.

You will learn how to:

  • create scatterplots, line plots, and bar charts
  • map variables to aesthetics (aes())
  • control color, size, transparency, and labels
  • use facet_wrap() to compare groups
  • polish plots with themes and annotations


2 Setup

2.1 Install (one-time)

# install.packages("tidyverse")

2.2 Load packages

library(tidyverse)


3 The ggplot2 Grammar of Graphics

Most ggplot2 charts follow this structure, with layers joined by +:

ggplot(data = DATA, aes(x = X_VAR, y = Y_VAR)) +
  geom_...( ) +
  labs(title = "...", x = "...", y = "...") +
  theme_minimal()

Key pieces:

  • Data: a data frame
  • Aesthetics (aes): how variables map to visual elements
  • Geoms: the type of plot (points, lines, bars, etc.)
  • Labels: title, subtitle, axes, caption
  • Theme: overall style
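As a concrete instance of the template, here is a minimal sketch that fills each slot using the mpg dataset from ggplot2 (introduced in section 4). The title and axis text are illustrative choices, not part of the template itself.

```r
library(ggplot2)

# Data: mpg; aesthetics: displ on x, hwy on y
p_template <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +                                  # geom: one point per car
  labs(
    title = "Engine size and highway mileage",    # illustrative label text
    x = "Engine displacement (liters)",
    y = "Highway MPG"
  ) +
  theme_minimal()                                 # theme: overall style

# Typing the object name draws the plot
p_template
```

Storing the plot in a variable is optional, but it makes it easy to add more layers later with +.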


4 Example Dataset

We will use the mpg dataset that ships with ggplot2 (fuel economy data for 234 cars).

mpg |> glimpse()
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
$ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
$ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
$ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
$ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
$ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
$ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
$ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
$ class        <chr> "compact", "compact", "compact", "compact", "compact", "c…


5 Scatterplots

5.1 Basic scatterplot

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

5.2 Add transparency

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5)

5.3 Map color to a categorical variable

ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point(alpha = 0.7)

5.4 Map size to a numeric variable

ggplot(mpg, aes(x = displ, y = hwy, size = cyl)) +
  geom_point(alpha = 0.6)

✅ Rule of thumb:

  • color = ... inside aes() maps color to a variable, so it changes with the data values
  • color = "blue" outside aes() sets one fixed color for every point
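The rule of thumb can be checked directly. The sketch below builds one plot of each kind, plus a common mistake (a color name placed inside aes()), and uses layer_data() to inspect the colors ggplot2 actually draws. The variable names are illustrative.

```r
library(ggplot2)

# Fixed color: set outside aes(), so every point is drawn in blue
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Mapped color: set inside aes(), so each drive type gets its own color and a legend
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Common mistake: "blue" inside aes() is treated as a data value,
# so the points get a default palette color and a legend entry labelled "blue"
p_mistake <- ggplot(mpg, aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

# Inspect the colors that each plot actually uses
unique(layer_data(p_fixed)$colour)
length(unique(layer_data(p_mapped)$colour))
unique(layer_data(p_mistake)$colour)
```

If a plot shows an unexpected color and a one-entry legend, a constant inside aes() is the usual cause.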


6 Trend Lines

6.1 Add a smooth trend

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  geom_smooth()

6.2 Linear regression line

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE)
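With no method argument, geom_smooth() picks a smoother for you (loess for a dataset this small), while method = "lm" draws a least-squares line. To read off the slope and intercept of that line, one option is to fit the same model with lm(). The sketch below assumes the simple model hwy ~ displ and redraws the line from its coefficients with geom_abline().

```r
library(ggplot2)

# Same straight-line model that geom_smooth(method = "lm") fits: hwy as a function of displ
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope of the fitted line

# Draw the line from those coefficients; unlike geom_smooth(), geom_abline()
# extends the line across the whole panel rather than just the data range
p_lm <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  geom_abline(intercept = coef(fit)[[1]], slope = coef(fit)[[2]], color = "firebrick")

p_lm
```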


7 Faceting (Small Multiples)

Facets split one plot into a grid of small panels, one per level of a grouping variable. By default all panels share the same axes.

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ class)
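facet_wrap() also accepts layout arguments. The sketch below is one possible variation: ncol fixes the number of columns and scales = "free_y" lets each panel choose its own y-axis range. Free scales make each panel easier to read but harder to compare with its neighbours, so treat this as an option rather than a default.

```r
library(ggplot2)

p_facets <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  # ncol sets the number of columns; scales = "free_y" gives each panel its own y-axis
  facet_wrap(~ class, ncol = 4, scales = "free_y")

p_facets
```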


8 Bar Charts

Bar charts compare counts or summary values across the levels of a categorical variable.

8.1 Count bar chart

ggplot(mpg, aes(x = class)) +
  geom_bar()

8.2 Flip coordinates (nice for long labels)

ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_flip()

8.3 Bar chart with fill color

ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar()
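By default the fill groups are stacked on top of each other. As a sketch of one alternative, position = "dodge" places the groups side by side, which makes it easier to compare drive types within a class.

```r
library(ggplot2)

# Default: bars are stacked, so each bar's total height is the class count
p_stack <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar()

# position = "dodge" places the drv groups side by side instead
p_dodge <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

p_dodge
```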


9 Summarized Bar Charts

Sometimes you want bars representing summary statistics (means, totals, etc.).

9.1 Step 1: Summarize with dplyr

mpg_mean <- mpg |>
  group_by(class) |>
  summarise(mean_hwy = mean(hwy), .groups = "drop")

mpg_mean
# A tibble: 7 × 2
  class      mean_hwy
  <chr>         <dbl>
1 2seater        24.8
2 compact        28.3
3 midsize        27.3
4 minivan        22.4
5 pickup         16.9
6 subcompact     28.1
7 suv            18.1

9.2 Step 2: Plot the summarized data

ggplot(mpg_mean, aes(x = class, y = mean_hwy)) +
  geom_col()

✅ geom_bar() counts rows automatically.
✅ geom_col() uses your own y-values.
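To see how the two geoms relate, the sketch below draws the same count chart both ways: once letting geom_bar() count the rows, and once counting with dplyr's count() and passing the result to geom_col(). The object names are illustrative.

```r
library(ggplot2)
library(dplyr)

# geom_bar() counts the rows in each class for you
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Equivalent with geom_col(): count first, then supply the counts as y
class_counts <- mpg |>
  count(class)

p_col <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col()

# Compare the bar heights computed for the two plots
all.equal(layer_data(p_bar)$y, layer_data(p_col)$y)
```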


10 Line Charts

Line charts require an x-variable that has a meaningful order (often time).

We will build a small example dataset.

sales <- tibble(
  month = 1:12,
  revenue = c(10, 12, 11, 13, 15, 18, 17, 20, 22, 23, 25, 28)
)

sales
# A tibble: 12 × 2
   month revenue
   <int>   <dbl>
 1     1      10
 2     2      12
 3     3      11
 4     4      13
 5     5      15
 6     6      18
 7     7      17
 8     8      20
 9     9      22
10    10      23
11    11      25
12    12      28

10.1 Basic line plot

ggplot(sales, aes(x = month, y = revenue)) +
  geom_line()

10.2 Line + points

ggplot(sales, aes(x = month, y = revenue)) +
  geom_line() +
  geom_point()
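Because month is stored as a number, the default axis may not show a tick for every month. The sketch below is one way to label all twelve, assuming month 1 is January. It rebuilds the sales data as a plain data frame so the block runs on its own.

```r
library(ggplot2)

# Same values as the sales example, rebuilt so this block is self-contained
sales <- data.frame(
  month = 1:12,
  revenue = c(10, 12, 11, 13, 15, 18, 17, 20, 22, 23, 25, 28)
)

p_sales <- ggplot(sales, aes(x = month, y = revenue)) +
  geom_line() +
  geom_point() +
  # month.abb is base R's vector of abbreviated month names ("Jan", ..., "Dec");
  # mapping 1 to "Jan" is an assumption about what the month numbers mean
  scale_x_continuous(breaks = 1:12, labels = month.abb)

p_sales
```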


11 Labels and Titles

Use labs() to make your charts understandable.

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  labs(
    title = "Engine Size vs Highway MPG",
    subtitle = "Bigger engines usually get lower fuel efficiency",
    x = "Engine displacement (liters)",
    y = "Highway MPG",
    caption = "Source: ggplot2::mpg"
  )
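Beyond labs(), annotate() can place a one-off note inside the plotting area. The sketch below labels the 2seater cars. The label text and its offset above the points are illustrative choices, and the position is computed from the data rather than hard-coded.

```r
library(ggplot2)

# Rows for the cars we want to call out
two_seaters <- subset(mpg, class == "2seater")

p_annot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  # annotate() adds a single text label; x and y are given in data units
  annotate(
    "text",
    x = mean(two_seaters$displ),
    y = max(two_seaters$hwy) + 3,   # illustrative offset so the text sits above the points
    label = "2-seaters"
  ) +
  labs(x = "Engine displacement (liters)", y = "Highway MPG")

p_annot
```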


12 Themes

Themes control the non-data parts of a plot, such as the background, grid lines, and fonts.

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  theme_minimal()

Try a few common ones:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  theme_classic()

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  theme_light()
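A complete theme can be adjusted further with theme(). The sketch below shows a few tweaks that are often useful; the specific values (font size, legend position, bold title) are illustrative.

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.6) +
  labs(title = "Engine Size vs Highway MPG") +
  theme_minimal(base_size = 14) +               # start from a complete theme with larger text
  theme(
    legend.position = "bottom",                 # move the legend below the panel
    plot.title = element_text(face = "bold")    # make the title bold
  )

p_theme
```

Add theme() after the complete theme. A complete theme such as theme_minimal() added later would overwrite the tweaks.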


13 Changing Colors (Manually)

To swap the default palette for another one, add a color scale such as scale_color_brewer():

ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point(alpha = 0.7) +
  scale_color_brewer(palette = "Set2")

Or use a fixed color:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.7, color = "steelblue")
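For full manual control, scale_color_manual() takes one color per level. The sketch below assigns hypothetical hex colors to the three drv levels; naming the vector ties each color to a specific level regardless of order.

```r
library(ggplot2)

# Hypothetical palette: one named hex color per level of drv ("4", "f", "r")
drv_colors <- c("4" = "#1b9e77", "f" = "#d95f02", "r" = "#7570b3")

p_manual <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = drv_colors)

p_manual
```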


14 Saving Plots

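Use ggsave() to export a plot. The width and height arguments are in inches by default, and the file extension determines the output format.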

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6)

ggsave("my_scatterplot.png", plot = p, width = 7, height = 5, dpi = 300)
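The same call can write other formats and use other units. The sketch below saves a PDF sized in centimeters. It writes to a temporary directory so it does not clutter the project folder, and the path is a placeholder to replace with your own.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6)

# Hypothetical output path in a temporary directory; swap in your own file name
out_file <- file.path(tempdir(), "my_scatterplot.pdf")

# The .pdf extension selects the format; units switches from the default inches
ggsave(out_file, plot = p, width = 18, height = 12, units = "cm")

# Confirm the file was written
file.exists(out_file)
```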


15 Practice Problems ✅

  1. Create a scatterplot with cty on the x-axis and hwy on the y-axis.
  2. Color points by drv and set alpha = 0.6.
  3. Add a linear trend line (method = "lm", se = FALSE).
  4. Use facet_wrap(~ drv) to compare groups.
  5. Create a bar chart of manufacturer. Flip the axis using coord_flip().
  6. Create a summarized bar chart of mean hwy by manufacturer (top 10 only).


16 Challenge 💡 (Top 10 Manufacturers)

top10 <- mpg |>
  count(manufacturer, sort = TRUE) |>
  slice_head(n = 10)

top10
# A tibble: 10 × 2
   manufacturer     n
   <chr>        <int>
 1 dodge           37
 2 toyota          34
 3 volkswagen      27
 4 ford            25
 5 chevrolet       19
 6 audi            18
 7 hyundai         14
 8 subaru          14
 9 nissan          13
10 honda            9

Now:

  • filter mpg to only these 10 manufacturers
  • compute mean hwy by manufacturer
  • make a geom_col() plot
  • flip the coordinates and add titles
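Try the challenge yourself first. For comparison, the sketch below is one possible solution, not the only one. The object names, the label text, and the use of reorder() to sort the bars are choices made here for illustration.

```r
library(ggplot2)
library(dplyr)

# The 10 manufacturers with the most rows in mpg
top10 <- mpg |>
  count(manufacturer, sort = TRUE) |>
  slice_head(n = 10)

# Keep only those manufacturers, then compute mean highway MPG for each
top10_hwy <- mpg |>
  filter(manufacturer %in% top10$manufacturer) |>
  group_by(manufacturer) |>
  summarise(mean_hwy = mean(hwy), .groups = "drop")

# reorder() sorts the bars by mean_hwy; coord_flip() makes the names easy to read
p_top10 <- ggplot(top10_hwy, aes(x = reorder(manufacturer, mean_hwy), y = mean_hwy)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Mean highway MPG by manufacturer",
    subtitle = "Ten manufacturers with the most models in mpg",
    x = "Manufacturer",
    y = "Mean highway MPG"
  )

p_top10
```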