Lecture 9

Draw Maps

Byeong-Hak Choe

SUNY Geneseo

March 24, 2025

Draw maps

Draw maps

Maps with geospatial data

Data with map drawing, also known as geospatial data, is important for several reasons:

  • Analysis: Geospatial data can highlight patterns and relationships across spatial units in the data that may not be apparent in other forms of data.

  • Communication: Maps can be visually striking, especially when the spatial units of the map are familiar entities, like counties in the US.

Draw maps

Map U.S. state-level data

  • The socviz::election dataset has various measures of the vote and vote shares by state.
socviz::election |> select(state, total_vote,
                    r_points, pct_trump, party, census) |>
    sample_n(5)
  • We don’t have to represent spatial data spatially.

Draw maps

Map U.S. state-level data

party_colors <- c("#2E74C0", "#CB454A")  # Hex color codes for Dem Blue and Rep Red
p0 <- ggplot(data = filter(election, st != "DC"),
             mapping = aes(x = r_points,
                           y = reorder(state, r_points),
                           color = party))
p1 <- p0 + geom_vline(xintercept = 0, color = "gray30") +
    geom_point(size = 2)

p1
p2 <- p1 + scale_color_manual(values = party_colors)

p2
p3 <- p2 + scale_x_continuous(breaks = c(-30, -20, -10, 0, 10, 20, 30, 40),
                              labels = c("30\n (Clinton)", "20", "10", "0",
                                         "10", "20", "30", "40\n(Trump)"))
p3
p3 + facet_wrap(~ census, ncol=1, scales="free_y") +
     guides(color = "none") + labs(x = "Point Margin", y = "") +
     theme(axis.text=element_text(size=8))
# install.package("ggforce")
library(ggforce)

p3 + facet_col(~ census, scales="free_y", space = "free") +
     guides(color = "none") + labs(x = "Point Margin", y = "") +
     theme(axis.text=element_text(size=6),
           strip.text=element_text(size=rel(.6)))

Draw maps

Map U.S. state-level data

  • Let us get a data frame of the US state map.
us_states <- map_data("state") # from the 'maps' package
us_states
view(us_states)

Draw maps

Map U.S. state-level data

  • geom_polygon() can be used to visualize map data.
p <- ggplot(data = us_states,
            mapping = aes(x = long, y = lat,
                          group = group))

p + geom_polygon(fill = "white", color = "black")
  • A map is a set of lines drawn in the right order on a grid.

Draw maps

Map U.S. state-level data

  • Let’s fill the map.
p <- ggplot(data = us_states,
            aes(x = long, y = lat,
                group = group, fill = region))

p + geom_polygon(color = "gray90", size = 0.1) + guides(fill = FALSE)

Draw maps

Map U.S. state-level data

  • Let’s deal with the projection.
    • By default, the map is plotted using the venerable Mercator projection.
p <- ggplot(data = us_states,
            mapping = aes(x = long, y = lat,
                          group = group, fill = region))

p + geom_polygon(color = "gray90", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
    guides(fill = FALSE)
  • We can transform the default projection used by geom_polygon(), via the coord_map() function.

    • The Albers projection requires two latitude parameters, lat0 and lat1.

Draw maps

Map U.S. state-level data

  • Let’s get the election data on to the map
election$region <- tolower(election$state)
us_states_elec <- left_join(us_states, election)
  • In the map data, us_states, the state names (in a variable named region) are in lower case.

  • Here we can create a variable in the election data frame to correspond to this, using the tolower() function to convert the state names.

  • It is important to know your data and variables well enough to check that they have merged properly.

    • FIPS code is useful in joining the data.

Draw maps

Map U.S. state-level data

p0 <- ggplot(data = us_states_elec,
            aes(x = long, y = lat,
                group = group, fill = party))

p0 + geom_polygon(color = "gray90", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) 
  • We use our party colors for the fill.

Draw maps

Map U.S. state-level data

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

p2 <- p1 + scale_fill_manual(values = party_colors) +
    labs(title = "Election Results 2016", fill = NULL)

p2 + theme_map() 

Draw maps

Map U.S. state-level data

  • To the fill aethetic, let’s try a continuous measure, such as the percentage of the vote received by Donald Trump (pct_trump).
p0 <- ggplot(data = us_states_elec,
             mapping = aes(x = long, y = lat, group = group, fill = pct_trump))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

p1 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")

Draw maps

Map U.S. state-level data

  • Blue is not the color we want here.

  • The color gradient runs in the wrong direction.

  • Let’s fix these problems using scale_fill_gradient():

p2 <- p1 + scale_fill_gradient(low = "white", high = "#CB454A") +
        labs(title = "Trump vote") 
p2 + theme_map() + labs(fill = "Percent")

Draw maps

Map U.S. state-level data

  • For election results, we might prefer a gradient that diverges from a midpoint.
    • The scale_*_gradient2() function gives us a blue-red spectrum that passes through white by default.
    • We can also re-specify the mid-level color along with the high and low colors.
p0 <- ggplot(data = us_states_elec,
             mapping = aes(x = long, y = lat, group = group, fill = d_points))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins") 
p2 + theme_map() + labs(fill = "Percent")

Draw maps

Map U.S. state-level data

  • From the scale_*_gradient2() function, we can also re-specify the mid-level color along with the high and low colors.
p3 <- p1 + scale_fill_gradient2(low = "red", 
                                mid = scales::muted("purple"),
                                high = "blue", 
                                breaks = c(-25, 0, 25, 50, 75)) 
p3 + theme_map() + labs(fill = "Percent", title = "Winning margins")

Draw maps

Map U.S. state-level data

  • If you take a look at the gradient scale for this first “purple America” map, p3, you’ll see that it extends very high on the Blue side.
    • This is because Washington DC is included in the data.
  • If we omit Washington DC, we’ll see that our color scale shifts.
p0 <- ggplot(data = filter(us_states_elec,
                           region != "district of columbia"),
             aes(x = long, y = lat, group = group, fill = d_points))

p3 <- p1 + scale_fill_gradient2(low = "red", 
                                mid = scales::muted("purple"),
                                high = "blue", 
                                breaks = c(-25, 0, 25, 50, 75)) 
p3 + theme_map() + labs(fill = "Percent", title = "Winning margins", caption = "DC is omitted.")

Choropleth Maps

Choropleth Maps

America’s ur-choropleths

  • Choropleth maps display divided geographical areas or regions that are colored, shaded or patterned in relation to a data variable.

  • County-level US choropleth maps can be aesthetically pleasing, because of the added detail they bring to a national map.

  • The county-level datasets (county_map and county_data) are included in the socviz library.

  • The county map data frame, county_map, has been processed a little in order to transform it to an Albers projection, and also to relocate (and re-scale) Alaska and Hawaii.

Choropleth Maps

America’s ur-choropleths

county_map

county_data |>
  select(id, name, state, pop_dens, pct_black) |>
  sample_n(5)

county_full <- 
  left_join(county_map, county_data, by = "id")
  • The id field is the FIPS code for the county.

  • pop_dens is population density.

  • pct_black is percent of African-American population.

  • We merge the data frames using the shared FIPS id column.

Choropleth Maps

Map U.S. county-level data

p <- ggplot(data = county_full,
            mapping = aes(x = long, y = lat,
                          fill = pop_dens, 
                          group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05)
p1
  • p1 object produces a legible map, but by default it chooses an unordered categorical layout.
  • This is because the pop_dens variable is not ordered.
  • pop_dens is an un-ordered discrete variable.
p1 + coord_equal()
  • The use of coord_equal() makes sure that the relative scale of our map does not change even if we alter the overall dimensions of the plot.

Choropleth Maps

Map U.S. county-level data

p2 <- p1 + scale_fill_brewer(
  palette = "Blues",
  labels = c("0-10", "10-50", "50-100", "100-500",
             "500-1,000", "1,000-5,000", ">5,000"))
p2
  • We can manually supply the right sort of scale using the scale_fill_brewer() function, together with a nicer set of labels.
p2 + labs(fill = "Population per\nsquare mile") +
  theme_map() +
  guides(fill = guide_legend(nrow = 1)) + 
  theme(legend.position = "bottom")
  • We can also use the guides() function to make sure each element of the key in the legend appears on the same row.

Choropleth Maps

Map U.S. county-level data

  • We can now do exactly the same thing for our map of percent African-American population by county.
  • pct_black is an un-ordered factor variable.
  • table(county_full$pct_black)
p <- ggplot(data = county_full,
            mapping = aes(x = long, y = lat, fill = pct_black, 
                          group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette="Greens")

p2 + labs(fill = "US Population, Percent Black") +
  guides(fill = guide_legend(nrow = 1)) + 
  theme_map() + theme(legend.position = "bottom")

Choropleth Maps

Map U.S. county-level data

  • Let’s draw a new county-level choropleths.

  • We have a pop_dens6 variable that divides the population density into six categories.

  • We will map the color scale to the value of variable.

orange_pal <- RColorBrewer::brewer.pal(n = 6, name = "Oranges")
orange_pal
orange_rev <- rev(orange_pal)
orange_rev
  • We use the RColorBrewer::brewer.pal() function to manually create two palettes.
    • The brewer.pal() function produces evenly-spaced color schemes.
  • We use the rev() function to reverse the order of a color vector.
pop_p <- ggplot(data = county_full,
            mapping = aes(x = long, y = lat,
                          fill = pop_dens6, 
                          group = group))

pop_p1 <- pop_p + geom_polygon(color = "gray90", size = 0.05) +
  coord_equal()
pop_p2 <- pop_p1 + scale_fill_manual(values = orange_pal)

pop_p2 + labs(title = "Population Density",
              fill = "People per square mile") +
    theme_map() + theme(legend.position = "bottom")
pop_p2_rev <- pop_p1 + scale_fill_manual(values = orange_rev)

pop_p2_rev + labs(title = "Reverse-coded Population Density",
              fill = "People per square mile") +
    theme_map() + theme(legend.position = "bottom")

Choropleth Maps

Map U.S. county-level data

  • Let’s consider a county map of a continuous variable, such as per_gop_2016.
  • Check class(county_full$per_gop_2016).
gop_p <- ggplot(data = county_full,
                mapping = aes(x = long, y = lat,
                              fill = per_gop_2016, 
                              group = group))

gop_p1 <- gop_p + geom_polygon(color = "gray70", size = 0.05) + coord_equal()
gop_p1
  • For a continuous variable, we can use scale_fill_gradient(), scale_fill_gradient2(), or scale_fill_gradient2() function:

    • scale_fill_gradient() produces a two-color gradient.
  • scale_fill_gradient2() produces a three-color gradient with specified midpoint.

  • scale_fill_gradientn() produces an n-color gradient.

  • For scale_fill_gradient2(), choose the value and color for midpoint carefully.

gop_p2 <- gop_p1 + scale_fill_gradient2( 
  low = '#2E74C0',  # from party_colors for DEM
  mid = '#FFFFFF',  # transparent white
  high = '#CB454A',  # from party_colors for GOP
  na.value = "grey50",
  midpoint = .5)

gop_p2 + labs(title = "US Presidential Election 2016",
              fill = "Trump vote share") +
  theme_map() + theme(legend.position = "bottom")

Choropleth Maps

Small-multiple maps

  • Sometimes we have geographical data with repeated observations over time.

  • A common case is to have a country- or state-level measure observed over a period of years (Panel data).

  • Let’s consider consider the poverty rate determined by level of educational attainment in NY.

NY_socioecon_geo_poverty <- read_csv(
  'https://bcdanl.github.io/data/NY_socioecon_geo_poverty.csv'
)
library(viridis)
  • The viridis colors run in low-to-high sequences and combines perceptually uniform colors with easy-to-see, easily-contrasted hues along their scales.

    • The scale_fill_viridis_c() function is for continuous data.
    • The scale_fill_viridis_d() function is for discrete data.
p <- ggplot(data = NY_socioecon_geo_poverty,
            mapping = aes(x = long, y = lat, group = group, 
                          fill = c04_058 ))
  
p1 <- p + geom_polygon(color = "grey", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

p2 <- p1 + scale_fill_viridis_c(option = "plasma") + theme_map() 
p2
p2 + facet_wrap(~ year, ncol = 3) +
    theme(legend.position = "bottom",
          strip.background = element_blank()) +
    labs(fill = "Poverty rate in NY (%)",
         title = "Poverty rate for the male population 25 years and over \nfor whom the highest educational attainment is bachelor's degree")
  • We facet the maps just like any other small-multiple with facet_wrap().

Hexbin Maps

Hexbin Maps

Statebins

  • As an alternative to state-level choropleths, we can consider statebins.
library(statebins)  # install.packages("statebins")
p <- ggplot(election, aes( state = state, fill = pct_trump ) )
p1 <- p +  geom_statebins(lbl_size = 5,
                          border_col = "grey90", border_size = 1)
p2 <- p1 + labs(fill = "Percent Trump") +
  coord_equal() +
  theme_statebins( legend_position = c(.45, 1) ) +
  theme( legend.direction="horizontal" )
p2
p2 + scale_fill_gradient2( 
  low = '#2E74C0',  # from party_colors for DEM
  mid = '#FFFFFF',  # transparent white
  high = '#CB454A',  # from party_colors for GOP
  na.value = "grey50",
  midpoint = 50)   # set the midpoint value

Hexbin Maps

Statebins

  • Let’s remove DC and use scale_fill_gradient().
p <- ggplot(data = filter(election, st != "DC")  , 
            mapping = aes(state = state, fill = pct_clinton)) 
p1 <- p + geom_statebins(lbl_size = 5,
                         border_col = "grey90",
                         border_size = 1)
p2 <- p1 + labs(fill = "Percent Clinton") +
  coord_equal() +
  theme_statebins( legend_position = c(.45, 1) ) +
  theme( legend.direction="horizontal" )
p2

p2 + scale_fill_gradient( 
    low = '#FFFFFF',  # transparent white
    high = '#2E74C0',  # from party_colors for DEM
    na.value = "grey50")   # set the midpoint value

Hexbin Maps

Statebins

  • Let’s use scale_fill_manual() to fill color by party.

  • legend_position allows for adjusting a coordinate for the legend position.

p <- ggplot(data = election  , 
            mapping = aes(state = state, fill = party)) 
p1 <- p + geom_statebins(lbl_size = 5,
                         border_col = "grey90",
                         border_size = 1)
p2 <- p1 + labs(fill = "Winner") +
  coord_equal() +
  theme_statebins( legend_position = c(.25, 1) ) +
  theme( legend.direction="horizontal",
         legend.title = element_text(size=30),
         legend.text = element_text(size=30) )

p2 + scale_fill_manual( values = c(Republican = "darkred", 
                                   Democratic = "royalblue"))

Hexbin Maps

Statebins

  • Let’s discretize a continuous variable using scale_fill_gradient() with breaks.
p <- ggplot(data = election  , 
            mapping = aes(state = state, fill=pct_trump)) 
p1 <- p + geom_statebins(lbl_size = 5,
                         border_col = "grey90",
                         border_size = 1)
p2 <- p1 + labs(fill = "Percent Trump") +
  coord_equal() +
  theme_statebins( legend_position = c(.2, 1) ) +
  theme( legend.direction="horizontal")

p2 + scale_fill_gradient(breaks = c(5, 21, 41, 48, 57),
                         labels = c("< 5", "5-21", 
                                    "21-41", "41-58", "> 57"),
                         low = "#f9ecec", high = "#CB454A") +
  guides(fill = guide_legend())