Draw Maps
March 24, 2025
Geospatial data, that is, data that can be drawn on a map, is important for several reasons:
Analysis: Geospatial data can highlight patterns and relationships across spatial units in the data that may not be apparent in other forms of data.
Communication: Maps can be visually striking, especially when the spatial units of the map are familiar entities, like counties in the US.
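The socviz::election dataset has various measures of the vote and vote shares by state.
The geom_polygon() function can be used to visualize map data.
library(ggplot2)
library(dplyr)   # filter(), select(), sample_n(), left_join()
library(socviz)  # election, county_map, county_data, theme_map()
us_states <- map_data("state")  # US state outlines; requires the maps package
p <- ggplot(data = us_states,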
mapping = aes(x = long, y = lat,
group = group))
p + geom_polygon(fill = "white", color = "black")
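Map data such as us_states stores each outline as a sequence of long/lat vertices. The group column tells geom_polygon() which vertices belong to the same shape. Without group = group, ggplot would try to join every point into a single polygon.

The sketch below shows this with two hypothetical unit squares rather than real state boundaries. It assumes only that ggplot2 is installed.

```r
library(ggplot2)

# Two hypothetical polygons: each row is one vertex, `group` identifies the shape
toy_map <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(c(1, 2), each = 4)
)

# group = group draws two separate squares instead of one zig-zag shape
p_toy <- ggplot(toy_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")

# layer_data() returns the data ggplot actually draws: one group per polygon
length(unique(layer_data(p_toy)$group))  # 2
```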
Next, we fill the map, mapping fill to region so that each state gets its own color.
p <- ggplot(data = us_states,
mapping = aes(x = long, y = lat,
group = group, fill = region))
p + geom_polygon(color = "gray90", linewidth = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
guides(fill = "none")  # one color per state, so a legend adds nothing
We can change the default projection used by geom_polygon() with the coord_map() function.
The Albers projection requires two latitude parameters, lat0 and lat1.
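The Albers equal-area conic projection places a cone that cuts the globe along two standard parallels. In coord_map(), lat0 and lat1 are those two latitudes.

The sketch below is a hypothetical helper, albers_xy(), that applies the spherical Albers formulas on a unit sphere to show what the two parameters do. It is not how coord_map() works internally, since coord_map() delegates to the mapproj package. The central meridian (-96) and origin latitude (37.5) are assumed values chosen only for illustration.

```r
# Hypothetical helper: spherical Albers equal-area conic projection (unit sphere).
# lat1, lat2: the two standard parallels (the lat0/lat1 pair given to coord_map()).
# lon0, lat_origin: central meridian and origin latitude, assumed for illustration.
albers_xy <- function(lon, lat, lat1 = 39, lat2 = 45,
                      lon0 = -96, lat_origin = 37.5) {
  rad  <- pi / 180
  phi1 <- lat1 * rad
  phi2 <- lat2 * rad
  n <- (sin(phi1) + sin(phi2)) / 2               # cone constant
  C <- cos(phi1)^2 + 2 * n * sin(phi1)
  rho  <- sqrt(C - 2 * n * sin(lat * rad)) / n   # radius of each point's parallel
  rho0 <- sqrt(C - 2 * n * sin(lat_origin * rad)) / n
  theta <- n * (lon - lon0) * rad                # angle around the cone's apex
  data.frame(x = rho * sin(theta), y = rho0 - rho * cos(theta))
}

# The origin maps to (0, 0); a point east of the central meridian gets x > 0
albers_xy(lon = c(-96, -80), lat = c(37.5, 40))
```

Moving lat1 and lat2 changes the cone constant n. That is why the choice of standard parallels affects how the map bends.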
Next, let’s get the election data on to the map. In the map data, us_states, the state names (in a variable named region) are in lower case.
Here we can create a variable in the election data frame to correspond to this, using the tolower() function to convert the state names.
It is important to know your data and variables well enough to check that they have merged properly.
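With the real data, this step presumably corresponds to election$region <- tolower(election$state), followed by a left join of us_states and election on region.

The sketch below uses tiny made-up stand-ins for the two tables. It uses base R's merge() instead of dplyr's left_join() so that it runs on its own. After the join, setdiff() lists the keys that matched on one side only, which makes a failed merge visible.

```r
# Toy stand-ins for us_states (map rows, lower-case names) and election
# (one row per state, title-case names). Names and values are made up.
map_rows <- data.frame(region = c("alpha", "alpha", "beta gamma", "delta"),
                       order  = 1:4)
votes <- data.frame(state = c("Alpha", "Beta Gamma", "Epsilon"),
                    value = c(10, 20, 30))

# Create a key that follows the map's naming convention
votes$region <- tolower(votes$state)

# Left join: keep every map row, attach vote data where the key matches
merged <- merge(map_rows, votes, by = "region", all.x = TRUE)

# Check the merge: keys that exist on one side only
setdiff(unique(map_rows$region), votes$region)  # map regions without data
setdiff(votes$region, map_rows$region)          # data rows without a map shape
```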
p0 <- ggplot(data = us_states_elec,
aes(x = long, y = lat,
group = group, fill = party))
p0 + geom_polygon(color = "gray90", linewidth = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
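Here fill is mapped to party, so each state is colored by party.
Instead of party, let’s try a continuous measure for the fill aesthetic, such as the percentage of the vote received by Donald Trump (pct_trump).
With the default continuous fill scale, blue is not the color we want here.
The color gradient also runs in the wrong direction: higher shares come out lighter, not darker.
Let’s fix these problems using scale_fill_gradient():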
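Here is a minimal sketch of this fix on three made-up tiles rather than the us_states_elec map. scale_fill_gradient() takes a low and a high color. We can run from white for small shares to the red used for the GOP later in this section (#CB454A). For the map itself, the same scale would be added to the pct_trump plot.

```r
library(ggplot2)

# Hypothetical data: three tiles with a made-up vote share (percent)
shares <- data.frame(x = 1:3, y = 1, pct = c(20, 50, 80))

p_share <- ggplot(shares, aes(x = x, y = y, fill = pct)) +
  geom_tile() +
  # white for the lowest share, dark red for the highest
  scale_fill_gradient(low = "white", high = "#CB454A")

# The colors ggplot assigned, one per tile
layer_data(p_share)$fill
```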
The scale_*_gradient2() function gives us a blue-red spectrum that passes through white by default.
Here we use it for the winning margin in points, d_points.
p0 <- ggplot(data = us_states_elec,
mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", linewidth = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")
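With the scale_*_gradient2() function, we can also re-specify the mid-level color along with the high and low colors.
If all of us_states_elec is plotted, the gradient scale extends very high on the blue side. This is because Washington, DC, with its very large Democratic margin, is included in the data and hence in the scale.
Below, p3 is built from data with DC filtered out.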
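p0 <- ggplot(data = filter(us_states_elec,
region != "district of columbia"),
aes(x = long, y = lat, group = group, fill = d_points))
# rebuild p1 so that it uses the filtered p0
p1 <- p0 + geom_polygon(color = "gray90", linewidth = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)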
p3 <- p1 + scale_fill_gradient2(low = "red",
mid = scales::muted("purple"),
high = "blue",
breaks = c(-25, 0, 25, 50, 75))
p3 + theme_map() + labs(fill = "Percent", title = "Winning margins", caption = "DC is omitted.")
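A single extreme value stretches a diverging scale because of how values are rescaled before colors are picked. The sketch below is a hypothetical re-implementation, rescale_mid_sketch(). As far as we understand, it mirrors scales::rescale_mid(), which scale_fill_gradient2() relies on.

The midpoint lands at 0.5, and the largest distance from the midpoint on either side sets the stretch for both sides. The margins are made up. The value 85 plays the role of a DC-like outlier.

```r
# Hypothetical re-implementation of midpoint rescaling for a diverging scale:
# `mid` maps to 0.5; the farthest value from `mid` maps to 0 or 1.
rescale_mid_sketch <- function(x, mid = 0) {
  extent <- 2 * max(abs(range(x) - mid))
  (x - mid) / extent + 0.5
}

margins <- c(-30, -10, 5, 20)                   # made-up margins, in points
round(rescale_mid_sketch(margins), 3)           # spread over the full 0 to 1 range
round(rescale_mid_sketch(c(margins, 85)), 3)    # one outlier squeezes the rest
```

With the outlier present, the four ordinary values crowd into a narrow band around the middle color. That is the washed-out effect that dropping DC avoids.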
Choropleth maps display divided geographical areas or regions that are colored, shaded or patterned in relation to a data variable.
County-level US choropleth maps can be aesthetically pleasing because of the added detail they bring to a national map.
The county-level datasets (county_map and county_data) are included in the socviz package.
The county map data frame, county_map, has been processed a little to transform it to an Albers projection and to relocate (and rescale) Alaska and Hawaii.
county_map
county_data |>
select(id, name, state, pop_dens, pct_black) |>
sample_n(5)
county_full <-
left_join(county_map, county_data, by = "id")
The id field is the FIPS code for the county.
pop_dens is population density, and pct_black is the percentage of the population that is African-American.
We merge the data frames using the shared FIPS id column.
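County FIPS codes are five digits long: a two-digit state code followed by a three-digit county code. If an id column is ever converted to numbers, for example when read from a CSV file, the leading zeros disappear. After that, the keys no longer match text ids.

The sketch below shows the padding fix with sprintf() and a join on made-up values. It does not claim anything about how socviz stores its ids.

```r
# Numeric codes lose leading zeros; pad them back to five characters
fips_num <- c(1001, 36061)
fips_chr <- sprintf("%05d", fips_num)
fips_chr

# Toy join on the padded id (values are made up)
shapes <- data.frame(id = c("01001", "36061"), piece = 1:2)
values <- data.frame(id = fips_chr, pop_dens_toy = c(10, 20))
merge(shapes, values, by = "id")
```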
p <- ggplot(data = county_full,
mapping = aes(x = long, y = lat,
fill = pop_dens,
group = group))
p1 <- p + geom_polygon(color = "gray90", linewidth = 0.05)
p1
The p1 object produces a legible map, but by default it chooses an unordered categorical color scheme.
This is because the pop_dens variable is not ordered: it is an un-ordered discrete variable.
Likewise, pct_black is an un-ordered factor variable. We can inspect its categories with a frequency table:
table(county_full$pct_black)
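factor() sorts its levels alphabetically unless told otherwise, so category labels can end up in a meaningless legend order. Setting the levels explicitly, and marking the factor as ordered, is one way to fix this. Recent ggplot2 versions also give ordered factors a sequential default palette rather than a qualitative one.

The labels below are made-up stand-ins, not the socviz categories.

```r
dens <- c("medium", "low", "high", "low")

levels(factor(dens))   # alphabetical order: high, low, medium

# Explicit, ordered levels: low < medium < high
dens_f <- factor(dens, levels = c("low", "medium", "high"), ordered = TRUE)
levels(dens_f)
```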
p <- ggplot(data = county_full,
mapping = aes(x = long, y = lat, fill = pct_black,
group = group))
p1 <- p + geom_polygon(color = "gray90", linewidth = 0.05) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette="Greens")
p2 + labs(fill = "US Population, Percent Black") +
guides(fill = guide_legend(nrow = 1)) +
theme_map() + theme(legend.position = "bottom")
Let’s draw a new county-level choropleth.
We have a pop_dens6 variable that divides population density into six categories.
We will map the fill color to the value of this variable.
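pop_dens6 comes ready-made in socviz, and exactly how it was binned is not stated here. One common way to build a six-category version of a skewed continuous variable is to cut() it at quantiles. Each bin then holds roughly the same number of observations.

The densities below are simulated, made-up values.

```r
set.seed(1)
dens_toy <- rlnorm(300, meanlog = 4, sdlog = 1.5)  # made-up, right-skewed values

# Seven quantile cut points give six bins of roughly equal size
breaks6 <- quantile(dens_toy, probs = seq(0, 1, length.out = 7))
dens6_toy <- cut(dens_toy, breaks = breaks6,
                 include.lowest = TRUE, ordered_result = TRUE)

nlevels(dens6_toy)
table(dens6_toy)
```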
orange_pal <- RColorBrewer::brewer.pal(n = 6, name = "Oranges")
orange_pal
orange_rev <- rev(orange_pal)
orange_rev
We use the RColorBrewer::brewer.pal() function to create two palettes manually.
The brewer.pal() function produces evenly spaced color schemes, and the rev() function reverses the order of a color vector.
pop_p <- ggplot(data = county_full,
mapping = aes(x = long, y = lat,
fill = pop_dens6,
group = group))
pop_p1 <- pop_p + geom_polygon(color = "gray90", linewidth = 0.05) +
coord_equal()
pop_p2 <- pop_p1 + scale_fill_manual(values = orange_pal)
pop_p2 + labs(title = "Population Density",
fill = "People per square mile") +
theme_map() + theme(legend.position = "bottom")
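If RColorBrewer is not at hand, grDevices::colorRampPalette() gives a similar evenly spaced sequence by interpolating between two endpoint colors. This is a swapped-in technique, not what brewer.pal() does internally. The light and dark orange endpoints are chosen only for illustration.

Passing the reversed vector to scale_fill_manual(values = ...) would give the darkest color to the first category instead of the last.

```r
# colorRampPalette() returns a function that interpolates evenly between colors
ramp <- colorRampPalette(c("#FFF5EB", "#7F2704"))

pal6 <- ramp(6)         # six steps from light to dark
pal6_rev <- rev(pal6)   # same colors, dark to light

pal6
pal6_rev
```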
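Next, consider per_gop_2016, the Republican share of the 2016 presidential vote. Running class(county_full$per_gop_2016) shows that it is numeric, that is, a continuous variable.
For a continuous variable, we can use the scale_fill_gradient(), scale_fill_gradient2(), or scale_fill_gradientn() function: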
scale_fill_gradient() produces a two-color gradient.
scale_fill_gradient2() produces a three-color (diverging) gradient with a specified midpoint.
scale_fill_gradientn() produces an n-color gradient.
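Here is a minimal sketch of scale_fill_gradientn() on five made-up values. It passes the same blue, white, and red used for the parties in this section. Unlike the other two functions, gradientn accepts any number of colors, spaced evenly along the data range by default.

```r
library(ggplot2)

vals <- data.frame(x = 1:5, y = 1, v = c(0, 0.25, 0.5, 0.75, 1))  # made-up values

p_n <- ggplot(vals, aes(x = x, y = y, fill = v)) +
  geom_tile() +
  # three colors here; gradientn would accept four, five, or more
  scale_fill_gradientn(colours = c("#2E74C0", "#FFFFFF", "#CB454A"))

layer_data(p_n)$fill
```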
For scale_fill_gradient2(), choose the value and color for the midpoint carefully.
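scale_fill_gradient2() defaults to midpoint = 0. That suits a signed quantity such as d_points, but not vote shares stored as proportions between 0 and 1.

The sketch below calls scales::rescale_mid() directly on made-up shares. The scales package ships with ggplot2, and this is, as we understand it, the rescaling gradient2 applies. With the default midpoint, every share lands on the "high" half of the scale. With midpoint 0.5, an even split maps to the middle color.

```r
shares <- c(0.2, 0.5, 0.8)  # made-up vote shares, as proportions

scales::rescale_mid(shares, mid = 0)    # all above 0.5: every county looks "high"
scales::rescale_mid(shares, mid = 0.5)  # 0.5 sits exactly at the middle color
```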
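gop_p <- ggplot(data = county_full,
mapping = aes(x = long, y = lat,
fill = per_gop_2016,
group = group))
gop_p1 <- gop_p + geom_polygon(color = "gray90", linewidth = 0.05) +
coord_equal()
gop_p2 <- gop_p1 + scale_fill_gradient2(
low = '#2E74C0', # from party_colors for DEM
mid = '#FFFFFF', # white
high = '#CB454A', # from party_colors for GOP
na.value = "grey50",
midpoint = .5) # a 50% vote share maps to the mid color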
gop_p2 + labs(title = "US Presidential Election 2016",
fill = "Trump vote share") +
theme_map() + theme(legend.position = "bottom")
Sometimes we have geographical data with repeated observations over time.
A common case is to have a country- or state-level measure observed over a period of years (panel data).
Let’s consider the poverty rate by level of educational attainment in NY.
The viridis colors run in low-to-high sequences and combine perceptually uniform colors with easy-to-see, easily contrasted hues along their scales.
The scale_fill_viridis_c() function is for continuous data, and scale_fill_viridis_d() is for discrete data.
# p2 here: a county-level map of the NY poverty data (its construction is not shown)
p2 + facet_wrap(~ year, ncol = 3) +
theme(legend.position = "bottom",
strip.background = element_blank()) +
labs(fill = "Poverty rate in NY (%)",
title = "Poverty rate for the male population 25 years and over \nfor whom the highest educational attainment is bachelor's degree")
facet_wrap() draws one small map per year, so changes over time can be compared side by side.
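The faceting idea can be sketched on hypothetical panel data: the same two square "counties" observed in three years, with made-up rates. Each year's rows carry the full polygon outlines, and facet_wrap(~ year) splits them into panels. The panels share one viridis color scale, so colors are comparable across years.

```r
library(ggplot2)

# Hypothetical panel data: two square shapes, three years, made-up rates
square <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
panel <- do.call(rbind, lapply(c(2019, 2020, 2021), function(yr) {
  rbind(
    transform(square, group = "a", year = yr, rate = 10 + (yr - 2019)),
    transform(square, long = long + 1.2, group = "b", year = yr,
              rate = 20 - (yr - 2019))
  )
}))

p_panel <- ggplot(panel, aes(x = long, y = lat, group = group, fill = rate)) +
  geom_polygon(color = "gray90") +
  coord_equal() +
  scale_fill_viridis_c() +       # one continuous viridis scale for all panels
  facet_wrap(~ year, ncol = 3)   # one panel per year

length(unique(layer_data(p_panel)$PANEL))
```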
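The statebins package offers an alternative to a state-level choropleth: each state is drawn as a square of the same size, so small states are as visible as large ones.
library(statebins) # install.packages("statebins")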
p <- ggplot(election, aes( state = state, fill = pct_trump ) )
p1 <- p + geom_statebins(lbl_size = 5,
border_col = "grey90", border_size = 1)
p2 <- p1 + labs(fill = "Percent Trump") +
coord_equal() +
theme_statebins( legend_position = c(.45, 1) ) +
theme( legend.direction="horizontal" )
p2
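We can also set the fill gradient ourselves with scale_fill_gradient(). Here we map Hillary Clinton’s vote share (pct_clinton) instead:
p_clinton <- ggplot(election, aes(state = state, fill = pct_clinton))
p1_clinton <- p_clinton + geom_statebins(lbl_size = 5,
border_col = "grey90", border_size = 1)
p2 <- p1_clinton + labs(fill = "Percent Clinton") +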
coord_equal() +
theme_statebins( legend_position = c(.45, 1) ) +
theme( legend.direction="horizontal" )
p2
p2 + scale_fill_gradient(
low = '#FFFFFF', # white
high = '#2E74C0', # from party_colors for DEM
na.value = "grey50") # color for missing values
Let’s use scale_fill_manual() to fill color by party.
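Here is a minimal sketch of a manual party fill. To stay self-contained, it uses a made-up grid of bins drawn with geom_tile() as a stand-in for geom_statebins(). The party labels "Democratic" and "Republican" are an assumption about how such a party column is coded. The hex colors are the party colors used elsewhere in this section.

A named values vector pins each party to its color, regardless of factor level order.

```r
library(ggplot2)

# Hypothetical 3 x 2 grid of bins; geom_tile() stands in for geom_statebins()
bins <- data.frame(col = c(1, 2, 3, 1, 2, 3),
                   row = c(1, 1, 1, 2, 2, 2),
                   party = c("Democratic", "Republican", "Republican",
                             "Democratic", "Democratic", "Republican"))

# Named values: each party label maps to a fixed color
party_fill <- c(Democratic = "#2E74C0", Republican = "#CB454A")

p_party <- ggplot(bins, aes(x = col, y = row, fill = party)) +
  geom_tile(color = "grey90", linewidth = 1) +
  coord_equal() +
  scale_fill_manual(values = party_fill)

unique(layer_data(p_party)$fill)
```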
The legend_position argument of theme_statebins() takes a pair of coordinates that sets where the legend is placed.
We can also use scale_fill_gradient() with breaks:
p2 <- p1 + labs(fill = "Percent Trump") +
coord_equal() +
theme_statebins( legend_position = c(.2, 1) ) +
theme( legend.direction="horizontal")
p2 + scale_fill_gradient(breaks = c(5, 21, 41, 48, 57),
labels = c("< 5", "5-21",
"21-41", "41-57", "> 57"),
low = "#f9ecec", high = "#CB454A") +
guides(fill = guide_legend())