Draw Maps
March 24, 2025
Map data, also known as geospatial data, is important for several reasons:
Analysis: Geospatial data can highlight patterns and relationships across spatial units in the data that may not be apparent in other forms of data.
Communication: Maps can be visually striking, especially when the spatial units of the map are familiar entities, like counties in the US.
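The plots in these notes draw from a data frame called us_states, which is not constructed in the text. A minimal sketch of one way to obtain such a data frame, assuming it comes from ggplot2's map_data() function (which needs the maps package to be installed):

```r
library(ggplot2)  # map_data() also needs the 'maps' package installed

# Assumption: us_states is the state-level outline data from the maps package.
us_states <- map_data("state")

# One row per polygon vertex; 'group' keeps each polygon's points together,
# and 'region' holds the lower-case state name.
head(us_states)
```

Each row is a point on a state boundary, which is why the plots map long and lat to x and y and set group = group.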
The socviz::election dataset has various measures of the vote and vote shares by state.
geom_polygon() can be used to visualize map data.
p <- ggplot(data = us_states,
mapping = aes(x = long, y = lat,
group = group))
p + geom_polygon(fill = "white", color = "black")
Let's fill the map.
p <- ggplot(data = us_states,
mapping = aes(x = long, y = lat,
group = group, fill = region))
p + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
guides(fill = "none")
We can transform the default projection used by geom_polygon(), via the coord_map() function.
The Albers projection takes two latitude parameters, lat0 and lat1.
Merge the election data on to the map
In the map data, us_states, the state names (in a variable named region) are in lower case.
Here we can create a variable in the election data frame to correspond to this, using the tolower() function to convert the state names.
It is important to know your data and variables well enough to check that they have merged properly.
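The merge-and-check step can be sketched on toy data. The two small data frames below are made-up stand-ins for election and us_states (the values are hypothetical), and using is.na() and anti_join() as checks is one reasonable choice rather than the only one:

```r
library(dplyr)

# Toy stand-ins with made-up values, not the real election or map data
election_toy <- tibble(
  state = c("New York", "Texas", "District of Columbia"),
  pct_trump = c(10, 20, 30)
)
map_toy <- tibble(
  region = c("new york", "new york", "texas", "district of columbia"),
  long = c(-75, -74, -99, -77),
  lat = c(43, 41, 31, 38.9)
)

# Create a lower-case key that matches the map data's region column
election_toy <- election_toy |> mutate(region = tolower(state))

# Keep every map row and attach the election values to it
merged <- left_join(map_toy, election_toy, by = "region")

# Check 1: count map rows that found no election values
sum(is.na(merged$pct_trump))

# Check 2: list election rows that matched no map region
anti_join(election_toy, map_toy, by = "region")
```

If either check reports rows, the keys differ somewhere (case, spelling, or extra whitespace) and the map will show gaps.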
p0 <- ggplot(data = us_states_elec,
aes(x = long, y = lat,
group = group, fill = party))
p0 + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
Use the party colors for the fill.
For the fill aesthetic, let’s try a continuous measure, such as the percentage of the vote received by Donald Trump (pct_trump).
Blue is not the color we want here.
The color gradient runs in the wrong direction.
Let’s fix these problems using scale_fill_gradient():
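A minimal sketch of such a fix, using two made-up squares in place of the state polygons so that it runs on its own. The choice of low = "white" is an assumption; the high color is the GOP red used elsewhere in these notes:

```r
library(ggplot2)

# Two unit squares standing in for states; the values are made up
toy <- data.frame(
  long  = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("a", "b"), each = 4),
  pct_trump = rep(c(30, 60), each = 4)
)

p <- ggplot(toy, aes(x = long, y = lat, group = group, fill = pct_trump)) +
  geom_polygon(color = "gray90") +
  coord_equal() +
  # Pale for low values, red for high values, so darker means a larger share
  scale_fill_gradient(low = "white", high = "#CB454A") +
  labs(fill = "Percent")
p
```

Setting low and high explicitly addresses both problems at once: the hue changes from the default blue, and the larger value gets the darker color.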
The scale_*_gradient2() function gives us a blue-red spectrum that passes through white by default.
We can also set the high and low colors.
p0 <- ggplot(data = us_states_elec,
mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")
With the scale_*_gradient2() function, we can also re-specify the mid-level color along with the high and low colors.
Looking at p3, you’ll see that the scale extends very high on the Blue side.
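# Omit DC from the data (see the caption)
p0 <- ggplot(data = filter(us_states_elec,
region != "district of columbia"),
aes(x = long, y = lat, group = group, fill = d_points))
# Rebuild p1 so that it uses the filtered data rather than the earlier p0
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p3 <- p1 + scale_fill_gradient2(low = "red",
mid = scales::muted("purple"),
high = "blue",
breaks = c(-25, 0, 25, 50, 75))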
p3 + theme_map() + labs(fill = "Percent", title = "Winning margins", caption = "DC is omitted.")
Choropleth maps display divided geographical areas or regions that are colored, shaded or patterned in relation to a data variable.
County-level US choropleth maps can be aesthetically pleasing, because of the added detail they bring to a national map.
The county-level datasets (county_map and county_data) are included in the socviz package.
The county map data frame, county_map, has been processed a little in order to transform it to an Albers projection, and also to relocate (and re-scale) Alaska and Hawaii.
county_map
county_data |>
select(id, name, state, pop_dens, pct_black) |>
sample_n(5)
county_full <-
left_join(county_map, county_data, by = "id")
The id field is the FIPS code for the county.
pop_dens is population density.
pct_black is the percentage of the population that is African-American.
We merge the data frames using the shared FIPS id column.
p <- ggplot(data = county_full,
mapping = aes(x = long, y = lat,
fill = pop_dens,
group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05)
p1
The p1 object produces a legible map, but by default it chooses an unordered categorical layout.
The pop_dens variable is not ordered.
pop_dens is an un-ordered discrete variable.
pct_black is an un-ordered factor variable.
table(county_full$pct_black)
p <- ggplot(data = county_full,
mapping = aes(x = long, y = lat, fill = pct_black,
group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette="Greens")
p2 + labs(fill = "US Population, Percent Black") +
guides(fill = guide_legend(nrow = 1)) +
theme_map() + theme(legend.position = "bottom")
Let’s draw a new county-level choropleth.
We have a pop_dens6 variable that divides the population density into six categories.
We will map the color scale to the value of the variable.
orange_pal <- RColorBrewer::brewer.pal(n = 6, name = "Oranges")
orange_pal
orange_rev <- rev(orange_pal)
orange_rev
Use the RColorBrewer::brewer.pal() function to manually create two palettes.
The brewer.pal() function produces evenly-spaced color schemes.
Use the rev() function to reverse the order of a color vector.
pop_p <- ggplot(data = county_full,
mapping = aes(x = long, y = lat,
fill = pop_dens6,
group = group))
pop_p1 <- pop_p + geom_polygon(color = "gray90", size = 0.05) +
coord_equal()
pop_p2 <- pop_p1 + scale_fill_manual(values = orange_pal)
pop_p2 + labs(title = "Population Density",
fill = "People per square mile") +
theme_map() + theme(legend.position = "bottom")
The per_gop_2016 variable is continuous; check it with class(county_full$per_gop_2016).
For a continuous variable, we can use the scale_fill_gradient(), scale_fill_gradient2(), or scale_fill_gradientn() function:
scale_fill_gradient() produces a two-color gradient.
scale_fill_gradient2() produces a three-color gradient with a specified midpoint.
scale_fill_gradientn() produces an n-color gradient.
For scale_fill_gradient2(), choose the value and color for midpoint carefully.
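To see why the midpoint matters, here is a minimal sketch with three made-up squares whose values are shares on a 0-1 scale. The toy data and object names are hypothetical; the colors are the party colors used in these notes:

```r
library(ggplot2)

# Three unit squares with made-up vote shares on a 0-1 scale
toy <- data.frame(
  long  = c(0, 1, 1, 0,  1, 2, 2, 1,  2, 3, 3, 2),
  lat   = rep(c(0, 0, 1, 1), times = 3),
  group = rep(c("a", "b", "c"), each = 4),
  share = rep(c(0.2, 0.5, 0.8), each = 4)
)

base <- ggplot(toy, aes(x = long, y = lat, group = group, fill = share)) +
  geom_polygon(color = "gray90") +
  coord_equal()

# Default midpoint = 0: every share lies above it, so all three squares
# fall on the mid-to-high half of the scale
p_default <- base +
  scale_fill_gradient2(low = "#2E74C0", mid = "white", high = "#CB454A")

# midpoint = 0.5: an even split sits at the mid color
p_centered <- base +
  scale_fill_gradient2(low = "#2E74C0", mid = "white", high = "#CB454A",
                       midpoint = 0.5)

p_default
p_centered
```

Because the default midpoint is 0, a share variable that runs from 0 to 1 needs midpoint = 0.5 for the diverging colors to split at an even vote.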
gop_p2 <- gop_p1 + scale_fill_gradient2(
low = '#2E74C0', # from party_colors for DEM
mid = '#FFFFFF', # white
high = '#CB454A', # from party_colors for GOP
na.value = "grey50",
midpoint = .5)
gop_p2 + labs(title = "US Presidential Election 2016",
fill = "Trump vote share") +
theme_map() + theme(legend.position = "bottom")
Sometimes we have geographical data with repeated observations over time.
A common case is to have a country- or state-level measure observed over a period of years (panel data).
Let’s consider the poverty rate determined by level of educational attainment in NY.
The viridis colors run in low-to-high sequences and combine perceptually uniform colors with easy-to-see, easily contrasted hues along their scales.
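As a rough sketch of the idea, the example below builds a made-up panel (two square "regions" observed in three years, with hypothetical rates) and draws one small map per year with a continuous viridis fill. None of the names or values come from the NY poverty data:

```r
library(ggplot2)

# Two unit squares standing in for regions
squares <- data.frame(
  long   = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat    = rep(c(0, 0, 1, 1), times = 2),
  region = rep(c("a", "b"), each = 4)
)

# Repeat the polygons once per year, keeping the vertex order intact
years <- 2015:2017
panel <- squares[rep(seq_len(nrow(squares)), times = length(years)), ]
panel$year <- rep(years, each = nrow(squares))

# Hypothetical rates, looked up by region and year
rate_lookup <- c("a.2015" = 4, "b.2015" = 6,
                 "a.2016" = 5, "b.2016" = 7,
                 "a.2017" = 6, "b.2017" = 9)
panel$rate <- unname(rate_lookup[paste(panel$region, panel$year, sep = ".")])

p <- ggplot(panel, aes(x = long, y = lat, group = region, fill = rate)) +
  geom_polygon(color = "gray90") +
  coord_equal() +
  scale_fill_viridis_c() +      # rate is continuous
  facet_wrap(~ year, ncol = 3)  # one small map per year
p
```

The polygons are repeated for each year so that every facet has a complete map, and all facets share a single fill scale.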
The scale_fill_viridis_c() function is for continuous data.
The scale_fill_viridis_d() function is for discrete data.
p2 + facet_wrap(~ year, ncol = 3) +
theme(legend.position = "bottom",
strip.background = element_blank()) +
labs(fill = "Poverty rate in NY (%)",
title = "Poverty rate for the male population 25 years and over \nfor whom the highest educational attainment is bachelor's degree")
The maps are faceted by year using facet_wrap().
An alternative to state-level choropleths is statebins.
library(statebins) # install.packages("statebins")
p <- ggplot(election, aes( state = state, fill = pct_trump ) )
p1 <- p + geom_statebins(lbl_size = 5,
border_col = "grey90", border_size = 1)
p2 <- p1 + labs(fill = "Percent Trump") +
coord_equal() +
theme_statebins( legend_position = c(.45, 1) ) +
theme( legend.direction="horizontal" )
p2
Adjust the fill colors with scale_fill_gradient().
p2 <- p1 + labs(fill = "Percent Clinton") +
coord_equal() +
theme_statebins( legend_position = c(.45, 1) ) +
theme( legend.direction="horizontal" )
p2
p2 + scale_fill_gradient(
low = '#FFFFFF', # white
high = '#2E74C0', # from party_colors for DEM
na.value = "grey50") # color for missing values
Let’s use scale_fill_manual() to fill color by party.
The legend_position argument sets the coordinates of the legend.
Use scale_fill_gradient() with breaks.
p2 <- p1 + labs(fill = "Percent Trump") +
coord_equal() +
theme_statebins( legend_position = c(.2, 1) ) +
theme( legend.direction="horizontal")
p2 + scale_fill_gradient(breaks = c(5, 21, 41, 48, 57),
labels = c("< 5", "5-21",
"21-41", "41-58", "> 57"),
low = "#f9ecec", high = "#CB454A") +
guides(fill = guide_legend())