Data Visualization - Aesthetic Mappings and Facets
March 5, 2024
<-
Mac:
command + ⬆️/⬇️/⬅️/➡️
shift + ⬆️/⬇️/⬅️/➡️
command + shift + ⬆️/⬇️/⬅️/➡️
command + PgUp/PgDn
shift + PgUp/PgDn
command + shift + PgUp/PgDn
Windows:
Ctrl + ⬆️/⬇️/⬅️/➡️
Shift + ⬆️/⬇️/⬅️/➡️
Ctrl + Shift + ⬆️/⬇️/⬅️/➡️
Ctrl + PgUp/PgDn
Shift + PgUp/PgDn
Ctrl + Shift + PgUp/PgDn
An aesthetic is a visual property (e.g., size, shape, color) of the objects (e.g., class) in our plot.
We can display a point in different ways by changing the values of its aesthetic properties.
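As a minimal sketch of the idea above, the following maps the color aesthetic to a variable and, separately, sets aesthetics by hand. It assumes the mpg data.frame that ships with ggplot2 and its displ, hwy, and class variables; the object names p_color and p_manual are hypothetical.

```r
library(ggplot2)

# Map the color aesthetic to the class variable:
# each car class gets its own point color and a legend is added.
p_color <- ggplot(data = mpg,
                  mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Set aesthetics manually: values go outside aes(),
# so every point is blue and 30% opaque regardless of the data.
p_manual <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue", alpha = 0.3)

p_color
p_manual
```

Anything placed inside aes() is read as a variable to map; anything outside is applied as a fixed value.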
Aesthetic properties such as alpha can also be set manually.
ggplot Basics
A plot is specified with a data.frame, a geom function, and a collection of mappings such as x = VAR_1 and y = VAR_2.
Use as.factor(variable) to make a variable categorical.
Use as.numeric(variable) to make a variable continuous.
For data visualization, integer-type variables can be treated as either categorical or continuous, depending on the context of the analysis.
If the values of an integer-type variable represent an intensity or an order, the variable can be treated as continuous.
If not, the integer variable is categorical.
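The difference between the two treatments can be sketched with the integer variable cyl in ggplot2's mpg data.frame. The choice of cyl and the object names are assumptions made for illustration.

```r
library(ggplot2)

# cyl (number of cylinders) is stored as an integer
class(mpg$cyl)

# Treated as continuous: ggplot2 uses a color gradient
p_continuous <- ggplot(mpg, aes(x = displ, y = hwy, color = cyl)) +
  geom_point()

# Treated as categorical: one distinct color per cylinder count
p_categorical <- ggplot(mpg, aes(x = displ, y = hwy, color = as.factor(cyl))) +
  geom_point()

p_continuous
p_categorical
```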
facet_wrap( VAR ~ . )
To facet our plot by a single variable, use facet_wrap().
The first argument of facet_wrap() is a formula, VAR ~ . .
nrow (ncol) determines the number of rows (columns) to use when laying out the facets.
facet_grid( VAR_ROW ~ VAR_COL )
To facet our plot on the combination of two variables, add facet_grid( VAR_ROW ~ VAR_COL ) to our plot call.
The first argument of facet_grid() is also a formula: two variable names separated by a ~.
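A short sketch of both faceting functions, assuming ggplot2's mpg data.frame. The faceting variables class, drv, and cyl are chosen only for illustration. The last plot also passes scales = "free_y", which lets each panel use its own y-axis range.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One panel per value of class, laid out in 2 rows
p_wrap <- base_plot + facet_wrap(class ~ ., nrow = 2)

# Rows defined by drv, columns defined by cyl
p_grid <- base_plot + facet_grid(drv ~ cyl)

# Same panels as p_wrap, but each panel gets its own y-axis range
p_free <- base_plot + facet_wrap(class ~ ., scales = "free_y")

p_wrap
p_grid
p_free
```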
scales in Faceting
The scales argument in facet_*() sets whether scales are fixed ("fixed", the default), free in one dimension ("free_x", "free_y"), or free in both dimensions ("free").
How are these two plots similar?
geom_*() is the geometrical object that a plot uses to represent data.
Bar charts use geom_bar() or geom_col();
histograms and frequency polygons use geom_histogram() or geom_freqpoly();
line charts use geom_line();
boxplots use geom_boxplot();
scatterplots use geom_point();
smoothed fitted lines use geom_smooth().
We can use different geom_*() to plot the same data.
From the plots with two or more variables, we want to see co-variation, the tendency for the values of two or more variables to vary together in a related way.
What type of co-variation occurs between variables?
geom_boxplot() is used to create box plots (also known as box-and-whisker plots).
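For instance, a box plot of a continuous variable across the levels of a categorical variable might look like the sketch below. It assumes ggplot2's mpg data.frame, with hwy and class picked as illustrative variables.

```r
library(ggplot2)

# One box per car class, summarizing the distribution of highway mileage
p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

p_box

# The line in the middle of each box is the group median
tapply(mpg$hwy, mpg$class, median)
```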
The diamonds data.frame comes with ggplot2 and contains information about ~54,000 diamonds, including the cut variable.
geom_bar() transforms the data.frame before plotting: it counts the number of observations in each category.
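The counting that geom_bar() performs can be checked by hand. The sketch below compares it with counts computed explicitly and plotted with geom_col(); the names cut_counts and n are hypothetical.

```r
library(ggplot2)

# geom_bar() counts the rows in each cut category for us
p_bar <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

# The same counts computed explicitly
cut_counts <- as.data.frame(table(cut = diamonds$cut), responseName = "n")
cut_counts

# geom_col() plots the precomputed counts as they are
p_col <- ggplot(cut_counts, aes(x = cut, y = n)) +
  geom_col()

p_bar
p_col
```

Both plots should show the same bars; the difference is only in who does the counting.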
A time trend plot (also known as a time series plot) is used to visualize trends, patterns, and fluctuations in a variable over a specific time period.
We can check the overall direction in which the time-series variable is moving: upwards, downwards, or staying relatively constant over time.
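A minimal sketch of a time trend plot follows. It uses the economics data.frame that ships with ggplot2 as an assumed stand-in dataset, with date on the x-axis and unemploy (number of unemployed, in thousands) on the y-axis.

```r
library(ggplot2)

# date is a Date column, so ggplot2 picks a date scale for the x-axis
p_trend <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

p_trend
```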
The nvda data.frame includes NVIDIA’s stock information from 2019-01-02 to 2024-03-04.
ggplot()
Step 1. Figure out whether the variables of interest are categorical or continuous.
Step 2. Think about which geometric objects, aesthetic mappings, and faceting are appropriate to visualize distributions and relationships.
Step 3. If needed, transform the given data.frame (e.g., filtered observations, new variables, summarized data) and try new visualizations.
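The three steps can be sketched on ggplot2's mpg data.frame. The variables class and hwy, the summary mean_hwy, and the use of dplyr for the transformation are all assumptions made for illustration.

```r
library(ggplot2)
library(dplyr)

# Step 1: check the variable types (class is character, hwy is integer)
class(mpg$class)
class(mpg$hwy)

# Step 2: a categorical x and a continuous y suggest a box plot
p_step2 <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Step 3: summarize the data, then try a new visualization
hwy_by_class <- mpg %>%
  group_by(class) %>%
  summarize(mean_hwy = mean(hwy))

p_step3 <- ggplot(hwy_by_class, aes(x = class, y = mean_hwy)) +
  geom_col()

p_step2
p_step3
```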
(geom_bar() and more)
(geom_histogram() and more)
(geom_bar() and more)
(geom_point() with geom_smooth() and more)
(geom_boxplot() and more)
(geom_bar() and more)
(geom_line() and more)
Every geom function in ggplot2 takes a mapping argument.
However, not every aesthetic works with every geom.
We could set the shape of a point, but we could not set the shape of a line;
on the other hand, we could set the linetype of a line.
geom_smooth()
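A sketch of matching aesthetics to geoms: shape is mapped for points and linetype for smooth lines. It assumes ggplot2's mpg data.frame, with drv as the illustrative categorical variable.

```r
library(ggplot2)

# shape is meaningful for points: one point shape per drive type
p_shape <- ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_point()

# linetype is meaningful for lines: one line type per drive type
p_linetype <- ggplot(mpg, aes(x = displ, y = hwy, linetype = drv)) +
  geom_smooth()

p_shape
p_linetype
```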
geom_smooth(method = lm)
Specifying method = lm manually in geom_smooth() gives a straight line fitted to the data points.
geom_smooth(group = CATEGORICAL_VAR)
We can set the group aesthetic to a categorical variable to draw multiple objects.
ggplot2 will draw a separate object for each unique value of the grouping variable.
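Both options can be sketched as follows, assuming ggplot2's mpg data.frame with drv as the grouping variable. In runnable code the group aesthetic is written inside aes(), since it is a mapping to a variable.

```r
library(ggplot2)

# A straight line from a linear model instead of the default smoother
p_lm <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = lm)

# One smooth line per drive type; group adds no legend or color
p_group <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(group = drv))

p_lm
p_group
```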
ggplot2 will automatically group the data for these geoms whenever we map an aesthetic to a categorical variable (as in the linetype example).
To display multiple geoms in the same plot, add multiple geom_*() functions to ggplot().
Using geom_point(), geom_smooth(), and geom_smooth(method = lm) together is an excellent option to visualize the relationship between the two variables.
If we place mappings in a geom function, ggplot2 will treat them as local mappings for the layer.
We can use the same idea to specify different data for each layer.
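A sketch of layering several geoms with one local mapping, assuming ggplot2's mpg data.frame. The red color for the linear fit is an arbitrary choice made to tell the two lines apart.

```r
library(ggplot2)

p_layers <- ggplot(mpg, aes(x = displ, y = hwy)) +  # global mappings
  geom_point(aes(color = class)) +                  # local mapping: points only
  geom_smooth() +                                   # default smoother, all cars
  geom_smooth(method = lm, color = "red")           # straight-line fit, all cars

p_layers
```

Because color = class is local to geom_point(), the two smooth layers are not split by class and each draws a single line.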
Here, our smooth line displays just a subset of the mpg data.frame, the subcompact cars.
filter() is the tidyverse way to filter observations in a data.frame.
The local data argument in geom_smooth() overrides the global data argument in ggplot() for that layer only.
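The layer-specific data described above can be sketched like this, assuming dplyr is available for filter(). The object name subcompact_cars is hypothetical.

```r
library(ggplot2)
library(dplyr)

# Keep only the subcompact cars
subcompact_cars <- filter(mpg, class == "subcompact")

p_local_data <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +                # uses the global data: all cars
  geom_smooth(data = subcompact_cars, se = FALSE) # uses the local data only

p_local_data
```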
The standard error (se) tells us how much the predicted values from a model might differ from the actual values we’re trying to predict.
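In geom_smooth(), the se argument controls whether the confidence band around the fitted line is drawn. A minimal sketch, assuming ggplot2's mpg data.frame and a linear fit:

```r
library(ggplot2)

# se = TRUE (the default) draws a shaded confidence band around the line
p_with_band <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = lm, se = TRUE)

# se = FALSE draws the fitted line only
p_without_band <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE)

p_with_band
p_without_band
```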