Data Visualization - Geometric Objects
March 7, 2024
ggplot Basics
A graphic should display as much information as it can, with the lowest possible cognitive strain on the viewer.
Strive for clarity.
Visualization is an iterative process.
geom_*() is the geometric object that a plot uses to represent data.
Bar charts use geom_bar() or geom_col();
distributions of a continuous variable use geom_histogram() or geom_freqpoly();
line charts use geom_line();
box plots use geom_boxplot();
scatterplots use geom_point();
fitted curves use geom_smooth().
We can use different geom_*() functions to plot the same data.
From plots with two or more variables, we want to see co-variation, the tendency for the values of two or more variables to vary together in a related way.
What type of co-variation occurs between variables?
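As a minimal sketch of plotting the same data with different geoms, the example below draws the mpg data.frame (bundled with ggplot2) twice. The choice of displ and hwy as the two variables is an assumption made for illustration.

```r
library(ggplot2)

# Scatterplot: one point per car, showing how hwy varies with displ.
p_point <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Same data and mappings, different geom: a smooth curve fitted to the points.
p_smooth <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth()
```

Printing p_point or p_smooth renders the plot. Both suggest a negative co-variation: cars with larger engines tend to have lower highway mileage.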
geom_boxplot()
is used to create box plots (also known as box-and-whisker plots).
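A short sketch of a box plot, assuming the mpg data.frame bundled with ggplot2 and an illustrative pairing of one categorical variable (class) with one continuous variable (hwy):

```r
library(ggplot2)

# One box per vehicle class. The box spans the first to third quartile of hwy,
# and the line inside the box marks the median.
p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()
```

Printing p_box renders the plot.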
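The diamonds data.frame comes with ggplot2 and contains information about ~54,000 diamonds, including the cut variable.
geom_bar() transforms the data.frame before plotting: it counts the number of observations at each level of cut and plots those counts.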
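The transformation can be made visible by doing the counting by hand. The sketch below assumes dplyr is installed; the object names (p_bar, cut_counts, p_col) are illustrative.

```r
library(ggplot2)
library(dplyr)

# geom_bar() counts the rows at each level of cut before drawing
# (its default stat is "count").
p_bar <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

# The same transformation done explicitly: count() returns one row per cut
# level with the number of diamonds in column n.
cut_counts <- count(diamonds, cut)

# geom_col() draws bar heights taken directly from the data, so it reproduces
# the geom_bar() plot from the precomputed counts.
p_col <- ggplot(cut_counts, aes(x = cut, y = n)) +
  geom_col()
```

Both plots should look the same; the only difference is where the counting happens.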
A time trend plot (also known as a time series plot) is used to visualize trends, patterns, and fluctuations in a variable over a specific time period.
We can check the overall direction in which the time-series variable is moving: upwards, downwards, or staying relatively constant over time.
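A minimal sketch of a time trend plot. It uses the economics data.frame bundled with ggplot2 as a stand-in dataset, which is an assumption for illustration; any data.frame with a date column and a numeric column would work the same way.

```r
library(ggplot2)

# economics contains monthly US data; date is a Date column and unemploy is
# the number of unemployed persons (in thousands).
p_trend <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
```

Printing p_trend draws one connected line ordered by date, which makes the overall direction of the series easy to read.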
The nvda data.frame includes NVIDIA’s stock information from 2019-01-02 to 2024-03-04.
ggplot()
Step 1. Figure out whether the variables of interest are categorical or continuous.
Step 2. Decide which geometric objects, aesthetic mappings, and faceting are appropriate for visualizing distributions and relationships.
Step 3. If needed, transform the given data.frame (e.g., filter observations, create new variables, summarize data) and try new visualizations.
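The three steps can be sketched as follows, assuming the mpg data.frame bundled with ggplot2 and dplyr for the transformation. The specific variables and the per-class mean are illustrative choices, not part of the original steps.

```r
library(ggplot2)
library(dplyr)

# Step 1: check variable types. Character or factor columns are categorical;
# integer or double columns are treated here as continuous.
var_types <- sapply(mpg, class)

# Step 2: hwy is continuous and drv is categorical, so try a histogram of hwy
# with one facet per drive train.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~drv)

# Step 3: transform the data.frame (here, mean hwy per class) and try a new
# visualization on the summarized data.
hwy_by_class <- mpg %>%
  group_by(class) %>%
  summarize(mean_hwy = mean(hwy))

p_mean <- ggplot(hwy_by_class, aes(x = class, y = mean_hwy)) +
  geom_col()
```

In practice the steps repeat: each plot tends to raise a new question that calls for another transformation and another plot.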
geom_bar() (and more)
geom_histogram() (and more)
geom_bar() (and more)
geom_point() with geom_smooth() (and more)
geom_boxplot() (and more)
geom_bar() (and more)
geom_line() (and more)
Every geom function in ggplot2 takes a mapping argument.
However, not every aesthetic works with every geom.
We could set the shape of a point, but we could not set the shape of a line; on the other hand, we could set the linetype of a line.
geom_smooth()
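A minimal sketch of the shape versus linetype distinction, assuming the mpg data.frame bundled with ggplot2 and its drv variable (drive train) as the categorical variable:

```r
library(ggplot2)

# shape is an aesthetic of points: each drv value gets its own symbol.
p_shape <- ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_point()

# linetype is an aesthetic of lines: geom_smooth() draws one curve per drv
# value, each with its own line pattern.
p_linetype <- ggplot(mpg, aes(x = displ, y = hwy, linetype = drv)) +
  geom_smooth()
```

Printing p_shape and p_linetype renders the two plots.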
geom_smooth(method = lm)
Specifying method = lm manually in geom_smooth() gives a straight line fitted to the data points.
geom_smooth(aes(group = CATEGORICAL_VAR))
We can set the group aesthetic to a categorical variable to draw multiple objects.
ggplot2 will draw a separate object for each unique value of the grouping variable.
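Both options can be sketched with the mpg data.frame bundled with ggplot2. Note that group is an aesthetic, so it is mapped inside aes(); the use of drv as the grouping variable is an assumption for illustration.

```r
library(ggplot2)

# method = lm replaces the default smoother with a fitted straight line.
p_lm <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(method = lm)

# Mapping group to drv draws one smooth curve per drv value.
# Unlike color or linetype, group does not add a legend or change the styling.
p_group <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(group = drv))
```

Printing p_lm and p_group renders the two plots.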
ggplot2 will automatically group the data for these geoms whenever we map an aesthetic to a categorical variable (as in the linetype example).
To display multiple geoms in the same plot, add multiple geom_*() functions to ggplot().
Using geom_point(), geom_smooth(), and geom_smooth(method = lm) together is an excellent option for visualizing the relationship between two variables.
If we place mappings in a geom function, ggplot2 will treat them as local mappings for the layer.
We can use the same idea to specify different data for each layer.
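A sketch of a multi-layer plot with global and local mappings, assuming the mpg data.frame bundled with ggplot2. The color choices are illustrative.

```r
library(ggplot2)

# Mappings in ggplot() are global: every layer inherits x and y.
p_layers <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Local mapping: color applies to the points only.
  geom_point(aes(color = class)) +
  # Default smoother, fitted to all cars because color is not inherited here.
  geom_smooth() +
  # Straight-line fit, with color set to a constant rather than mapped.
  geom_smooth(method = lm, color = "red")
```

Because color is mapped only inside geom_point(), the two smooth layers each draw a single curve for the whole dataset instead of one per class.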
For example, a smooth line can display just a subset of the mpg data.frame, the subcompact cars.
filter() is the tidyverse way to filter observations in a data.frame.
The local data argument in geom_smooth() overrides the global data argument in ggplot() for that layer only.
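A minimal sketch of a layer with its own data, assuming dplyr is installed for filter(). The object names are illustrative.

```r
library(ggplot2)
library(dplyr)

# Keep only the subcompact cars.
subcompact <- filter(mpg, class == "subcompact")

p_subset <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Points use the global data: all cars in mpg.
  geom_point(aes(color = class)) +
  # The smooth layer uses its local data: subcompact cars only.
  geom_smooth(data = subcompact)
```

The x and y mappings are still inherited from ggplot(); only the data is swapped for the smooth layer.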
The standard error (se) tells us how much the predicted values from a model might differ from the actual values we’re trying to predict.
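In geom_smooth(), the se argument controls whether this uncertainty is drawn as a shaded band around the fitted line. A short sketch, assuming the mpg data.frame bundled with ggplot2:

```r
library(ggplot2)

# se = TRUE is the default: the shaded band is a confidence interval around
# the fitted line (95% by default, controlled by the level argument).
p_band <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(method = lm)

# se = FALSE draws the fitted line without the band.
p_noband <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(method = lm, se = FALSE)
```

A wider band indicates more uncertainty about the fitted values in that region of the plot.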