Data Visualization with ggplot: Relationship
October 29, 2024
The mpg data frame, provided by ggplot2, contains observations collected by the US Environmental Protection Agency on 38 models of car.
Q. Do cars with big engines use more fuel than cars with small engines?
displ: a car's engine size, in liters.
hwy: a car's fuel efficiency on the highway, in miles per gallon (mpg).
What does the relationship between engine size and fuel efficiency look like?
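Before plotting, it can help to look at the data itself. The following is a minimal sketch, assuming the ggplot2 package is installed; the variable names `n_obs` and `n_models` are introduced here only for illustration.

```r
library(ggplot2)  # provides the mpg data frame

# First few rows of the columns used in this section
head(mpg[, c("manufacturer", "model", "displ", "hwy")])

# Number of observations and number of distinct car models
n_obs <- nrow(mpg)
n_models <- length(unique(mpg$model))
n_obs
n_models
```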
To plot mpg, put displ on the x-axis and hwy on the y-axis.
A ggplot graphic is a mapping of variables in data to aesthetic attributes of geometric objects.
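A minimal sketch of such a plot, assuming ggplot2 is installed. The object name `p` is an illustrative choice, and this is one common way to write the call rather than necessarily the exact code shown on the original slide.

```r
library(ggplot2)

# data: mpg; geom: points; aes: displ on the x-axis, hwy on the y-axis
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

p  # printing the object draws the scatterplot
```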
Three Essential Components in ggplot() Graphics:
data: data.frame containing the variables of interest.
geom_*(): geometric object in the plot (e.g., point, line, bar, histogram, boxplot).
aes(): aesthetic attributes of the geometric object (e.g., x-axis, y-axis, color, shape, size, fill) mapped to variables in the data.frame.
ggplot():
data = mpg
geom_point()
aes(x = displ, y = hwy)
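One way to see the three components is to build the plot and inspect the resulting object. This is only a sketch: it relies on the internal structure of a ggplot object (`$data`, `$layers`), which is an assumption that may not hold across all ggplot2 versions, and `point_layer` is a name chosen here for illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

point_layer <- p$layers[[1]]   # the single geom_point() layer

dim(p$data)                    # data: the mpg data frame
class(point_layer$geom)[1]     # geom: the point geom
names(point_layer$mapping)     # aes: which aesthetics are mapped
```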
ggplot(): geom_point()
ggplot(): geom_smooth()
ggplot(): geom_point() with geom_smooth()
geom_smooth() draws a smooth curve fitted to the data.
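Layering the two geoms can be sketched as follows, assuming ggplot2 is installed. The mapping is repeated in each geom to stay close to the style used in this section; ggplot2 will print a message naming the smoothing method it picked.

```r
library(ggplot2)

# Both layers use the same variables, so aes() is repeated in each geom
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

p  # points with a smooth curve and its grey ribbon on top
```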
ggplot() workflow
One common problem when creating ggplot2 graphics is to put the + in the wrong place.
Put the + at the end of the previous line, NOT at the beginning of the next line.
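The placement rule can be illustrated with a short sketch, assuming ggplot2 is installed. The incorrect form is left commented out so the block still runs.

```r
library(ggplot2)

# Correct: each line that continues the plot ends with +
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Incorrect (commented out): the first line is already a complete
# expression, so R evaluates it on its own, and the second line,
# which starts with +, then fails with an error.
# ggplot(data = mpg)
#   + geom_point(mapping = aes(x = displ, y = hwy))
```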
ggplot(): geom_smooth()
Using regression, one of the machine learning methods, geom_smooth() visualizes the predicted value of the y variable for a given value of the x variable.
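To make "predicted value of y for a given x" concrete, the sketch below fits a local regression directly and asks for one prediction. It assumes ggplot2 is installed (for the mpg data) and uses `loess()` from base R, the method geom_smooth() chooses by default for a data set of this size. The 3-liter engine is a hypothetical input chosen for illustration.

```r
library(ggplot2)  # for the mpg data

# Fit a local regression of hwy on displ
fit <- loess(hwy ~ displ, data = mpg)

# Predicted highway mpg for a hypothetical car with a 3-liter engine
pred <- predict(fit, newdata = data.frame(displ = 3))
pred
```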
What Does the Grey Ribbon Represent?
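By default, geom_smooth() draws a 95% confidence interval around the fitted curve; at that confidence level, the relationship between the x and y variables falls within the grey ribbon.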
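The ribbon is controlled by the `se` and `level` arguments of geom_smooth(). The sketch below, assuming ggplot2 is installed, compares the ribbon width at two confidence levels; the object names (`m`, `p95`, `p99`, `w95`, `w99`) are illustrative.

```r
library(ggplot2)

m <- aes(x = displ, y = hwy)  # shared mapping, defined once for brevity

p95 <- ggplot(data = mpg) + geom_smooth(mapping = m)                # default level = 0.95
p99 <- ggplot(data = mpg) + geom_smooth(mapping = m, level = 0.99)  # wider ribbon
p_curve_only <- ggplot(data = mpg) + geom_smooth(mapping = m, se = FALSE)  # no ribbon

# Ribbon width (ymax - ymin) along the curve for each confidence level
w95 <- with(layer_data(p95, 1), ymax - ymin)
w99 <- with(layer_data(p99, 1), ymax - ymin)
summary(w99 - w95)  # positive differences: the 99% ribbon is wider
```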
ggplot(): geom_point() with geom_smooth(method = "lm")
method = "lm" specifies that a linear model (lm), also called a linear regression model, is fitted to the data.
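A sketch of the straight-line fit, assuming ggplot2 is installed. The second part fits the same model with base R's `lm()` so the intercept and slope behind the line can be read off; `fit` is an illustrative name.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy), method = "lm")
p

# The same straight line, fitted directly with lm()
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope; a negative slope means bigger engines, lower hwy
```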
mpg data.frame?
Many points overlap each other.
When points overlap, it’s hard to know how many data points are at a particular location.
Overplotting can obscure patterns and outliers, leading to potentially misleading conclusions.
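The amount of overlap can be checked numerically. The sketch below, assuming ggplot2 is installed, compares the number of observations with the number of distinct (displ, hwy) positions; `xy`, `n_locations`, and `counts` are names introduced for illustration.

```r
library(ggplot2)

xy <- mpg[, c("displ", "hwy")]
n_obs <- nrow(xy)
n_locations <- nrow(unique(xy))  # distinct (displ, hwy) positions

n_obs
n_locations

# Positions shared by the most cars
counts <- as.data.frame(table(displ = xy$displ, hwy = xy$hwy))
head(counts[order(-counts$Freq), ])
```

If `n_locations` is smaller than `n_obs`, some points in the scatterplot are drawn exactly on top of each other.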
ggplot(): alpha
Set the transparency level (alpha) between 0 (full transparency) and 1 (no transparency) manually.
Specify alpha not within the aes() function but within the geom_*() function.
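A minimal sketch of setting transparency, assuming ggplot2 is installed. The value 0.3 is an arbitrary choice for illustration; any value between 0 and 1 works.

```r
library(ggplot2)

# alpha is a fixed value here, not a variable, so it goes outside aes()
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), alpha = 0.3)

p  # locations with many overlapping points now appear darker
```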