Working with data.frame; Data Visualization
February 15, 2024
ggplot()
Mac:
option + command + I: to create an R chunk
command + shift + return: to run the code in the R chunk
command + shift + K: to render/knit the Quarto file
command + shift + C: to (de-)comment out a line in the Quarto file
Windows:
Alt + Ctrl + I: to create an R chunk
Ctrl + Shift + Enter: to run the code in the R chunk
Ctrl + Shift + K: to render/knit the Quarto file
Ctrl + Shift + C: to (de-)comment out a line in the Quarto file
Mac: option + - inserts the assignment operator <-.
Windows: Alt + - inserts the assignment operator <-.
danl-200-mynote-lec-08-2024-0215.qmd
Your website project directory should include files specifically dedicated to your website.
In your website project directory, avoid having
_quarto.yml
Ensure all Quarto documents render without any errors.
After making edits to _quarto.yml, save the changes by clicking the floppy disk icon (💾).
Run quarto render in Terminal whenever you have changed _quarto.yml.
Once quarto render completes, open index.html in your website working directory in a web browser to see the updates.
After confirming that your local website files are up to date, use the 3-step git commands (add-commit-push) to update your online website.
Keeping all the files associated with a given project (e.g., input data, Quarto documents, and figures) together in one directory is a wise and common practice.
RStudio has built-in support for this via projects.
We can create a new project by clicking
bcdanl.github.io
An absolute path is a complete path from the root directory to the target file or directory.
Example (Mac): /Users/bchoe/Documents/bcdanl.github.io/data/car_data.csv
Example (Windows): C:/Users/bchoe/Documents/bcdanl.github.io/data/car_data.csv
A relative path is a path relative to the current working directory.
Example (Mac): If the current directory is /Users/bchoe/Documents/bcdanl.github.io/, the relative path to car_data.csv is data/car_data.csv
Example (Windows): If the current directory is C:/Users/bchoe/Documents/bcdanl.github.io, the relative path to car_data.csv is data/car_data.csv
For any RStudio project, it is recommended to use a relative path.
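A minimal sketch of how the two kinds of path relate, using the Mac example directory from these notes. The object names (project_dir, relative_path, absolute_path) are hypothetical and chosen only for illustration.

```r
# Hypothetical project root, copied from the Mac example in the notes.
project_dir <- "/Users/bchoe/Documents/bcdanl.github.io"

# A relative path starts from the project directory, not from the root.
relative_path <- file.path("data", "car_data.csv")

# Joining the project directory and the relative path gives the absolute path.
absolute_path <- file.path(project_dir, relative_path)

relative_path
absolute_path

# getwd() reports the working directory that relative paths are resolved against.
getwd()
```

Inside an RStudio project the working directory is normally the project folder, which is why the short relative path keeps working when the project is moved to another computer.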
The read_csv() function reads a comma-separated values (CSV) file.
Download the CSV file, car_data.csv, from the Class Files module in our Brightspace.
Find the path name for the file, car_data.csv, from the File Explorer / Finder.
Provide the path name for the file, car_data.csv, to the read_csv() function.
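The steps above might look like the following sketch. It assumes the readr package is installed and that car_data.csv was saved in a data/ folder inside the project; the folder location and the object names csv_path and car_data are assumptions for illustration.

```r
library(readr)  # read_csv() comes from the readr package (part of the tidyverse)

# Relative path, assuming car_data.csv sits in the project's data/ folder.
csv_path <- file.path("data", "car_data.csv")

# Read the file only if it is where we expect it to be.
if (file.exists(csv_path)) {
  car_data <- read_csv(csv_path)
  car_data
} else {
  message("File not found: ", csv_path, " (check getwd() and the folder name)")
}
```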
View()/view() displays the data in a simple spreadsheet-like grid viewer.
data.frames
dim() shows how many rows and columns are in a data.frame.
nrow() and ncol() show the number of rows and the number of columns of a data.frame, respectively.
skimr::skim() provides a more detailed summary.
skimr is the R package that provides the function skim().
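A short sketch of these summary functions. It uses the built-in mtcars data.frame as a stand-in (an assumption, since car_data.csv is not available here), and it calls skim() only when the skimr package is installed.

```r
df <- mtcars  # built-in data.frame used as a stand-in

dim(df)   # number of rows, then number of columns
nrow(df)  # number of rows
ncol(df)  # number of columns

# skim() needs the skimr package; run it only if skimr is installed.
if (requireNamespace("skimr", quietly = TRUE)) {
  skimr::skim(df)
}
```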
data.frame: Variables, Observations, and Values
There are three rules which make a data.frame tidy:
Each variable has its own column.
Each observation has its own row.
Each value has its own cell.
ggplot()
In data visualization, you’ll turn data into plots.
In data transformation, you’ll learn the key verbs that allow you to select important variables, filter out key observations, create new variables, and compute summaries.
In exploratory data analysis, you’ll combine summary statistics (skim()), visualization, and transformation with your curiosity and skepticism to ask and answer interesting questions about data.
“The simple graph has brought more information to the data analyst’s mind than any other device.” John Tukey
Data visualization is the creation and study of the visual representation of data
Many tools for visualizing data – R is one of them
Many approaches/systems within R for making data visualizations – ggplot2 is one of them, and that’s what we’re going to use
A grammar of graphics is a tool that enables us to concisely describe the components of a graphic
The mpg data frame, provided by ggplot2, contains observations collected by the US Environmental Protection Agency on 38 models of car.
Q. Do cars with big engines use more fuel than cars with small engines?
displ: a car’s engine size, in liters.
hwy: a car’s fuel efficiency on the highway, in miles per gallon (mpg).
What does the relationship between engine size and fuel efficiency look like?
ggplot
To plot mpg, put displ on the x-axis and hwy on the y-axis.
To make a plot, supply a data.frame, a geom function, and a collection of mappings such as x = VAR_1 and y = VAR_2.
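A minimal sketch of the scatterplot described here, assuming ggplot2 is installed. The object name p is hypothetical; the data (mpg), the geom (geom_point()), and the mappings (displ and hwy) come from the notes.

```r
library(ggplot2)  # provides ggplot(), geom_point(), and the mpg data frame

# Data: mpg; geom: geom_point(); mappings: displ on the x-axis, hwy on the y-axis.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

p  # printing the object draws the scatterplot
```

The same three pieces (data, geom function, mappings) are all that change from one basic plot to the next.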
In the plot above, one group of points (highlighted in red) seems to fall outside of the linear trend.
An aesthetic is a visual property (e.g., size, shape, color) of the objects (e.g., class) in your plot.
You can display a point in different ways by changing the values of its aesthetic properties.
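As an illustration of mapping a variable to an aesthetic, the sketch below colors each point by the class variable of mpg. It assumes ggplot2 is installed; the object name p_color is hypothetical.

```r
library(ggplot2)

# Map the class variable to the color aesthetic inside aes(),
# so each car class is drawn in its own color.
p_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

p_color
```

Replacing color with shape, size, or alpha inside aes() maps class to that visual property instead.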
color to the plot
shape to the plot
size to the plot
alpha (transparency) to the plot
Many points overlap each other.
When points overlap, it’s hard to know how many data points are at a particular location.
Overplotting can obscure patterns and outliers, leading to potentially misleading conclusions.
We can set a transparency level (alpha) between 0 (full transparency) and 1 (no transparency).
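A sketch of one way to reduce overplotting with alpha, assuming ggplot2 is installed. The value 0.3 and the object name p_alpha are arbitrary choices for illustration.

```r
library(ggplot2)

# alpha = 0.3 makes every point mostly transparent,
# so locations with many overlapping points look darker.
p_alpha <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), alpha = 0.3)

p_alpha
```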
alpha
color to the plot
To set an aesthetic manually, set the aesthetic by name as an argument of your geom_ function; i.e., it goes outside of aes().
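A minimal sketch of setting an aesthetic manually, assuming ggplot2 is installed. The color "blue" and the object name p_blue are arbitrary choices for illustration.

```r
library(ggplot2)

# color = "blue" is an argument of geom_point(), outside aes(),
# so every point is drawn in blue regardless of the data.
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

p_blue
```

If color = "blue" were placed inside aes() instead, ggplot2 would treat "blue" as a data value to map rather than as the color to draw with.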
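The name of a color as a character string.
The size of a point in mm.
The shape of a point as a number, as shown below.
color to the plot?
ggplot()
One common problem when creating ggplot2 graphics is to put the + in the wrong place.
To facet a plot by a single variable, use facet_wrap().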
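To facet a plot on the combination of two variables, add facet_grid( VAR_ROW ~ VAR_COL ) to our plot call.
The first argument of facet_grid() is also a formula.
The formula should contain two variable names separated by a ~.
The scales option in facet_*() sets whether the scales are fixed across panels ("fixed", the default), free in one dimension ("free_x", "free_y"), or free in both dimensions ("free").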
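The faceting options can be sketched as follows, assuming ggplot2 is installed. The choice of class, drv, and cyl as faceting variables and all object names are assumptions made for illustration.

```r
library(ggplot2)

# Base scatterplot; each + sits at the end of a line, not at the start of the next.
base_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# One faceting variable: one panel per car class.
p_wrap <- base_plot + facet_wrap(~ class, nrow = 2)

# Two faceting variables: rows by drv, columns by cyl (VAR_ROW ~ VAR_COL).
p_grid <- base_plot + facet_grid(drv ~ cyl)

# scales = "free_y" lets each panel use its own y-axis range.
p_free <- base_plot + facet_wrap(~ class, scales = "free_y")

p_wrap
p_grid
p_free
```

Fixed scales make panels directly comparable, while free scales make the pattern inside each panel easier to see.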