R Basics
September 9, 2024
Logical values are TRUE or FALSE.
Sometimes we need to explicitly cast a value from one type to another.
Casting functions include as.character(), as.integer(), as.numeric(), and as.factor().
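As a minimal sketch of explicit casting, the snippet below applies each of the four functions to illustrative values. The variable names (x_chr, x_int, x_num, x_fct, bad) are hypothetical and chosen only for this example.

```r
# Explicit casts between basic types (values are illustrative only).
x_chr <- as.character(2024)                  # number -> character string
x_int <- as.integer("42")                    # character string -> integer
x_num <- as.numeric("8.8")                   # character string -> numeric (double)
x_fct <- as.factor(c("low", "high", "low"))  # character vector -> factor

class(x_chr)
class(x_int)
class(x_num)
class(x_fct)

# A cast that cannot succeed yields NA (with a warning) rather than an error.
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)
```

Checking class() after each cast is a quick way to confirm that the conversion did what was intended.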
Strings are known as character in R.
Use double quotes (") or single quotes (') to wrap around the string.
favorite.integer <- as.integer(2)
class(favorite.integer)
favorite.numeric <- as.numeric(8.8)
class(favorite.numeric)
class(TRUE)
class(FALSE)
favorite.numeric == 8.8
favorite.numeric == 9.9
class(favorite.numeric == 8.8)
Use == to test for equality in R.
a <- 1:10 # colon operator
b <- c("3", 4, 5)
beers <- c("BUD LIGHT", "BUSCH LIGHT", "COORS LIGHT",
"MILLER LITE", "NATURAL LIGHT")
class(a)
class(b)
class(beers)
We can create one-dimensional data structures called “vectors”.
c(...): Returns a vector that is constructed from one or more arguments, with the order of the vector elements corresponding to the order of the arguments.
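The behavior of c() can be sketched as follows. The variable names (v, mixed, combined) are hypothetical; the point is that element order follows argument order, and that mixing types forces all elements to a single common type.

```r
# c() keeps elements in the order the arguments were given.
v <- c(10, 20, 30)
v

# Mixing types forces a common type: here the numbers become strings.
mixed <- c("3", 4, 5)
class(mixed)

# Vectors passed as arguments are flattened into one vector.
combined <- c(1:3, c(7, 8))
length(combined)
```

This is why a vector such as c("3", 4, 5) reports the class character: a vector can hold only one type.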
Factors store categorical data.
Under the hood, factors are actually integers that have a string label attached to each unique integer.
If we have Male/Female labels for each of our patients, R stores them as a “column” of integer codes (1 and 2), each mapped to its label.
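A short sketch makes the integer representation visible. The labels below are hypothetical patient data used only for illustration; in base R the integer codes of a factor start at 1, and levels are sorted alphabetically by default.

```r
# Hypothetical labels, for illustration only.
sex <- factor(c("Male", "Female", "Female", "Male"))

levels(sex)       # unique labels, sorted alphabetically by default
as.integer(sex)   # underlying integer codes, starting at 1
class(sex)
```

Here "Female" is level 1 and "Male" is level 2, so the underlying codes are 2, 1, 1, 2.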
The + prompt tells you that R is waiting for more input; it doesn’t think you’re done yet.
A function can take any number and type of input parameters and return any number and type of output results.
R ships with a vast number of built-in functions.
R also allows a user to define a new function.
We will mostly use built-in functions.
library(tidyverse)
# The function `str_c()`, from the stringr package (loaded by `tidyverse`), concatenates character strings.
str_c("Data", "Analytics")
str_c("Data", "Analytics", sep = "!")
We invoke a function by entering its name and a pair of opening and closing parentheses.
Much as a cooking recipe can accept ingredients, a function invocation can accept inputs called arguments.
We pass arguments sequentially inside the parentheses, separated by commas.
A parameter is a name given to an expected function argument.
A default argument is a fallback value that R passes to a parameter if the function invocation does not explicitly provide one.
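To make the terms parameter, argument, and default argument concrete, here is a minimal sketch of a user-defined function. The function greet and its parameters name and greeting are hypothetical names invented for this example.

```r
# `greet`, `name`, and `greeting` are hypothetical names for this sketch.
# `greeting` has a default argument, "Hello".
greet <- function(name, greeting = "Hello") {
  paste(greeting, name)
}

greet("R")                            # `greeting` falls back to its default
greet("R", greeting = "Welcome to")   # default overridden by parameter name
greet("R", "Goodbye")                 # arguments matched by position
```

Passing an argument by name, as in the second call, makes the call easier to read when a function has many parameters.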
All of the basic arithmetic operators we see in mathematics, along with parentheses, are available in R.
R can be used for a wide range of mathematical calculations.
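As a quick sketch, the arithmetic operators and a few built-in math functions can be tried directly at the console. The numbers are arbitrary illustrative values.

```r
# Arithmetic operators; parentheses control the order of evaluation.
1 + 2 * 3        # multiplication before addition
(1 + 2) * 3      # parentheses first
2^10             # exponentiation
7 / 2            # division
7 %/% 2          # integer division
7 %% 2           # remainder

# A few built-in math functions.
abs(-5)
sqrt(16)
exp(0)
log(exp(1))      # natural logarithm
```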
abs(x): the absolute value \(|x|\)
sqrt(x): the square root \(\sqrt{x}\)
exp(x): the exponential value \(e^x\), where \(e = 2.718...\)
log(x): the natural logarithm \(\log_{e}(x)\), or simply \(\log(x)\)
\[ \overline{x} = \frac{x_{1} + x_{2} + \cdots + x_{N}}{N} \]
mean() calculates the mean of the values in a vector.
median() calculates the median of the values in a vector.
The mode is the value(s) that occurs most frequently in a given vector.
Mode is useful, although it is often not a very good representation of centrality.
The R package modeest provides the mfv(x) function, which calculates the mode of the values in a vector x.
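The three measures of centrality can be compared on a small vector. The values below are hypothetical. Because base R has no built-in function for the statistical mode, this sketch tabulates the values with table() instead of relying on an add-on package; that is one possible approach, not the only one.

```r
# Hypothetical sample values, for illustration only.
x <- c(2, 3, 3, 5, 7, 10)

mean(x)     # (2 + 3 + 3 + 5 + 7 + 10) / 6
median(x)   # average of the two middle values, 3 and 5

# Base R's mode() reports an object's storage mode, not the statistical mode.
# One way to find the most frequent value(s) is to tabulate the data.
counts <- table(x)
modes <- as.numeric(names(counts)[counts == max(counts)])
modes
```

If several values tie for the highest count, modes contains all of them.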
\[ (\text{range of x}) \,=\, (\text{maximum value in x}) \,-\, (\text{minimum value in x}) \]
max(x) returns the maximum value of the values in a given vector \(x\).
min(x) returns the minimum value of the values in a given vector \(x\).
\[ \overline{s}^{2} = \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} \]
var(x) calculates the variance of the values in a vector \(x\).
\[ \overline{s} = \sqrt{ \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} } \]
sd(x) calculates the standard deviation of the values in a vector \(x\).
quantile(x)
quantile(x, 0) # the minimum
quantile(x, 0.25) # the 1st quartile
quantile(x, 0.5) # the 2nd quartile
quantile(x, 0.75) # the 3rd quartile
quantile(x, 1) # the maximum
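The measures of dispersion above can be checked on a small vector. The values are hypothetical, and manual_var is a name introduced only for this sketch; it recomputes the variance from the formula with divisor N - 1 so that it can be compared with var() and sd().

```r
# Hypothetical sample values, for illustration only.
x <- c(2, 4, 4, 6, 9)

# Range as defined above: maximum minus minimum.
# (Base R's range(x) returns the minimum and maximum, not their difference.)
max(x) - min(x)

# Sample variance computed directly from the formula (divisor N - 1).
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)

# It agrees with var(), and its square root agrees with sd().
all.equal(manual_var, var(x))
all.equal(sqrt(manual_var), sd(x))

# Several quantiles at once: minimum, quartiles, and maximum.
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
```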
An absolute pathname is the complete path from the root directory to the target file or directory.
It is independent of the current working directory.
Example
/Users/user/documents/data/car_data.csv
C:\\Users\\user\\Documents\\data\\car_data.csv
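A relative pathname locates a file starting from the current working directory.
Example
If the absolute pathname of car_data.csv is /Users/user/documents/data/car_data.csv and the working directory is /Users/user/documents/, then the relative pathname of car_data.csv is data/car_data.csv.
In Posit Cloud, the project working directory is /cloud/project/.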
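The relationship between the working directory and a relative pathname can be sketched with base R. The names wd, rel_path, and abs_path are hypothetical; no file needs to exist for this snippet to run, because it only builds the path strings.

```r
# The working directory is the starting point for every relative pathname.
wd <- getwd()
wd

# file.path() joins path components with "/" on every platform.
rel_path <- file.path("data", "car_data.csv")
rel_path

# Prefixing the working directory turns the relative pathname into an absolute one.
abs_path <- file.path(wd, rel_path)
abs_path
```

Running getwd() first is a simple way to find out which directory a relative pathname will be resolved against.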
Use the read_csv() function to read a comma-separated values (CSV) file.
Download the CSV file, car_data.csv, from the Class Files module in our Brightspace.
Create a sub-directory, data, by clicking “New Folder” in the Files Pane in Posit Cloud.
Upload the car_data.csv file to the sub-directory data.
Provide the relative pathname for the file, car_data.csv, to the read_csv() function.
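The steps above can be sketched as follows, assuming the file was uploaded to a data sub-directory of the working directory. The names path and car_data are hypothetical. The call readr::read_csv() is the same function that library(tidyverse) makes available as read_csv(); the file.exists() check is an addition for this sketch, so that a missing file produces a readable message.

```r
# Relative pathname, resolved against the working directory.
path <- file.path("data", "car_data.csv")

if (file.exists(path)) {
  # readr is part of the tidyverse; read_csv() returns a data frame (tibble).
  car_data <- readr::read_csv(path)
} else {
  message("File not found: ", path, " (working directory: ", getwd(), ")")
}
```

If the message appears, compare the output of getwd() with the folder shown in the Files Pane.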
View()/view() displays the data in a simple spreadsheet-like grid.
dim() shows how many rows and columns are in the data.frame.
nrow() and ncol() show the number of rows and columns in the data.frame, respectively.
skimr::skim() provides a more detailed summary.
skimr is the R package that provides the function skim().
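As a sketch of these inspection functions, the snippet below uses mtcars, a data.frame that ships with base R, as a stand-in for car_data so that it runs without any uploaded file. The call to str() is a base R addition for this sketch; View() and skimr::skim() are left out because they need an interactive session and an extra package, respectively.

```r
# mtcars is a built-in data.frame; it stands in for car_data here.
dim(mtcars)    # number of rows, then number of columns
nrow(mtcars)   # number of rows only
ncol(mtcars)   # number of columns only
str(mtcars)    # compact overview of each column's type
```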
data.frame: Variables, Observations, and Values
There are three rules which make a data.frame tidy:
Each variable has its own column.
Each observation has its own row.
Each value has its own cell.