R Basics
September 11, 2024
A function can take any number and type of input parameters and return any number and type of output results.
R ships with a large number of built-in functions.
R also allows a user to define a new function.
We will mostly use built-in functions.
library(tidyverse)
# The function `str_c()`, from the stringr package (loaded with `tidyverse`), concatenates character strings.
str_c("Data", "Analytics")
str_c("Data", "Analytics", sep = "!")
We invoke a function by entering its name and a pair of opening and closing parentheses.
Much as a cooking recipe can accept ingredients, a function invocation can accept inputs called arguments.
We pass arguments sequentially inside the parentheses, separated by commas.
A parameter is a name given to an expected function argument.
A default argument is a fallback value that R passes to a parameter if the function invocation does not explicitly provide one.
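The terms above can be illustrated with a small user-defined function. This is only a sketch: `greet()`, `name`, and `punctuation` are hypothetical names made up for this example and are not part of R or the tidyverse.

```r
# Hypothetical function for illustration only.
# `name` and `punctuation` are parameters; "!" is the default argument
# for `punctuation`.
greet <- function(name, punctuation = "!") {
  paste0("Hello, ", name, punctuation)
}

greet("Data")                         # "Hello, Data!"  (default used)
greet("Data", punctuation = "?")      # "Hello, Data?"  (default overridden)
greet(punctuation = ".", name = "R")  # "Hello, R."     (named arguments, any order)
```

Arguments passed without names are matched to parameters by position, which is why the order matters in the first call but not in the last one.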
The basic arithmetic operators from mathematics, along with parentheses for grouping, are all available in R.
R can be used for a wide range of mathematical calculations.
abs(x): the absolute value \(|x|\)
sqrt(x): the square root \(\sqrt{x}\)
exp(x): the exponential value \(e^x\), where \(e = 2.718...\)
log(x): the natural logarithm \(\log_{e}(x)\), or simply \(\log(x)\)
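A few quick calculations show the operators and functions in action. The value assigned to `x` is arbitrary and chosen only for illustration.

```r
# Arithmetic operators with parentheses for grouping
(2 + 3) * 4 / 2^2   # 5

x <- -4             # arbitrary example value
abs(x)              # 4
sqrt(abs(x))        # 2
exp(1)              # Euler's number e, about 2.718
log(exp(2))         # 2, because log() is the inverse of exp()
```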
Descriptive statistics condense data into manageable summaries, making it easier to understand key characteristics of the data.
\[ \overline{x} = \frac{x_{1} + x_{2} + \cdots + x_{N}}{N} \]
mean() calculates the mean of the values in a vector.
median() calculates the median of the values in a vector.
The mode is the value (or values) that occurs most frequently in a given vector.
The mode is useful, although it is often not a very good representation of centrality.
The R package modeest provides the mfv(x) function, which calculates the mode of the values in a vector x.
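As a small worked example, the three measures of centrality can be computed for a made-up vector. Base R has no built-in function for the statistical mode (`mode()` reports an object's storage type instead), so the sketch below counts values with `table()`, which is one possible approach among several.

```r
x <- c(2, 3, 3, 5, 7)   # made-up values for illustration

mean(x)     # 4
median(x)   # 3

# Mode: count how often each value occurs, then keep the most frequent value(s)
counts <- table(x)
modes <- as.numeric(names(counts)[counts == max(counts)])
modes       # 3
```

If several values tie for the highest count, `modes` contains all of them.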
\[ (\text{range of x}) \,=\, (\text{maximum value in x}) \,-\, (\text{minimum value in x}) \]
max(x) returns the maximum of the values in a given vector \(x\).
min(x) returns the minimum of the values in a given vector \(x\).
\[ \overline{s}^{2} = \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} \]
var(x) calculates the variance of the values in a vector \(x\).
\[ \overline{s} = \sqrt{ \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} } \]
sd(x) calculates the standard deviation of the values in a vector \(x\).
quantile(x, probs) returns the quantiles of the values in a vector \(x\) at the given probabilities.
quantile(x, 0) # the minimum
quantile(x, 0.25) # the 1st quartile
quantile(x, 0.5) # the 2nd quartile
quantile(x, 0.75) # the 3rd quartile
quantile(x, 1) # the maximum
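The measures of spread can be checked against their formulas with a made-up vector. This is a sketch for illustration only; the values in `x` are arbitrary.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values for illustration

# Range as defined above: maximum minus minimum
max(x) - min(x)    # 7
diff(range(x))     # 7; note that range(x) itself returns c(min, max)

# Sample variance written out from the formula (divide by N - 1)
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)
all.equal(manual_var, var(x))        # TRUE
all.equal(sqrt(manual_var), sd(x))   # TRUE

# Several quantiles at once: pass a vector of probabilities
quantile(x, c(0, 0.25, 0.5, 0.75, 1))
all.equal(unname(quantile(x, 0.5)), median(x))   # TRUE
```

The 2nd quartile is the median, and the 0 and 1 quantiles are the minimum and maximum.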
An absolute pathname is the complete path from the root directory to the target file or directory.
It is independent of the current working directory.
Example
/Users/user/documents/data/car_data.csv
C:\\Users\\user\\Documents\\data\\car_data.csv
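Suppose the absolute pathname of car_data.csv is /Users/user/documents/data/car_data.csv.
If the working directory is /Users/user/documents/, the relative pathname of car_data.csv is data/car_data.csv.
In Posit Cloud, the working directory of a project is /cloud/project/.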
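The relationship between the two kinds of pathname can be sketched in R. The snippet assumes only that a file named car_data.csv is expected in a data sub-directory of the working directory; it does not require the file to exist.

```r
# Absolute pathname of the current working directory
wd <- getwd()

# Relative pathname: interpreted starting from the working directory
rel_path <- file.path("data", "car_data.csv")
rel_path   # "data/car_data.csv"

# Matching absolute pathname, built by prepending the working directory
abs_path <- file.path(wd, rel_path)

# Both pathnames point at the same location, so the two checks agree
file.exists(rel_path) == file.exists(abs_path)   # TRUE
```

Changing the working directory changes what `rel_path` refers to, while an absolute pathname keeps pointing at the same location.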
Use the read_csv() function to read a comma-separated values (CSV) file.
Download the CSV file, car_data.csv, from the Class Files module in our Brightspace.
Create a sub-directory, data, by clicking “New Folder” in the Files Pane in Posit Cloud.
Upload the car_data.csv file to the sub-directory data.
Provide the relative pathname for the file, car_data.csv, to the read_csv() function.
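Put together, the steps above might look like the following sketch. It assumes the readr package (part of the tidyverse) is installed and that the working directory contains the data sub-directory; the object name `car_data` is just a convenient choice.

```r
# Relative pathname from the working directory to the uploaded file
path <- file.path("data", "car_data.csv")

if (file.exists(path)) {
  # read_csv() comes from readr, which library(tidyverse) also loads
  car_data <- readr::read_csv(path)
} else {
  # Typical causes: wrong working directory, or the file was not uploaded
  message("File not found at ", path, "; check getwd() and the upload step.")
}
```

The `file.exists()` check is optional, but it turns a missing file into a readable hint instead of an error.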
View()/view() displays the data in a simple spreadsheet-like grid.
dim() shows how many rows and columns are in the data.frame.
nrow() and ncol() show the number of rows and columns of the data.frame, respectively.
skimr::skim() provides a more detailed summary.
skimr is the R package that provides the function skim().
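These functions can be tried on any data.frame. The sketch below uses `mtcars`, a data.frame that ships with base R, as a stand-in for the course data; the `skim()` call runs only if the skimr package happens to be installed.

```r
df <- mtcars   # built-in example data.frame, used here as a stand-in

dim(df)    # 32 11
nrow(df)   # 32
ncol(df)   # 11

# View() opens a spreadsheet-like viewer, so call it only in an interactive session
if (interactive()) View(df)

# skim() is provided by the skimr package, which may need to be installed first
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(df))
}
```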
data.frame: Variables, Observations, and Values
There are three rules that make a data.frame tidy: