R Basics
September 17, 2025
Posit Cloud (formerly RStudio Cloud) is a web service that delivers a browser-based experience similar to RStudio, the standard IDE for the R language.
For our course, we use Posit Cloud for the R programming component.
Type a <- 1 in the Script Pane.
R packages are collections of ready-made tools for R.
Many packages are already built into R, and thousands more can be installed from the internet (like downloading apps on your phone).
Why use packages?
Examples:
tidyverse
tidyverse is a collection of R packages built for data analytics.
Packages in tidyverse include readr, dplyr, and ggplot2.
install.packages("packageName")
Use install.packages("packageName") to install new packages.
To install tidyverse, type and run the command above in the R Console.
library(packageName)
Use library(packageName) to load a package into your R session.
library(tidyverse) loads all the R packages in the tidyverse, including readr, dplyr, and ggplot2.
mpg is a built-in dataset from ggplot2, which is part of the tidyverse.
The read_csv() function comes from the readr package, which is part of the tidyverse.
Note
Example R script name: danl-101-2024-0917.R
Hyphens (-) / underscores (_)
In R scripts (*.R files), # indicates that the rest of the line is to be ignored.
Assignment operator (<-)
Type libr in the RScript in RStudio and wait for a second.
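As a small illustration of the comment rule in this note, here is a sketch of a few script lines. The object names a and b are only placeholders chosen for the example.

```r
# A whole-line comment: R ignores everything after the # symbol.
a <- 1        # assign the value 1 to a
b <- a + 1    # the code before # still runs; this trailing text is ignored
# b <- 100    # a commented-out line never runs, so b stays 2
b
```

Running the script leaves b equal to 2, because the line that would set it to 100 is a comment.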
A value is a datum (literal), such as a number or text.
There are different types of values:
Sometimes you will hear variables referred to as objects.
Everything that is not a literal value, such as 10, is an object.
What is going on here?
The shortcut for the assignment <- is: Alt + - (Windows/Linux) or Option + - (Mac).
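When R runs y <- x + 12, it does the following:
It sees the assignment operator <- in the middle.
It evaluates the right-hand side first (it takes the value of x and adds it to 12).
It stores the result in y.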
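The evaluation order described above can be checked with a short sketch. The starting value of x is an arbitrary choice for illustration.

```r
x <- 10        # hypothetical starting value
y <- x + 12    # the right-hand side (10 + 12) is evaluated, then stored in y
y              # 22
x              # x itself is unchanged: 10
```

Assigning to y does not modify x; only the name on the left of <- receives a new value.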
TRUE or FALSE.
c(1, 2, 3)
c("red", "blue", "green")
names, another with ages
as.character() → convert to character
as.integer() → convert to whole numbers
as.numeric() → convert to numbers with decimals
as.factor() → convert to categorical (factor)
Character values use double quotes (" ") or single quotes (' '), e.g., "hello" or 'hello'.
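A minimal sketch of the four conversion functions listed above. The input values and object names are made up for the example.

```r
num_as_text   <- as.character(10)      # "10"
text_as_int   <- as.integer("42")      # 42
decimal_to_int <- as.integer(3.9)      # 3 (the decimal part is dropped, not rounded)
text_as_num   <- as.numeric("3.14")    # 3.14
colors_factor <- as.factor(c("red", "blue", "red"))

class(num_as_text)     # "character"
class(text_as_int)     # "integer"
class(text_as_num)     # "numeric"
class(colors_factor)   # "factor"
```

Note that as.integer() truncates toward zero, so 3.9 becomes 3 rather than 4.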
favorite.integer <- as.integer(2)
class(favorite.integer)
favorite.numeric <- as.numeric(8.8)
class(favorite.numeric)
Logical (TRUE/FALSE)
class(TRUE)
class(FALSE)
favorite.numeric == 8.8
favorite.numeric == 9.9
class(favorite.numeric == 8.8)
Logical values are TRUE or FALSE.
== is used to test for equality.
favorite.numeric == 8.8 returns TRUE
favorite.numeric == 9.9 returns FALSE
a <- 1:10 # create a sequence using the colon operator
b <- c("3", 4, 5) # mixing numbers and text
beers <- c("BUD LIGHT", "BUSCH LIGHT", "COORS LIGHT",
"MILLER LITE", "NATURAL LIGHT")
class(a)
class(b)
class(beers)
Mixing numbers and text in one vector makes every element a character (see b).
c(...) (combine or concatenate) creates vectors by putting values together in order.
For a factor with "Freshman", "Sophomore", "Junior", and "Senior" student classifications, R stores them as numbers (e.g., 1, 2, 3, 4) but displays the labels.
levels() → shows the categories (unique labels)
nlevels() → shows how many categories there are
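The factor behavior described above can be sketched as follows. The student data and the object names are invented for the example, and the level order is set explicitly with factor() so that Freshman maps to 1.

```r
classification <- c("Junior", "Freshman", "Senior", "Sophomore", "Freshman")

# Set the category order explicitly; otherwise R sorts levels alphabetically
class_factor <- factor(
  classification,
  levels = c("Freshman", "Sophomore", "Junior", "Senior")
)

levels(class_factor)       # "Freshman" "Sophomore" "Junior" "Senior"
nlevels(class_factor)      # 4
as.integer(class_factor)   # 3 1 4 2 1 (the numbers stored behind the labels)
```

Printing class_factor shows the labels, while as.integer() reveals the underlying codes.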
+
(sum(), mean()).
library(tidyverse)
# The function `str_c()`, provided by `tidyverse`, concatenates characters.
str_c("Data", "Analytics")
str_c("Data", "Analytics", sep = "!")
We use a function by entering its name and a pair of opening and closing parentheses.
Much as a cooking recipe can accept ingredients, a function invocation can accept inputs called arguments.
We pass arguments sequentially inside the parentheses, separated by commas.
A parameter is a name given to an expected function argument.
A default argument is a fallback value that R passes to a parameter if the function invocation does not explicitly provide one.
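To make arguments, parameters, and default arguments concrete, here is a sketch using two base R functions, round() and paste(). These are used instead of str_c() only so the example runs without loading a package; the input values are arbitrary.

```r
# round() has a parameter named digits whose default argument is 0
rounded_default <- round(3.14159)               # 3
rounded_by_pos  <- round(3.14159, 2)            # 3.14 (argument matched by position)
rounded_by_name <- round(3.14159, digits = 2)   # 3.14 (argument matched by parameter name)

# paste() has a parameter named sep whose default argument is a single space
joined_default <- paste("Data", "Analytics")              # "Data Analytics"
joined_custom  <- paste("Data", "Analytics", sep = "!")   # "Data!Analytics"
```

When the call omits digits or sep, R falls back to the default; naming the parameter makes the call easier to read.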
All of the basic operators with parentheses we see in mathematics are available to use.
R can be used for a wide range of mathematical calculations.
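A quick sketch of arithmetic in R with the basic operators and parentheses, followed by a few built-in math functions. The numbers and object names are arbitrary.

```r
grouped   <- (1 + 2) * 3   # 9: parentheses are evaluated first
ungrouped <- 1 + 2 * 3     # 7: multiplication comes before addition
power     <- 2^3           # 8
quotient  <- 7 / 2         # 3.5

abs_value   <- abs(-5)       # 5
square_root <- sqrt(16)      # 4
exp_one     <- exp(1)        # e, about 2.718
log_of_exp  <- log(exp(2))   # 2: log() is the natural logarithm
```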
abs(x): the absolute value \(|x|\)
sqrt(x): the square root \(\sqrt{x}\)
exp(x): the exponential value \(e^x\), where \(e = 2.718...\)
log(x): the natural logarithm \(\log_{e}(x)\), or simply \(\log(x)\)
# Access an object directly from a package
ggplot2::mpg # PACKAGE_NAME::DATA_FRAME_NAME
# Call a function directly from a package
ggplot2::ggplot() # PACKAGE_NAME::FUNCTION_NAME
Use the :: operator when you want to access:
an object from a package (ggplot2::mpg)
a function from a package (ggplot2::ggplot())
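The same :: pattern works with any installed package. The sketch below uses the stats and datasets packages that ship with R, chosen here only so it runs without installing ggplot2; the object names are illustrative.

```r
# Call a function directly from a package: PACKAGE_NAME::FUNCTION_NAME
middle_value <- stats::median(c(1, 5, 9))   # 5

# Access a data frame directly from a package: PACKAGE_NAME::DATA_FRAME_NAME
first_rows <- head(datasets::mtcars, 3)
first_rows
```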
Descriptive statistics condense data into manageable summaries, making it easier to understand key characteristics of the data.
\[ \overline{x} = \frac{x_{1} + x_{2} + \cdots + x_{N}}{N} \]
mean() calculates the mean of the values in a vector.
median() calculates the median of the values in a vector.
The mode is the value(s) that occurs most frequently in a given vector.
Mode is useful, although it is often not a very good representation of centrality.
The R package modeest provides the mfv(x) function, which calculates the mode of the values in vector x.
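A minimal sketch of the three measures of centrality on a made-up vector. The mode here is computed with base R's table() rather than a package function, as an assumption-free alternative that needs no installation.

```r
x <- c(2, 3, 3, 5, 7, 3, 5)

x_mean   <- mean(x)     # (2 + 3 + 3 + 5 + 7 + 3 + 5) / 7 = 4
x_median <- median(x)   # middle of the sorted values 2 3 3 3 5 5 7, which is 3

# Mode: count each distinct value, then keep the value(s) with the highest count
counts <- table(x)
x_mode <- as.numeric(names(counts)[counts == max(counts)])   # 3
```

If several values tie for the highest count, x_mode holds all of them.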
\[ (\text{range of x}) \,=\, (\text{maximum value in x}) \,-\, (\text{minimum value in x}) \]
max(x) returns the maximum value of the values in a given vector \(x\).
min(x) returns the minimum value of the values in a given vector \(x\).
\[ \overline{s}^{2} = \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} \]
var(x) calculates the variance of the values in a vector \(x\).
\[ \overline{s} = \sqrt{ \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} } \]
sd(x) calculates the standard deviation of the values in a vector \(x\).
quantile(x)
quantile(x, 0) # the minimum
quantile(x, 0.25) # the 1st quartile
quantile(x, 0.5) # the 2nd quartile
quantile(x, 0.75) # the 3rd quartile
quantile(x, 1) # the maximum
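The dispersion measures above can be tried on a small made-up vector. The sketch also recomputes the variance by hand from the formula with N - 1 in the denominator, to show that it matches var().

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x_range <- max(x) - min(x)   # 9 - 2 = 7
x_var   <- var(x)            # sample variance
x_sd    <- sd(x)             # square root of the variance

# Variance from the formula: squared deviations from the mean, divided by N - 1
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)   # 32 / 7

# Minimum, quartiles, and maximum in one call
x_quantiles <- quantile(x, c(0, 0.25, 0.5, 0.75, 1))
x_quantiles
```

The 0.5 quantile equals median(x), and the 0 and 1 quantiles equal min(x) and max(x).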
vector
Part of working with a vector is knowing how to index it.
my_vector <- c(10, 20, 30, 40, 50, 60)
# Filter elements greater than 10
is_greater_than_10 <- my_vector > 10 # Creates logical vector
my_vector[ is_greater_than_10 ]
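Besides logical filtering, a vector can be indexed by position. A short sketch with the same my_vector values; the result names are illustrative. R counts positions starting from 1.

```r
my_vector <- c(10, 20, 30, 40, 50, 60)

first_element <- my_vector[1]          # 10
picked        <- my_vector[c(2, 4)]    # 20 40
middle_slice  <- my_vector[2:4]        # 20 30 40
without_first <- my_vector[-1]         # 20 30 40 50 60 (negative index drops a position)

# Combine two conditions with & to filter by a range of values
in_between <- my_vector[my_vector > 10 & my_vector < 50]   # 20 30 40
```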