Columns - select(), rename(), relocate(), and mutate()
April 11, 2024
The Pipe (|>) Operator
Because the first argument is a data.frame and the output is a data.frame, dplyr verbs work well with the pipe, |>.
The pipe (|>) takes the thing on its left and passes it along to the function on its right so that f(x, y) is equivalent to x |> f(y).
For example, filter(DATA_FRAME, LOGICAL_STATEMENT) is equivalent to DATA_FRAME |> filter(LOGICAL_STATEMENT).
The easiest way to read the pipe (|>) is “then”.
The pipe (|>) is super useful when we have a chain of data transforming operations to do.
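As a minimal sketch of this equivalence, the following uses a small made-up data frame (`df` and its columns are illustrative, not course data). It assumes dplyr is installed and R is version 4.1 or later, where the native pipe is available.

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(
  name  = c("A", "B", "C", "D"),
  score = c(90, 55, 72, 38)
)

# Nested call: filter(DATA_FRAME, LOGICAL_STATEMENT)
passed_nested <- filter(df, score >= 60)

# Piped call: DATA_FRAME |> filter(LOGICAL_STATEMENT)
passed_piped <- df |> filter(score >= 60)

# Check that both forms return the same data frame
identical(passed_nested, passed_piped)

# A chain reads top to bottom: take df, then filter, then sort
top_scores <- df |>
  filter(score >= 60) |>
  arrange(desc(score))
```

Reading the last chain aloud as "take df, then keep passing scores, then sort from high to low" is usually easier than unpacking the nested form `arrange(filter(df, score >= 60), desc(score))`.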
select()
select() allows us to narrow in on the variables we’re actually interested in.
With select(VARIABLE_1:VARIABLE_2), we can select all the variables between VARIABLE_1 and VARIABLE_2, inclusively.
There are a number of helper functions we can use within select():
starts_with("abc"): matches names that begin with “abc”.
ends_with("xyz"): matches names that end with “xyz”.
contains("ijk"): matches names that contain “ijk”.
num_range("x", 1:3): matches x1, x2 and x3.
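The range syntax and the helpers can be tried out on a toy data frame. This is only a sketch: the data frame and its column names are invented for illustration, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data frame; column names are made up for illustration
df <- data.frame(
  id        = 1:3,
  x1        = c(0.1, 0.2, 0.3),
  x2        = c(1.1, 1.2, 1.3),
  x3        = c(2.1, 2.2, 2.3),
  price_usd = c(10, 20, 30),
  price_eur = c(9, 18, 27)
)

# Range: every column from x1 through x3, inclusive
df_range <- df |> select(x1:x3)

# Helpers: select columns by patterns in their names
df_starts   <- df |> select(starts_with("price"))
df_ends     <- df |> select(ends_with("eur"))
df_contains <- df |> select(contains("ice"))
df_num      <- df |> select(num_range("x", 1:3))

# Inspect which columns each call kept
names(df_range)
names(df_starts)
names(df_ends)
names(df_contains)
names(df_num)
```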
rename()
rename() can be used to rename variables:
DATA_FRAME |> rename(NEW_VARIABLE = EXISTING_VARIABLE)
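A short sketch of the pattern above, using a made-up data frame (the names `df`, `yr`, and `gdp` are assumptions for illustration). Note that the new name goes on the left of `=` and the existing name on the right.

```r
library(dplyr)

# Hypothetical data frame with a terse column name
df <- data.frame(
  yr  = c(2022, 2023),
  gdp = c(100, 104)
)

# NEW_VARIABLE = EXISTING_VARIABLE
df_renamed <- df |> rename(year = yr)

# Only the column name changes; the values and column order stay the same
names(df_renamed)
```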
relocate()
We can use relocate() to move variables around.
We can use the .before and .after arguments to choose where to put variables.
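The following sketch shows relocate() with and without the placement arguments. The data frame and its column names are hypothetical; dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data frame with columns in the order id, x, y, z
df <- data.frame(id = 1:2, x = 3:4, y = 5:6, z = 7:8)

# With no placement argument, the column moves to the front
df_front <- df |> relocate(z)

# Put z immediately before x
df_before <- df |> relocate(z, .before = x)

# Put id immediately after y
df_after <- df |> relocate(id, .after = y)

# Inspect the resulting column orders
names(df_front)
names(df_before)
names(df_after)
```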
mutate()
mutate() is useful to add new variables that are functions of existing variables.
Arithmetic operators: +, -, *, /, ^
Modular arithmetic: %/% (integer division) and %% (remainder).
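As a small sketch of these operators inside mutate(), the example below builds new columns from existing ones. The data frame and all column names are invented for illustration; dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data frame
df <- data.frame(
  minutes  = c(45, 135, 600),
  price    = c(2, 3, 4),
  quantity = c(10, 0, 5)
)

df_new <- df |>
  mutate(
    revenue      = price * quantity,  # multiplication
    price_sq     = price^2,           # exponentiation
    hours        = minutes %/% 60,    # integer division: whole hours
    minutes_left = minutes %% 60      # remainder: leftover minutes
  )

df_new
```

Integer division and remainder are handy together for breaking a total into parts, here splitting a number of minutes into whole hours and leftover minutes.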
There are many functions for creating new variables that we can use with the mutate() function:
Offsets: lead() and lag()
If-else conditions: ifelse()
Ranking functions: min_rank(), dense_rank(), percent_rank(), row_number(), and more
Other useful functions: log(), log10(), exp(), sqrt(), round(), factor(), as.character(), as.numeric(), as.integer(), and more
mutate() - lead() and lag()
lead() and lag() allow us to refer to leading or lagging values.
For example, the yearly change in GDP and the yearly percentage change in GDP both compare each year's value with the previous year's value:
\[ \Delta GDP_{y} = GDP_{y} - GDP_{y-1} \]
\[ \%\Delta GDP_{y} = \frac{GDP_{y} - GDP_{y-1}}{GDP_{y-1}} \]
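A minimal sketch of computing these changes with lag(), using made-up GDP numbers (the data frame `gdp` and its values are not real data). The percentage change here follows the usual convention of dividing by the previous year's value, (GDP_y - GDP_{y-1}) / GDP_{y-1}. Rows are sorted by year first so that "previous row" means "previous year".

```r
library(dplyr)

# Hypothetical GDP series; the numbers are invented for illustration
gdp <- data.frame(
  year = 2020:2023,
  GDP  = c(100, 110, 99, 121)
)

gdp_change <- gdp |>
  arrange(year) |>                         # make sure rows are in year order
  mutate(
    GDP_prev      = lag(GDP),              # value from the previous row
    GDP_next      = lead(GDP),             # value from the next row
    delta_GDP     = GDP - lag(GDP),        # yearly change
    pct_delta_GDP = (GDP - lag(GDP)) / lag(GDP)  # change relative to previous year
  )

gdp_change
```

The first row has no previous year, so lag() returns NA there, and the last row has no next year, so lead() returns NA.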
mutate() - ifelse()
ifelse() returns one value where a condition is TRUE and another value where it is FALSE:
ifelse(CONDITION, <if TRUE>, <else>)
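A short sketch of ifelse() inside mutate(), on a hypothetical data frame of scores (the names `df`, `score`, `result` and the cutoff of 60 are assumptions for illustration):

```r
library(dplyr)

# Hypothetical scores, including one missing value
df <- data.frame(score = c(90, 55, 72, NA))

df_graded <- df |>
  mutate(
    # "pass" where the condition is TRUE, "fail" where it is FALSE
    result = ifelse(score >= 60, "pass", "fail")
  )

df_graded
```

Where the condition itself is NA, as in the last row, ifelse() returns NA rather than either branch.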
mutate()
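library(dplyr)

rank_me <- data.frame( x = c(10, 5, 1, 5, 5, NA) )

# Ascending: the smallest value gets rank 1; NA stays NA
rank_me_asce <- rank_me |>
  mutate(x_min_rank = min_rank(x),       # ties share the smallest rank; gaps follow
         x_dense_rank = dense_rank(x),   # ties share a rank; no gaps
         x_row_number = row_number(x),   # ties broken by order of appearance
         x_perc_rank = percent_rank(x) ) # min_rank rescaled to [0, 1]

# Descending: the largest value gets rank 1
rank_me_desc <- rank_me |>
  mutate(x_min_rank = min_rank(-x), # instead of -x, we can use desc(x)
         x_dense_rank = dense_rank(-x),
         x_row_number = row_number(-x),
         x_perc_rank = percent_rank(-x) )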
min_rank(), dense_rank(), row_number(), percent_rank(), and more
mutate()
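# Assumes dplyr is already loaded, e.g. with library(dplyr)
df <- data.frame( x = c(1:10) ) |>
  mutate(x_log = log(x),                  # natural logarithm
         x_log10 = log10(x),              # base-10 logarithm
         x_exp = exp(x),                  # e^x
         x_sqrt = sqrt(x),                # square root
         x_sqrt_round = round(x_sqrt, 2), # round to 2 decimal places
         x_fct = factor(x),               # convert to factor
         x_chr = as.character(x),         # convert to character
         x_num = as.numeric(x),           # convert to numeric (double)
         x_int = as.integer(x) )          # convert to integer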
log(), log10(), exp(), sqrt(), round(VAR, digit), factor(), as.character(), as.numeric(), as.integer(), and more
select(), rename(), relocate(), and mutate()
Let’s do Classwork 10!