Data Preparation and Management with R
September 30, 2024
data.frames with left_join()
Pipe (|>) Operator
tidyverse functions work well with the pipe, |>, because the first argument of a tidyverse function is a data.frame and the output is a data.frame.
The pipe (|>) takes the thing on its left and passes it along to the function on its right, so that f(x, y) is equivalent to x |> f(y).
left_join(DATA.FRAME_1, DATA.FRAME_2) is equivalent to DATA.FRAME_1 |> left_join(DATA.FRAME_2).
A good way to read the pipe (|>) is “then”.
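The equivalence between f(x, y) and x |> f(y) can be checked directly. The sketch below assumes R 4.1 or later (where the native pipe is available) and uses a made-up helper, add_to(), that is not part of the slides or of any package.

```r
# Hypothetical helper, for illustration only: adds y to every element of x.
add_to <- function(x, y) {
  x + y
}

nested <- add_to(c(1, 2, 3), 10)     # f(x, y)
piped  <- c(1, 2, 3) |> add_to(10)   # x |> f(y)

identical(nested, piped)             # TRUE: both forms give the same result

# Reading the pipe as "then": take the numbers, then add 10, then sum.
total <- c(1, 2, 3) |>
  add_to(10) |>
  sum()
total                                # 36
```

Each step in the chain receives the result of the previous step as its first argument, which is why functions that take a data.frame first and return a data.frame chain so easily.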
The pipe (|>) is super useful when we have a chain of data-transforming operations to do.
Pipe (|>) Operator
To use the (native) pipe operator (|>), we should set the option as follows:
data.frames with left_join()
left_join(x, y) keeps all observations in x.
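As a minimal sketch of a left join, the example below uses two made-up data.frames (students and scores are hypothetical, not from the slides) and assumes the dplyr package is installed. Every row of the left data.frame is kept, and rows without a match get a missing value in the columns that come from the right one.

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
students <- data.frame(id = c(1, 2, 3), name = c("Ann", "Ben", "Cal"))
scores   <- data.frame(id = c(1, 3), score = c(90, 75))

# Keep every row of students; attach score where the id matches.
joined <- students |> left_join(scores, by = "id")
joined

# The student with id 2 has no row in scores, so score is missing there.
is.na(joined$score)
```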
NA stands for “not available” (i.e., a missing value).
tidyverse
DATA.FRAME |> filter(LOGICAL_CONDITIONS)
DATA.FRAME |> arrange(VARIABLES)
DATA.FRAME |> select(VARIABLES)
DATA.FRAME |> rename(NEW_VARIABLE = EXISTING_VARIABLE)
DATA.FRAME |> mutate(NEW_VARIABLE = ... )
In each of these functions, the first argument is a data.frame (here passed in by the pipe).
The subsequent arguments describe what to do with the data.frame, mostly using the variable names.
The result is a data.frame.
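Because each of these functions takes a data.frame and returns a data.frame, they can be chained with the pipe. The sketch below assumes dplyr is installed and uses a made-up data.frame, trips, whose name and columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
trips <- data.frame(
  city    = c("A", "B", "C", "D"),
  dist_km = c(120, 45, 300, 80),
  hours   = c(2, 1, 4, 2)
)

result <- trips |>
  filter(dist_km > 50) |>            # keep rows that meet a logical condition
  arrange(dist_km) |>                # sort rows by a variable
  select(city, dist_km, hours) |>    # keep the named variables
  rename(distance = dist_km) |>      # NEW_VARIABLE = EXISTING_VARIABLE
  mutate(speed = distance / hours)   # add a new variable

result
class(result)                        # still a data.frame
```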
filter()
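library(tidyverse)
library(nycflights13)  # assumed source of the flights data.frame
# Flights on January 1
jan1 <- flights |>
  filter(month == 1, day == 1)
# Flights on December 25
dec25 <- flights |>
  filter(month == 12, day == 25)
# A comparison such as month == 1 returns a logical vector
class(flights$month == 1)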
filter() allows us to subset observations based on the value of logical conditions, which are either TRUE or FALSE.
A condition such as flights$month == 1 gives a TRUE or FALSE value for each observation, so its class is logical.
Figure: logical operators. x is the left-hand circle, y is the right-hand circle, and the shaded regions show which parts each operator selects.
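The regions in such a diagram can be reproduced with two short logical vectors. This is only a sketch: it assumes the figure covers the usual R operators &, |, ! and xor(), and the vectors x and y are made up so that every combination of TRUE and FALSE appears once.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

both     <- x & y       # in x and in y
either   <- x | y       # in x, in y, or in both
only_x   <- x & !y      # in x but not in y
just_one <- xor(x, y)   # in exactly one of x and y

both       # TRUE FALSE FALSE FALSE
either     # TRUE  TRUE  TRUE FALSE
only_x     # FALSE  TRUE FALSE FALSE
just_one   # FALSE  TRUE  TRUE FALSE
```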
filter() logical conditions with equality and inequality
filter() logical conditions
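A short sketch of both kinds of condition inside filter() follows. It assumes dplyr is installed; the data.frame delays and its values are made up for illustration, with column names chosen to resemble the flights example.

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
delays <- data.frame(
  month     = c(1, 1, 11, 12, 12),
  dep_delay = c(5, 130, -3, 60, 200)
)

# Equality and inequality: ==, !=, >, >=, <, <=
not_january <- delays |> filter(month != 1)
long_delay  <- delays |> filter(dep_delay >= 120)

# Combining conditions: a comma or & means "and", | means "or"
nov_or_dec <- delays |> filter(month == 11 | month == 12)
same_rows  <- delays |> filter(month %in% c(11, 12))
dec_long   <- delays |> filter(month == 12, dep_delay > 120)

nrow(not_january)   # 3
nrow(long_delay)    # 2
nrow(dec_long)      # 1
```

Writing month == 11 | 12 would not do what it appears to; each side of | needs to be a complete condition, or %in% can be used as above.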