left_join()

Classwork 3

Author

Byeong-Hak Choe

Published

September 26, 2024

Modified

October 5, 2024

Question 1

  • Install the nycflights13 R package and load it into your R session in your Posit Cloud project.

Answer:

# Install the nycflights13 package
install.packages("nycflights13")

# Load the package into your R session
library(nycflights13)

Answer: In this step, we first install the nycflights13 package with install.packages(). The package contains datasets on flights that departed from New York City airports in 2013. Once it is installed, library(nycflights13) loads it into the R session, making the flights and airlines data.frames available for analysis.
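Because install.packages() re-downloads the package every time the script runs, it can help to guard the install. The following is a minimal sketch, not part of the original answer; it assumes the session can reach CRAN, and the repos URL is an assumption that can be dropped if a mirror is already configured (as is typically the case in Posit Cloud).

```r
# Install nycflights13 only if it is not already available
# (repos is an assumed CRAN mirror; omit it if one is already configured)
if (!requireNamespace("nycflights13", quietly = TRUE)) {
  install.packages("nycflights13", repos = "https://cloud.r-project.org")
}

# Load the package into the R session
library(nycflights13)

# Quick check that the two data.frames used in this classwork are available
dim(flights)
dim(airlines)
```

If both dim() calls return dimensions rather than an error, the package is loaded correctly.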



Question 2

  • The nycflights13 package provides several data.frames, including flights and airlines, which are related by the carrier variable.
    • carrier: A two-letter abbreviation that identifies the airline.
  • Use the left_join() function to create a new data.frame, flight_airline, that includes all observations and variables from the flights data.frame, along with the name variable from the airlines data.frame that corresponds to the carrier variable in the flights data.frame.

Answer:

library(tidyverse)

# Left-join flights to airlines on the shared key, carrier
# (flights and airlines come from nycflights13, loaded in Question 1)
flight_airline <- flights |> left_join(airlines, by = "carrier")

# View the first few rows of the new data.frame
head(flight_airline)
# A tibble: 6 × 20
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
6  2013     1     1      554            558        -4      740            728
# ℹ 12 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>, name <chr>

Answer: Here, we use the left_join() function from dplyr, one of the packages loaded by tidyverse. left_join() merges two data.frames on a common key variable, in this case carrier, which is present in both flights and airlines. Because airlines contains only carrier and name, the join adds the name variable (the full airline name) to flights while keeping all observations and variables from flights, which is why the result has 20 columns rather than the 19 in flights. The result is stored in flight_airline.
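To see what "keeping all observations" means in practice, the sketch below runs the same kind of join on two small made-up tables. The names toy_flights, toy_airlines, and toy_joined, and the carrier code "ZZ", are hypothetical and used only for illustration; they are not part of nycflights13.

```r
library(dplyr)

# Hypothetical toy tables standing in for flights and airlines
toy_flights <- tibble(
  flight  = c(1545L, 1714L, 1141L, 9999L),
  carrier = c("UA", "UA", "AA", "ZZ")  # "ZZ" has no match in toy_airlines
)
toy_airlines <- tibble(
  carrier = c("AA", "UA", "DL"),       # "DL" never appears in toy_flights
  name    = c("American Airlines Inc.",
              "United Air Lines Inc.",
              "Delta Air Lines Inc.")
)

# Left join: every row of toy_flights is kept; name is filled in where
# carrier matches and set to NA where it does not
toy_joined <- toy_flights |>
  left_join(toy_airlines, by = "carrier")

toy_joined

# The row count is unchanged, and the unmatched carrier gets a missing name
nrow(toy_joined) == nrow(toy_flights)
is.na(toy_joined$name)
```

The row for "ZZ" stays in the result with a missing name, while "DL" does not appear at all because it exists only in the right-hand table. This is the behavior that distinguishes left_join() from inner_join(), which would drop the "ZZ" row.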



Discussion

Welcome to our Classwork 3 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 3.

Whether you want to delve deeper into the content, share insights, or ask questions, this is the perfect place for you.

If you have specific questions for Byeong-Hak (@bcdanl) or a classmate (@GitHub-Username) about the Classwork 3 materials, or need clarification on any point, don’t hesitate to ask here.

Let’s collaborate and learn from each other!
