library(tidyverse)
flights <- nycflights13::flights
# Flights with an arrival delay of two or more hours (arr_delay is measured in minutes)
delayed_flights <- flights |> filter(arr_delay >= 120)
# Flights to Houston (IAH or HOU)
houston_flights <- flights |> filter(dest == "IAH" | dest == "HOU")
# Flights that departed in summer (July, August, September)
summer_flights <- flights |> filter(month == 7 | month == 8 | month == 9)
# Flights that arrived more than two hours late but didn’t leave late
late_arrival_on_time_departure <- flights |> filter(arr_delay > 120 & dep_delay <= 0)
# Flights that departed between midnight and 6am (inclusive)
early_morning_flights <- flights |> filter((dep_time >= 0 & dep_time <= 600) | dep_time == 2400)
filter(), arrange(), and distinct()
Classwork 4
Question 1
- Find all flights that had an arrival delay of two or more hours
- Find all flights that flew to Houston (IAH or HOU)
- Find all flights that departed in summer (July, August, and September)
- Find all flights that arrived more than two hours late, but didn’t leave late
- Find all flights that departed between midnight and 6am (inclusive)
Answer:
- filter(arr_delay >= 120): Finds flights with an arrival delay of two or more hours.
- filter(dest == "IAH" | dest == "HOU"): Filters flights flying to Houston by checking whether the dest variable matches “IAH” or “HOU”.
- filter(month == 7 | month == 8 | month == 9): Filters flights on the month variable for July, August, and September.
- filter(arr_delay > 120 & dep_delay <= 0): Filters flights that arrived more than two hours late but left on time or early.
- filter((dep_time >= 0 & dep_time <= 600) | dep_time == 2400): Filters flights departing between midnight and 6am (using military time). The condition dep_time == 2400 is included because, in this data.frame, midnight is represented as 2400 rather than 0.
- This last question involves more advanced techniques, so it won’t be included in the exams at this level.
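The repeated `|` comparisons in these answers can also be written more compactly with `%in%` (base R) and `between()` (dplyr). The sketch below is only an illustration: `toy_flights` is a small made-up tibble, not the real `flights` data, so the results say nothing about the actual dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for nycflights13::flights
toy_flights <- tibble(
  dest     = c("IAH", "HOU", "ORD", "IAH", "LAX"),
  month    = c(7, 1, 8, 9, 12),
  dep_time = c(530, 2400, 601, NA, 1)
)

# Same condition as dest == "IAH" | dest == "HOU"
houston_toy <- toy_flights |> filter(dest %in% c("IAH", "HOU"))

# Same condition as month == 7 | month == 8 | month == 9
summer_toy <- toy_flights |> filter(month %in% 7:9)

# between(x, left, right) is shorthand for x >= left & x <= right
early_toy <- toy_flights |>
  filter(between(dep_time, 0, 600) | dep_time == 2400)
```

Note that filter() keeps only rows where the condition is TRUE, so the row with a missing dep_time is dropped from early_toy.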
Question 2
- How many flights have a missing dep_time?
Answer:
missing_dep_time_flights <- flights |> filter(is.na(dep_time))
n_missing_dep_time <- nrow(missing_dep_time_flights)
n_missing_dep_time
[1] 8255
We use filter(is.na(dep_time)) to find flights where dep_time is missing, and nrow() to count the number of such flights.
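As a cross-check, the same count can be obtained without storing an intermediate data.frame, because is.na() returns a logical vector and sum() counts its TRUE values. This is a minimal sketch on a made-up tibble (`toy_flights` is hypothetical, not the real data):

```r
library(dplyr)

# Hypothetical toy data: three of the five departure times are missing
toy_flights <- tibble(dep_time = c(517, NA, 554, NA, NA))

# Approach used in the answer: filter the rows, then count them
n_by_filter <- toy_flights |>
  filter(is.na(dep_time)) |>
  nrow()

# Alternative: sum a logical vector inside summarize()
n_by_sum <- toy_flights |>
  summarize(n_missing = sum(is.na(dep_time))) |>
  pull(n_missing)
```

Both approaches should agree; applied to flights, either one would be expected to reproduce the count shown in the answer.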
Question 3
- Sort flights to find the most delayed flights.
Answer:
# either dep_delay, arr_delay, or both can be used for this task
most_delayed_flights <- flights |> arrange(desc(dep_delay))
arrange(desc(dep_delay)) sorts flights in descending order of departure delay (dep_delay), placing the flights with the longest departure delays at the top.
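Since either delay variable (or both) can be used, one possible variant sorts by arrival delay first and breaks ties with departure delay. The sketch below assumes a small made-up tibble, `toy_flights`, with a hypothetical `flight` label column; it is not the real data.

```r
library(dplyr)

# Hypothetical toy data; flight C has missing delays
toy_flights <- tibble(
  flight    = c("A", "B", "C", "D"),
  dep_delay = c(10, 90, NA, 60),
  arr_delay = c(30, 120, NA, 120)
)

# Sort by arrival delay (largest first), then by departure delay for ties
most_delayed_toy <- toy_flights |>
  arrange(desc(arr_delay), desc(dep_delay))
```

B and D tie on arr_delay, so dep_delay decides their order. arrange() places rows with missing values at the end, even inside desc(), so flight C comes last.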
Question 4
- Was there a flight on every day of 2013?
Answer:
# Check that the year variable has only one unique value, 2013.
flights |> distinct(year) # a tibble is the tidyverse's version of a data.frame
# A tibble: 1 × 1
   year
  <int>
1  2013
flights_per_day <- flights |> distinct(month, day)
num_days_2013 <- nrow(flights_per_day)
num_days_2013 == 365
[1] TRUE
- We use distinct(month, day) to find the unique combinations of month and day, and then count the number of distinct days using nrow(). We check whether this equals 365 to determine if there was a flight on every day of the year.
- nrow() returns the number of observations in a data.frame.
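The comparison with 365 works because 2013 is not a leap year. If the check returned FALSE, a natural follow-up would be to list which days have no flights. One way to sketch that is to compare the observed days against a full calendar. The example below uses a made-up tibble, `toy_flights`, covering only the first five days of January, so the result is illustrative only.

```r
library(dplyr)

# Hypothetical toy data: flights on 4 of the first 5 days of January 2013
toy_flights <- tibble(
  year  = 2013L,
  month = 1L,
  day   = c(1L, 1L, 2L, 4L, 5L)
)

# Every calendar day in the period being checked
all_days <- seq(as.Date("2013-01-01"), as.Date("2013-01-05"), by = "day")

# Distinct days with at least one flight, converted to Date
flight_days <- toy_flights |>
  distinct(year, month, day) |>
  mutate(date = as.Date(sprintf("%d-%02d-%02d", year, month, day))) |>
  pull(date)

# Calendar days on which no flight appears
missing_days <- all_days[!all_days %in% flight_days]
```

For the full year, the calendar would run to "2013-12-31", and an empty missing_days would mean there was a flight on every day.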
Discussion
Welcome to our Classwork 4 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 4.
Whether you want to delve deeper into the content, share insights, or ask questions, this is the perfect place for you.
If you have specific questions for Byeong-Hak (@bcdanl) or a classmate (@GitHub-Username) about the Classwork 4 materials, or need clarification on any point, don’t hesitate to ask here.
Let’s collaborate and learn from each other!