Classwork 7

Reshaping DataFrames; Joining DataFrames; Data Visualization

Author

Byeong-Hak Choe

Published

March 19, 2024

Modified

March 24, 2024

Part 1 - Pivoting DataFrames

Question 1

  • Make ny_pincp longer.
import pandas as pd

ny_pincp = pd.read_csv('https://bcdanl.github.io/data/NY_pinc_wide.csv')

Answer:
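
One possible approach is sketched below on a toy stand-in rather than the real file. It assumes the wide table has identifier columns plus one income column per year; the names fips, geoname, pincp2018 and pincp2019, as well as all values, are made up for illustration and may not match ny_pincp.

```python
import pandas as pd

# Toy stand-in for ny_pincp: column names and values are hypothetical.
ny_pincp_toy = pd.DataFrame({
    "fips": [1, 2],
    "geoname": ["County A", "County B"],
    "pincp2018": [100, 200],
    "pincp2019": [110, 210],
})

# melt() keeps the id columns and stacks every other column into rows.
ny_pincp_long = ny_pincp_toy.melt(
    id_vars=["fips", "geoname"],
    var_name="year",
    value_name="pincp",
)

# Strip the assumed "pincp" prefix so that year becomes an integer.
ny_pincp_long["year"] = (
    ny_pincp_long["year"].str.replace("pincp", "", regex=False).astype(int)
)

ny_pincp_long = ny_pincp_long.sort_values(["fips", "year"]).reset_index(drop=True)
print(ny_pincp_long)
```

On the real data, id_vars would be whichever columns identify a county, and the prefix-stripping step only applies if the year columns actually carry such a prefix.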


Question 2

  • Make a wide-form DataFrame of covid whose variable names are from countriesAndTerritories and values are from cases.
covid = pd.read_csv('https://bcdanl.github.io/data/covid19_cases.csv')

Answer:
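
A minimal sketch of the reshaping on a toy frame follows. The columns countriesAndTerritories and cases come from the question; the date column used as the row index is an assumption about the file, the country labels are placeholders, and the values are made up.

```python
import pandas as pd

# Toy stand-in for covid: the date column and all values are hypothetical.
covid_toy = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "countriesAndTerritories": ["Country_A", "Country_B", "Country_A", "Country_B"],
    "cases": [1, 2, 3, 4],
})

# pivot() spreads one column's values into new column names.
covid_wide = covid_toy.pivot(
    index="date",
    columns="countriesAndTerritories",
    values="cases",
).reset_index()

# Drop the leftover label on the column axis for a cleaner printout.
covid_wide.columns.name = None
print(covid_wide)
```

pivot() raises an error when an index/column pair appears more than once; in that case pivot_table() with an explicit aggregation function is the usual alternative.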



Part 2 - Joining DataFrames



  • The CSV files are related to each other, as described above.
flights = pd.read_csv("https://bcdanl.github.io/data/flights.zip")
airlines = pd.read_csv("https://bcdanl.github.io/data/airlines.csv")
airports = pd.read_csv("https://bcdanl.github.io/data/airports.csv")
planes = pd.read_csv("https://bcdanl.github.io/data/planes.csv")
weather = pd.read_csv("https://bcdanl.github.io/data/weather.csv")

Variables in flights DataFrame

  • year, month, day

    • Date of departure.
  • dep_time, arr_time

    • Actual departure and arrival times (format HHMM or HMM), local tz.
  • sched_dep_time, sched_arr_time

    • Scheduled departure and arrival times (format HHMM or HMM), local tz.
  • dep_delay, arr_delay

    • Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.
  • carrier

    • Two-letter carrier abbreviation. See airlines DataFrame to get full names.
  • flight

    • Flight number.
  • tailnum

    • Plane tail number. See planes DataFrame for additional metadata.
  • origin, dest

    • Origin and destination. See airports DataFrame for additional metadata.
  • air_time

    • Amount of time spent in the air, in minutes.
  • distance

    • Distance between airports, in miles.
  • hour, minute

    • Time of scheduled departure broken into hour and minutes.
  • time_hour

    • Scheduled date and hour of the flight as a datetime64. Along with origin, it can be used to join the flights data to the weather DataFrame.
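
The HHMM or HMM encoding described above relates to the hour and minute variables through integer division and remainder. The sketch below shows this on a toy Series, assuming the times are stored as integers; the values are made up.

```python
import pandas as pd

# Toy scheduled departure times in HHMM / HMM form (hypothetical values).
sched_dep_time_toy = pd.Series([517, 1545, 5])

# Integer division by 100 gives the hour; the remainder gives the minute.
hour_toy = sched_dep_time_toy // 100
minute_toy = sched_dep_time_toy % 100

print(pd.DataFrame({
    "sched_dep_time": sched_dep_time_toy,
    "hour": hour_toy,
    "minute": minute_toy,
}))
```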


Question 3

Merge flights with weather.

Answer:
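
One way to approach this, following the note that time_hour together with origin links the two tables, is a left join on those two keys. The sketch uses toy frames; the temp column and all values are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins: only the join keys plus one extra column each (hypothetical values).
flights_toy = pd.DataFrame({
    "origin": ["JFK", "LGA", "EWR"],
    "time_hour": ["2013-01-01 05:00:00", "2013-01-01 06:00:00", "2013-01-01 07:00:00"],
    "dep_delay": [2.0, -4.0, 10.0],
})
weather_toy = pd.DataFrame({
    "origin": ["JFK", "LGA"],
    "time_hour": ["2013-01-01 05:00:00", "2013-01-01 06:00:00"],
    "temp": [39.0, 40.0],
})

# A left join keeps every flight, even when no weather record matches.
flights_weather = flights_toy.merge(
    weather_toy,
    on=["origin", "time_hour"],
    how="left",
)
print(flights_weather)
```

If the real weather table shares other column names with flights, merge() will add _x and _y suffixes to them unless those columns are also listed in on or dropped beforehand.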


Question 4

Identify the full name of the airline with the highest average dep_delay, considering only positive delays.

Answer:
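
A possible sketch: keep only positive delays, average them by carrier, then look up the full name in airlines. The toy frames below use made-up carrier codes and values, and the name column in the airlines stand-in is an assumption about how the full name is stored.

```python
import pandas as pd

# Toy stand-ins with hypothetical carrier codes, names, and delays.
flights_delay_toy = pd.DataFrame({
    "carrier": ["XX", "XX", "YY", "YY", "YY"],
    "dep_delay": [10.0, -5.0, 30.0, 20.0, float("nan")],
})
airlines_toy = pd.DataFrame({
    "carrier": ["XX", "YY"],
    "name": ["Toy Airline X", "Toy Airline Y"],
})

# Keep positive delays only; this comparison also drops missing values.
late = flights_delay_toy[flights_delay_toy["dep_delay"] > 0]

# Average delay per carrier, joined to the full airline name, worst first.
avg_delay = (
    late.groupby("carrier", as_index=False)["dep_delay"]
    .mean()
    .merge(airlines_toy, on="carrier", how="left")
    .sort_values("dep_delay", ascending=False)
)

worst = avg_delay.iloc[0]
print(worst["name"], worst["dep_delay"])
```

Filtering before grouping matters here: averaging all delays, including early departures, would answer a different question.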



Discussion

Welcome to our Classwork 7 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 7.

Whether you are looking to delve deeper into the content, share insights, or have questions about the content, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 7 materials or need clarification on any points, don’t hesitate to ask here.

Let’s collaborate and learn from each other!
