Classwork 7
Reshaping DataFrames; Joining DataFrames; Data Visualization
Part 1 - Pivoting DataFrames
Question 1
- Make ny_pincp longer.
import pandas as pd
ny_pincp = pd.read_csv('https://bcdanl.github.io/data/NY_pinc_wide.csv')
Answer:
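One possible approach is DataFrame.melt(), which turns a set of columns into rows. The exact columns of NY_pinc_wide.csv are not listed here, so this sketch uses a tiny made-up DataFrame: the column names county, year and pincp and the income values are all hypothetical. For the real data, pass its identifier column(s) as id_vars.

```python
import pandas as pd

# Hypothetical wide-form stand-in for ny_pincp: one row per county and
# one column per year (made-up names and values; the real columns may differ).
ny_wide = pd.DataFrame({
    "county": ["Albany", "Bronx"],
    "2019": [60000, 40000],
    "2020": [62000, 41000],
})

# melt() stacks the year columns into a key column ("year") and a value
# column ("pincp"), repeating the identifier column "county" for each row.
ny_long = ny_wide.melt(
    id_vars="county",
    var_name="year",
    value_name="pincp",
)
print(ny_long)
```

The result has one row per county-year pair. Note that the new year column holds the old column labels as strings; convert it with astype(int) if you need numeric years.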
Question 2
- Make a wide-form DataFrame of covid whose variable names come from countriesAndTerritories and whose values come from cases.
covid = pd.read_csv('https://bcdanl.github.io/data/covid19_cases.csv')
Answer:
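A minimal sketch of the reshaping with DataFrame.pivot(), run on a made-up long-form frame. The date column name "date" is an assumption (check covid.columns for the actual name of the date variable), and the case counts are invented.

```python
import pandas as pd

# Hypothetical long-form stand-in for covid: one row per date and country.
# The "date" column name and the case counts are made up for illustration.
covid_toy = pd.DataFrame({
    "date": ["2020-12-01", "2020-12-01", "2020-12-02", "2020-12-02"],
    "countriesAndTerritories": ["France", "Italy", "France", "Italy"],
    "cases": [100, 200, 110, 210],
})

# pivot() makes each country its own column, filled with the "cases" values.
# It raises a ValueError if a (date, country) pair appears more than once.
covid_wide = covid_toy.pivot(
    index="date",
    columns="countriesAndTerritories",
    values="cases",
)
print(covid_wide)
```

If the real data contains duplicate date-country pairs, pivot_table() with an explicit aggfunc is the usual alternative. Calling reset_index() afterwards turns the date index back into an ordinary column.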
Part 2 - Joining DataFrames
- The CSV files are related to each other, as described above.
flights = pd.read_csv("https://bcdanl.github.io/data/flights.zip")
airlines = pd.read_csv("https://bcdanl.github.io/data/airlines.csv")
airports = pd.read_csv("https://bcdanl.github.io/data/airports.csv")
planes = pd.read_csv("https://bcdanl.github.io/data/planes.csv")
weather = pd.read_csv("https://bcdanl.github.io/data/weather.csv")
Variables in the flights DataFrame
year, month, day
- Date of departure.
dep_time, arr_time
- Actual departure and arrival times (format HHMM or HMM), in the local time zone.
sched_dep_time, sched_arr_time
- Scheduled departure and arrival times (format HHMM or HMM), in the local time zone.
dep_delay, arr_delay
- Departure and arrival delays, in minutes. Negative values represent early departures/arrivals.
carrier
- Two-letter carrier abbreviation. See the airlines DataFrame to get full names.
flight
- Flight number.
tailnum
- Plane tail number. See the planes DataFrame for additional metadata.
origin, dest
- Origin and destination. See the airports DataFrame for additional metadata.
air_time
- Amount of time spent in the air, in minutes.
distance
- Distance between airports, in miles.
hour, minute
- Time of scheduled departure broken into hour and minutes.
time_hour
- Scheduled date and hour of the flight as a datetime64. Along with origin, it can be used to join the flights data to the weather DataFrame.
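The HHMM/HMM time format is plain integer arithmetic: 515 means 5:15 and 1030 means 10:30. A minimal sketch on made-up values shows how hour and minute can be recovered from sched_dep_time; that this is exactly how the dataset's own hour and minute columns were built is an assumption, though it matches their description.

```python
import pandas as pd

# Made-up scheduled departure times stored as HHMM / HMM integers.
times = pd.DataFrame({"sched_dep_time": [515, 1030, 2359]})

# Integer division by 100 gives the hour; the remainder gives the minutes.
times["hour"] = times["sched_dep_time"] // 100
times["minute"] = times["sched_dep_time"] % 100
print(times)
```

Also note that pd.read_csv() does not parse dates by default, so time_hour is read as text; pass parse_dates=["time_hour"] if you need an actual datetime64 column. For joining flights to weather, matching text on both sides works as well.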
Question 3
Merge flights with weather.
Answer:
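A hedged sketch of one way to do this: join on origin and time_hour, the keys named in the variable list. The rows and the temp values are made up, and how="left" is a choice (keep every flight, even if no weather record matches), not the only valid answer.

```python
import pandas as pd

# Made-up stand-ins for flights and weather.
flights_toy = pd.DataFrame({
    "flight": [101, 102, 103],
    "origin": ["EWR", "JFK", "LGA"],
    "time_hour": ["2013-01-01 05:00:00", "2013-01-01 05:00:00", "2013-01-01 06:00:00"],
})
weather_toy = pd.DataFrame({
    "origin": ["EWR", "JFK"],
    "time_hour": ["2013-01-01 05:00:00", "2013-01-01 05:00:00"],
    "temp": [39.0, 39.9],
})

# Left join: keep every flight and attach the weather recorded at its
# origin airport in its scheduled hour; unmatched flights get NaN.
flights_weather = flights_toy.merge(weather_toy, on=["origin", "time_hour"], how="left")
print(flights_weather)
```

If the real weather DataFrame also has year, month, day and hour columns, merging only on origin and time_hour will produce duplicated columns with _x and _y suffixes. Either add those columns to on= or drop them from weather before merging.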
Question 4
Identify the full name of the airline with the highest average dep_delay, considering only positive delays.
Answer:
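The steps are: filter to positive delays, average by carrier, attach full names from airlines, and pick the maximum. This sketch uses made-up rows, and the name column in airlines is an assumption (it matches nycflights13), so check airlines.columns first.

```python
import pandas as pd

# Made-up stand-ins for flights and airlines.
flights_toy = pd.DataFrame({
    "carrier": ["AA", "AA", "UA", "UA", "B6"],
    "dep_delay": [10.0, -5.0, 30.0, 20.0, 15.0],
})
airlines_toy = pd.DataFrame({
    "carrier": ["AA", "UA", "B6"],
    "name": ["American Airlines Inc.", "United Air Lines Inc.", "JetBlue Airways"],
})

# 1. Keep only positive departure delays (late departures).
late = flights_toy[flights_toy["dep_delay"] > 0]

# 2. Average the remaining delays by carrier.
avg_delay = late.groupby("carrier", as_index=False)["dep_delay"].mean()

# 3. Attach full airline names, then take the row with the largest mean.
avg_delay = avg_delay.merge(airlines_toy, on="carrier", how="left")
top = avg_delay.loc[avg_delay["dep_delay"].idxmax()]
print(top["name"], top["dep_delay"])
```

Because any comparison with NaN is False, flights with a missing dep_delay (for example, cancelled flights) are also dropped in step 1.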
Discussion
Welcome to our Classwork 7 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 7.
Whether you want to dig deeper into the material, share insights, or ask questions, this is the perfect place for you.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 7 materials or need clarification on any points, don’t hesitate to ask here.
Let’s collaborate and learn from each other!