ny_pincp = pd.read_csv('https://bcdanl.github.io/data/NY_pinc_wide.csv')
Classwork 7
Pandas Basics - Reshaping DataFrames; Joining DataFrames; Concatenating Rows and Columns
Part 1 - Reshaping DataFrames
Question 1
The following is the ny_pincp DataFrame:
Make ny_pincp longer.
Answer:
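One way to approach this is `DataFrame.melt()`. The sketch below runs on a small made-up table rather than the real file, so the column names (`fips`, `geoname`, `pincp2018`, `pincp2019`) and all values are assumptions for illustration; adjust `id_vars` to whichever identifier columns `ny_pincp` actually has.

```python
import pandas as pd

# Hypothetical stand-in for ny_pincp: one row per county, one column per year.
# Column names and values are invented for illustration only.
ny_wide = pd.DataFrame({
    'fips': [36001, 36003],
    'geoname': ['Albany', 'Allegany'],
    'pincp2018': [60000, 38000],
    'pincp2019': [62000, 39000],
})

# melt() keeps the id columns and stacks every other column
# into a (year, pincp) pair of columns.
ny_long = ny_wide.melt(
    id_vars=['fips', 'geoname'],
    var_name='year',
    value_name='pincp',
)

# Drop the 'pincp' prefix so that year becomes an integer.
ny_long['year'] = (
    ny_long['year'].str.replace('pincp', '', regex=False).astype(int)
)
print(ny_long)
```

Each county now appears once per year, which is the long form most plotting and grouping tools expect.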
Question 2
covid = pd.read_csv('https://bcdanl.github.io/data/covid19_cases.csv')
The following is the covid DataFrame:
- Make a wide-form DataFrame of covid whose variable names are from countriesAndTerritories and values are from cases.
Answer:
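A possible approach is `DataFrame.pivot()`. The sketch below uses a tiny invented table; `countriesAndTerritories` and `cases` come from the question, while the `date` column used as the row index is an assumption about the real file.

```python
import pandas as pd

# Hypothetical stand-in for covid; the 'date' column name and all
# values are invented for illustration only.
covid_small = pd.DataFrame({
    'date': ['2020-03-01', '2020-03-01', '2020-03-02', '2020-03-02'],
    'countriesAndTerritories': ['Italy', 'Spain', 'Italy', 'Spain'],
    'cases': [10, 5, 12, 7],
})

# pivot() spreads one column's values into new column names and
# fills the cells from another column.
covid_wide = covid_small.pivot(
    index='date',
    columns='countriesAndTerritories',
    values='cases',
)
print(covid_wide)
```

`pivot()` raises an error if an index/column pair occurs more than once; in that case `pivot_table()` with an aggregation function is the usual alternative.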
Part 2 - Joining DataFrames
The CSV files are related to each other, as described above.
flights = pd.read_csv("https://bcdanl.github.io/data/flights.zip")
airlines = pd.read_csv("https://bcdanl.github.io/data/flights-airlines.csv")
airports = pd.read_csv("https://bcdanl.github.io/data/flights-airports.csv")
planes = pd.read_csv("https://bcdanl.github.io/data/flights-planes.csv")
weather = pd.read_csv("https://bcdanl.github.io/data/flights-weather.csv")
Variables in flights DataFrame
- year, month, day: Date of departure.
- dep_time, arr_time: Actual departure and arrival times (format HHMM or HMM), local tz.
- sched_dep_time, sched_arr_time: Scheduled departure and arrival times (format HHMM or HMM), local tz.
- dep_delay, arr_delay: Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.
- carrier: Two-letter carrier abbreviation. See the airlines DataFrame to get full names.
- flight: Flight number.
- tailnum: Plane tail number. See the planes DataFrame for additional metadata.
- origin, dest: Origin and destination. See the airports DataFrame for additional metadata.
- air_time: Amount of time spent in the air, in minutes.
- distance: Distance between airports, in miles.
- hour, minute: Time of scheduled departure broken into hour and minutes.
- time_hour: Scheduled date and hour of the flight as a datetime64. Along with origin, can be used to join flights data to the weather DataFrame.
Question 3
The following is the flights DataFrame:
The following is the weather DataFrame:
Merge the flights and weather DataFrames so that all observations from flights are retained in the resulting DataFrame.
Answer:
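Keeping every row of `flights` suggests a left join. The variable list says `origin` and `time_hour` together link flights to weather, so the sketch below joins on those two keys. It uses small invented tables; the `temp` column is an assumption about what `weather` contains.

```python
import pandas as pd

# Hypothetical miniature versions of flights and weather.
# Values (and the 'temp' column) are invented for illustration only.
flights_small = pd.DataFrame({
    'origin': ['JFK', 'LGA', 'EWR'],
    'time_hour': ['2013-01-01 05:00:00',
                  '2013-01-01 05:00:00',
                  '2013-01-01 06:00:00'],
    'dep_delay': [2.0, 4.0, -1.0],
})
weather_small = pd.DataFrame({
    'origin': ['JFK', 'LGA'],
    'time_hour': ['2013-01-01 05:00:00', '2013-01-01 05:00:00'],
    'temp': [39.0, 39.9],
})

# how='left' keeps every flight; flights without a matching weather
# record get NaN in the weather columns.
merged = flights_small.merge(
    weather_small,
    on=['origin', 'time_hour'],
    how='left',
)
print(merged)
```

If the real tables share further column names beyond the join keys, pandas appends `_x` and `_y` suffixes to tell them apart; the `suffixes` argument of `merge()` controls those labels.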
Question 4
The following is the airlines DataFrame:
Identify the airlines (by full name) that have the top five dep_delay values in the flights DataFrame.
Answer:
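One reading of this question is: take the five flights with the largest `dep_delay`, then look up each carrier's full name. The sketch below does that with `nlargest()` followed by a merge on `carrier`. The tables are invented, and the `name` column holding the full airline name is an assumption about `airlines`.

```python
import pandas as pd

# Hypothetical miniature versions of flights and airlines.
# Delay values and the 'name' column are invented for illustration only.
flights_small = pd.DataFrame({
    'carrier': ['UA', 'DL', 'AA', 'UA', 'B6', 'DL', 'AA'],
    'dep_delay': [5.0, 120.0, 300.0, 45.0, 200.0, 10.0, 90.0],
})
airlines_small = pd.DataFrame({
    'carrier': ['UA', 'DL', 'AA', 'B6'],
    'name': ['United Air Lines Inc.', 'Delta Air Lines Inc.',
             'American Airlines Inc.', 'JetBlue Airways'],
})

# nlargest() returns the five rows with the highest dep_delay,
# sorted from largest to smallest.
top5 = flights_small.nlargest(5, 'dep_delay')

# A left merge attaches the full airline name and preserves that order.
top5_named = top5.merge(airlines_small, on='carrier', how='left')
print(top5_named[['name', 'dep_delay']])
```

The same airline can appear more than once; `top5_named['name'].unique()` would list each airline a single time.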
Question 5
Consider the following two airlines:
- Delta Air Lines Inc. (DL)
- United Air Lines Inc. (UA)
Determine which airline has a higher proportion of flights with a dep_delay greater than 30 minutes.
Hint: numpy.where() is like an if-else statement, which can be very useful.
- The following is an example of numpy.where():
import pandas as pd
import numpy as np
# Sample DataFrame with temperatures
df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Temperature': [12, 22, 7, 25, 30]
})
# Categorize temperatures using numpy.where
df['Category'] = np.where(df['Temperature'] < 10, 'Cold', 'Not Cold')
Answer:
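Following the hint, one approach is to flag each flight with `numpy.where()` and then average the flag within each carrier, since the mean of a 0/1 column is a proportion. The sketch below uses invented delays, so the numbers say nothing about the real answer; the `late_30` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature flights table; values are invented.
flights_small = pd.DataFrame({
    'carrier': ['DL', 'DL', 'DL', 'DL', 'UA', 'UA', 'UA', 'UA', 'AA'],
    'dep_delay': [5.0, 45.0, np.nan, 10.0, 60.0, 31.0, 30.0, -3.0, 90.0],
})

# Keep only the two airlines being compared.
two = flights_small[flights_small['carrier'].isin(['DL', 'UA'])].copy()

# 1 if the departure delay exceeds 30 minutes, else 0.
# A missing dep_delay compares as False, so it is counted as 0 here.
two['late_30'] = np.where(two['dep_delay'] > 30, 1, 0)

# The mean of a 0/1 flag is the share of flagged flights.
prop = two.groupby('carrier')['late_30'].mean()
print(prop)
print(prop.idxmax())
```

Whether flights with a missing `dep_delay` should stay in the denominator is a judgment call; dropping them first with `dropna(subset=['dep_delay'])` gives the proportion among flights with a recorded delay.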
Part 3 - Concatenating Rows and Columns
The following is the student_data1 DataFrame:
student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
    'marks': [200, 210, 190, 222, 199]})
The following is the student_data2 DataFrame:
student_data2 = pd.DataFrame({
    'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
    'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
    'marks': [201, 200, 198, 219, 201]})
Question 6
- Write Pandas code to concatenate the two given DataFrames along rows.
Answer:
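A minimal sketch using `pd.concat()`, with the two DataFrames repeated so the block runs on its own. The result name `stacked` is just a label chosen here.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen',
             'Ed Bernal', 'Kwame Morin'],
    'marks': [200, 210, 190, 222, 199]})
student_data2 = pd.DataFrame({
    'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
    'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse',
             'Kaiser William', 'Madeeha Preston'],
    'marks': [201, 200, 198, 219, 201]})

# axis=0 (the default) stacks rows; ignore_index=True renumbers
# the combined rows 0..9 instead of repeating 0..4 twice.
stacked = pd.concat([student_data1, student_data2], ignore_index=True)
print(stacked)
```

Note that `pd.concat()` does not deduplicate: the IDs `S4` and `S5` appear in both inputs and therefore twice in the result.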
Question 7
- Write Pandas code to concatenate the two given DataFrames along columns.
Answer:
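A minimal sketch using `pd.concat()` with `axis=1`, again repeating the two DataFrames so the block is self-contained. The result name `side_by_side` is just a label chosen here.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen',
             'Ed Bernal', 'Kwame Morin'],
    'marks': [200, 210, 190, 222, 199]})
student_data2 = pd.DataFrame({
    'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
    'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse',
             'Kaiser William', 'Madeeha Preston'],
    'marks': [201, 200, 198, 219, 201]})

# axis=1 places the frames side by side, aligning rows on the index
# (both frames use the default index 0..4).
side_by_side = pd.concat([student_data1, student_data2], axis=1)
print(side_by_side)
```

The result carries duplicate column names (`student_id`, `name`, `marks` twice). Rows are paired by index position, not by `student_id`, so this is not a join.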
Question 8
- Consider the following Pandas Series:
s6 = pd.Series(['S6', 'Scarlette Fisher', 205], index=['student_id', 'name', 'marks'])
Write Pandas code to append s6 to the DataFrame student_data1.
- Hint 1: Consider creating a DataFrame of s6.
pd.DataFrame(s6)
- Hint 2: DataFrame.T returns the transpose of the DataFrame, swapping rows and columns.
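Putting the two hints together, one possible sketch is: turn the Series into a one-column DataFrame, transpose it into a one-row DataFrame, and concatenate that row onto `student_data1`. The names `s6_row` and `appended` are labels chosen here.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen',
             'Ed Bernal', 'Kwame Morin'],
    'marks': [200, 210, 190, 222, 199]})
s6 = pd.Series(['S6', 'Scarlette Fisher', 205],
               index=['student_id', 'name', 'marks'])

# pd.DataFrame(s6) has one column and three rows (Hint 1);
# .T flips it into one row with three columns (Hint 2).
s6_row = pd.DataFrame(s6).T

# The row's column labels now match student_data1, so it can be stacked.
appended = pd.concat([student_data1, s6_row], ignore_index=True)
print(appended)
```

`DataFrame.append()` was removed in pandas 2.0, which is why `pd.concat()` is used. Because the transposed row has `object` dtype, the `marks` column of the result is also `object`; `appended['marks'].astype(int)` restores an integer column if needed.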
Discussion
Welcome to our Classwork 7 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 7.
Whether you want to dig deeper into the material, share insights, or ask questions, this is the place to do it.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 7 materials or need clarification on any points, don’t hesitate to ask here.
All comments will be stored here.
Let’s collaborate and learn from each other!