Pandas Fundamentals V: Reshaping and Joining DataFrames
April 20, 2026
DataFramesDataFrame
A DataFrame is tidy if it follows three rules:
A tidy DataFrame keeps your data organized, making it easier to understand, analyze, and share in any data analysis.
DataFrames?A DataFrame can be given in a format unsuited for the analysis that we would like to perform on it.
DataFrame may have larger structural problems that extend beyond the data.DataFrame stores its values in a format that makes it easy to extract a single row but difficult to aggregate the data.Reshaping a DataFrame means manipulating it into a different shape.
In this section, we will discuss pandas techniques for molding a DataFrame into the shape we desire.
DataFrames: An ExampleDataFrames measure temperatures in two cities over two days.import pandas as pd
from google.colab import data_table
data_table.enable_dataframe_formatter()
df_wide = pd.DataFrame({
'Weekday': ['Tuesday', 'Wednesday'],
'Miami': [80, 83],
'Rochester': [57, 62],
'St. Louis': [71, 75]
})
df_long = pd.DataFrame({
'Weekday': ['Tuesday', 'Wednesday', 'Tuesday', 'Wednesday', 'Tuesday', 'Wednesday'],
'City': ['Miami', 'Miami', 'Rochester', 'Rochester', 'St. Louis', 'St. Louis'],
'Temperature': [80, 83, 57, 62, 71, 75]
})DataFrames: Height vs. WidthDataFrame can store its values in wide or long format.DataFrame increases in height.DataFrame increases in width.DataFrames: Choosing the Right FormatDataFrame depends on the insight we are trying to glean from it.
DataFrames longer if one variable is spread across multiple columns.DataFrames wider if one observation is spread across multiple rows.melt() and pivot(): Overview
melt() makes DataFrame longer.pivot() makes DataFrame wider.DataFrame Longer with melt(): BasicsDataFrame Longer with melt(): id_varsmelt() can take a few parameters:
id_vars is a container (string, list, tuple, or array) that represents the variables that will remain as is.id_vars can indicate which column should be the “identifier”.DataFrame Longer with melt(): var_name and value_namedf_wide_to_long = (
df_wide
.melt(id_vars = "Weekday",
var_name = "City",
value_name = "Temperature")
)melt() can take a few parameters:
var_name is a string for the name of the variable whose values are taken from column names in a given wide-form DataFrame.value_name is a string for the name of the variable whose values are taken from the values in a given wide-form DataFrame.DataFrame Wider with pivot()df_long_to_wide = (
df_long
.pivot(index = "Weekday",
columns = "City",
values = "Temperature"
)
.reset_index()
)pivot(), we need to specify a few parameters:
index that takes the column to pivot on;columns that takes the column to be used to make the variable names of the wider DataFrame;values that takes the column that provides the values of the variables in the wider DataFrame.DataFrameDataFrame, df, containing information about the number of courses each student took from each department in each year.dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
"Department": ["ECON", "DANL", "ECON", "DANL"],
"2018": [1, 2, 3, 1],
"2019": [2, 3, 4, 2],
"2020": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)
df_longer = df.melt(id_vars=["Name", "Department"],
var_name="Year",
value_name="Number of Courses")pivot() method can also take a list of variable names for the index parameter.DataFrame, df, containing information about the number of courses each student took from each department in each year.dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
"Department": ["ECON", "DANL", "ECON", "DANL"],
"2022": [1, 2, 3, 1],
"2023": [2, 3, 4, 2],
"2024": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)
df_longer = df.melt(id_vars=["Name", "Department"],
var_name="Year",
value_name="Number of Courses")Q. How can we use the df_longer to create the wide-form DataFrame, df_wider, which is equivalent to the df?
Let’s do Classwork 15!
DataFramesDataFrame for county-level data and DataFrame for geographic information, such as longitude and latitude.DataFrames based on common data values in those DataFrames.
merge() method in Pandas.DataFrames are called keys.DataFrames with merge(): Setup
An inner join matches pairs of observations whenever their keys are equal:
A left join keeps all observations in x.
A right join keeps all observations in y.
A full join keeps all observations in x and y.
One DataFrame has duplicate keys (a one-to-many relationship).
Both DataFrames have duplicate keys (many-to-many relationship).
left_on and right_onIf the left and right columns do not have the same name for the key variables, we can use the left_on and right_on parameters instead.
Let’s do Classwork 16!