pandas Basics - Reshaping DataFrames
February 24, 2025
DataFramesDataFramesDataFrames
DataFrame tidy:
DataFramesA DataFrame can be given in a format unsuited for the analysis that we would like to perform on it.
DataFrame may have larger structural problems that extend beyond the data.DataFrame stores its values in a format that makes it easy to extract a single row but difficult to aggregate the data.Reshaping a DataFrame means manipulating it into a different shape.
In this section, we will discuss pandas techniques for molding a DataFrame into the shape we desire.
DataFramesDataFrames measure temperatures in two cities over two days.import pandas as pd
from google.colab import data_table
data_table.enable_dataframe_formatter()
df_wide = pd.DataFrame({
'Weekday': ['Tuesday', 'Wednesday'],
'Miami': [80, 83],
'Rochester': [57, 62],
'St. Louis': [71, 75]
})
df_long = pd.DataFrame({
'Weekday': ['Tuesday', 'Wednesday', 'Tuesday', 'Wednesday', 'Tuesday', 'Wednesday'],
'City': ['Miami', 'Miami', 'Rochester', 'Rochester', 'St. Louis', 'St. Louis'],
'Temperature': [80, 83, 57, 62, 71, 75]
})DataFramesDataFrame can store its values in wide or long format.DataFrame increases in height.DataFrame increases in width.DataFramesDataFrame depends on the insight we are trying to glean from it.
DataFrames longer if one variable is spread across multiple columns.DataFrames wider if one observation is spread across multiple rows.DataFramesmelt() and pivot()
melt() makes DataFrame longer.pivot() makes DataFrame wider.DataFrame Longer with melt()DataFrame Longer with melt()melt() can take a few parameters:
id_vars is a container (string, list, tuple, or array) that represents the variables that will remain as is.id_vars can indicate which column should be the “identifier”.DataFrame Longer with melt()df_wide_to_long = (
df_wide
.melt(id_vars = "Weekday",
var_name = "City",
value_name = "Temperature")
)melt() can take a few parameters:
var_name is a string for the name of the variable whose values are taken from column names in a given wide-form DataFrame.value_name is a string for the name of the variable whose values are taken from the values in a given wide-form DataFrame.DataFrame Wider with pivot()df_long_to_wide = (
df_long
.pivot(index = "Weekday",
columns = "City",
values = "Temperature"
)
.reset_index()
)pivot(), we need to specify a few parameters:
index that takes the column to pivot on;columns that takes the column to be used to make the variable names of the wider DataFrame;values that takes the column that provides the values of the variables in the wider DataFrame.DataFramesDataFrame, df, containing information about the number of courses each student took from each department in each year.dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
"Department": ["ECON", "DANL", "ECON", "DANL"],
"2018": [1, 2, 3, 1],
"2019": [2, 3, 4, 2],
"2020": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)
df_longer = df.melt(id_vars=["Name", "Department"],
var_name="Year",
value_name="Number of Courses")pivot() method can also take a list of variable names for the index parameter.DataFramesDataFrame, df, containing information about the number of courses each student took from each department in each year.dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
"Department": ["ECON", "DANL", "ECON", "DANL"],
"2022": [1, 2, 3, 1],
"2023": [2, 3, 4, 2],
"2024": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)
df_longer = df.melt(id_vars=["Name", "Department"],
var_name="Year",
value_name="Number of Courses")Q. How can we use the df_longer to create the wide-form DataFrame, df_wider, which is equivalent to the df?
DataFramesLet’s do Part 1 of Classwork 7!