pandas
Basics - Reshaping DataFrames
February 24, 2025
DataFrames
DataFrames
DataFrames
DataFrame
tidy:
DataFrames
A DataFrame
can be given in a format unsuited for the analysis that we would like to perform on it.
DataFrame
may have larger structural problems that extend beyond the data.DataFrame
stores its values in a format that makes it easy to extract a single row but difficult to aggregate the data.Reshaping a DataFrame
means manipulating it into a different shape.
In this section, we will discuss pandas techniques for molding a DataFrame
into the shape we desire.
DataFrames
DataFrames
measure temperatures in two cities over two days.import pandas as pd
from google.colab import data_table
data_table.enable_dataframe_formatter()
df_wide = pd.DataFrame({
'Weekday': ['Tuesday', 'Wednesday'],
'Miami': [80, 83],
'Rochester': [57, 62],
'St. Louis': [71, 75]
})
df_long = pd.DataFrame({
'Weekday': ['Tuesday', 'Wednesday', 'Tuesday', 'Wednesday', 'Tuesday', 'Wednesday'],
'City': ['Miami', 'Miami', 'Rochester', 'Rochester', 'St. Louis', 'St. Louis'],
'Temperature': [80, 83, 57, 62, 71, 75]
})
DataFrames
DataFrame
can store its values in wide or long format.DataFrame
increases in height.DataFrame
increases in width.DataFrames
DataFrame
depends on the insight we are trying to glean from it.
DataFrames
longer if one variable is spread across multiple columns.DataFrames
wider if one observation is spread across multiple rows.DataFrames
melt()
and pivot()
melt()
makes DataFrame
longer.pivot()
makes DataFrame
wider.DataFrame
Longer with melt()
DataFrame
Longer with melt()
melt()
can take a few parameters:
id_vars
is a container (string
, list
, tuple
, or array
) that represents the variables that will remain as is.id_vars
can indicate which column should be the “identifier”.DataFrame
Longer with melt()
df_wide_to_long = (
df_wide
.melt(id_vars = "Weekday",
var_name = "City",
value_name = "Temperature")
)
melt()
can take a few parameters:
var_name
is a string for the name of the variable whose values are taken from column names in a given wide-form DataFrame.value_name
is a string for the name of the variable whose values are taken from the values in a given wide-form DataFrame.DataFrame
Wider with pivot()
df_long_to_wide = (
df_long
.pivot(index = "Weekday",
columns = "City",
values = "Temperature"
)
.reset_index()
)
pivot()
, we need to specify a few parameters:
index
that takes the column to pivot on;columns
that takes the column to be used to make the variable names of the wider DataFrame
;values
that takes the column that provides the values of the variables in the wider DataFrame
.DataFrames
DataFrame
, df
, containing information about the number of courses each student took from each department in each year.dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
"Department": ["ECON", "DANL", "ECON", "DANL"],
"2018": [1, 2, 3, 1],
"2019": [2, 3, 4, 2],
"2020": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)
df_longer = df.melt(id_vars=["Name", "Department"],
var_name="Year",
value_name="Number of Courses")
pivot()
method can also take a list
of variable names for the index
parameter.DataFrames
DataFrame
, df
, containing information about the number of courses each student took from each department in each year.dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
"Department": ["ECON", "DANL", "ECON", "DANL"],
"2022": [1, 2, 3, 1],
"2023": [2, 3, 4, 2],
"2024": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)
df_longer = df.melt(id_vars=["Name", "Department"],
var_name="Year",
value_name="Number of Courses")
Q. How can we use the df_longer
to create the wide-form DataFrame
, df_wider
, which is equivalent to the df
?
DataFrames
Let’s do Part 1 of Classwork 7!