Lecture 12

pandas Basics - Reshaping DataFrames

Byeong-Hak Choe

bchoe@geneseo.edu

SUNY Geneseo

February 24, 2025

Reshaping `DataFrames`

Tidy `DataFrames`

There are three interrelated rules that make a DataFrame tidy:
- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.

Reshaping `DataFrames`

A DataFrame can be given in a format unsuited for the analysis that we would like to perform on it.
- A DataFrame may have larger structural problems that extend beyond the data.
- Perhaps the DataFrame stores its values in a format that makes it easy to extract a single row but difficult to aggregate the data.
Reshaping a DataFrame means manipulating it into a different shape.
In this section, we will discuss pandas techniques for molding a DataFrame into the shape we desire.

Long vs. Wide `DataFrames`

The following DataFrames measure temperatures in two cities over two days.

import pandas as pd
from google.colab import data_table
data_table.enable_dataframe_formatter()

df_wide = pd.DataFrame({
    'Weekday': ['Tuesday', 'Wednesday'],
    'Miami': [80, 83],
    'Rochester': [57, 62],
    'St. Louis': [71, 75]
})

df_long = pd.DataFrame({
    'Weekday': ['Tuesday', 'Wednesday', 'Tuesday', 'Wednesday', 'Tuesday', 'Wednesday'],
    'City': ['Miami', 'Miami', 'Rochester', 'Rochester', 'St. Louis', 'St. Louis'],
    'Temperature': [80, 83, 57, 62, 71, 75]
})

Long vs. Wide `DataFrames`

A DataFrame can store its values in wide or long format.
These names reflect the direction in which the data set expands as we add more values to it.
- A long DataFrame increases in height.
- A wide DataFrame increases in width.

Long vs. Wide `DataFrames`

The optimal storage format for a DataFrame depends on the insight we are trying to glean from it.
- We consider making DataFrames longer if one variable is spread across multiple columns.
- We consider making DataFrames wider if one observation is spread across multiple rows.

Reshaping `DataFrames`

`melt()` and `pivot()`

melt() makes DataFrame longer.
pivot() makes DataFrame wider.

Make `DataFrame` Longer with `melt()`

df_wide_to_long = (
    df_wide
    .melt()
)

Make `DataFrame` Longer with `melt()`

df_wide_to_long = (
    df_wide
    .melt(id_vars = "Weekday")
)

melt() can take a few parameters:
- id_vars is a container (string, list, tuple, or array) that represents the variables that will remain as is.
- id_vars can indicate which column should be the “identifier”.

Make `DataFrame` Longer with `melt()`

df_wide_to_long = (
    df_wide
    .melt(id_vars = "Weekday",
          var_name = "City",
          value_name = "Temperature")
)

melt() can take a few parameters:
- var_name is a string for the name of the variable whose values are taken from column names in a given wide-form DataFrame.
- value_name is a string for the name of the variable whose values are taken from the values in a given wide-form DataFrame.

Make `DataFrame` Wider with `pivot()`

df_long_to_wide = (
    df_long
    .pivot(index = "Weekday",
           columns = "City",
           values = "Temperature"  
        )
    .reset_index()
    )

When using pivot(), we need to specify a few parameters:
- index that takes the column to pivot on;
- columns that takes the column to be used to make the variable names of the wider DataFrame;
- values that takes the column that provides the values of the variables in the wider DataFrame.

Reshaping `DataFrames`

Let’s consider the following wide-form DataFrame, df, containing information about the number of courses each student took from each department in each year.

dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
             "Department": ["ECON", "DANL", "ECON", "DANL"],
             "2018": [1, 2, 3, 1],
             "2019": [2, 3, 4, 2],
             "2020": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)

df_longer = df.melt(id_vars=["Name", "Department"], 
                    var_name="Year", 
                    value_name="Number of Courses")

The pivot() method can also take a list of variable names for the index parameter.

Reshaping `DataFrames`

Let’s consider the following wide-form DataFrame, df, containing information about the number of courses each student took from each department in each year.

dict_data = {"Name": ["Donna", "Donna", "Mike", "Mike"],
             "Department": ["ECON", "DANL", "ECON", "DANL"],
             "2022": [1, 2, 3, 1],
             "2023": [2, 3, 4, 2],
             "2024": [5, 1, 2, 2]}
df = pd.DataFrame(dict_data)

df_longer = df.melt(id_vars=["Name", "Department"], 
                    var_name="Year", 
                    value_name="Number of Courses")

Q. How can we use the df_longer to create the wide-form DataFrame, df_wider, which is equivalent to the df?

Reshaping `DataFrames`

Let’s do Part 1 of Classwork 7!

Lecture 12

Reshaping DataFrames

Reshaping DataFrames

Tidy DataFrames

Reshaping DataFrames

Long vs. Wide DataFrames

Long vs. Wide DataFrames

Long vs. Wide DataFrames

Reshaping DataFrames

melt() and pivot()

Make DataFrame Longer with melt()

Make DataFrame Longer with melt()

Make DataFrame Longer with melt()

Make DataFrame Wider with pivot()

Reshaping DataFrames

Reshaping DataFrames

Reshaping DataFrames

Reshaping `DataFrames`

Reshaping `DataFrames`

Tidy `DataFrames`

Reshaping `DataFrames`

Long vs. Wide `DataFrames`

Long vs. Wide `DataFrames`

Long vs. Wide `DataFrames`

Reshaping `DataFrames`

`melt()` and `pivot()`

Make `DataFrame` Longer with `melt()`

Make `DataFrame` Longer with `melt()`

Make `DataFrame` Longer with `melt()`

Make `DataFrame` Wider with `pivot()`

Reshaping `DataFrames`

Reshaping `DataFrames`

Reshaping `DataFrames`