Classwork 17

Pandas Fundamental VI: Group Operation 1

Author

Byeong-Hak Choe

Published

May 4, 2026

Modified

May 8, 2026

Direction

The dataset ,cereals_oatmeal.csv,(with its pathname https://bcdanl.github.io/data/cereals_oatmeal.csv) is a listing of 77 popular breakfast cereals and oatmeal.

import pandas as pd 
import numpy as np

# Below is for an interactive display of DataFrame in Colab
from google.colab import data_table
data_table.enable_dataframe_formatter()

cereal = pd.read_csv('https://bcdanl.github.io/data/cereals_oatmeal.csv')

Question 1

Group the data by Manufacturer, and determine the number of groups and the number of cereals per group.

Show answer

cereal['Manufacturer'].nunique()

len(
    cereal.groupby('Manufacturer')
)

(
    cereal
    .groupby('Manufacturer')
    .size()
)

(
    cereal['Manufacturer']
    .value_counts()
)

Question 2

Calculate the mean of the Calories, Fiber, and Sugars for every manufacturer.

Show answer

q2 = (
    cereal
    .groupby('Manufacturer')
    .agg(
        Calories_mean = ("Calories", "mean"),
        Fiber_mean = ("Fiber", "mean"),
        Sugars_mean = ("Sugars", "mean"),

    )
)

q2

Question 3

Create a new DataFrame that includes the maximum Sugars and the minimum Fiber per manufacturer.

Show answer

q3 = (
    cereal
    .groupby('Manufacturer')
    .agg(
        Sugars_max = ("Sugars", "max"),
        Fiber_min = ("Fiber", "min")
    )
)

q3

Question 4

Add a Normalized_Sugars variable to the cereal DataFrame that standardizes each cereal’s sugar content within its manufacturer group.

\[ \text{Normalized\_Sugars} = \frac{\text{Sugars} - \text{mean(Sugars)}}{\text{std(Sugars)}} \]

This formula adjusts the sugar content of each product by subtracting the mean sugar content of its manufacturer and then dividing by the standard deviation of the sugar content within its manufacturer.

Show answer

cereal['Normalized_Sugars'] = (
    cereal
    .groupby('Manufacturer')['Sugars']
    .transform(
        lambda s: ( s - s.mean() ) / s.std()
    )
)

cereal

Question 5

Put the two highest-sugar cereals for every manufacturer in a new DataFrame.

Show answer

q5 = (
    cereal
    .groupby('Manufacturer')
    .apply(
        lambda df: df.nlargest(2, "Sugars", keep = "all")
    )
)

q5

Question 6

Compute the correlation between Calories and Sugars for each manufacturer.
- corr() calculates the correlation between SERIES_1 and SERIES_2.

cereal["Calories"].corr( cereal["Sugars"] ) # returns a correlation value
cereal[["Calories", "Sugars"]].corr() # returns a DataFrame of correlation matrix
cereal[["Calories", "Sugars", "Fiber"]].corr() # returns a DataFrame of correlation matrix

Show answer

q6 = (
    cereal
    .groupby('Manufacturer')[["Calories", "Sugars"]]
    .corr()
)

q6

Question 7

For each manufacturer, choose one “representative cereal” using these rules:

First, calculate the manufacturer’s average calories.
Keep only cereals with calories below the manufacturer’s average.
If a manufacturer has no cereals below its average calories, use all cereals from that manufacturer instead.
Among the remaining cereals, choose the cereal with the highest fiber.
If there is a tie in fiber, choose the cereal with the lowest sugars.
If there is still a tie, choose the cereal with the alphabetically first name.
Return the full row for the selected cereal.

Show answer

def rep_cereal(df):
  avg = df['Calories'].mean()

  candidate =  df[ df['Calories'] < avg ]

  if len(candidate) == 0:
    candidate = df

  candidate = candidate.sort_values(['Fiber', 'Sugars', 'Name'],
                                    ascending = [False, True, True]).iloc[0]

  return candidate

q7 = (
    cereal
    .groupby('Manufacturer')
    .apply(rep_cereal)
)

q7

Discussion

Welcome to our Classwork 17 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 17.

Whether you are looking to delve deeper into the content, share insights, or have questions about the content, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 17 materials or need clarification on any points, don’t hesitate to ask here.

Let’s collaborate and learn from each other!