Classwork 14

Group Operation II

Author

Byeong-Hak Choe

Published

May 5, 2025

Modified

May 7, 2025

Direction



The dataset, beer_markets.csv (available at https://bcdanl.github.io/data/beer_markets.csv), includes information about beer purchases across different markets, brands, and households.

import pandas as pd
import numpy as np
beer = pd.read_csv('https://bcdanl.github.io/data/beer_markets.csv')


  • Each observation in beer is a household-level transaction record for a purchase of beer.

Variable Description

  • hh: an identifier of the household;
  • X_purchase_desc: details on the purchased item;
  • quantity: the number of items purchased;
  • brand: Bud Light, Busch Light, Coors Light, Miller Lite, or Natural Light;
  • dollar_spent: total dollar value of purchase;
  • beer_floz: total volume of beer, in fluid ounces;
  • price_per_floz: price per fl. oz. (i.e., dollar_spent / beer_floz);
  • container: the type of container;
  • promo: whether the item was promoted (coupon or otherwise);
  • market: Scan-track market (or state if rural);
  • demographic data, including gender, marital status, household income, class of work, race, education, age, the size of household, and whether or not the household has a microwave or a dishwasher.


Question 1

What are the descriptive statistics (mean, count, min, max) for dollar_spent and beer_floz grouped by brand?

Answer:
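One possible approach, sketched on a small hypothetical DataFrame named `toy` (its rows are made up for illustration and are not taken from beer_markets.csv), is to select the two columns after `groupby()` and pass a list of statistics to `agg()`. On the real data, `toy` would be replaced with `beer`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real `beer` DataFrame
toy = pd.DataFrame({
    "brand": ["BUD LIGHT", "BUD LIGHT", "MILLER LITE", "MILLER LITE"],
    "dollar_spent": [10.0, 14.0, 8.0, 12.0],
    "beer_floz": [144.0, 288.0, 144.0, 216.0],
})

# A list of statistics in agg() yields two-level columns:
# (variable, statistic)
q1 = (
    toy.groupby("brand")[["dollar_spent", "beer_floz"]]
       .agg(["mean", "count", "min", "max"])
)
print(q1)
```

A single cell can be read with a tuple for the column, for example `q1.loc["BUD LIGHT", ("dollar_spent", "mean")]`.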



Question 2

  • For each market, find the average price_per_floz, total dollar_spent, and total beer_floz.
  • Which market has the highest average price_per_floz?
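A minimal sketch of one way to approach this, using named aggregation and `idxmax()`. The DataFrame `toy` and the output column names (`avg_price_per_floz`, `total_dollar_spent`, `total_beer_floz`) are assumptions chosen for illustration; on the real data, `toy` would be replaced with `beer`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real `beer` DataFrame
toy = pd.DataFrame({
    "market": ["TAMPA", "TAMPA", "BOSTON", "BOSTON"],
    "price_per_floz": [0.05, 0.07, 0.08, 0.10],
    "dollar_spent": [10.0, 14.0, 8.0, 12.0],
    "beer_floz": [200.0, 200.0, 100.0, 120.0],
})

# Named aggregation: new_column=(source_column, statistic)
q2 = toy.groupby("market").agg(
    avg_price_per_floz=("price_per_floz", "mean"),
    total_dollar_spent=("dollar_spent", "sum"),
    total_beer_floz=("beer_floz", "sum"),
)

# Index label (market) of the row with the largest average price
top_market = q2["avg_price_per_floz"].idxmax()
print(q2)
print(top_market)
```

Note that this averages the per-transaction `price_per_floz` values. Dividing total `dollar_spent` by total `beer_floz` gives a volume-weighted price instead, and the two can rank markets differently.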



Question 3

  • Compute the share of dollar_spent on each brand as a percentage of the total spending by each household.
    • In the resulting DataFrame, retain only the following four variables:
      1. hh (household ID)
      2. brand
      3. total dollars spent by the household
      4. the calculated percentage contribution of spending on that brand
    • Ensure the resulting DataFrame contains only one row per household–brand pair.

Answer:
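One way this could be done, sketched on hypothetical data, is to aggregate to one row per household-brand pair first and then use `transform("sum")` to place each household's total on every one of its rows. The names `toy`, `brand_dollars`, `hh_total_dollars`, and `brand_pct` are assumptions made for this illustration; on the real data, `toy` would be replaced with `beer`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real `beer` DataFrame
toy = pd.DataFrame({
    "hh": [1, 1, 1, 2, 2],
    "brand": ["BUD LIGHT", "BUD LIGHT", "MILLER LITE",
              "COORS LIGHT", "COORS LIGHT"],
    "dollar_spent": [10.0, 20.0, 10.0, 5.0, 15.0],
})

# Step 1: one row per household-brand pair with the spending on that brand
q3 = (
    toy.groupby(["hh", "brand"], as_index=False)
       .agg(brand_dollars=("dollar_spent", "sum"))
)

# Step 2: household total, repeated on each of that household's rows
q3["hh_total_dollars"] = q3.groupby("hh")["brand_dollars"].transform("sum")

# Step 3: brand share of the household total, in percent
q3["brand_pct"] = 100 * q3["brand_dollars"] / q3["hh_total_dollars"]

# Keep only the four requested variables
q3 = q3[["hh", "brand", "hh_total_dollars", "brand_pct"]]
print(q3)
```

Within each household the `brand_pct` values should sum to 100, which makes a quick sanity check: `q3.groupby("hh")["brand_pct"].sum()`.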



Question 4

  • Identify the brand with the highest price_per_floz in each market using a custom function with apply().

Answer:
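A minimal sketch, assuming that "highest price_per_floz" means the single transaction with the largest `price_per_floz` in each market (ranking brands by their average price within a market would be an equally reasonable reading). The helper `top_brand` and the DataFrame `toy` are hypothetical names for this illustration; on the real data, `toy` would be replaced with `beer`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real `beer` DataFrame
toy = pd.DataFrame({
    "market": ["TAMPA", "TAMPA", "BOSTON", "BOSTON"],
    "brand": ["BUD LIGHT", "MILLER LITE", "COORS LIGHT", "BUD LIGHT"],
    "price_per_floz": [0.05, 0.07, 0.08, 0.06],
})

def top_brand(group):
    """Return the brand on the row with the largest price_per_floz."""
    return group.loc[group["price_per_floz"].idxmax(), "brand"]

# Selecting the needed columns before apply() keeps the grouping
# column out of the frame passed to the custom function
q4 = toy.groupby("market")[["brand", "price_per_floz"]].apply(top_brand)
print(q4)
```

Because `top_brand` returns a single value per group, the result is a Series indexed by market.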



Question 5

  • Among households that have purchased BUD LIGHT at least once, what proportion only bought BUD LIGHT?

  • Among households that have purchased BUSCH LIGHT at least once, what proportion only bought BUSCH LIGHT?

  • Among households that have purchased COORS LIGHT at least once, what proportion only bought COORS LIGHT?

  • Among households that have purchased MILLER LITE at least once, what proportion only bought MILLER LITE?

  • Among households that have purchased NATURAL LIGHT at least once, what proportion only bought NATURAL LIGHT?

  • Which of these beer brands has the highest share of loyal customers, who purchase no other brand?

  • For example, consider the following DataFrame with four households:

In this example, households 1, 2, and 3 purchased BUD at least once, and only households 1 and 3 purchased BUD exclusively. Therefore, among households that purchased BUD at least once, the proportion that bought only BUD is 2/3 = 66.67%.

Answer:
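One possible approach, sketched on a hypothetical four-household DataFrame built to match the description above (households 1, 2, and 3 bought BUD; only 1 and 3 bought nothing else). The rows in `toy` are invented for illustration. The idea is to count distinct brands per household and then, for each brand, average a flag marking households that bought exactly one brand.

```python
import pandas as pd

# Hypothetical toy rows: household 2 also bought MILLER,
# household 4 never bought BUD
toy = pd.DataFrame({
    "hh": [1, 2, 2, 3, 3, 4],
    "brand": ["BUD", "BUD", "MILLER", "BUD", "BUD", "MILLER"],
})

# Number of distinct brands each household ever purchased
n_brands = toy.groupby("hh")["brand"].nunique()

# One row per household-brand pair, flagged True when the
# household bought no other brand
pairs = toy[["hh", "brand"]].drop_duplicates()
pairs = pairs.assign(loyal=pairs["hh"].map(n_brands) == 1)

# Mean of a boolean column = proportion of loyal households per brand
loyal_share = pairs.groupby("brand")["loyal"].mean()
most_loyal_brand = loyal_share.idxmax()
print(loyal_share)
print(most_loyal_brand)
```

On the real data, replacing `toy` with `beer` would give one proportion for each of the five brands, and `idxmax()` would name the brand with the highest share of single-brand households.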



Discussion

Welcome to our Classwork 14 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 14.

Whether you want to explore the material in more depth, share insights, or ask questions, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 14 materials or need clarification on any points, don’t hesitate to ask here.

Let’s collaborate and learn from each other!
