import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import scale # zero mean & one s.d.
from sklearn.linear_model import LassoCV, lasso_path
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
Homework 4 - Part 1: Lasso Linear Regression - Model 3 with Discussions
Beer Markets with Big Demographic Design
#beer = pd.read_csv("https://bcdanl.github.io/data/beer_markets_xbeer_xdemog.zip")
#beer = pd.read_csv("https://bcdanl.github.io/data/beer_markets_xbeer_brand_xdemog.zip")
beer = pd.read_csv("https://bcdanl.github.io/data/beer_markets_xbeer_brand_promo_xdemog.zip")
X = beer.drop('ylogprice', axis=1)
y = beer['ylogprice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_train = y_train.values
y_test = y_test.values
# LassoCV with a range of alpha values
lasso_cv = LassoCV(n_alphas=100,
                   alphas=None,   # alphas=None automatically generates 100 candidate alpha values
                   cv=5,
                   random_state=42,
                   max_iter=100000)
lasso_cv.fit(X_train.values, y_train)
#lasso_cv.fit(X_train.values, y_train.ravel())
print("LassoCV - Best alpha:", lasso_cv.alpha_)
LassoCV - Best alpha: 0.00022539867869301466
# Create a DataFrame of the lasso coefficients
# (the intercept is stored separately in lasso_cv.intercept_):
coef_lasso_beer = pd.DataFrame({
    'predictor': X_train.columns,
    'coefficient': lasso_cv.coef_,
    'exp_coefficient': np.exp(lasso_cv.coef_)
})
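If the intercept should appear in the same table, it can be prepended as an extra row using lasso_cv.intercept_. A minimal sketch (coef_with_intercept is a new name introduced here, so the downstream nonzero-coefficient filter is unaffected):
# Prepend the intercept as its own row (pd.concat, since DataFrame.append
# was removed in pandas 2.0); exp(intercept) is the baseline price level.
intercept_row = pd.DataFrame({
    'predictor': ['(intercept)'],
    'coefficient': [lasso_cv.intercept_],
    'exp_coefficient': [np.exp(lasso_cv.intercept_)]
})
coef_with_intercept = pd.concat([intercept_row, coef_lasso_beer], ignore_index=True)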
# Evaluate on the test set (.values keeps the input type consistent with how the model was fit)
y_pred_lasso = lasso_cv.predict(X_test.values)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
print("LassoCV - MSE:", mse_lasso)
LassoCV - MSE: 0.02863537005154527
coef_lasso_beer_n0 = coef_lasso_beer[coef_lasso_beer['coefficient'] != 0]
X_train.shape[1]
2655
coef_lasso_beer_n0.shape[0]
163
coef_lasso_beer_n0
|  | predictor | coefficient | exp_coefficient |
|---|---|---|---|
| 0 | logquantity | -0.136277 | 0.872601 |
| 1 | container_CAN | -0.056243 | 0.945309 |
| 2 | brandBUSCH_LIGHT | -0.072086 | 0.930451 |
| 4 | brandMILLER_LITE | 0.001853 | 1.001854 |
| 5 | brandNATURAL_LIGHT | -0.457505 | 0.632861 |
| ... | ... | ... | ... |
| 2372 | marketRURAL_WEST_VIRGINIA:npeople2 | -0.032313 | 0.968203 |
| 2385 | marketTAMPA:npeople2 | 0.018947 | 1.019127 |
| 2386 | marketURBAN_NY:npeople2 | 0.001501 | 1.001502 |
| 2426 | marketRALEIGH-DURHAM:npeople3 | -0.005387 | 0.994627 |
| 2582 | marketEXURBAN_NJ:npeople5plus | 0.080731 | 1.084079 |
163 rows × 3 columns
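Only 163 of the 2,655 candidate predictors survive the penalty. To see which effects dominate, one quick (illustrative) step is to sort the surviving coefficients by absolute size:
# Rank the surviving predictors by the magnitude of their effect on log(price)
top_effects = coef_lasso_beer_n0.reindex(
    coef_lasso_beer_n0['coefficient'].abs().sort_values(ascending=False).index
)
print(top_effects.head(10))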
# Compute the mean and standard deviation of the CV errors for each alpha.
mean_cv_errors = np.mean(lasso_cv.mse_path_, axis=1)
std_cv_errors = np.std(lasso_cv.mse_path_, axis=1)
plt.figure(figsize=(8, 6))
plt.errorbar(lasso_cv.alphas_, mean_cv_errors, yerr=std_cv_errors, marker='o', linestyle='-', capsize=5)
plt.xscale('log')
plt.xlabel('Alpha')
plt.ylabel('Mean CV Error (MSE)')
plt.title('Cross-Validation Error vs. Alpha')
#plt.gca().invert_xaxis() # Optionally invert the x-axis so lower alphas (less regularization) appear to the right.
plt.axvline(x=lasso_cv.alpha_, color='red', linestyle='--', label='Best alpha')
plt.legend()
plt.show()
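LassoCV selects the alpha that minimizes mean CV error. A common, more conservative alternative (not what LassoCV does by default) is the one-standard-error rule: take the largest alpha whose mean CV error stays within one standard error of the minimum. A sketch using the quantities computed above:
# One-standard-error rule: the sparsest model whose CV error is
# statistically indistinguishable from the best one.
n_folds = lasso_cv.mse_path_.shape[1]
idx_min = np.argmin(mean_cv_errors)
threshold = mean_cv_errors[idx_min] + std_cv_errors[idx_min] / np.sqrt(n_folds)
alpha_1se = lasso_cv.alphas_[mean_cv_errors <= threshold].max()
print("1-SE alpha:", alpha_1se)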
# Compute the coefficient path over the alpha grid that LassoCV used
alphas, coefs, _ = lasso_path(X_train, y_train,
alphas=lasso_cv.alphas_,
max_iter=100000)
# Count nonzero coefficients for each alpha (coefs shape: (n_features, n_alphas))
nonzero_counts = np.sum(coefs != 0, axis=0)
# Plot the number of nonzero coefficients versus alpha
plt.figure(figsize=(8,6))
plt.plot(alphas, nonzero_counts, marker='o', linestyle='-')
plt.xscale('log')
plt.xlabel('Alpha')
plt.ylabel('Number of nonzero coefficients')
plt.title('Nonzero Coefficients vs. Alpha')
#plt.gca().invert_xaxis() # Lower alphas (less regularization) on the right
plt.axvline(x=lasso_cv.alpha_, color='red', linestyle='--', label='Best alpha')
plt.legend()
plt.show()
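As a sanity check, the count at the dashed red line should match the 163 nonzero coefficients found in the fitted model earlier:
# Nonzero coefficients in the final refit model at the chosen alpha
print("Nonzero at best alpha:", np.sum(lasso_cv.coef_ != 0))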
# Recompute the lasso path over the same alpha grid for the coefficient plot.
# (y_train is already log price -- the target is ylogprice -- so no further log transform is needed.)
alphas, coefs, _ = lasso_path(X_train, y_train,
                              alphas=lasso_cv.alphas_,
                              max_iter=100000)
plt.figure(figsize=(8, 6))
# Iterate over each predictor and plot its coefficient path.
for i, col in enumerate(X_train.columns):
    plt.plot(alphas, coefs[i, :], label=col)
plt.xscale('log')
plt.xlabel('Alpha')
plt.ylabel('Coefficient value')
plt.title('Lasso Coefficient Paths')
#plt.gca().invert_xaxis() # Lower alphas (weaker regularization) to the right.
#plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.axvline(x=lasso_cv.alpha_, color='red', linestyle='--', label='Best alpha')
#plt.legend()
plt.show()
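With 2,655 predictors the full legend is unreadable, which is why the legend calls are commented out above. One workaround, sketched below with an arbitrary cutoff of five, is to label only the paths with the largest final coefficients:
# Label only the 5 predictors with the largest |coefficient| at the best alpha;
# all other paths are drawn without a legend entry.
top_idx = set(np.argsort(np.abs(lasso_cv.coef_))[-5:])
plt.figure(figsize=(8, 6))
for i in range(coefs.shape[0]):
    plt.plot(alphas, coefs[i, :],
             label=X_train.columns[i] if i in top_idx else None)
plt.xscale('log')
plt.xlabel('Alpha')
plt.ylabel('Coefficient value')
plt.legend(loc='best', fontsize=8)
plt.show()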
Best Alphas
- Model 1: 0.000082
- Model 2: 0.000225
- Model 3: 0.000225
Sensitivity Estimates without Demographic Controls (Homework 2)
|  | Model 1 | Model 2 | Model 3 (no Promo) | Model 3 (with Promo) |
|---|---|---|---|---|
| BUD | -0.142 | -0.146 | -0.140 | -0.148 = −0.140 − 0.008 |
| BUSCH | -0.142 | -0.159 = −0.146 − 0.013 | -0.161 = −0.140 − 0.021 | -0.122 = −0.140 − 0.021 − 0.008 + 0.047 |
| COORS | -0.142 | -0.146 = −0.146 − 0 | -0.148 = −0.140 − 0.008 | -0.119 = −0.140 − 0.008 − 0.008 + 0.037 |
| MILLER | -0.142 | -0.163 = −0.146 − 0.017 | -0.163 = −0.140 − 0.023 | -0.119 = −0.140 − 0.023 − 0.008 + 0.052 |
| NATURAL | -0.142 | -0.094 = −0.146 + 0.052 | -0.103 = −0.140 + 0.037 | -0.040 = −0.140 + 0.037 − 0.008 + 0.071 |
Sensitivity Estimates with Demographic Controls
|  | Model 1 | Model 2 | Model 3 (no Promo) | Model 3 (with Promo) |
|---|---|---|---|---|
| BUD | -0.1429 | -0.1408 | -0.1363 | -0.1403 = −0.1363 + 0.0040 |
| BUSCH | -0.1429 | -0.1719 = −0.1408 − 0.0311 | -0.1708 = −0.1363 − 0.0345 | -0.1418 = −0.1363 − 0.0345 + 0.0040 + 0.025 |
| COORS | -0.1429 | -0.1414 = −0.1408 − 0.0006 | -0.1365 = −0.1363 − 0.0002 | -0.1355 = −0.1363 − 0.0002 + 0.0040 − 0.0030 |
| MILLER | -0.1429 | -0.1443 = −0.1408 − 0.0035 | -0.1401 = −0.1363 − 0.0038 | -0.1353 = −0.1363 − 0.0038 + 0.0040 + 0.0008 |
| NATURAL | -0.1429 | -0.1113 = −0.1408 + 0.0295 | -0.1105 = −0.1363 + 0.0258 | -0.1058 = −0.1363 + 0.0258 + 0.0040 + 0.0007 |
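Each with-promo entry is just the sum of four pieces: the baseline slope, the brand interaction, the promo main effect, and the brand–promo interaction. A worked example for the BUSCH row of the table above:
# BUSCH price sensitivity under promotion (with demographic controls)
base     = -0.1363   # baseline (BUD) slope in Model 3
brand    = -0.0345   # BUSCH brand interaction
promo    =  0.0040   # promo main effect
brand_pr =  0.0250   # BUSCH x promo interaction
print(round(base + brand + promo + brand_pr, 4))   # -0.1418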
Sensitivity Visuals with & without Demographic Controls
# Slopes (inverse elasticities)
model1_slope = -0.142
model2_slopes = {
'Bud': -0.146,
'Busch': -0.159,
'Coors': -0.146,
'Miller': -0.163,
'Natural': -0.094
}
model3_full_slopes = {
'Bud': -0.140,
'Busch': -0.161,
'Coors': -0.148,
'Miller': -0.163,
'Natural': -0.103
}
model3_promo_slopes = {
'Bud': -0.148,
'Busch': -0.122,
'Coors': -0.119,
'Miller': -0.096,
'Natural': -0.077
}
model1_demog_slope = -0.1429
model2_demog_slopes = {
'Bud': -0.1408,
'Busch': -0.1719,
'Coors': -0.1414,
'Miller': -0.1443,
'Natural': -0.1113
}
model3_demog_full_slopes = {
'Bud': -0.1363,
'Busch': -0.1708,
'Coors': -0.1365,
'Miller': -0.1401,
'Natural': -0.1105
}
model3_demog_promo_slopes = {
'Bud': -0.1403,
'Busch': -0.1418,
'Coors': -0.1355,
'Miller': -0.1353,
'Natural': -0.1058
}
# Create range of log(sales) values
x = np.linspace(0, 10, 100)
# Set up a 2x4 grid of subplots
fig, axes = plt.subplots(2, 4, figsize=(16, 8), sharex=True, sharey=True)
# First row: models without demographics
# Model 1
ax = axes[0, 0]
ax.plot(x, model1_slope * x, color='tab:blue', label='All Brands')
ax.set_title('Model 1')
ax.set_ylabel('log(price)')
ax.grid(True)
ax.legend(loc='lower left')
# Model 2
ax = axes[0, 1]
for brand, slope in model2_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 2')
ax.grid(True)
ax.legend(loc='lower left')
# Model 3 Full (No Promo)
ax = axes[0, 2]
for brand, slope in model3_full_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 3 Full')
ax.grid(True)
ax.legend(loc='lower left')
# Model 3 Promo
ax = axes[0, 3]
for brand, slope in model3_promo_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 3 Promo')
ax.grid(True)
ax.legend(loc='lower left')
# Second row: models with demographics
# Model 1 + Demog
ax = axes[1, 0]
ax.plot(x, model1_demog_slope * x, color='tab:orange', label='All Brands + Demog')
ax.set_title('Model 1 (+ Demog)')
ax.set_ylabel('log(price)')
ax.set_xlabel('log(sales)')
ax.grid(True)
ax.legend(loc='lower left')
# Model 2 + Demog
ax = axes[1, 1]
for brand, slope in model2_demog_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 2 (+ Demog)')
ax.set_xlabel('log(sales)')
ax.grid(True)
ax.legend(loc='lower left')
# Model 3 Full + Demog
ax = axes[1, 2]
for brand, slope in model3_demog_full_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 3 Full (+ Demog)')
ax.set_xlabel('log(sales)')
ax.grid(True)
ax.legend(loc='lower left')
# Model 3 Promo + Demog
ax = axes[1, 3]
for brand, slope in model3_demog_promo_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 3 Promo (+ Demog)')
ax.set_xlabel('log(sales)')
ax.grid(True)
ax.legend(loc='lower left')
plt.tight_layout()
plt.show()
How Demographic Controls Affect Price–Volume Sensitivity
Adding demographic covariates shifts the implied own‑price elasticities by varying amounts across models and brands.
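Because the brand-level slopes are already stored in the dictionaries above, the comparison tables below can also be reproduced programmatically; a minimal sketch for the Model 2 case:
# Change in brand-level sensitivity when demographics are added (Model 2)
delta_model2 = pd.DataFrame({
    'without_demog': model2_slopes,
    'with_demog': model2_demog_slopes
})
delta_model2['delta'] = delta_model2['with_demog'] - delta_model2['without_demog']
print(delta_model2)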
Model 1 (Common Elasticity)
| Specification | Elasticity |
|---|---|
| Without demographics | –0.1420 |
| With demographics | –0.1429 |
| Δ | –0.0009 |
| Effect | Slightly more elastic (negligible change) |
Model 2 (Brand‑Specific Elasticities)
| Brand | Without Demog | With Demog | Δ | Effect |
|---|---|---|---|---|
| Bud | –0.1460 | –0.1408 | +0.0052 | Less elastic |
| Busch | –0.1590 | –0.1719 | –0.0129 | More elastic |
| Coors | –0.1460 | –0.1414 | +0.0046 | Less elastic |
| Miller | –0.1630 | –0.1443 | +0.0187 | Much less elastic |
| Natural | –0.0940 | –0.1113 | –0.0173 | More elastic |
Interpretation: Demographics slightly dampen sensitivity for Bud, Coors, and Miller but increase it for Busch and Natural.
Model 3 Full‑Price (No Promo)
| Brand | Without Demog | With Demog | Δ | Effect |
|---|---|---|---|---|
| Bud | –0.1400 | –0.1363 | +0.0037 | Less elastic |
| Busch | –0.1610 | –0.1708 | –0.0098 | More elastic |
| Coors | –0.1480 | –0.1365 | +0.0115 | Less elastic |
| Miller | –0.1630 | –0.1401 | +0.0229 | Much less elastic |
| Natural | –0.1030 | –0.1105 | –0.0075 | More elastic |
Interpretation: Similar pattern to Model 2; demographic controls modestly increase or decrease brand‑specific slopes.
Model 3 Promotional‑Price
| Brand | Without Demog | With Demog | Δ | Effect |
|---|---|---|---|---|
| Bud | –0.1480 | –0.1403 | +0.0077 | Less elastic |
| Busch | –0.1220 | –0.1418 | –0.0198 | More elastic |
| Coors | –0.1190 | –0.1355 | –0.0165 | More elastic |
| Miller | –0.0960 | –0.1353 | –0.0393 | Much more elastic |
| Natural | –0.0770 | –0.1058 | –0.0288 | More elastic |
Interpretation: Demographic adjustments have the largest impact here, especially increasing sensitivity for Miller and others.
Summary
- Model 1: Virtually no change when demographics are added.
- Models 2 & 3 without Promo: Elasticities shift modestly by brand—some become slightly more price‑sensitive, others slightly less.
- Model 3 Promo: Most brands become notably more elastic once demographics are included. This implies that consumer characteristics play a key role in how promotions affect beer pricing.
Test Set RMSE Comparison
| Model | RMSE (no demogs) | RMSE (with demogs) |
|---|---|---|
| 1 | 0.167 | 0.167 |
| 2 | 0.166 | 0.170 |
| 3 | 0.165 | 0.169 |
(The with-demogs entries are the square roots of the test-set MSEs reported above, e.g. √0.0286 ≈ 0.169 for Model 3; mean_squared_error returns MSE, not RMSE.) On a comparable RMSE scale, adding demographic controls leaves test-set prediction error essentially unchanged for all three models: the demographics reshuffle the brand-level sensitivity estimates but add little out-of-sample predictive power for log price.
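To keep the two columns on the same scale in the first place, compute RMSE directly from the test-set predictions:
# RMSE puts the error back on the scale of log(price)
rmse_lasso = np.sqrt(mean_squared_error(y_test, y_pred_lasso))
print("LassoCV - RMSE:", rmse_lasso)   # sqrt(0.0286) ≈ 0.169 for Model 3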