Homework 2

Beer Markets

Author

Byeong-Hak Choe

Published

March 5, 2025

Modified

April 19, 2025

Settings

Required Libraries and Spark Session

import pandas as pd
import numpy as np
from tabulate import tabulate  # for table summary
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm  # for lowess smoothing

from pyspark.sql import SparkSession
from pyspark.sql.functions import rand, col, pow, mean, avg, when, log, sqrt, exp
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.master("local[*]").getOrCreate()

UDFs

regression_table

def regression_table(model, assembler):
    """
    Creates a formatted regression table from a fitted LinearRegression model and its VectorAssembler.

    If the model’s labelCol (retrieved using getLabelCol()) starts with "log", an extra column showing np.exp(coeff)
    is added immediately after the beta estimate column for predictor rows. Additionally, np.exp() of the 95% CI
    Lower and Upper bounds is also added unless the predictor's name includes "log_". The Intercept row does not
    include exponentiated values.

    When labelCol starts with "log", the columns are ordered as:
        y: [label] | Beta | Exp(Beta) | Sig. | Std. Error | p-value | 95% CI Lower | 95% CI Upper | Exp(95% CI Lower) | Exp(95% CI Upper)

    Otherwise, the columns are:
        y: [label] | Beta | Sig. | Std. Error | p-value | 95% CI Lower | 95% CI Upper

    Parameters:
        model: A fitted LinearRegression model (with a .summary attribute and a labelCol).
        assembler: The VectorAssembler used to assemble the features for the model.

    Returns:
        A formatted string containing the regression table.
    """
    # Determine if we should display exponential values for coefficients.
    is_log = model.getLabelCol().lower().startswith("log")

    # Extract coefficients and standard errors as NumPy arrays.
    coeffs = model.coefficients.toArray()
    std_errors_all = np.array(model.summary.coefficientStandardErrors)

    # Check if the intercept's standard error is included (one extra element).
    # With fitIntercept=True, Spark reports the intercept's entry as the LAST element.
    if len(std_errors_all) == len(coeffs) + 1:
        intercept_se = std_errors_all[-1]
        std_errors = std_errors_all[:-1]
    else:
        intercept_se = None
        std_errors = std_errors_all

    # Residual degrees of freedom and the critical t value for the 95% CIs;
    # p-values are taken directly from the model summary.
    df = model.summary.numInstances - len(coeffs) - 1
    t_critical = stats.t.ppf(0.975, df)
    p_values = model.summary.pValues

    # Helper: significance stars.
    def significance_stars(p):
        if p < 0.01:
            return "***"
        elif p < 0.05:
            return "**"
        elif p < 0.1:
            return "*"
        else:
            return ""

    # Build table rows for each feature.
    table = []
    for feature, beta, se, p in zip(assembler.getInputCols(), coeffs, std_errors, p_values):
        ci_lower = beta - t_critical * se
        ci_upper = beta + t_critical * se

        # Check if predictor contains "log_" to determine if exponentiation should be applied
        apply_exp = is_log and "log_" not in feature.lower()

        exp_beta = np.exp(beta) if apply_exp else ""
        exp_ci_lower = np.exp(ci_lower) if apply_exp else ""
        exp_ci_upper = np.exp(ci_upper) if apply_exp else ""

        if is_log:
            table.append([
                feature,            # Predictor name
                beta,               # Beta estimate
                exp_beta,           # Exponential of beta (or blank)
                significance_stars(p),
                se,
                p,
                ci_lower,
                ci_upper,
                exp_ci_lower,       # Exponential of 95% CI lower bound
                exp_ci_upper        # Exponential of 95% CI upper bound
            ])
        else:
            table.append([
                feature,
                beta,
                significance_stars(p),
                se,
                p,
                ci_lower,
                ci_upper
            ])

    # Process intercept (Spark places the intercept's p-value last in pValues).
    if intercept_se is not None:
        intercept_p = model.summary.pValues[-1] if model.summary.pValues is not None else None
        intercept_sig = significance_stars(intercept_p) if intercept_p is not None else ""
        ci_intercept_lower = model.intercept - t_critical * intercept_se
        ci_intercept_upper = model.intercept + t_critical * intercept_se
    else:
        intercept_sig = ""
        ci_intercept_lower = ""
        ci_intercept_upper = ""
        intercept_se = ""

    if is_log:
        table.append([
            "Intercept",
            model.intercept,
            "",                    # No Exp(Beta) for the intercept
            intercept_sig,
            intercept_se,
            "",                    # p-value left blank
            ci_intercept_lower,
            ci_intercept_upper,
            "",                    # No Exp(95% CI Lower) for the intercept
            ""                     # No Exp(95% CI Upper) for the intercept
        ])
    else:
        table.append([
            "Intercept",
            model.intercept,
            intercept_sig,
            intercept_se,
            "",
            ci_intercept_lower,
            ci_intercept_upper
        ])

    # Append overall model metrics.
    if is_log:
        table.append(["Observations", model.summary.numInstances, "", "", "", "", "", "", "", ""])
        table.append(["R²", model.summary.r2, "", "", "", "", "", "", "", ""])
        table.append(["RMSE", model.summary.rootMeanSquaredError, "", "", "", "", "", "", "", ""])
    else:
        table.append(["Observations", model.summary.numInstances, "", "", "", "", ""])
        table.append(["R²", model.summary.r2, "", "", "", "", ""])
        table.append(["RMSE", model.summary.rootMeanSquaredError, "", "", "", "", ""])

    # Format the table rows.
    formatted_table = []
    for row in table:
        formatted_row = []
        for i, item in enumerate(row):
            # Format Observations as integer with commas.
            if row[0] == "Observations" and i == 1 and isinstance(item, (int, float, np.floating)) and item != "":
                formatted_row.append(f"{int(item):,}")
            elif isinstance(item, (int, float, np.floating)) and item != "":
                if is_log:
                    # When is_log, the columns are:
                    # 0: Metric, 1: Beta, 2: Exp(Beta), 3: Sig, 4: Std. Error, 5: p-value,
                    # 6: 95% CI Lower, 7: 95% CI Upper, 8: Exp(95% CI Lower), 9: Exp(95% CI Upper).
                    if i in [1, 2, 4, 6, 7, 8, 9]:
                        formatted_row.append(f"{item:,.3f}")
                    elif i == 5:
                        formatted_row.append(f"{item:.3f}")
                    else:
                        formatted_row.append(f"{item:.3f}")
                else:
                    # When not is_log, the columns are:
                    # 0: Metric, 1: Beta, 2: Sig, 3: Std. Error, 4: p-value, 5: 95% CI Lower, 6: 95% CI Upper.
                    if i in [1, 3, 5, 6]:
                        formatted_row.append(f"{item:,.3f}")
                    elif i == 4:
                        formatted_row.append(f"{item:.3f}")
                    else:
                        formatted_row.append(f"{item:.3f}")
            else:
                formatted_row.append(item)
        formatted_table.append(formatted_row)

    # Set header and column alignment based on whether label starts with "log"
    if is_log:
        headers = [
            f"y: {model.getLabelCol()}",
            "Beta", "Exp(Beta)", "Sig.", "Std. Error", "p-value",
            "95% CI Lower", "95% CI Upper", "Exp(95% CI Lower)", "Exp(95% CI Upper)"
        ]
        colalign = ("left", "right", "right", "center", "right", "right", "right", "right", "right", "right")
    else:
        headers = [f"y: {model.getLabelCol()}", "Beta", "Sig.", "Std. Error", "p-value", "95% CI Lower", "95% CI Upper"]
        colalign = ("left", "right", "center", "right", "right", "right", "right")

    table_str = tabulate(
        formatted_table,
        headers=headers,
        tablefmt="pretty",
        colalign=colalign
    )

    # Insert a dashed line after the Intercept row.
    lines = table_str.split("\n")
    dash_line = '-' * len(lines[0])
    for i, line in enumerate(lines):
        if "Intercept" in line and not line.strip().startswith('+'):
            lines.insert(i+1, dash_line)
            break

    return "\n".join(lines)

# Example usage:
# print(regression_table(model_1, assembler_1))
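
How to read the Exp(Beta) columns: in a log-outcome model, a dummy coefficient β corresponds to a multiplicative effect exp(β) on the outcome relative to the reference category. The following is a minimal sketch using only the standard library. The inputs (beta = 0.083, se = 0.014) are the rounded market_ATLANTA values as printed in the Model 1 table, and the critical value 1.96 is a large-sample approximation (an assumption), not the exact t quantile that regression_table computes.

```python
import math

def exp_effect(beta, se, z=1.96):
    """Return exp(beta) and the exponentiated bounds of an approximate 95% CI.

    z=1.96 is a large-sample stand-in (assumption) for the exact t quantile.
    """
    lower = beta - z * se
    upper = beta + z * se
    return math.exp(beta), math.exp(lower), math.exp(upper)

# Rounded values for market_ATLANTA as printed in the Model 1 table.
ratio, ratio_lo, ratio_hi = exp_effect(0.083, 0.014)

# Percent difference in price_floz relative to the reference market.
pct_change = (ratio - 1) * 100
print(f"exp(beta) = {ratio:.3f}, CI = [{ratio_lo:.3f}, {ratio_hi:.3f}], "
      f"about {pct_change:.1f}% higher than the reference market")
```

Because the inputs are rounded, the bounds can differ slightly in the third decimal from the table's own Exp(95% CI) columns.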

add_dummy_variables

def add_dummy_variables(var_name, reference_level, category_order=None):
    """
    Creates dummy variables for the specified column in the global DataFrames dtrain and dtest.
    Allows manual setting of category order.

    Parameters:
        var_name (str): The name of the categorical column (e.g., "borough_name").
        reference_level (int): Index of the category to be used as the reference (dummy omitted).
        category_order (list, optional): List of categories in the desired order. If None, categories are sorted.

    Returns:
        dummy_cols (list): List of dummy column names excluding the reference category.
        ref_category (str): The category chosen as the reference.
    """
    global dtrain, dtest

    # Get distinct categories from the training set.
    categories = dtrain.select(var_name).distinct().rdd.flatMap(lambda x: x).collect()

    # Convert booleans to strings if present.
    categories = [str(c) if isinstance(c, bool) else c for c in categories]

    # Use manual category order if provided; otherwise, sort categories.
    if category_order:
        # Ensure all categories are present in the user-defined order
        missing = set(categories) - set(category_order)
        if missing:
            raise ValueError(f"These categories are missing from your custom order: {missing}")
        categories = category_order
    else:
        categories = sorted(categories)

    # Validate reference_level
    if reference_level < 0 or reference_level >= len(categories):
        raise ValueError(f"reference_level must be between 0 and {len(categories) - 1}")

    # Define the reference category
    ref_category = categories[reference_level]
    print("Reference category (dummy omitted):", ref_category)

    # Create dummy variables for all categories
    for cat in categories:
        dummy_col_name = var_name + "_" + str(cat).replace(" ", "_")
        dtrain = dtrain.withColumn(dummy_col_name, when(col(var_name) == cat, 1).otherwise(0))
        dtest = dtest.withColumn(dummy_col_name, when(col(var_name) == cat, 1).otherwise(0))

    # List of dummy columns, excluding the reference category
    dummy_cols = [var_name + "_" + str(cat).replace(" ", "_") for cat in categories if cat != ref_category]

    return dummy_cols, ref_category


# Example usage without category_order:
# dummy_cols_year, ref_category_year = add_dummy_variables('year', 0)

# Example usage with category_order:
# custom_order_wkday = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']
# dummy_cols_wkday, ref_category_wkday = add_dummy_variables('wkday', reference_level=0, category_order = custom_order_wkday)
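
The reference-level logic can be checked without a Spark session. The sketch below mirrors the same steps (sort the categories, pick the reference by index, build 0/1 indicators for the rest) on a plain Python list. The function name dummy_code and the sample values are hypothetical and are not part of the homework code.

```python
def dummy_code(var_name, values, reference_level=0, category_order=None):
    """Return (dummy column names, reference category, rows of 0/1 indicators)."""
    categories = list(category_order) if category_order else sorted(set(values))
    if not 0 <= reference_level < len(categories):
        raise ValueError(f"reference_level must be between 0 and {len(categories) - 1}")

    ref_category = categories[reference_level]
    kept = [c for c in categories if c != ref_category]  # reference dummy omitted

    names = []
    for c in kept:
        suffix = str(c).replace(" ", "_")
        names.append(f"{var_name}_{suffix}")

    # One row per observation, one indicator per non-reference category.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return names, ref_category, rows

# Hypothetical sample of brand values.
sample = ["BUD_LIGHT", "MILLER_LITE", "COORS_LIGHT", "BUD_LIGHT"]
names, ref, rows = dummy_code("brand", sample, reference_level=0)
print(ref)
print(names)
print(rows)
```

Observations in the reference category get zeros in every dummy column, which is why their effect is absorbed by the intercept.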

add_interaction_terms

def add_interaction_terms(var_list1, var_list2, var_list3=None):
    """
    Creates interaction term columns in the global DataFrames dtrain and dtest.

    For two sets of variable names (which may represent categorical (dummy) or continuous variables),
    this function creates two-way interactions by multiplying each variable in var_list1 with each
    variable in var_list2.

    Optionally, if a third list of variable names (var_list3) is provided, the function also creates
    the remaining two-way interactions (var_list1 x var_list3 and var_list2 x var_list3) and the
    three-way interactions among each variable in var_list1, var_list2, and var_list3.

    Parameters:
        var_list1 (list): List of column names for the first set of variables.
        var_list2 (list): List of column names for the second set of variables.
        var_list3 (list, optional): List of column names for the third set of variables for three-way interactions.

    Returns:
        A flat list of new interaction column names.
    """
    global dtrain, dtest

    interaction_cols = []

    # Create two-way interactions between var_list1 and var_list2.
    for var1 in var_list1:
        for var2 in var_list2:
            col_name = f"{var1}_*_{var2}"
            dtrain = dtrain.withColumn(col_name, col(var1).cast("double") * col(var2).cast("double"))
            dtest = dtest.withColumn(col_name, col(var1).cast("double") * col(var2).cast("double"))
            interaction_cols.append(col_name)

    # Create two-way interactions between var_list1 and var_list3.
    if var_list3 is not None:
        for var1 in var_list1:
            for var3 in var_list3:
                col_name = f"{var1}_*_{var3}"
                dtrain = dtrain.withColumn(col_name, col(var1).cast("double") * col(var3).cast("double"))
                dtest = dtest.withColumn(col_name, col(var1).cast("double") * col(var3).cast("double"))
                interaction_cols.append(col_name)

    # Create two-way interactions between var_list2 and var_list3.
    if var_list3 is not None:
        for var2 in var_list2:
            for var3 in var_list3:
                col_name = f"{var2}_*_{var3}"
                dtrain = dtrain.withColumn(col_name, col(var2).cast("double") * col(var3).cast("double"))
                dtest = dtest.withColumn(col_name, col(var2).cast("double") * col(var3).cast("double"))
                interaction_cols.append(col_name)

    # If a third list is provided, create three-way interactions.
    if var_list3 is not None:
        for var1 in var_list1:
            for var2 in var_list2:
                for var3 in var_list3:
                    col_name = f"{var1}_*_{var2}_*_{var3}"
                    dtrain = dtrain.withColumn(col_name, col(var1).cast("double") * col(var2).cast("double") * col(var3).cast("double"))
                    dtest = dtest.withColumn(col_name, col(var1).cast("double") * col(var2).cast("double") * col(var3).cast("double"))
                    interaction_cols.append(col_name)

    return interaction_cols

# Example usage:
# interaction_cols_brand_price = add_interaction_terms(dummy_cols_brand, ['log_price'])
# interaction_cols_brand_ad_price = add_interaction_terms(dummy_cols_brand, dummy_cols_ad, ['log_price'])
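
To see which columns a call produces before running it on Spark, the naming scheme can be reproduced with itertools.product. This is a sketch of the naming only, with no DataFrame work. The helper name interaction_names is hypothetical, and the input column names assume the dummy naming used by add_dummy_variables.

```python
from itertools import product

def interaction_names(var_list1, var_list2, var_list3=None):
    """List interaction column names in the order add_interaction_terms creates them."""
    names = [f"{a}_*_{b}" for a, b in product(var_list1, var_list2)]
    if var_list3 is not None:
        # Remaining two-way terms, then the three-way terms.
        names += [f"{a}_*_{c}" for a, c in product(var_list1, var_list3)]
        names += [f"{b}_*_{c}" for b, c in product(var_list2, var_list3)]
        names += [f"{a}_*_{b}_*_{c}"
                  for a, b, c in product(var_list1, var_list2, var_list3)]
    return names

# Two brand dummies, one promo dummy, one continuous variable.
cols = interaction_names(
    ["brand_BUSCH_LIGHT", "brand_COORS_LIGHT"],
    ["promo_True"],
    ["log_beer_floz"],
)
for name in cols:
    print(name)
```

With list sizes a, b, and c, the three-list call yields a*b + a*c + b*c + a*b*c columns, so the count grows quickly when a long dummy list such as the market dummies is passed in.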

compare_reg_models

def compare_reg_models(models, assemblers, names=None):
    """
    Produces a single formatted table comparing multiple regression models.

    For each predictor (the union across models, ordered by first appearance), the table shows
    the beta estimate (with significance stars) from each model (blank if not used).
    For a predictor, if a model's outcome (model.getLabelCol()) starts with "log", the cell displays
    both the beta and its exponential (separated by " / "), except when the predictor's name includes "log_".
    (The intercept row does not display exp(.))

    Additional rows for Intercept, Observations, R², and RMSE are appended.

    The header's first column is labeled "Predictor", and subsequent columns are
    "y: [outcome] ([name])" for each model.

    The table is produced in grid format (with vertical lines). A dashed line (using '-' characters)
    is inserted at the top, immediately after the header, and at the bottom.
    Additionally, immediately after the Intercept row, the border line is replaced with one using '='
    (to appear as, for example, "+==============================================+==========================+...").

    Parameters:
        models (list): List of fitted LinearRegression models.
        assemblers (list): List of corresponding VectorAssembler objects.
        names (list, optional): List of model names; defaults to "Model 1", "Model 2", etc.

    Returns:
        A formatted string containing the combined regression table.
    """
    # Default model names.
    if names is None:
        names = [f"Model {i+1}" for i in range(len(models))]

    # For each model, get outcome and determine if that model is log-transformed.
    outcomes = [m.getLabelCol() for m in models]
    is_log_flags = [out.lower().startswith("log") for out in outcomes]

    # Build an ordered union of predictors based on first appearance.
    ordered_predictors = []
    for assembler in assemblers:
        for feat in assembler.getInputCols():
            if feat not in ordered_predictors:
                ordered_predictors.append(feat)

    # Helper for significance stars.
    def significance_stars(p):
        if p is None:
            return ""
        if p < 0.01:
            return "***"
        elif p < 0.05:
            return "**"
        elif p < 0.1:
            return "*"
        else:
            return ""

    # Build rows for each predictor.
    rows = []
    for feat in ordered_predictors:
        row = [feat]
        for m, a, is_log in zip(models, assemblers, is_log_flags):
            feats_model = a.getInputCols()
            if feat in feats_model:
                idx = feats_model.index(feat)
                beta = m.coefficients.toArray()[idx]
                p_val = m.summary.pValues[idx] if m.summary.pValues is not None else None
                stars = significance_stars(p_val)
                cell = f"{beta:.3f}{stars}"
                # Only add exp(beta) if model is log and predictor name does NOT include "log_"
                if is_log and ("log_" not in feat.lower()):
                    cell += f" / {np.exp(beta):,.3f}"
                row.append(cell)
            else:
                row.append("")
        rows.append(row)

    # Build intercept row (do NOT compute exp(intercept)).
    intercept_row = ["Intercept"]
    for m in models:
        std_all = np.array(m.summary.coefficientStandardErrors)
        coeffs = m.coefficients.toArray()
        if len(std_all) == len(coeffs) + 1:
            # With fitIntercept=True, Spark reports the intercept's p-value as the last element.
            intercept_p = m.summary.pValues[-1] if m.summary.pValues is not None else None
        else:
            intercept_p = None
        sig = significance_stars(intercept_p)
        cell = f"{m.intercept:.3f}{sig}"
        intercept_row.append(cell)
    rows.append(intercept_row)

    # Add Observations row.
    obs_row = ["Observations"]
    for m in models:
        obs = m.summary.numInstances
        obs_row.append(f"{int(obs):,}")
    rows.append(obs_row)

    # Add R² row.
    r2_row = ["R²"]
    for m in models:
        r2_row.append(f"{m.summary.r2:.3f}")
    rows.append(r2_row)

    # Add RMSE row.
    rmse_row = ["RMSE"]
    for m in models:
        rmse_row.append(f"{m.summary.rootMeanSquaredError:.3f}")
    rows.append(rmse_row)

    # Build header: first column "Predictor", then for each model: "y: [outcome] ([name])"
    header = ["Predictor"]
    for out, name in zip(outcomes, names):
        header.append(f"y: {out} ({name})")

    # Create table string using grid format.
    table_str = tabulate(rows, headers=header, tablefmt="grid", colalign=("left",) + ("right",)*len(models))

    # Split into lines.
    lines = table_str.split("\n")

    # Create a dashed line spanning the full width.
    full_width = len(lines[0])
    dash_line = '-' * full_width
    # Create an equals line by replacing '-' with '='.
    eq_line = dash_line.replace('-', '=')

    # Insert a dashed line after the header row.
    # In grid format, the top border and the header are the first two lines.
    lines.insert(2, dash_line)

    # Insert an equals line after the Intercept row.
    for i, line in enumerate(lines):
        if line.startswith("|") and "Intercept" in line:
            if i+1 < len(lines):
                lines[i+1] = eq_line
            break

    # Add dashed lines at the very top and bottom.
    final_table = dash_line + "\n" + "\n".join(lines) + "\n" + dash_line

    return final_table

# Example usage:
# print(compare_reg_models([model_1, model_2, model_3],
#                          [assembler_1, assembler_2, assembler_3],
#                          ["Model 1", "Model 2", "Model 3"]))
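
The row order of the comparison table comes from an ordered union of each model's predictors. Below is a small standalone sketch of that step, using dict.fromkeys to deduplicate while keeping first-appearance order. The helper name ordered_union and the two feature lists are hypothetical.

```python
def ordered_union(*feature_lists):
    """Union of feature names, keeping the order of first appearance."""
    return list(dict.fromkeys(name for features in feature_lists for name in features))

# Hypothetical predictor lists for two nested models.
features_1 = ["log_beer_floz", "brand_COORS_LIGHT"]
features_2 = ["log_beer_floz", "brand_COORS_LIGHT", "brand_COORS_LIGHT_*_log_beer_floz"]

predictors = ordered_union(features_1, features_2)
print(predictors)
```

A predictor that appears only in a later model is appended after the shared ones, and its cell stays blank for the models that do not use it.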

compare_rmse

def compare_rmse(test_dfs, label_col, pred_col="prediction", names=None):
    """
    Computes and compares RMSE values for a list of test DataFrames.

    For each DataFrame in test_dfs, this function calculates the RMSE between the actual outcome
    (given by label_col) and the predicted value (given by pred_col, default "prediction"). It then
    produces a formatted table where the first column header is empty and the first row's first cell is
    "RMSE", with each model's RMSE in its own column.

    Parameters:
        test_dfs (list): List of test DataFrames.
        label_col (str): The name of the outcome column.
        pred_col (str, optional): The name of the prediction column (default "prediction").
        names (list, optional): List of model names corresponding to the test DataFrames.
                                Defaults to "Model 1", "Model 2", etc.

    Returns:
        A formatted string containing a table that compares RMSE values for each test DataFrame,
        with one model per column.
    """
    # Set default model names if none provided.
    if names is None:
        names = [f"Model {i+1}" for i in range(len(test_dfs))]

    rmse_values = []
    for df in test_dfs:
        # Create a column for squared error.
        df = df.withColumn("error_sq", pow(col(label_col) - col(pred_col), 2))
        # Calculate RMSE: square root of the mean squared error.
        rmse = df.agg(sqrt(avg("error_sq")).alias("rmse")).collect()[0]["rmse"]
        rmse_values.append(rmse)

    # Build a single row table: first cell "RMSE", then one cell per model with the RMSE value.
    row = ["RMSE"] + [f"{rmse:.3f}" for rmse in rmse_values]

    # Build header: first column header is empty, then model names.
    header = [""] + names

    table_str = tabulate([row], headers=header, tablefmt="grid", colalign=("left",) + ("right",)*len(names))
    return table_str

# Example usage:
# print(compare_rmse([dtest_1, dtest_2, dtest_3], "log_sales", names=["Model 1", "Model 2", "Model 3"]))
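
For reference, the quantity computed per DataFrame is the square root of the mean squared difference between the label and the prediction. Here is a plain-Python sketch of the same arithmetic; the function name rmse and the numbers are made up for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: three exact predictions and one that is off by 2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
value = rmse(actual, predicted)  # sqrt((0 + 0 + 0 + 4) / 4)
print(f"{value:.3f}")
```

Since the label here is on the log scale, the RMSE is also in log units and is comparable only across models that share the same label column.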

residual_plot

def residual_plot(df, label_col, model_name):
    """
    Generates a residual plot for a given test dataframe.

    Parameters:
        df (DataFrame): Spark DataFrame containing the test set with predictions.
        label_col (str): The column name of the actual outcome variable.
        model_name (str): The model name appended to the plot title ("Residual Plot for <model_name>").

    Returns:
        None (displays the plot)
    """
    # Convert to Pandas DataFrame
    df_pd = df.select(["prediction", label_col]).toPandas()
    df_pd["residual"] = df_pd[label_col] - df_pd["prediction"]

    # Scatter plot of residuals vs. predicted values
    plt.scatter(df_pd["prediction"], df_pd["residual"], alpha=0.2, color="darkgray")

    # Use LOWESS smoothing for trend line
    smoothed = sm.nonparametric.lowess(df_pd["residual"], df_pd["prediction"])
    plt.plot(smoothed[:, 0], smoothed[:, 1], color="darkblue")

    # Add reference line at y=0
    plt.axhline(y=0, color="red", linestyle="--")

    # Labels and title (model_name)
    plt.xlabel("Predicted Values")
    plt.ylabel("Residuals")
    model_name = "Residual Plot for " + model_name
    plt.title(model_name)

    # Show plot
    plt.show()

# Example usage:
# residual_plot(dtest_1, "log_sales", "Model 1")

Data Preparation

beer = pd.read_csv('https://bcdanl.github.io/data/beer_markets_all_cleaned.csv')

Log Transformation

df = spark.createDataFrame(beer)
df = (
    df
    .withColumn("log_beer_floz",
                log(df['beer_floz']) )
    .withColumn("log_price_floz",
                log(df['price_floz']) )
)

Question 1 - Filter

df = df.filter(
    (col("container") == "CAN") |
    (col("container") == "NON_REFILLABLE_BOTTLE"))

Question 2 - Training-Test Split

dtrain, dtest = df.randomSplit([0.67, 0.33], seed = 1234)
(
    df
    .groupBy("market")
    .count()
    .orderBy("market")
    .show(n = df.select("market").distinct().count())
)
+--------------------+-----+
|              market|count|
+--------------------+-----+
|              ALBANY|  487|
|             ATLANTA| 1279|
|           BALTIMORE|  374|
|          BIRMINGHAM| 1137|
|              BOSTON|  872|
|   BUFFALO-ROCHESTER|  607|
|           CHARLOTTE| 1246|
|             CHICAGO| 1879|
|          CINCINNATI| 1270|
|           CLEVELAND| 1226|
|            COLUMBUS| 1862|
|              DALLAS| 2098|
|              DENVER|  796|
|          DES_MOINES|  716|
|             DETROIT| 1731|
|          EXURBAN_NJ|  223|
|          EXURBAN_NY|   98|
|        GRAND_RAPIDS|  739|
|  HARTFORD-NEW_HAVEN|  370|
|             HOUSTON| 1673|
|        INDIANAPOLIS| 1213|
|        JACKSONVILLE|  501|
|         KANSAS_CITY|  663|
|         LITTLE_ROCK|  452|
|         LOS_ANGELES| 1564|
|          LOUISVILLE|  833|
|             MEMPHIS|  530|
|               MIAMI| 2616|
|           MILWAUKEE|  728|
|         MINNEAPOLIS|  801|
|           NASHVILLE|  989|
|  NEW_ORLEANS-MOBILE|  852|
| OKLAHOMA_CITY-TULSA|  800|
|               OMAHA| 1017|
|             ORLANDO| 1135|
|        PHILADELPHIA|  433|
|             PHOENIX| 2263|
|          PITTSBURGH|  352|
|            PORTLAND|  552|
|      RALEIGH-DURHAM| 1126|
|            RICHMOND| 1063|
|       RURAL_ALABAMA|  305|
|      RURAL_ARKANSAS|  160|
|    RURAL_CALIFORNIA|  848|
|      RURAL_COLORADO|   21|
|       RURAL_FLORIDA|  522|
|       RURAL_GEORGIA|  460|
|         RURAL_IDAHO|  154|
|      RURAL_ILLINOIS| 1195|
|       RURAL_INDIANA|  481|
|          RURAL_IOWA| 1060|
|        RURAL_KANSAS|  179|
|      RURAL_KENTUCKY|  225|
|     RURAL_LOUISIANA|  381|
|         RURAL_MAINE|  353|
|      RURAL_MICHIGAN|  754|
|     RURAL_MINNESOTA|  138|
|   RURAL_MISSISSIPPI|  354|
|      RURAL_MISSOURI|  640|
|       RURAL_MONTANA|  354|
|      RURAL_NEBRASKA|  110|
|        RURAL_NEVADA|  557|
| RURAL_NEW_HAMPSHIRE|   25|
|    RURAL_NEW_MEXICO|  427|
|      RURAL_NEW_YORK|   13|
|RURAL_NORTH_CAROLINA|  909|
|  RURAL_NORTH_DAKOTA|  129|
|          RURAL_OHIO|  257|
|      RURAL_OKLAHOMA|   54|
|        RURAL_OREGON|   38|
|  RURAL_PENNSYLVANIA|  298|
|RURAL_SOUTH_CAROLINA| 1295|
|  RURAL_SOUTH_DAKOTA|  153|
|     RURAL_TENNESSEE|  423|
|         RURAL_TEXAS| 1771|
|       RURAL_VERMONT|  139|
|      RURAL_VIRGINIA|  185|
|    RURAL_WASHINGTON|  330|
| RURAL_WEST_VIRGINIA|  265|
|     RURAL_WISCONSIN| 1306|
|       RURAL_WYOMING|   39|
|          SACRAMENTO|  981|
|      SALT_LAKE_CITY|  320|
|         SAN_ANTONIO| 2615|
|           SAN_DIEGO|  656|
|       SAN_FRANCISCO|  871|
|             SEATTLE|  903|
|            ST_LOUIS| 1347|
|        SURBURBAN_NJ|  399|
|        SURBURBAN_NY|  473|
|            SYRACUSE|  294|
|               TAMPA| 3180|
|            URBAN_NY|  735|
|       WASHINGTON_DC|  863|
+--------------------+-----+
(
    df
    .groupBy("brand")
    .count()
    .orderBy("brand")
    .show(n = df.select("brand").distinct().count())
)
+-------------+-----+
|        brand|count|
+-------------+-----+
|    BUD_LIGHT|21170|
|  BUSCH_LIGHT| 8671|
|  COORS_LIGHT|12865|
|  MILLER_LITE|16788|
|NATURAL_LIGHT|12616|
+-------------+-----+
(
    df
    .groupBy("container")
    .count()
    .orderBy("container")
    .show(n = df.select("container").distinct().count(), truncate=False)
)
+---------------------+-----+
|container            |count|
+---------------------+-----+
|CAN                  |53015|
|NON_REFILLABLE_BOTTLE|19095|
+---------------------+-----+
(
    df
    .groupBy("promo")
    .count()
    .orderBy("promo")
    .show(n = df.select("promo").distinct().count())
)
+-----+-----+
|promo|count|
+-----+-----+
|false|57621|
| true|14489|
+-----+-----+

Adding Dummies

dummy_cols_market, ref_category_market = add_dummy_variables('market', 5)
dummy_cols_brand, ref_category_brand = add_dummy_variables('brand', 0)
dummy_cols_container, ref_category_container = add_dummy_variables('container', 1)
dummy_cols_promo, ref_category_promo = add_dummy_variables('promo', 0)
Reference category (dummy omitted): BUFFALO-ROCHESTER
Reference category (dummy omitted): BUD_LIGHT
Reference category (dummy omitted): NON_REFILLABLE_BOTTLE
Reference category (dummy omitted): False

Adding Interaction Terms

interaction_cols_brand_quantity = add_interaction_terms(dummy_cols_brand, ['log_beer_floz'])
interaction_cols_brand_promo_quantity = add_interaction_terms(dummy_cols_brand, dummy_cols_promo, ['log_beer_floz'])

Question 3 - Intuition behind Each Model

  • Model 1

\[ \begin{align} \log(\text{price\_per\_floz}) = &\ \beta_{0} + \sum_{i=1}^{N} \beta_{i} \,\text{market}_{i} + \sum_{j=N+1}^{N+4} \beta_{j} \,\text{brand}_{j} + \beta_{N+5} \,\text{container\_{CAN}} \\ &\,+\, \beta_{N+6} \log(\text{beer\_floz}) + \epsilon \end{align} \]

  • One-Size-Fits-All Elasticity
    • Assumes that every beer, whether Bud Light or value-priced Natural Light, shares an identical inverse price elasticity.
    • The entire beer market is treated as a homogeneous commodity. A 10% increase in price yields, for example, a 5% reduction in sales, regardless of brand identity or promotional activity.
    • While straightforward to estimate, this specification overlooks differences across brands in pricing power, positioning, and promotional responsiveness, which can lead to underpricing of premium favorites (e.g., Bud, Coors, Miller) and overpricing of bargain brands (e.g., Busch, Natural).
  • Model 2

\[ \begin{align} \log(\text{price\_per\_floz}) \,=\, & \beta_{0} \,+\, \sum_{i=1}^{N}\beta_{i}\,\text{market}_{i} \,+\, \sum_{j=N+1}^{N+4}\beta_{j}\,\text{brand}_{j} \,+\, \beta_{N+5}\,\text{container\_{CAN}} \\ &\,+\, \beta_{N+6}\log(\text{beer\_floz})\\ &\,+\, \sum_{j=N+1}^{N+4}\beta_{j\times\text{beer\_floz}}\,\text{brand}_{j}\times \log(\text{beer\_floz})\\ &\,+\, \epsilon \end{align} \]

  • Brand-Specific Elasticity (No Promotional Effect)
    • Allows Bud, Coors, Miller, Busch, and Natural each to exhibit its own inverse price elasticity, yet treats promotion and no-promotion identically.
    • The model imposes a constant sensitivity irrespective of whether the price is part of a “Happy Hour” or standard offering.
      • This approach captures variation in the degree of price sensitivity across brands, but it fails to account for the additional demand shifts induced by promotions.
  • Model 3

\[ \begin{align} \log(\text{price\_per\_floz}) \,=\, & \beta_{0} \,+\, \sum_{i=1}^{N}\beta_{i}\,\text{market}_{i} \,+\, \sum_{j=N+1}^{N+4}\beta_{j}\,\text{brand}_{j} \,+\, \beta_{N+5}\,\text{container\_{CAN}} \\ &\,+\, \beta_{N+6}\log(\text{beer\_floz})\\ &\,+\, \beta_{N+7}\,\text{promo} \times\log(\text{beer\_floz}) \\ &\,+\, \sum_{j=N+1}^{N+4}\beta_{j\times\text{beer\_floz}}\,\text{brand}_{j}\times \log(\text{beer\_floz})\\ &\,+\, \sum_{j=N+1}^{N+4}\beta_{j\times\text{promo}}\,\text{brand}_{j}\times \text{promo}\\ &\,+\, \sum_{j=N+1}^{N+4}\beta_{j\times\text{promo}\times\text{beer\_floz}}\,\text{brand}_{j}\times \text{promo}\times \log(\text{beer\_floz})\\ &\,+\, \epsilon \end{align} \]

  • Brand × Promotion-Specific Elasticity
    • Estimates a distinct inverse elasticity for each brand and for each brand’s promotional status (discounted vs. full price).
      • When Natural Light is on special, its demand becomes markedly more elastic—deal-seeking consumers flood in—whereas Bud devotees remain relatively insensitive to a buy-one-get-one offer. Each brand’s “promotion effect” is modeled uniquely.
      • This specification offers the greatest flexibility, supporting a dynamic pricing and promotional calendar that maximizes profitability by aligning strategies with each brand’s specific consumer responsiveness.
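
To make the Model 3 specification concrete, the slope of log(price_floz) with respect to log(beer_floz) for a given brand and promotion status is the sum of the base slope and every interaction coefficient that involves log(beer_floz) and is switched on for that cell. The sketch below adds these up. The coefficient values are invented purely for illustration (they are not estimates from the data), the helper name quantity_slope is hypothetical, and the key names assume the column naming produced by add_dummy_variables and add_interaction_terms.

```python
def quantity_slope(coefs, brand, promo):
    """Slope of log(price_floz) w.r.t. log(beer_floz) for one brand/promo cell."""
    slope = coefs["log_beer_floz"]
    # Brand-specific shift; absent (zero) for the reference brand.
    slope += coefs.get(f"brand_{brand}_*_log_beer_floz", 0.0)
    if promo:
        slope += coefs["promo_True_*_log_beer_floz"]
        # Brand-specific promotion shift; absent (zero) for the reference brand.
        slope += coefs.get(f"brand_{brand}_*_promo_True_*_log_beer_floz", 0.0)
    return slope

# Invented coefficients, for illustration only.
coefs = {
    "log_beer_floz": -0.14,
    "promo_True_*_log_beer_floz": -0.02,
    "brand_NATURAL_LIGHT_*_log_beer_floz": -0.03,
    "brand_NATURAL_LIGHT_*_promo_True_*_log_beer_floz": -0.01,
}

base = quantity_slope(coefs, "BUD_LIGHT", promo=False)         # reference brand, no promo
nl_promo = quantity_slope(coefs, "NATURAL_LIGHT", promo=True)  # all four terms active
print(f"{base:.2f} {nl_promo:.2f}")
```

Model 1 corresponds to keeping only the first term, and Model 2 to the first two; Model 3 adds both promotion terms.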

Model 1

# assembling predictors
conti_cols = ["log_beer_floz"]

assembler_predictors = (
    conti_cols +
    dummy_cols_market +
    dummy_cols_brand +
    dummy_cols_container
)

assembler_1 = VectorAssembler(
    inputCols = assembler_predictors,
    outputCol = "predictors"
)

dtrain_1 = assembler_1.transform(dtrain)
dtest_1  = assembler_1.transform(dtest)

# training model
model_1 = (
    LinearRegression(featuresCol="predictors",
                     labelCol="log_price_floz")
    .fit(dtrain_1)
)

# making prediction - Question 4
dtest_1 = model_1.transform(dtest_1)

# making regression table
print( regression_table(model_1, assembler_1) )
+-----------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
| y: log_price_floz           |   Beta | Exp(Beta) | Sig. | Std. Error | p-value | 95% CI Lower | 95% CI Upper | Exp(95% CI Lower) | Exp(95% CI Upper) |
+-----------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
| log_beer_floz               | -0.142 |           | ***  |      0.013 |   0.000 |       -0.166 |       -0.117 |                   |                   |
| market_ALBANY               |  0.027 |     1.027 |  **  |      0.010 |   0.034 |        0.007 |        0.047 |             1.007 |             1.048 |
| market_ATLANTA              |  0.083 |     1.087 | ***  |      0.014 |   0.000 |        0.057 |        0.110 |             1.058 |             1.117 |
| market_BALTIMORE            |  0.100 |     1.105 | ***  |      0.011 |   0.000 |        0.079 |        0.121 |             1.083 |             1.128 |
| market_BIRMINGHAM           |  0.124 |     1.132 | ***  |      0.011 |   0.000 |        0.102 |        0.145 |             1.107 |             1.156 |
| market_BOSTON               |  0.127 |     1.136 | ***  |      0.010 |   0.000 |        0.107 |        0.147 |             1.113 |             1.159 |
| market_CHARLOTTE            |  0.020 |     1.020 |  *   |      0.010 |   0.058 |        0.000 |        0.039 |             1.000 |             1.039 |
| market_CHICAGO              | -0.008 |     0.992 |      |      0.010 |   0.411 |       -0.028 |        0.012 |             0.972 |             1.012 |
| market_CINCINNATI           |  0.084 |     1.088 | ***  |      0.010 |   0.000 |        0.064 |        0.104 |             1.066 |             1.110 |
| market_CLEVELAND            |  0.050 |     1.051 | ***  |      0.010 |   0.000 |        0.031 |        0.069 |             1.031 |             1.071 |
| market_COLUMBUS             |  0.069 |     1.072 | ***  |      0.010 |   0.000 |        0.050 |        0.088 |             1.051 |             1.092 |
| market_DALLAS               |  0.203 |     1.226 | ***  |      0.011 |   0.000 |        0.181 |        0.225 |             1.199 |             1.253 |
| market_DENVER               |  0.123 |     1.131 | ***  |      0.012 |   0.000 |        0.101 |        0.146 |             1.106 |             1.157 |
| market_DES_MOINES           |  0.129 |     1.138 | ***  |      0.010 |   0.000 |        0.110 |        0.148 |             1.116 |             1.160 |
| market_DETROIT              |  0.087 |     1.091 | ***  |      0.016 |   0.000 |        0.055 |        0.119 |             1.057 |             1.126 |
| market_EXURBAN_NJ           |  0.221 |     1.247 | ***  |      0.023 |   0.000 |        0.176 |        0.266 |             1.192 |             1.305 |
| market_EXURBAN_NY           |  0.122 |     1.130 | ***  |      0.011 |   0.000 |        0.100 |        0.144 |             1.105 |             1.155 |
| market_GRAND_RAPIDS         |  0.082 |     1.086 | ***  |      0.014 |   0.000 |        0.055 |        0.109 |             1.056 |             1.115 |
| market_HARTFORD-NEW_HAVEN   |  0.148 |     1.160 | ***  |      0.010 |   0.000 |        0.129 |        0.167 |             1.137 |             1.182 |
| market_HOUSTON              |  0.113 |     1.120 | ***  |      0.010 |   0.000 |        0.093 |        0.133 |             1.097 |             1.143 |
| market_INDIANAPOLIS         |  0.042 |     1.043 | ***  |      0.013 |   0.000 |        0.018 |        0.067 |             1.018 |             1.069 |
| market_JACKSONVILLE         |  0.113 |     1.120 | ***  |      0.012 |   0.000 |        0.090 |        0.136 |             1.094 |             1.146 |
| market_KANSAS_CITY          |  0.076 |     1.079 | ***  |      0.013 |   0.000 |        0.050 |        0.101 |             1.052 |             1.106 |
| market_LITTLE_ROCK          |  0.092 |     1.096 | ***  |      0.010 |   0.000 |        0.072 |        0.112 |             1.075 |             1.118 |
| market_LOS_ANGELES          |  0.032 |     1.033 | ***  |      0.011 |   0.001 |        0.010 |        0.054 |             1.010 |             1.055 |
| market_LOUISVILLE           |  0.068 |     1.070 | ***  |      0.012 |   0.000 |        0.044 |        0.092 |             1.045 |             1.097 |
| market_MEMPHIS              |  0.128 |     1.137 | ***  |      0.009 |   0.000 |        0.110 |        0.147 |             1.116 |             1.158 |
| market_MIAMI                |  0.108 |     1.114 | ***  |      0.011 |   0.000 |        0.085 |        0.130 |             1.089 |             1.139 |
| market_MILWAUKEE            |  0.028 |     1.028 |  **  |      0.011 |   0.015 |        0.006 |        0.050 |             1.006 |             1.051 |
| market_MINNEAPOLIS          |  0.128 |     1.137 | ***  |      0.011 |   0.000 |        0.107 |        0.149 |             1.113 |             1.161 |
| market_NASHVILLE            |  0.143 |     1.153 | ***  |      0.011 |   0.000 |        0.121 |        0.164 |             1.129 |             1.179 |
| market_NEW_ORLEANS-MOBILE   |  0.128 |     1.136 | ***  |      0.011 |   0.000 |        0.106 |        0.150 |             1.112 |             1.162 |
| market_OKLAHOMA_CITY-TULSA  |  0.145 |     1.156 | ***  |      0.011 |   0.000 |        0.124 |        0.166 |             1.132 |             1.181 |
| market_OMAHA                |  0.131 |     1.140 | ***  |      0.011 |   0.000 |        0.110 |        0.151 |             1.116 |             1.164 |
| market_ORLANDO              |  0.098 |     1.103 | ***  |      0.013 |   0.000 |        0.073 |        0.124 |             1.075 |             1.132 |
| market_PHILADELPHIA         |  0.115 |     1.121 | ***  |      0.010 |   0.000 |        0.096 |        0.133 |             1.100 |             1.143 |
| market_PHOENIX              |  0.142 |     1.152 | ***  |      0.014 |   0.000 |        0.115 |        0.169 |             1.121 |             1.184 |
| market_PITTSBURGH           |  0.100 |     1.105 | ***  |      0.012 |   0.000 |        0.076 |        0.124 |             1.079 |             1.132 |
| market_PORTLAND             |  0.115 |     1.122 | ***  |      0.011 |   0.000 |        0.095 |        0.136 |             1.099 |             1.145 |
| market_RALEIGH-DURHAM       |  0.090 |     1.094 | ***  |      0.011 |   0.000 |        0.069 |        0.111 |             1.071 |             1.117 |
| market_RICHMOND             |  0.043 |     1.044 | ***  |      0.015 |   0.000 |        0.015 |        0.072 |             1.015 |             1.075 |
| market_RURAL_ALABAMA        |  0.157 |     1.170 | ***  |      0.020 |   0.000 |        0.118 |        0.195 |             1.126 |             1.215 |
| market_RURAL_ARKANSAS       |  0.157 |     1.171 | ***  |      0.011 |   0.000 |        0.136 |        0.179 |             1.145 |             1.196 |
| market_RURAL_CALIFORNIA     |  0.044 |     1.045 | ***  |      0.046 |   0.000 |       -0.047 |        0.135 |             0.955 |             1.144 |
| market_RURAL_COLORADO       |  0.137 |     1.147 | ***  |      0.013 |   0.003 |        0.113 |        0.162 |             1.119 |             1.176 |
| market_RURAL_FLORIDA        |  0.059 |     1.061 | ***  |      0.013 |   0.000 |        0.034 |        0.085 |             1.034 |             1.089 |
| market_RURAL_GEORGIA        |  0.132 |     1.141 | ***  |      0.019 |   0.000 |        0.095 |        0.169 |             1.099 |             1.184 |
| market_RURAL_IDAHO          |  0.142 |     1.152 | ***  |      0.010 |   0.000 |        0.121 |        0.162 |             1.129 |             1.176 |
| market_RURAL_ILLINOIS       |  0.014 |     1.014 |      |      0.013 |   0.176 |       -0.011 |        0.039 |             0.989 |             1.040 |
| market_RURAL_INDIANA        |  0.073 |     1.076 | ***  |      0.011 |   0.000 |        0.052 |        0.094 |             1.053 |             1.098 |
| market_RURAL_IOWA           |  0.058 |     1.060 | ***  |      0.018 |   0.000 |        0.023 |        0.094 |             1.023 |             1.098 |
| market_RURAL_KANSAS         |  0.134 |     1.143 | ***  |      0.016 |   0.000 |        0.102 |        0.166 |             1.108 |             1.180 |
| market_RURAL_KENTUCKY       |  0.157 |     1.170 | ***  |      0.014 |   0.000 |        0.130 |        0.184 |             1.139 |             1.202 |
| market_RURAL_LOUISIANA      |  0.060 |     1.061 | ***  |      0.014 |   0.000 |        0.032 |        0.087 |             1.033 |             1.091 |
| market_RURAL_MAINE          |  0.091 |     1.096 | ***  |      0.011 |   0.000 |        0.069 |        0.114 |             1.071 |             1.120 |
| market_RURAL_MICHIGAN       |  0.085 |     1.089 | ***  |      0.020 |   0.000 |        0.046 |        0.124 |             1.047 |             1.132 |
| market_RURAL_MINNESOTA      |  0.165 |     1.180 | ***  |      0.014 |   0.000 |        0.138 |        0.193 |             1.148 |             1.213 |
| market_RURAL_MISSISSIPPI    |  0.043 |     1.044 | ***  |      0.012 |   0.002 |        0.020 |        0.066 |             1.020 |             1.069 |
| market_RURAL_MISSOURI       |  0.106 |     1.112 | ***  |      0.014 |   0.000 |        0.078 |        0.134 |             1.081 |             1.143 |
| market_RURAL_MONTANA        |  0.127 |     1.135 | ***  |      0.021 |   0.000 |        0.085 |        0.169 |             1.089 |             1.184 |
| market_RURAL_NEBRASKA       |  0.138 |     1.148 | ***  |      0.012 |   0.000 |        0.114 |        0.162 |             1.121 |             1.176 |
| market_RURAL_NEVADA         |  0.051 |     1.052 | ***  |      0.045 |   0.000 |       -0.037 |        0.139 |             0.964 |             1.149 |
| market_RURAL_NEW_HAMPSHIRE  |  0.028 |     1.028 |      |      0.013 |   0.532 |        0.002 |        0.054 |             1.002 |             1.055 |
| market_RURAL_NEW_MEXICO     |  0.154 |     1.166 | ***  |      0.061 |   0.000 |        0.035 |        0.273 |             1.036 |             1.314 |
| market_RURAL_NEW_YORK       | -0.013 |     0.987 |      |      0.011 |   0.832 |       -0.034 |        0.009 |             0.966 |             1.009 |
| market_RURAL_NORTH_CAROLINA | -0.002 |     0.998 |      |      0.020 |   0.823 |       -0.042 |        0.038 |             0.958 |             1.038 |
| market_RURAL_NORTH_DAKOTA   |  0.223 |     1.250 | ***  |      0.015 |   0.000 |        0.193 |        0.253 |             1.213 |             1.288 |
| market_RURAL_OHIO           |  0.096 |     1.100 | ***  |      0.031 |   0.000 |        0.034 |        0.157 |             1.035 |             1.170 |
| market_RURAL_OKLAHOMA       |  0.130 |     1.139 | ***  |      0.033 |   0.000 |        0.066 |        0.194 |             1.068 |             1.214 |
| market_RURAL_OREGON         |  0.074 |     1.077 |  **  |      0.015 |   0.024 |        0.045 |        0.103 |             1.046 |             1.108 |
| market_RURAL_PENNSYLVANIA   |  0.131 |     1.140 | ***  |      0.010 |   0.000 |        0.111 |        0.151 |             1.117 |             1.163 |
| market_RURAL_SOUTH_CAROLINA |  0.055 |     1.056 | ***  |      0.020 |   0.000 |        0.017 |        0.093 |             1.017 |             1.098 |
| market_RURAL_SOUTH_DAKOTA   |  0.077 |     1.081 | ***  |      0.013 |   0.000 |        0.052 |        0.103 |             1.053 |             1.109 |
| market_RURAL_TENNESSEE      |  0.170 |     1.186 | ***  |      0.010 |   0.000 |        0.151 |        0.190 |             1.163 |             1.209 |
| market_RURAL_TEXAS          |  0.169 |     1.184 | ***  |      0.020 |   0.000 |        0.130 |        0.207 |             1.139 |             1.230 |
| market_RURAL_VERMONT        |  0.077 |     1.080 | ***  |      0.017 |   0.000 |        0.043 |        0.111 |             1.044 |             1.118 |
| market_RURAL_VIRGINIA       |  0.018 |     1.019 |      |      0.014 |   0.289 |       -0.009 |        0.046 |             0.991 |             1.047 |
| market_RURAL_WASHINGTON     |  0.116 |     1.123 | ***  |      0.015 |   0.000 |        0.087 |        0.146 |             1.090 |             1.157 |
| market_RURAL_WEST_VIRGINIA  | -0.037 |     0.964 |  **  |      0.010 |   0.015 |       -0.057 |       -0.017 |             0.945 |             0.984 |
| market_RURAL_WISCONSIN      |  0.039 |     1.040 | ***  |      0.036 |   0.000 |       -0.031 |        0.109 |             0.970 |             1.115 |
| market_RURAL_WYOMING        |  0.147 |     1.158 | ***  |      0.011 |   0.000 |        0.126 |        0.168 |             1.134 |             1.183 |
| market_SACRAMENTO           |  0.020 |     1.020 |  *   |      0.014 |   0.064 |       -0.008 |        0.048 |             0.992 |             1.049 |
| market_SALT_LAKE_CITY       |  0.120 |     1.128 | ***  |      0.009 |   0.000 |        0.102 |        0.139 |             1.107 |             1.149 |
| market_SAN_ANTONIO          |  0.138 |     1.148 | ***  |      0.012 |   0.000 |        0.115 |        0.161 |             1.122 |             1.174 |
| market_SAN_DIEGO            |  0.012 |     1.012 |      |      0.011 |   0.321 |       -0.010 |        0.033 |             0.990 |             1.034 |
| market_SAN_FRANCISCO        |  0.069 |     1.071 | ***  |      0.011 |   0.000 |        0.047 |        0.090 |             1.048 |             1.094 |
| market_SEATTLE              |  0.109 |     1.116 | ***  |      0.010 |   0.000 |        0.089 |        0.129 |             1.093 |             1.138 |
| market_ST_LOUIS             |  0.042 |     1.042 | ***  |      0.013 |   0.000 |        0.015 |        0.068 |             1.015 |             1.070 |
| market_SURBURBAN_NJ         | -0.008 |     0.992 |      |      0.013 |   0.559 |       -0.033 |        0.017 |             0.968 |             1.017 |
| market_SURBURBAN_NY         |  0.101 |     1.106 | ***  |      0.015 |   0.000 |        0.072 |        0.129 |             1.075 |             1.138 |
| market_SYRACUSE             | -0.035 |     0.966 |  **  |      0.009 |   0.017 |       -0.053 |       -0.017 |             0.948 |             0.983 |
| market_TAMPA                |  0.103 |     1.109 | ***  |      0.011 |   0.000 |        0.081 |        0.126 |             1.084 |             1.134 |
| market_URBAN_NY             |  0.173 |     1.189 | ***  |      0.011 |   0.000 |        0.152 |        0.195 |             1.164 |             1.216 |
| market_WASHINGTON_DC        |  0.097 |     1.102 | ***  |      0.003 |   0.000 |        0.092 |        0.103 |             1.096 |             1.109 |
| brand_BUSCH_LIGHT           | -0.260 |     0.771 | ***  |      0.002 |   0.000 |       -0.264 |       -0.255 |             0.768 |             0.775 |
| brand_COORS_LIGHT           | -0.005 |     0.995 |  **  |      0.002 |   0.049 |       -0.009 |       -0.000 |             0.991 |             1.000 |
| brand_MILLER_LITE           | -0.013 |     0.987 | ***  |      0.003 |   0.000 |       -0.018 |       -0.008 |             0.982 |             0.992 |
| brand_NATURAL_LIGHT         | -0.319 |     0.727 | ***  |      0.002 |   0.000 |       -0.323 |       -0.315 |             0.724 |             0.730 |
| container_CAN               | -0.053 |     0.948 | ***  |      0.011 |   0.000 |       -0.074 |       -0.032 |             0.929 |             0.968 |
| Intercept                   | -2.117 |           | ***  |      0.001 |         |       -2.119 |              |            -2.115 |                   |
--------------------------------------------------------------------------------------------------------------------------------------------------------
| Observations                | 48,115 |           |      |            |         |              |              |                   |                   |
| R²                          |  0.547 |           |      |            |         |              |              |                   |                   |
| RMSE                        |  0.170 |           |      |            |         |              |              |                   |                   |
+-----------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
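
As a reading aid for the table above, the following sketch shows one way to interpret the reported numbers. Because the outcome is log_price_floz, a dummy coefficient β corresponds to a multiplicative factor exp(β) on price per fl oz relative to the omitted category, and the coefficient on log_beer_floz can be read as an elasticity. The helper names `pct_effect` and `pct_change_from_elasticity` are hypothetical; the two coefficient values are copied from the Model 1 table.

```python
import math


def pct_effect(beta):
    """Percent difference in price per fl oz implied by a dummy coefficient."""
    return (math.exp(beta) - 1.0) * 100.0


def pct_change_from_elasticity(slope, pct_change_x):
    """Percent change in price per fl oz for a given percent change in volume."""
    return ((1.0 + pct_change_x / 100.0) ** slope - 1.0) * 100.0


beta_busch = -0.260      # brand_BUSCH_LIGHT in the Model 1 table
slope_volume = -0.142    # log_beer_floz in the Model 1 table

# Exp(Beta) as reported in the table, and the implied percent difference
print(round(math.exp(beta_busch), 3))    # 0.771
print(round(pct_effect(beta_busch), 1))  # -22.9

# Implied change in price per fl oz when purchased volume is 10% larger
print(round(pct_change_from_elasticity(slope_volume, 10.0), 1))  # -1.3
```

Under these assumptions, Busch Light is priced about 23% lower per fl oz than the omitted reference brand, holding market, container, and volume fixed, and a 10% larger purchase volume is associated with roughly a 1.3% lower price per fl oz.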

Model 2

# assembling predictors
conti_cols = ["log_beer_floz"]

assembler_predictors = (
    conti_cols +
    dummy_cols_market +
    dummy_cols_brand +
    dummy_cols_container +
    interaction_cols_brand_quantity
)


assembler_2 = VectorAssembler(
    inputCols = assembler_predictors,
    outputCol = "predictors"
)

dtrain_2 = assembler_2.transform(dtrain)
dtest_2  = assembler_2.transform(dtest)

# training model
model_2 = (
    LinearRegression(featuresCol="predictors",
                     labelCol="log_price_floz")
    .fit(dtrain_2)
)

# making prediction - Question 4
dtest_2 = model_2.transform(dtest_2)

# making regression table
print( regression_table(model_2, assembler_2) )
+-------------------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
| y: log_price_floz                   |   Beta | Exp(Beta) | Sig. | Std. Error | p-value | 95% CI Lower | 95% CI Upper | Exp(95% CI Lower) | Exp(95% CI Upper) |
+-------------------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
| log_beer_floz                       | -0.146 |           | ***  |      0.013 |   0.000 |       -0.171 |       -0.121 |                   |                   |
| market_ALBANY                       |  0.029 |     1.029 |  **  |      0.010 |   0.021 |        0.009 |        0.049 |             1.009 |             1.050 |
| market_ATLANTA                      |  0.083 |     1.087 | ***  |      0.014 |   0.000 |        0.056 |        0.110 |             1.058 |             1.116 |
| market_BALTIMORE                    |  0.104 |     1.109 | ***  |      0.011 |   0.000 |        0.083 |        0.124 |             1.087 |             1.132 |
| market_BIRMINGHAM                   |  0.130 |     1.139 | ***  |      0.011 |   0.000 |        0.108 |        0.152 |             1.114 |             1.164 |
| market_BOSTON                       |  0.127 |     1.135 | ***  |      0.010 |   0.000 |        0.107 |        0.147 |             1.113 |             1.158 |
| market_CHARLOTTE                    |  0.015 |     1.015 |      |      0.010 |   0.156 |       -0.005 |        0.034 |             0.995 |             1.034 |
| market_CHICAGO                      | -0.013 |     0.987 |      |      0.010 |   0.173 |       -0.033 |        0.007 |             0.967 |             1.007 |
| market_CINCINNATI                   |  0.079 |     1.082 | ***  |      0.010 |   0.000 |        0.058 |        0.099 |             1.060 |             1.104 |
| market_CLEVELAND                    |  0.045 |     1.046 | ***  |      0.010 |   0.000 |        0.025 |        0.064 |             1.026 |             1.066 |
| market_COLUMBUS                     |  0.066 |     1.068 | ***  |      0.010 |   0.000 |        0.047 |        0.085 |             1.048 |             1.089 |
| market_DALLAS                       |  0.214 |     1.238 | ***  |      0.011 |   0.000 |        0.192 |        0.236 |             1.211 |             1.266 |
| market_DENVER                       |  0.121 |     1.129 | ***  |      0.012 |   0.000 |        0.099 |        0.144 |             1.104 |             1.155 |
| market_DES_MOINES                   |  0.125 |     1.134 | ***  |      0.010 |   0.000 |        0.106 |        0.145 |             1.112 |             1.156 |
| market_DETROIT                      |  0.083 |     1.086 | ***  |      0.016 |   0.000 |        0.051 |        0.114 |             1.052 |             1.121 |
| market_EXURBAN_NJ                   |  0.216 |     1.242 | ***  |      0.023 |   0.000 |        0.172 |        0.261 |             1.187 |             1.299 |
| market_EXURBAN_NY                   |  0.119 |     1.126 | ***  |      0.011 |   0.000 |        0.096 |        0.141 |             1.101 |             1.151 |
| market_GRAND_RAPIDS                 |  0.078 |     1.081 | ***  |      0.014 |   0.000 |        0.051 |        0.105 |             1.053 |             1.111 |
| market_HARTFORD-NEW_HAVEN           |  0.146 |     1.157 | ***  |      0.010 |   0.000 |        0.126 |        0.165 |             1.135 |             1.179 |
| market_HOUSTON                      |  0.110 |     1.117 | ***  |      0.010 |   0.000 |        0.090 |        0.131 |             1.094 |             1.140 |
| market_INDIANAPOLIS                 |  0.041 |     1.042 | ***  |      0.013 |   0.000 |        0.017 |        0.066 |             1.017 |             1.068 |
| market_JACKSONVILLE                 |  0.105 |     1.111 | ***  |      0.012 |   0.000 |        0.082 |        0.128 |             1.086 |             1.137 |
| market_KANSAS_CITY                  |  0.070 |     1.073 | ***  |      0.013 |   0.000 |        0.045 |        0.095 |             1.046 |             1.100 |
| market_LITTLE_ROCK                  |  0.088 |     1.092 | ***  |      0.010 |   0.000 |        0.069 |        0.108 |             1.071 |             1.114 |
| market_LOS_ANGELES                  |  0.026 |     1.026 |  **  |      0.011 |   0.010 |        0.004 |        0.047 |             1.004 |             1.048 |
| market_LOUISVILLE                   |  0.063 |     1.065 | ***  |      0.012 |   0.000 |        0.039 |        0.087 |             1.040 |             1.091 |
| market_MEMPHIS                      |  0.127 |     1.136 | ***  |      0.009 |   0.000 |        0.109 |        0.146 |             1.115 |             1.157 |
| market_MIAMI                        |  0.106 |     1.112 | ***  |      0.011 |   0.000 |        0.084 |        0.129 |             1.088 |             1.137 |
| market_MILWAUKEE                    |  0.027 |     1.027 |  **  |      0.011 |   0.019 |        0.005 |        0.048 |             1.005 |             1.050 |
| market_MINNEAPOLIS                  |  0.129 |     1.138 | ***  |      0.011 |   0.000 |        0.108 |        0.150 |             1.115 |             1.162 |
| market_NASHVILLE                    |  0.142 |     1.152 | ***  |      0.011 |   0.000 |        0.120 |        0.163 |             1.128 |             1.177 |
| market_NEW_ORLEANS-MOBILE           |  0.118 |     1.125 | ***  |      0.011 |   0.000 |        0.097 |        0.140 |             1.101 |             1.150 |
| market_OKLAHOMA_CITY-TULSA          |  0.142 |     1.153 | ***  |      0.011 |   0.000 |        0.121 |        0.163 |             1.129 |             1.177 |
| market_OMAHA                        |  0.129 |     1.137 | ***  |      0.011 |   0.000 |        0.108 |        0.149 |             1.114 |             1.161 |
| market_ORLANDO                      |  0.096 |     1.101 | ***  |      0.013 |   0.000 |        0.071 |        0.121 |             1.073 |             1.129 |
| market_PHILADELPHIA                 |  0.114 |     1.121 | ***  |      0.010 |   0.000 |        0.096 |        0.133 |             1.100 |             1.142 |
| market_PHOENIX                      |  0.144 |     1.155 | ***  |      0.014 |   0.000 |        0.117 |        0.171 |             1.124 |             1.186 |
| market_PITTSBURGH                   |  0.098 |     1.102 | ***  |      0.012 |   0.000 |        0.074 |        0.122 |             1.076 |             1.129 |
| market_PORTLAND                     |  0.113 |     1.120 | ***  |      0.010 |   0.000 |        0.093 |        0.133 |             1.097 |             1.143 |
| market_RALEIGH-DURHAM               |  0.090 |     1.094 | ***  |      0.011 |   0.000 |        0.069 |        0.111 |             1.072 |             1.117 |
| market_RICHMOND                     |  0.041 |     1.041 | ***  |      0.015 |   0.000 |        0.012 |        0.069 |             1.012 |             1.072 |
| market_RURAL_ALABAMA                |  0.156 |     1.169 | ***  |      0.019 |   0.000 |        0.118 |        0.194 |             1.125 |             1.215 |
| market_RURAL_ARKANSAS               |  0.161 |     1.174 | ***  |      0.011 |   0.000 |        0.139 |        0.182 |             1.149 |             1.200 |
| market_RURAL_CALIFORNIA             |  0.041 |     1.042 | ***  |      0.046 |   0.000 |       -0.049 |        0.131 |             0.952 |             1.140 |
| market_RURAL_COLORADO               |  0.136 |     1.145 | ***  |      0.012 |   0.003 |        0.111 |        0.160 |             1.118 |             1.174 |
| market_RURAL_FLORIDA                |  0.050 |     1.052 | ***  |      0.013 |   0.000 |        0.025 |        0.076 |             1.025 |             1.079 |
| market_RURAL_GEORGIA                |  0.128 |     1.137 | ***  |      0.019 |   0.000 |        0.091 |        0.166 |             1.096 |             1.180 |
| market_RURAL_IDAHO                  |  0.135 |     1.144 | ***  |      0.010 |   0.000 |        0.114 |        0.155 |             1.121 |             1.168 |
| market_RURAL_ILLINOIS               |  0.013 |     1.013 |      |      0.013 |   0.228 |       -0.012 |        0.037 |             0.988 |             1.038 |
| market_RURAL_INDIANA                |  0.076 |     1.078 | ***  |      0.011 |   0.000 |        0.055 |        0.096 |             1.056 |             1.101 |
| market_RURAL_IOWA                   |  0.055 |     1.056 | ***  |      0.018 |   0.000 |        0.019 |        0.090 |             1.020 |             1.094 |
| market_RURAL_KANSAS                 |  0.133 |     1.142 | ***  |      0.016 |   0.000 |        0.102 |        0.165 |             1.107 |             1.179 |
| market_RURAL_KENTUCKY               |  0.156 |     1.169 | ***  |      0.014 |   0.000 |        0.129 |        0.183 |             1.138 |             1.200 |
| market_RURAL_LOUISIANA              |  0.055 |     1.057 | ***  |      0.014 |   0.000 |        0.028 |        0.082 |             1.028 |             1.086 |
| market_RURAL_MAINE                  |  0.088 |     1.092 | ***  |      0.011 |   0.000 |        0.065 |        0.110 |             1.068 |             1.116 |
| market_RURAL_MICHIGAN               |  0.081 |     1.084 | ***  |      0.020 |   0.000 |        0.042 |        0.119 |             1.043 |             1.127 |
| market_RURAL_MINNESOTA              |  0.169 |     1.185 | ***  |      0.014 |   0.000 |        0.142 |        0.197 |             1.152 |             1.218 |
| market_RURAL_MISSISSIPPI            |  0.039 |     1.040 | ***  |      0.012 |   0.006 |        0.016 |        0.062 |             1.016 |             1.064 |
| market_RURAL_MISSOURI               |  0.104 |     1.110 | ***  |      0.014 |   0.000 |        0.077 |        0.132 |             1.080 |             1.141 |
| market_RURAL_MONTANA                |  0.124 |     1.132 | ***  |      0.021 |   0.000 |        0.083 |        0.166 |             1.086 |             1.180 |
| market_RURAL_NEBRASKA               |  0.138 |     1.148 | ***  |      0.012 |   0.000 |        0.114 |        0.162 |             1.121 |             1.176 |
| market_RURAL_NEVADA                 |  0.051 |     1.052 | ***  |      0.044 |   0.000 |       -0.037 |        0.138 |             0.964 |             1.148 |
| market_RURAL_NEW_HAMPSHIRE          |  0.017 |     1.017 |      |      0.013 |   0.705 |       -0.009 |        0.043 |             0.991 |             1.043 |
| market_RURAL_NEW_MEXICO             |  0.148 |     1.160 | ***  |      0.060 |   0.000 |        0.030 |        0.266 |             1.030 |             1.305 |
| market_RURAL_NEW_YORK               | -0.010 |     0.990 |      |      0.011 |   0.862 |       -0.032 |        0.011 |             0.968 |             1.011 |
| market_RURAL_NORTH_CAROLINA         |  0.027 |     1.027 |  **  |      0.020 |   0.015 |       -0.013 |        0.067 |             0.987 |             1.069 |
| market_RURAL_NORTH_DAKOTA           |  0.222 |     1.248 | ***  |      0.015 |   0.000 |        0.192 |        0.252 |             1.212 |             1.286 |
| market_RURAL_OHIO                   |  0.093 |     1.098 | ***  |      0.031 |   0.000 |        0.033 |        0.154 |             1.033 |             1.167 |
| market_RURAL_OKLAHOMA               |  0.130 |     1.139 | ***  |      0.033 |   0.000 |        0.067 |        0.194 |             1.069 |             1.214 |
| market_RURAL_OREGON                 |  0.070 |     1.073 |  **  |      0.015 |   0.031 |        0.041 |        0.099 |             1.042 |             1.104 |
| market_RURAL_PENNSYLVANIA           |  0.132 |     1.141 | ***  |      0.010 |   0.000 |        0.112 |        0.152 |             1.118 |             1.164 |
| market_RURAL_SOUTH_CAROLINA         |  0.054 |     1.055 | ***  |      0.019 |   0.000 |        0.016 |        0.092 |             1.016 |             1.096 |
| market_RURAL_SOUTH_DAKOTA           |  0.076 |     1.079 | ***  |      0.013 |   0.000 |        0.050 |        0.101 |             1.051 |             1.107 |
| market_RURAL_TENNESSEE              |  0.170 |     1.185 | ***  |      0.010 |   0.000 |        0.150 |        0.189 |             1.162 |             1.208 |
| market_RURAL_TEXAS                  |  0.167 |     1.182 | ***  |      0.020 |   0.000 |        0.129 |        0.205 |             1.137 |             1.228 |
| market_RURAL_VERMONT                |  0.066 |     1.068 | ***  |      0.017 |   0.001 |        0.032 |        0.100 |             1.032 |             1.105 |
| market_RURAL_VIRGINIA               |  0.015 |     1.015 |      |      0.014 |   0.380 |       -0.012 |        0.043 |             0.988 |             1.044 |
| market_RURAL_WASHINGTON             |  0.114 |     1.121 | ***  |      0.015 |   0.000 |        0.084 |        0.143 |             1.088 |             1.154 |
| market_RURAL_WEST_VIRGINIA          | -0.041 |     0.960 | ***  |      0.010 |   0.007 |       -0.061 |       -0.021 |             0.941 |             0.980 |
| market_RURAL_WISCONSIN              |  0.037 |     1.038 | ***  |      0.036 |   0.000 |       -0.032 |        0.107 |             0.968 |             1.113 |
| market_RURAL_WYOMING                |  0.144 |     1.155 | ***  |      0.011 |   0.000 |        0.123 |        0.165 |             1.131 |             1.179 |
| market_SACRAMENTO                   |  0.018 |     1.019 |  *   |      0.014 |   0.085 |       -0.009 |        0.046 |             0.991 |             1.047 |
| market_SALT_LAKE_CITY               |  0.114 |     1.121 | ***  |      0.009 |   0.000 |        0.096 |        0.133 |             1.101 |             1.142 |
| market_SAN_ANTONIO                  |  0.133 |     1.142 | ***  |      0.012 |   0.000 |        0.110 |        0.155 |             1.116 |             1.168 |
| market_SAN_DIEGO                    |  0.010 |     1.010 |      |      0.011 |   0.385 |       -0.012 |        0.032 |             0.989 |             1.032 |
| market_SAN_FRANCISCO                |  0.066 |     1.068 | ***  |      0.011 |   0.000 |        0.044 |        0.087 |             1.045 |             1.091 |
| market_SEATTLE                      |  0.101 |     1.106 | ***  |      0.010 |   0.000 |        0.081 |        0.121 |             1.084 |             1.128 |
| market_ST_LOUIS                     |  0.038 |     1.038 | ***  |      0.013 |   0.000 |        0.011 |        0.064 |             1.011 |             1.066 |
| market_SURBURBAN_NJ                 | -0.010 |     0.990 |      |      0.013 |   0.459 |       -0.035 |        0.015 |             0.966 |             1.015 |
| market_SURBURBAN_NY                 |  0.097 |     1.102 | ***  |      0.015 |   0.000 |        0.069 |        0.126 |             1.071 |             1.134 |
| market_SYRACUSE                     | -0.040 |     0.961 | ***  |      0.009 |   0.006 |       -0.058 |       -0.022 |             0.943 |             0.978 |
| market_TAMPA                        |  0.099 |     1.104 | ***  |      0.011 |   0.000 |        0.077 |        0.121 |             1.080 |             1.129 |
| market_URBAN_NY                     |  0.172 |     1.188 | ***  |      0.011 |   0.000 |        0.151 |        0.194 |             1.163 |             1.214 |
| market_WASHINGTON_DC                |  0.091 |     1.095 | ***  |      0.022 |   0.000 |        0.048 |        0.134 |             1.049 |             1.144 |
| brand_BUSCH_LIGHT                   | -0.185 |     0.831 | ***  |      0.019 |   0.000 |       -0.222 |       -0.149 |             0.801 |             0.862 |
| brand_COORS_LIGHT                   |  0.019 |     1.019 |      |      0.017 |   0.315 |       -0.014 |        0.051 |             0.986 |             1.053 |
| brand_MILLER_LITE                   |  0.080 |     1.083 | ***  |      0.018 |   0.000 |        0.045 |        0.114 |             1.046 |             1.121 |
| brand_NATURAL_LIGHT                 | -0.601 |     0.548 | ***  |      0.002 |   0.000 |       -0.605 |       -0.597 |             0.546 |             0.550 |
| container_CAN                       | -0.052 |     0.949 | ***  |      0.004 |   0.000 |       -0.060 |       -0.044 |             0.942 |             0.957 |
| brand_BUSCH_LIGHT_*_log_beer_floz   | -0.013 |           | ***  |      0.003 |   0.001 |       -0.020 |       -0.007 |                   |                   |
| brand_COORS_LIGHT_*_log_beer_floz   | -0.004 |           |      |      0.003 |   0.194 |       -0.011 |        0.002 |                   |                   |
| brand_MILLER_LITE_*_log_beer_floz   | -0.017 |           | ***  |      0.003 |   0.000 |       -0.024 |       -0.011 |                   |                   |
| brand_NATURAL_LIGHT_*_log_beer_floz |  0.052 |           | ***  |      0.014 |   0.000 |        0.025 |        0.080 |                   |                   |
| Intercept                           | -2.093 |           | ***  |      0.002 |         |       -2.097 |       -2.089 |                   |                   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Observations                        | 48,115 |           |      |            |         |              |              |                   |                   |
| R²                                  |  0.552 |           |      |            |         |              |              |                   |                   |
| RMSE                                |  0.169 |           |      |            |         |              |              |                   |                   |
+-------------------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
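
Because the outcome is log_price_floz, a dummy coefficient β corresponds to a multiplicative factor exp(β) on price per fluid ounce, which is what the Exp(Beta) column reports. The sketch below is only an illustration: it copies a few rounded Model 2 estimates from the table above, choosing dummies that have no interaction term, and converts them to percent differences relative to the omitted reference category. The helper name pct_diff_from_beta is hypothetical and is not one of the homework's UDFs.

```python
import numpy as np

# Rounded Beta estimates copied from the Model 2 table above.
# Only dummies without an interaction term are used here.
betas = {
    "market_SEATTLE": 0.101,
    "market_SYRACUSE": -0.040,
    "market_URBAN_NY": 0.172,
    "container_CAN": -0.052,
}

def pct_diff_from_beta(beta):
    """Percent difference in price per fl oz implied by a dummy
    coefficient in a log-price regression (hypothetical helper)."""
    return (np.exp(beta) - 1.0) * 100.0

for name, beta in betas.items():
    factor = np.exp(beta)            # matches the Exp(Beta) column up to rounding
    pct = pct_diff_from_beta(beta)   # e.g. factor 1.10 means about +10%
    print(f"{name:18s} exp(beta) = {factor:.3f}  ->  {pct:+.1f}%")
```

Brand dummies are left out on purpose: in Model 2 they are interacted with log_beer_floz, so their exp(β) describes the brand gap only at log_beer_floz = 0.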

Model 3

# assembling predictors
conti_cols = ["log_beer_floz"]

assembler_predictors = (
    conti_cols +
    dummy_cols_market +
    dummy_cols_brand +
    dummy_cols_container +
    interaction_cols_brand_promo_quantity
)

assembler_3 = VectorAssembler(
    inputCols = assembler_predictors,
    outputCol = "predictors"
)

dtrain_3 = assembler_3.transform(dtrain)
dtest_3  = assembler_3.transform(dtest)

# training model
model_3 = (
    LinearRegression(featuresCol="predictors",
                     labelCol="log_price_floz")
    .fit(dtrain_3)
)

# making predictions - Question 4
dtest_3 = model_3.transform(dtest_3)

# making regression table
print( regression_table(model_3, assembler_3) )
+--------------------------------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
| y: log_price_floz                                |   Beta | Exp(Beta) | Sig. | Std. Error | p-value | 95% CI Lower | 95% CI Upper | Exp(95% CI Lower) | Exp(95% CI Upper) |
+--------------------------------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
| log_beer_floz                                    | -0.140 |           | ***  |      0.012 |   0.000 |       -0.165 |       -0.116 |                   |                   |
| market_ALBANY                                    |  0.022 |     1.023 |  *   |      0.010 |   0.074 |        0.002 |        0.042 |             1.002 |             1.043 |
| market_ATLANTA                                   |  0.079 |     1.083 | ***  |      0.013 |   0.000 |        0.053 |        0.106 |             1.054 |             1.112 |
| market_BALTIMORE                                 |  0.093 |     1.098 | ***  |      0.010 |   0.000 |        0.073 |        0.114 |             1.075 |             1.120 |
| market_BIRMINGHAM                                |  0.124 |     1.133 | ***  |      0.011 |   0.000 |        0.103 |        0.146 |             1.108 |             1.157 |
| market_BOSTON                                    |  0.123 |     1.131 | ***  |      0.010 |   0.000 |        0.103 |        0.143 |             1.109 |             1.154 |
| market_CHARLOTTE                                 |  0.024 |     1.024 |  **  |      0.010 |   0.019 |        0.005 |        0.043 |             1.005 |             1.044 |
| market_CHICAGO                                   | -0.007 |     0.993 |      |      0.010 |   0.486 |       -0.027 |        0.013 |             0.974 |             1.013 |
| market_CINCINNATI                                |  0.077 |     1.080 | ***  |      0.010 |   0.000 |        0.057 |        0.097 |             1.058 |             1.102 |
| market_CLEVELAND                                 |  0.041 |     1.041 | ***  |      0.010 |   0.000 |        0.022 |        0.059 |             1.022 |             1.061 |
| market_COLUMBUS                                  |  0.066 |     1.068 | ***  |      0.010 |   0.000 |        0.047 |        0.085 |             1.048 |             1.088 |
| market_DALLAS                                    |  0.218 |     1.244 | ***  |      0.011 |   0.000 |        0.197 |        0.240 |             1.217 |             1.271 |
| market_DENVER                                    |  0.130 |     1.139 | ***  |      0.011 |   0.000 |        0.108 |        0.153 |             1.114 |             1.165 |
| market_DES_MOINES                                |  0.119 |     1.126 | ***  |      0.010 |   0.000 |        0.099 |        0.138 |             1.105 |             1.148 |
| market_DETROIT                                   |  0.084 |     1.088 | ***  |      0.016 |   0.000 |        0.053 |        0.116 |             1.054 |             1.123 |
| market_EXURBAN_NJ                                |  0.207 |     1.229 | ***  |      0.023 |   0.000 |        0.162 |        0.251 |             1.176 |             1.285 |
| market_EXURBAN_NY                                |  0.112 |     1.119 | ***  |      0.011 |   0.000 |        0.090 |        0.134 |             1.095 |             1.144 |
| market_GRAND_RAPIDS                              |  0.077 |     1.080 | ***  |      0.014 |   0.000 |        0.050 |        0.104 |             1.051 |             1.109 |
| market_HARTFORD-NEW_HAVEN                        |  0.143 |     1.154 | ***  |      0.010 |   0.000 |        0.124 |        0.162 |             1.132 |             1.176 |
| market_HOUSTON                                   |  0.114 |     1.121 | ***  |      0.010 |   0.000 |        0.094 |        0.134 |             1.098 |             1.143 |
| market_INDIANAPOLIS                              |  0.042 |     1.043 | ***  |      0.012 |   0.000 |        0.017 |        0.066 |             1.017 |             1.068 |
| market_JACKSONVILLE                              |  0.108 |     1.114 | ***  |      0.012 |   0.000 |        0.085 |        0.131 |             1.089 |             1.140 |
| market_KANSAS_CITY                               |  0.065 |     1.067 | ***  |      0.013 |   0.000 |        0.040 |        0.090 |             1.041 |             1.094 |
| market_LITTLE_ROCK                               |  0.085 |     1.088 | ***  |      0.010 |   0.000 |        0.065 |        0.104 |             1.067 |             1.109 |
| market_LOS_ANGELES                               |  0.034 |     1.035 | ***  |      0.011 |   0.000 |        0.013 |        0.056 |             1.013 |             1.058 |
| market_LOUISVILLE                                |  0.067 |     1.069 | ***  |      0.012 |   0.000 |        0.043 |        0.091 |             1.044 |             1.095 |
| market_MEMPHIS                                   |  0.121 |     1.128 | ***  |      0.009 |   0.000 |        0.102 |        0.139 |             1.108 |             1.149 |
| market_MIAMI                                     |  0.107 |     1.113 | ***  |      0.011 |   0.000 |        0.085 |        0.129 |             1.088 |             1.138 |
| market_MILWAUKEE                                 |  0.029 |     1.029 |  **  |      0.011 |   0.012 |        0.007 |        0.050 |             1.007 |             1.051 |
| market_MINNEAPOLIS                               |  0.125 |     1.134 | ***  |      0.011 |   0.000 |        0.105 |        0.146 |             1.110 |             1.157 |
| market_NASHVILLE                                 |  0.140 |     1.150 | ***  |      0.011 |   0.000 |        0.118 |        0.161 |             1.126 |             1.175 |
| market_NEW_ORLEANS-MOBILE                        |  0.112 |     1.118 | ***  |      0.011 |   0.000 |        0.090 |        0.134 |             1.095 |             1.143 |
| market_OKLAHOMA_CITY-TULSA                       |  0.135 |     1.144 | ***  |      0.011 |   0.000 |        0.114 |        0.156 |             1.121 |             1.169 |
| market_OMAHA                                     |  0.130 |     1.139 | ***  |      0.010 |   0.000 |        0.110 |        0.151 |             1.116 |             1.162 |
| market_ORLANDO                                   |  0.099 |     1.104 | ***  |      0.013 |   0.000 |        0.073 |        0.124 |             1.076 |             1.132 |
| market_PHILADELPHIA                              |  0.103 |     1.108 | ***  |      0.009 |   0.000 |        0.084 |        0.121 |             1.088 |             1.129 |
| market_PHOENIX                                   |  0.153 |     1.165 | ***  |      0.014 |   0.000 |        0.126 |        0.180 |             1.134 |             1.197 |
| market_PITTSBURGH                                |  0.091 |     1.096 | ***  |      0.012 |   0.000 |        0.067 |        0.115 |             1.070 |             1.122 |
| market_PORTLAND                                  |  0.116 |     1.123 | ***  |      0.010 |   0.000 |        0.095 |        0.136 |             1.100 |             1.146 |
| market_RALEIGH-DURHAM                            |  0.084 |     1.088 | ***  |      0.011 |   0.000 |        0.064 |        0.105 |             1.066 |             1.111 |
| market_RICHMOND                                  |  0.035 |     1.035 | ***  |      0.014 |   0.001 |        0.007 |        0.063 |             1.007 |             1.065 |
| market_RURAL_ALABAMA                             |  0.151 |     1.163 | ***  |      0.019 |   0.000 |        0.113 |        0.189 |             1.120 |             1.208 |
| market_RURAL_ARKANSAS                            |  0.152 |     1.164 | ***  |      0.011 |   0.000 |        0.130 |        0.173 |             1.139 |             1.189 |
| market_RURAL_CALIFORNIA                          |  0.044 |     1.045 | ***  |      0.046 |   0.000 |       -0.046 |        0.133 |             0.955 |             1.142 |
| market_RURAL_COLORADO                            |  0.144 |     1.155 | ***  |      0.012 |   0.002 |        0.120 |        0.169 |             1.128 |             1.184 |
| market_RURAL_FLORIDA                             |  0.047 |     1.049 | ***  |      0.013 |   0.000 |        0.022 |        0.073 |             1.022 |             1.076 |
| market_RURAL_GEORGIA                             |  0.122 |     1.130 | ***  |      0.019 |   0.000 |        0.085 |        0.159 |             1.089 |             1.172 |
| market_RURAL_IDAHO                               |  0.134 |     1.144 | ***  |      0.010 |   0.000 |        0.114 |        0.154 |             1.121 |             1.167 |
| market_RURAL_ILLINOIS                            |  0.010 |     1.010 |      |      0.013 |   0.316 |       -0.014 |        0.035 |             0.986 |             1.036 |
| market_RURAL_INDIANA                             |  0.075 |     1.077 | ***  |      0.011 |   0.000 |        0.054 |        0.095 |             1.055 |             1.100 |
| market_RURAL_IOWA                                |  0.051 |     1.052 | ***  |      0.018 |   0.000 |        0.016 |        0.086 |             1.016 |             1.090 |
| market_RURAL_KANSAS                              |  0.125 |     1.133 | ***  |      0.016 |   0.000 |        0.093 |        0.156 |             1.098 |             1.169 |
| market_RURAL_KENTUCKY                            |  0.152 |     1.164 | ***  |      0.014 |   0.000 |        0.125 |        0.178 |             1.134 |             1.195 |
| market_RURAL_LOUISIANA                           |  0.045 |     1.046 | ***  |      0.014 |   0.001 |        0.018 |        0.072 |             1.018 |             1.075 |
| market_RURAL_MAINE                               |  0.086 |     1.090 | ***  |      0.011 |   0.000 |        0.064 |        0.108 |             1.066 |             1.114 |
| market_RURAL_MICHIGAN                            |  0.076 |     1.079 | ***  |      0.020 |   0.000 |        0.037 |        0.114 |             1.038 |             1.121 |
| market_RURAL_MINNESOTA                           |  0.162 |     1.176 | ***  |      0.014 |   0.000 |        0.134 |        0.189 |             1.144 |             1.208 |
| market_RURAL_MISSISSIPPI                         |  0.038 |     1.039 | ***  |      0.012 |   0.006 |        0.015 |        0.061 |             1.015 |             1.063 |
| market_RURAL_MISSOURI                            |  0.097 |     1.102 | ***  |      0.014 |   0.000 |        0.070 |        0.125 |             1.072 |             1.133 |
| market_RURAL_MONTANA                             |  0.130 |     1.139 | ***  |      0.021 |   0.000 |        0.089 |        0.171 |             1.093 |             1.187 |
| market_RURAL_NEBRASKA                            |  0.133 |     1.142 | ***  |      0.012 |   0.000 |        0.109 |        0.156 |             1.115 |             1.169 |
| market_RURAL_NEVADA                              |  0.048 |     1.049 | ***  |      0.044 |   0.000 |       -0.038 |        0.134 |             0.962 |             1.144 |
| market_RURAL_NEW_HAMPSHIRE                       |  0.011 |     1.011 |      |      0.013 |   0.796 |       -0.014 |        0.037 |             0.986 |             1.038 |
| market_RURAL_NEW_MEXICO                          |  0.145 |     1.156 | ***  |      0.060 |   0.000 |        0.027 |        0.262 |             1.028 |             1.299 |
| market_RURAL_NEW_YORK                            | -0.024 |     0.976 |      |      0.011 |   0.686 |       -0.046 |       -0.003 |             0.955 |             0.997 |
| market_RURAL_NORTH_CAROLINA                      |  0.014 |     1.014 |      |      0.020 |   0.195 |       -0.025 |        0.054 |             0.975 |             1.055 |
| market_RURAL_NORTH_DAKOTA                        |  0.220 |     1.246 | ***  |      0.015 |   0.000 |        0.190 |        0.249 |             1.209 |             1.283 |
| market_RURAL_OHIO                                |  0.090 |     1.094 | ***  |      0.031 |   0.000 |        0.030 |        0.150 |             1.030 |             1.162 |
| market_RURAL_OKLAHOMA                            |  0.120 |     1.128 | ***  |      0.032 |   0.000 |        0.057 |        0.183 |             1.059 |             1.201 |
| market_RURAL_OREGON                              |  0.071 |     1.074 |  **  |      0.014 |   0.027 |        0.043 |        0.100 |             1.044 |             1.105 |
| market_RURAL_PENNSYLVANIA                        |  0.122 |     1.130 | ***  |      0.010 |   0.000 |        0.102 |        0.142 |             1.108 |             1.152 |
| market_RURAL_SOUTH_CAROLINA                      |  0.056 |     1.057 | ***  |      0.019 |   0.000 |        0.018 |        0.093 |             1.018 |             1.098 |
| market_RURAL_SOUTH_DAKOTA                        |  0.071 |     1.074 | ***  |      0.013 |   0.000 |        0.046 |        0.097 |             1.047 |             1.102 |
| market_RURAL_TENNESSEE                           |  0.174 |     1.190 | ***  |      0.010 |   0.000 |        0.155 |        0.193 |             1.168 |             1.213 |
| market_RURAL_TEXAS                               |  0.164 |     1.179 | ***  |      0.019 |   0.000 |        0.126 |        0.202 |             1.135 |             1.224 |
| market_RURAL_VERMONT                             |  0.067 |     1.069 | ***  |      0.017 |   0.001 |        0.033 |        0.100 |             1.034 |             1.106 |
| market_RURAL_VIRGINIA                            |  0.011 |     1.011 |      |      0.014 |   0.536 |       -0.017 |        0.038 |             0.983 |             1.039 |
| market_RURAL_WASHINGTON                          |  0.131 |     1.140 | ***  |      0.015 |   0.000 |        0.102 |        0.160 |             1.107 |             1.174 |
| market_RURAL_WEST_VIRGINIA                       | -0.050 |     0.951 | ***  |      0.010 |   0.001 |       -0.070 |       -0.031 |             0.932 |             0.970 |
| market_RURAL_WISCONSIN                           |  0.036 |     1.036 | ***  |      0.035 |   0.000 |       -0.033 |        0.105 |             0.967 |             1.110 |
| market_RURAL_WYOMING                             |  0.138 |     1.148 | ***  |      0.011 |   0.000 |        0.118 |        0.159 |             1.125 |             1.173 |
| market_SACRAMENTO                                |  0.026 |     1.026 |  **  |      0.014 |   0.015 |       -0.002 |        0.053 |             0.998 |             1.055 |
| market_SALT_LAKE_CITY                            |  0.113 |     1.120 | ***  |      0.009 |   0.000 |        0.095 |        0.131 |             1.099 |             1.140 |
| market_SAN_ANTONIO                               |  0.129 |     1.138 | ***  |      0.012 |   0.000 |        0.106 |        0.152 |             1.112 |             1.164 |
| market_SAN_DIEGO                                 |  0.015 |     1.015 |      |      0.011 |   0.197 |       -0.007 |        0.036 |             0.993 |             1.037 |
| market_SAN_FRANCISCO                             |  0.073 |     1.076 | ***  |      0.011 |   0.000 |        0.052 |        0.094 |             1.053 |             1.099 |
| market_SEATTLE                                   |  0.113 |     1.120 | ***  |      0.010 |   0.000 |        0.094 |        0.133 |             1.098 |             1.143 |
| market_ST_LOUIS                                  |  0.041 |     1.042 | ***  |      0.013 |   0.000 |        0.015 |        0.067 |             1.015 |             1.069 |
| market_SURBURBAN_NJ                              | -0.021 |     0.979 |      |      0.012 |   0.109 |       -0.046 |        0.003 |             0.955 |             1.003 |
| market_SURBURBAN_NY                              |  0.097 |     1.102 | ***  |      0.014 |   0.000 |        0.069 |        0.125 |             1.071 |             1.133 |
| market_SYRACUSE                                  | -0.048 |     0.953 | ***  |      0.009 |   0.001 |       -0.066 |       -0.030 |             0.936 |             0.970 |
| market_TAMPA                                     |  0.100 |     1.105 | ***  |      0.011 |   0.000 |        0.078 |        0.122 |             1.081 |             1.130 |
| market_URBAN_NY                                  |  0.170 |     1.186 | ***  |      0.011 |   0.000 |        0.149 |        0.192 |             1.160 |             1.212 |
| market_WASHINGTON_DC                             |  0.085 |     1.089 | ***  |      0.023 |   0.000 |        0.040 |        0.130 |             1.041 |             1.139 |
| brand_BUSCH_LIGHT                                | -0.149 |     0.862 | ***  |      0.019 |   0.000 |       -0.187 |       -0.111 |             0.829 |             0.895 |
| brand_COORS_LIGHT                                |  0.042 |     1.043 |  **  |      0.017 |   0.029 |        0.008 |        0.077 |             1.008 |             1.080 |
| brand_MILLER_LITE                                |  0.111 |     1.117 | ***  |      0.019 |   0.000 |        0.074 |        0.147 |             1.077 |             1.159 |
| brand_NATURAL_LIGHT                              | -0.519 |     0.595 | ***  |      0.002 |   0.000 |       -0.523 |       -0.516 |             0.593 |             0.597 |
| container_CAN                                    | -0.053 |     0.948 | ***  |      0.061 |   0.000 |       -0.173 |        0.067 |             0.841 |             1.069 |
| brand_BUSCH_LIGHT_*_promo_True                   | -0.253 |     0.776 | ***  |      0.044 |   0.000 |       -0.341 |       -0.166 |             0.711 |             0.847 |
| brand_COORS_LIGHT_*_promo_True                   | -0.227 |     0.797 | ***  |      0.037 |   0.000 |       -0.299 |       -0.155 |             0.741 |             0.856 |
| brand_MILLER_LITE_*_promo_True                   | -0.286 |     0.751 | ***  |      0.035 |   0.000 |       -0.354 |       -0.218 |             0.702 |             0.804 |
| brand_NATURAL_LIGHT_*_promo_True                 | -0.400 |     0.671 | ***  |      0.004 |   0.000 |       -0.408 |       -0.391 |             0.665 |             0.676 |
| brand_BUSCH_LIGHT_*_log_beer_floz                | -0.021 |           | ***  |      0.004 |   0.000 |       -0.028 |       -0.014 |                   |                   |
| brand_COORS_LIGHT_*_log_beer_floz                | -0.008 |           |  **  |      0.003 |   0.026 |       -0.015 |       -0.002 |                   |                   |
| brand_MILLER_LITE_*_log_beer_floz                | -0.023 |           | ***  |      0.003 |   0.000 |       -0.030 |       -0.016 |                   |                   |
| brand_NATURAL_LIGHT_*_log_beer_floz              |  0.037 |           | ***  |      0.001 |   0.000 |        0.036 |        0.038 |                   |                   |
| promo_True_*_log_beer_floz                       | -0.008 |           | ***  |      0.011 |   0.000 |       -0.029 |        0.012 |                   |                   |
| brand_BUSCH_LIGHT_*_promo_True_*_log_beer_floz   |  0.047 |           | ***  |      0.008 |   0.000 |        0.031 |        0.063 |                   |                   |
| brand_COORS_LIGHT_*_promo_True_*_log_beer_floz   |  0.037 |           | ***  |      0.007 |   0.000 |        0.024 |        0.050 |                   |                   |
| brand_MILLER_LITE_*_promo_True_*_log_beer_floz   |  0.052 |           | ***  |      0.006 |   0.000 |        0.040 |        0.064 |                   |                   |
| brand_NATURAL_LIGHT_*_promo_True_*_log_beer_floz |  0.071 |           | ***  |      0.014 |   0.000 |        0.043 |        0.098 |                   |                   |
| Intercept                                        | -2.111 |           | ***  |      0.002 |         |       -2.115 |       -2.107 |                   |                   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Observations                                     | 48,115 |           |      |            |         |              |              |                   |                   |
| R²                                               |  0.560 |           |      |            |         |              |              |                   |                   |
| RMSE                                             |  0.167 |           |      |            |         |              |              |                   |                   |
+--------------------------------------------------+--------+-----------+------+------------+---------+--------------+--------------+-------------------+-------------------+
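
Model 3 lets the slope on log_beer_floz vary by brand and by promotion status, so the implied volume elasticity of price per fluid ounce for a given case is the sum of the relevant coefficients. The sketch below simply adds up rounded estimates copied from the Model 3 table above. The function name volume_elasticity and the dictionary layout are hypothetical, and brand=None stands for the omitted reference brand.

```python
# Rounded Model 3 estimates copied from the regression table above.
BASE_SLOPE = -0.140    # log_beer_floz
PROMO_SLOPE = -0.008   # promo_True_*_log_beer_floz

# brand_<BRAND>_*_log_beer_floz
BRAND_SLOPE = {
    "BUSCH_LIGHT": -0.021,
    "COORS_LIGHT": -0.008,
    "MILLER_LITE": -0.023,
    "NATURAL_LIGHT": 0.037,
}

# brand_<BRAND>_*_promo_True_*_log_beer_floz
BRAND_PROMO_SLOPE = {
    "BUSCH_LIGHT": 0.047,
    "COORS_LIGHT": 0.037,
    "MILLER_LITE": 0.052,
    "NATURAL_LIGHT": 0.071,
}

def volume_elasticity(brand=None, promo=False):
    """Sum the log_beer_floz slope terms that apply to one
    brand/promotion case (hypothetical helper)."""
    slope = BASE_SLOPE
    if brand is not None:
        slope += BRAND_SLOPE[brand]
    if promo:
        slope += PROMO_SLOPE
        if brand is not None:
            slope += BRAND_PROMO_SLOPE[brand]
    return slope

for brand in [None, *BRAND_SLOPE]:
    label = brand or "reference brand"
    no_promo = volume_elasticity(brand)
    with_promo = volume_elasticity(brand, promo=True)
    print(f"{label:16s} no promo: {no_promo:+.3f}   promo: {with_promo:+.3f}")
```

For example, volume_elasticity("MILLER_LITE", promo=True) sums -0.140, -0.023, -0.008, and 0.052, so under these rounded estimates a 1% larger package is associated with roughly a 0.12% lower price per fluid ounce.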

Model Comparison

Question 4

print(
    compare_reg_models(
        [model_1, model_2, model_3],
        [assembler_1, assembler_2, assembler_3]
        )
    )
----------------------------------------------------------------------------------------------------------------------------------------------------
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| Predictor                                        |   y: log_price_floz (Model 1) |   y: log_price_floz (Model 2) |   y: log_price_floz (Model 3) |
----------------------------------------------------------------------------------------------------------------------------------------------------
+==================================================+===============================+===============================+===============================+
| log_beer_floz                                    |                     -0.142*** |                     -0.146*** |                     -0.140*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_ALBANY                                    |               0.027** / 1.027 |               0.029** / 1.029 |                0.022* / 1.023 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_ATLANTA                                   |              0.083*** / 1.087 |              0.083*** / 1.087 |              0.079*** / 1.083 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_BALTIMORE                                 |              0.100*** / 1.105 |              0.104*** / 1.109 |              0.093*** / 1.098 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_BIRMINGHAM                                |              0.124*** / 1.132 |              0.130*** / 1.139 |              0.124*** / 1.133 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_BOSTON                                    |              0.127*** / 1.136 |              0.127*** / 1.135 |              0.123*** / 1.131 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_CHARLOTTE                                 |                0.020* / 1.020 |                 0.015 / 1.015 |               0.024** / 1.024 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_CHICAGO                                   |                -0.008 / 0.992 |                -0.013 / 0.987 |                -0.007 / 0.993 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_CINCINNATI                                |              0.084*** / 1.088 |              0.079*** / 1.082 |              0.077*** / 1.080 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_CLEVELAND                                 |              0.050*** / 1.051 |              0.045*** / 1.046 |              0.041*** / 1.041 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_COLUMBUS                                  |              0.069*** / 1.072 |              0.066*** / 1.068 |              0.066*** / 1.068 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_DALLAS                                    |              0.203*** / 1.226 |              0.214*** / 1.238 |              0.218*** / 1.244 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_DENVER                                    |              0.123*** / 1.131 |              0.121*** / 1.129 |              0.130*** / 1.139 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_DES_MOINES                                |              0.129*** / 1.138 |              0.125*** / 1.134 |              0.119*** / 1.126 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_DETROIT                                   |              0.087*** / 1.091 |              0.083*** / 1.086 |              0.084*** / 1.088 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_EXURBAN_NJ                                |              0.221*** / 1.247 |              0.216*** / 1.242 |              0.207*** / 1.229 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_EXURBAN_NY                                |              0.122*** / 1.130 |              0.119*** / 1.126 |              0.112*** / 1.119 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_GRAND_RAPIDS                              |              0.082*** / 1.086 |              0.078*** / 1.081 |              0.077*** / 1.080 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_HARTFORD-NEW_HAVEN                        |              0.148*** / 1.160 |              0.146*** / 1.157 |              0.143*** / 1.154 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_HOUSTON                                   |              0.113*** / 1.120 |              0.110*** / 1.117 |              0.114*** / 1.121 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_INDIANAPOLIS                              |              0.042*** / 1.043 |              0.041*** / 1.042 |              0.042*** / 1.043 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_JACKSONVILLE                              |              0.113*** / 1.120 |              0.105*** / 1.111 |              0.108*** / 1.114 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_KANSAS_CITY                               |              0.076*** / 1.079 |              0.070*** / 1.073 |              0.065*** / 1.067 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_LITTLE_ROCK                               |              0.092*** / 1.096 |              0.088*** / 1.092 |              0.085*** / 1.088 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_LOS_ANGELES                               |              0.032*** / 1.033 |               0.026** / 1.026 |              0.034*** / 1.035 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_LOUISVILLE                                |              0.068*** / 1.070 |              0.063*** / 1.065 |              0.067*** / 1.069 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_MEMPHIS                                   |              0.128*** / 1.137 |              0.127*** / 1.136 |              0.121*** / 1.128 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_MIAMI                                     |              0.108*** / 1.114 |              0.106*** / 1.112 |              0.107*** / 1.113 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_MILWAUKEE                                 |               0.028** / 1.028 |               0.027** / 1.027 |               0.029** / 1.029 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_MINNEAPOLIS                               |              0.128*** / 1.137 |              0.129*** / 1.138 |              0.125*** / 1.134 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_NASHVILLE                                 |              0.143*** / 1.153 |              0.142*** / 1.152 |              0.140*** / 1.150 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_NEW_ORLEANS-MOBILE                        |              0.128*** / 1.136 |              0.118*** / 1.125 |              0.112*** / 1.118 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_OKLAHOMA_CITY-TULSA                       |              0.145*** / 1.156 |              0.142*** / 1.153 |              0.135*** / 1.144 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_OMAHA                                     |              0.131*** / 1.140 |              0.129*** / 1.137 |              0.130*** / 1.139 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_ORLANDO                                   |              0.098*** / 1.103 |              0.096*** / 1.101 |              0.099*** / 1.104 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_PHILADELPHIA                              |              0.115*** / 1.121 |              0.114*** / 1.121 |              0.103*** / 1.108 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_PHOENIX                                   |              0.142*** / 1.152 |              0.144*** / 1.155 |              0.153*** / 1.165 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_PITTSBURGH                                |              0.100*** / 1.105 |              0.098*** / 1.102 |              0.091*** / 1.096 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_PORTLAND                                  |              0.115*** / 1.122 |              0.113*** / 1.120 |              0.116*** / 1.123 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RALEIGH-DURHAM                            |              0.090*** / 1.094 |              0.090*** / 1.094 |              0.084*** / 1.088 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RICHMOND                                  |              0.043*** / 1.044 |              0.041*** / 1.041 |              0.035*** / 1.035 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_ALABAMA                             |              0.157*** / 1.170 |              0.156*** / 1.169 |              0.151*** / 1.163 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_ARKANSAS                            |              0.157*** / 1.171 |              0.161*** / 1.174 |              0.152*** / 1.164 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_CALIFORNIA                          |              0.044*** / 1.045 |              0.041*** / 1.042 |              0.044*** / 1.045 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_COLORADO                            |              0.137*** / 1.147 |              0.136*** / 1.145 |              0.144*** / 1.155 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_FLORIDA                             |              0.059*** / 1.061 |              0.050*** / 1.052 |              0.047*** / 1.049 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_GEORGIA                             |              0.132*** / 1.141 |              0.128*** / 1.137 |              0.122*** / 1.130 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_IDAHO                               |              0.142*** / 1.152 |              0.135*** / 1.144 |              0.134*** / 1.144 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_ILLINOIS                            |                 0.014 / 1.014 |                 0.013 / 1.013 |                 0.010 / 1.010 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_INDIANA                             |              0.073*** / 1.076 |              0.076*** / 1.078 |              0.075*** / 1.077 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_IOWA                                |              0.058*** / 1.060 |              0.055*** / 1.056 |              0.051*** / 1.052 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_KANSAS                              |              0.134*** / 1.143 |              0.133*** / 1.142 |              0.125*** / 1.133 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_KENTUCKY                            |              0.157*** / 1.170 |              0.156*** / 1.169 |              0.152*** / 1.164 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_LOUISIANA                           |              0.060*** / 1.061 |              0.055*** / 1.057 |              0.045*** / 1.046 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_MAINE                               |              0.091*** / 1.096 |              0.088*** / 1.092 |              0.086*** / 1.090 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_MICHIGAN                            |              0.085*** / 1.089 |              0.081*** / 1.084 |              0.076*** / 1.079 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_MINNESOTA                           |              0.165*** / 1.180 |              0.169*** / 1.185 |              0.162*** / 1.176 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_MISSISSIPPI                         |              0.043*** / 1.044 |              0.039*** / 1.040 |              0.038*** / 1.039 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_MISSOURI                            |              0.106*** / 1.112 |              0.104*** / 1.110 |              0.097*** / 1.102 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_MONTANA                             |              0.127*** / 1.135 |              0.124*** / 1.132 |              0.130*** / 1.139 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_NEBRASKA                            |              0.138*** / 1.148 |              0.138*** / 1.148 |              0.133*** / 1.142 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_NEVADA                              |              0.051*** / 1.052 |              0.051*** / 1.052 |              0.048*** / 1.049 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_NEW_HAMPSHIRE                       |                 0.028 / 1.028 |                 0.017 / 1.017 |                 0.011 / 1.011 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_NEW_MEXICO                          |              0.154*** / 1.166 |              0.148*** / 1.160 |              0.145*** / 1.156 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_NEW_YORK                            |                -0.013 / 0.987 |                -0.010 / 0.990 |                -0.024 / 0.976 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_NORTH_CAROLINA                      |                -0.002 / 0.998 |               0.027** / 1.027 |                 0.014 / 1.014 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_NORTH_DAKOTA                        |              0.223*** / 1.250 |              0.222*** / 1.248 |              0.220*** / 1.246 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_OHIO                                |              0.096*** / 1.100 |              0.093*** / 1.098 |              0.090*** / 1.094 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_OKLAHOMA                            |              0.130*** / 1.139 |              0.130*** / 1.139 |              0.120*** / 1.128 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_OREGON                              |               0.074** / 1.077 |               0.070** / 1.073 |               0.071** / 1.074 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_PENNSYLVANIA                        |              0.131*** / 1.140 |              0.132*** / 1.141 |              0.122*** / 1.130 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_SOUTH_CAROLINA                      |              0.055*** / 1.056 |              0.054*** / 1.055 |              0.056*** / 1.057 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_SOUTH_DAKOTA                        |              0.077*** / 1.081 |              0.076*** / 1.079 |              0.071*** / 1.074 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_TENNESSEE                           |              0.170*** / 1.186 |              0.170*** / 1.185 |              0.174*** / 1.190 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_TEXAS                               |              0.169*** / 1.184 |              0.167*** / 1.182 |              0.164*** / 1.179 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_VERMONT                             |              0.077*** / 1.080 |              0.066*** / 1.068 |              0.067*** / 1.069 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_VIRGINIA                            |                 0.018 / 1.019 |                 0.015 / 1.015 |                 0.011 / 1.011 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_WASHINGTON                          |              0.116*** / 1.123 |              0.114*** / 1.121 |              0.131*** / 1.140 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_WEST_VIRGINIA                       |              -0.037** / 0.964 |             -0.041*** / 0.960 |             -0.050*** / 0.951 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_WISCONSIN                           |              0.039*** / 1.040 |              0.037*** / 1.038 |              0.036*** / 1.036 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_RURAL_WYOMING                             |              0.147*** / 1.158 |              0.144*** / 1.155 |              0.138*** / 1.148 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SACRAMENTO                                |                0.020* / 1.020 |                0.018* / 1.019 |               0.026** / 1.026 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SALT_LAKE_CITY                            |              0.120*** / 1.128 |              0.114*** / 1.121 |              0.113*** / 1.120 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SAN_ANTONIO                               |              0.138*** / 1.148 |              0.133*** / 1.142 |              0.129*** / 1.138 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SAN_DIEGO                                 |                 0.012 / 1.012 |                 0.010 / 1.010 |                 0.015 / 1.015 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SAN_FRANCISCO                             |              0.069*** / 1.071 |              0.066*** / 1.068 |              0.073*** / 1.076 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SEATTLE                                   |              0.109*** / 1.116 |              0.101*** / 1.106 |              0.113*** / 1.120 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_ST_LOUIS                                  |              0.042*** / 1.042 |              0.038*** / 1.038 |              0.041*** / 1.042 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SURBURBAN_NJ                              |                -0.008 / 0.992 |                -0.010 / 0.990 |                -0.021 / 0.979 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SURBURBAN_NY                              |              0.101*** / 1.106 |              0.097*** / 1.102 |              0.097*** / 1.102 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_SYRACUSE                                  |              -0.035** / 0.966 |             -0.040*** / 0.961 |             -0.048*** / 0.953 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_TAMPA                                     |              0.103*** / 1.109 |              0.099*** / 1.104 |              0.100*** / 1.105 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_URBAN_NY                                  |              0.173*** / 1.189 |              0.172*** / 1.188 |              0.170*** / 1.186 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| market_WASHINGTON_DC                             |              0.097*** / 1.102 |              0.091*** / 1.095 |              0.085*** / 1.089 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_BUSCH_LIGHT                                |             -0.260*** / 0.771 |             -0.185*** / 0.831 |             -0.149*** / 0.862 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_COORS_LIGHT                                |              -0.005** / 0.995 |                 0.019 / 1.019 |               0.042** / 1.043 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_MILLER_LITE                                |             -0.013*** / 0.987 |              0.080*** / 1.083 |              0.111*** / 1.117 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_NATURAL_LIGHT                              |             -0.319*** / 0.727 |             -0.601*** / 0.548 |             -0.519*** / 0.595 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| container_CAN                                    |             -0.053*** / 0.948 |             -0.052*** / 0.949 |             -0.053*** / 0.948 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_BUSCH_LIGHT_*_log_beer_floz                |                               |                     -0.013*** |                     -0.021*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_COORS_LIGHT_*_log_beer_floz                |                               |                        -0.004 |                      -0.008** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_MILLER_LITE_*_log_beer_floz                |                               |                     -0.017*** |                     -0.023*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_NATURAL_LIGHT_*_log_beer_floz              |                               |                      0.052*** |                      0.037*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_BUSCH_LIGHT_*_promo_True                   |                               |                               |             -0.253*** / 0.776 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_COORS_LIGHT_*_promo_True                   |                               |                               |             -0.227*** / 0.797 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_MILLER_LITE_*_promo_True                   |                               |                               |             -0.286*** / 0.751 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_NATURAL_LIGHT_*_promo_True                 |                               |                               |             -0.400*** / 0.671 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| promo_True_*_log_beer_floz                       |                               |                               |                     -0.008*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_BUSCH_LIGHT_*_promo_True_*_log_beer_floz   |                               |                               |                      0.047*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_COORS_LIGHT_*_promo_True_*_log_beer_floz   |                               |                               |                      0.037*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_MILLER_LITE_*_promo_True_*_log_beer_floz   |                               |                               |                      0.052*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| brand_NATURAL_LIGHT_*_promo_True_*_log_beer_floz |                               |                               |                      0.071*** |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| Intercept                                        |                     -2.117*** |                     -2.093*** |                     -2.111*** |
====================================================================================================================================================
| Observations                                     |                        48,115 |                        48,115 |                        48,115 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| R²                                               |                         0.547 |                         0.552 |                         0.560 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
| RMSE                                             |                         0.170 |                         0.169 |                         0.167 |
+--------------------------------------------------+-------------------------------+-------------------------------+-------------------------------+
----------------------------------------------------------------------------------------------------------------------------------------------------

Question 4 - RMSEs on Test Data

print(compare_rmse([dtest_1, dtest_2, dtest_3], "log_price_floz"))
+------+-----------+-----------+-----------+
|      |   Model 1 |   Model 2 |   Model 3 |
+======+===========+===========+===========+
| RMSE |     0.167 |     0.166 |     0.165 |
+------+-----------+-----------+-----------+
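
The `compare_rmse` helper is defined earlier in the document. As a minimal sketch of the quantity it reports (not the helper's actual implementation), the root mean squared error between observed and predicted log prices can be computed as follows. The function name `rmse` and the toy values are illustrative assumptions, not taken from the beer data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical log prices per fluid ounce (illustration only).
actual = [-3.0, -2.9, -3.1]
predicted = [-2.9, -3.0, -3.1]
print(round(rmse(actual, predicted), 3))
```

Because the outcome is in logs, an RMSE of this kind is measured in log-price units.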

Question 5

  • Below is a list of beta estimates from Model 3 for the New York markets, together with their exponentiated values:
    • market_ALBANY (\(\hat{\beta}=0.022^*\) / \(e^{\hat{\beta}}=1.023\))
      Ceteris paribus, beer prices in Albany are 2.3 % higher than in the Buffalo–Rochester market.

    • market_EXURBAN_NY (\(\hat{\beta}=0.112^{***}\) / \(e^{\hat{\beta}}=1.119\))
      Ceteris paribus, beer prices in Exurban NY are 11.9 % higher than in the Buffalo–Rochester market.

    • market_RURAL_NEW_YORK (\(\hat{\beta}=-0.024\) / \(e^{\hat{\beta}}=0.976\))
      Ceteris paribus, beer prices in Rural NY are not statistically significantly different from those in the Buffalo–Rochester market.

    • market_SUBURBAN_NY (\(\hat{\beta}=0.097^{***}\) / \(e^{\hat{\beta}}=1.102\))
      Ceteris paribus, beer prices in Suburban NY are 10.2 % higher than in the Buffalo–Rochester market.

    • market_SYRACUSE (\(\hat{\beta}=-0.048^{***}\) / \(e^{\hat{\beta}}=0.953\))
      Ceteris paribus, beer prices in Syracuse are 4.7 % lower than in the Buffalo–Rochester market.

    • market_URBAN_NY (\(\hat{\beta}=0.170^{***}\) / \(e^{\hat{\beta}}=1.186\))
      Ceteris paribus, beer prices in Urban NY are 18.6 % higher than in the Buffalo–Rochester market.
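
The percentages above follow from the log-linear form of the model: a dummy coefficient \(\hat{\beta}\) implies a price ratio of \(e^{\hat{\beta}}\) relative to the reference market, that is, a \((e^{\hat{\beta}} - 1) \times 100\) percent difference. A minimal sketch, with `pct_change` as an illustrative helper name:

```python
import math

def pct_change(beta):
    """Percent difference from the reference category implied by a
    dummy-variable coefficient in a log-outcome regression."""
    return (math.exp(beta) - 1) * 100

# Rounded Model 3 estimates copied from the list above.
betas = {"market_EXURBAN_NY": 0.112, "market_SYRACUSE": -0.048}
for name, beta in betas.items():
    print(f"{name}: {pct_change(beta):+.1f}%")
```

Because the inputs here are coefficients rounded to three decimals, results for other markets can differ in the last digit from the exponentiated values reported in the regression table, which are computed from the unrounded estimates.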

Question 6

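  • We should focus on the beta estimates for predictors involving \(\log(beer floz)\) to calculate the inverse price elasticity of beer demand across brands. Each brand's slope is the sum of the baseline log_beer_floz coefficient and the interaction terms that apply to it; an interaction that is not statistically significant is treated as zero.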
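+--------------------------------------------------+-----------+-----------+-----------+
| Predictor                                        |   Model 1 |   Model 2 |   Model 3 |
+==================================================+===========+===========+===========+
| log_beer_floz                                    | -0.142*** | -0.146*** | -0.140*** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_BUSCH_LIGHT_*_log_beer_floz                |           | -0.013*** | -0.021*** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_COORS_LIGHT_*_log_beer_floz                |           |    -0.004 |  -0.008** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_MILLER_LITE_*_log_beer_floz                |           | -0.017*** | -0.023*** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_NATURAL_LIGHT_*_log_beer_floz              |           |  0.052*** |  0.037*** |
+--------------------------------------------------+-----------+-----------+-----------+
| promo_True_*_log_beer_floz                       |           |           | -0.008*** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_BUSCH_LIGHT_*_promo_True_*_log_beer_floz   |           |           |  0.047*** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_COORS_LIGHT_*_promo_True_*_log_beer_floz   |           |           |  0.037*** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_MILLER_LITE_*_promo_True_*_log_beer_floz   |           |           |  0.052*** |
+--------------------------------------------------+-----------+-----------+-----------+
| brand_NATURAL_LIGHT_*_promo_True_*_log_beer_floz |           |           |  0.071*** |
+--------------------------------------------------+-----------+-----------+-----------+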
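+---------+---------+-------------------------+-------------------------+-----------------------------------------+
| Brand   | Model 1 | Model 2                 | Model 3 (no Promo)      | Model 3 (with Promo)                    |
+=========+=========+=========================+=========================+=========================================+
| BUD     | -0.142  | -0.146                  | -0.140                  | -0.148 = -0.140 - 0.008                 |
+---------+---------+-------------------------+-------------------------+-----------------------------------------+
| BUSCH   | -0.142  | -0.159 = -0.146 - 0.013 | -0.161 = -0.140 - 0.021 | -0.122 = -0.140 - 0.021 - 0.008 + 0.047 |
+---------+---------+-------------------------+-------------------------+-----------------------------------------+
| COORS   | -0.142  | -0.146 = -0.146 - 0     | -0.148 = -0.140 - 0.008 | -0.119 = -0.140 - 0.008 - 0.008 + 0.037 |
+---------+---------+-------------------------+-------------------------+-----------------------------------------+
| MILLER  | -0.142  | -0.163 = -0.146 - 0.017 | -0.163 = -0.140 - 0.023 | -0.119 = -0.140 - 0.023 - 0.008 + 0.052 |
+---------+---------+-------------------------+-------------------------+-----------------------------------------+
| NATURAL | -0.142  | -0.094 = -0.146 + 0.052 | -0.103 = -0.140 + 0.037 | -0.040 = -0.140 + 0.037 - 0.008 + 0.071 |
+---------+---------+-------------------------+-------------------------+-----------------------------------------+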
  • Model 1
    • A 1% increase in sales volume (across any of the five brands) is associated with a 0.142% decrease in price.
  • Model 2
    • A 1% increase in BUD sales volume is associated with a 0.146% decrease in its price.
    • A 1% increase in BUSCH sales volume is associated with a 0.159% decrease in its price.
    • A 1% increase in COORS sales volume is associated with a 0.146% decrease in its price.
    • A 1% increase in MILLER sales volume is associated with a 0.163% decrease in its price.
    • A 1% increase in NATURAL sales volume is associated with a 0.094% decrease in its price.
  • Model 3 (no Promo)
    • A 1% increase in BUD sales volume is associated with a 0.140% decrease in its price.
    • A 1% increase in BUSCH sales volume is associated with a 0.161% decrease in its price.
    • A 1% increase in COORS sales volume is associated with a 0.148% decrease in its price.
    • A 1% increase in MILLER sales volume is associated with a 0.163% decrease in its price.
    • A 1% increase in NATURAL sales volume is associated with a 0.103% decrease in its price.
  • Model 3 (with Promo)
    • A 1% increase in BUD sales volume is associated with a 0.148% decrease in its price.
    • A 1% increase in BUSCH sales volume is associated with a 0.122% decrease in its price.
    • A 1% increase in COORS sales volume is associated with a 0.119% decrease in its price.
    • A 1% increase in MILLER sales volume is associated with a 0.119% decrease in its price.
    • A 1% increase in NATURAL sales volume is associated with a 0.040% decrease in its price.
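
The brand-level slopes listed above are sums of the relevant log_beer_floz coefficients. The sketch below reproduces that bookkeeping for Models 2 and 3. The dictionary and function names are hypothetical, Bud Light is taken as the reference brand, and the insignificant Model 2 Coors Light interaction is set to 0 as in the table.

```python
# Coefficients on the log_beer_floz terms, copied from the coefficient table above.
BASE = {"model2": -0.146, "model3": -0.140}

# Brand x log_beer_floz interactions (Bud Light is the reference brand).
# Assumption: the insignificant Model 2 Coors Light term (-0.004) is set to 0,
# matching the "-0.146 - 0" entry in the table.
BRAND = {
    "BUD":     {"model2": 0.0,    "model3": 0.0},
    "BUSCH":   {"model2": -0.013, "model3": -0.021},
    "COORS":   {"model2": 0.0,    "model3": -0.008},
    "MILLER":  {"model2": -0.017, "model3": -0.023},
    "NATURAL": {"model2": 0.052,  "model3": 0.037},
}

PROMO = -0.008  # promo x log_beer_floz (Model 3 only)

# Brand x promo x log_beer_floz interactions (Model 3 only).
BRAND_PROMO = {"BUD": 0.0, "BUSCH": 0.047, "COORS": 0.037, "MILLER": 0.052, "NATURAL": 0.071}

def brand_slopes(brand):
    """Return (Model 2, Model 3 no promo, Model 3 with promo) slopes, rounded to 3 decimals."""
    m2 = BASE["model2"] + BRAND[brand]["model2"]
    m3_full = BASE["model3"] + BRAND[brand]["model3"]
    m3_promo = m3_full + PROMO + BRAND_PROMO[brand]
    return tuple(round(v, 3) for v in (m2, m3_full, m3_promo))

for brand in BRAND:
    print(brand, brand_slopes(brand))
```

Keeping the coefficients in one place makes it harder for a hand-typed slope to drift away from the table.
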
# Slopes (inverse elasticities)
model1_slope = -0.142
model2_slopes = {
    'Bud': -0.146,
    'Busch': -0.159,
    'Coors': -0.146,
    'Miller': -0.163,
    'Natural': -0.094
}
model3_full_slopes = {
    'Bud': -0.140,
    'Busch': -0.161,
    'Coors': -0.148,
    'Miller': -0.163,
    'Natural': -0.103
}
model3_promo_slopes = {
    'Bud': -0.148,
    'Busch': -0.122,
    'Coors': -0.119,
    'Miller': -0.119,
    'Natural': -0.040
}

# Create range of log(sales) values
x = np.linspace(0, 10, 100)

# Set up a 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 10), sharex=True, sharey=True)

# Model 1: Common Elasticity
ax = axes[0, 0]
ax.plot(x, model1_slope * x, color='tab:blue', label='All Brands')
ax.set_title('Model 1: Common Elasticity')
ax.set_xlabel('log(sales)')
ax.set_ylabel('log(price)')
ax.grid(True)
ax.legend()

# Model 2: Brand-Specific Elasticity
ax = axes[0, 1]
for brand, slope in model2_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 2: Brand-Specific')
ax.set_xlabel('log(sales)')
ax.grid(True)
ax.legend()

# Model 3: Full-Price (No Promo)
ax = axes[1, 0]
for brand, slope in model3_full_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 3: Full-Price')
ax.set_xlabel('log(sales)')
ax.set_ylabel('log(price)')
ax.grid(True)
ax.legend()

# Model 3: Promo-Price
ax = axes[1, 1]
for brand, slope in model3_promo_slopes.items():
    ax.plot(x, slope * x, label=brand)
ax.set_title('Model 3: Promotional Price')
ax.set_xlabel('log(sales)')
ax.grid(True)
ax.legend()

plt.tight_layout()
plt.show()

  • Magnitude (|coefficient|): The coefficients are inverse elasticities, so a larger absolute value means price is more sensitive to volume. Equivalently, volume responds less to a given price change, i.e. demand is less elastic.
  • Promotions: For most brands, promotions shrink the inverse elasticity in absolute terms, so price is less sensitive to volume during deals and firms can sell more with smaller price cuts.
  • Brand differences: Miller and Busch tend to have the largest inverse elasticities at full price (least elastic demand), while Natural Light consistently has the smallest (most elastic demand, i.e. its drinkers are relatively price-sensitive, especially on promotion).
    • Given a fixed price increase, Natural Light experiences the largest decline in sales volume.
    • For the same price increase without promotion, Miller Lite and Busch Light exhibit the smallest declines in sales volume.
    • For the same price decrease with promotion, Bud Light shows the smallest increase in sales volume.
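
The volume statements above follow from inverting the log-log slope: if a 1% change in volume goes with a \(b\)% change in price, then a 1% change in price goes with roughly a \(1/b\)% change in volume. The sketch below applies that reciprocal to a few slopes from the table. The function name is hypothetical, and treating the reciprocal of the regression slope as the demand elasticity is a simplifying assumption of this interpretation, not an estimate from a separate regression.

```python
# Inverse elasticities (slope of log price on log volume) taken from the tables above.
inverse_elasticity = {
    "Natural (promo)": -0.040,
    "Bud (promo)": -0.148,
    "Miller (no promo)": -0.163,
}

def implied_volume_change(inv_elasticity, price_change_pct=1.0):
    """Approximate % change in volume for a given % change in price.

    Assumption: the fitted log-log relationship can simply be inverted.
    """
    return price_change_pct / inv_elasticity

for label, b in inverse_elasticity.items():
    print(f"{label}: 1% price increase -> {implied_volume_change(b):.1f}% volume change")
```

The smaller the inverse elasticity in absolute value, the larger the implied volume response, which is why Natural Light shows the biggest swings.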

Question 7

Residual Plots

residual_plot(dtest_1, "log_price_floz", "Model 1")

residual_plot(dtest_2, "log_price_floz", "Model 2")

residual_plot(dtest_3, "log_price_floz", "Model 3")

Question 8

I prefer Model 3 for three simple reasons:

  1. Realistic Market Setting
  • It captures both the unique sensitivity of each brand and how that sensitivity changes when a beer is on promotion.
  2. Practical Pricing Strategies
  • By distinguishing full-price from promotional periods, it tells you exactly how much to adjust each brand’s price under each scenario.
  3. Better Fit with the lowest MSE on test data
  • Allowing elasticities to vary by brand and promotion status typically explains sales patterns more accurately than the cruder alternatives.

In short, Model 3 reflects a realistic market setting with brand heterogeneity and promotion effects, and it also delivers the best prediction quality.
