Midterm Exam I

Spring 2025, DANL 210-01

Published

March 5, 2025

Descriptive Statistics

The distribution of scores for Midterm Exam I is shown below:

The distribution of raw scores for Section 1 in Midterm Exam I is shown below:

The distribution of raw scores for Section 2 in Midterm Exam I is shown below:

The following provides the descriptive statistics for each part of the Midterm Exam I:

Section 1. Python Basics - Multiple Choice Questions (Questions 1-6)

Question 1

What is the output of the following code?

name_lst = ['Alice', 'David', 'Ethan']
print(name_lst[-1][1])

A. a

B. v

C. t

D. h

E. d

Click to Check the Answer!

Question 2

How many times will the following loop execute the print() statement?

a = 2

while True:
  if a == 6:
    print("Number", a, "reached")
    break
  elif a < 6:
    a += 2
    print("Adding 2... Reached", a)
  else:
    a = a - 1
    print("Subtracting 1... Reached", a)

A. 1

B. 2

C. 3

D. 4

E. 5

Click to Check the Answer!

Question 3

What is the output of the following code?

numbers = [2, 3, 4, 11, 5]
i = 0
total = 0

while i < len (numbers) :
  if numbers[i] == 11:
    break
  else:
    total += numbers[i]
    i = i + 1
    
print (total)

A. 0

B. 8

C. 9

D. 10

E. 11

Click to Check the Answer!

Question 4

What is the output of the following code?

data = {"x": 10, "y": 20, "z": 30}
keys = ["y", "a", "x", "z"]
result = 10

for key in keys:
    try:
        if key == "x":
            result = result - data[key] * 2
            break
        elif key == "z":
            result = result + data[key] 
        else:
            result = result - data[key] 
    except KeyError:
        result = result + 10

print(result)

A. 0

B. 10

C. -10

D. 20

E. -20

Click to Check the Answer!

Question 5

What is the output of the following code?

result = []
for i in range(5):
    if i == 2:
        continue
    elif i == 4:
        pass
    result.append(i)

print(result)

A. [0, 1, 3, 4]

B. [0, 1, 2, 3, 4]

C. [0, 1, 3]

D. [0, 1, 3, 4, 5]

E. Error

Click to Check the Answer!

Question 6

result = []
for i in range(0, 5):
  if i ** 2 < 10:
    result.append(i * 2)

Which of the following list comprehensions produces the same output as the above code?

result = [i * 2 for i in range(5) if i * i < 9]

result = [i * 2 for i in range(5) if i in [0, 1, 2, 3, 4]]

result = [i * 2 for i in range(5) if i ** 2 != 32]

result = [i * 2 for i in range(5) if i ** 2 < 10]

result = [i * 2 for i in range(5) if i ** 2 <= 8]

Click to Check the Answer!

Section 2. Pandas Basics – Coding Questions (Questions 7-17)

import pandas as pd
import numpy as np

For this Section, use the above Python libraries and the following four DataFrames from Global Gourmet Selections, which represent the company’s operations:

customers– Contains customer-related information, including unique customer IDs, names, and contact details.
orders– Provides an overview of order details, including relevant transaction information.
order_details– Records individual order line items by linking each product to a specific order.
products– Stores product-related details, such as product IDs, names, and pricing.

Additionally, the categories DataFrame offers metadata on product categories, including unique identifiers, names, and descriptions.

DataFrames

categories = pd.read_csv('https://bcdanl.github.io/data/gourmet-Categories.csv')
customers = pd.read_csv('https://bcdanl.github.io/data/gourmet-Customers2.csv')
orders = pd.read_csv('https://bcdanl.github.io/data/gourmet-Orders2.csv')
order_details = pd.read_csv('https://bcdanl.github.io/data/gourmet-OrderDetails2.csv')
products = pd.read_csv('https://bcdanl.github.io/data/gourmet-Products2.csv')

0. `categories` DataFrame (containing 8 observations)

CategoryID	CategoryName	Description
1	Beverages	Soft drinks coffees teas beers and ales
2	Condiments	Sweet and savory sauces relishes spreads and seasonings
3	Confections	Desserts candies and sweet breads
4	Dairy Products	Cheeses
5	Grains/Cereals	Breads crackers pasta and cereal
6	Meat/Poultry	Prepared meats
7	Produce	Dried fruit and bean curd
8	Seafood	Seaweed and fish

CategoryID (int64): Unique identifier for each product category in the categories DataFrame.
CategoryName (object): The name of the category.
Description (object): A textual description providing details about the category.

1. `customers` DataFrame with its first 15 observations

CustID	CustName	ContactName	City	PostalCode	Country
1	Alfreds Futterkiste	Maria Anders	Berlin	12209	Germany
2	Ana Trujillo Emparedados y helados	Ana Trujillo	Mexico D.F.	NA	Mexico
3	Antonio Moreno Taqueria	Antonio Moreno	Mexico D.F.	NA	Mexico
4	Around the Horn	Thomas Hardy	London	WA1 1DP	UK
5	Berglunds snabbkop	Christina Berglund	Lulea	S-958 22	Sweden
6	Blauer See Delikatessen	Hanna Moos	Mannheim	68306	Germany
7	Blondel pere et fils	Frederique Citeaux	Strasbourg	67000	France
8	Bolido Comidas preparadas	Martin Sommer	Madrid	28023	Spain
9	Bon app’	Laurence Lebihans	Marseille	13008	France
10	Bottom-Dollar Marketse	Elizabeth Lincoln	Tsawassen	T2F 8M4	Canada
11	B’s Beverages	Victoria Ashworth	London	EC2 5NT	UK
12	Cactus Comidas para llevar	Patricio Simpson	Buenos Aires	1010	Argentina
13	Centro comercial Moctezuma	Francisco Chang	Mexico D.F.	5022	Mexico
14	Chop-suey Chinese	Yang Wang	Bern	3012	Switzerland
15	Comercio Mineiro	Pedro Afonso	Sao Paulo	05432-043	Brazil

CustID (int64): Unique identifier for each customer in the customers DataFrame.
CustName (object): The name of the customer’s company.
ContactName (object): The name of the primary contact person at the company.
City (object): The city where the customer is located.
PostalCode (object): The postal or ZIP code of the customer’s address.
Country (object): The country of the customer.

2. `orders` DataFrame with its first 15 observations

OrderID	CustID	EmployeeID	OrderDate
10248	90	5	7/4/96
10249	81	6	7/5/96
10250	34	4	7/8/96
10251	84	3	7/8/96
10252	76	4	7/9/96
10253	34	3	7/10/96
10254	14	5	7/11/96
10255	68	9	7/12/96
10256	88	3	7/15/96
10257	35	4	7/16/96
10258	20	1	7/17/96
10259	13	4	7/18/96
10260	55	4	7/19/96
10261	61	4	7/19/96
10262	65	8	7/22/96

OrderID (int64): Unique identifier for each order in the orders DataFrame.
CustID (int64): Identifier linking to the customer who placed the order in the customers DataFrame.
EmployeeID (int64): Identifier for the employee handling the order.
OrderDate (object): The date when the order was placed.

3. `order_details` DataFrame with its first 15 observations

OrderDetail_ID	OrderID	ProductID	Quantity
1	10248	11	12
2	10248	42	10
3	10248	72	5
4	10249	14	9
5	10249	51	40
6	10250	41	10
7	10250	51	35
8	10250	65	15
9	10251	22	6
10	10251	57	15
11	10251	65	20
12	10252	20	40
13	10252	33	25
14	10252	60	40
15	10253	31	20

OrderDetail_ID (int64): Unique identifier for each record in the order_details DataFrame.
OrderID (int64): Identifier linking each record to a specific order in the orders DataFrame.
ProductID (int64): Identifier linking the record to a product in the products DataFrame.
Quantity (int64): The number of units ordered.

4. `products` DataFrame with its first 15 observations

ProductID	ProductName	CategoryID	Unit	Price
1	Chais	1	10 boxes x 20 bags	18.00
2	Chang	1	24 - 12 oz bottles	19.00
3	Aniseed Syrup	2	12 - 550 ml bottles	10.00
4	Chef Anton’s Cajun Seasoning	2	48 - 6 oz jars	22.00
5	Chef Anton’s Gumbo Mix	2	36 boxes	21.35
6	Grandma’s Boysenberry Spread	2	12 - 8 oz jars	25.00
7	Uncle Bob’s Organic Dried Pears	7	12 - 1 lb pkgs.	30.00
8	Northwoods Cranberry Sauce	2	12 - 12 oz jars	40.00
9	Mishi Kobe Niku	6	18 - 500 g pkgs.	97.00
10	Ikura	8	12 - 200 ml jars	31.00
11	Queso Cabrales	4	1 kg pkg.	21.00
12	Queso Manchego La Pastora	4	10 - 500 g pkgs.	38.00
13	Konbu	8	2 kg box	6.00
14	Tofu	7	40 - 100 g pkgs.	23.25
15	Genen Shouyu	2	24 - 250 ml bottles	15.50

ProductID (int64): Unique identifier for each product in the products DataFrame.
ProductName (object): The name of the product.
CategoryID (int64): Identifier linking the product to a category in the categories DataFrame.
Unit (object): The packaging or standard unit in which the product is sold
Price (float): The cost of a single unit of the product (in USD)

Question 7

Write a three-line code snippet to display the following summaries of the orders DataFrame:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   OrderID     196 non-null    int64 
 1   CustID      196 non-null    int64 
 2   EmployeeID  196 non-null    int64 
 3   OrderDate   196 non-null    object
dtypes: int64(3), object(1)
memory usage: 6.3+ KB

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   OrderID     196 non-null    int64         
 1   CustID      196 non-null    int64         
 2   EmployeeID  196 non-null    int64         
 3   OrderDate   196 non-null    datetime64[ns]
dtypes: datetime64[ns](1), int64(3)
memory usage: 6.3 KB

Click to Check the Answer!

orders.info()
orders['OrderDate'] = orders['OrderDate'].astype('datetime64[ns]')
orders.info()

Question 8

Write a code snippet to produce the following output of descriptive statistics:

       ProductName       Price
count           77   77.000000
unique          77         NaN
top          Chais         NaN
freq             1         NaN
mean           NaN   28.866364
std            NaN   33.815111
min            NaN    2.500000
25%            NaN   13.250000
50%            NaN   19.500000
75%            NaN   33.250000
max            NaN  263.500000

Click to Check the Answer!

products[['ProductName', 'Price']].describe(include="all")

Question 9

Fill in the blank to count the number of observations from the customers DataFrame where the customer’s Country is not USA.

customers[_______________________________________________].shape[0]

Click to Check the Answer!

customers[customers['Country'] != 'USA'].shape[0]

Question 10

Fill in the blank to count the number of observations from the customers DataFrame where the customer’s ContactName starts with B or P.

customers[_______________________________________________]

Click to Check the Answer!

customers[customers['ContactName'].between('B', 'C') | customers['ContactName'].between('P', 'Q') ]

Question 11

Write a code snippet to identify the country or countries with the highest number of missing values in the PostalCode variable of the customers DataFrame.

Click to Check the Answer!

(
  customers[customers['PostalCode'].isna()]['Country']
  .value_counts()
  .reset_index()
  .nlargest(1, 'count', keep = 'all')
)

Question 12

Write a code snippet using the order_details DataFrame to count the number of unique products ordered.

The total number of distinct products ordered is 77.

Click to Check the Answer!

order_details['ProductID'].nunique()

Question 13

Write a code snippet to reshape the order_details DataFrame so that each OrderID is a row and each ProductID becomes a column with values equal to the corresponding Quantity.

     OrderID   1     2   3   4   5   6   7  ...  70  71   72  73    74  75  76  77
0      10248 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  5.0 NaN   NaN NaN NaN NaN
1      10249 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN
2      10250 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN
3      10251 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN
4      10252 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN
..       ...  ..   ...  ..  ..  ..  ..  ..  ...  ..  ..  ...  ..   ...  ..  ..  ..
191    10439 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN  30.0 NaN NaN NaN
192    10440 NaN  45.0 NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN
193    10441 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN
194    10442 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN
195    10443 NaN   NaN NaN NaN NaN NaN NaN  ... NaN NaN  NaN NaN   NaN NaN NaN NaN

[196 rows x 78 columns]

The following displays the first 10 rows and the first 6 columns of the resulting DataFrame:

Click to Check the Answer!

(
  order_details
  .pivot(index='OrderID', 
         columns='ProductID', 
         values='Quantity')
  .reset_index()
)

Question 14

Write a code snippet to display the following new DataFrame showing each category along with the number of unique products in the products DataFrame.

CategoryName	count
Confections	13
Beverages	12
Condiments	12
Seafood	12
Dairy Products	10
Grains/Cereals	7
Meat/Poultry	6
Produce	5

Click to Check the Answer!

(
  products
  .merge(categories, on="CategoryID", how="left")[['CategoryName']]
  .value_counts()
  .reset_index()
)

Question 15

Write a code snippet to create the following new DataFrame, ProductID_11, showing the Quantity ordered for ProductID 11. Then, write a one-line Pandas code to compute the total quantity of this product ordered using the ProductID_11 DataFrame.

OrderDetail_ID	OrderID	ProductID	ProductName	Quantity
1	10248	11	Queso Cabrales	12
130	10296	11	Queso Cabrales	12
211	10327	11	Queso Cabrales	50
281	10353	11	Queso Cabrales	12
314	10365	11	Queso Cabrales	24
426	10407	11	Queso Cabrales	30
492	10434	11	Queso Cabrales	6
514	10442	11	Queso Cabrales	30
517	10443	11	Queso Cabrales	6

The total quantity ordered for ProductID 11 is 182.

Note that 182 is calculated as follows: \[ 182 = 12 + 12 + 50 + 12 + 24 + 30 + 6 + 30 + 6 \]

Click to Check the Answer!

ProductID_11 = (
    order_details
    .merge(products, 
           on="ProductID", 
           how="left")
    .query('ProductID == 11')
    [['OrderDetail_ID', 'OrderID', 'ProductID', 'ProductName', 'Quantity']]
)

ProductID_11['Quantity'].sum()

Question 16

Write a code snippet to determine how many unique CustomerID-EmployeeID pairs exist in the orders DataFrame. Then, count how many times each EmployeeID appears in these unique pairs and return the resulting DataFrame as below:

EmployeeID	count
1	23
2	18
3	26
4	31
5	10
6	17
7	14
8	25
9	6

What does this count variable above tell us?

Click to Check the Answer!

(
  orders[['EmployeeID', 'CustID']]
  .value_counts()
  .reset_index()[['EmployeeID']]
  .value_counts()
  .reset_index()
)

# `count` variable tells us the number of customers whose orders are handled by each employee.

Question 17

Write a code snippet to compute the total order cost for customers in Mexico, expressed in Mexican Pesos (MXN).
- Begin by adding a new variable, PriceMXN, to the products DataFrame, which converts the Price to pesos by multiplying it by 20.604.

Total cost of orders made by customers in Mexico: 120771.58 MXN

Click to Check the Answer!

products['PriceMXN'] = products['Price'] * 20.604

q17 = (
  order_details
  .merge(products, on="ProductID", how="left")
  .merge(orders, on="OrderID", how="left")
  .merge(customers, on="CustID", how="left")
  .query('Country == "Mexico"')
)

q17['Cost'] = q17['PriceMXN'] * q17['Quantity']
q17['Cost'].sum()

Descriptive Statistics

Section 1. Python Basics - Multiple Choice Questions (Questions 1-6)

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

Section 2. Pandas Basics – Coding Questions (Questions 7-17)

DataFrames

0. categories DataFrame (containing 8 observations)

1. customers DataFrame with its first 15 observations

2. orders DataFrame with its first 15 observations

3. order_details DataFrame with its first 15 observations

4. products DataFrame with its first 15 observations

Question 7

Question 8

Question 9

Question 10

Question 11

Question 12

Question 13

Question 14

Question 15

Question 16

Question 17

0. `categories` DataFrame (containing 8 observations)

1. `customers` DataFrame with its first 15 observations

2. `orders` DataFrame with its first 15 observations

3. `order_details` DataFrame with its first 15 observations

4. `products` DataFrame with its first 15 observations