cereal = pd.read_csv('https://bcdanl.github.io/data/cereal_oatmeal.csv')
Classwork 7
PySpark Basics - Group Operations
Direction
The dataset, cereal_oatmeal.csv (with its pathname https://bcdanl.github.io/data/cereal_oatmeal.csv), is a listing of 76 popular breakfast cereals and oatmeal.
Use PySpark to solve this classwork.
Question 1
Group the cereal DataFrame using the Manufacturer variable.
Answer:
Question 2
Determine the total number of groups, and the number of cereals per group.
Answer:
Question 3
Extract the cereals that belong to the manufacturer "Kellogg's".
Answer:
Question 4
Calculate the average of the values in the Calories, Fiber, and Sugars variables for each manufacturer.
Answer:
Question 5
Find the maximum value in the Sugars variable for each manufacturer.
Answer:
Question 6
Find the minimum value in the Fiber variable for each manufacturer.
Answer:
Question 7
Calculate a 'Normalized_Sugars' variable for each product by Manufacturer, where the normalization formula is
\[ \text{Normalized\_Sugars} = \frac{\text{Sugars} - \text{mean(Sugars)}}{\text{std(Sugars)}} \]
for each Manufacturer group. This formula adjusts the sugar content of each product by subtracting the mean sugar content of its manufacturer and then dividing by the standard deviation of the sugar content within its manufacturer.
Answer:
Discussion
Welcome to our Classwork 7 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 7.
Whether you want to delve deeper into the content, share insights, or ask questions, this is the perfect place for you.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 7 materials or need clarification on any points, don’t hesitate to ask here.
All comments will be stored here.
Let’s collaborate and learn from each other!