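import pandas as pd
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session
spark = SparkSession.builder.master("local[*]").getOrCreate()
# Read the CSV with pandas, then convert it to a PySpark DataFrame
df = pd.read_csv("https://bcdanl.github.io/data/nfl.csv")
nfl = spark.createDataFrame(df)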
Classwork 5
PySpark Basics - Loading, Summarizing, Selecting, Counting, and Sorting Data
Direction
The nfl.csv file contains a list of players in the National Football League, with Name, Team, Position, Birthday, and Salary variables similar to those in the nba.csv file.
Question 1
- How can we read the nfl.csv file and assign it to a PySpark DataFrame object, nfl?
Answer:
Question 2
- How many observations are in nfl?
- What are the mean, standard deviation, minimum, and maximum of Salary in nfl?
Answer:
Question 3
- How can we count the number of players per team in nfl?
- How many unique teams are in nfl?
Answer:
Question 4
- What is an effective way to convert the values in the Birthday variable to date?
  - The format of Birthday is “M/d/yy”.
Answer:
Question 5
- Who are the five highest-paid players?
- Who is the oldest player?
Answer:
Question 6
How can we sort the DataFrame first by Team in alphabetical order and then by Salary in descending order?
Answer:
Question 7
Who is the oldest player on the Kansas City Chiefs roster, and what is his birthday?
Answer:
Question 8
- What is the median of Salary in nfl?
Answer: