import pandas as pd
Homework 4
Pandas Basics - Sorting and Filtering Data
Direction
Please submit your Jupyter Notebook for Homework 4 to the Brightspace with the name below:
danl-m1-hw4-LASTNAME-FIRSTNAME.ipynb
( e.g.,danl-m1-hw4-choe-byeonghak.ipynb
)
The due is March 5, 2024, 7:00 P.M.
Please send Byeong-Hak an email (
bchoe@geneseo.edu
) if you have any questions.Please prepare a Jupyter/Python Notebook (
*.ipynb
) to address all questions.Make at least some simple comment (
# ...
) in each question.Make one text cell to explain things in each question.
Load Libraries
Import the pandas
library.
Part 1
Below is nyc_payroll
DataFrame
that reads the file nyc_payroll.csv
containing data of how the New York City’s budget is being spent on salary and overtime pay for all municipal employees (Source: NYC OpenData).
= pd.read_csv('https://bcdanl.github.io/data/nyc_payroll_2022_2023.csv') nyc_payroll
Variable Description
Fiscal Year
: Fiscal Year;Payroll Number
: Payroll Number;Agency Name
: The Payroll agency that the employee works for;Last Name
: Last name of employee;First Name
: First name of employee;Mid Init
: Middle initial of employee;Agency Start Date
: Date which employee began working for their current agency;Work Location Borough
: Borough of employee’s primary work location;Title Description
: Civil service title description of the employee;Leave Status as of June 30
: Status of employee as of the close of the relevant fiscal year;Base Salary
: Base Salary assigned to the employee;Pay Basis
: Lists whether the employee is paid on an hourly, per diem or annual basis;Regular Hours
: Number of regular hours employee worked in the fiscal year;Regular Gross Paid
: The amount paid to the employee for base salary during the fiscal year;OT Hours
: Overtime Hours worked by employee in the fiscal year;Total OT Paid
: Total overtime pay paid to the employee in the fiscal year;Total Other Pay
: Includes any compensation in addition to gross salary and overtime pay, ie Differentials, lump sums, uniform allowance, meal allowance, retroactive pay increases, settlement amounts, and bonus pay, if applicable.
Question 1
- How many employees have a “Base Salary” within the top 10% of the DataFrame?
- Hint: The
.quantile()
method can be useful.
- Hint: The
Answer
Question 2
Filter the DataFrame for employees who have “OT Hours” greater than 0 but less than 100, and their “Leave Status as of June 30” is “ACTIVE”.
Answer
Question 3
Find the unique job titles in the “DEPARTMENT OF EDUCATION ADMIN” agency and count how many there are.
Answer
Question 4
- Identify the employee(s) with the highest “Total OT Paid” in the DataFrame.
- Include their “First Name”, “Last Name”, and “Total OT Paid”.
Answer
Part 2
Below is spotify
DataFrame
that reads the file spotify_all.csv
containing data of Spotify users’ playlist information (Source: Spotify Million Playlist Dataset Challenge).
= pd.read_csv('https://bcdanl.github.io/data/spotify_all.csv') spotify
Variable Description
pid
: playlist ID; unique ID for playlistplaylist_name
: a name of playlistpos
: a position of the track within a playlist (starting from 0)artist_name
: name of the track’s primary artisttrack_name
: name of the trackduration_ms
: duration of the track in millisecondsalbum_name
: name of the track’s album
Question 5
- Find the five most popular song.
- A value of a song is defined as a combination of a
artist_name
value and atrack_name
value. - Who are artists for those five most popular song?
- A value of a song is defined as a combination of a
Answer
Question 6
- Create a DataFrame that only contains observations from playlists featuring the song “One Dance” by Drake.
- Next, identify the song with the highest popularity after Drake’s “One Dance” in the newly created DataFrame.
Answer