import pandas as pd
Homework 3
Pandas Basics - Sorting, Indexing, and Locating Data
Direction
Please submit your Jupyter Notebook for Homework 3 to the Brightspace with the name below:
danl-m1-hw3-LASTNAME-FIRSTNAME.ipynb
( e.g.,danl-m1-hw3-choe-byeonghak.ipynb
)
The due is February 27, 2024, 7:00 P.M.
Please send Byeong-Hak an email (
bchoe@geneseo.edu
) if you have any questions.Please prepare a Jupyter/Python Notebook (
*.ipynb
) to address all questions.Make at least some simple comment (
# ...
) in each question.Make one text cell to explain things in each question.
Load Libraries
Import the pandas
library.
Below is nyc_payroll
DataFrame
that reads the file nyc_payroll.csv
containing data of how the New York City’s budget is being spent on salary and overtime pay for all municipal employees (Source: NYC OpenData).
= pd.read_csv('https://bcdanl.github.io/data/nyc_payroll_2022_2023.csv') nyc_payroll
Variable Description
Fiscal Year
: Fiscal Year;Payroll Number
: Payroll Number;Agency Name
: The Payroll agency that the employee works for;Last Name
: Last name of employee;First Name
: First name of employee;Mid Init
: Middle initial of employee;Agency Start Date
: Date which employee began working for their current agency;Work Location Borough
: Borough of employee’s primary work location;Title Description
: Civil service title description of the employee;Leave Status as of June 30
: Status of employee as of the close of the relevant fiscal year;Base Salary
: Base Salary assigned to the employee;Pay Basis
: Lists whether the employee is paid on an hourly, per diem or annual basis;Regular Hours
: Number of regular hours employee worked in the fiscal year;Regular Gross Paid
: The amount paid to the employee for base salary during the fiscal year;OT Hours
: Overtime Hours worked by employee in the fiscal year;Total OT Paid
: Total overtime pay paid to the employee in the fiscal year;Total Other Pay
: Includes any compensation in addition to gross salary and overtime pay, ie Differentials, lump sums, uniform allowance, meal allowance, retroactive pay increases, settlement amounts, and bonus pay, if applicable.
Question 1
Select “First Name”, “Last Name”, “Base Salary”, and “Total OT Paid”, then sort these selected variables by “Base Salary” in descending order and display the top 10 entries.
Answer
Question 2
Create a new DataFrame that includes employees from the “DEPARTMENT OF EDUCATION ADMIN” agency where the variables are “First Name”, “Last Name”, “Title Description”, “Base Salary”, and “Total OT Paid”. Additionally, include a new variable “Total Compensation” which is the sum of “Base Salary” and “Total OT Paid”.
Answer
Question 3
Using set_index()
, change the DataFrame’s index to “Last Name”, then locate the data for a specific last name, say “BROWN”, and display their “Agency Name”, “Base Salary”, and “Total OT Paid”.
Answer
Question 4
Find the 5 employees with the highest “Regular Gross Paid” and calculate their average “OT Hours”. Also, reset the index if you have changed it previously.
Answer
Question 5
Sort the DataFrame by “Fiscal Year” and “Total Other Pay” in descending order, then set “First Name” as the index and use the loc accessor to retrieve the “Total Other Pay” for a specific first name, say “MICHAEL”.
Answer
Question 6
- Select employees who have “OT Hours” greater than 0, calculate their “OT Rate” (Total OT Paid / OT Hours), and then find the employee with the highest “OT Rate”. Display their full name and OT Rate.
- Hint: Start with filtering observations in
nyc_payroll
with the following lines:
- Hint: Start with filtering observations in
# Boolean Series of whether or not 'OT Hours' is greater than 0.
= nyc_payroll['OT Hours'] > 0
pos_OT_hours
# Keeping only the observations with pos_OT_hours == True
= nyc_payroll[ pos_OT_hours ] with_ot
Answer
Question 7
Sort the DataFrame first by “Work Location Borough” alphabetically, and then by “Total Compensation” (sum of “Base Salary” and “Total OT Paid”) in descending order within each borough.
Answer