Homework 3

Pandas Basics - Sorting, Indexing, and Locating Data

Author

Byeong-Hak Choe

Published

February 20, 2024

Modified

February 27, 2024

Direction

  • Please submit your Jupyter Notebook for Homework 3 to the Brightspace with the name below:

    • danl-m1-hw3-LASTNAME-FIRSTNAME.ipynb
      ( e.g., danl-m1-hw3-choe-byeonghak.ipynb )
  • The due is February 27, 2024, 7:00 P.M.

  • Please send Byeong-Hak an email (bchoe@geneseo.edu) if you have any questions.

  • Please prepare a Jupyter/Python Notebook (*.ipynb) to address all questions.

  • Make at least some simple comment (# ...) in each question.

  • Make one text cell to explain things in each question.

Load Libraries

Import the pandas library.

import pandas as pd

Below is nyc_payroll DataFrame that reads the file nyc_payroll.csv containing data of how the New York City’s budget is being spent on salary and overtime pay for all municipal employees (Source: NYC OpenData).

nyc_payroll = pd.read_csv('https://bcdanl.github.io/data/nyc_payroll_2022_2023.csv')

Variable Description

  • Fiscal Year: Fiscal Year;
  • Payroll Number: Payroll Number;
  • Agency Name: The Payroll agency that the employee works for;
  • Last Name: Last name of employee;
  • First Name: First name of employee;
  • Mid Init: Middle initial of employee;
  • Agency Start Date: Date which employee began working for their current agency;
  • Work Location Borough: Borough of employee’s primary work location;
  • Title Description: Civil service title description of the employee;
  • Leave Status as of June 30: Status of employee as of the close of the relevant fiscal year;
  • Base Salary: Base Salary assigned to the employee;
  • Pay Basis: Lists whether the employee is paid on an hourly, per diem or annual basis;
  • Regular Hours: Number of regular hours employee worked in the fiscal year;
  • Regular Gross Paid: The amount paid to the employee for base salary during the fiscal year;
  • OT Hours: Overtime Hours worked by employee in the fiscal year;
  • Total OT Paid: Total overtime pay paid to the employee in the fiscal year;
  • Total Other Pay: Includes any compensation in addition to gross salary and overtime pay, ie Differentials, lump sums, uniform allowance, meal allowance, retroactive pay increases, settlement amounts, and bonus pay, if applicable.

Question 1

Select “First Name”, “Last Name”, “Base Salary”, and “Total OT Paid”, then sort these selected variables by “Base Salary” in descending order and display the top 10 entries.

Answer



Question 2

Create a new DataFrame that includes employees from the “DEPARTMENT OF EDUCATION ADMIN” agency where the variables are “First Name”, “Last Name”, “Title Description”, “Base Salary”, and “Total OT Paid”. Additionally, include a new variable “Total Compensation” which is the sum of “Base Salary” and “Total OT Paid”.

Answer



Question 3

Using set_index(), change the DataFrame’s index to “Last Name”, then locate the data for a specific last name, say “BROWN”, and display their “Agency Name”, “Base Salary”, and “Total OT Paid”.

Answer



Question 4

Find the 5 employees with the highest “Regular Gross Paid” and calculate their average “OT Hours”. Also, reset the index if you have changed it previously.

Answer



Question 5

Sort the DataFrame by “Fiscal Year” and “Total Other Pay” in descending order, then set “First Name” as the index and use the loc accessor to retrieve the “Total Other Pay” for a specific first name, say “MICHAEL”.

Answer



Question 6

  • Select employees who have “OT Hours” greater than 0, calculate their “OT Rate” (Total OT Paid / OT Hours), and then find the employee with the highest “OT Rate”. Display their full name and OT Rate.
    • Hint: Start with filtering observations in nyc_payroll with the following lines:
# Boolean Series of whether or not 'OT Hours' is greater than 0.
pos_OT_hours = nyc_payroll['OT Hours'] > 0 

# Keeping only the observations with pos_OT_hours == True
with_ot = nyc_payroll[ pos_OT_hours ] 

Answer



Question 7

Sort the DataFrame first by “Work Location Borough” alphabetically, and then by “Total Compensation” (sum of “Base Salary” and “Total OT Paid”) in descending order within each borough.

Answer



Back to top