tidyverse
For the questions in the R section, consider the data.frame nyc_payroll_new. For detailed descriptions of the variables in this data.frame, please refer to the following link: Citywide Payroll Data (Fiscal Year).
library(tidyverse)
library(skimr)
nyc_payroll_new <- read_csv("https://bcdanl.github.io/data/nyc_payroll_2024.csv")
1. How can you filter the data.frame nyc_payroll_new to calculate descriptive statistics (mean and standard deviation) of Base_Salary for workers in the Work_Location_Borough "MANHATTAN"? Similarly, how can you filter the data.frame nyc_payroll_new to calculate these statistics for workers in the Work_Location_Borough "QUEENS"?
Provide the R code for performing these calculations and then report the mean and standard deviation of Base_Salary for workers in both "MANHATTAN" and "QUEENS".
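As a rough illustration of the filter-then-summarise pattern this question asks about, here is a minimal sketch on a small made-up data.frame. The object `toy_payroll` and its values are hypothetical and are not taken from the real payroll file, so the numbers it produces say nothing about the actual answer. Only the column names come from the question.

```r
library(dplyr)

# Hypothetical stand-in for nyc_payroll_new; values are invented for illustration.
toy_payroll <- data.frame(
  Work_Location_Borough = c("MANHATTAN", "MANHATTAN", "QUEENS", "QUEENS", "BRONX"),
  Base_Salary = c(50000, 70000, 40000, 60000, 55000)
)

# Keep only one borough, then collapse the rows into a one-row summary.
manhattan_stats <- toy_payroll |>
  filter(Work_Location_Borough == "MANHATTAN") |>
  summarise(
    mean_salary = mean(Base_Salary, na.rm = TRUE),
    sd_salary   = sd(Base_Salary, na.rm = TRUE)
  )

# Same steps for the second borough.
queens_stats <- toy_payroll |>
  filter(Work_Location_Borough == "QUEENS") |>
  summarise(
    mean_salary = mean(Base_Salary, na.rm = TRUE),
    sd_salary   = sd(Base_Salary, na.rm = TRUE)
  )

print(manhattan_stats)
print(queens_stats)
```

Since the setup code also loads skimr, piping the filtered data into `skim(Base_Salary)` would likely be another way to get the same statistics.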
2. How can you filter the data.frame nyc_payroll_new to show only the records where the Base_Salary is greater than or equal to $100,000?
Complete the code by filling in the blank:
nyc_payroll_new |> filter(__BLANK__)
3. How can you select only distinct combinations of Agency_Name and Title_Description?
nyc_payroll_new |> __BLANK__
4. How would you arrange the data by Regular_Gross_Paid in descending order, showing the highest paid employees first?
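A descending sort of this kind might look like the sketch below. It runs on a hypothetical three-row data.frame (`toy_pay`, with invented values) rather than the real data, and uses `arrange()` with `desc()` from dplyr.

```r
library(dplyr)

# Hypothetical rows; names and amounts are invented for illustration.
toy_pay <- data.frame(
  Last_Name = c("ADAMS", "BAKER", "CHEN"),
  Regular_Gross_Paid = c(52000, 98000, 61000)
)

# desc() reverses the default ascending order, so the largest value comes first.
by_pay <- toy_pay |>
  arrange(desc(Regular_Gross_Paid))

print(by_pay)
```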
5. How can you select and rename the Title_Description variable to Title?
nyc_payroll_new |> rename(__BLANK__)
6. How can you filter the data to show only records for the “POLICE DEPARTMENT” Agency_Name and arrange it by Total_OT_Paid in ascending order?
Complete the code by filling in the blanks:
nyc_payroll_new |> filter(__BLANK 1__) |> arrange(__BLANK 2__)
7. How can you filter the data to include only those records where the Pay_Basis is “per Annum” and then select only the First_Name, Last_Name, and Base_Salary variables?
nyc_payroll_new |> filter(__BLANK 1__) |> select(__BLANK 2__, __BLANK 3__, __BLANK 4__)
8. How would you arrange the data.frame by Work_Location_Borough in ascending order and Base_Salary in descending order?
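Sorting on two keys with mixed directions can be sketched as follows, again on a hypothetical data.frame (`toy_boroughs`, invented values). `arrange()` sorts by its first argument and uses later arguments to break ties, so wrapping only the second key in `desc()` gives ascending boroughs with descending salaries inside each borough.

```r
library(dplyr)

# Hypothetical rows; values are invented for illustration.
toy_boroughs <- data.frame(
  Work_Location_Borough = c("QUEENS", "BRONX", "QUEENS", "BRONX"),
  Base_Salary = c(40000, 55000, 60000, 45000)
)

# First key ascending (the default), second key descending within each borough.
by_borough_salary <- toy_boroughs |>
  arrange(Work_Location_Borough, desc(Base_Salary))

print(by_borough_salary)
```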
9. How can you filter the nyc_payroll_new data.frame to remove observations where the Base_Salary variable has NA values? After filtering, how would you calculate the total number of remaining observations?
Provide the R code needed to perform these data extraction tasks.
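One way to approach the NA question is sketched below on a hypothetical data.frame (`toy_missing`, invented values): `is.na()` flags missing entries, `!` negates the flag inside `filter()`, and `nrow()` counts the rows that remain.

```r
library(dplyr)

# Hypothetical rows with two missing salaries; values are invented for illustration.
toy_missing <- data.frame(
  First_Name = c("ANA", "BEN", "CARA", "DEV"),
  Base_Salary = c(50000, NA, 70000, NA)
)

# Keep only rows where Base_Salary is not missing.
no_missing <- toy_missing |>
  filter(!is.na(Base_Salary))

# Count the remaining observations.
n_remaining <- nrow(no_missing)

print(n_remaining)
```

`tidyr::drop_na(Base_Salary)` should drop the same rows, and `summarise(n = n())` would return the count as a one-row data.frame instead of a plain number.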