import pandas as pd
fao = pd.read_csv("https://bcdanl.github.io/data/fao_stat.csv",
                  encoding = 'ISO-8859-1')
Homework 5
Dealing with Missing Values; Group Operations
Directions
Please submit your Jupyter Notebook for Homework 5 to Brightspace with the name below:
danl-m1-hw5-LASTNAME-FIRSTNAME.ipynb
(e.g., danl-m1-hw5-choe-byeonghak.ipynb)
The due date is March 12, 2024, 7:00 P.M.
Please send Byeong-Hak an email (bchoe@geneseo.edu) if you have any questions.
Please prepare a Jupyter/Python Notebook (*.ipynb) to address all questions.
Write at least one brief comment (# ...) for each question.
Add one text cell explaining your work for each question.
Import the pandas library and read fao_stat.csv as fao (see the code at the top of this page).
Variable Description
The fao DataFrame contains country-year level observations on the variables below:
- SSA: A boolean indicating whether the country is in Sub-Saharan Africa.
- Area: The name of the country.
- Year: The year of observation.
- gdp_per_capita: GDP per capita.
- drinking_water: The percentage of the population with access to safe drinking water.
- sanitation_service: The percentage of the population with access to improved sanitation services.
- children_stunted: The percentage of children under 5 years old who are stunted.
- children_overweight: The percentage of children under 5 years old who are overweight.
- investment_pct: The percentage of GDP invested in public health.
Question 1
What percentage of the values is missing for each variable?
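As an illustration of the idea, the sketch below computes the share of missing values per column on a small made-up DataFrame. The name toy and all of its values are assumptions for demonstration only; they are not taken from fao_stat.csv.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration; not the real fao values)
toy = pd.DataFrame({
    "Area": ["A", "B", "C", "D"],
    "gdp_per_capita": [1000.0, np.nan, 3000.0, np.nan],
    "drinking_water": [80.0, 90.0, np.nan, 70.0],
})

# isna() marks each missing cell as True. The column-wise mean of those
# True/False values is the fraction missing, so multiply by 100 for percent.
missing_pct = toy.isna().mean() * 100
print(missing_pct)
```

The same expression should carry over to any DataFrame, since isna() and mean() both work column by column.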
Answer
Question 2
Fill missing values in the gdp_per_capita variable with the mean value of that variable.
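A minimal sketch of mean imputation on made-up data follows. The DataFrame toy and its values are hypothetical; only the column name comes from the variable list above.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({"gdp_per_capita": [1000.0, np.nan, 3000.0]})

# Series.mean() skips missing values by default, so the mean is
# computed from the observed values only
mean_gdp = toy["gdp_per_capita"].mean()

# fillna() returns a new Series; assign it back to update the column
toy["gdp_per_capita"] = toy["gdp_per_capita"].fillna(mean_gdp)
print(toy)
```

Assigning the result back to the column, rather than relying on in-place modification, keeps the update explicit.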
Answer
Question 3
Drop observations where drinking_water or sanitation_service information is missing.
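One way to drop rows based on specific columns is the subset parameter of DataFrame.dropna(). The sketch below uses a made-up DataFrame named toy; its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Area": ["A", "B", "C", "D"],
    "drinking_water": [80.0, np.nan, 60.0, 75.0],
    "sanitation_service": [70.0, 65.0, np.nan, 50.0],
    "children_stunted": [np.nan, 10.0, 20.0, 30.0],
})

# A row is dropped if EITHER listed column is missing.
# Missing values in other columns (e.g., children_stunted) are ignored.
cleaned = toy.dropna(subset=["drinking_water", "sanitation_service"])
print(cleaned)
```

Row A survives even though its children_stunted value is missing, because that column is not listed in subset.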
Answer
Question 4
What is the average drinking_water access percentage for each Area grouped by SSA status?
Hint: We can group a DataFrame by a list of multiple variables. Then, each group corresponds to a unique combination of values across the specified variables.
(
    fao
    .groupby(['VARIABLE_1', 'VARIABLE_2'])
)
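To make the hint concrete, here is a sketch of grouping by two variables and averaging a third. The DataFrame toy and its values are made up for illustration; only the column names follow the variable list above.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "SSA": [True, True, True, False, False],
    "Area": ["X", "X", "Y", "Z", "Z"],
    "drinking_water": [40.0, 50.0, 60.0, 90.0, 100.0],
})

# Each (SSA, Area) pair forms one group; select the column to
# summarize, then take the mean within each group
avg_water = (
    toy
    .groupby(["SSA", "Area"])["drinking_water"]
    .mean()
)
print(avg_water)
```

The result is a Series indexed by the (SSA, Area) pairs. Calling reset_index() on it would turn the group keys back into ordinary columns if a flat table is preferred.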
Answer
Question 5
Calculate the mean sanitation_service percentage for each combination of SSA status and Year.
Answer
Question 6
For each year, find the 5 worst countries in terms of drinking_water.
- Hint: It would be a good idea to start with the DataFrame.sort_values() method.
- Note: DataFrameGroupBy does not support the sort_values() method. That is, DataFrameGroupBy.sort_values() will result in an error.
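Following the hint, one possible pattern is to sort the whole DataFrame first and then take the first rows of each group. The sketch below uses made-up data and keeps 2 rows per year instead of 5 to stay small. It also assumes that "worst" means the lowest value, which is an interpretation rather than something the question states.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Area": ["A", "B", "C", "A", "B", "C"],
    "drinking_water": [50.0, 30.0, 70.0, 55.0, 65.0, 35.0],
})

# Sort the DataFrame BEFORE grouping (a DataFrameGroupBy has no
# sort_values()). Within each year, the lowest values come first.
worst = (
    toy
    .sort_values(["Year", "drinking_water"])
    .groupby("Year")
    .head(2)   # first 2 rows of each year; use 5 on the real data
)
print(worst)
```

Because head() on a grouped DataFrame preserves the order of the sorted rows, each year's block lists its lowest values first. Rows with missing values are placed last by sort_values() by default, so they are unlikely to crowd out observed values.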
Answer
Question 7
For each year, find the 5 worst countries in terms of children_stunted.
Answer