pandas
Basics - Mathematical & Vectorized Operations; Adding, Removing, & Renaming Variables; Data Types
February 17, 2025
Let's read the nba data as the DataFrame nba:
# Import the pandas library as pd
import pandas as pd
# Enable an interactive display of DataFrames in Colab
from google.colab import data_table
data_table.enable_dataframe_formatter()
# Read nba.csv as the nba DataFrame, parsing Birthday as dates
nba = pd.read_csv("https://bcdanl.github.io/data/nba.csv",
                  parse_dates = ["Birthday"])
The max() method returns a Series with the maximum value from each variable.
The min() method returns a Series with the minimum value from each variable.
The sum()/mean()/median() method returns a Series with the sum/mean/median of the values in each variable.
The quantile() method returns a Series with the percentile value of the values in each variable (e.g., 25th, 75th, 90th percentile).
The std() method returns a Series with the standard deviation of the values in each variable.
To restrict the calculation to numeric variables, pass True to the sum()/mean()/median()/std() method’s numeric_only parameter.
nba["Salary_2x"] = nba["Salary"] + nba["Salary"]
nba["Name_w_Position"] = nba["Name"] + " (" + nba["Position"] + ")"
nba["Salary_minus_Mean"] = nba["Salary"] - nba["Salary"].mean()
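The third assignment above relies on mean(), one of the summary methods described earlier. As a minimal sketch of those methods and of numeric_only, the example below uses a small made-up DataFrame named toy (a stand-in, not the real nba data), so it runs without downloading anything.

```python
import pandas as pd

# Made-up stand-in for the nba data; the values are illustrative only
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Salary": [100, 200, 300, 400],
    "Age": [20, 25, 30, 35],
})

# numeric_only=True skips the non-numeric Name variable
means = toy.mean(numeric_only=True)
medians = toy.median(numeric_only=True)
stds = toy.std(numeric_only=True)

# 75th percentile of each numeric variable
q75 = toy.quantile(0.75, numeric_only=True)

# max() also works on strings, where it uses alphabetical order
maxima = toy.max()

print(means)
print(q75)
print(maxima)
```

Each result is a Series whose index holds the variable names, so a single value can be picked out with, for example, means["Salary"].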
pandas performs a vectorized operation on a Series or a variable in a DataFrame: the operation is applied to every value at once, without an explicit loop.
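To see what "vectorized" means in practice, here is a small sketch that repeats the three assignments above on a made-up DataFrame called toy (hypothetical values, not the real nba data) and compares one of them with an explicit Python loop.

```python
import pandas as pd

# Made-up stand-in for the nba data; the values are illustrative only
toy = pd.DataFrame({
    "Name": ["A", "B"],
    "Position": ["PG", "C"],
    "Salary": [100, 300],
})

# Each operation is applied to every row at once
toy["Salary_2x"] = toy["Salary"] + toy["Salary"]
toy["Name_w_Position"] = toy["Name"] + " (" + toy["Position"] + ")"
toy["Salary_minus_Mean"] = toy["Salary"] - toy["Salary"].mean()

# The same doubling written as an explicit loop, for comparison
looped = [s + s for s in toy["Salary"]]

print(toy)
print(toy["Salary_2x"].tolist() == looped)
```

The loop and the vectorized version give the same values; the vectorized form is shorter and is usually faster on large data.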
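drop(columns = ... ) removes the given variables from a DataFrame.
nba.columns holds the variable names of nba; assigning a new list of names to it renames every variable at once.
rename( columns = { "Existing One" : "New One" } ) renames variables.
For example, the rename() method renames the variable Date of Birth to Birthday.
rename( index = { "Existing One" : "New One" } ) renames observations (index labels).
For example, the rename() method renames the observation LeBron James to LeBron Raymone James.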
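The following sketch tries drop() and rename() on a made-up DataFrame named toy. The variable Extra, the row label Player B, and all values are placeholders introduced for illustration; only the renamed labels come from the slides.

```python
import pandas as pd

# Made-up data; the dates are placeholders, not real birthdays
toy = pd.DataFrame(
    {
        "Team": ["X", "Y"],
        "Date of Birth": ["2000-01-01", "2000-01-02"],
        "Extra": [1, 2],
    },
    index=["LeBron James", "Player B"],
)

# Remove a variable
dropped = toy.drop(columns=["Extra"])

# Rename one variable, then one observation (index label)
renamed_cols = dropped.rename(columns={"Date of Birth": "Birthday"})
renamed_rows = renamed_cols.rename(
    index={"LeBron James": "LeBron Raymone James"}
)

# Assigning to .columns replaces all variable names at once
relabeled = dropped.copy()
relabeled.columns = ["team", "birthday"]

print(renamed_rows)
print(relabeled.columns.tolist())
```

Note that drop() and rename() return a new DataFrame and leave toy unchanged, so the result has to be assigned to a name to be kept.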
The astype() Method
How can we change the data type of the Mgmt variable?
The astype() method converts a Series’ values to a different data type.
Let's overwrite the Mgmt variable with our new Series of Booleans.
Let's convert the Salary variable’s values to integers with the astype() method.
pandas is unable to convert NaN values to integers, so this conversion raises an error.
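A minimal sketch of both steps, using a made-up DataFrame named toy rather than the real employment data. It assumes Mgmt holds True/False values stored with the generic object type, and that Salary contains one missing value.

```python
import pandas as pd

# Made-up data; Mgmt is stored as generic objects, Salary has a NaN
toy = pd.DataFrame({
    "Mgmt": pd.Series([True, False, True], dtype=object),
    "Salary": [100.0, float("nan"), 300.0],
})

# astype() returns a new Series; assign it back to overwrite Mgmt
toy["Mgmt"] = toy["Mgmt"].astype(bool)
print(toy["Mgmt"].dtype)

# Converting Salary to integers fails while a NaN is present
try:
    toy["Salary"].astype(int)
    failed = False
except ValueError as err:
    failed = True
    print("Conversion failed:", err)
```

One caution: astype(bool) on text values such as "False" gives True for every non-empty string, so it is worth checking how the values are stored before converting.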
The fillna() Method
The fillna() method replaces a Series’ missing values with the argument we pass in.
Here, the fill value is 0; 0 is passed solely for the sake of example.
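As a small sketch of this step, the example below fills the missing value in a made-up Salary Series with 0 and then converts the result to integers. The values are illustrative only.

```python
import pandas as pd

# Made-up salaries with one missing value
salary = pd.Series([100.0, float("nan"), 300.0], name="Salary")

# Replace the missing value with 0 (chosen only for the example)
filled = salary.fillna(0)

# With no NaN left, the integer conversion succeeds
as_int = filled.astype(int)

print(as_int.tolist())
```

fillna() returns a new Series, so salary itself still contains the missing value until the result is assigned back.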
Let's overwrite the Salary variable with our new Series of integers.
The astype() method can also convert a variable to the category data type.
The pd.to_datetime() Function
# The two lines below are equivalent:
emp["Start Date"] = pd.to_datetime(emp["Start Date"])
emp["Start Date"] = emp["Start Date"].astype('datetime64[ns]')
The pd.to_datetime() function converts a Series, a DataFrame, or a single variable of a DataFrame from its current data type into datetime format.
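The two conversion routes can be tried on a made-up Series of date strings, as in the sketch below. The dates are illustrative, and the name dates is introduced only for this example.

```python
import pandas as pd

# Made-up date strings in year-month-day form
dates = pd.Series(["2020-01-15", "2021-06-30"], name="Start Date")

# Route 1: the pd.to_datetime() function
via_function = pd.to_datetime(dates)

# Route 2: the astype() method
via_astype = dates.astype("datetime64[ns]")

# Both results are datetime Series, so the .dt accessor is available
print(pd.api.types.is_datetime64_any_dtype(via_function))
print(via_function.dt.year.tolist())
print(via_astype.dt.month.tolist())
```

Once a variable is stored as datetime, parts of the date such as the year or month can be extracted with the .dt accessor.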
emp = pd.read_csv("https://bcdanl.github.io/data/employment.csv")
# Fill missing salaries with 0 so that the integer conversion succeeds
emp["Salary"] = emp["Salary"].fillna(0)
# Convert several variables in one call
emp = emp.astype({'Mgmt': 'bool',
                  'Salary': 'int',
                  'Gender': 'category',
                  'Start Date': 'datetime64[ns]',
                  'Team': 'category'})
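To check the result of such a conversion without downloading the file, the sketch below applies the same steps to a made-up DataFrame named toy. The variable names mirror the code above; the values are placeholders.

```python
import pandas as pd

# Made-up stand-in for the employment data
toy = pd.DataFrame({
    "Mgmt": [True, False, True],
    "Salary": [100.0, None, 300.0],
    "Gender": ["F", "M", "F"],
    "Start Date": ["2020-01-15", "2021-06-30", "2019-03-01"],
    "Team": ["Sales", "Sales", "IT"],
})

# Fill the missing salary first, then convert every variable at once
toy["Salary"] = toy["Salary"].fillna(0)
toy = toy.astype({
    "Mgmt": "bool",
    "Salary": "int",
    "Gender": "category",
    "Start Date": "datetime64[ns]",
    "Team": "category",
})

# Inspect the data type of each variable after the conversion
print(toy.dtypes)

# A category variable stores each distinct value only once
print(toy["Team"].cat.categories.tolist())
```

Printing dtypes before and after the conversion is a quick way to confirm that every variable ended up with the intended type.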
Here, a dictionary of variable-type pairs is passed to astype().
Let’s do Question 1 in Classwork 6!