pandas Basics - Mathematical & Vectorized Operations; Adding, Removing, & Renaming Variables; Data Types
February 17, 2025
nba DataFrame

Let's read nba.csv as nba:

# Below is to import the pandas library as pd
import pandas as pd
# Below is for an interactive display of DataFrame in Colab
from google.colab import data_table
data_table.enable_dataframe_formatter()
# Below is to read nba.csv as nba DataFrame
nba = pd.read_csv("https://bcdanl.github.io/data/nba.csv",
                  parse_dates = ["Birthday"])

The max() method returns a Series with the maximum value from each variable.
The min() method returns a Series with the minimum value from each variable.
The sum()/mean()/median() methods return a Series with the sum/mean/median of the values in each variable.
The quantile() method returns a Series with the requested percentile of the values in each variable (e.g., 25th, 75th, 90th percentile).
The std() method returns a Series with the standard deviation of the values in each variable.
To limit the calculation to numeric variables, pass True to the sum()/mean()/median()/std() method’s numeric_only parameter.

nba["Salary_2x"] = nba["Salary"] + nba["Salary"]
nba["Name_w_Position"] = nba["Name"] + " (" + nba["Position"] + ")"
nba["Salary_minus_Mean"] = nba["Salary"] - nba["Salary"].mean()

pandas performs a vectorized operation on a Series or a variable in a DataFrame: the operation is applied to every value at once, without an explicit loop.
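The summary methods and the vectorized subtraction above can be tried on a small table without downloading nba.csv. The sketch below is only an illustration: the `toy` DataFrame and all of its values are made up, and only the method names come from the text.

```python
import pandas as pd

# Hypothetical toy data standing in for nba.csv (all values are made up)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Salary": [100, 200, 300, 400],
    "Age": [20, 25, 30, 35],
})

# numeric_only=True skips the non-numeric Name variable
means = toy.mean(numeric_only=True)
medians = toy.median(numeric_only=True)
stds = toy.std(numeric_only=True)

# quantile() takes a fraction between 0 and 1, so 0.25 is the 25th percentile
q25 = toy.quantile(0.25, numeric_only=True)

# max() also works on text: for Name it returns the alphabetically last value
maxes = toy.max()

# Vectorized: the subtraction is applied to every Salary value at once
toy["Salary_minus_Mean"] = toy["Salary"] - toy["Salary"].mean()

print(means)
print(q25)
print(toy)
```

Each summary call returns a Series indexed by variable name, so a single value can be picked out with, for example, `means["Salary"]`.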
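drop(columns = ... ) removes the listed variables from a DataFrame.

nba.columns holds the variable names; assigning a new list of names to it renames every variable at once.

rename( columns = { "Existing One" : "New One" } )
For example, rename( columns = { "Date of Birth" : "Birthday" } ) renames the variable Date of Birth to Birthday.

rename( index = { "Existing One" : "New One" } )
For example, rename( index = { "LeBron James" : "LeBron Raymone James" } ) renames the observation LeBron James to LeBron Raymone James.

astype() Method

... Mgmt variable?
The astype() method converts a Series’ values to a different data type.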
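The drop() and rename() forms listed above can be sketched on a small table. This is a minimal illustration, not the original slide code: the `toy` DataFrame and its Salary_2x values are made up, and the set_index() call is an assumption added so that player names become row labels that rename(index = ...) can act on.

```python
import pandas as pd

# Hypothetical toy data; the Salary_2x values are made up for illustration
toy = pd.DataFrame({
    "Name": ["LeBron James", "Stephen Curry"],
    "Date of Birth": ["1984-12-30", "1988-03-14"],
    "Salary_2x": [2, 4],
})

# drop() returns a new DataFrame without the listed variables
toy = toy.drop(columns=["Salary_2x"])

# rename(columns=...) maps an existing variable name to a new one
toy = toy.rename(columns={"Date of Birth": "Birthday"})

# rename(index=...) acts on row labels, so Name is made the index first
# (set_index is an assumption here; it is not shown in the text)
toy = toy.set_index("Name")
toy = toy.rename(index={"LeBron James": "LeBron Raymone James"})

# Assigning to .columns replaces every variable name at once
toy.columns = [name.lower() for name in toy.columns]

print(toy)
```

Both drop() and rename() return a new DataFrame, which is why each result is assigned back to `toy`.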
Overwrite the Mgmt variable with our new Series of Booleans.
Convert the Salary variable’s values to integers with the astype() method.
pandas cannot convert NaN values to integers.

fillna() method

The fillna() method replaces a Series’ missing values with the argument we pass in.
Missing Salary values are replaced with 0; 0 is passed solely for the sake of example.
Overwrite the Salary variable with our new Series of integers.
The astype() method can also convert a variable to the category data type.
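The conversions described above can be walked through on a small table. The sketch below is illustrative only: the `toy` DataFrame and its values are made up to mimic the Mgmt, Salary, and Team variables, and the Mgmt values are stored as generic objects on the assumption that this resembles how they arrive from a CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for employment.csv (values are made up)
toy = pd.DataFrame({
    "Mgmt": pd.Series([True, False, True], dtype="object"),
    "Salary": [50000.0, np.nan, 72000.0],
    "Team": ["Sales", "Finance", "Sales"],
})

# Booleans stored as generic objects become a proper bool variable
toy["Mgmt"] = toy["Mgmt"].astype(bool)

# astype(int) raises an error while NaN is present, so fill it first;
# 0 is only a placeholder here, as in the text
toy["Salary"] = toy["Salary"].fillna(0).astype(int)

# category suits a variable with few distinct values
toy["Team"] = toy["Team"].astype("category")

print(toy.dtypes)
```

Filling with 0 changes summary statistics such as the mean, so in real work the replacement value deserves more thought than it gets in this sketch.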
pd.to_datetime() function

# Below two are equivalent:
emp["Start Date"] = pd.to_datetime(emp["Start Date"])
emp["Start Date"] = emp["Start Date"].astype('datetime64[ns]')

The pd.to_datetime() function converts a Series, a DataFrame, or a single variable of a DataFrame from its current data type into datetime format.

astype() method

emp = pd.read_csv("https://bcdanl.github.io/data/employment.csv")
emp["Salary"] = emp["Salary"].fillna(0)
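emp = emp.astype({'Mgmt': 'bool',
                  'Salary': 'int',
                  'Gender': 'category',
                  'Start Date': 'datetime64[ns]',
                  'Team': 'category'})

A dictionary that maps variable names to data types can be passed to astype() to convert several variables at once.

Let’s do Question 1 in Classwork 6!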