Pandas Fundamentals III: Data Types; Filtering by a Condition
April 3, 2026
emp DataFrameastype() MethodMgmt Before ConversionMgmt variable?astype() method converts a Seriesβ values to a different data type.
Mgmt to BooleanMgmt variable with our new Series of Booleans.Salary Cannot Yet Become IntegerSalary variable to integers with the astype() method.
NaN values to integers.fillna() Methodfillna() method replaces a Seriesβ missing values with a value that we specify.0 only to illustrate the method.
0 may distort the data.Salary After Filling Missing ValuesSalary variable with our new Series of integers.Gender to Categorycategory.
pd.to_datetime() Method# Below two are equivalent:
emp["Start Date"] = pd.to_datetime(emp["Start Date"])
emp["Start Date"] = emp["Start Date"].astype('datetime64[ns]')pd.to_datetime() function converts a Series, a DataFrame, or a single column into a proper datetime format.emp = pd.read_csv("https://bcdanl.github.io/data/employment.csv")
emp.info()
emp["Salary"] = emp["Salary"].fillna(0)
emp = emp.astype({'Mgmt': 'bool',
'Salary': 'int',
'Gender': 'category',
'Start Date': 'datetime64[ns]',
'Team': 'category'})astype().We often do not know the exact index labels or positions of the observations we want to extract.
Instead, we can target observations by a Boolean condition.
Series with a constant value, we place the Series on one side of the equality operator (==) and the value on the other.
Series == valueFirst Name value with βDonnaβ.
Series.Series between square brackets following the DataFrame.
DataFrame[ Boolean_Series ]Series to an object and then pass it into the square brackets instead.Marketingβ team?True denotes that the Team value for a given index is not βMarketingβ, and False indicates the Team value is βMarketingβMgmt Is TrueTrue in the Mgmt variable.emp["Mgmt"] == True, but we do not need to.|sales = emp["Team"] == "Sales"
legal = emp["Team"] == "Legal"
fnce = emp["Team"] == "Finance"
emp[ sales | legal | fnce ] # '|' means 'or'We could provide three separate Boolean Series inside the square brackets and add the | symbol to declare OR criteria.
What if our next report asked for employees from 30 teams instead of three?
isin() Methodstar_teams = ["Sales", "Legal", "Finance"]
on_star_teams = emp["Team"].isin(star_teams)
emp[ on_star_teams ]isin() method, which accepts an iterable such as a list, tuple, array, or Series and returns a Boolean Series.&higher_than_90k = emp["Salary"] >= 90000
lower_than_100k = emp["Salary"] < 100000
emp[ higher_than_90k & lower_than_100k ] # '&' means 'and'We can create two Boolean Series, one to declare the lower bound and one to declare the upper bound.
Then we can use the & operator to mandate that both conditions are True.
between()between().
True denotes that an observationβs value falls between the specified interval.between() on Stringsbetween() method to string variables.
query() method!emp.query("Salary >= 100000 & Team == 'Finance'")
emp.query("Salary >= 100000 & `First Name` == 'Douglas'")query() method filters observations using a concise, string-based syntax.
query() accepts a string that describes the filtering condition(s).`).np.where()np.where from NumPy to add a new variable to a DataFrame based on a condition.import numpy as np
# Using np.where to add the 'pass_fail' column
emp['high_salary'] = np.where(emp['Salary'] >= 100000, 'Yes', 'No')high_salary:
Salary is greater than or equal to 100,000.Letβs do Classwork 12!