pandas
Basics - Filtering by a Condition
February 19, 2025
We may often not know the index labels and positions of the observations we want to target.
We may want to target observations not by an index label but by a Boolean condition.
Series
with a constant value, we place the Series
on one side of the equality operator (==
) and the value on the other.
Series == value
First Name
value with “Donna”.
Series
.Series
between square brackets following the DataFrame
.
DataFrame[ Boolean_Series ]
Series
to an object and then pass it into the square brackets instead.Marketing
” team?True
denotes that the Team
value for a given index is not “Marketing
”, and False
indicates the Team
value is “Marketing
”True
in the Mgmt
variable.emp["Mgmt"] == True
, but we do not need to.sales = emp["Team"] == "Sales"
legal = emp["Team"] == "Legal"
fnce = emp["Team"] == "Finance"
emp[ sales | legal | fnce ] # '|' is 'or' opeartor
We could provide three separate Boolean Series
inside the square brackets and add the |
symbol to declare OR
criteria.
What if our next report asked for employees from 30 teams instead of three?
isin()
methodstar_teams = ["Sales", "Legal", "Finance"]
on_star_teams = emp["Team"].isin(star_teams)
emp[ on_star_teams ]
isin()
method, which accepts an iterable (e.g., list
, tuple
, array
, Series
) and returns a Boolean Series
.higher_than_90k = emp["Salary"] >= 90000
lower_than_100k = emp["Salary"] < 100000
emp[ higher_than_90k & lower_than_100k ] # '&' is 'and' opeartor
We can create two Boolean Series
, one to declare the lower bound and one to declare the upper bound.
Then we can use the &
operator to mandate that both conditions are True
.
between()
methodbetween()
.
True
denotes that an observation’s value falls between the specified interval.between()
methodbetween()
method to string variables.
query()
method!emp.query("Salary >= 100000 & Team == 'Finance'")
emp.query("Salary >= 100000 & `First Name` == 'Douglas'")
query()
method filters observations using a concise, string-based query syntax.
query()
accepts a string value that describes filtering conditions.query()
method, if we have variable names with spaces, we can wrap the variable names in backtick (`)np.where()
np.where
from NumPy to add a new variable to a DataFrame based on a condition.import numpy as np
# Using np.where to add the 'pass_fail' column
emp['high_salary'] = np.where(emp['Salary'] >= 100000, 'Yes', 'No')
high_salary
:
score
is greater than or equal to 100,000.Let’s do Questions 2-6 in Classwork 6!