pandas Basics - Filtering by a Condition
February 19, 2025
We may often not know the index labels and positions of the observations we want to target.
We may want to target observations not by an index label but by a Boolean condition.
Series with a constant value, we place the Series on one side of the equality operator (==) and the value on the other.
Series == valueFirst Name value with “Donna”.
Series.Series between square brackets following the DataFrame.
DataFrame[ Boolean_Series ]Series to an object and then pass it into the square brackets instead.Marketing” team?True denotes that the Team value for a given index is not “Marketing”, and False indicates the Team value is “Marketing”True in the Mgmt variable.emp["Mgmt"] == True, but we do not need to.sales = emp["Team"] == "Sales"
legal = emp["Team"] == "Legal"
fnce = emp["Team"] == "Finance"
emp[ sales | legal | fnce ] # '|' is 'or' opeartorWe could provide three separate Boolean Series inside the square brackets and add the | symbol to declare OR criteria.
What if our next report asked for employees from 30 teams instead of three?
isin() methodstar_teams = ["Sales", "Legal", "Finance"]
on_star_teams = emp["Team"].isin(star_teams)
emp[ on_star_teams ]isin() method, which accepts an iterable (e.g., list, tuple, array, Series) and returns a Boolean Series.higher_than_90k = emp["Salary"] >= 90000
lower_than_100k = emp["Salary"] < 100000
emp[ higher_than_90k & lower_than_100k ] # '&' is 'and' opeartorWe can create two Boolean Series, one to declare the lower bound and one to declare the upper bound.
Then we can use the & operator to mandate that both conditions are True.
between() methodbetween().
True denotes that an observation’s value falls between the specified interval.between() methodbetween() method to string variables.
query() method!emp.query("Salary >= 100000 & Team == 'Finance'")
emp.query("Salary >= 100000 & `First Name` == 'Douglas'")query() method filters observations using a concise, string-based query syntax.
query() accepts a string value that describes filtering conditions.query() method, if we have variable names with spaces, we can wrap the variable names in backtick (`)np.where()np.where from NumPy to add a new variable to a DataFrame based on a condition.import numpy as np
# Using np.where to add the 'pass_fail' column
emp['high_salary'] = np.where(emp['Salary'] >= 100000, 'Yes', 'No')high_salary:
score is greater than or equal to 100,000.Let’s do Questions 2-6 in Classwork 6!