Lecture 7

Pandas Fundamentals II: Sorting, Indexing, and Locating Data

Byeong-Hak Choe

SUNY Geneseo

March 30, 2026

🏀 nba DataFrame

  • Let's read the nba.csv file as nba:
import pandas as pd
import numpy as np

# Enable an interactive display of DataFrames in Colab
from google.colab import data_table
data_table.enable_dataframe_formatter()

# Read nba.csv into the nba DataFrame
nba = pd.read_csv("https://bcdanl.github.io/data/nba.csv",
                  parse_dates = ["Birthday"])

Sorting Methods

🔀 Sorting by One Column with sort_values()

# The two lines below are equivalent
nba.sort_values(["Name"])
nba.sort_values(by = ["Name"])
  • The sort_values() method's first parameter, by, specifies the column(s) pandas should use to sort the rows.

🔽 Sorting in Descending Order with sort_values(ascending = False)

nba.sort_values(["Name"], ascending = False)
  • The ascending argument controls the sort direction.
    • Its default value is True.
    • By default, pandas sorts:
      • numeric columns in increasing order;
      • string columns in alphabetical order;
      • datetime columns in chronological order.
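The default sort directions listed above can be checked on a small table. This is a minimal sketch: the `toy` DataFrame and all of its values are made up for illustration and do not come from nba.csv.

```python
import pandas as pd

# Toy data (hypothetical values, not from nba.csv)
toy = pd.DataFrame({
    "Name": ["Cy", "Ann", "Bob"],
    "Salary": [300, 100, 200],
    "Birthday": pd.to_datetime(["1995-05-01", "1990-01-15", "1992-03-10"]),
})

# String column: alphabetical order by default
by_name = toy.sort_values("Name")

# Numeric column: ascending = False puts the largest value first
by_salary_desc = toy.sort_values("Salary", ascending=False)

# Datetime column: earliest date first by default
by_birthday = toy.sort_values("Birthday")

print(by_name)
print(by_salary_desc)
print(by_birthday)
```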

🔗 Pandas Fundamentals: Method Chaining

(
    nba
    .sort_values(['Salary'])
    .head(5)
)
  • A DataFrame has many useful methods that we can combine in sequence.
    • Method chaining lets us apply multiple steps without saving intermediate objects.
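To see that chaining gives the same result as saving each step, here is a small sketch. The `toy` DataFrame is hypothetical, invented only for this comparison.

```python
import pandas as pd

# Toy data (hypothetical values, not from nba.csv)
toy = pd.DataFrame({
    "Name": ["Cy", "Ann", "Bob", "Dee"],
    "Salary": [300, 100, 200, 400],
})

# Step by step, saving an intermediate object
sorted_toy = toy.sort_values(["Salary"])
lowest_two_steps = sorted_toy.head(2)

# The same two steps as one chain, with no intermediate object
lowest_two_chained = (
    toy
    .sort_values(["Salary"])
    .head(2)
)

print(lowest_two_chained)
```

Wrapping the chain in parentheses is what lets each method sit on its own line.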

🔼🔽 Finding Extremes with nsmallest() and nlargest()

nba.nsmallest(5, 'Salary')
nba.nlargest(5, 'Salary')
  • nsmallest() returns the n rows with the smallest values in a column, sorted in ascending order.

  • nlargest() returns the n rows with the largest values in a column, sorted in descending order.

🧺 nsmallest() and nlargest() with keep = "all"

nba.nsmallest(4, 'Salary', keep = "all")
nba.nlargest(4, 'Salary', keep = "all")
  • keep = "all" keeps all tied values, even if that means returning more than n rows.
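The effect of ties is easiest to see on a table where several rows share the smallest value. A minimal sketch, assuming a made-up `toy` DataFrame in which three players earn the same salary:

```python
import pandas as pd

# Toy data (hypothetical values): three rows tie for the lowest salary
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Salary": [100, 100, 100, 200, 300],
})

# Default behavior: exactly n rows, ties are cut off
lowest_default = toy.nsmallest(2, "Salary")

# keep = "all": every row tied with the selected values is returned
lowest_with_ties = toy.nsmallest(2, "Salary", keep="all")

print(len(lowest_default))    # 2 rows
print(len(lowest_with_ties))  # 3 rows, because all three salaries of 100 are kept
```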

🔀 Sorting by Multiple Columns with sort_values()

nba.sort_values(["Team", "Name"])
nba.sort_values(by = ["Team", "Name"])
  • We can sort a DataFrame by multiple columns by passing a list to the by argument.

🔽 Sorting Multiple Columns with One ascending Value

nba.sort_values(by = ["Team", "Name"], 
                ascending = False)
  • We can pass a single Boolean to ascending to apply the same sort direction to every column in by.

🔽🔼 Sorting Multiple Columns with Different Orders

nba.sort_values(by = ["Team", "Name"], 
                ascending = [False, True])
  • If we want different sort directions for different columns, we can pass a Boolean list to ascending.

  • Q. Which players on each team are paid the most?
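One possible way to approach the question above is to sort teams alphabetically and salaries from high to low, so the best-paid players appear first within each team. This is only a sketch: the `toy` DataFrame, its team names, and its salaries are invented for illustration.

```python
import pandas as pd

# Toy data (hypothetical teams and salaries, not from nba.csv)
toy = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee"],
    "Team": ["Reds", "Blues", "Reds", "Blues"],
    "Salary": [100, 400, 300, 200],
})

# Team in ascending order, Salary in descending order within each team
ranked = toy.sort_values(by=["Team", "Salary"],
                         ascending=[True, False])

print(ranked)
```

The Boolean list in ascending must have the same length as the list passed to by.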

🧾 Sorting by Row Index with sort_index()

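# The first two lines are equivalent (ascending order is the default)
nba.sort_index()
nba.sort_index(ascending = True)
# The line below sorts by index label in descending order instead
nba.sort_index(ascending = False)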
  • Suppose we sorted nba by "Name" and assigned the result back to nba. How could we restore the original row order?
    • The nba DataFrame still keeps its numeric index labels.
    • sort_index() sorts rows by their index labels.
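The scenario above can be sketched on a small made-up table (`toy` is hypothetical): sorting by a column shuffles the rows but keeps each row's original index label, so sort_index() can bring the original order back.

```python
import pandas as pd

# Toy data (hypothetical values) with the default index 0, 1, 2
toy = pd.DataFrame({"Name": ["Cy", "Ann", "Bob"]})

# Sorting by Name reorders the rows; the index labels travel with them
toy_sorted = toy.sort_values("Name")
print(toy_sorted.index.tolist())  # [1, 2, 0]

# Sorting by index label restores the original row order
restored = toy_sorted.sort_index()
print(restored)
```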

🗂️ Indexing Methods

🏷️ Setting a New Index

  • We can use set_index() when we want to replace the current index of a DataFrame with one or more existing columns.
    • This is especially useful when:
      • a column uniquely identifies each row (for example, an ID);
      • we want to use that identifier as the index for easier data manipulation.

🏷️ Setting a New Index with set_index()

# The two lines below are equivalent
nba.set_index(keys = "Name")
nba.set_index("Name")
  • set_index() returns a new DataFrame in which the chosen column becomes the index.
    • Its first parameter, keys, takes the column name.
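Because set_index() returns a new DataFrame, the original is left unchanged unless we assign the result. A small sketch with a made-up `toy` table (names and teams are hypothetical):

```python
import pandas as pd

# Toy data (hypothetical values, not from nba.csv)
toy = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Team": ["Reds", "Blues"],
})

# The result is a new DataFrame whose index is the Name column
indexed = toy.set_index("Name")

print(indexed.index.name)     # Name
print(indexed.columns.tolist())  # Name is no longer a regular column
print(toy.columns.tolist())   # the original still has Name as a column
```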

♻️ Resetting an Index with reset_index()

nba2 = nba.set_index("Name")
nba2.reset_index(inplace=True) 
  • We use reset_index():
    • when we want to turn the index back into a regular DataFrame column;
    • when we want to restore the default integer index.
  • Note: With inplace=True, reset_index() modifies nba2 directly and returns None, so we do not assign the result back.
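The two uses of reset_index() and the inplace behavior can be sketched as follows. The `toy` table is made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical values), indexed by Name
toy = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Team": ["Reds", "Blues"],
})
indexed = toy.set_index("Name")

# With inplace=True the method changes `indexed` itself and returns None
result = indexed.reset_index(inplace=True)

print(result)                    # None
print(indexed.columns.tolist())  # Name is a regular column again
print(indexed.index.tolist())    # default integer index: [0, 1]
```

Writing nba2 = nba2.reset_index() without inplace has the same end result and is often easier to use inside a method chain.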

Locating Observations

πŸ”Ž Locating Observations or Values

  • We can extract rows, columns, and individual values from a DataFrame by using the loc[] and iloc[] accessors.

    • These accessors work especially well when we know the index labels or positions we want to target.

🏷️ Locating Rows with .loc[Index Labels]

  • Let's use nba with Name as the index.
# The two lines below are equivalent; run only one of them,
# because "Name" is no longer a column after the first call
nba = nba.set_index("Name")
# nba.set_index("Name", inplace = True)
  • Below extracts observations:
nba.loc[ "LeBron James" ]
nba.loc[ ["Kawhi Leonard", "Paul George"] ]
  • .loc extracts rows by index label.

✂️ Slicing Rows with .loc[:] 1

(
    nba
    .sort_index()
    .loc["Otto Porter":"Patrick Beverley"]
)
  • What is the code above doing?
    • Note: With .loc, both the starting label and ending label are inclusive.
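A small sketch of the same pattern, using a made-up `toy` table with short hypothetical names, shows that a label slice includes both endpoints:

```python
import pandas as pd

# Toy data (hypothetical values), indexed by Name
toy = pd.DataFrame({
    "Name": ["Dee", "Ann", "Cy", "Bob"],
    "Salary": [400, 100, 300, 200],
}).set_index("Name")

# Sort the index alphabetically, then slice from "Bob" through "Cy"
middle = (
    toy
    .sort_index()
    .loc["Bob":"Cy"]
)

print(middle)  # both the "Bob" row and the "Cy" row are included
```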

✂️ Slicing Rows with .loc[:] 2

(
    nba
    .sort_index()
    .loc["Zach Collins":]
)
(
    nba
    .sort_index()
    .loc[:"Al Horford"]
)
  • We can use loc[:] to pull rows:
    • from a specific index label to the end of the DataFrame;
    • from the beginning of the DataFrame to a specific index label.

Locating Rows with .iloc[Index Positions]

nba.iloc[ 300 ]
nba.iloc[ [100, 200, 300, 400] ]
nba.iloc[400:404]
nba.iloc[:2]
nba.iloc[447:]
nba.iloc[-10:-6]
nba.iloc[0:10:2] # every other row among the first 10
  • .iloc (integer location) locates rows by their integer position.
    • This is useful when row positions matter.
    • We pass integers.
  • .iloc[:] follows regular Python slicing rules.
    • The ending position is NOT inclusive.
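The slicing rules above can be checked on a six-row table. This is a minimal sketch; the `toy` DataFrame is hypothetical.

```python
import pandas as pd

# Toy data (hypothetical values): rows sit at positions 0 through 5
toy = pd.DataFrame({"Name": ["A", "B", "C", "D", "E", "F"]})

# Positions 1 and 2; the ending position 3 is excluded
first_slice = toy.iloc[1:3]

# Negative positions count from the end: the last two rows
last_two = toy.iloc[-2:]

# A step of 2 takes every other row: positions 0, 2, 4
every_other = toy.iloc[0:6:2]

print(first_slice["Name"].tolist())
print(last_two["Name"].tolist())
print(every_other["Name"].tolist())
```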

🎯 Locating Values

🧭 Locating Values with loc[Rows, Columns]

nba.loc[
    "LeBron James",
    "Team"
]

nba.loc[
     "James Harden", 
      ["Position", "Birthday"] 
]
nba.loc[
    ["Russell Westbrook", "Anthony Davis"],
     ["Team", "Salary"]
]

nba.loc[
    "Joel Embiid", 
    "Position":"Salary"
]
  • Both .loc and .iloc accept a second argument for the column(s) to extract.
    • With .loc, we provide column names.
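What .loc returns depends on what we pass for rows and columns. The sketch below uses a made-up `toy` table (all names and values are hypothetical) to show a single value, a smaller DataFrame, and a column slice, which includes both endpoints just like a row slice.

```python
import pandas as pd

# Toy data (hypothetical values), indexed by Name
toy = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Team": ["Reds", "Blues"],
    "Position": ["PG", "C"],
    "Salary": [100, 200],
}).set_index("Name")

# One row label and one column name: a single value
team = toy.loc["Ann", "Team"]

# Lists of labels: a smaller DataFrame
subset = toy.loc[["Ann", "Bob"], ["Team", "Salary"]]

# A column slice by name: "Team" through "Salary", both inclusive
span = toy.loc["Bob", "Team":"Salary"]

print(team)
print(subset)
print(span)
```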

🧭 Locating Values with iloc[Rows, Columns] Using Integer Positions

nba.iloc[
    57, 
    3
]

nba.iloc[
    100:104, 
    :3
]
  • Both .loc and .iloc accept a second argument for the column(s) to extract.
    • With .iloc, we provide column positions.
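A short sketch of position-based lookup on a made-up `toy` table (hypothetical values). Note that the index is not counted as a column, so column position 0 is the first regular column.

```python
import pandas as pd

# Toy data (hypothetical values), indexed by Name
toy = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Team": ["Reds", "Blues"],
    "Position": ["PG", "C"],
    "Salary": [100, 200],
}).set_index("Name")

# Row position 1 and column position 2: Bob's Salary
value = toy.iloc[1, 2]

# Row positions 0 and 1, and the first two columns (Team, Position)
block = toy.iloc[0:2, :2]

print(value)
print(block)
```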

🚀 Classwork 11: Pandas Fundamentals

Let's work on Classwork 11!