Lecture 7

Pandas Fundamentals II: Sorting, Indexing, and Locating Data

Byeong-Hak Choe

SUNY Geneseo

March 30, 2026

🏀 `nba` DataFrame

Let’s read the nba.csv file as nba:

import pandas as pd
import numpy as np

# Enable interactive display of DataFrames (works only in Google Colab)
from google.colab import data_table
data_table.enable_dataframe_formatter()

# Read nba.csv into the nba DataFrame, parsing Birthday as dates
nba = pd.read_csv("https://bcdanl.github.io/data/nba.csv",
                  parse_dates = ["Birthday"])

🔢 Sorting Methods

🔤 Sorting by One Column with `sort_values()`

# The two lines below are equivalent
nba.sort_values(["Name"])
nba.sort_values(by = ["Name"])

The sort_values() method’s first parameter, by, specifies the column(s) pandas should use to sort the rows.

🔽 Sorting in Descending Order with `sort_values(ascending = False)`

nba.sort_values(["Name"], ascending = False)

The ascending argument controls the sort direction.
- Its default value is True.
- By default, pandas sorts:
  - numeric columns in increasing order;
  - string columns in alphabetical order;
  - datetime columns in chronological order.
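To make the default ordering concrete, here is a minimal sketch on a small made-up DataFrame. The names and values in toy are hypothetical and are not rows from nba.csv.

```python
import pandas as pd

# Hypothetical toy data, not taken from nba.csv
toy = pd.DataFrame({
    "Name": ["Cara", "Abe", "Bo"],
    "Salary": [300, 100, 200],
    "Birthday": pd.to_datetime(["1995-05-01", "1990-01-15", "1992-09-30"]),
})

# Default (ascending = True): alphabetical order for strings
by_name = toy.sort_values("Name")

# ascending = False: largest number first
by_salary_desc = toy.sort_values("Salary", ascending = False)

# Default (ascending = True): earliest date first
by_birthday = toy.sort_values("Birthday")
```

Note that sort_values() returns a new DataFrame, so toy itself keeps its original row order.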

🔗 Pandas Fundamentals: Method Chaining

(
    nba
    .sort_values(['Salary'])
    .head(5)
)

A DataFrame has many useful methods that we can combine in sequence.
- Method chaining lets us apply multiple steps without saving intermediate objects.
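As a rough comparison, the sketch below does the same two steps with and without chaining. The toy DataFrame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not taken from nba.csv
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Salary": [400, 100, 300, 200],
})

# Step by step: save the intermediate sorted DataFrame first
sorted_toy = toy.sort_values(["Salary"])
lowest_two_steps = sorted_toy.head(2)

# Method chaining: same result, no intermediate object
lowest_two_chain = (
    toy
    .sort_values(["Salary"])
    .head(2)
)
```

Wrapping the chain in parentheses lets us put each method on its own line.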

🔼🔽🔍 Finding Extremes with `nsmallest()` and `nlargest()`

nba.nsmallest(5, 'Salary')
nba.nlargest(5, 'Salary')

nsmallest() is useful for getting the first n rows ordered by a column in ascending order.
nlargest() is useful for getting the first n rows ordered by a column in descending order.
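The sketch below, on made-up data with no tied salaries, suggests how nsmallest() relates to sorting and then taking head(). The toy DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with no tied salaries
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Salary": [400, 100, 300, 200],
})

# The two lowest salaries, found in two ways
two_lowest = toy.nsmallest(2, "Salary")
two_lowest_sorted = toy.sort_values("Salary").head(2)

# The two highest salaries
two_highest = toy.nlargest(2, "Salary")
```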

🧺 `nsmallest()` and `nlargest()` with `keep = "all"`

nba.nsmallest(4, 'Salary', keep = "all")
nba.nlargest(4, 'Salary', keep = "all")

keep = "all" keeps all tied values, even if that means returning more than n rows.
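Here is a small sketch of how ties change the number of rows returned. The toy data is made up so that three players share the lowest salary and two share the highest.

```python
import pandas as pd

# Hypothetical toy data with ties at both ends
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Salary": [100, 100, 100, 300, 300],
})

# Default keep = "first": exactly n rows, even though another row is tied
lowest_default = toy.nsmallest(2, "Salary")

# keep = "all": every row tied with the n-th value is kept
lowest_with_ties = toy.nsmallest(2, "Salary", keep = "all")
highest_with_ties = toy.nlargest(1, "Salary", keep = "all")
```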

🔤🔢 Sorting by Multiple Columns with `sort_values()`

nba.sort_values(["Team", "Name"])
nba.sort_values(by = ["Team", "Name"])

We can sort a DataFrame by multiple columns by passing a list to the by argument.

🔽 Sorting Multiple Columns with One `ascending` Value

nba.sort_values(by = ["Team", "Name"], 
                ascending = False)

We can pass a single Boolean to ascending to apply the same sort direction to every column in by.

🔽🔼️ Sorting Multiple Columns with Different Orders

nba.sort_values(by = ["Team", "Name"], 
                ascending = [False, True])

If we want different sort directions for different columns, we can pass a Boolean list to ascending.

Q. Which players on each team are paid the most?
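One possible way to approach the question is to sort teams alphabetically and salaries from highest to lowest, so the best-paid player appears first within each team. The sketch below uses made-up teams, players, and salaries.

```python
import pandas as pd

# Hypothetical toy data, not taken from nba.csv
toy = pd.DataFrame({
    "Team": ["B", "A", "B", "A"],
    "Name": ["p1", "p2", "p3", "p4"],
    "Salary": [10, 30, 20, 40],
})

# Team in ascending order, Salary in descending order within each team
by_team_then_pay = toy.sort_values(
    by = ["Team", "Salary"],
    ascending = [True, False]
)
```

The Boolean list must have one entry per column in by, in the same order.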

🧾 Sorting by Row Index with `sort_index()`

# The two lines below are equivalent
nba.sort_index()
nba.sort_index(ascending = True)

nba.sort_index(ascending = False)

Suppose we sorted nba by "Name" and assigned the result back to nba. How could we restore the original row order?
- The nba DataFrame still keeps its numeric index labels.
- sort_index() sorts rows by their index labels.
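The scenario above can be sketched on a small made-up DataFrame: after sorting by "Name", the index labels travel with their rows, so sort_index() can put the rows back in their original order.

```python
import pandas as pd

# Hypothetical toy data with the default index 0, 1, 2
toy = pd.DataFrame({"Name": ["C", "A", "B"]})

# Sort by Name and overwrite toy; the index labels move with their rows
toy = toy.sort_values("Name")
labels_after_sort = toy.index.tolist()

# Sorting by index label restores the original row order
restored = toy.sort_index()
```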

🗂️ Indexing Methods

🏷️ Setting a New Index

We can use set_index() when we want to replace the current index of a DataFrame with one or more existing columns.
- This is especially useful when:
  - a column uniquely identifies each row (for example, an ID);
  - we want to use that identifier as the index for easier data manipulation.

🏷️ Setting a New Index with `set_index()`

# The two lines below are equivalent
nba.set_index(keys = "Name")
nba.set_index("Name")

set_index() returns a new DataFrame in which the chosen column becomes the index.
- Its first parameter, keys, takes the column name.
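A minimal sketch on made-up data shows that set_index() moves the column into the index of a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical toy data, not taken from nba.csv
toy = pd.DataFrame({
    "Name": ["Al", "Bo", "Cy"],
    "Team": ["X", "Y", "X"],
})

# Name becomes the index of the returned DataFrame
indexed = toy.set_index("Name")

# In indexed, Name is no longer a regular column; toy still has it
columns_indexed = indexed.columns.tolist()
columns_original = toy.columns.tolist()
```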

♻️ Resetting an Index with `reset_index()`

nba2 = nba.set_index("Name")
nba2.reset_index(inplace=True)

We use reset_index():
- when we want to turn the index back into a regular DataFrame column;
- when we want to restore the default integer index.
Note: With inplace=True, the operation modifies the DataFrame it is called on (here, nba2) directly and returns None.
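The sketch below, on made-up data, contrasts the two ways of calling reset_index(). The variable names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data indexed by Name
indexed = pd.DataFrame({
    "Name": ["Al", "Bo", "Cy"],
    "Team": ["X", "Y", "X"],
}).set_index("Name")

# Without inplace: a new DataFrame is returned, indexed is unchanged
as_new_object = indexed.reset_index()
index_still_name = indexed.index.name

# With inplace = True: indexed itself changes and the call returns None
returned = indexed.reset_index(inplace = True)
```

Because the in-place call returns None, we should not assign its result back to the DataFrame's name.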

📍 Locating Observations

🔎 Locating Observations or Values

We can extract rows, columns, and individual values from a DataFrame by using the loc[] and iloc[] accessors.
- These accessors work especially well when we know the index labels or positions we want to target.

🏷️ Locating Rows with `.loc[Index Labels]`

Let’s use nba with Name as the index.

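# The two lines below do the same thing; run only ONE of them.
# Running both raises a KeyError, because "Name" is no longer a column.
nba = nba.set_index("Name")
# nba.set_index("Name", inplace = True)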

The code below extracts observations (rows):

nba.loc[ "LeBron James" ]
nba.loc[ ["Kawhi Leonard", "Paul George"] ]

.loc extracts rows by index label.

✂️ Slicing Rows with `.loc[:]` 1

(
    nba
    .sort_index()
    .loc["Otto Porter":"Patrick Beverley"]
)

What is the code above doing?
- Note: With .loc, both the starting label and ending label are inclusive.
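To see the inclusive endpoints, here is a minimal sketch with made-up names as index labels. Sorting the index first means the slice covers an alphabetical range.

```python
import pandas as pd

# Hypothetical toy data indexed by made-up names
toy = pd.DataFrame(
    {"Salary": [40, 10, 30, 20, 50]},
    index = ["Dan", "Al", "Cy", "Bo", "Ed"],
)

# Sort labels alphabetically, then slice from "Bo" through "Dan"
bo_to_dan = (
    toy
    .sort_index()
    .loc["Bo":"Dan"]
)
```

Both "Bo" and "Dan" appear in the result, unlike regular Python slicing.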

✂️ Slicing Rows with `.loc[:]` 2

(
    nba
    .sort_index()
    .loc["Zach Collins":]
)

(
    nba
    .sort_index()
    .loc[:"Al Horford"]
)

We can use loc[:] to pull rows:
- from the middle of the DataFrame to the end;
- from the beginning of the DataFrame to a specific index label.

🔢 Locating Rows with `.iloc[Index Positions]`

nba.iloc[ 300 ]
nba.iloc[ [100, 200, 300, 400] ]

nba.iloc[400:404]
nba.iloc[:2]
nba.iloc[447:]
nba.iloc[-10:-6]
nba.iloc[0:10:2] # every other row among the first 10

.iloc (integer location) locates rows by their integer position, counting from 0.
- This is useful when row positions matter.
- We pass integers.
.iloc[:] follows regular Python slicing rules.
- The ending position is NOT inclusive.
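The slicing rules above can be checked on a small made-up DataFrame with ten rows, where each value equals its row position.

```python
import pandas as pd

# Hypothetical toy data: Value equals the row position (0 to 9)
toy = pd.DataFrame({"Value": list(range(10))})

# Positions 2, 3, 4 (position 5 is excluded)
middle = toy.iloc[2:5]

# Negative positions count from the end: third-to-last and second-to-last
near_end = toy.iloc[-3:-1]

# A step of 2 picks every other row
every_other = toy.iloc[0:10:2]
```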

🎯 Locating Values

🧭 Locating Values with `loc[Rows, Columns]`

nba.loc[
    "LeBron James",
    "Team"
]

nba.loc[
     "James Harden", 
      ["Position", "Birthday"] 
]

nba.loc[
    ["Russell Westbrook", "Anthony Davis"],
     ["Team", "Salary"]
]

nba.loc[
    "Joel Embiid", 
    "Position":"Salary"
]

Both .loc and .iloc accept a second argument for the column(s) to extract.
- With .loc, we provide column names.

🧭 Locating Values with `loc[Rows, Columns]` or `iloc[Rows, Columns]` with Integers

nba.iloc[
    57, 
    3
]

nba.iloc[
    100:104, 
    :3
]

Both .loc and .iloc accept a second argument for the column(s) to extract.
- With .iloc, we provide column positions.
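As a side-by-side sketch, the code below pulls the same values once by label with .loc and once by position with .iloc. The toy DataFrame and its values are made up.

```python
import pandas as pd

# Hypothetical toy data indexed by made-up names
toy = pd.DataFrame(
    {
        "Team": ["X", "Y", "X"],
        "Position": ["PG", "C", "SF"],
        "Salary": [10, 20, 30],
    },
    index = ["Al", "Bo", "Cy"],
)

# A single value: row "Bo" (position 1), column "Team" (position 0)
team_by_label = toy.loc["Bo", "Team"]
team_by_position = toy.iloc[1, 0]

# A block of values: label slices include the end, position slices do not
block_by_label = toy.loc["Al":"Bo", "Team":"Position"]
block_by_position = toy.iloc[0:2, 0:2]
```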

🚀 Classwork 11: Pandas Fundamentals

Let’s work on Classwork 11!
