pandas
Basics - Sorting Methods; Setting a New Index; Locating Observations/Values
February 14, 2025
nba DataFrame
# Below is to import the pandas library as pd
import pandas as pd
# Below is for an interactive display of DataFrame in Colab
from google.colab import data_table
data_table.enable_dataframe_formatter()
# Below is to read nba.csv as nba DataFrame
nba = pd.read_csv("https://bcdanl.github.io/data/nba.csv",
                  parse_dates = ["Birthday"])

sort_values()
The sort_values() method’s first parameter, by, accepts the variables that pandas should use to sort observations.
The sort_values() method’s ascending parameter determines the sort order.
ascending has a default argument of True.
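A minimal sketch of by and ascending. The toy DataFrame and its values are made up for illustration and only stand in for nba.

```python
import pandas as pd

# Made-up stand-in for the nba DataFrame (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Cole", "Avery", "Blake", "Drew"],
    "Team": ["Hawks", "Bulls", "Hawks", "Bulls"],
    "Salary": [300, 100, 400, 200],
})

# by: the variable used to sort observations (ascending=True by default)
by_name = toy.sort_values(by="Name")

# ascending=False sorts from the largest value to the smallest
by_salary_desc = toy.sort_values(by="Salary", ascending=False)

print(by_name)
print(by_salary_desc)
```

Both calls return a new DataFrame here; toy itself keeps its original row order.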
DataFrame has various methods that, by default, return a modified copy rather than altering the existing DataFrame.

nsmallest() and nlargest()
nsmallest() is useful to get the first n observations ordered by a variable in ascending order.
nlargest() is useful to get the first n observations ordered by a variable in descending order.
With nsmallest() and nlargest(), keep = "all" keeps all duplicates, even if it means selecting more than n observations.
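A small sketch of nlargest() and nsmallest(), with and without keep = "all". The toy data are made up so that some salaries tie.

```python
import pandas as pd

# Made-up data with tied salaries (illustrative only)
toy = pd.DataFrame({
    "Name": ["Avery", "Blake", "Cole", "Drew", "Emery"],
    "Salary": [300, 100, 400, 300, 100],
})

# Default keep="first": exactly n observations, ties broken by order of appearance
top2 = toy.nlargest(2, "Salary")

# keep="all": every observation tied with the n-th value is kept
top2_all = toy.nlargest(2, "Salary", keep="all")
bottom1_all = toy.nsmallest(1, "Salary", keep="all")

print(top2)
print(top2_all)
print(bottom1_all)
```

Here top2 has two rows, while top2_all has three because two players share the second-largest salary.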

sort_values()
We can sort a DataFrame by multiple columns by passing a list to the by parameter.
We can pass a single Boolean to the ascending parameter to apply the same sort order to each variable.
To sort each variable in a different order, we can pass a list of Booleans to the ascending parameter.
Q. Which players on each team are paid the most?
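One possible way to approach this question is to sort by team in ascending order and by salary in descending order, so that the first row within each team is its highest-paid player. The sketch below uses made-up data and assumes columns named Team and Salary.

```python
import pandas as pd

# Made-up stand-in for the nba DataFrame (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Avery", "Blake", "Cole", "Drew"],
    "Team": ["Hawks", "Bulls", "Hawks", "Bulls"],
    "Salary": [300, 100, 400, 200],
})

# A single Boolean applies the same order to every variable in by
both_desc = toy.sort_values(by=["Team", "Salary"], ascending=False)

# A list of Booleans sets the order variable by variable:
# Team from A to Z, then Salary from largest to smallest within each team
team_then_pay = toy.sort_values(by=["Team", "Salary"],
                                ascending=[True, False])

print(both_desc)
print(team_then_pay)
```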

sort_index()
If we reassign nba to the nba DataFrame sorted by “Name”, how can we return it to its original form?
The nba DataFrame still has its numeric index labels.
sort_index() sorts observations by their index labels (row names).
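A minimal sketch of restoring the original row order with sort_index(), using a made-up DataFrame in place of nba.

```python
import pandas as pd

# Made-up stand-in for the nba DataFrame (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Cole", "Avery", "Blake"],
    "Salary": [300, 100, 200],
})

# Reassign toy to its sorted version; the numeric index labels travel with the rows
toy = toy.sort_values("Name")
shuffled_labels = toy.index.tolist()

# Sorting by index label puts the rows back in their original order
toy = toy.sort_index()

print(shuffled_labels)
print(toy)
```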
The sort_index() method can also be used to arrange the variables in alphabetical order.
To do so, we add the axis parameter and pass it an argument of "columns" or 1.
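A short sketch of sorting variables alphabetically with the axis parameter. The column names and values are made up for illustration.

```python
import pandas as pd

# Made-up data whose columns are not in alphabetical order
toy = pd.DataFrame({
    "Team": ["Hawks", "Bulls"],
    "Name": ["Cole", "Avery"],
    "Salary": [300, 100],
})

# axis="columns" and axis=1 are two ways of writing the same thing
by_label = toy.sort_index(axis="columns")
by_number = toy.sort_index(axis=1)

print(by_label.columns.tolist())
```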

set_index()
We use the set_index() method when we want to change the current index of a DataFrame to one or more existing columns.
The set_index() method returns a new DataFrame with a given column set as the index.
Its first parameter, keys, accepts the column name.
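A minimal sketch of set_index(), together with reset_index(), which undoes it. The toy DataFrame is made up and only stands in for nba.

```python
import pandas as pd

# Made-up stand-in for the nba DataFrame (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Avery", "Blake", "Cole"],
    "Team": ["Bulls", "Bulls", "Hawks"],
    "Salary": [100, 200, 300],
})

# set_index() returns a new DataFrame; toy itself is unchanged
indexed = toy.set_index(keys="Name")

# reset_index() moves Name back into a column and restores the numeric index
restored = indexed.reset_index()

# With inplace=True the method returns None and alters the object it is called on
toy_copy = toy.copy()
toy_copy.set_index("Name", inplace=True)

print(indexed)
print(restored)
```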
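
reset_index()
The reset_index() method:
moves the current index into a DataFrame column;
replaces the former index with pandas’ default numeric index.
When we use inplace=True, the operation alters the original DataFrame directly.

We can extract observations, variables, and values from a DataFrame by using the loc[] and iloc[] accessors.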

.loc[Index Labels]
Let’s consider the nba DataFrame with the Name index.
# The two lines below are equivalent; run only one of them
nba = nba.set_index("Name")
# nba.set_index("Name", inplace = True)
The .loc attribute extracts an observation by index label (row name).
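A small sketch of .loc with one label and with a list of labels. The names and values are made up, and the DataFrame is indexed by Name as above.

```python
import pandas as pd

# Made-up stand-in for nba, indexed by Name (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Avery", "Blake", "Cole", "Drew"],
    "Team": ["Bulls", "Bulls", "Hawks", "Hawks"],
    "Salary": [100, 200, 300, 400],
}).set_index("Name")

# A single label returns one observation as a Series
one_row = toy.loc["Blake"]

# A list of labels returns a DataFrame, in the order requested
two_rows = toy.loc[["Cole", "Avery"]]

print(one_row)
print(two_rows)
```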
We can use slicing syntax with loc[:] to pull rows:
from a specific index label of the DataFrame to its end;
from the beginning of the DataFrame to a specific index label.
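A minimal sketch of label slicing with loc[:], on made-up data indexed by Name. Note that, unlike list slicing, a label slice includes its end label.

```python
import pandas as pd

# Made-up stand-in for nba, indexed by Name (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Avery", "Blake", "Cole", "Drew"],
    "Salary": [100, 200, 300, 400],
}).set_index("Name")

# From a specific index label to the end of the DataFrame
from_cole = toy.loc["Cole":]

# From the beginning of the DataFrame to a specific index label (inclusive)
up_to_blake = toy.loc[:"Blake"]

# Between two labels, both ends included
blake_to_cole = toy.loc["Blake":"Cole"]

print(from_cole)
print(up_to_blake)
print(blake_to_cole)
```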

.iloc[Index Positions]
The .iloc (index location) attribute locates rows by index position.
Slicing with .iloc[:] is similar to the slicing syntax with strings/lists.
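A short sketch of .iloc with a single position, a list of positions, and a slice. The data are made up. As with lists, positions start at 0 and the end of a slice is excluded.

```python
import pandas as pd

# Made-up stand-in for nba, indexed by Name (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Avery", "Blake", "Cole", "Drew"],
    "Salary": [100, 200, 300, 400],
}).set_index("Name")

first_row = toy.iloc[0]          # one observation, returned as a Series
picked = toy.iloc[[0, 2]]        # observations at positions 0 and 2
middle = toy.iloc[1:3]           # positions 1 and 2; position 3 is excluded
last_row = toy.iloc[-1]          # negative positions count from the end

print(picked)
print(middle)
```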
Let’s do Questions 4-7 in Classwork 5!

loc[Rows, Columns] or iloc[Rows, Columns]
Both the .loc and .iloc attributes accept a second argument representing the column(s) to extract.
If we are using .loc, we have to provide the column names.
If we are using .iloc, we have to provide the column positions.
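A minimal sketch of the two-argument form, using made-up data indexed by Name. The column names Team, Position, and Salary are assumptions for this illustration.

```python
import pandas as pd

# Made-up stand-in for nba, indexed by Name (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["Avery", "Blake", "Cole"],
    "Team": ["Bulls", "Bulls", "Hawks"],
    "Position": ["PG", "SG", "C"],
    "Salary": [100, 200, 300],
}).set_index("Name")

# .loc: row labels first, then column names
blake_salary = toy.loc["Blake", "Salary"]
by_label = toy.loc[["Avery", "Cole"], ["Team", "Salary"]]

# .iloc: row positions first, then column positions
# (Name is the index, so Team is column 0 and Salary is column 2)
same_salary = toy.iloc[1, 2]
by_position = toy.iloc[0:2, [0, 2]]

print(by_label)
print(by_position)
```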