Pandas Fundamentals I: Loading, Summarizing, and Counting Data
March 23, 2026
Series and DataFrame
Series: A one-dimensional object containing a sequence of values (like a list).
DataFrame: A two-dimensional table made of multiple Series columns sharing a common index.
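Here is a minimal sketch of the two objects using made-up values. The names `ages` and `people` are hypothetical. A Series is built from a list, and a DataFrame is built from a dictionary of equal-length columns. Selecting one column of the DataFrame gives back a Series that shares the DataFrame's index.

```python
import pandas as pd

# A Series: a one-dimensional sequence of values with a default index 0, 1, 2
ages = pd.Series([25, 31, 28])

# A DataFrame: a two-dimensional table whose columns are Series sharing one index
# (hypothetical example values)
people = pd.DataFrame({
    "Name": ["Ann", "Bo", "Cy"],
    "Age": [25, 31, 28],
})

# Selecting a single column from a DataFrame returns a Series
age_column = people["Age"]

print(type(ages).__name__)                    # Series
print(type(people).__name__)                  # DataFrame
print(age_column.index.equals(people.index))  # True: the column shares the DataFrame's index
```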
Rows in a DataFrame are observations: they represent individual units or entities for which data is collected.
Columns in a DataFrame are variables: they represent attributes or characteristics measured across multiple observations.
Examples of variables: Name, Age, Grade, Major; EmployeeID, Name, Age, Department; CustomerID, Name, Age, Income, HousingType.
Note
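In a DataFrame, a variable is a column of data.
Tidy DataFrame
A DataFrame is tidy if it follows three rules:
- Each variable has its own column.
- Each observation has its own row.
- Each value has its own cell.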
A tidy DataFrame keeps your data organized, making it easier to understand, analyze, and share in any data analysis.
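As an illustration of tidy data, the sketch below starts from a hypothetical table of team wins in which the years are stored as column names. That spreads a single variable (Year) across two columns. The sketch uses `.melt()`, which reshapes wide data into long form, to produce a tidy DataFrame with one row per team-year observation and one column per variable.

```python
import pandas as pd

# Hypothetical "wide" table: 2024 and 2025 are values of a Year variable,
# but they are stored as column names, so this DataFrame is not tidy
wide = pd.DataFrame({
    "Team": ["A", "B"],
    "2024": [50, 42],
    "2025": [47, 55],
})

# melt() gathers the year columns into two variables: Year and Wins
tidy = wide.melt(id_vars="Team", var_name="Year", value_name="Wins")

print(tidy)  # 4 rows (team-year observations), 3 columns (variables)
```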
Loading a DataFrame with read_csv()
A CSV (comma-separated values) file is a plain-text file that uses commas to separate values (e.g., nba.csv).
The CSV format is widely used for storing data, and we will use it throughout the module.
We use the read_csv() function to load a CSV data file.
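A minimal sketch of `read_csv()` follows. It assumes a tiny hypothetical CSV held in a string instead of `nba.csv`. `io.StringIO` lets `read_csv()` read the string as if it were a file. Without any extra parameters, the Birthday values are loaded as text rather than as dates.

```python
import io

import pandas as pd

# A tiny hypothetical CSV (made-up rows); io.StringIO makes it file-like
csv_text = """Name,Position,Birthday,Salary
Player A,PG,1995-01-01,1000000
Player B,C,1998-06-15,2500000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (2, 4): 2 rows (observations), 4 columns (variables)
print(df.dtypes)  # Birthday is read as text, not as dates
```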
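read_csv() with parse_dates
If a variable contains dates, we can pass its name to the parse_dates parameter to coerce the values into datetimes.
In Google Colab, we can make a data file available either by mounting Google Drive or by uploading the file:
from google.colab import drive
drive.mount('/content/drive')
from google.colab import files
files.upload()
In the Colab file panel, the mounted Drive appears as drive ➡️ MyDrive …
The following code turns DataFrames into interactive displays in Colab:
from google.colab import data_table
data_table.enable_dataframe_formatter() # Enabling an interactive DataFrame display
The code below reads nba.csv as the nba DataFrame:
# Below is to import the pandas library as pd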
import pandas as pd
# Below is to import the numpy library as np
import numpy as np
# Below is for an interactive display of DataFrame in Colab
from google.colab import data_table
data_table.enable_dataframe_formatter()
# Below is to read nba.csv as nba DataFrame
nba = pd.read_csv("https://bcdanl.github.io/data/nba.csv",
                  parse_dates = ["Birthday"])
The dot operator (DataFrame.) is used to access an attribute or call a method on an object.
A method (DataFrame.METHOD()) is a function that we can call on a DataFrame to perform operations, modify data, or derive insights.
Example: df.info()
An attribute (DataFrame.ATTRIBUTE) is a property that provides information about the DataFrame’s structure or content without modifying it.
Example: df.columns
Summarizing a DataFrame with .info()
Every DataFrame object has an .info() method that provides a summary of the DataFrame, including:
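- the variable names (.columns)
- the number of observations and variables (.shape)
- the number of non-missing values in each variable (.count())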
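The sketch below uses another small hypothetical CSV to contrast attributes, which are accessed without parentheses (`.columns`, `.shape`), with methods, which are called with parentheses (`.count()`, `.info()`). It passes `parse_dates` so that Birthday becomes a datetime variable. It also leaves one Salary value empty, so the non-missing counts differ across variables.

```python
import io

import pandas as pd

# Hypothetical CSV; the second player's Salary field is empty (a missing value)
csv_text = """Name,Birthday,Salary
Player A,1995-01-01,1000000
Player B,1998-06-15,
Player C,1993-03-20,2000000
"""

# parse_dates converts the Birthday strings into datetime values
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Birthday"])

# Attributes (no parentheses): information about the DataFrame's structure
print(df.columns.tolist())  # ['Name', 'Birthday', 'Salary']
print(df.shape)             # (3, 3)

# Methods (with parentheses): functions called on the DataFrame
print(df.count())           # non-missing values per variable; Salary has only 2
df.info()                   # variable names, non-null counts, and data types in one summary
```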
The non-null counts exclude missing values, which pandas represents as NaN.
Summarizing a DataFrame with .describe()
The .describe() method generates descriptive statistics that summarize the central tendency, dispersion, and distribution of each variable.
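Here is a short sketch of `.describe()` on a hypothetical DataFrame with one string variable and one numeric variable. The default call summarizes only the numeric variable. With `include='all'`, the string variable also gets count, unique, top, and freq rows.

```python
import pandas as pd

# Hypothetical data: one string variable and one numeric variable
df = pd.DataFrame({
    "Position": ["PG", "C", "PG", "SF"],
    "Salary": [1_000_000, 2_500_000, 1_500_000, 3_000_000],
})

# Default: numeric variables only (count, mean, std, min, 25%, 50%, 75%, max)
summary = df.describe()
print(summary)

# include="all" also summarizes string variables (count, unique, top, freq)
summary_all = df.describe(include="all")
print(summary_all)
```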
By default, .describe() summarizes only numeric variables; it also summarizes string-type variables if specified explicitly (include='all').
The max() method returns a Series with the maximum value of each variable.
The min() method returns a Series with the minimum value of each variable.
The sum()/mean()/median() methods return a Series with the sum/mean/median of the values in each variable.
The quantile() method returns a Series with a given percentile of the values in each variable (e.g., quantile(0.25), quantile(0.75), or quantile(0.9) for the 25th, 75th, or 90th percentile).
The std() method returns a Series with the standard deviation of the values in each variable.
To compute these statistics only for numeric variables, pass True to the numeric_only parameter of sum()/mean()/median()/std().
Vectorized operations
nba["Salary"] + nba["Salary"]
nba["Name"] + " (" + nba["Position"] + ")"
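nba["Salary"] - nba["Salary"].mean()
In each of these expressions, pandas performs a vectorized operation. The operation is applied to every value of a Series (or a variable in a DataFrame) at once, without an explicit loop.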
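The sketch below applies the summary methods and the three vectorized expressions to a hypothetical three-row stand-in for `nba`. The player names, positions, and salaries are made up. Each vectorized expression returns a new Series with one value per row.

```python
import pandas as pd

# Hypothetical mini version of nba with the variables used in the text
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Position": ["PG", "C", "SF"],
    "Salary": [1_000_000, 2_000_000, 6_000_000],
})

# Summary methods return one value per variable
print(nba.max())                    # strings are compared alphabetically
print(nba.mean(numeric_only=True))  # skips Name and Position, which cannot be averaged
print(nba["Salary"].quantile(0.9))  # 90th percentile of Salary

# Vectorized operations: each expression is applied to every row at once
doubled = nba["Salary"] + nba["Salary"]
labels = nba["Name"] + " (" + nba["Position"] + ")"
deviation = nba["Salary"] - nba["Salary"].mean()

print(labels)
```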
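drop(columns = ...)
The drop() method with its columns parameter removes the listed variables; nba.columns lists the current variable names.
nba.columns
rename( columns = { "Existing One" : "New One" } )
The rename() method renames variables using a dictionary that maps each existing name to a new one; for example, it can rename the variable Date of Birth to Birthday.
In a DataFrame, we can access a variable by its name using square brackets, [ ]: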
DataFrame[ 'var_1' ] returns one variable as a Series.
DataFrame[ ['var_1'] ] returns one variable as a DataFrame.
DataFrame[ ['var_1', 'var_2', ... ] ] returns several variables as a DataFrame.
.count()
Series.count() counts the number of non-missing values in a Series, returning a single integer.
DataFrame.count() counts the number of non-missing values in each variable of a DataFrame, returning a Series.
.value_counts()
.value_counts() counts the number of occurrences of each unique value in a Series.
.nunique()
Series.nunique() counts the number of unique values in a Series, returning a single integer.
DataFrame.nunique() counts the number of unique values in each variable of a DataFrame, returning a Series.
Let’s do Classwork 10!
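Here is a combined sketch of dropping, renaming, selecting, and counting. It uses a hypothetical four-row stand-in for `nba` with one missing Salary (`np.nan`). The rename from Position to Pos is chosen only for illustration. `drop()` and `rename()` return new DataFrames and leave `nba` unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical mini version of nba; Player B's Salary is missing
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["X", "Y", "X", "X"],
    "Position": ["PG", "C", "PG", "SF"],
    "Salary": [1_000_000, np.nan, 1_500_000, 3_000_000],
})

# drop(columns=...) returns a new DataFrame without the listed variables
no_salary = nba.drop(columns=["Salary"])

# rename(columns={...}) maps existing names to new names
renamed = nba.rename(columns={"Position": "Pos"})

# [] with one name returns a Series; with a list it returns a DataFrame
team_series = nba["Team"]
team_frame = nba[["Team"]]
subset = nba[["Name", "Salary"]]

# Counting
print(nba["Salary"].count())       # non-missing values in one Series (NaN is skipped)
print(nba.count())                 # non-missing values in each variable
print(nba["Team"].value_counts())  # occurrences of each team
print(nba["Position"].nunique())   # number of unique positions
print(nba.nunique())               # unique values in each variable (NaN not counted)
```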