pandas Basics - Getting a Summary of Data; Selecting Variables; Counting Methods; Sorting Methods
February 12, 2025
Let's read nba.csv into the nba DataFrame:
# Import the pandas library as pd
import pandas as pd
# Enable an interactive display of DataFrames (this works only in Google Colab)
from google.colab import data_table
data_table.enable_dataframe_formatter()
# Read nba.csv as the nba DataFrame, parsing the Birthday variable as dates
nba = pd.read_csv("https://bcdanl.github.io/data/nba.csv",
                  parse_dates = ["Birthday"])
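To see what parse_dates changes, here is a minimal sketch that reads a tiny CSV from a string instead of the URL, so it runs without Colab or a network connection. The two rows are made up for illustration; the point is only that Birthday is read as text by default and as a datetime variable when it is listed in parse_dates.

```python
import io

import pandas as pd

# A tiny, made-up CSV standing in for nba.csv (hypothetical rows)
csv_text = """Name,Team,Birthday
Player A,Team X,1990-01-15
Player B,Team Y,1995-07-04
"""

# Without parse_dates, Birthday is read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# With parse_dates, Birthday becomes a datetime variable
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Birthday"])
print(parsed.dtypes)

# Datetime variables support the .dt accessor, e.g. extracting the year
print(parsed["Birthday"].dt.year)
```

Once Birthday is a datetime variable, date parts such as the year or month can be extracted directly instead of slicing strings.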
The dot operator (DataFrame.) is used to access an attribute or a method of an object.
A method (DataFrame.METHOD()) is a function that we can call on a DataFrame to perform operations, modify data, or derive insights.
nba.info()
An attribute (DataFrame.ATTRIBUTE) is a property that provides information about the DataFrame’s structure or content without modifying it.
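A minimal sketch of the distinction, using a small hypothetical DataFrame in place of nba: attributes such as .shape, .columns, and .dtypes are accessed without parentheses, while methods such as .info() and .count() are called with them.

```python
import pandas as pd

# A small hypothetical DataFrame standing in for nba
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Salary": [1_000_000, 2_500_000, None],  # None becomes a missing value (NaN)
})

# Attributes: no parentheses; they describe the DataFrame without changing it
print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # variable names
print(df.dtypes)   # data type of each variable

# Methods: called with parentheses; they perform an operation
df.info()          # prints a summary of the DataFrame
print(df.count())  # number of non-missing values in each variable
```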
nba.dtypes

Getting a Summary of a DataFrame with .info()
Every DataFrame object has a .info() method that provides a summary of a DataFrame:
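- Variable names (.columns)
- Number of rows and columns (.shape)
- Data type of each variable (.dtypes)
- Number of non-missing values in each variable (.count())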
pandas often displays missing values as NaN.

Getting a Summary of a DataFrame with .describe()
The .describe() method generates descriptive statistics that summarize the central tendency, dispersion, and distribution of each variable.
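A small sketch of .describe() on hypothetical data with one string variable (Team) and one numeric variable (Salary), comparing the default output with include='all'.

```python
import pandas as pd

# Hypothetical data: one string variable and one numeric variable
df = pd.DataFrame({
    "Team": ["X", "Y", "X", "Z"],
    "Salary": [100, 200, 300, 400],
})

# Default: only the numeric variable is summarized
# (count, mean, std, min, 25%, 50%, 75%, max)
print(df.describe())

# include="all": the string variable is added, summarized by count, unique, top, freq
print(df.describe(include="all"))
```

For string variables, top is the most frequent value and freq is how often it appears.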
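By default, .describe() summarizes numeric variables only; it can also summarize string-type variables if specified explicitly (include='all').

Selecting Variables by Name
nba_player_name_s = nba['Name'] # Series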
nba_player_name_s
nba_player_name_df = nba[ ['Name'] ] # DataFrame
nba_player_name_df
In a DataFrame, we can access a variable by its name using square brackets, [ ].
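The two selections above differ only in the brackets. This sketch, on a hypothetical three-variable DataFrame standing in for nba, checks the type of each result.

```python
import pandas as pd

# Hypothetical stand-in for nba
df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Team": ["X", "Y"],
    "Salary": [100, 200],
})

name_s = df["Name"]                # single brackets -> Series
name_df = df[["Name"]]             # a list with one name -> one-column DataFrame
two_vars = df[["Name", "Salary"]]  # a list of names -> DataFrame with those variables

print(type(name_s).__name__)   # Series
print(type(name_df).__name__)  # DataFrame
print(two_vars.shape)          # (2, 2)
```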
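- DataFrame[ 'var_1' ] returns the variable as a Series.
- DataFrame[ ['var_1'] ] returns a DataFrame with one variable.
- DataFrame[ ['var_1', 'var_2', ... ] ] returns a DataFrame with the listed variables.

Selecting Variables by Data Type with select_dtypes()
# To include only string variables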
nba.select_dtypes(include = "object")
# To exclude string and integer variables
nba.select_dtypes(exclude = ["object", "int"])
We can use the select_dtypes() method to select columns based on their data types.
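A minimal sketch of select_dtypes() on a hypothetical DataFrame with one variable of each kind. The Name variable is created with dtype object explicitly, since newer pandas versions may store strings with a dedicated string dtype instead of object.

```python
import pandas as pd

# Hypothetical stand-in for nba with one variable of each kind
df = pd.DataFrame({
    "Name": pd.Series(["Player A", "Player B"], dtype="object"),  # string
    "Number": [23, 30],                                           # integer
    "Salary": [1.5, 2.0],                                         # float
    "Birthday": pd.to_datetime(["1990-01-15", "1995-07-04"]),     # datetime
})

# include: keep only the listed data types
strings_only = df.select_dtypes(include="object")
numbers_only = df.select_dtypes(include="number")

# exclude: drop the listed data types (same call as in the nba example)
no_str_int = df.select_dtypes(exclude=["object", "int"])

print(list(strings_only.columns))  # ['Name']
print(list(numbers_only.columns))  # ['Number', 'Salary']
print(list(no_str_int.columns))
```

"number" is a convenient shorthand that matches both integer and float variables.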
The method takes two parameters, include and exclude, each accepting a single data type or a list of data types.

Counting Methods
- .count() counts the number of non-missing values in a Series, or in each variable of a DataFrame.
- .value_counts() counts the number of occurrences of each unique value in a Series (for a DataFrame, each unique row).
- .nunique() counts the number of unique values in each variable in a DataFrame.

Let’s do Questions 1-3 in Classwork 5!
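A short sketch of the three counting methods on a hypothetical DataFrame with one missing Team value. Note that .value_counts() drops missing values unless dropna=False is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Team value
df = pd.DataFrame({
    "Team": ["X", "Y", "X", np.nan, "X"],
    "Position": ["G", "F", "G", "C", "F"],
})

# .count(): non-missing values in each variable
print(df.count())

# .value_counts() on a Series: occurrences of each unique value (NaN dropped)
team_counts = df["Team"].value_counts()
print(team_counts)

# dropna=False keeps NaN as its own category
print(df["Team"].value_counts(dropna=False))

# .nunique(): number of unique values in each variable (NaN not counted)
print(df.nunique())

# .value_counts() on a DataFrame: occurrences of each unique row
row_counts = df.value_counts()
print(row_counts)
```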