Classwork 3

Scraping Web Tables with pd.read_html()

Author

Byeong-Hak Choe

Published

February 11, 2026

Modified

February 13, 2026

For Classwork 3, import the os and pandas libraries:

import os  
import pandas as pd

Question 1. Load a CSV file into pandas

In this question, you will download a CSV file, put it into a data folder, and then load it into Python using pandas.

Step 1) Download the file

  1. Go to Brightspace → Course Files.
  2. Download custdata_rev.csv to your computer.

Step 2) Create a data folder

In your course project folder (your working directory), create a folder named:

  • data

Example:

  • If your course folder is DANL-210 and this folder is in your Documents folder, then you should have:
    • Mac: '/Users/YOUR_USERNAME/Documents/DANL-210/data'
    • Windows: 'C:\\Users\\YOUR_USERNAME\\Documents\\DANL-210\\data'
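If you prefer, you can also create the data folder from Python. The sketch below assumes Python's current working directory is already your project folder (for example, DANL-210). Setting `exist_ok=True` makes the call safe to rerun when the folder already exists.

```python
import os

# Assumes the current working directory is your project folder (e.g., DANL-210).
# Create a "data" subfolder; exist_ok=True means no error if it already exists.
os.makedirs("data", exist_ok=True)

# Confirm the folder is there
print(os.path.isdir("data"))  # True
```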

Step 3) Move the CSV file into data

Move custdata_rev.csv into the data folder so it looks like:

  • Mac: DANL-210/data/custdata_rev.csv
  • Windows: DANL-210\data\custdata_rev.csv
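The Mac and Windows paths differ only in the separator: `/` on Mac and `\` on Windows. `os.path.join()` inserts whichever separator the running operating system uses, so a single line of code works on both.

```python
import os

# Build the relative path piece by piece; os.path.join uses "/" on macOS/Linux
# and "\" on Windows.
csv_path = os.path.join("data", "custdata_rev.csv")
print(csv_path)

# Check whether the file is where pandas will look for it
# (relative to the current working directory).
print(os.path.exists(csv_path))
```

Forward slashes also work in Python path strings on Windows. That is why "data/custdata_rev.csv" in Step 5 works on both systems.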

Step 4) Set your working directory in Python

Replace the path below with the folder that contains your project (the folder that contains data).

import os
import pandas as pd

# Replace this with YOUR project folder path (the folder that contains the "data" folder)
wd_path = "ABSOLUTE_PATH_TO_YOUR_WORKING_DIRECTORY"

os.chdir(wd_path)      # set working directory
os.getcwd()            # check current working directory
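After `os.chdir()`, it helps to confirm you are in the right folder before reading any files. In this minimal sketch, `looks_like_project_folder` is a hypothetical helper name. It is not part of `os` or pandas.

```python
import os

def looks_like_project_folder(path="."):
    """Hypothetical helper: True if `path` contains a "data" subfolder."""
    return os.path.isdir(os.path.join(path, "data"))

print(os.getcwd())                  # the folder Python is currently working in
print(looks_like_project_folder())  # should print True after Step 4
```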

Step 5) Read the CSV in two ways

(a) Using a relative path

If your working directory is set to the project folder, you can load the file like this:

path_relative = "data/custdata_rev.csv"

# Read the CSV file into a pandas DataFrame.
# pd.read_csv(...) loads the file and creates a table-like object (a DataFrame) in Python.
# After this line runs, df_rel will contain all rows and columns from the CSV.
df_rel = pd.read_csv(path_relative)
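Python interprets a relative path relative to the current working directory. The sketch below shows how Python expands "data/custdata_rev.csv" into a full path. If the printed path is not where the file actually lives, revisit Step 4.

```python
import os

path_relative = "data/custdata_rev.csv"

# os.path.abspath() joins the current working directory with the relative path
full_path = os.path.abspath(path_relative)
print(full_path)
```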

(b) Using an absolute path

You can also load the file using the full path to the file:

path_absolute = "ABSOLUTE_PATHNAME_OF_custdata_rev.csv"
df_abs = pd.read_csv(path_absolute)
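On Windows, a backslash inside an ordinary Python string starts an escape sequence. A plain "C:\Users..." is even a SyntaxError, because `\U` begins a Unicode escape. Absolute Windows paths are therefore usually written in one of three ways: as raw strings, with doubled backslashes, or with forward slashes. The sketch below uses a placeholder path; YOUR_USERNAME is hypothetical.

```python
# Three ways to write the same Windows path in Python
p_raw     = r"C:\Users\YOUR_USERNAME\Documents\DANL-210\data\custdata_rev.csv"
p_escaped = "C:\\Users\\YOUR_USERNAME\\Documents\\DANL-210\\data\\custdata_rev.csv"
p_forward = "C:/Users/YOUR_USERNAME/Documents/DANL-210/data/custdata_rev.csv"

print(p_raw == p_escaped)  # True: both contain single backslashes
```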
Putting Steps 4 and 5 together:

# Set the working directory path
wd_path = 'YOUR_ABSOLUTE_PATHNAME_FOR_DANL-210' # e.g., '/Users/bchoe/Documents/DANL-210'
os.chdir(wd_path)  # Change the current working directory to wd_path
os.getcwd()  # Retrieve and return the current working directory

path_relative = "data/custdata_rev.csv"
df_rel = pd.read_csv(path_relative)

path_absolute = "YOUR_ABSOLUTE_PATHNAME_FOR_DANL-210/data/custdata_rev.csv"
df_abs = pd.read_csv(path_absolute)
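Both reads point at the same file, so they should produce identical DataFrames. The self-contained sketch below writes a tiny CSV to a temporary folder, reads it both ways, and compares the results with `DataFrame.equals()`. The file name toy.csv and its columns are hypothetical and are not those of custdata_rev.csv.

```python
import os
import tempfile
import pandas as pd

# Hypothetical toy data standing in for custdata_rev.csv
toy = pd.DataFrame({"id": [1, 2, 3], "value": [10.5, 20.0, 7.25]})

original_wd = os.getcwd()
with tempfile.TemporaryDirectory() as project:
    os.makedirs(os.path.join(project, "data"))
    toy.to_csv(os.path.join(project, "data", "toy.csv"), index=False)

    os.chdir(project)  # Step 4: set the working directory
    try:
        df_rel = pd.read_csv("data/toy.csv")                             # relative path
        df_abs = pd.read_csv(os.path.join(project, "data", "toy.csv"))  # absolute path
        same = df_rel.equals(df_abs)
    finally:
        os.chdir(original_wd)  # restore so the temporary folder can be deleted

print(same)  # True
```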



Question 2. Scraping a Web Table with pd.read_html()

url_eia = "https://www.eia.gov/petroleum/gasdiesel/gaspump_hist.php"
  • Read the table at url_eia into a DataFrame using pd.read_html().
  • Export the DataFrame as a CSV file.

url_eia = "https://www.eia.gov/petroleum/gasdiesel/gaspump_hist.php"
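# pd.read_html() returns a list of DataFrames, one per <table> on the page
tables_eia = pd.read_html(url_eia)
df_eia = tables_eia[0]   # keep the first table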
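`pd.read_html()` always returns a list of DataFrames, one for each `<table>` element it finds, even when the page has only one table. That is why the code takes element `[0]`. The self-contained sketch below uses a small hypothetical HTML snippet, not the EIA page. The snippet is wrapped in `io.StringIO` because recent pandas versions deprecate passing raw HTML strings directly. Like any use of `pd.read_html()`, it needs an HTML parser such as lxml (or BeautifulSoup4 with html5lib) installed.

```python
from io import StringIO
import pandas as pd

# Hypothetical HTML with two tables (not the EIA page)
html = """
<table>
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>a</td><td>1</td></tr>
  <tr><td>b</td><td>2</td></tr>
</table>
<table>
  <tr><th>Note</th></tr>
  <tr><td>hypothetical footnote</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))
print(type(tables), len(tables))  # a list holding 2 DataFrames

df_first = tables[0]  # the first <table> on the page
print(df_first)
```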

# Storing 'eia_table.csv' in the data folder (relative path).
df_eia.to_csv('data/eia_table.csv', index=False)

# Storing 'eia_table_new.csv' directly in the working directory.
df_eia.to_csv('eia_table_new.csv', index=False)

# Storing 'eia_table_abs.csv' using the absolute pathname
df_eia.to_csv('/Users/bchoe/Documents/DANL-210/eia_table_abs.csv',
              index=False)
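Two details of `DataFrame.to_csv()` are worth seeing in action. First, the target folder must already exist; otherwise pandas raises an OSError. Second, `index=False` keeps the row index from being written as an extra unnamed column. The sketch below uses a hypothetical toy DataFrame.

```python
import os
import pandas as pd

df = pd.DataFrame({"year": [2024, 2025], "value": [1.5, 2.0]})  # hypothetical toy data

# Make sure the destination folder exists before writing
os.makedirs("data", exist_ok=True)

df.to_csv("data/toy_with_index.csv")             # default: index written as a column
df.to_csv("data/toy_no_index.csv", index=False)  # index left out

print(pd.read_csv("data/toy_with_index.csv").columns.tolist())  # ['Unnamed: 0', 'year', 'value']
print(pd.read_csv("data/toy_no_index.csv").columns.tolist())    # ['year', 'value']
```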



Question 3. Scraping Multiple Web Tables with pd.read_html()

url_geneseo = "https://www.geneseo.edu/business/student%20outcomes"
  • Read all tables at url_geneseo into a list of DataFrames using pd.read_html().
  • Export the DataFrame as a CSV file.

url_sob = 'https://www.geneseo.edu/business/student-outcomes/'

# Returns a list with one DataFrame per <table> on the page
df_list = pd.read_html(url_sob)

df_sob_0 = df_list[0]
df_sob_1 = df_list[1]
df_sob_2 = df_list[2]
df_sob_3 = df_list[3]
df_sob_4 = df_list[4]
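When a page holds several tables, looping over the list is a quick way to see which position holds which table before assigning df_sob_0, df_sob_1, and so on. The `match=` argument of `pd.read_html()` keeps only tables whose text contains a given string or regex. It is an alternative to relying on positions, which can shift if the page changes. The sketch below runs on hypothetical HTML, not the Geneseo page, and like `pd.read_html()` itself it needs lxml or BeautifulSoup4 with html5lib.

```python
from io import StringIO
import pandas as pd

# Hypothetical page with three small tables (not the Geneseo page)
html = """
<table><tr><th>Program</th><th>Count</th></tr><tr><td>A</td><td>10</td></tr></table>
<table><tr><th>Employer</th><th>City</th></tr><tr><td>X</td><td>Y</td></tr></table>
<table><tr><th>Program</th><th>Rate</th></tr><tr><td>B</td><td>0.9</td></tr></table>
"""

df_list = pd.read_html(StringIO(html))

# Peek at each table: its position, shape, and column names
for i, df in enumerate(df_list):
    print(i, df.shape, list(df.columns))

# Keep only the tables whose text matches "Employer" (a new StringIO for each read)
employer_tables = pd.read_html(StringIO(html), match="Employer")
print(len(employer_tables))  # 1
```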


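# The header is spread over the first two rows: join them into column names
df_sob_1.columns = df_sob_1.iloc[0] + df_sob_1.iloc[1]
# Drop the two header rows from the data
df_sob_1 = df_sob_1.iloc[2:]
# Set clean, explicit column names
df_sob_1.columns = ['Program', 'Percent (%)2015-16',
                   'Percent (%)2016-17', 'Percent (%)2017-18',
                   'Percent (%)2018-19', 'Percent (%)2019-20',
                   'Percent (%)5-Year % Change']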
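Joining the two header rows as strings produces names such as "Percent (%)2015-16", which is the convention used above. If a header cell is missing (NaN), the combined name typically becomes NaN as well. Filling blanks first is more robust. The sketch below runs on a hypothetical toy table, and `combine_header_rows` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy table: the real header is split across the first two rows,
# similar to what pd.read_html() can return for two-level headers.
raw = pd.DataFrame([
    ["Program", "Percent (%)", "Percent (%)"],
    [None, "2015-16", "2016-17"],
    ["Accounting", "10", "12"],
    ["Economics", "20", "18"],
])

def combine_header_rows(df, n_header_rows=2):
    """Hypothetical helper: join the first n rows into column names, then drop them."""
    header = df.iloc[:n_header_rows].fillna("").astype(str)
    # For each column, concatenate its header pieces, e.g. "Percent (%)" + "2015-16"
    new_cols = header.apply(lambda col: "".join(col).strip(), axis=0)
    out = df.iloc[n_header_rows:].reset_index(drop=True)
    out.columns = list(new_cols)
    return out

df_clean = combine_header_rows(raw)
print(df_clean.columns.tolist())  # ['Program', 'Percent (%)2015-16', 'Percent (%)2016-17']
```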



Discussion

Welcome to our Classwork 3 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 3.

Whether you want to delve deeper into the content, share insights, or ask questions, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 3 materials or need clarification on any points, don’t hesitate to ask here.

All comments will be stored here.

Let’s collaborate and learn from each other!
