Classwork 9

Scraping Data with Python Selenium

Author

Byeong-Hak Choe

Published

March 24, 2025

Modified

March 14, 2025

The code below sets up the web scraping environment with Python Selenium:

import pandas as pd
import os
import time
import random
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

wd_path = 'PATHNAME_FOR_YOUR_DATA_DIRECTORY'   # replace with the path to your data directory
os.chdir(wd_path)


Question 1

for i in range(1, 10):
    xpath = f'/html/body/div[1]/div[2]/div/div[4]/div/div[1]/div/table/tbody/tr[{i}]/td[1]'
    print(xpath)
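
The loop above prints nine absolute XPaths, one per table row, because `range(1, 10)` stops at 9 and XPath positions start at 1. As a minimal sketch, the same strings can be collected in a list so they can be reused later. The helper name `row_xpaths` and its `col` parameter are hypothetical and introduced only for illustration; the base path is copied from the loop.

```python
def row_xpaths(n_rows, col=1):
    """Build absolute XPaths for column `col` in table rows 1..n_rows."""
    base = '/html/body/div[1]/div[2]/div/div[4]/div/div[1]/div/table/tbody'
    # XPath positions start at 1, so the range runs from 1 to n_rows inclusive.
    return [f'{base}/tr[{i}]/td[{col}]' for i in range(1, n_rows + 1)]


xpaths = row_xpaths(9)   # the same nine strings the loop prints
print(len(xpaths))
print(xpaths[0])
print(xpaths[-1])

# With a live driver, each string could be passed to
# driver.find_element(By.XPATH, xpath) to locate the corresponding cell.
```

Only the row index changes between the strings, which is why a single f-string with `{i}` is enough to address every row of the table.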

Answer:



Question 2

  • Provide your Python Selenium code to scrape all the quotes in this website.

    • You should create two DataFrames:
    1. A DataFrame about each quote with the following variables:
       • quote
       • author
       • tags
       • about: the URL of the page describing each author.
    2. A DataFrame about each author with the following variables:
       • about: the URL of the page describing each author.
       • born_date
       • born_location
       • author_description
  • Find the 15 most frequently occurring tags.

  • Save the two DataFrames as CSV files.
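
The tag-counting step can be sketched without a browser. The example below assumes that each quote's tags are stored as one comma-separated string, which may not match how your scraper stores them. The rows are made up for illustration and `top_tags` is a hypothetical helper name.

```python
from collections import Counter

# Made-up rows standing in for the scraped quote table; real data will differ.
rows = [
    {'quote': 'q1', 'tags': 'life, love'},
    {'quote': 'q2', 'tags': 'love, humor'},
    {'quote': 'q3', 'tags': 'love'},
    {'quote': 'q4', 'tags': ''},          # a quote with no tags
]


def top_tags(rows, n=15, sep=','):
    """Return the n most common tags as (tag, count) pairs."""
    counts = Counter()
    for row in rows:
        # Split the tag string, trim whitespace, and skip empty pieces.
        tags = [t.strip() for t in row['tags'].split(sep)]
        counts.update(t for t in tags if t)
    return counts.most_common(n)


top = top_tags(rows)
print(top)
```

If the tags live in a pandas column instead, a similar result should be obtainable with `df['tags'].str.split(',').explode().str.strip().value_counts().head(15)`, and each DataFrame can be written out with `DataFrame.to_csv`.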

Answer:



Discussion

Welcome to our Classwork 9 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 9.

Whether you want to delve deeper into the content, share insights, or ask questions, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 9 materials or need clarification on any points, don’t hesitate to ask here.

All comments will be stored here.

Let’s collaborate and learn from each other!
