Classwork 9
Scraping Data with Python Selenium
The code below sets up the web scraping environment with Python Selenium:
import pandas as pd
import os
import time
import random
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
wd_path = 'PATHNAME_FOR_YOUR_DATA_DIRECTORY'  # 'YOUR_DATA_DIRECTORY'
os.chdir(wd_path)
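The setup imports `webdriver`, `Options`, `time`, and `random` but stops before a browser is opened. Here is a minimal sketch of that next step, assuming Google Chrome is installed (recent Selenium 4 releases can fetch a matching chromedriver on their own). Both helper names, `make_driver` and `polite_pause`, are hypothetical, and the random pause is just one common way to space out requests to a site.

```python
import random
import time


def make_driver(headless: bool = True):
    """Start a Chrome session; needs Chrome and Selenium 4 installed."""
    # Selenium is imported here so polite_pause() works even without it.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        options.add_argument("--headless=new")  # run Chrome without a window
    options.add_argument("--window-size=1920,1080")  # render a desktop-sized page
    return webdriver.Chrome(options=options)


def polite_pause(min_s: float = 1.0, max_s: float = 3.0) -> float:
    """Sleep a random number of seconds between min_s and max_s; return it."""
    seconds = random.uniform(min_s, max_s)
    time.sleep(seconds)
    return seconds
```

Typical use: `driver = make_driver()`, then `driver.get(url)` with `polite_pause()` between page loads, and `driver.quit()` when finished.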
Question 1
Use Python Selenium to scrape the table in the following webpage as a pandas DataFrame.
Export the DataFrame as a CSV file.
Tip: Consider looping over an XPath f-string:
for i in range(1, 10):
    xpath = f'/html/body/div[1]/div[2]/div/div[4]/div/div[1]/div/table/tbody/tr[{i}]/td[1]'
    print(xpath)
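A sketch of the tip extended to every cell. It assumes the table has a `thead` row of `th` header cells and `td` body cells, and it reuses the absolute XPath from the tip, which will break if the page layout changes. The names `cell_xpath` and `scrape_table` are hypothetical. Instead of hard-coding `range(1, 10)`, it counts rows and columns with `find_elements` first.

```python
import pandas as pd

# Table XPath taken from the tip; it depends on the current page layout.
TABLE_XPATH = "/html/body/div[1]/div[2]/div/div[4]/div/div[1]/div/table"


def cell_xpath(row: int, col: int, table_xpath: str = TABLE_XPATH) -> str:
    """XPath of the body cell at (row, col); both are 1-based, as in XPath."""
    return f"{table_xpath}/tbody/tr[{row}]/td[{col}]"


def scrape_table(driver, table_xpath: str = TABLE_XPATH) -> pd.DataFrame:
    """Read every body cell into a DataFrame, one XPath f-string at a time."""
    from selenium.webdriver.common.by import By  # needs Selenium installed

    # Count rows and columns instead of hard-coding the loop range.
    n_rows = len(driver.find_elements(By.XPATH, f"{table_xpath}/tbody/tr"))
    n_cols = len(driver.find_elements(By.XPATH, f"{table_xpath}/tbody/tr[1]/td"))
    header = [th.text for th in
              driver.find_elements(By.XPATH, f"{table_xpath}/thead/tr/th")]

    rows = []
    for i in range(1, n_rows + 1):
        rows.append([
            driver.find_element(By.XPATH, cell_xpath(i, j, table_xpath)).text
            for j in range(1, n_cols + 1)
        ])
    # Use default integer column labels if the header does not line up.
    columns = header if len(header) == n_cols else None
    return pd.DataFrame(rows, columns=columns)
```

After `driver.get(...)` on the assignment page, `scrape_table(driver).to_csv('table.csv', index=False)` would export the result; the file name is only an example.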
Answer:
Question 2
Provide your Python Selenium code to scrape all the quotes on this website.
You should create two DataFrames:
- A DataFrame about each quote with the following variables:
  - quote
  - author
  - tags
  - about: the URL for the description of each author
- A DataFrame about each author with the following variables:
  - about: the URL for the description of each author
  - born_date
  - born_location
  - author_description
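A minimal sketch of the scraping step. The CSS selectors assume a page structure like that of the practice site quotes.toscrape.com: each quote in a `div.quote` with `span.text`, `small.author`, `a.tag` links, an "(about)" link to `/author/...`, and a `li.next` pagination link, plus author pages with `author-born-date`, `author-born-location`, and `author-description` elements. These are assumptions; inspect the assigned site and adjust them if it differs. All function names are hypothetical.

```python
import random
import time

import pandas as pd


def strip_in_prefix(location: str) -> str:
    """Turn text such as 'in Ulm, Germany' into 'Ulm, Germany'."""
    location = location.strip()
    return location[3:] if location.startswith("in ") else location


def scrape_quotes(driver, start_url: str) -> pd.DataFrame:
    """Collect quote, author, tags, and about URL from every listing page."""
    from selenium.webdriver.common.by import By  # needs Selenium installed

    records = []
    driver.get(start_url)
    while True:
        for q in driver.find_elements(By.CSS_SELECTOR, "div.quote"):
            records.append({
                "quote": q.find_element(By.CSS_SELECTOR, "span.text").text,
                "author": q.find_element(By.CSS_SELECTOR, "small.author").text,
                # Store all tags as one comma-separated string.
                "tags": ", ".join(t.text for t in
                                  q.find_elements(By.CSS_SELECTOR, "a.tag")),
                "about": q.find_element(By.CSS_SELECTOR, "a[href*='/author/']")
                          .get_attribute("href"),
            })
        next_links = driver.find_elements(By.CSS_SELECTOR, "li.next > a")
        if not next_links:  # no "Next" link: last page reached
            break
        driver.get(next_links[0].get_attribute("href"))
        time.sleep(random.uniform(1, 2))  # pause between page loads
    return pd.DataFrame(records)


def scrape_authors(driver, about_urls) -> pd.DataFrame:
    """Visit each unique author page once and collect its details."""
    from selenium.webdriver.common.by import By  # needs Selenium installed

    records = []
    for url in dict.fromkeys(about_urls):  # de-duplicate, keep first-seen order
        driver.get(url)
        records.append({
            "about": url,
            "born_date": driver.find_element(
                By.CSS_SELECTOR, "span.author-born-date").text,
            "born_location": strip_in_prefix(driver.find_element(
                By.CSS_SELECTOR, "span.author-born-location").text),
            "author_description": driver.find_element(
                By.CSS_SELECTOR, "div.author-description").text,
        })
        time.sleep(random.uniform(1, 2))
    return pd.DataFrame(records)
```

For example, `quotes_df = scrape_quotes(driver, start_url)` followed by `authors_df = scrape_authors(driver, quotes_df['about'])`, where `start_url` stands for the assignment's site. Visiting each author page only once keeps the number of requests down.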
Find the 15 most frequently occurring tags.
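One way to count tags with pandas, assuming the `tags` column stores each quote's tags as a single comma-separated string. `top_tags` is a hypothetical helper: it splits each string, uses `explode` to get one row per tag, and ranks the tags with `value_counts`.

```python
import pandas as pd


def top_tags(quotes_df: pd.DataFrame, n: int = 15) -> pd.Series:
    """Return the n most frequent tags and their counts."""
    tags = (
        quotes_df["tags"]
        .fillna("")        # quotes read back from CSV may have NaN tags
        .str.split(",")    # one list of tags per quote
        .explode()         # one row per tag
        .str.strip()
    )
    tags = tags[tags != ""]  # drop entries for quotes without tags
    return tags.value_counts().head(n)
```

Tags tied at the cutoff may come out in any order, so check the counts near 15th place.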
Save the two DataFrames as CSV files.
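A short sketch of the saving step; `save_frames` and the file names are hypothetical. Passing `index=False` keeps pandas from writing the row index as an extra unnamed column.

```python
from pathlib import Path

import pandas as pd


def save_frames(frames: dict, out_dir: str = ".") -> list:
    """Write each DataFrame to <out_dir>/<name>.csv and return the paths."""
    paths = []
    for name, df in frames.items():
        path = Path(out_dir) / f"{name}.csv"
        df.to_csv(path, index=False)  # omit the 0..n-1 row labels
        paths.append(path)
    return paths
```

`save_frames({'quotes': quotes_df, 'authors': authors_df})` would write `quotes.csv` and `authors.csv` into the current working directory, which the setup code sets with `os.chdir`.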
Answer:
Discussion
Welcome to our Classwork 9 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 9.
Whether you are looking to delve deeper into the content, share insights, or have questions about the content, this is the perfect place for you.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 9 materials or need clarification on any points, don’t hesitate to ask here.
All comments will be stored here.
Let’s collaborate and learn from each other!