Web-scraping with Python selenium
March 14, 2025
selenium
WebDriver is a wire protocol that defines a language-neutral interface for controlling the behavior of web browsers.
The purpose of WebDriver is to control the behavior of web browsers programmatically, allowing automated interactions such as:
Selenium WebDriver refers to both the language bindings and the implementations of browser-controlling code.
Install selenium with pip:
pip install selenium
webdriver.Chrome()
To begin, import (1) webdriver from selenium and (2) the By and Options classes.
webdriver.Chrome() opens a Chrome browser that is controlled by automated test software, selenium.
# Import the necessary modules from the Selenium package
from selenium import webdriver # Main module to control the browser
from selenium.webdriver.common.by import By # Helps locate elements on the webpage
from selenium.webdriver.chrome.options import Options # Allows setting browser options
# Create an instance of Chrome options
options = Options()
options.add_argument("window-size=1400,1200") # Set the browser window size to 1400x1200
# Initialize the Chrome WebDriver with the specified options
driver = webdriver.Chrome(options=options) # Correct implementation
# Now you can use 'driver' to control the Chrome browser
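The option setup above can be wrapped in small helpers so that the same script runs with or without a visible window. This is a minimal sketch, not part of the original slides: the names chrome_arguments and make_driver are hypothetical, and it assumes a recent Chrome release that accepts the --headless=new switch.

```python
def chrome_arguments(width=1400, height=1200, headless=False):
    """Return the Chrome command-line arguments for the given settings."""
    args = [f"window-size={width},{height}"]
    if headless:
        # Assumption: the installed Chrome supports the "--headless=new" switch.
        args.append("--headless=new")
    return args


def make_driver(headless=False):
    """Start Chrome with the arguments above (needs Selenium and Chrome installed)."""
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless=headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling make_driver(headless=True) would start Chrome without opening a window, which can be convenient when scraping on a machine with no display.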
get() Method in WebDriver
get(url) from webdriver opens the specified URL in a web browser.
When using webdriver in Google Chrome, you may see the message: "Chrome is being controlled by automated test software."
form_url = "https://qavbox.github.io/demo/webtable/"
driver.get(form_url)
driver.close()
driver.quit()
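If a script stops with an error before it reaches driver.quit(), the browser can be left running. One possible way to guard against that is a context manager that always quits the driver. This is a sketch only; managed_driver is a hypothetical helper name, not a Selenium function.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver, hand it to the with-block, and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the whole session, even if the with-block raised an error.
        driver.quit()


# Usage with Selenium (assumes 'webdriver' and 'options' from the setup code):
# with managed_driver(lambda: webdriver.Chrome(options=options)) as driver:
#     driver.get("https://qavbox.github.io/demo/webtable/")
#     print(driver.title)
```

The helper only relies on the driver having a quit() method, so it works the same way for other browsers' drivers.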
close() terminates the current browser window.
quit() completely exits the webdriver session, closing every browser window.
find_element()
In the Elements panel of the browser's developer tools, hover over the DOM structure to locate the desired element.
Note the tag (<input>, <button>, <div>) used for the element.
Note the attributes (id, class, name) that define the element.
find_element() & find_elements()
Selenium provides the find_element() method to locate elements in a page:
find_element(By.ID, "id")
find_element(By.CLASS_NAME, "class name")
find_element(By.NAME, "name")
find_element(By.CSS_SELECTOR, "css selector")
find_element(By.TAG_NAME, "tag name")
find_element(By.LINK_TEXT, "link text")
find_element(By.PARTIAL_LINK_TEXT, "partial link text")
find_element(By.XPATH, "xpath")
To find multiple elements, use find_elements() with the same locators; it returns a list.
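The practical difference between the two methods is what happens when nothing matches: find_element() raises NoSuchElementException, while find_elements() returns an empty list. A small sketch of how that can be used follows; first_or_none is a hypothetical helper name, and the homebtn class in the usage comment is taken from the example further down.

```python
def first_or_none(driver, by, value):
    """Return the first matching element, or None when nothing matches."""
    # find_elements() returns a list, which is empty when there is no match,
    # so no exception handling is needed here.
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None


# Usage with Selenium (assumes an open 'driver' and 'By' imported as above):
# rows = driver.find_elements(By.TAG_NAME, "tr")   # list of every <tr> element
# print(len(rows))
# home = first_or_none(driver, By.CLASS_NAME, "homebtn")
```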
find_element(By.ID, "")
find_element(By.ID, "") & find_elements(By.ID, ""):
Example (id form1):
find_element(By.CLASS_NAME, "")
find_element(By.CLASS_NAME, "") & find_elements(By.CLASS_NAME, ""):
Example (homebtn class):
find_element(By.NAME, "")
find_element(By.CSS_SELECTOR, "")
find_element(By.CSS_SELECTOR, "") & find_elements(By.CSS_SELECTOR, ""):
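A CSS selector can express the ID and class lookups shown earlier: #name selects by id and .name selects by class. The helpers below are a sketch with hypothetical names, and they assume simple identifiers with no characters that would need escaping in CSS.

```python
def css_for_id(element_id):
    """CSS selector for an element id, e.g. form1 -> #form1."""
    return f"#{element_id}"


def css_for_class(class_name):
    """CSS selector for a class name, e.g. homebtn -> .homebtn."""
    return f".{class_name}"


def css_for_attribute(tag, attribute, value):
    """CSS selector for a tag with a given attribute value."""
    return f'{tag}[{attribute}="{value}"]'


# Usage with Selenium (assumes an open 'driver' and 'By' imported as above):
# form = driver.find_element(By.CSS_SELECTOR, css_for_id("form1"))
# buttons = driver.find_elements(By.CSS_SELECTOR, css_for_class("homebtn"))
```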
find_element(By.TAG_NAME, "")
find_element(By.LINK_TEXT, "")
find_element(By.PARTIAL_LINK_TEXT, "")
find_element(By.XPATH, "")
find_element(By.XPATH, "") & find_elements(By.XPATH, ""):
XPath pattern: //tag_name[@attribute='value']
// → Selects elements anywhere in the document.
tag_name → HTML tag (input, div, span, etc.).
@attribute → Attribute name (id, class, aria-label, etc.).
value → Attribute's assigned value.
Tables often contain <tr> (rows) and <th> (headers) without an easily identifiable ID or class.
find_element(By.TAG_NAME, "") is not reliable due to multiple <tr> and <th> tags.
XPath is useful when other locators (By.ID, By.CLASS_NAME, etc.) don't work.
get_attribute()
get_attribute() extracts an element's attribute value.
selenium
Let’s do Classwork 9!
NoSuchElementException and WebDriverWait
NoSuchElementException and try-except blocks
from selenium.common.exceptions import NoSuchElementException
try:
    elem = driver.find_element(By.XPATH, "element_xpath")
    elem.click()
except NoSuchElementException:
    pass  # Element not found: skip it and keep the script running
When an element cannot be found, find_element() raises NoSuchElementException.
try-except can be used to keep the selenium code from terminating.
time.sleep()
import time
# example webpage
url = "https://qavbox.github.io/demo/delay/"
driver.get(url)
driver.find_element(By.XPATH, '//*[@id="one"]/input').click()
time.sleep(5)
element = driver.find_element(By.XPATH, '//*[@id="two"]')
element.text
The time.sleep() method is an explicit wait whose condition is an exact time period: the script always pauses for the full duration, even if the element appears sooner.
In general, a more efficient solution than time.sleep() is to make WebDriver wait only as long as required.
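The idea of waiting only as long as required can be illustrated with a small polling loop in plain Python. This is a sketch of the concept, not Selenium's own implementation; wait_until and its parameters are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Stop as soon as the condition holds instead of sleeping the full time.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage with Selenium (assumes an open 'driver' and 'By' imported as above):
# found = wait_until(lambda: driver.find_elements(By.XPATH, '//*[@id="two"]'))
# print(found[0].text)
```

Compared with time.sleep(5), the loop returns as soon as the element shows up and only fails after the full timeout has passed.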
implicitly_wait()
driver.find_element(By.XPATH, '//*[@id="oneMore"]/input[1]').click()
driver.implicitly_wait(10) # Wait up to 10 seconds for elements to appear
element2 = driver.find_element(By.ID, 'delay')
element2.text
implicitly_wait() directs the webdriver to wait for a certain amount of time before throwing an exception.
Once it is set, webdriver will wait up to that long for the element to appear before the exception occurs.
WebDriverWait and EC
presence_of_element_located:
visibility_of_element_located:
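As a hedged illustration of the two conditions named above: presence_of_element_located waits until the element exists in the DOM, and visibility_of_element_located additionally requires it to be displayed. The sketch below wraps both in a hypothetical helper, wait_for_element; the id "two" in the usage comment comes from the delay demo page used earlier.

```python
def wait_for_element(driver, by, value, timeout=10, visible=False):
    """Wait up to 'timeout' seconds for an element and return it."""
    # Imported here so the function can be defined without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    locator = (by, value)
    if visible:
        condition = EC.visibility_of_element_located(locator)
    else:
        condition = EC.presence_of_element_located(locator)
    # until() polls the condition and raises TimeoutException if it never holds.
    return WebDriverWait(driver, timeout).until(condition)


# Usage (assumes an open 'driver' on the delay demo page and 'By' imported):
# driver.find_element(By.XPATH, '//*[@id="one"]/input').click()
# element = wait_for_element(driver, By.ID, "two", timeout=10, visible=True)
# print(element.text)
```

Unlike implicitly_wait(), which applies to every lookup in the session, this kind of wait is attached to one specific element and condition.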
selenium
Let’s do Classwork 10!