I'm working on a fun project, related to my career aspirations in the Canadian health and safety industry, that will also help me build my Python skills. There is a website that publishes Canadian occupational health and safety statistics.
My goal is to use Selenium to interact with its drop-down menus. There are three: one for the year, one for the category the data is broken down by (e.g., Gender, Occupation), and a third with two options: Fatality or Lost Time Claim (LTC).
After making the selections, I need to click a "Generate" button. The page then loads a table, which can be exported as a .csv or .pdf file.
I also want to automate downloading every possible report: every year and every category, for both fatalities and LTCs.
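Once Chrome is told to save downloads into a known folder (its `download.default_directory` preference), one way to know an export has finished is to watch that folder instead of sleeping for a fixed time. This is a minimal sketch, assuming Chrome's habit of writing in-progress downloads with a `.crdownload` suffix; `wait_for_new_file` is a hypothetical helper, not part of Selenium.

```python
import time
from pathlib import Path

# Suffixes assumed to mark unfinished downloads (Chrome uses .crdownload).
PARTIAL_SUFFIXES = {".crdownload", ".tmp"}


def wait_for_new_file(folder, existing, timeout=30.0, poll=0.5):
    """Return the first finished file in `folder` whose name is not in `existing`.

    Hypothetical helper: `existing` is the set of file names present before
    the export link was clicked.
    """
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for path in folder.iterdir():
            # Ignore files that were already there and downloads still in progress.
            if path.name not in existing and path.suffix not in PARTIAL_SUFFIXES:
                return path
        time.sleep(poll)
    raise TimeoutError(f"No new file appeared in {folder} within {timeout} s")
```

Record `existing = {p.name for p in folder.iterdir()}` just before clicking the export link, call `wait_for_new_file(folder, existing)` afterwards, and rename the returned path to something descriptive so reports don't overwrite or shadow each other.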
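Covering "every year, every category, both result types" is a Cartesian product of the three option lists, which `itertools.product` generates without hand-written nested loops. This is a minimal sketch, assuming the option labels have already been read from the page; `plan_reports` and `slugify` are hypothetical helpers, and the labels in the example are placeholders rather than the site's real values.

```python
import re
from itertools import product


def slugify(text):
    """Turn a drop-down label into a lowercase, file-name-safe fragment."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_").lower()


def plan_reports(years, categories, scopes):
    """Yield (year, category, scope, filename) for every combination of options."""
    for year, category, scope in product(years, categories, scopes):
        filename = f"{slugify(year)}_{slugify(category)}_{slugify(scope)}.csv"
        yield year, category, scope, filename


if __name__ == "__main__":
    # Placeholder labels; read the real ones from the drop-downs at run time.
    for row in plan_reports(["2022", "2021"], ["Gender", "Occupation"],
                            ["Fatality", "Lost Time Claim"]):
        print(row)
```

The number of reports is the product of the three list lengths, so it is worth printing the plan once before letting a browser loop through it.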
I am a beginner in Python and have been struggling for the past few days, spending a few hours daily trying to get this to work.
So far, this is my code:
```python
# First, let's import the Selenium library and the modules this project needs.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import time

# Then let's specify the URL of interest and navigate to the web page.
url = "https://awcbc.org/en/statistics/national-work-injury-disease-and-fatality-statistics-nwisp-year-at-a-glance/"
driver = webdriver.Chrome()
driver.get(url)
time.sleep(7)

# Locate the iframe that contains the drop-down menus and switch into it.
try:
    iframe = driver.find_element(By.XPATH, "//*[@id='res-inventry-page']/div/div/div/div/p[7]/iframe")
    driver.switch_to.frame(iframe)
except Exception:
    print("Failed to locate iframe")
else:
    print("iframe successfully switched.")
time.sleep(7)
wait = WebDriverWait(driver, 10)  # Wait for a maximum of 10 seconds

# Define all the drop-down menus within the iframe and try to select one option in each.
ddyear = driver.find_element(By.ID, "DropdownListYear")
Select(ddyear)
years = driver.find_elements(By.XPATH, "//select[@id='DropdownListYear']/option")
years = years[1:]
yr2022 = driver.find_element(By.XPATH, '//*[@id="DropdownListYear"]/option[2]')
Select(yr2022)
# for year in years:
#     print(year.text)

category = driver.find_element(By.ID, "DropdownReportType")
categories = driver.find_elements(By.XPATH, '//*[@id="DropdownReportType"]/option')
categories = categories[1:]

result = driver.find_element(By.ID, "DropdownScope")
results = driver.find_elements(By.XPATH, '//*[@id="DropdownScope"]/option')
results = results[1:]

generate_button = driver.find_element(By.XPATH, '//*[@id="generateReport"]')
export_csv = driver.find_element(By.XPATH, '//*[@id="ExportExcelLB"]')
```
When I try to select the option stored in `yr2022`, I get the error `UnexpectedTagNameException: Select only works on <select> elements, not on option`. I'm confident someone in this community can help me resolve this.
I'd like help with that, plus any tips on writing a loop that selects every available combination of reports and automates the downloads. If there is an easier way, such as finding the download links on the page, saving them to a text file, and fetching them with wget, I'm open to that as well. I'd like to write the script myself, so I'm mainly looking for a point in the right direction.
Thank you all in advance. I appreciate all the help this community provides.