
I am working on a fun project related to my career aspirations in the Canadian health and safety industry, which will also help me build Python skills. There is a website that provides Canadian occupational health and safety statistics.

My goal is to use Selenium to interact with the drop-down menus. There are three: one for the year, one for the category the data is split into (e.g., Gender, Occupation), and one for the data type, which has two options: Fatality or Lost Time Claim (LTC).

Then, there is a "generate" button that I need to click. The page loads, and a table is produced, which gives you the option to export the table as a .csv or .pdf file.

I also want to automate downloading every possible report: each year and category, for both fatalities and LTCs.

I am a beginner in Python and have been struggling for the past few days, spending a few hours daily trying to get this to work.

So far, this is my code:

# First, let's import the Selenium library and the required modules for this project.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import time

# Then let's specify the URL of interest and navigate to the webpage.

url = "https://awcbc.org/en/statistics/national-work-injury-disease-and-fatality-statistics-nwisp-year-at-a-glance/"
driver = webdriver.Chrome()
driver.get(url)

time.sleep(7)


# The following helps locate the iframe that contains the drop-down menus needed.

try: 
    iframe = driver.find_element(By.XPATH, "//*[@id='res-inventry-page']/div/div/div/div/p[7]/iframe")
    driver.switch_to.frame(iframe)
except: 
    print("Failed to locate iframe")
else:
    print("iframe successfully switched.")
    
time.sleep(7)
wait = WebDriverWait(driver, 10)  # Wait for a maximum of 10 seconds
    # Define all the drop-down menus within the iframe and try and select one option in each.
ddyear = driver.find_element(By.ID, "DropdownListYear")
Select(ddyear)

years = driver.find_elements(By.XPATH, "//select[@id='DropdownListYear']/option")
years = years[1:]


yr2022 = driver.find_element(By.XPATH, '//*[@id="DropdownListYear"]/option[2]')
Select(yr2022)

# for year in years:
#     print(year.text)
    

category = driver.find_elements(By.ID, "DropdownReportType")
categories = driver.find_elements(By.XPATH, '//*[@id="DropdownReportType"]/option')
categories = categories[1:]

result = driver.find_element(By.ID, "DropdownScope")
results = driver.find_elements(By.XPATH, '//*[@id="DropdownScope"]/option')
results = results[1:]

generate_button = driver.find_element(By.XPATH, '//*[@id="generateReport"]')

export_csv = driver.find_element(By.XPATH, '//*[@id="ExportExcelLB"]')

When I try to select the first option I created in yr2022, I encounter the error UnexpectedTagNameException: Select only works on <select> elements, not on <option>. I'm confident that the technical expertise of this community can help me resolve this issue.

I want help with that and also any other tips on using some type of loop to select all possible combinations of reports available and then automate the downloads. If there is an easier way to find the download link from the webpage, save it to a text file, and use wget, I'm open to that as well. This is something I would like to script myself, so just a point in the right direction is what I'm looking for, and I'd like to learn how to do it.

Thank you all in advance. I appreciate all the help this community provides.

  • Please edit your question and post the full error message as text, properly formatted and indicate on which line it was thrown. Commented May 4, 2024 at 14:40

2 Answers


Some feedback...

  1. time.sleep() should be avoided in the majority of cases. It slows your script down, and a fixed delay is still unreliable when load times vary. Best practice is to use WebDriverWait in all cases.

  2. Familiarize yourself with all the different wait options in expected_conditions. For example, there is EC.frame_to_be_available_and_switch_to_it() for waiting and switching into IFRAMEs that you could take advantage of.

  3. You found the Select class but never really used it. It makes interacting with SELECT HTML elements much easier and can be used here to simplify the code. For example, given the HTML below

    <select class="A" id="DropdownListYear" name="DropdownListYear" size="1">
        <option value="-1">Select one...</option>
        <option value="2022">2022</option>
        <option value="2021">2021</option>
    </select>
    

    You could use the code below to select "2022" three different ways.

    year_select = Select(driver.find_element(By.ID, "DropdownListYear"))
    year_select.select_by_index(1) # index starts at 0
    year_select.select_by_value("2022")
    year_select.select_by_visible_text("2022")
    

    where

    <option value="2022">2022</option>
                   ^^^^ .select_by_value("2022")
                         ^^^^ .select_by_visible_text("2022")
    
  4. I would suggest that you use try-except sparingly, if at all. While learning, you need to see every exception/error message and learn how to read stack traces. It will significantly increase your understanding of what failed and where and speed the process of understanding and fixing the issue. That said, they are useful but use them only to catch specific exceptions that you plan to handle.

    For example, instead of

    except:
        print("Failed to locate iframe")
    

    use

    from selenium.common.exceptions import NoSuchElementException
    
    except NoSuchElementException:
        print("Failed to locate iframe")
    

    That way, if any other exception is thrown, you will see it. A bare except: swallows ALL exception types, so you may assume you are getting a NoSuchElementException when in fact it's something else, which leads to confusion and wasted time.
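To see why the narrow except matters without needing a browser, here is a minimal Selenium-free sketch. ElementNotFound, find_iframe and the two wrapper functions are hypothetical stand-ins, not Selenium APIs; the only point is that a bare except: reports the wrong cause, while a specific except lets the unexpected error surface.

```python
class ElementNotFound(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def find_iframe(page):
    # Hypothetical lookup on a plain dict. A wrong key raises KeyError,
    # which has nothing to do with a missing element.
    if page["iframes"] == []:
        raise ElementNotFound("no iframe on page")
    return page["iframes"][0]


def with_bare_except(page):
    try:
        return find_iframe(page)
    except:  # swallows everything, including genuine bugs
        return "Failed to locate iframe"


def with_specific_except(page):
    try:
        return find_iframe(page)
    except ElementNotFound:  # only the failure we actually plan to handle
        return "Failed to locate iframe"


# Bug on purpose: the key is "frames" instead of "iframes".
broken_page = {"frames": ["report"]}

# The bare except hides the KeyError behind a misleading message.
print(with_bare_except(broken_page))

# The specific except lets the real problem propagate.
try:
    with_specific_except(broken_page)
except KeyError as exc:
    print(f"Real problem surfaced: KeyError {exc}")
```

In the second call the KeyError points straight at the typo, whereas the first call would send you hunting for an iframe problem that does not exist.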

With all those suggestions in mind, I refactored the code to the below.

NOTE: Each file downloaded has the same generic name, report.xls. You may want to find that file after it's done downloading and rename it based on the current dropdown choices. I left the label variable in there, which contains each dropdown choice, so you could repurpose it for a filename if you wanted.

# First, let's import the Selenium library and the required modules for this project
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

# Then let's specify the URL of interest and navigate to the webpage
url = "https://awcbc.org/en/statistics/national-work-injury-disease-and-fatality-statistics-nwisp-year-at-a-glance/"
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

wait = WebDriverWait(driver, 10)  # Wait for a maximum of 10 seconds

# The following helps locate the iframe that contains the drop-down menus needed
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe.iframe-class")))

# Define the Year dropdown and loop
year_select = Select(driver.find_element(By.ID, "DropdownListYear"))

# for year in range(1, 2): # for debugging
for year in range(1, len(year_select.options)):
    year_select.select_by_index(year)
    year_label = year_select.first_selected_option.text

    # Define the NWISP Category dropdown
    category_select = Select(driver.find_element(By.ID, "DropdownReportType"))
    # for category in range(1, 2): # for debugging
    for category in range(1, len(category_select.options)):
        category_select.select_by_index(category)
        category_label = category_select.first_selected_option.text

        # Define the Data Type dropdown
        type_select = Select(driver.find_element(By.ID, "DropdownScope"))
        # "type_index" avoids shadowing Python's built-in type()
        for type_index in range(1, len(type_select.options)):
            type_select.select_by_index(type_index)
            type_label = type_select.first_selected_option.text

            # print a combined label of dropdown selections
            label = f"{year_label} : {category_label} : {type_label}"
            print(label)

            driver.find_element(By.ID, 'generateReport').click()

            # switch into the Export IFRAME
            wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "iframe")))
            wait.until(EC.element_to_be_clickable((By.ID, "ExportExcelLB"))).click()

            # switch back to default and then into the first IFRAME
            driver.switch_to.default_content()
            wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe.iframe-class")))

This probably doesn't do everything you want it to do but it should be a framework that you can build on and learn from.
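For the NOTE above about every download being named report.xls, one possible approach is sketched below. It assumes Chrome is configured to save into a known directory and that Chrome keeps a temporary *.crdownload file while a download is in progress; label_to_filename and rename_download are hypothetical helper names, and the timeout values are arbitrary. It also polls with a deadline instead of a single fixed sleep, in the same spirit as WebDriverWait.

```python
import re
import time
from pathlib import Path


def label_to_filename(label, extension=".xls"):
    """Turn a label such as '2022 : Occupation : Fatality' into a safe filename."""
    # Replace every run of characters that is not a letter, digit or hyphen.
    stem = re.sub(r"[^A-Za-z0-9-]+", "_", label).strip("_")
    return stem + extension


def rename_download(download_dir, label, original="report.xls", timeout=30.0, poll=0.5):
    """Wait for `original` to finish downloading, then rename it after `label`."""
    download_dir = Path(download_dir)
    source = download_dir / original
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Assumption: an in-progress Chrome download shows up as *.crdownload.
        still_downloading = any(download_dir.glob("*.crdownload"))
        if source.exists() and not still_downloading:
            target = download_dir / label_to_filename(label, source.suffix)
            source.rename(target)
            return target
        time.sleep(poll)
    raise TimeoutError(f"{original} did not appear in {download_dir} within {timeout}s")
```

In the loop above, a call like rename_download(download_dir, label) would go right after the click on ExportExcelLB, where download_dir is whatever directory you configured in your Chrome options.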


1 Comment

Thank you so much for your help and guidance. Although I haven't run the script yet (I only have an hour or two to work on it every couple of days now), I have incorporated your suggestions and the code looks a lot cleaner. I've added Chrome options to download to a directory I created, but I'm still working out how to dynamically change the download file names as I go, using the labels you've helped create. It will be fun to figure it out. I will keep you updated!

It's great that you're building your skills in Python and Selenium.

I'm going to offer an alternative solution here, since there is a more efficient way to fetch the data. If you open the browser's developer tools (Ctrl+Shift+I) and look at the Network tab, you'll see that the data is fetched from a specific URL with certain query parameters. Requesting that URL returns the HTML, and then you just need to parse the table. You could use Beautiful Soup directly, but pandas will handle <table> tags for you via read_html, which uses lxml or Beautiful Soup under the hood. Here's an example:

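from io import StringIO

import pandas as pd
import requests

url = 'https://aoc.awcbc.org/WebForms/ViewReport.aspx'

payload = {
    'report': 'NwispYearAtAGlance',
    'year': '2022',
    'reportType': 'Occupation',
    'scope': 'Fatality',
    'useTempData': 'false',
}

response = requests.get(url, params=payload)
response.raise_for_status()  # fail loudly on an HTTP error

# read_html returns every <table> on the page as a DataFrame;
# StringIO avoids the deprecation warning newer pandas raises for literal HTML strings
dfs = pd.read_html(StringIO(response.text))

# Keep the first table that has 15 columns (the report itself)
for df in dfs:
    if df.shape[-1] == 15:
        break

print(df)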

Output:

[34 rows x 15 columns]

print(df.to_string())
       0                                                                                               1    2    3    4    5    6    7    8    9    10   11     12   13      14
0     NaN                                                                                             NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN    NaN  NaN     NaN
1    Code                                                                                      Occupation   NL   PE   NS   NB   QC   ON   MB   SK   AB   BC  NT/NU   YT  Canada
2      00                                                                   Senior Management Occupations    X    X    X    X    X    X    X    X    X    X      X    X       X
3   01-05                                                       Specialized middle management occupations    X    X    X    X    X    X    X    X    X    X      X    X       7
4      06               Middle management occupations in retail and wholesale trade and customer services    X    X    X    X    X    X    X    X    X    5      X    X       8
5   07-09               Middle management occupations in trades, transportation, production and utilities    X    X    X    X    X    X    X    X    X    4      X    X      11
6      11                                                Professional Occupations in Business and Finance    X    X    X    X    X    X    X    X    X    X      X    X       X
7      14                                                                      Office support occupations    X    X    X    X    X    X    X    X    X    X      X    X       X
8      15                                 Distribution, tracking and scheduling co-ordination occupations    X    X    X    X    X    X    X    X    X    X      X    X       6
9      21                                        Professional Occupations in Natural and Applied Sciences    X    X    X    X    X    6    X    X    X    X      X    X      12
10     22                                   Technical Occupations Related to Natural and Applied Sciences    X    X    X    X    4    4    X    X    7    5      X    X      21
11     30                                                             Professional occupations in nursing    X    X    X    X    X    X    X    X    X    X      X    X       4
12     32                                                                 Technical occupations in health    X    X    X    X    X    X    X    X    X    X      X    X       4
13     34                                             Assisting Occupations in Support of Health Services    X    X    X    X    X    X    X    X    X    X      X    X       6
14     40                                                  Professional occupations in education services    X    X    X    X    X    X    X    X    X    X      X    X       4
15     42                 Paraprofessional occupations in legal, social, community and education services    X    X    X    X    X    X    X    X    X    X      X    X       X
16     43                                            Occupations in front-line public protection services    6    X    X    X   20   49    6    5   15   18      X    X     121
17     44                 Care providers and educational, legal and public protection support occupations    X    X    X    X    X    X    X    X    X    X      X    X       5
18     52                                     Technical occupations in art, culture, recreation and sport    X    X    X    X    X    X    X    X    X    X      X    X       X
19     62                                      Retail sales supervisors and specialized sales occupations    X    X    X    X    X    X    X    X    X    X      X    X       4
20     63                                         Service supervisors and specialized service occupations    X    X    X    X    X    X    X    X    X    X      X    X       9
21     64                             Sales representatives and salespersons - wholesale and retail trade    X    X    X    X    X    X    X    X    X    X      X    X       7
22     65                    Service representatives and other customer and personal services occupations    X    X    X    X    X    X    X    X    X    X      X    X       5
23     66                                                                       Sales support occupations    X    X    X    X    X    X    X    X    X    X      X    X       X
24     67                                           Service support and other service occupations, n.e.c.    X    X    X    X    X    7    X    X    4    X      X    X      17
25     72                                                  Industrial, electrical and construction trades    X    X    X    5   75   67    X   11   32   37      X    X     232
26     73                                                      Maintenance and equipment operation trades    X    X    X    X   25   23    X    X   10   18      X    X      83
27     74                                 Other installers, repairers and servicers and material handlers    X    X    X    X   15    7    X    X    X    4      X    X      31
28     75                     Transport and heavy equipment operation and related maintenance occupations    5    X    X    4   11   29    X    9   20   32      X    X     115
29     76                                  Trades Helpers, Construction Labourers and Related Occupations    X    X    X    X   10   24    X    X    9   12      X    X      58
30     82  Supervisors and technical occupations in natural resources, agriculture and related production    X    X    X    X    X    6    X    X    X    9      X    X      28
31     84                                Workers in natural resources, agriculture and related production    X    X    X    X    6    5    X    X    4    4      X    X      21
32     86                                         Harvesting, landscaping and natural resources labourers    4    X    X    X    4    7    X    X    X    X      X    X      18
33     92               Processing, manufacturing and utilities supervisors and central control operators    X    X    X    X    X   11    X    X    X    X      X    X      17
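To cover every report rather than a single one, the same request can be repeated for each combination of parameters. The sketch below only builds the URLs and makes no network calls. The option lists are assumptions: 2022, Occupation and Fatality come from the example above, while "2021" and "Gender" are guesses at valid values that you would need to confirm in the Network tab. build_report_urls is a hypothetical helper name.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://aoc.awcbc.org/WebForms/ViewReport.aspx"

# Assumed option values; only 2022 / Occupation / Fatality are confirmed above.
years = ["2022", "2021"]
report_types = ["Occupation", "Gender"]
scopes = ["Fatality"]


def build_report_urls(years, report_types, scopes):
    """Return one report URL per (year, reportType, scope) combination."""
    urls = []
    # product() yields every combination, replacing three nested for loops.
    for year, report_type, scope in product(years, report_types, scopes):
        params = {
            "report": "NwispYearAtAGlance",
            "year": year,
            "reportType": report_type,
            "scope": scope,
            "useTempData": "false",
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


urls = build_report_urls(years, report_types, scopes)
for url in urls:
    print(url)
```

Each URL could be passed to requests.get and pd.read_html as above. Writing the list to a text file, one URL per line, would also give you something wget -i can consume, though that would save the raw HTML pages rather than parsed tables.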

3 Comments

This is interesting. So pandas has Beautiful Soup built in? My initial plan was to practice with pandas once I had downloaded all the Excel files, to help clean the data up and eventually combine it into one database.
Also, do you have any books or tutorials you can recommend on Beautiful Soup, pandas or Selenium? I am trying to learn a good amount but could not locate the URLs you mentioned. If I could find all of them, then I could automate the process using Requests like you suggested.
You can find the URLs by inspecting the page (Ctrl+Shift+I) and going to the Network tab. You may need to refresh the page, but you can see where the requests are made and check whether one of them returns the data you want. As for books or tutorials, there are tons on YouTube, Google and Medium. They are more or less similar.
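Once a request URL has been copied from the Network tab, it can be split into a base URL and a parameter dictionary of the kind used in the answer. This is a small sketch using only the standard library; the captured string here is assembled from the parameters shown in the answer, not copied from a live session.

```python
from urllib.parse import parse_qsl, urlsplit

# A request URL as it might appear in the Network tab (assembled here
# from the parameters shown in the answer above).
captured = (
    "https://aoc.awcbc.org/WebForms/ViewReport.aspx"
    "?report=NwispYearAtAGlance&year=2022&reportType=Occupation"
    "&scope=Fatality&useTempData=false"
)

parts = urlsplit(captured)
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
payload = dict(parse_qsl(parts.query))  # query string -> {name: value}

print(base_url)
print(payload)
```

Comparing the payload from two different dropdown selections should show which parameters change, and therefore which values to loop over.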
