Hi,
So for a project I have to scrape data from an employee’s evaluation page and make a structured backup of it. Everything works fine, except that I can't scrape the radio button inputs from the HTML. Here’s an example element:
```html
<input type="radio" id="radiofield-9999-inputEl" name="rb433143333Answer" data-ref="inputEl" tabindex="0" class="x-form-cb-input x-hidden-clip" autocomplete="off" hidefocus="true" role="radio" aria-hidden="false" aria-disabled="false" aria-readonly="false" aria-invalid="false" aria-checked="true" aria-labelledby="radiofield-9999-boxLabelEl" data-componentid="radiofield-9999" checked="checked">
```
`aria-checked="true"` means "Yes" and `aria-checked="false"` means "No". The `id` and `name` are dynamic, so I can't enumerate over them. Ideally, I'd also like to scrape the question to the left of each button (like "said hello").
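One possible approach is to parse the page instead of splitting the raw string, so that only real `<input type="radio">` elements are read. The sketch below uses Python's built-in `html.parser` (so it runs without BeautifulSoup) and, as an attempt at the question text, resolves the standard `aria-labelledby` attribute to the text of the element it points to. Whether that element holds the question or only the option label (e.g. "Yes") depends on the real page; the surrounding markup in `sample` is invented for illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RadioParser(HTMLParser):
    """Collects <input type="radio"> attributes plus the text of every element with an id."""

    def __init__(self):
        super().__init__()
        self.radios = []      # one attribute dict per radio input
        self.text_by_id = {}  # element id -> concatenated text content
        self._open = []       # stack of (tag, id or None) for currently open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "radio":
            self.radios.append(attrs)
        if tag in VOID_TAGS:
            return
        el_id = attrs.get("id")
        if el_id:
            self.text_by_id.setdefault(el_id, "")
        self._open.append((tag, el_id))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates unclosed children).
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text belongs to every open element that has an id.
        for _, el_id in self._open:
            if el_id:
                self.text_by_id[el_id] += data


def read_radios(html):
    """Return (label text, "Yes"/"No") for each radio input in the HTML."""
    parser = RadioParser()
    parser.feed(html)
    parser.close()
    results = []
    for attrs in parser.radios:
        # aria-labelledby may list several ids separated by spaces
        ids = (attrs.get("aria-labelledby") or "").split()
        texts = (parser.text_by_id.get(i, "").strip() for i in ids)
        label = " ".join(t for t in texts if t)
        state = "Yes" if attrs.get("aria-checked") == "true" else "No"
        results.append((label, state))
    return results


# Hypothetical markup around the real input element, for illustration only.
sample = """
<div id="radiofield-9999">
  <input type="radio" id="radiofield-9999-inputEl" name="rb433143333Answer"
         role="radio" aria-checked="true"
         aria-labelledby="radiofield-9999-boxLabelEl" checked="checked">
  <label id="radiofield-9999-boxLabelEl">said hello</label>
</div>
"""
print(read_radios(sample))
```

In practice, `html` would be `driver.page_source` from the Selenium session.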
I can get the values in Python by splitting the page source on `aria-checked` and reading the value after each occurrence:
```python
from selenium import webdriver
from iteration_utilities import grouper

url = "https://somewebsite.com"
driver = webdriver.Chrome()
driver.get(url)
page = driver.page_source

# Split the raw HTML on every occurrence of "aria-checked"
splitted = page.split("aria-checked")
check_list = []
for split in splitted[1:]:
    # Each chunk starts with ="true" or ="false"; slice out the value and drop the quotes
    key = split[2:7].strip('"')
    check_list.append(key)

# Group the values into consecutive pairs
out_list = list(grouper(check_list, 2))
```
However, this isn't ideal either, because other buttons on the page can also contain the `aria-checked` string.
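A stricter filter could keep only `<input type="radio">` tags, optionally narrow them by a name pattern, and group them by `name` instead of by position. Radio buttons in the same group share a `name`, so even though the names change between page loads, they still identify one question within a single page. The sketch assumes two radios (Yes/No) per question, as the `grouper(check_list, 2)` call suggests; the `rb\d+Answer` pattern is a hypothetical guess from the single example element, and the sample markup is made up.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical pattern inferred from the one example name "rb433143333Answer";
# check it against more questions on the real page before relying on it.
ANSWER_NAME = re.compile(r"^rb\d+Answer$")


class AnswerRadioParser(HTMLParser):
    """Keeps only <input type="radio"> elements whose name matches ANSWER_NAME."""

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(list)  # name -> checked states in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "input" or attrs.get("type") != "radio":
            return  # other widgets that also carry aria-checked are ignored
        name = attrs.get("name") or ""
        if ANSWER_NAME.match(name):
            self.groups[name].append(attrs.get("aria-checked") == "true")


def answers_by_question(html):
    """Map each radio group name to a tuple of checked states, e.g. (True, False)."""
    parser = AnswerRadioParser()
    parser.feed(html)
    parser.close()
    return {name: tuple(states) for name, states in parser.groups.items()}


# Made-up markup: two answer groups plus two unrelated widgets with aria-checked.
sample = """
<div role="checkbox" aria-checked="true">Remember me</div>
<input type="radio" name="rb433143333Answer" aria-checked="true">
<input type="radio" name="rb433143333Answer" aria-checked="false">
<input type="radio" name="rb433143334Answer" aria-checked="false">
<input type="radio" name="rb433143334Answer" aria-checked="true">
<input type="radio" name="themeSwitch" aria-checked="true">
"""
print(answers_by_question(sample))
```

Each tuple plays the role of one pair from `grouper(check_list, 2)`, but a stray `aria-checked` elsewhere on the page can no longer shift the pairing. The same filter could also be applied on the live page by passing a CSS selector such as `input[type="radio"][name$="Answer"]` to Selenium's `find_elements` and reading each element's `aria-checked` with `get_attribute`.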
Any ideas on how to achieve this?