Scrape from Yes/No radio boxes


For a project I have to scrape data from an employee’s evaluation page and make a structured backup of it. It’s all fine and dandy, however I cannot scrape the radio box inputs from the HTML. Here’s an example element:

<input type="radio" id="radiofield-9999-inputEl" name="rb433143333Answer" data-ref="inputEl" tabindex="0" class="x-form-cb-input x-hidden-clip" autocomplete="off" hidefocus="true" role="radio" aria-hidden="false" aria-disabled="false" aria-readonly="false" aria-invalid="false" aria-checked="true" aria-labelledby="radiofield-9999-boxLabelEl" data-componentid="radiofield-9999" checked="checked">

aria-checked="true" means “Yes” and aria-checked="false" means “No”. The id and name are dynamic, so I can’t enumerate over them. Ideally, I want to scrape the questions to their left (like “said hello”) as well.

I can achieve this task in Python by enumerating over the html string and looking for aria-checked:

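from selenium import webdriver
from iteration_utilities import grouper

url = ""  # URL of the evaluation page
driver = webdriver.Chrome()
driver.get(url)  # load the page before reading its source
page = driver.page_source
# Each chunk after the first starts with ="true" or ="false"
splitted = page.split("aria-checked")

check_list = []
for split in splitted[1:]:
    # Characters 2..6 hold `true"` or `false`; drop the trailing quote
    key = split[2:7].strip('"')
    check_list.append(key)

# Group the collected values in pairs
out_list = list(grouper(check_list, 2))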

However, this is not ideal either, as other buttons on the page may also contain the aria-checked string.

Any ideas on how to achieve this?
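One possible way to avoid picking up unrelated elements is to parse the HTML and read aria-checked only from input tags whose type is "radio". The following is a minimal sketch using only the Python standard library; the class name RadioStateParser, the function radio_states, and the sample markup are hypothetical and not taken from the real page.

```python
from html.parser import HTMLParser


class RadioStateParser(HTMLParser):
    """Collect aria-checked values from <input type="radio"> tags only."""

    def __init__(self):
        super().__init__()
        self.states = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Skip buttons, toggles, etc. that also carry aria-checked.
        if tag == "input" and attr_map.get("type") == "radio":
            if "aria-checked" in attr_map:
                self.states.append(attr_map["aria-checked"] == "true")


def radio_states(html):
    """Return one boolean per radio input, in document order."""
    parser = RadioStateParser()
    parser.feed(html)
    return parser.states


# Hypothetical sample markup (assumption, not the real evaluation page).
sample = (
    '<input type="radio" name="rb1Answer" aria-checked="true" checked="checked">'
    '<input type="radio" name="rb1Answer" aria-checked="false">'
    '<a role="button" aria-checked="true">Unrelated toggle</a>'
)
print(radio_states(sample))  # [True, False]
```

In the Selenium script, driver.page_source could be passed to radio_states in place of the sample string.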

Hi @Benoni

Are you able to identify the element in UiExplorer?

Can you share a screenshot?

Ashwin S

I can identify the entire box as follows:


However, that element doesn’t (as far as I can see) include whether the selected box is Yes or No. Here’s the UiExplorer screenshot:

Hi @Benoni

When pressing F2 or F3, are you able to identify the element?

Ashwin S

I can identify the element by selecting a region (F3), but that selects the container div, which doesn’t include any Yes/No information either.

<webctrl id='container-1387' tag='DIV' />

This container id is dynamic, so I cannot use it.

I ended up figuring out how to select the element with anchors, parsed the instances of “aria-checked” from get attribute ("innerhtml"), and then restructured the data.
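The parsing step on the innerHTML string could look roughly like the sketch below. It is not the code actually used in this thread. It assumes that each radio input points to its box label through aria-labelledby (as in the example element above) and that the label is a label tag with the matching id; the names RadioLabelParser and selected_labels and the sample fragment are hypothetical.

```python
from html.parser import HTMLParser


class RadioLabelParser(HTMLParser):
    """Pair each radio input with the text of the label it references."""

    def __init__(self):
        super().__init__()
        self.radios = []     # (label id, is_checked) per radio input
        self.labels = {}     # label id -> label text
        self._open_id = None  # id of the <label> currently being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("type") == "radio":
            checked = attr_map.get("aria-checked") == "true"
            self.radios.append((attr_map.get("aria-labelledby"), checked))
        elif tag == "label" and "id" in attr_map:
            self._open_id = attr_map["id"]
            self.labels[self._open_id] = ""

    def handle_data(self, data):
        if self._open_id is not None:
            self.labels[self._open_id] += data

    def handle_endtag(self, tag):
        if tag == "label":
            self._open_id = None


def selected_labels(inner_html):
    """Return the label text of every radio whose aria-checked is true."""
    parser = RadioLabelParser()
    parser.feed(inner_html)
    return [
        parser.labels.get(label_id, "").strip()
        for label_id, checked in parser.radios
        if checked
    ]


# Hypothetical fragment standing in for the innerHTML of the anchored element.
fragment = (
    '<input type="radio" aria-checked="true" aria-labelledby="r1-boxLabelEl">'
    '<label id="r1-boxLabelEl">Yes</label>'
    '<input type="radio" aria-checked="false" aria-labelledby="r2-boxLabelEl">'
    '<label id="r2-boxLabelEl">No</label>'
)
print(selected_labels(fragment))  # ['Yes']
```

Reading the label text rather than only the true/false flag keeps the result meaningful even if the page contains other aria-checked elements, since only radio inputs are inspected.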
