Can't scrape data from website

Hi there,

I am trying to scrape information from a website that has an embedded table. Essentially, I need to click each entry and then scrape the additional information that appears.

I think the website may have some sort of protection that blocks selectors (the bot just selects the entire table when I target any single entry), and I also can’t right-click or drag to copy the information.

Has anyone experienced this before? Is there a way in which I can get around this? Thanks!

Hello dr,

Is the website publicly accessible? If so, can you share the URL? If not, can you share some screenshots so we can understand your problem?

Regards
Balram

Unfortunately it is not public and contains sensitive info so I cannot share.

The table is embedded into the website using Tableau; I think this may be what is preventing the selectors from working?

Is the site open in Chrome? Do you have the Chrome extension enabled?

Hi,
Try a different browser, such as Internet Explorer. If you are using Chrome, please check whether the extension is enabled in Studio.
Please refer to the screenshot below to check the Chrome extension in Studio:


Thanks!

If you are unable to use selectors, my suggestion is to use the CV activities; otherwise you could try Get OCR Text or something similar. Give that a try.
Veeraraj S

Possibly. Have you used the browser developer tools to check how the browser generates the elements for this table?

If you find that the elements are not rendered as a table, you are right that you will have to use the CV option to extract the data. In addition, you can check how the website retrieves its data; if it uses an API, you can connect to the API directly instead of scraping the table.
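As a rough illustration of the API route: if the Network tab in the developer tools shows the page loading its data as JSON, that response can be turned into rows without touching the rendered table. The sketch below is only an assumption about what such a response might look like (a JSON list of objects); the `rows_from_payload` helper, the sample payload, and the column names are all hypothetical, and a Tableau-backed page may well return something quite different.

```python
import json


def rows_from_payload(payload_text, columns):
    """Convert a JSON list of objects into a list of rows (one list per record).

    Missing keys are filled with an empty string so every row has the same width.
    """
    records = json.loads(payload_text)
    return [[record.get(col, "") for col in columns] for record in records]


# Hypothetical response body copied from the browser's Network tab.
sample = '[{"id": 1, "name": "Alpha"}, {"id": 2, "name": "Beta"}]'
rows = rows_from_payload(sample, ["id", "name"])
```

Whether this is usable depends on the site owner allowing that kind of access, so it is worth checking before relying on it.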

Regards
Balram

Try the Find Children activity.

Cheers @dr1992

The table is generated as a series of images arranged to give the illusion of a table, as I discovered from the dev tools.

I wasn’t able to use the CV options, because I would have to click specific rows that don’t exist as separate elements to get all the data. It is an API (I think), but we can’t connect to it as it isn’t our website.
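One possible fallback for rows that don’t exist as elements is to click by coordinates instead. The sketch below only works under strong assumptions that may not hold here: the table’s bounding box is known (for example, from the one selector that grabs the whole table), the number of visible rows is known, and all rows have the same height. The `row_click_points` helper and the numbers in it are hypothetical.

```python
def row_click_points(left, top, width, height, row_count):
    """Return one (x, y) click point at the centre of each equally sized row.

    left/top/width/height describe the table's bounding box in screen pixels.
    """
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    row_height = height / row_count
    x = left + width // 2
    return [(x, round(top + row_height * (i + 0.5))) for i in range(row_count)]


# Hypothetical bounding box: 400 x 300 px table at (100, 200) with 3 rows.
points = row_click_points(100, 200, 400, 300, 3)
```

Each point could then be passed to a click action with a position offset, followed by an OCR read of the detail panel that appears.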