Dynamic Web Scraping

Beere_Plays · December 8, 2022, 11:10am

Hello!

I’m currently working on a web scraping project and I’m getting desperate.

My problem: I have a long list of websites. For each one I have to check whether certain keywords are present and, if so, read the text next to those keywords and store it in a DataTable.

The difficulty is that I would have to build a separate data scraping for every page to get clean results. Is it possible for the selectors to find the relevant data points on their own?

Anil_G · December 8, 2022, 11:13am

Hi @Beere_Plays

Can you share that website, or show what it looks like? How much text do you need to read after or before the keyword?

If it’s tabular, we can extract the whole table and then use Filter Data Table.
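The idea for the tabular case is to grab every row first and filter afterwards. In UiPath that would be an extraction followed by Filter Data Table; the sketch below shows only the filter step in plain Python, assuming the table has already been extracted as a list of rows. The `filter_rows` helper and the sample rows are hypothetical.

```python
def filter_rows(rows: list[dict[str, str]], keywords: list[str]) -> list[dict[str, str]]:
    """Keep only rows in which any cell contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        row
        for row in rows
        if any(k in str(value).lower() for value in row.values() for k in lowered)
    ]


# Hypothetical rows standing in for an already-extracted table
rows = [
    {"Title": "Price list 2022", "Date": "2022-12-05"},
    {"Title": "Annual report", "Date": "2022-11-01"},
]
matches = filter_rows(rows, ["price"])
```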

If it’s plain text, we need to look at how we can get the data using regex.
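For plain text, a regular expression can capture the text that follows each keyword. A minimal sketch, assuming the page text is already available as a string and that "the text beside the keyword" means the rest of the same line (an assumption; real pages may need a different window). The `keyword_context` helper and the sample text are hypothetical.

```python
import re


def keyword_context(text: str, keywords: list[str]) -> list[tuple[str, str]]:
    """Return (keyword, rest of line after the keyword) pairs, case-insensitive."""
    results = []
    for kw in keywords:
        # re.escape makes keywords containing '.', '(' etc. match literally;
        # an optional ':' or '-' separator is skipped; '.*' stops at the line end
        pattern = re.compile(re.escape(kw) + r"\s*[:\-]?\s*(.*)", re.IGNORECASE)
        for match in pattern.finditer(text):
            results.append((kw, match.group(1).strip()))
    return results


page_text = "Contact: Jane Doe\nDeadline: 15.12.2022\nPrice - 400 EUR"
rows = keyword_context(page_text, ["deadline", "price"])
```

Each tuple could then become one row of the DataTable.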

cheers

Beere_Plays · December 8, 2022, 11:30am

Hello!

The data must be filtered based on keywords and then analyzed. The pages all look different, and I can’t build each one individually with the web scraping activity, as there are well over 3000.

So I need a way to dynamically filter out data using keywords.

Anil_G · December 8, 2022, 11:36am

Hi @Beere_Plays

Can you show at least one sample of what the data looks like? Maybe from there we can suggest something. Right now everything is up in the air and there’s nothing we can give you.

We should see at least a pattern or a sample of how one of them looks.

We can use Get Text to identify whether the data is there or not, and maybe we can use the whole Chrome window as the selector so it finds the value on any website opened in Chrome.
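Once Get Text has returned the visible text of the page, checking for the keywords is a plain string search. A minimal sketch in Python rather than UiPath, assuming the page text is already in a string; the `keywords_present` helper and the sample text are hypothetical.

```python
def keywords_present(page_text: str, keywords: list[str]) -> dict[str, bool]:
    """Map each keyword to whether it occurs in the page text (case-insensitive)."""
    # Collapse runs of whitespace so a keyword broken across lines still matches
    normalized = " ".join(page_text.split()).lower()
    return {kw: kw.lower() in normalized for kw in keywords}


result = keywords_present("Opening\n   hours: 9-5", ["opening hours", "price"])
```

Only pages where at least one value is `True` would need the more expensive text extraction.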

cheers

Beere_Plays · December 8, 2022, 11:41am

Yes, here you go:

Anil_G · December 8, 2022, 11:43am

Hi @Beere_Plays

So you want to search for a keyword and, if it is present, get the whole text beside it?

cheers

Beere_Plays · December 8, 2022, 11:43am

But it could also look like this:

Beere_Plays · December 8, 2022, 11:44am

Yes! That’s what I need, but I also have to check the date, because the data can’t be older than a week.
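The one-week rule can be checked once a date has been read from the page. A minimal sketch, assuming "not older than a week" means at most 7 days before the run date and treating future dates as invalid (both are assumptions); `is_recent` is a hypothetical helper.

```python
from datetime import date, timedelta


def is_recent(d: date, today: date, max_age_days: int = 7) -> bool:
    """True if d lies within max_age_days before today; future dates are rejected."""
    return today - timedelta(days=max_age_days) <= d <= today
```

For example, with `today=date(2022, 12, 8)`, December 1 still counts as recent, while November 30 does not.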

Anil_G · December 8, 2022, 12:06pm

Hi @Beere_Plays

So all the data is random, all the websites are different, and you obviously don’t want to build 3000 scrapers.

One thing you can try is to create a process that gets all the children of the body element using the Find Children activity.

Then loop over each child, get its text, and check whether the text you need is present in it. If it is, you have your text.
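Outside UiPath, the same loop can be sketched on raw HTML. This is not the Find Children activity itself but an analogous approach using Python's standard-library `html.parser`, assuming the page HTML is available as a string and reasonably well-formed (unclosed tags such as a bare `<p>` will throw off the depth counting). `BodyChildText` and `find_keyword_blocks` are hypothetical names.

```python
from html.parser import HTMLParser


class BodyChildText(HTMLParser):
    """Collect the text of each direct child element of <body>."""

    # Elements that never have a closing tag, so they must not change the depth
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = None   # nesting depth below <body>; None outside <body>
        self.blocks = []    # one list of text pieces per direct child of <body>

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.depth = 0
        elif self.depth is not None and tag not in self.VOID:
            if self.depth == 0:
                self.blocks.append([])  # a new direct child starts here
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.depth = None
        elif self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:  # only text that sits inside some child of <body>
            self.blocks[-1].append(data)

    def texts(self):
        # Join the pieces of each child and collapse whitespace
        return [" ".join(" ".join(b).split()) for b in self.blocks]


def find_keyword_blocks(html: str, keyword: str) -> list[str]:
    """Return the text of every body child whose text contains the keyword."""
    parser = BodyChildText()
    parser.feed(html)
    parser.close()
    kw = keyword.lower()
    return [t for t in parser.texts() if kw in t.lower()]


html = ("<html><body><div><h2>News</h2><p>Price: 400 EUR</p></div>"
        "<div><p>Other stuff</p></div></body></html>")
hits = find_keyword_blocks(html, "price")
```

Joining the pieces with a space keeps headings and paragraphs apart, at the cost of inserting a space where an inline tag splits a word.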

Getting the date, however, might not be straightforward unless you identify some pattern, such as the date always sitting in a specific tag on every website. That is not very realistic, but if you can find a pattern among the children you get, you can build a good selector, find its descendants, and search them for the date.
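If a common date format does turn up, a regular expression can pick dates out of the text of the descendants. A minimal sketch, assuming dates are written as `dd.mm.yyyy` or `yyyy-mm-dd` (other formats, such as "8 December 2022", would need extra patterns); `find_dates` is a hypothetical helper.

```python
import re
from datetime import date, datetime

# Assumed date formats; each regex is paired with the strptime format that parses it
DATE_PATTERNS = [
    (re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"), "%d.%m.%Y"),  # e.g. 08.12.2022
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),         # e.g. 2022-12-08
]


def find_dates(text: str) -> list[date]:
    """Return all valid dates found in the text, sorted oldest first."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                found.append(datetime.strptime(match.group(), fmt).date())
            except ValueError:
                pass  # looks like a date but is not valid, e.g. 31.02.2022
    return sorted(found)


dates = find_dates("Posted on 08.12.2022, updated 2022-12-09")
```

The newest date in the list could then be compared against the one-week limit.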

The date part is really tough, I believe, but the text can be worked out like this.

cheers

Beere_Plays · December 8, 2022, 12:49pm

I will try this, but I’ve already tried so much.

I can’t find anything on the World Wide Web that matches my requirements either.

But thank you a lot for your help!
