Dynamic Web Scraping

Hello!

I’m currently stuck on a web scraping project and I’m getting desperate.

My problem: I have a long list of websites. For each one I have to check whether certain keywords are present and, if so, read the text next to those keywords and store it in a DataTable.

The trouble is that, to get clean results, I would have to configure data scraping for every page individually. Is it possible for the selectors to pick up the respective data points on their own?

Hi @Beere_Plays

Can you share the website or show how it looks? How much text do you need to read before or after the keyword?

If it’s tabular, then we can get the whole data and filter the table.

If it’s plain text, then we need to look at how we can get the data using regex.
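For the plain-text case, here is a rough sketch in Python of what the regex idea could look like: find each keyword hit and keep some characters on both sides. The function name `keyword_context` and the `window` size are made up for illustration, and the sample text is invented; how much context you really need depends on the pages.

```python
import re

def keyword_context(text, keyword, window=80):
    """Return a snippet of up to `window` characters on each side of every keyword hit."""
    # re.escape keeps characters like "." or "(" in the keyword from being read as regex syntax.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(match.start() - window, 0)
        end = min(match.end() + window, len(text))
        snippets.append(text[start:end].strip())
    return snippets

# Invented sample text, only to show the call.
page_text = "Our office moved. New opening hours: Mon-Fri 9:00-17:00. Contact us by mail."
for snippet in keyword_context(page_text, "opening hours", window=20):
    print(snippet)
```

An empty list means the keyword is not on the page, so the same call doubles as the "is it present?" check.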

cheers

Hello!

The data has to be filtered by keyword and then analyzed. The pages all look different, and I can’t configure each one individually with the web scraping activity, as there are well over 3000.

So I need a way to dynamically filter out data using keywords.

Hi @Beere_Plays

Can you show at least one sample of how the data looks? Maybe from there we can suggest something. Right now everything is up in the air and there is nothing concrete we can give you.

We should see at least a pattern or a sample of how one of them looks.

We can use Get Text to check whether the data is there or not, and maybe we can give the whole Chrome window as the selector to identify the value on any website opened in Chrome.

cheers

Yes here you go:


Hi @Beere_Plays

So you want to search for a keyword and, if it is present, get the whole text beside it?

cheers

But it also could look like this:


Yes! That’s what I need, but I also have to check the date, because the data can’t be older than a week.

Hi @Beere_Plays

So all the data is random, all the websites are random, and you obviously don’t want to build 3000 of them individually.

One thing you can try is to create a process that gets all the children of the body element using the Find Children activity.

Then loop over each child, get its text, and check whether the text you need is present in it. If it is present, you have your text.
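To make the idea concrete outside of UiPath, here is a minimal Python sketch using only the standard library's `html.parser`. It is not the Find Children activity, just the same approach: walk the elements, collect each one's text, and keep the innermost element whose text contains the keyword. The class and function names, the choice of which tags to treat as inline, and the sample HTML are all assumptions for illustration.

```python
from html.parser import HTMLParser

# Tags that never wrap a block of text of their own: void tags plus common inline tags.
IGNORED_TAGS = {"br", "hr", "img", "input", "link", "meta",
                "a", "b", "i", "em", "strong", "span", "u", "small"}
SKIP_TEXT_IN = {"script", "style"}

class KeywordFinder(HTMLParser):
    def __init__(self, keyword):
        super().__init__()
        self.needle = keyword.lower()
        self.open_elements = []  # dicts: tag, parts, child_hit
        self.matches = []        # text of innermost elements containing the keyword

    def handle_starttag(self, tag, attrs):
        if tag not in IGNORED_TAGS:
            self.open_elements.append({"tag": tag, "parts": [], "child_hit": False})

    def handle_data(self, data):
        if any(el["tag"] in SKIP_TEXT_IN for el in self.open_elements):
            return
        # Every open ancestor receives the text, so each element ends up with its full text.
        for el in self.open_elements:
            el["parts"].append(data)

    def handle_endtag(self, tag):
        if all(el["tag"] != tag for el in self.open_elements):
            return  # ignored tag or stray closing tag
        while self.open_elements:
            el = self.open_elements.pop()
            self.finish_element(el)
            if el["tag"] == tag:
                break

    def finish_element(self, el):
        text = " ".join("".join(el["parts"]).split())
        if self.needle in text.lower():
            if not el["child_hit"]:
                self.matches.append(text)
            if self.open_elements:
                self.open_elements[-1]["child_hit"] = True

def find_keyword_text(html, keyword):
    finder = KeywordFinder(keyword)
    finder.feed(html)
    finder.close()
    while finder.open_elements:  # flush elements that were never closed
        finder.finish_element(finder.open_elements.pop())
    return finder.matches

sample = "<body><div><p>Intro text</p><p><b>Price</b>: 10 EUR</p></div></body>"
print(find_keyword_text(sample, "price"))
```

Inline tags such as `<b>` are treated as transparent here so that a bolded keyword still returns the surrounding paragraph instead of just the keyword itself. Real pages with broken markup or content loaded by JavaScript will need more than this.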

But yes, getting the date might not be straightforward, unless you can identify some pattern, such as the date sitting in a specific tag on all websites. That is not very realistic, but if you can at least find a pattern among the children that gives you a good selector, you can find its descendants and search them for the date.

The date part is really tough, I believe, but the text can be worked out like this.
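If no common tag turns up, one fallback is to search the text you already extracted for anything that looks like a date and compare it against today. The Python sketch below assumes only three numeric formats (`2024-03-12`, `12.03.2024`, `12/03/2024` read as day/month/year); the format list, the function names, and the sample strings are assumptions, and real sites will use more variants, including month names.

```python
import re
from datetime import date, datetime, timedelta

# Assumed date formats; extend this list for the sites you actually see.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    (re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"), "%d.%m.%Y"),
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "%d/%m/%Y"),  # assumes day first, not month first
]

def find_dates(text):
    """Return every substring that parses as a date in one of the assumed formats."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for raw in pattern.findall(text):
            try:
                found.append(datetime.strptime(raw, fmt).date())
            except ValueError:
                pass  # looked like a date but is not valid, e.g. 99.99.2024
    return found

def is_recent(text, today=None, max_age_days=7):
    """True if the text contains at least one date within the last `max_age_days` days."""
    today = today or date.today()
    oldest_allowed = today - timedelta(days=max_age_days)
    return any(oldest_allowed <= d <= today for d in find_dates(text))

# Invented example: a fixed "today" keeps the result reproducible.
print(is_recent("Posted on 12.03.2024: price update", today=date(2024, 3, 15)))
```

Running `is_recent` on the same element text that matched the keyword, or on its parent's text, keeps the date tied to the right block rather than to some unrelated date elsewhere on the page.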

cheers

I will try this, but I’ve already tried so much.

I can’t even find anything on the World Wide Web that matches my conditions.

But thank you a lot for your help!
