I’m currently working on a web scraping project and I’m getting desperate.
My problem: I have a long list of websites. For each one I have to check whether certain keywords are present and, if so, read the text next to those keywords and store it in a DataTable.
The trouble is that, to get clean results with data scraping, I would have to set up every page individually. Is it possible for the selectors to find the respective data points by themselves?
The data must be filtered by keyword and then analyzed. The pages all look different, and I can’t configure each one individually with the web scraping activity, as there are well over 3000.
So I need a way to dynamically filter out data using keywords.
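As a rough illustration of keyword-driven extraction (this is not a UiPath workflow), here is a minimal Python sketch. It assumes the page's visible text has already been retrieved as a plain string. The function name `find_keyword_snippets` and the `window` parameter (number of characters kept on each side of a match) are hypothetical. Each returned tuple could become one row in the DataTable.

```python
import re


def find_keyword_snippets(text, keywords, window=80):
    """Return (keyword, snippet) rows for every keyword occurrence in text."""
    rows = []
    for kw in keywords:
        # Case-insensitive match of the keyword as a literal string
        for m in re.finditer(re.escape(kw), text, flags=re.IGNORECASE):
            start = max(0, m.start() - window)
            end = min(len(text), m.end() + window)
            # Collapse runs of whitespace so the snippet reads cleanly
            snippet = " ".join(text[start:end].split())
            rows.append((kw, snippet))
    return rows


sample_text = "Company profile. Revenue: 4.2 million EUR in 2022. Employees: 35."
for row in find_keyword_snippets(sample_text, ["revenue", "employees"], window=20):
    print(row)
```

A fixed character window is the crudest possible notion of "text next to the keyword"; on real pages you would probably tune it per keyword or cut the snippet at the next line break.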
Can you show at least one sample of what the data looks like? Maybe from there we can suggest something. Right now everything is up in the air and there is nothing concrete we can give you.
We should at least see a pattern or a sample of what one of the pages looks like.
We can use Get Text to check whether the data is there or not, and maybe we can use the whole Chrome window as the selector, so the value can be found on any website opened in Chrome.
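The idea of reading all the text on a page at once and then checking for the keywords can be sketched outside UiPath like this, assuming you have the raw HTML of the page (for example, downloaded separately). It uses only Python's standard `html.parser` module. `page_text` and `contains_any` are hypothetical helper names, and skipping `<script>`/`<style>` content is an assumption about what counts as visible text.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the page's visible text as one space-separated string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def contains_any(html, keywords):
    """Return the keywords (case-insensitive) that occur in the page's text."""
    text = page_text(html).lower()
    return [kw for kw in keywords if kw.lower() in text]
```

This only answers "is the data there or not"; locating the value itself still needs a narrower search.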
Since all the data and all the websites are random, you obviously don’t want to set up 3000 pages one by one.
One thing you can try is to build a process that gets all the children of the body element using the Find Children activity,
then loops over each child, gets its text and checks whether the text you need is present in it. If it is, you have your text.
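Here is a rough Python analogue of the Find Children plus loop approach, working on raw HTML instead of the live browser DOM that UiPath uses. The class and function names are hypothetical. The sketch assumes reasonably well-formed markup: tags closed implicitly (for example a `<p>` without `</p>`) will throw off the depth counting, and text sitting directly inside `<body>` is ignored. A full HTML parser library would cope better with messy pages.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BodyChildrenText(HTMLParser):
    """Collects the text of each direct child element of <body>."""

    def __init__(self):
        super().__init__()
        self.children = []    # one text string per direct child of <body>
        self._in_body = False
        self._depth = 0       # nesting depth below <body>
        self._skip = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif self._in_body and tag not in VOID_TAGS:
            if tag in ("script", "style"):
                self._skip += 1
            if self._depth == 0:
                self.children.append("")  # a new direct child starts here
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif self._in_body and tag not in VOID_TAGS:
            if tag in ("script", "style") and self._skip:
                self._skip -= 1
            if self._depth:
                self._depth -= 1

    def handle_data(self, data):
        # Append text to the current child; text directly under <body> is ignored.
        if self._in_body and self._depth and not self._skip and data.strip():
            self.children[-1] = (self.children[-1] + " " + data.strip()).strip()


def matching_children(html, keywords):
    """Return (found_keywords, child_text) for each body child containing a keyword."""
    parser = BodyChildrenText()
    parser.feed(html)
    parser.close()
    hits = []
    for text in parser.children:
        found = [kw for kw in keywords if kw.lower() in text.lower()]
        if found:
            hits.append((found, text))
    return hits


sample = ("<html><body><div><h2>About</h2><p>We build boats.</p></div>"
          "<section><p>Revenue: 4.2m</p><br><p>Founded 2010</p></section></body></html>")
for found, text in matching_children(sample, ["revenue"]):
    print(found, text)
```

Each hit gives you the whole text of one top-level block, which you can then narrow down further, for example with a regular expression.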
Getting the date, though, might not be as straightforward, unless you can identify some pattern, such as the date always sitting in a specific tag on every website, which is not very realistic. But if you can at least find a pattern among the children, you have a good selector; then find its descendants and search them for the date.
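Once you have the text of a promising element and its descendants, one hedged way to search for a date is to try a handful of regular expressions and keep only the matches that parse as real dates. The format list below is purely an assumption (ISO, day.month.year, day/month/year and English month names), and `find_dates` is a hypothetical helper. Ambiguous strings such as 05/06/2023 cannot be resolved without knowing each site's convention; this sketch always reads them day first.

```python
import re
from datetime import datetime

# Hypothetical set of date formats; extend it for the sites you actually see.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),          # 2023-06-01
    (re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"), "%d.%m.%Y"),     # 17.05.2023
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%d/%m/%Y"),       # 17/05/2023, day first
    (re.compile(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b"), "%d %B %Y"),   # 3 March 2024
]


def find_dates(text):
    """Return (matched_string, date) pairs for every parseable date in text."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for m in pattern.finditer(text):
            try:
                found.append((m.group(), datetime.strptime(m.group(), fmt).date()))
            except ValueError:
                pass  # looks like a date but is not a valid one (e.g. 31.02.2023)
    return found


print(find_dates("Posted 17.05.2023, updated 2023-06-01 and 3 March 2024"))
```

Matching `%B` month names relies on Python's default C locale (English names); pages in other languages would need their own patterns.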
The date part is really tough, I believe, but the text can be worked out like this.