I’m currently stuck on a web scraping project and I’m getting desperate.
My problem: I have a long list of websites. For each one, I have to check whether certain keywords are present and, if so, read the text that appears next to those keywords and store it in a DataTable.
Right now I would have to set up data scraping for every page individually to get clean results. Is it possible for the selectors to pick up the respective data points by themselves?
The data has to be filtered by keyword and then analyzed. The pages all look different, and I can’t build each one individually with the web scraping activity, as there are well over 3000.
So I need a way to dynamically filter out data using keywords.
So all the data is random, all the websites are random, and you obviously don’t want to set up 3000 of them by hand.
One thing you can try is to create a process that gets all the children of the body element using the Find Children activity,
then loops over each child, gets its text, and checks whether the text you need is present. If it is present, you have your text.
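If you want to try the idea outside of a workflow first, here is a rough Python sketch of the same loop. It is not the Find Children activity itself, just an illustration using the standard library's html.parser; the names BodyChildText and find_keyword_hits are made up for this example, and it assumes reasonably well-formed HTML where the relevant text sits inside child elements of body.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BodyChildText(HTMLParser):
    """Hypothetical helper: collects the text of each direct child of <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.depth = 0        # nesting depth below <body>
        self.skip = 0         # > 0 while inside <script> or <style>
        self.buffer = []      # text pieces of the child currently being read
        self.children = []    # finished text, one entry per direct child

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            return
        if not self.in_body or tag in VOID_TAGS:
            return
        if tag in ("script", "style"):
            self.skip += 1
        self.depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
            return
        if not self.in_body or tag in VOID_TAGS or self.depth == 0:
            return
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        self.depth -= 1
        if self.depth == 0:
            # A direct child of <body> just closed: store its text.
            self._flush()

    def handle_data(self, data):
        if self.in_body and self.depth > 0 and not self.skip:
            self.buffer.append(data)

    def _flush(self):
        # Join the pieces and collapse runs of whitespace.
        text = " ".join(" ".join(self.buffer).split())
        if text:
            self.children.append(text)
        self.buffer = []


def find_keyword_hits(html, keywords):
    """Return one row per (keyword, child text) match, case-insensitive."""
    parser = BodyChildText()
    parser.feed(html)
    parser.close()
    rows = []
    for text in parser.children:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                rows.append({"keyword": keyword, "text": text})
    return rows
```

Each returned row maps naturally onto a DataTable row with a keyword column and a text column. A top-level child can be large (a whole page wrapper, for example), so on real sites you would probably repeat the same check one level deeper to narrow the text down.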
But yes, getting the date might not be straightforward, unless you identify some pattern, for example the date sitting in a specific tag on all websites. That is not very realistic, but if you can at least find a pattern among the children, you get a good selector and can then find its descendants and search them for the date.
The date part is really tough, I believe, but the text can be worked out like this.
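For the date, one fallback worth trying is to search the text of a matched element for date-like strings with a regular expression. This is a different technique from the tag-based pattern mentioned above, and only a sketch: the function name find_dates and the three patterns are assumptions for illustration, and they cover only a few common formats with English month names.

```python
import re

# Assumed formats only; real sites will need more patterns (and other languages).
DATE_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",                # 2023-04-17
    r"\b\d{1,2}[./]\d{1,2}[./]\d{2,4}\b",    # 17.04.2023 or 4/17/23
    r"\b\d{1,2}\.? (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{4}\b",  # 17 April 2023
]
DATE_RE = re.compile("|".join(DATE_PATTERNS), re.IGNORECASE)


def find_dates(text):
    """Return every date-like substring in text, in order of appearance."""
    return [match.group(0) for match in DATE_RE.finditer(text)]
```

You would call find_dates on the text of each element that matched a keyword. It only finds strings that look like dates and does not validate them, so something like 99.99.2023 would also be returned.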