Web Scraping loops several times on last page

datascraping
web

#1

Hi everybody,
I'm trying to scrape an intranet web page. The robot scrapes all the pages, but when it reaches the last one it enters a finite loop: it scrapes the records of the last page 21 times.
MaxNumberOfResult is set to 0 in the ExtractData activity.
The NextLinkSelector looks like this:

"<webctrl aaname='&gt;&gt; Next' parentid='elencoContattiIds' tag='A' />"

Does anyone have an idea how to stop the scraping once the last page has been reached?
Thanks


#2

Hi,

The problem you describe sounds strange, but there is a workaround.
If the bottom of the page shows how many pages there are, use the Get Text activity to read the maximum number of pages you have to iterate over. With that number you can build a loop that goes through the pages one by one (clicking the Next button), runs a data scraping on each page, and merges each new DT into the previous one.
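
Just to make the idea concrete, here is a rough sketch of that loop in Python rather than as a UiPath workflow. Everything in it is an assumption for illustration: `scrape_all_pages` and `fake_scrape_page` are hypothetical names, and the fake function only stands in for the real per-page extraction and the click on Next.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def scrape_all_pages(page_count: int,
                     scrape_page: Callable[[int], List[Row]]) -> List[Row]:
    """Visit pages 1..page_count exactly once and merge their rows in order."""
    merged: List[Row] = []
    for page_number in range(1, page_count + 1):
        # One extraction per page; the loop ends after the last page,
        # so the last page cannot be scraped more than once.
        merged.extend(scrape_page(page_number))
    return merged


def fake_scrape_page(page_number: int) -> List[Row]:
    # Hypothetical stand-in for the real extraction: two made-up rows per page.
    return [
        {"page": str(page_number), "contact": f"contact-{page_number}-{i}"}
        for i in (1, 2)
    ]


rows = scrape_all_pages(3, fake_scrape_page)
print(len(rows))
```

The point is that the page count read from the page drives the loop, not the presence of the Next link.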

Thanks,
Ninett


#3

Hi Ninett,
Thanks for the suggested workaround, but I found another one. Once the scraping has finished its finite number of repetitions, I remove the duplicate rows from the DT. It works fine, but I still lose time during the scraping operation.
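
For anyone who wants to see the clean-up step spelled out, this is roughly what it does, sketched in Python instead of the actual workflow. The name `drop_duplicate_rows` and the sample rows are made up for the example, and it assumes two rows are duplicates only when every column matches.

```python
from typing import Dict, List

Row = Dict[str, str]


def drop_duplicate_rows(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row and preserve the original order."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        # A row is identified by all of its column/value pairs.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up data: the last page (Carla, Dan) was scraped three times.
last_page = [{"name": "Carla"}, {"name": "Dan"}]
scraped = [{"name": "Anna"}, {"name": "Bob"}] + last_page * 3

cleaned = drop_duplicate_rows(scraped)
print([row["name"] for row in cleaned])
```

This only repairs the result afterwards; the repeated extractions still run, which is where the time goes.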
Still searching for a solution
Regards
Mugur