Data Scraping on Web never ends


#1

Hi All,

I have been trying to scrape some data from the web page here.
It works fine for the first 24 records, but once I specify the “Next” link it goes through all the pages and never ends. It does not stop after fetching, e.g., 100 records. A similar topic (here) is also on the forum, but I couldn’t fix my issue with that solution. Any help would be highly appreciated.
TestWebScrape.xaml (14.0 KB)


#2

Please change the default settings in the options panel of the Data Scraping activity.

https://files.readme.io/f7ff964-image_174.png

Please change the value to 0 as suggested in the screenshot.

Regards,
Karthik


#3

Hi Karthik,

I had tried that as well, but no luck. I believe it’s all about the selectors. It would be great if you could take a look at my NextLinkSelector:

image

Thanks,
Aby


#4

Okay. Let me look into your workflow and I will update you on this.


#5

Hi Karthik,

Just wondering whether you have had any time to look into this issue?

Thanks,
Aby


#6

Hi @abyvarghes

You have to tune your Data Scraping activity, as it doesn’t scrape all items. I let it run for 1000 records and then closed IE (which results in the scraping finishing and saving to file).

It only scraped 24 records out of the 1000 it saw. That is way below the cap of 100, which is the reason it keeps running.
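
To illustrate why a cap that is never reached keeps the loop going, here is a minimal Python sketch. It is not UiPath's implementation; `scrape_with_cap`, the `strict`/`loose` matchers, and the 42 pages of 24 items are all hypothetical stand-ins, and treating a cap of 0 as "no limit" is an assumption of this sketch.

```python
def scrape_with_cap(pages, matches, max_results):
    """Walk the pages in order (like clicking "Next") and collect matching items.

    Stops early only when the number of collected rows reaches max_results.
    Assumption of this sketch: max_results == 0 means "no cap".
    """
    rows = []
    pages_visited = 0
    for page in pages:
        pages_visited += 1
        rows.extend(item for item in page if matches(item))
        if max_results and len(rows) >= max_results:
            return rows[:max_results], pages_visited
    return rows, pages_visited


# Hypothetical site: 42 pages with 24 items each (about 1000 items).
pages = [[{"page": p, "index": i} for i in range(24)] for p in range(42)]


def strict(item):
    # Stand-in for an over-specific selector: only first-page items match.
    return item["page"] == 0


def loose(item):
    # Stand-in for a selector that matches every item.
    return True


strict_rows, strict_pages = scrape_with_cap(pages, strict, 100)
loose_rows, loose_pages = scrape_with_cap(pages, loose, 100)

print(len(strict_rows), strict_pages)  # 24 42
print(len(loose_rows), loose_pages)    # 100 5
```

With the strict matcher the count stays at 24, so the cap of 100 is never hit and every page gets visited. With the loose matcher the loop stops on the fifth page.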


#7

Hi @loginerror,

Thanks for the response.

This is what I have tried after your reply:

- Recreated the scraping part
- Put a Delay activity on the web page loading and inside Data Scraping to make sure the web page loads completely

Still getting the same issue.

I also sent the same workflow to a friend of mine who works with UiPath. He ran it on his system (licensed edition) and it works perfectly.

So could it be something to do with the Community Edition?

Thanks,
Aby


#8

First of all, there is no functional difference between Community Edition and Enterprise Edition. So that is strange.

However, it did not initially work as it should for me either.

Here is what I did to fix it:
I removed the first few lines from the ExtractMetadata XML, see below:

Old:


New:

This seems to have fixed the issue. You do still get some duplicates in the results, however (for different colors of the models).

Could you try this attached project and see if it works for you?
ScrapingWbModified.zip (1.4 MB)


#9

@loginerror

Thanks a lot. That fixed the problem.

I don’t have much knowledge of XML. May I ask what those 2 lines are that we removed from each block? What do they do to the output / execution?

Also, can you suggest any learning documentation for XML, if that would help?

Thanks,
Aby


#10

The knowledge surrounding the XML code within the tool is a bit anecdotal and comes from “forum experience” and playing around with the tool. It definitely requires some more input, and I believe our documentation team is aware of it and will document it at some point :)

Basically, the XML in that field is a literal “path” to your element on the page. If it happens to be too specific, it will only find the values that match the path 100%. In this case, it was only catching a few records out of the thousands it was exposed to.

I started removing 1 line at a time and rerunning the project to see if it works. By removing 2 lines I must have removed enough of the “too specific” path to allow it to catch all the needed values.
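
The same trial-and-error idea can be sketched in Python. This is only an analogy, not the ExtractMetadata format or UiPath's matching logic: the path is modelled as a hypothetical list of attribute dictionaries, and `matches`, `loosen_until`, and the sample `chains` are made up for illustration.

```python
def matches(path, chain):
    """True when the last len(path) nodes of an element's ancestor chain
    carry every attribute that the corresponding path step requires."""
    if len(path) > len(chain):
        return False
    tail = chain[len(chain) - len(path):]
    return all(
        all(node.get(key) == value for key, value in step.items())
        for step, node in zip(path, tail)
    )


def loosen_until(path, chains, wanted):
    """Drop the leading step one at a time until at least `wanted` elements match."""
    path = list(path)
    while path and sum(matches(path, c) for c in chains) < wanted:
        path.pop(0)  # remove the outermost, most specific step and retry
    return path


# Hypothetical page: five product titles, each in its own numbered row.
chains = [
    [
        {"tag": "div", "class": "row", "idx": str(i)},
        {"tag": "div", "class": "card"},
        {"tag": "span", "class": "title"},
    ]
    for i in range(1, 6)
]

# Over-specific path: pinned to row 1, so only one title matches.
strict_path = [
    {"tag": "div", "idx": "1"},
    {"tag": "div", "class": "card"},
    {"tag": "span", "class": "title"},
]

strict_count = sum(matches(strict_path, c) for c in chains)
loosened = loosen_until(strict_path, chains, wanted=len(chains))
loosened_count = sum(matches(loosened, c) for c in chains)

print(strict_count, len(loosened), loosened_count)  # 1 2 5
```

Dropping the step that pins the path to one row is enough for the remaining, more general path to match every title.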


#11

@loginerror Thanks a lot again.

