Data Scraping returning

#1

Hi everyone,

I have an issue while using the Data Scraping tool to extract data tables from a website.
I have several tables to extract, but even with the smallest one (190 entries in total), the ExtractData activity returns a null DataTable (NullReferenceException) whenever I try to get the whole table. Extracting only a limited number of entries, e.g. 150, works perfectly.

Does anyone know this issue and can help me out?

Thanks in advance!


#2

Hi,

Please try setting the “ContinueOnError” property of “ExtractData” to “True”.


#3

Hi umesh_desh,

Thanks for the recommendation. However, it is already set to “True”.
There seems to be something happening after processing the last element of the table, as UiPath takes a long time to get out of the activity after this.


#4

Hello,

Can you share the link to the web page and also maybe your workflow? This way we can find the potential issue more quickly.


#5

Please check the “MaxNumberOfResults” property of “ExtractData”.


#6

Also make sure the “Output” property of “ExtractData” is set to “ExtractDataTable”; please do not use any other variable name. Try that once.


#7

Thank you for your replies!

Unfortunately, I cannot share any link to the website as it is sensitive information.
I already tried all of your suggestions and set the MaxNumberOfResults option to 0, but the DataTable is empty at the end of the process.
I am not sure I understand the sentence "Also make sure “Output” property of “ExtractData” is “ExtractDataTable”", though. Do you mean I should make sure that the “DataTable” variable of type “InOutArgument” in the Output property is retrieved?

Please find attached a sample of my workflow. I removed all the sensitive information, so it will not run as is, but the ExtractData activity is the one that I used. When I hard-code MaxNumberOfResults to the exact size of the table, everything works fine, but as soon as I set it to 0 (for dynamic tables), it stops working.

ExtractDataTable.xaml (8.0 KB)

Maybe the issue is due to the way the table is coded on the website?

Thanks!


#9

Hi,

I checked the xaml file you have attached. Please replace “Gebouwen” with “ExtractDataTable” and try once.

Regards,

Umesh


#10

Hi,

I have tried with the variable name ExtractDataTable and, as expected, it doesn’t change anything. Any other ideas?


#11

The only suggestion that comes to mind is to check the selector of the Extract Structured Data activity and modify it by adding or removing attributes and using wildcards. You can also try to make sure that when doing Data Scraping you select the first and last elements in the table (not the first and second). It would also be worth setting the MaxNumberOfResults property to a comfortably large number like 10000. I can only guess unfortunately.


#12

Hi,

Thank you for your suggestions. I actually tried to manipulate the selectors but it didn’t work.
My Data Scraping was good as well, I tested it several times. With regard to the MaxNumberOfResults property, I did try to put it at a large number but then it fails when encountering the end of the datatable on the website.

I found a workaround, although it does not resolve the issue directly, which is really annoying. Basically, I do not use 0 as the value for MaxNumberOfResults.
There is a place on the webpage where the total number of entries is displayed. The selector for that element is not very reliable (it fails to retrieve the value about half the time), but I get the number anyway by putting the read inside a Try Catch and looping until it succeeds.

Then I use this dynamic number as the MaxNumberOfResults and it works.
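
For readers who want the retry idea spelled out, here is a minimal sketch in Python rather than as a UiPath workflow. It is not the actual workflow from this thread: `read_with_retry` and `FlakyCounter` are hypothetical names, the flaky selector is simulated, and the cap on attempts is an added assumption so the loop cannot spin forever.

```python
import time
from typing import Callable, Optional


def read_with_retry(read_value: Callable[[], int],
                    max_attempts: int = 20,
                    delay_seconds: float = 0.5) -> int:
    """Call read_value until it succeeds or the attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return read_value()
        except Exception as error:  # the unreliable read failed; wait and retry
            last_error = error
            time.sleep(delay_seconds)
    raise RuntimeError(f"count not readable after {max_attempts} attempts") from last_error


class FlakyCounter:
    """Hypothetical stand-in for the on-page counter: fails a few times, then answers."""

    def __init__(self, failures_before_success: int, total: int) -> None:
        self.remaining_failures = failures_before_success
        self.total = total

    def read(self) -> int:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise LookupError("selector did not match")
        return self.total


counter = FlakyCounter(failures_before_success=2, total=190)
# The value read here plays the role of MaxNumberOfResults.
max_number_of_results = read_with_retry(counter.read, delay_seconds=0.0)
print(max_number_of_results)  # 190
```

In a workflow, the equivalent would be a loop around a Try Catch that exits once the count has been read, with the result passed to MaxNumberOfResults.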

If any of you has a clue about the underlying issue with scraping my table, i.e. why it fails when trying to read past the last element, I would be glad to hear it!

Thank you all again for your help!


#13

Hi @micksme, try setting the default value of the ExtractDataTable variable to “New System.Data.DataTable”. I ran into similar issues before and this seems to fix it. Thanks!


#14

I ran into the same issue: scraping only worked when the exact number of items was specified in the MaxNumberOfResults property. If I create the data table up front (I used Build Data Table), it also works without specifying MaxNumberOfResults (or with 0), so thanks a lot @ehclarke! UiPath team: I consider this a bug (!)


#15

Thank you both @ehclarke and @Marcel for your input, it seems to work for me as well!
I also believe it is a bug and should be fixed, thank you!


#17

This issue still exists in Studio 2018.2.2. Can we have a fix in an upcoming update?


#18

Hi, I had the same issue, and it seems to me that the reason is that the “NextLinkSelector” is not available at the end. For example, say we have 3 pages of results, but “Next Page” can only be selected on the 1st and 2nd pages. On the last page there is usually no such option, so the selector does not match and that can disrupt the scraping action.
For me, the problem goes away if I either limit the number of results (to fewer than all) or set the default value of the output table variable to “New System.Data.DataTable”.
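
To make that hypothesis concrete, here is a small Python sketch of a paginated scrape. It only illustrates the idea and is not how UiPath implements Extract Structured Data: `scrape_all` and the `pages` dictionary are hypothetical, a missing next link is modelled as `None`, and `max_results=0` is assumed to mean "no limit", as in the thread.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical site: page name -> (rows on that page, next page name, or None on the last page)
Pages = Dict[str, Tuple[List[str], Optional[str]]]


def scrape_all(pages: Pages, first_page: str, max_results: int = 0) -> List[str]:
    """Collect rows page by page; max_results=0 means no limit."""
    rows: List[str] = []  # start from an empty table rather than a null one
    current: Optional[str] = first_page
    while current is not None:
        page_rows, next_page = pages[current]
        for row in page_rows:
            if max_results and len(rows) >= max_results:
                return rows  # limit reached before the last page is exhausted
            rows.append(row)
        # On the last page next_page is None: treat that as a normal end, not an error.
        current = next_page
    return rows


pages: Pages = {
    "page1": (["a", "b"], "page2"),
    "page2": (["c", "d"], "page3"),
    "page3": (["e"], None),  # no "Next Page" link here
}
print(scrape_all(pages, "page1"))                 # ['a', 'b', 'c', 'd', 'e']
print(scrape_all(pages, "page1", max_results=3))  # ['a', 'b', 'c']
```

The two points of the sketch match the two workarounds above: the result starts as an empty table, and a limit lets the loop return before it has to look for a next link on the last page.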