UL Tag Data Scraping for Social Site

datatable
activities
studio
datascraping

#1

Hi All,

I am having a bit of difficulty at the moment with the data scraping aspect.

I can successfully scrape the name and URL (I have blanked out the details for privacy).

I can save the result to a CSV and it works fine, as long as the page has not been reloaded. I understand that selectors accept wildcards, e.g. parentid='ember*', but for some reason this does not seem to work with the data scraper. I cannot indicate the element on screen because it is an unstructured list, and as far as I can see the metadata does not change.

It is actually a LinkedIn search: I search for a company, then scrape the name and URL of its employees. It works as long as the page is not reloaded and I do not come back to it the next day; after that the data is just not there, and I have run out of ideas. Any help with this would be hugely appreciated.

Thanks


#2

Hi

Are you getting a blank datatable or the previous one after reloading the page?
As you said the metadata stays the same, it should work fine. A few suggestions:

  1. Make sure the page has loaded completely when the flow reaches the data scraping activity.
  2. Initialize the datatable as New DataTable before looping the data scraping activity.

I have also tried LinkedIn automation; it is a difficult, dynamic website to automate.
Thanks


#3

@Bharat

Thanks for the advice.

Yes, the datatable appears to be blank once the page has reloaded (I checked this by writing the datatable out to a CSV).

On the first suggestion: am I not already doing this, since I have set “WaitForReady” to Complete?

On the second: I will try it, but I think the data scraping activity generates its own datatable. I don't know whether this will resolve the issue, but I can certainly try.

It is definitely proving a bit cumbersome, but I am determined to get it working.

Thanks again for your help - if you have any other advice please let me know as well.


#4

Test your data scraping without the loop: just scrape and execute against an already loaded page.
Then reload the page manually and execute again.
Also try a Find Element activity that targets an element which loads together with the table, and put this activity before the data scraping.
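
To illustrate the idea behind putting a Find Element step before the scrape, here is a minimal Python sketch of the same wait-then-scrape pattern. It is not UiPath code: `wait_for` and `results_list_loaded` are hypothetical names, and the fake check simply becomes true on its third call to stand in for a page that is still loading.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             timeout_s: float = 30.0,
             poll_s: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Hypothetical stand-in for "is the results list on the page yet?".
calls = {"n": 0}


def results_list_loaded() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3  # pretend the list appears on the third check


if wait_for(results_list_loaded, timeout_s=5.0, poll_s=0.01):
    print("list is present, safe to scrape")
else:
    print("list never appeared, skip scraping")
```

The point is only that the scrape runs after a positive check on an element that arrives with the data, rather than relying on the page-ready state alone.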


#5

I will try this out tonight and report back on it.

Hopefully it's not too cumbersome. Let me have a go at it.

Thanks again @Bharat


#6

@Bharat unfortunately this is still an issue.

I am not sure what the problem is. I also realised that part of the selector changes when the page is reloaded:

<webctrl parentid='ember1702' tag='UL' />

The number after "ember" changes, so I used a wildcard, but this just does not work: the data is then not written to the Excel/CSV file. I verified this by reloading and scraping again. I tried it once with the number and it worked. I then replaced the number with a '*', ran it again, and it did not work. When I reverted to the number, it worked again. The problem is that the number definitely changes when the page is refreshed.
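
For reference, this is what the 'ember*' wildcard is meant to achieve, shown as a small Python sketch rather than UiPath code. It uses the standard `fnmatch` module, whose `*` and `?` behave like the selector wildcards; `matches_parent_id` is a hypothetical helper, and 'ember2044' is a made-up id standing in for whatever the page assigns after a reload.

```python
from fnmatch import fnmatchcase


def matches_parent_id(pattern: str, parent_id: str) -> bool:
    """Return True if parent_id fits a wildcard pattern using * and ?.

    Note: fnmatch also gives [...] a special meaning, which a selector
    may not; the patterns used here avoid it.
    """
    return fnmatchcase(parent_id, pattern)


# 'ember1702' is from the post; the other ids are invented for the demo.
ids_seen = ["ember1702", "ember2044", "other-17"]
stable = [i for i in ids_seen if matches_parent_id("ember*", i)]
print(stable)  # ['ember1702', 'ember2044']
```

A fixed id such as 'ember1702' only matches the session it was recorded in, which is consistent with the scrape working until the page is refreshed.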

I am so confused about this. I do not want to have to redo the data scraping every time. It's frustrating…

Any further ideas or thoughts would help. I have attached the data scraper workflow, though I am not sure how useful it will be.

Thanks

Maintest.xaml (14.0 KB)


#7

Hello @xkarrox, does the page reload always happen?
And does it happen only once?

If so, is it possible to force a reload and then scrape?


#8

@whyyouandi @Bharat
I may not have explained myself clearly. The robot does not actually loop at the moment. A page reload or refresh could happen if I use a new session, run the bot on a different computer, or refresh the page manually. So the page does not reload during a run; I am only reloading it to test whether the scrape still works in those situations.

Does that make sense?


#9

Yes, you are doing it right. You need to test data scraping with different page data.


#10

@Bharat I am not sure what you mean by that… I have tried data scraping and it scrapes the right thing: the output in Excel contains everything I need. I am only hoping to automate it so that it still works on its own when the page is refreshed or opened in a new session. If you check the xaml file I uploaded, you might see what I am trying to do.

Thanks


#11

https://www.linkedin.com/search/results/index/?keywords=abellio&origin=GLOBAL_SEARCH_HEADER

This is the exact URL I am trying to scrape data from, in case you want to see it. I basically get the name, job title and URL of each employee (the URL comes from the link on the name). At the moment I do not ask it to span further pages, as I also need to set a limit on how many pages to go through, which is the next part. I will try to figure that bit out.
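
On the page limit, the logic I have in mind is roughly the following, written as a Python sketch since I have not built it in the workflow yet. `scrape_with_limit`, `scrape_page` and `fake_scrape_page` are hypothetical names, and the fake page function just pretends that only two pages of results exist.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, str, str]  # (name, job title, profile URL)


def scrape_with_limit(scrape_page: Callable[[int], List[Row]],
                      max_pages: int) -> List[Row]:
    """Collect rows page by page, stopping at max_pages or at an empty page."""
    rows: List[Row] = []
    for page in range(1, max_pages + 1):
        page_rows = scrape_page(page)
        if not page_rows:  # ran out of results before reaching the limit
            break
        rows.extend(page_rows)
    return rows


def fake_scrape_page(page: int) -> List[Row]:
    # Stand-in for the real scrape: only pages 1 and 2 return data.
    if page > 2:
        return []
    return [(f"Name {page}", "Job title", f"https://example.com/profile/{page}")]


print(len(scrape_with_limit(fake_scrape_page, max_pages=5)))  # 2
```

The two stop conditions (limit reached, or no more results) are kept separate so that a short result set does not cause an error.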

I am also using the data scraping wizard, therefore I cannot dictate what selectors will be used.

Many thanks for your help in advance.


#12

I think I see the issue…
It might be the metadata…

The highlighted parts are what changes… Is there a way to use wildcards within the ExtractMetadata? Does it work the same way as in a selector?
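
To show the kind of change I mean, here is a small Python sketch that rewrites the changing id in a metadata string to a wildcard. Whether the data scraper actually honours a wildcard inside ExtractMetadata is exactly what I am unsure about, so this only illustrates the text substitution. `wildcard_ember_ids` is a hypothetical helper, and the fragment is the selector from my earlier post.

```python
import re

# Matches parentid='ember<digits>' and captures the parts around the id.
EMBER_ID = re.compile(r"(parentid=')ember\d+(')")


def wildcard_ember_ids(metadata: str) -> str:
    """Replace every numeric ember id in `metadata` with 'ember*'."""
    return EMBER_ID.sub(r"\1ember*\2", metadata)


fragment = "<webctrl parentid='ember1702' tag='UL' />"
print(wildcard_ember_ids(fragment))
# <webctrl parentid='ember*' tag='UL' />
```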

Thanks