UL Tag Data Scraping for Social Site

xkarrox · August 19, 2018, 12:17pm

Hi All,

I am having a bit of difficulty at the moment with the data scraping aspect.

I have managed to scrape the name and URL successfully (I have blanked out the info for safekeeping).

I can save it to a CSV and it works fine, as long as the page hasn't been reloaded. I understand that I can use wildcards in selectors, e.g. parentid='ember*', but this does not seem to work for the data scraper for some reason. I cannot indicate the element on screen as it's an unstructured list, and as far as I can see the metadata does not seem to change.

It's actually a LinkedIn page query: I search for a company, then scrape the name and URL of its employees. It works as long as the page is not reloaded. If the page reloads, or I come back to it the next day, the data is just not there, and I have run out of ideas. Any help with this would be hugely appreciated.

Thanks

Bharat · August 20, 2018, 5:30am

Hi

Are you getting a blank datatable or the previous datatable after reloading the page?
As you stated the metadata is the same, it should work fine. A few suggestions:

Make sure the page is loaded completely when the flow reaches the data scraping activity.
Initialize the datatable as New DataTable before looping the data scraping activity.

I have also tried LinkedIn automation; it's a difficult, dynamic website to automate.
Thanks

xkarrox · August 20, 2018, 9:01am

@Bharat

Thanks for the advice.

Yes, the datatable appears to be blank once the page has reloaded (I check this by writing the datatable to a CSV).

Am I not already doing this, since I have set “WaitForReady” to Complete?

I will try this out but I think when I do the data scraping it generates it into a datatable by itself. Don’t know if this will resolve the issue but I can certainly try it.

It definitely is being a bit cumbersome - but I am adamant about it.

Thanks again for your help - if you have any other advice please let me know as well.

Bharat · August 20, 2018, 11:03am

Test your data scraping without the loop: just scrape and execute with an already loaded page.
Reload the page manually and then execute again.
Try a Find Element activity that looks for an element which gets loaded together with the table, and put this activity before the data scraping.
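
The idea behind putting Find Element before the scrape is a bounded wait: keep checking for the list until it shows up or a timeout expires, and only then scrape. A minimal sketch of that pattern in Python follows. It is not UiPath code; `wait_until` and `list_is_present` are hypothetical names, and the "page" is simulated by a counter that reports the list on the third check.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 10.0,
               interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check so a condition that became true during the final sleep counts.
    return condition()


# Simulated page (assumption): the list "appears" on the third poll.
polls = {"count": 0}


def list_is_present() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


ready = wait_until(list_is_present, timeout=2.0, interval=0.01)
print("safe to scrape" if ready else "list never appeared")
```

If the wait times out, the scrape would run against a page where the list does not exist yet, which is one way to end up with an empty datatable.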

xkarrox · August 20, 2018, 3:25pm

I will try this out tonight and report back on it.

Hopefully it's not too cumbersome. Let me have a go at it.

Thanks again @Bharat

xkarrox · August 20, 2018, 10:34pm

@Bharat unfortunately this is still an issue.

I am not sure what the problem is. I also realised when the page is reloaded

<webctrl parentid='ember1702' tag='UL' />

The number after ember changes, so I did use a wildcard, but this just does not work: the data is then not written to the Excel/CSV file. I verified this by reloading and scraping again. I tried it once with the number and it worked. Afterwards I replaced the number with a '*' and ran it again, and it would not work. When I reverted to the number, it worked. The problem is that when the page is refreshed, that number definitely changes.

I am so confused about this. I do not want to redo the data scraping every time. It's frustrating…
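
For reference, in selectors `*` stands for zero or more characters and `?` for exactly one, so `ember*` is expected to cover any of the regenerated ids. The sketch below only illustrates those wildcard semantics using Python's `fnmatch` module. It is not how UiPath evaluates selectors, and the ids other than `ember1702` are made-up examples.

```python
from fnmatch import fnmatchcase


def matches_attr(pattern: str, value: str) -> bool:
    """Case-sensitive wildcard match: '*' = any run of characters, '?' = one character."""
    return fnmatchcase(value, pattern)


# ember1702 is from the selector above; the other ids are hypothetical reload values.
ids_after_reloads = ["ember1702", "ember2288", "ember45"]

results = {value: matches_attr("ember*", value) for value in ids_after_reloads}
for value, ok in results.items():
    print(value, "matches" if ok else "does not match")
```

So the pattern itself is fine for every reload. If the scrape still comes back empty, the fixed id is probably stored somewhere other than the selector that was edited.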

Any further ideas/thoughts would help. I have attached the data scraper…not sure how useful this will be.

Thanks

Maintest.xaml (14.0 KB)

whyyouandi · August 21, 2018, 1:55am

Hello @xkarrox, does the page reload always happen?
And does it only appear once?

If it is so, is it possible to force reload then scraping it?

xkarrox · August 21, 2018, 8:16am

@whyyouandi @Bharat
I may not have explained myself correctly… the robot does not actually loop at the moment. The page reload/refresh might happen if I use a new session, run the bot on a new computer, or manually refresh the page. It does not happen on its own. I am only reloading to test that the bot still works in a brand new session, after a refresh, or on another computer.

Does that make sense?

Bharat · August 21, 2018, 8:44am

Yes, you are doing it right. You need to test data scraping with different page data.

xkarrox · August 21, 2018, 9:03am

@Bharat not sure what you mean by that…I have tried data scraping and it is scraping the right thing as it does output the right info within excel (everything that I need is there). I am only hoping to automate it so when the page is refreshed or used within a new session it can still work autonomously. If you check out the xaml file I uploaded maybe you might get what I am trying to do?

Thanks

xkarrox · August 21, 2018, 10:10am

In case you want to see the exact URL I am trying to scrape data from. I basically get the name, job title and URL (via the name of employees). At the moment I do not ask it to span further pages, as I also need to set a limit on how many pages to go through… which is the next part. I will try to figure this bit out.

I am also using the data scraping wizard, therefore I cannot dictate what selectors will be used.

Many thanks for your help in advance.

xkarrox · August 21, 2018, 7:14pm

I think I see the issue…
It is the metadata…maybe…

The highlighted parts are what changes… is there a way to use wildcards within the ExtractMetadata? Does it work the same way as in the selector?
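
One way to confirm which parts of the metadata change is to capture the string before and after a reload and compare the attribute values. The Python sketch below does that and also produces a wildcarded copy. The two metadata strings are hypothetical samples, not the real ones from the workflow, and whether the data scraper accepts `ember*` inside ExtractMetadata is exactly the open question here, so treat the last step as something to test rather than a known fix.

```python
import re

# Matches name='value' pairs as they appear in the metadata XML.
ATTR_RE = re.compile(r"(\w+)='([^']*)'")


def changed_attributes(before: str, after: str):
    """Return (name, old, new) for attributes whose value differs, paired by position."""
    old = ATTR_RE.findall(before)
    new = ATTR_RE.findall(after)
    return [(n1, v1, v2)
            for (n1, v1), (n2, v2) in zip(old, new)
            if n1 == n2 and v1 != v2]


def wildcard_ember_ids(metadata: str) -> str:
    """Replace every emberNNNN id with ember* (assumes wildcards are honoured there)."""
    return re.sub(r"ember\d+", "ember*", metadata)


# Hypothetical samples captured before and after a page reload.
before = "<extract><row exact='1'><webctrl parentid='ember1702' tag='LI' /></row></extract>"
after = "<extract><row exact='1'><webctrl parentid='ember2288' tag='LI' /></row></extract>"

diff = changed_attributes(before, after)
generalized = wildcard_ember_ids(before)
print(diff)
print(generalized)
```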

Thanks

Myshudhage · August 20, 2019, 6:12am

@Bharat @xkarrox @whyyouandi what did you do in the end? I am not able to get that data out of the website (LinkedIn), and I need info on how many employees are connected to the company page.

Can you guide me on how to complete this process? I have tried Data Scraping and Screen Scraping.
