Proper data not getting captured during data scraping

samya · October 23, 2017, 8:17am

On LinkedIn, in the Jobs section, I ran a job search with the location set to “United States”. When I got the list of jobs as a result, I performed data scraping on it and tried to write the data to a CSV file. However, I noticed that when I try to scrape 50 results I do get 50, but the data is picked up randomly from the LinkedIn results pages. Can anyone please check this workflow and help me out?

test2.xaml (9.9 KB)

ddpadil · October 23, 2017, 10:56am

Hi,
Try with IE.

samya · October 23, 2017, 11:49am

@ddpadil I am facing the same issue in IE as well.

ddpadil · October 24, 2017, 6:28am

Hi,
When you say “picked randomly”, do you mean the results are in a different order, or that data scraping failed to pick up the data available on the screen?

samya · October 24, 2017, 6:52am

@ddpadil

Each page has 25 items, so if I scrape 50 records it should ideally go only as far as the 2nd page. Instead it goes through all the pages (e.g. if there are 70 pages it goes through all 70), picks data from any of them, and ends up with a total count of 50.
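
To make the expected behaviour concrete, here is a minimal Python sketch (not UiPath code; `scrape_in_order` and the fake page data are hypothetical) of a scraper that reads pages in order and stops as soon as the requested count is reached. With 25 items per page and a limit of 50, it should only ever touch pages 1 and 2.

```python
import math


def scrape_in_order(pages, max_results):
    """Collect items page by page, stopping once max_results is reached.

    Returns the collected items and the 1-based numbers of the pages visited.
    """
    rows = []
    visited = []
    for page_no, page in enumerate(pages, start=1):
        if len(rows) >= max_results:
            break  # limit reached: do not open any further page
        visited.append(page_no)
        remaining = max_results - len(rows)
        rows.extend(page[:remaining])
    return rows, visited


# Hypothetical stand-in for the search results: 70 pages of 25 jobs each.
PER_PAGE = 25
pages = [
    [f"job-{page_no}-{i}" for i in range(1, PER_PAGE + 1)]
    for page_no in range(1, 71)
]

rows, visited = scrape_in_order(pages, max_results=50)
expected_pages = math.ceil(50 / PER_PAGE)

print(len(rows), visited, expected_pages)
```

What I am seeing instead is as if the list of visited pages contained every page, with the 50 rows spread across them.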

ddpadil · October 24, 2017, 7:33am

Oh, got it.
I tried the scenario with LinkedIn. You’re right, it doesn’t even take 100 results even though that is set in the property.
It just goes to the second page and stops, but the result I get is only 10 (it is supposed to be 20).
Which version are you using?
Mine: 2016.2.6274

samya · October 24, 2017, 8:29am

@ddpadil
I am using 2017.1.6435

ddpadil · October 24, 2017, 8:45am

Oh, the new one. I was going to tell you to use the new version.
@ovi the data scraping activity needs improvement, I guess!

qwerty123 · October 24, 2017, 8:53am

Hi,
Can you try increasing the delay value in the DelayBetweenPages property of the Extract Structured Data activity?

florin.stan · October 24, 2017, 9:05am

Hello,

I’m not sure what the problem is exactly but here’s a working example that might be helpful: Main.xaml (14.1 KB)

You can also check whether the next-page button is selected properly during scraping. You have to select the surrounding button, not the arrow, because they have different selectors.

samya · October 24, 2017, 9:25am

@florin.stan

I don’t want to put the URL in the Open Browser activity; I only want to perform data scraping. However, I also tried it this way and still did not get the desired result.

I have tried selecting the next-page button in every possible way.

samya · October 24, 2017, 9:26am

@qwerty123

I did that but still didn’t get the desired result.

florin.stan · October 24, 2017, 9:30am

You don’t have to use Open Browser if the page is already open, so you can remove the Open Browser activity from the example. Does the example that I provided not give the desired result?

samya · October 24, 2017, 11:47am

No, it is not giving the desired result.

florin.stan · October 24, 2017, 11:51am

Can you please help me understand what the desired result is? I thought it was to extract the first 50 jobs (meaning the first 2 pages) into a CSV file.

samya · October 24, 2017, 11:54am

Yes, that is exactly the requirement. I am getting a total of 50 results, but they are picked up randomly from the LinkedIn pages. It goes through all the pages and picks 50 records from among them.

florin.stan · October 24, 2017, 12:01pm

Hmm, see, that’s why I’m confused: my example gets only the first 50 jobs from the first 2 pages, and the data scraping stops on page 2. The resulting CSV file contains the first 50 jobs in the order they appear on the site. Are you sure you are looking at the most up-to-date data and that the jobs are sorted by “Relevance”? I don’t know what else could be the issue.

samya · October 24, 2017, 3:59pm

@florin.stan

I would like to ask one thing: in the workflow you uploaded, you added an “Output Data Table” activity and stored its output in “str”, but this output variable is not used anywhere in the workflow. Can you please tell me the purpose of using it?

Now for the issue, let me explain it in more detail.

Each LinkedIn job search page has 25 entries; suppose the data spans approximately 10 pages. Now suppose I need to scrape 50 results through data scraping.

When I run either the process you provided or the one I created, the issue is the same: data scraping does not end on the 2nd page. It goes through all 10 pages (i.e. 250 records) and picks 3-4 records from the first page, 5-6 records from the second page, 2-3 records from the third page, and so on, until it has collected a total of 50 records, which it writes to the CSV file. If you then match the file’s records against the LinkedIn pages, the first records seem to be captured properly, but as you go down the list you will see that the records do not match.
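
One way to pin this down is to compare the CSV against the listings in page order. The sketch below is in Python rather than UiPath, and the job titles, `source_pages` and `first_mismatch` are made-up names for illustration. It reports which page each scraped record came from and the first row where the output stops matching the site.

```python
def source_pages(scraped, pages):
    """Map each scraped record to the 1-based page it appears on (None if absent)."""
    page_of = {}
    for page_no, page in enumerate(pages, start=1):
        for item in page:
            page_of.setdefault(item, page_no)
    return [page_of.get(item) for item in scraped]


def first_mismatch(scraped, expected):
    """Return the index of the first row that differs from the expected order."""
    for i, (got, want) in enumerate(zip(scraped, expected)):
        if got != want:
            return i
    return None


# Hypothetical data: 3 pages of 4 jobs, and a scrape that was limited to 6 rows.
pages = [
    ["A1", "A2", "A3", "A4"],
    ["B1", "B2", "B3", "B4"],
    ["C1", "C2", "C3", "C4"],
]
expected = [item for page in pages for item in page][:6]
scraped = ["A1", "A2", "B1", "B3", "C2", "C4"]  # a few rows from every page

print(source_pages(scraped, pages))
print(first_mismatch(scraped, expected))
```

If the page numbers run past the page where the limit should have been reached, the scraper kept paging after it should have stopped, which is what I observe.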

And yes, I have the up-to-date data and it is sorted by relevance, but I don’t think this should matter, because data scraping should capture whatever record it is pointed at.

I hope I was able to clarify it now.

florin.stan · October 25, 2017, 5:50am

I used the Output Data Table activity together with a Write Line activity to print the result of the data scraping to the console before writing to csv, just to make sure the data scraping works as expected. I then deleted the Write Line activity as it was not needed but I forgot to delete the Output Data Table activity. It can be safely deleted because it doesn’t affect the flow.
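
The same check, sketched in Python instead of UiPath activities (so the function name and sample rows are assumptions, not part of the workflow): turn the table into text and print it for inspection before anything is written to the CSV file.

```python
import csv
import io


def table_to_text(header, rows):
    """Render a table as CSV text, roughly what Output Data Table produces."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped rows.
header = ["Title", "Company"]
rows = [["Data Analyst", "Acme"], ["RPA Developer", "Globex, Inc."]]

text = table_to_text(header, rows)
print(text)  # inspect the result on the console before saving it
```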

As for the problem, I am out of ideas. I know it’s working for me but I can’t explain why it’s not working for you and I can’t reproduce the behavior you get. Maybe someone else could help and I hope you figure it out.

qwerty123 · October 25, 2017, 7:53am

Hi,
Can you please try selecting the second or third row when choosing the second set of data during data scraping?
E.g. choose the first row as the first set and the second/third row as the second set of data.
