Need help with data scraping from sub links

Hi!

I am new to this topic, but enjoying trying new things! I have the following problem:

I need some data from job websites.
After entering a search term, all job titles from the main page and their corresponding work locations are to be extracted and displayed in an Excel table. I have managed this so far.

Where I am stuck: I would also like to open the full job advertisement and extract the skills listed for each job. To do this, I would have to click on the job title, which opens another page where data has to be extracted again. I can’t work out how to do that, because it would have to happen for every job title.

Example website:
[Job portal](https://www.enercon.de/en/job-portal/job-portal/?no_cache=1&tx_ccjobs_jobplugin%5Bid%5D=&tx_ccjobs_jobplugin%5Bsearchvalue%5D=DE108605&tx_ccjobs_jobplugin%5Baction%5D=list&tx_ccjobs_jobplugin%5Bcontroller%5D=Jobs&cHash=562f4e2f01a3a93dd28d4efce77135b7)

Can anyone help me or recommend a tutorial?

Thank you very much!

Hi @tabbyd ,

Most sites that show item details (in your case, job details) use a URL that contains an identifier for the item.

Let’s have a look at the URL below, taken from your example, which opens a job details page.

https://www.enercon.de/en/karriere-portal/stellenangebote/stellenangebote-detail/?sid=DE115509

You already have a Data Table with job title, work location etc.

In this case you can store the Job ID in your Data Table too. Then loop over the Data Table, navigate to
https://www.enercon.de/en/karriere-portal/stellenangebote/stellenangebote-detail/?sid={{ID}}, and scrape the required info.

Note that some websites put the Job ID in their job details URL, while others use the job title (slug).
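If the links you scrape from the list page already point at the details page, the Job ID can be read from the `sid` query parameter. Here is a minimal sketch of that idea in Python rather than UiPath, just to illustrate the logic. The function name `extract_job_id` is hypothetical, and it assumes the ID is always carried in a parameter called `sid`, as in the example URL above.

```python
from urllib.parse import urlparse, parse_qs


def extract_job_id(detail_url, param="sid"):
    """Return the value of the given query parameter, or None if it is missing.

    Assumption: the job ID is carried in a query parameter (here "sid").
    """
    query = parse_qs(urlparse(detail_url).query)
    values = query.get(param)
    return values[0] if values else None


url = ("https://www.enercon.de/en/karriere-portal/stellenangebote/"
       "stellenangebote-detail/?sid=DE115509")
print(extract_job_id(url))  # prints DE115509
```

For sites that use a slug instead of a query parameter, the ID would have to be taken from the URL path instead.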

Hope this is helpful. :slightly_smiling_face:

Thank you for the helpful answer!

Unfortunately, I’m a complete newbie in this area. Can you perhaps briefly explain how to loop over the data and navigate to the URL?
I can get the job ID into the data table by myself!

Thank you once again!

For Each Row Activity starts

  1. Open/Attach Browser (Whichever you prefer)

  2. Supply URL = "https://www.enercon.de/en/karriere-portal/stellenangebote/stellenangebote-detail/?sid=" + row("ID").ToString

  3. Scrape Required details using Get Text/OCR (Whichever fits your requirement)

  4. Close Tab

End of For Each Row
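For anyone who prefers to see the same loop as code, here is a rough sketch in Python (not UiPath) of the steps above. It assumes each row is a dictionary with an `ID` column. `scrape_all` and `fake_fetch` are hypothetical names, the job titles are placeholders, and `fake_fetch` only stands in for the real "open page and Get Text" step, so nothing is actually downloaded.

```python
DETAIL_URL = ("https://www.enercon.de/en/karriere-portal/stellenangebote/"
              "stellenangebote-detail/?sid=")


def scrape_all(rows, fetch_details, id_column="ID"):
    """Visit the details URL of every row and attach the scraped result."""
    results = []
    for row in rows:  # equivalent of "For Each Row"
        # Same expression as row("ID").ToString appended to the base URL
        url = DETAIL_URL + str(row[id_column])
        details = fetch_details(url)  # stand-in for Open Browser + Get Text
        results.append({**row, "Details": details})
    return results


def fake_fetch(url):
    """Hypothetical placeholder: returns a label instead of loading the page."""
    return "details for " + url.rsplit("=", 1)[-1]


# Placeholder table; the titles are made up for illustration.
jobs = [
    {"Title": "Example job A", "ID": "DE115509"},
    {"Title": "Example job B", "ID": "DE108605"},
]

for job in scrape_all(jobs, fake_fetch):
    print(job["Title"], "->", job["Details"])
```

The point is only that the URL is built inside the loop, once per row, from that row's ID value.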

@tabbyd ,
If it’s still unclear, I can create a sample workflow.

Hi, I’ve attached a sample workflow. Modify it as per your requirements. :smiley:

sample.xaml (18.1 KB)

Hi!

I tried it on my own with your explanation and I think I’m almost done!

But it is still not completely working: in the Open Browser activity I now get an error saying that row("ID") is not declared. Where and how should I declare the row?

Thank you so much!

Hi @ting801215 ,

If you are storing the Job ID in a column named ID, then use row("ID").
Likewise, if you are storing it in a column named Reference code, then use row("Reference code").

Basically, it’s just syntax.

Check my workflow.

Hi, I’m accessing each row with row and have stored the Job ID in the Reference code column, so I passed row("Reference code").ToString when building the job details page URL.

Reference: