How to scrape article websites with a depth of 1 (clicking the title redirects to the article)

Hi, I have a website, "Latest News Today: Latest News Headlines, Breaking News, Current News | Mint", and I want to scrape its latest news section.

I was able to get the article URLs, titles and timestamps from the page above, but I also need the content of each article, which is only available after clicking on the article title.


What activities should I use to fulfil the above requirement?

We would suggest:
1st step: Retrieve the article info and URLs - Data Scraping - dtInfo
2nd step:
Loop over dtInfo
Use the URL to open each detail page - Navigate To
Extract the details: Get Text, Data Scraping…
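
The two steps above can be sketched outside UiPath in plain Python. This only illustrates the logic, not the UiPath workflow itself: `dt_info`, `fetch_html` and the `Content` column name are hypothetical, and treating the article body as the text of the `<p>` tags is an assumption about the page.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> tags."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # greater than 0 while inside a <p>
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.paragraphs[-1] += data


def extract_content(html):
    """Return the paragraph text of one article page (assumed layout)."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


def add_article_content(dt_info, fetch_html):
    """For each row: open row["URL"], extract the text, store it on the same row."""
    for row in dt_info:
        row["Content"] = extract_content(fetch_html(row["URL"]))
    return dt_info
```

Because the content is written back to the row whose URL was opened, the title, timestamp, URL and content stay aligned in one table. `fetch_html` is passed in as a parameter so that any page loader can be used.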

Will I be able to map the content from each URL to the resulting table if I loop over it as you mentioned? (Refer to the image below; I need to return it in this way.)

Hello @a.k

You can try Table Extraction. It will help to extract pattern-based data also.

Thanks

Thanks for your reply,

But the example discussed in the video you provided stays on the same page and won’t go into a particular product to get the specs or more details (like seller info), right?

And the work I have done so far, like getting the URL, timestamp and title, was done with the pattern-based extraction method only.

You will have to modify the logic to navigate to the other pages. If it is a similar extraction, it will ask for the navigation button.

Thanks

@a.k please follow these steps.
1- Get all the info and URLs in DT1
2- Loop over the URLs
3- Get the data from each URL; if you get the data as a DataTable, convert the DataTable to row data and merge it with the current row of DT1
4- Repeat step 3 for all URLs
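
Step 3 could look roughly like this in Python. It is a sketch under assumptions rather than the UiPath implementation: the detail scrape is assumed to return a small table with `Field` and `Value` columns, and `scrape_details` is a hypothetical stand-in for whatever extraction runs on the article page.

```python
def table_to_row(detail_table, key="Field", value="Value"):
    """Flatten a two-column table into one dict: {field: value}."""
    return {r[key]: r[value] for r in detail_table}


def merge_details(dt1, scrape_details):
    """Loop over DT1, scrape each URL, and merge the result into the current row."""
    merged = []
    for row in dt1:
        details = table_to_row(scrape_details(row["URL"]))
        # Existing DT1 columns win if a detail field has the same name.
        merged.append({**details, **row})
    return merged
```

The output has exactly one row per row of DT1, so each article's extra columns sit next to its original title, timestamp and URL.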

I will try doing this

thanks

@a.k yes, you can try it, and you should be able to do it. Maybe you will find an even better solution. Happy learning!

Hi @a.k

Check out the XAML file

DataScrappingLive.xaml (19.3 KB)

Regards
Gokul
