Hi, I have a website, "Latest News Today: Latest News Headlines, Breaking News, Current News | Mint",
and I want to scrape the latest news section.
I was able to get the URLs, titles and timestamps of the articles from the page above, but I also need the content of each article, which is only available after clicking on the article title.
Which activities should I use to fulfil this requirement?
We would suggest:
Step 1: Retrieve the article info and URLs - Data Scraping - dtInfo
Step 2: Loop over dtInfo
Step 3: Use the URL to open each detail page - Navigate To
Step 4: Extract the details: Get Text, Data Scraping…
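The steps above can be sketched outside UiPath to show how each article's content stays mapped to its own row. This is only an illustration in Python, not the workflow itself: the column names (Title, Timestamp, Url, Content), the example.com URLs and the fake_fetch helper are all assumptions made for the sketch. In a workflow, the loop would roughly correspond to For Each Row, and fetch_content to Navigate To followed by Get Text.

```python
from typing import Callable, Dict, List


def add_article_content(
    dt_info: List[Dict[str, str]],
    fetch_content: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return a copy of dt_info with a 'Content' value added to every row."""
    result = []
    for row in dt_info:                      # Step 2: loop over dtInfo
        enriched = dict(row)                 # keep Title, Timestamp and Url as they are
        try:
            # Steps 3 and 4: open the detail page and extract its text
            enriched["Content"] = fetch_content(row["Url"])
        except Exception as exc:             # one failing page should not stop the run
            enriched["Content"] = ""
            enriched["Error"] = str(exc)
        result.append(enriched)
    return result


# Hypothetical stand-in for "Navigate To" + "Get Text"; no real site is contacted.
def fake_fetch(url: str) -> str:
    return f"Body text of {url}"


dt_info = [
    {"Title": "Article A", "Timestamp": "10:00", "Url": "https://example.com/a"},
    {"Title": "Article B", "Timestamp": "10:05", "Url": "https://example.com/b"},
]
enriched = add_article_content(dt_info, fake_fetch)
```

Because the content is written into the same row that supplied the URL, the title, timestamp, URL and content cannot get out of step with each other.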
Will I be able to map the content from each URL to the resulting table if I loop over it as you mentioned? (Refer to the image below; I need to return it in this way.)
You can try Table Extraction. It will help to extract pattern-based data also.
Thanks for your reply.
But the example discussed in the video you provided stays on the same page; it won't go into a particular product and get the specs or more details (like seller info) for you, right?
And the work I have done so far (getting the URL, timestamp and title) was done with the pattern-based extraction method only.
You will have to modify the logic to navigate to the other pages. If it is a similar extraction, it will ask you for the navigation button.
@a.k please follow these steps:
1- Get all the info and URLs in DT1
2- Loop over the URLs
3- Get the data from each URL. If you get the data as a DataTable, convert that DataTable to row data and merge it with the current row of DT1
4- Repeat step 3 for all URLs
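Step 3 is the part that usually needs the most care, so here is a rough sketch of one way to do the "convert to row data and merge" part. It is written in Python just to show the logic, not as UiPath code. The Paragraph column, the Detail_ prefix and the choice to join values with newlines are assumptions for this example; the scraped detail table may look different.

```python
from typing import Dict, List


def flatten_detail_table(detail_table: List[Dict[str, str]]) -> Dict[str, str]:
    """Collapse a scraped detail table into a single row.

    The values of each column are joined with newlines, in row order.
    """
    collected: Dict[str, List[str]] = {}
    for detail_row in detail_table:
        for column, value in detail_row.items():
            collected.setdefault(column, []).append(value)
    return {column: "\n".join(values) for column, values in collected.items()}


def merge_into_row(
    current_row: Dict[str, str],
    detail_table: List[Dict[str, str]],
) -> Dict[str, str]:
    """Merge the flattened detail data into a copy of the current DT1 row."""
    merged = dict(current_row)
    for column, value in flatten_detail_table(detail_table).items():
        # Prefix on a name clash so the existing DT1 columns are never overwritten.
        key = column if column not in merged else f"Detail_{column}"
        merged[key] = value
    return merged


dt1_row = {"Title": "Article A", "Url": "https://example.com/a"}
detail = [{"Paragraph": "First paragraph."}, {"Paragraph": "Second paragraph."}]
merged_row = merge_into_row(dt1_row, detail)
```

Here merged_row keeps the original Title and Url and gains one Paragraph value holding both paragraphs, so each article still occupies exactly one row of the final table.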
@a.k yes, you can try it and it can be done. Maybe you will find the best solution. Happy learning!
Check out the XAML file
DataScrappingLive.xaml (19.3 KB)