This should be an easy one for you, but since I’m completely new to this programme, I’m completely lost.
I already tried the search and watched some tutorials, but neither helped.
My problem: I’m conducting research on the European Union and wanted to automate the text parsing.
The first site in question is https://www.premier.gov.pl/en/news/news.html, specifically the individual news articles on it. For each news article back to September 2014, I want to extract the URL, the date, the heading and the text.
Data Scraping didn’t work because I was only able to scrape the “first layer” or overview (what one sees when clicking the above-mentioned link).
Screen Scraping didn’t work because I couldn’t figure out how to “loop” it over articles other than the one specifically chosen.
I’d be so grateful and glad if someone could help me. I guess it is one of the most basic tasks to do, but I really tried for some hours and, well, didn’t make any progress.
Thank you in advance and sorry for possible spelling mistakes - English is not my first language.
I went to the initial URL and news_page_38 while using “Data Scraping”.
By that I think I managed to extract Date, Heading, URL - as you said.
2./3./4.: What exactly do you mean by “open each news text” and “read news text”?
Do you imply that I can only extract Date, Heading and URL and that I have to extract the news text content manually?
The first command was “Extract data ‘X0 to Xn’ and extract correlated data ‘Y0 to Yn’ and extract correlated data ‘Z0 to Zn’”.
Isn’t it possible to additionally tell UiPath “and extract correlated data ‘body text of URLs correlated to X0 to Xn’”?
From the first step, you will not be able to retrieve the entire content of the news. You can retrieve it only by clicking on each news item to open it and then reading the content. To open each news item, you can use the URL that Data Scraping itself returns.
Automation works exactly the way a person does. In this case, even a person has to visit each page to read the data. The same goes for the bot.
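In case it helps to see the idea outside UiPath, here is a rough Python sketch of the same two-step approach: take the rows from the overview scrape (date, heading, URL), then open each URL and read the article text. This is only an illustration, not a UiPath workflow. The names `extract_body`, `fetch_html` and `add_article_text` are made up for this example, the row keys (`date`, `heading`, `url`) are an assumed layout, and the assumption that the article text sits in `<p>` elements has not been checked against the markup of the site in question.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.request import urlopen


class ParagraphText(HTMLParser):
    """Collects text found inside <p> elements (assumed to hold the article body)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_paragraph = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False
            self.parts.append("\n")  # keep paragraphs from running together

    def handle_data(self, data):
        if self._in_paragraph:
            self.parts.append(data)


def extract_body(html: str) -> str:
    """Return the paragraph text of one article page, with whitespace normalised."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def fetch_html(url: str) -> str:
    """Download one page; this is the 'open each news item' step."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def add_article_text(
    rows: List[Dict[str, str]],
    fetch: Callable[[str], str] = fetch_html,
) -> List[Dict[str, str]]:
    """Step 2: visit the URL of every overview row and attach the article text.

    `rows` is the output of step 1 (the overview scrape); each row needs a "url" key.
    The input rows are left unchanged; new dictionaries are returned.
    """
    results = []
    for row in rows:
        html = fetch(row["url"])
        results.append({**row, "text": extract_body(html)})
    return results
```

Calling `add_article_text(rows)` with the rows from the overview scrape would return the same rows with an extra `text` field. The `fetch` parameter exists only so the loop can be tried with saved pages instead of live requests.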