I have a pretty simple task I want to automate. The bot has a defined list of search terms. Using this list, it should open an online news search engine and, for each search term, scrape the top three news results (headline, publication platform, publication date, URL…). The results are to be written into a second Excel file.
All in all, the automation flow is working. However, the data the bot scrapes does not cover all search terms, only the first. The bot does perform the search for all 16 search terms, but the scraped result for each of the 16 searches is always the result for the first term.
What could I have done wrong?
Check whether you have limited the number of results to 1 in the properties of Extract Structured Data.
Hope this helps.
Thanks for the idea, @ksrinu070184, but that is not the issue. I have set the number of results to be scraped to 3. The bot does scrape the first three entries it finds after the search, but it just keeps writing these same three entries into the results file while iterating through the “For Each Row” section. If it worked correctly, it should produce 16 × 3 different results (three results for each of the 16 searches).
Try setting the scope of the Extract Data DataTable variable to a higher scope.
Could you also give us some more details on the part where the Merge DataTable is done (variable names, implementation, …)? Thanks.
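To make the expected output concrete, here is a minimal Python sketch (not UiPath code) of what "three results for each of the 16 searches" means: keep the first n rows per term and append them to one table. The function `collect_top_n` and the extra `SearchTerm` column are hypothetical additions for illustration; tagging each row with its term makes repeated results easy to spot.

```python
from typing import Dict, List

# Hypothetical row shape: one scraped news item, e.g. {"Headline": ..., "URL": ...}
Row = Dict[str, str]


def collect_top_n(results_by_term: Dict[str, List[Row]], n: int = 3) -> List[Row]:
    """Keep the first n results for each search term and append them to one table."""
    table: List[Row] = []
    for term, results in results_by_term.items():
        for row in results[:n]:
            # Tag every row with the term that produced it (hypothetical column)
            table.append({"SearchTerm": term, **row})
    return table
```

With 16 terms and at least three results each, the table should contain 48 rows spread across 16 distinct `SearchTerm` values.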
I am attaching the JSON of that automation to this message. Does it provide the information you are looking for?
Thanks, Matthias
project.json (1.0 KB)
The project.json will not give us the details we need about your flow and requirements.
As mentioned above, please share details on:
- source (url, screenshot)
- expected result / output
The data source is a public Korean news page (news.naver.com). The automation searches this page for a list of defined search terms (e.g. 지멘스 헬시니어스, “Siemens Healthineers”). For each search term, it should scrape information from the first three news items that appear (headline, publication date, publication platform, URL of the platform) and write the results into an Excel list.
As I said, all in all it works. The only error is that it does not write the results for the 16 defined search terms into the Excel file; instead, it repeats the results of the first search term 16 times.
Perfect, we are making progress with the analysis.
- check the scope of the Extract Data DataTable variable
- critically check the Merge DataTable part
Run a debug session and check whether different results are actually retrieved.
Being a beginner, I have set the scope of every variable to the whole flowchart, so all variables (including the variable for the Extract Data DataTable) span the whole process.
I ran a debug session, stopped the flow after every step (“Step Over”) and checked the values in the Locals tab. The error already occurs at the DataTable step: the DataTable produced by the Extract Data part always contains the same data for every search term. So it is not an error in writing the data from the DataTable to Excel; the DataTable itself is already wrong.
Below is a step-by-step description of how the automation is designed:
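The check done in the debug session can be expressed as a small Python sketch (not UiPath code): given the rows extracted per search term, list every term whose rows are identical to those of the first term. The helper `repeated_extractions` is hypothetical; in the scenario described here it would flag all 15 remaining terms, while a healthy run returns an empty list.

```python
from typing import Dict, List

# Hypothetical row shape: one scraped news item
Row = Dict[str, str]


def repeated_extractions(extractions: Dict[str, List[Row]]) -> List[str]:
    """Return the search terms whose extracted rows equal the first term's rows."""
    terms = list(extractions)
    if not terms:
        return []
    first = extractions[terms[0]]
    # Any later term with exactly the same rows points to a stale extraction
    return [term for term in terms[1:] if extractions[term] == first]
```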
- build DataTable (var. “NaverSearch”), which contains the elements that the bot should ultimately scrape
- within the Excel Application Scope: “Read Range” of an existing Excel file containing the search terms to be searched for (resulting in the DataTable var. “SearchTerms”)
- open the browser and navigate to “news.naver.com”
- for each row in var. “SearchTerms”, type the search term into the search field of news.naver.com
- extract structured data (URL, Headline, PublicationPage, PublicationDate) in the tab that opens when the search is performed. The result of the data scrape is a new DataTable (var. “DataExtraction”)
- merge DataTable “DataExtraction” into DataTable “NaverSearch”
- write range “NaverSearch” into an Excel file
- close the browser
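The steps above can be modelled as a minimal Python sketch (not UiPath code) to show where identical rows could come from. It assumes, as the description says, that every search opens a new tab. It also tests one possible cause that the thread has not confirmed: the extraction keeps reading the tab opened by the first search instead of the newest one. `FakeBrowser`, `run`, the `stale_tab` flag, the example.com URLs and the CSV output (used in place of Excel's Write Range) are all hypothetical.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]
COLUMNS = ["URL", "Headline", "PublicationPage", "PublicationDate"]


class FakeBrowser:
    """Hypothetical stand-in for the browser: every search opens a new tab."""

    def __init__(self) -> None:
        self.tabs: List[List[Row]] = []

    def search(self, term: str) -> None:
        # Made-up result page for this term, for illustration only
        page = [
            {"URL": f"https://example.com/{term}/{i}",
             "Headline": f"{term} headline {i}",
             "PublicationPage": "example",
             "PublicationDate": "2020-01-01"}
            for i in range(1, 6)
        ]
        self.tabs.append(page)


def extract_top(tab: List[Row], n: int = 3) -> List[Row]:
    """Stand-in for Extract Structured Data limited to n results."""
    return [dict(row) for row in tab[:n]]


def run(search_terms: List[str], stale_tab: bool = False) -> List[Row]:
    browser = FakeBrowser()
    naver_search: List[Row] = []               # plays the role of "NaverSearch"
    for term in search_terms:                  # For Each Row in "SearchTerms"
        browser.search(term)
        # Assumed bug pattern: always reading the first tab repeats its rows
        tab = browser.tabs[0] if stale_tab else browser.tabs[-1]
        data_extraction = extract_top(tab)     # "DataExtraction"
        naver_search.extend(data_extraction)   # Merge DataTable
    return naver_search


def to_csv(rows: List[Row]) -> str:
    """Hypothetical replacement for Write Range: serialise the table as CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With 16 terms, `run(terms)` yields 48 distinct headlines. `run(terms, stale_tab=True)` also yields 48 rows, but they contain only the first term's three headlines repeated 16 times, which is the symptom described in this thread.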