Hi,
I need to scrape data from multiple URLs (saved in an Excel file).
I am extracting the data using the data scraping feature in UiPath Studio.
The URLs are saved in an Excel file with more than 2000 rows. Each row is iterated to extract some data from the web page and write (append) it to an Excel sheet.
The structure of the extracted data is the same for every URL, so once the data is extracted I have no unique identifier to tell me which URL a given row came from.
Example:
Read Excel sheet:
TopicName, TopicURL
x,URL1
y,URL2
z,URL3
d,URL4
I iterate over the above table in UiPath. For each row (URL1, URL2, and so on), I open a new browser, extract the data, and write it to an Excel sheet.
Write Excel sheet:
TopicDetail,TopicID,TopicAmount,TopicDuration
A,B,C,D
L,M,N,O
A,B,C,D
L,M,N,O
A,B,C,D
L,M,N,O
A,B,C,D
L,M,N,O
As you can see in the Write Excel sheet, I cannot tell which row belongs to which URL.
I want to add the URL variable used in the loop to the ExtractDataTable as an additional column.
Something like this:
URL,TopicDetail,TopicID,TopicAmount,TopicDuration
URL1,A,B,C,D
URL1,L,M,N,O
URL1,A,B,C,D
URL2,L,M,N,O
URL2,A,B,C,D
URL2,L,M,N,O
URL2,A,B,C,D
URL3,L,M,N,O
URL4,A,B,C,D
URL4,L,M,N,O
URL4,A,B,C,D
The number of rows extracted for each URL may differ, so adding the URL column to the DataTable would let me join the extracted rows back to their source URLs.
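To make the logic I am after concrete, here is a minimal sketch in plain Python rather than UiPath. The names `tag_rows_with_url` and `extracted_by_url` are made up for illustration, and the hard-coded rows stand in for whatever the scraping step returns for each URL. I assume the UiPath equivalent would add the column to the extracted DataTable inside the loop, before the append to Excel.

```python
def tag_rows_with_url(url, extracted_rows):
    """Return copies of the extracted rows with the source URL as the first column."""
    return [{"URL": url, **row} for row in extracted_rows]


# Hypothetical stand-in for the data extracted from each URL;
# the number of rows per URL can differ.
extracted_by_url = {
    "URL1": [
        {"TopicDetail": "A", "TopicID": "B", "TopicAmount": "C", "TopicDuration": "D"},
        {"TopicDetail": "L", "TopicID": "M", "TopicAmount": "N", "TopicDuration": "O"},
    ],
    "URL2": [
        {"TopicDetail": "A", "TopicID": "B", "TopicAmount": "C", "TopicDuration": "D"},
    ],
}

combined = []
for url, rows in extracted_by_url.items():
    # Tag the rows while the loop still knows which URL they came from,
    # then append them to the combined output.
    combined.extend(tag_rows_with_url(url, rows))

for row in combined:
    print(",".join(row.values()))
```

The point is that the URL is attached to each row inside the loop, before the rows are appended to the output, so the combined sheet keeps the link to its source.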
How can I achieve this?