How to Scrape Secondary Data from a Table of URLs? (From multiple Websites? / Links?)

So. I know how to use the “Table Extraction” activity in UiPath StudioX. What I now need help with is scraping a single paragraph of data from EVERY link that it previously scraped, and adding that paragraph into the next column, next to the link that was opened to get it.

The “Go to URL” activity isn’t helping, which means either it is not what I need, or I am using it incorrectly and need to be shown how to use it the way I intend.

I will check the “How to extract from multiple websites” forum, but please help me if you can. Thank you so much.

Hi @jezaia.vanderwatt

Can you try this:

  1. Create a new column in your table to store the extracted paragraph data.

  2. Add a For Each Row activity to loop through each row in your table.

  3. Inside the For Each Row activity, add a Go to URL activity and set the URL to the value in the URL column for the current row.

  4. Add an activity to extract the desired paragraph of data from the webpage. Depending on the structure of the webpage and the location of the desired paragraph, you may need to use different activities such as “Find Element” or “Get Text”.

  5. Add an activity to write the extracted paragraph data to the new column in your table for the current row.

  6. After the loop is complete, you should have the desired paragraph data in the new column next to the corresponding URL.
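For anyone who finds the steps above easier to follow as code, here is a minimal Python sketch of the same loop. It is not UiPath code, only an analogy: `fetch` stands in for “Go to URL”, `first_paragraph` stands in for “Get Text”, and the function names, the column names "URL" and "Paragraph", and the example.com pages are all made up for illustration.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List


class FirstParagraphParser(HTMLParser):
    """Collects the text of the first <p> element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._done = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            self._done = True  # ignore every later paragraph

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    @property
    def text(self) -> str:
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self._parts).split())


def first_paragraph(html: str) -> str:
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.text


def add_paragraph_column(
    rows: List[Dict[str, str]],
    fetch: Callable[[str], str],
    url_column: str = "URL",
    new_column: str = "Paragraph",
) -> List[Dict[str, str]]:
    """For each row: open its URL, extract one paragraph, store it in the new column."""
    for row in rows:
        html = fetch(row[url_column])             # step 3: go to the row's URL
        row[new_column] = first_paragraph(html)   # steps 4 and 5: extract and write back
    return rows


# Hypothetical pages so the sketch runs without a network connection.
pages = {
    "https://example.com/a": "<h1>A</h1><p>First  paragraph of A.</p><p>Second.</p>",
    "https://example.com/b": "<div><p>Intro for <b>B</b>.</p></div>",
}
rows = [{"URL": "https://example.com/a"}, {"URL": "https://example.com/b"}]
add_paragraph_column(rows, fetch=lambda url: pages[url])
```

The point to take away is that the URL is read from the current row on every pass, and the result is written back to that same row.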

I hope this helps you achieve your goal.

@jezaia.vanderwatt

  1. After the extraction, use the Add Data Column activity to add a new column.
  2. Use For Each Row in Data Table to loop through the extracted table.
  3. Use Go To URL with CurrentRow("urlcolumn").ToString, scrape the data, and then use an Assign with CurrentRow("NewColumnName") = variablefromextraction

For the scraping, since the websites are different, try to find a reliable selector. Try using only tags in the selector, or any other attribute that is likely to be common across the pages.

Do the URLs belong to the same website or to different ones?

If they are on the same website, try a fuzzy selector. If they are on different websites, we need to make sure any non-static part of the selector is removed.
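To illustrate what removing the non-static part of a selector buys you, here is a small Python sketch. It is only an analogy and not UiPath's selector engine: the attribute names, the ids, and the `matches` helper are invented for the example. It uses `*` as a wildcard through Python's `fnmatch`, which also treats `?` and `[...]` specially.

```python
from fnmatch import fnmatchcase
from typing import Dict


def matches(selector: Dict[str, str], element: Dict[str, str]) -> bool:
    """True if every attribute pattern in the selector matches the element."""
    return all(
        name in element and fnmatchcase(element[name], pattern)
        for name, pattern in selector.items()
    )


# Two hypothetical target paragraphs whose ids differ from page to page.
element_page_a = {"tag": "P", "id": "summary-48213"}
element_page_b = {"tag": "P", "id": "summary-90177"}

strict = {"tag": "P", "id": "summary-48213"}   # recorded on page A, id is not static
relaxed = {"tag": "P", "id": "summary-*"}      # non-static part replaced by a wildcard
tag_only = {"tag": "P"}                        # only the tag, nothing page-specific

strict_hits = [matches(strict, e) for e in (element_page_a, element_page_b)]
relaxed_hits = [matches(relaxed, e) for e in (element_page_a, element_page_b)]
tag_only_hits = [matches(tag_only, e) for e in (element_page_a, element_page_b)]
```

The strict selector only finds the paragraph on the page it was recorded on, while the relaxed and tag-only ones find it on both. The trade-off is that a looser selector can also match elements you did not intend.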

Cheers

I have already done as you have mentioned.

The thing I keep getting wrong is figuring out HOW to use the “Go To URL” activity.
With Anil_G’s solution, which almost worked, it types the literal “variablename.Tostring” into the browser as opposed to the value of the URL variable. I should complete that spreadsheet tutorial. Basically, I do not know how to enter spreadsheet column names in such a way that it knows I’m referring to the column.

Thank you for your help. I’m glad that you are helping me get on the right track.

I think I mainly need help with this part. I am getting other errors now that I have messed with the text in the expression editor.


[Screenshots: “String”, “RightStringMaybe”]

@jezaia.vanderwatt

The error says that the datatable you are using is empty.

Cheers

Um. Yes. But what do I need to do with the Expression Editor in the “Go to URL” activity?
I know for a fact that the spreadsheet with the table of URLs is there. I don’t think I need to worry about that right now. I’m far more concerned about the use of the “Go to URL” activity. Is there a particular tutorial you would recommend?

@jezaia.vanderwatt

If it is inside For Each Row in Data Table and the column name is Apply URL, then pass CurrentRow("Apply URL").ToString

For Go To URL, we basically have to provide a URL and it will go to that URL. It is like opening a new site in an already open window.
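The difference between typing the expression as plain text and having it evaluated can be shown with a tiny Python analogy. This is not UiPath code: `go_to_url` is a made-up stand-in that just returns the address it would open, and the column name and URL are illustrative.

```python
# One row of the table, with an illustrative column name and address.
current_row = {"Apply URL": "https://example.com/jobs/42"}


def go_to_url(url: str) -> str:
    """Stand-in for a browser navigation; returns the address it would open."""
    return url


# Entered as plain text: the browser receives the characters of the expression itself.
typed_as_text = go_to_url('CurrentRow("Apply URL").ToString')

# Entered as an expression: it is evaluated first, so the browser receives the cell value.
evaluated = go_to_url(str(current_row["Apply URL"]))
```

If the browser ends up with the expression text in its address bar, the field was most likely treated as text rather than as an expression.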

Cheers

Ah. I figured it out! Thank you for all the help. It got me much closer to an answer.

So. Basically. In the “Go to URL” activity, switch from Chrome to Excel (if you are using Excel). Then, in the dropdown menu, select the URL column (assuming the “Go to URL” activity is inside a “For Each Excel Row” or similar activity). It will be an option that looks like the screenshot I provided: “[CurrentRow] URL”.

What this does is use the current row’s cell in the URL column as the source of the URL that the “Go to URL” activity opens. Don’t worry about adding a “Navigate Browser” activity to close the tab afterwards (at least not within the “For Each Excel Row” block). It only uses 1 tab, and switches the link within that 1 tab.

I really hope this helps a lot of people. Even the tutorials out there that are only about 8 months old are just barely too old to be helpful for this particular case.

This is how to use a table of links / URLs to scrape extra / secondary data from those links / URLs.
Have a great day
Jezaia
