Table extraction with UIAutomation is now available in Studio Web

Hello, awesome community!

We are launching the table extraction (web scraping) experience via UI automation in Studio Web. Please take a moment to try it, enjoy it, and give us feedback.

You can now create browser automation that extracts data from web pages and stores it for further processing.

The new wizard can extract standard HTML tables and structured data, along with related information such as href or src attributes. Snippets of the data can be previewed and sorted by the defined columns. An option to scrape data from multiple pages is also available, so you can extract data that spans several pages.
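For readers who want a feel for what this kind of extraction involves, here is a minimal Python sketch using only the standard library. It is not how the Studio Web wizard is implemented; the `TableExtractor` class, the `extract_table` function, and the sample HTML are all hypothetical and exist only for illustration. It reads the cell text of each table row and keeps the first href or src attribute found in that row.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collects cell text and the first href/src per row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows: {"cells": [...], "url": str or None}
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = {"cells": [], "url": None}
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag in ("a", "img") and self._row is not None:
            # Related information: link target or image source.
            url = attrs.get("href") or attrs.get("src")
            if url and self._row["url"] is None:
                self._row["url"] = url

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row["cells"].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table(html):
    """Return one dict per <tr>, with its cell texts and first URL (if any)."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.rows


# Made-up sample page, not taken from any real site.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td><a href="/items/1">Widget</a></td><td>9.99</td></tr>
  <tr><td><a href="/items/2">Gadget</a></td><td>19.50</td></tr>
</table>
"""

rows = extract_table(SAMPLE)
for row in rows:
    print(row["cells"], row["url"])
```

The first row is the header (no URL), and each data row carries its cell texts plus the link it contains, which is roughly the shape of data the wizard lets you define as columns.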

Basic steps you need to take to create an automation that scrapes data

  • Log in to Studio Web and create a new project

  • Using your local browser, add a new tab and open the web application where the target data is rendered

  • Add a UseBrowser activity, and select the target tab/application

  • Within the UseBrowser activity scope, add an ExtractTableData activity

  • Start the Table Extraction wizard by clicking the ‘Indicate target on screen’ button. Follow the instructions within the wizard to configure the columns containing the data.

  • If the data spans multiple pages, configure the ‘Next button’ link

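The multi-page step above can be pictured as a simple loop that keeps following the "next" link until there is none. The sketch below is only an illustration of that idea in Python, under stated assumptions: `scrape_all_pages`, `fake_fetch`, and the `FAKE_SITE` data are hypothetical names and made-up data, not part of Studio Web or any UiPath API.

```python
def scrape_all_pages(fetch_page, start_url, max_pages=50):
    """Collect rows from start_url and every page reachable via 'next' links.

    fetch_page(url) is assumed to return (rows, next_url), where next_url is
    None on the last page. max_pages and the seen set guard against loops.
    """
    all_rows = []
    seen = set()
    url = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        rows, next_url = fetch_page(url)
        all_rows.extend(rows)
        url = next_url
    return all_rows


# Made-up two-page listing standing in for a real website.
FAKE_SITE = {
    "/list?page=1": ([["Widget", "9.99"], ["Gadget", "19.50"]], "/list?page=2"),
    "/list?page=2": ([["Gizmo", "4.25"]], None),
}


def fake_fetch(url):
    """Hypothetical fetcher: looks the page up in FAKE_SITE."""
    return FAKE_SITE[url]


data = scrape_all_pages(fake_fetch, "/list?page=1")
print(len(data))  # 3 rows collected across the two pages
```

Passing the fetcher in as a parameter keeps the loop independent of how a page is actually loaded, so the same logic works with a real browser or an in-memory stub like this one.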

Below are two videos demonstrating how the scraping wizard works:

How to scrape regular HTML tables

How to scrape data with UiPath Studio Web - standard HTML table - YouTube

How to scrape structured data

How to scrape data with UiPath Studio Web - structured data - YouTube

Enjoy and don’t forget to give us feedback!

Best regards,

Gheorghe


Hi @gheorghestan !

I will add another comment tonight to give feedback on its use with a concrete use case, but here is the first feedback on the launch of extract table:

It would be great if all the extraction tools (in Studio Web, but also in Studio) could use, and be compatible with, already installed extensions :grin:
Example: I have extension 22.4 thanks to UiPath Studio Enterprise, and Studio Web wants me to use 22.5. Is that because of a technical limitation, so that Studio Web absolutely needs 22.5? I am dreaming of a button that says “Hey, we detected that another extension (add-on) is already installed and it is compatible. What do you want to do: install 22.5 or use 22.4?”
Because of GPO, I am not able to install add-ons, so I have to wait and try on my personal laptop :smile:
I have been waiting for this feature, so I am eager to see its first results!


Okay, I’m back!

My use case, step by step:

  • go to YouTube
  • get the names, durations, and thumbnail URLs of all my recommended videos

:ok_hand: What I liked
a) Overall intuitive. I deliberately did not read your instructions, to see whether I could pick it up quickly, and yes, it is quickly clear which button does what.

b) The preview is really nice; as a user of the classic experience, I find it refreshing.

c) The indication of the number of occurrences is useful.

d) It detects that a column has already been extracted, suggests another option, and places it strategically.

:thinking: What I think is lacking
a) Not all the buttons are intuitive. When I hover over a button, I expect to see a short tooltip explaining its purpose. For the trash icon I understand, but for the one highlighted in red, not really:


Now I know that it suggests adding the URL column by default, which is very nice to have in one click.

b) When I clicked the button mentioned in (a) to see what it does, I wanted to cancel but could not.
I had to click on extract URL and confirm the column name before I could finally delete the column. It would be great to have a cancel button, or for a click back on the grey line to be understood as a change of mind, meaning I no longer need the additional URL column.

c) I am not able to reorder the columns.

d) When running step by step, I am unable to inspect the content of the datatable; something like the Immediate or Locals panel would have helped (it’s a bit out of scope, but still related). So I am obliged to use a write spreadsheet activity to see the result of my extracted datatable.

e) In my example, for an unknown reason, extract datatable returns an empty datatable. I don’t know why, and I can’t troubleshoot because no error is thrown.

f) If I change the target of my “extract datatable” and the “use browser” target is not changed, no error is thrown and no warning is shown before compiling to let me know that the targets differ.

Hope it helps!
