Extract Table from Web Page but some rows could have embedded links with information

Hi there…

I’m trying to extract a table that spans multiple pages from a website. I can extract all the data with data scraping, but some rows in the table have links that contain useful information. How can I extract the entire table plus the information behind these links?

I can’t share the table due to privacy reasons but it looks something like this:

image

Basically, the Unique ID column has a combination of unique codes and links (the ones labelled “Information ()”), so I’m looking to extract the unique codes inside each Information link along with the entire table.

What’s the best way to approach this?

Thank you and I look forward to your responses :slight_smile:

Hi

Fine

Get the URL or link first and store it in the datatable that data scraping gives as output.
Then we can get the unique code from that specific column of the datatable.

Cheers @ceceliaa34

Hi @ceceliaa34

The Extract URL option in data scraping will work!

Regards

I’m sorry I don’t quite understand. Can you please elaborate?

Hi,

When I try the data scraping activity, the Extract URL option doesn’t appear after selecting the first element in the table :frowning:

Hi @ceceliaa34

Follow the steps below:

1. Select the first row in the Unique ID column.
2. Select the second row in the Unique ID column.
3. Tick the Extract URL checkbox.

Regards

Hi

Thanks for your reply :slight_smile:

I tried this, but it only brings back 2 columns. I need all columns in the table to be extracted

Hi @ceceliaa34

Try Extract Correlated Data to extract all the other columns!

Regards

Here you go on how to use it

Cheers @ceceliaa34

Hi Pravin

I just tried this, but the URL doesn’t show in the extracted table. I strongly suspect it’s an issue with the elements on the website. For example, when I click on one of those Information links in the table, it opens a mini dialog box displaying the unique IDs, so maybe it’s not actually a URL item, if that makes sense… I even tried Inspect Element, but nothing there shows the embedded link…

Is there a way for me to do a Click & Get Full Text row by row on this web table, just for the rows that contain “Information”? It may be a slower process, but I’m willing to explore that option… Please let me know what you think… Thank you :slight_smile:

Hi @ceceliaa34

You could try the Get Attribute activity to read the attribute that contains the link, and make the selector dynamic by table row, table column, or idx, incrementing it on each iteration!

Regards

@ceceliaa34
Configure the correlated columns as described above, and configure one additional column for the link, like:
UNIQUE ID, UNIQUE ID URL, COUNTRY, CODE

Afterwards, post-edit the extract data XML to the following:

<extract>
	<row exact="1">
		<webctrl tag="tr"/>
	</row>
	<column exact="1" name="UNIQUE ID" attr="text">
		<webctrl tag="tr"/>
		<webctrl tag="td" idx="1"/>
	</column>
	<column exact="1" name="UNIQUE ID URL" attr="href">
		<webctrl tag="tr"/>
		<webctrl tag="td" idx="2"/>
		<webctrl tag="a"/>
	</column>
	<!-- ..... other cols -->
</extract>

Because IDs and links alternate in the first column, we can’t properly tell the wizard which element is first and which is second. But if we indicate the columns initially and post-edit the XML afterwards, we should be able to achieve it.

If the URL is public, please share it with us. Thanks

Hi, thanks for your response.

I’m not an expert in web languages, but I’ve checked the table source code, and it looks like those “Information()” values in the table are not actual links. They don’t have an href attribute; rather, the text is nested inside an <a class tag, if that makes any sense.

image
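One way to double-check the observation above outside of Studio is to list which anchors in the saved page source actually carry an href. This is only a rough sketch using Python's standard library; the sample markup, the `popup-link` class name, and the `AnchorAudit` helper are all made up for illustration, since the real table can't be shared.

```python
from html.parser import HTMLParser


class AnchorAudit(HTMLParser):
    """Collect the text of every <a> tag, split by whether it has an href."""

    def __init__(self):
        super().__init__()
        self.with_href = []
        self.without_href = []
        self._current = None  # (has_href, [text parts]) while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            has_href = any(name == "href" and value for name, value in attrs)
            self._current = (has_href, [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            has_href, parts = self._current
            text = "".join(parts).strip()
            (self.with_href if has_href else self.without_href).append(text)
            self._current = None


# Hypothetical markup, invented for this example only
SAMPLE = """
<table>
  <tr><td>ABC123</td><td>UK</td></tr>
  <tr><td><a class="popup-link">Information (3)</a></td><td>US</td></tr>
  <tr><td><a href="/detail/9">DEF456</a></td><td>FR</td></tr>
</table>
"""

audit = AnchorAudit()
audit.feed(SAMPLE)
print("anchors without href:", audit.without_href)
print("anchors with href:", audit.with_href)
```

If every "Information" anchor ends up in the list without an href, the Extract URL option has nothing to read, which would match what was seen in the wizard.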

Clicking on the blue Information() links in the table only displays a small popup window that shows the codes; it doesn’t open another web page. There is no URL to extract, which is why I was thinking of scraping the text row by row based on an If statement, i.e.:

if UniqueID contains “Multiple”, then click on Information(), scrape the codes, exit the popup, and go to the next row
else just scrape the text in that row as is and add it to the datatable
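The branching described above could look roughly like the sketch below. It is plain Python rather than a UiPath workflow, so treat it as pseudologic: `merge_popup_codes`, the `CODES` column, and the `read_popup_codes` callback (standing in for "click the link, read the popup, close it") are hypothetical names, and the marker text is an assumption to adjust to whatever the real cells contain.

```python
def merge_popup_codes(rows, read_popup_codes, marker="Information"):
    """Return a copy of the scraped rows with a CODES entry added to each.

    rows             -- list of dicts, one per scraped table row
    read_popup_codes -- callable taking the row index and returning the
                        list of codes shown in that row's popup
    marker           -- text that identifies a row with a popup link (assumed)
    """
    merged = []
    for index, row in enumerate(rows):
        row = dict(row)  # copy so the scraped data is left untouched
        if marker in row["UNIQUE ID"]:
            # Popup row: fetch the codes from the popup for this row
            row["CODES"] = read_popup_codes(index)
        else:
            # Plain row: the cell text already is the code
            row["CODES"] = [row["UNIQUE ID"]]
        merged.append(row)
    return merged


# Made-up data standing in for the scraped table and the popup contents
scraped = [
    {"UNIQUE ID": "ABC123", "COUNTRY": "UK"},
    {"UNIQUE ID": "Information (2)", "COUNTRY": "US"},
]
fake_popups = {1: ["XYZ789", "XYZ790"]}

result = merge_popup_codes(scraped, lambda i: fake_popups[i])
for row in result:
    print(row)
```

Keeping the popup reader as a separate callback means the slow click-and-read step only runs for the rows that need it.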

I’m just looking for the best way to implement this… any ideas?

Thank you!

Hi Pravin,

Not sure how to implement this… can you please elaborate?

You have done perfect :+1:

I haven’t fully understood yet. Can you elaborate more on what you want to achieve in terms of the business process goal, and how it would be done if the process were executed manually by a human? Thanks

hi, basically we’re just looking to copy all information in the web table.

When captured manually, the person has to go row by row and do a copy paste from web table to excel, if a particular row contains Information(), the person has to click into it, copy the unique IDs from the pop up window, exit and paste into excel, then proceed to next row.

I hope I made it clear, but please let me know if you want me to explain better. Thank you :slight_smile:

What about the following idea:

  • Data scrape the table (quick mode / full table)
  • Add an additional data column to the datatable (to hold the code later)
  • For Each Row loop
    • If activity: the UNIQUE ID value has (X) in its text, where X is any number
      THEN: click and extract the code, and add it to the datatable using the added column
      ELSE: do nothing

To click the link, we use a dynamic selector incorporating the row index, which we can retrieve from the index output of the For Each Row activity.
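To make the two moving parts of this idea concrete, here is a small Python sketch of the "(X)" check and of building the selector string from the loop index. It only illustrates the string handling, not a UiPath activity: the selector shape with `tableRow` and `tableCol`, the column position, and the one-row header offset are assumptions that need to be checked against the real page in UI Explorer, and `has_popup` and `link_selector` are hypothetical helper names.

```python
import re

# "(X)" where X is any number, e.g. "Information (3)"
HAS_COUNT = re.compile(r"\(\d+\)")


def has_popup(unique_id_text):
    """True when the cell text contains a number in parentheses."""
    return bool(HAS_COUNT.search(unique_id_text))


def link_selector(loop_index, header_rows=1):
    """Build a selector for the link in the row being processed.

    loop_index is zero-based (as a loop counter usually is), while table
    rows in the selector are assumed to be one-based and to include the
    header row, hence the offset.
    """
    table_row = loop_index + 1 + header_rows
    return "<webctrl tag='A' tableRow='{}' tableCol='1' />".format(table_row)


for index, text in enumerate(["ABC123", "Information (3)"]):
    if has_popup(text):
        print(index, link_selector(index))
```

In Studio the same effect would come from placing the row variable inside the selector of the Click activity, so only rows that pass the check are clicked.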

Have a look also here:
Dynamic Selectors

Hi @ceceliaa34 ,

I’ve attached a sample workflow for you, built against a sample HTML page I created. I hope it gives some more clarification alongside all the other replies.

HtmlScrape.zip (154.9 KB)

Hi hi,

Thank you so much for your help. Can I just ask, what version of UiPath are you using?

I got the below after opening the workflow

image

I’m using Studio 2021.4.4

Try with only the XAML file

Main.xaml (15.7 KB)