Extract Specific column from a table spanning Multiple pages without Next Button

datatable
activities
datascraping
web

#1

I’m trying to extract URLs from a particular column of a table that spans multiple pages. I’m using the Data Scraping activity, but there is no Next or arrow button to move to the next page. This is how the page looks:

How can I make my bot traverse all the pages and extract the required data from the column when there is no navigation button?
Note: The number of pages isn’t limited to 2; there could be more.

Thank you.


#2

Hi @RishiVC1,

Use the Data Scraping activity. After you select the table, it asks about page navigation by default; select the page number at that point, and it will take the whole table.

Regards,
Arivu :slight_smile:


#3

Hi, I’m trying to extract Data from this website:

https://upload.umin.ac.jp/cgi-open-bin/ctr_e/index.cgi?function=02

Enter REX in the Free Keyword field and run the search. The resulting table needs to be extracted and saved. Even after I select the page number, the bot remains stuck on the first page.

Thanks.


#4

Hi @RishiVC1,

Yes, I checked. In this case we get only the first page's results because the data is bound on the server side.
So use the Find Children activity to loop through the page numbers and click each element. After each click, use Data Scraping to get the data table and append it to the combined result until the loop ends.
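
To illustrate the loop, here is a minimal Python sketch of the same pattern (visit each page number, scrape, append). It is not UiPath code: `scrape_all_pages`, `fake_scrape` and the `FAKE_SITE` data are made-up stand-ins for the "click the page link, then run Data Scraping" step.

```python
from typing import Callable, Iterable, List

Row = List[str]


def scrape_all_pages(
    page_numbers: Iterable[int],
    scrape_page: Callable[[int], List[Row]],
) -> List[Row]:
    """Visit each page number in turn and append its rows to one table."""
    combined: List[Row] = []
    for number in page_numbers:
        # In the real workflow this would be: click the page-number element
        # found by Find Children, then run Data Scraping on the table.
        rows = scrape_page(number)
        combined.extend(rows)
    return combined


# Stand-in for the website: page number -> rows on that page (invented data).
FAKE_SITE = {
    1: [["ID-001", "http://example.com/a"], ["ID-002", "http://example.com/b"]],
    2: [["ID-003", "http://example.com/c"]],
    3: [["ID-004", "http://example.com/d"]],
}


def fake_scrape(number: int) -> List[Row]:
    """Hypothetical scraper that just looks the page up in FAKE_SITE."""
    return FAKE_SITE[number]


all_rows = scrape_all_pages(sorted(FAKE_SITE), fake_scrape)
print(len(all_rows), "rows collected")
```

The point is that the combined table lives outside the loop, so each page's rows are added to it rather than overwriting the previous page.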

Regards,
Arivu :slight_smile:


#5

@arivu96, I can now navigate to the required page(s) using Find Children, but unfortunately my bot can no longer extract data from the required table. I have also tried updating the table selector and the ExtractMetadata values, but the bot still cannot extract the values (not even from the first page, which worked before I modified my code for page navigation).

I’m attaching the XAML file: Extraction.xaml (17.7 KB)

Please have a look. @ddpadil and @aksh1yadav, could you also give it a try?
Thanks :slight_smile:


#6

Here we go.
Extraction.xaml (20.1 KB)
Extracted Data.xlsx (46.4 KB)


#7

@ddpadil thanks. Can you please tell me where I went wrong, and whether there is anything I should keep in mind when solving cases like these? BTW, I needed selective data from the table: column 2 and all the links under the detail column.

This is the ExtractMetadata code for the Data Scraping activity in my XAML:

<extract>
	<row exact='1'>
		<webctrl tag='div' idx='1' text='[Studies searched:*] [*-*]&#10;&#10;&#10;&#10;&#10;&#10;&#10;[Studies searched:*] [*-*]' />
		<webctrl tag='table' idx='3' />
		<webctrl tag='tbody' idx='1' />
		<webctrl tag='tr' />
	</row>
	<column name='Details' attr='text' exact='1' attr2='href' name2='Url'>
		<webctrl tag='div' idx='1' text='[Studies searched:*] [*-*]&#10;&#10;&#10;&#10;&#10;&#10;&#10;[Studies searched:*] [*-*]' />
		<webctrl tag='table' idx='3' />
		<webctrl tag='tbody' idx='1' />
		<webctrl tag='tr' />
		<webctrl tag='td' idx='7' />
		<webctrl tag='a' idx='1' />
		<webctrl tag='font' idx='1' />
	</column>
	<column name='Unique ID' attr='text' exact='1'>
		<webctrl tag='div' idx='1' text='[Studies searched:*] [*-*]&#10;&#10;&#10;&#10;&#10;&#10;&#10;[Studies searched:*] [*-*]' />
		<webctrl tag='table' idx='3' />
		<webctrl tag='tbody' idx='1' />
		<webctrl tag='tr' />
		<webctrl tag='td' idx='2' />
		<webctrl tag='font' idx='1' />
	</column>
</extract>
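
For comparison, the same column selection (text of cell 2, link of cell 7 in each row) can be sketched in plain Python with the standard `html.parser` module. This is only an illustration outside UiPath: the `ColumnPicker` class and the `SAMPLE` markup are invented, and the real page's markup may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ColumnPicker(HTMLParser):
    """Collect (text of one cell, first href in another cell) per table row."""

    def __init__(self, text_col: int = 2, link_col: int = 7) -> None:
        super().__init__()
        self.text_col = text_col
        self.link_col = link_col
        self.cell = 0          # 1-based index of the current <td> in the row
        self.in_cell = False   # True while the parser is inside a <td>
        self.text_parts: List[str] = []
        self.href: Optional[str] = None
        self.rows: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # New row: reset the per-row state.
            self.cell, self.text_parts, self.href = 0, [], None
        elif tag == "td":
            self.cell += 1
            self.in_cell = True
        elif tag == "a" and self.in_cell and self.cell == self.link_col:
            if self.href is None:
                self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.cell >= self.link_col:
            # Header rows built from <th> never reach link_col and are skipped.
            self.rows.append(("".join(self.text_parts).strip(), self.href))

    def handle_data(self, data):
        if self.in_cell and self.cell == self.text_col:
            self.text_parts.append(data)


# Invented markup that loosely mimics a 7-column result table.
SAMPLE = """
<table>
<tr><th>No</th><th>Unique ID</th><th>c3</th><th>c4</th><th>c5</th><th>c6</th><th>Detail</th></tr>
<tr><td>1</td><td><font>ID-001</font></td><td></td><td></td><td></td><td></td><td><a href="/detail?id=1"><font>Detail</font></a></td></tr>
<tr><td>2</td><td><font>ID-002</font></td><td></td><td></td><td></td><td></td><td><a href="/detail?id=2"><font>Detail</font></a></td></tr>
</table>
"""

picker = ColumnPicker()
picker.feed(SAMPLE)
for unique_id, url in picker.rows:
    print(unique_id, url)
```

The sketch assumes the table has no nested tables; nested `<tr>` or `<td>` elements would confuse the simple cell counter.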

Thanks :slight_smile:


#8

This question is still unsolved. If anyone here has a solution, please guide me.

Thank you


#9

In cases like these, we can design the workflow as follows:
Use screen scraping (Get Attribute) and increment the row value each time, passing the selector dynamically.

  1. Use Get Attribute to read the href of the link (details column).

    In our case the selector for that column was:

  2. Replace the 2 in tableRow='2' with a counter. Increment the counter until the last row is reached.

  3. Append the result to a file.

  4. Once all the rows are exhausted, navigate to the next page and reset the row counter to 2.
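
The steps above can be sketched in Python as a nested loop over pages and rows. This is a rough illustration, not UiPath code: the selector template, `collect_links`, `fake_get_href` and the `FAKE_PAGES` data are all hypothetical, and the real selector from the post is not reproduced here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical selector template; only the tableRow attribute comes from the post.
SELECTOR_TEMPLATE = "<webctrl tag='A' tableRow='{row}' />"


def collect_links(
    page_count: int,
    get_href: Callable[[int, str], Optional[str]],
    first_row: int = 2,
) -> List[str]:
    """Read the href of every data row on every page using a row counter."""
    links: List[str] = []
    for page in range(1, page_count + 1):
        row = first_row  # reset the row counter on each new page
        while True:
            selector = SELECTOR_TEMPLATE.format(row=row)
            href = get_href(page, selector)  # stand-in for Get Attribute "href"
            if href is None:  # no element matched: the page is exhausted
                break
            links.append(href)
            row += 1
    return links


# Invented data: page number -> links in row order (row 2 is the first data row).
FAKE_PAGES: Dict[int, List[str]] = {
    1: ["http://example.com/1", "http://example.com/2"],
    2: ["http://example.com/3"],
}


def fake_get_href(page: int, selector: str) -> Optional[str]:
    """Hypothetical lookup that parses the row number back out of the selector."""
    row = int(selector.split("tableRow='")[1].split("'")[0])
    index = row - 2
    links_on_page = FAKE_PAGES[page]
    return links_on_page[index] if index < len(links_on_page) else None


links = collect_links(2, fake_get_href)
print(links)
```

In the actual workflow, "no element matched" would correspond to Get Attribute failing to find the selector, which is the signal to move on to the next page.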