Challenge with Table Extraction

vitor.camargo · July 23, 2024, 4:55pm

Hi Everyone,

I am trying to extract data from a website to test my learning.

The website is this one: Best-Selling Books of All-Time: Top Selling Books by Ranking

The challenge is to get the “25 Best-Selling Books of All-Time” and the “25 Best-Selling Book Series of All-Time”. After that, I need to create one folder for each list and save a txt file for each book/series (named after it) containing the title, copies sold, and author.
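
The output side of this challenge (one folder per list, one .txt file per book) is easy to get wrong once the scraping works. The minimal Python sketch below shows that logic outside UiPath. It assumes each book has already been extracted into a dict with the hypothetical keys `title`, `copies_sold` and `author`. The `safe_filename` helper and the folder names are illustrative only. In a UiPath workflow the same steps would map to the Create Folder and Write Text File activities.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(name: str) -> str:
    """Replace characters that Windows does not allow in file names (hypothetical helper)."""
    return re.sub(r'[<>:"/\\|?*]', "_", name).strip()


def save_list(base_dir: Path, list_name: str, books: list) -> None:
    """Create one folder for the list and one .txt file per book inside it."""
    folder = base_dir / safe_filename(list_name)
    folder.mkdir(parents=True, exist_ok=True)
    for book in books:
        # Assumed record shape: {"title": ..., "copies_sold": ..., "author": ...}
        text = (
            f"Title: {book['title']}\n"
            f"Copies sold: {book['copies_sold']}\n"
            f"Author: {book['author']}\n"
        )
        (folder / f"{safe_filename(book['title'])}.txt").write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Placeholder data only; real values would come from the scraping step.
    sample = [{"title": "Example Title", "copies_sold": "100 million", "author": "Jane Doe"}]
    with tempfile.TemporaryDirectory() as tmp:
        save_list(Path(tmp), "Best-Selling Books", sample)
        for path in sorted(Path(tmp).rglob("*.txt")):
            print(path.relative_to(tmp))
```

Sanitizing the title matters because book titles can contain characters such as `:` or `?` that are not valid in Windows file names.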

The HTML structure of this website is terrible and I’m facing some problems:

I cannot separate the “25 Best-Selling Books of All-Time” from the “25 Best-Selling Book Series of All-Time”.
When I select the “copies sold”, Table Extraction selects some weird text as part of the “pattern”.
I cannot select the author using Table Extraction.

I am trying to figure out how to solve this, and I don’t know if Table Extraction is the best option for this challenge.

postwick · July 23, 2024, 7:24pm

Something is weird about how that page is coded. Table Extraction definitely doesn’t see it correctly.

I tried For Each UI Element and just designated the entire container for each book. Then, as it loops over them, you can use RegEx, Split, etc. to get the individual pieces of data.

You’ll have to use some logic to know that it went from #25 back to #1 and that means you’re now at the second list.
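
The reset check can be expressed as a small function. This Python sketch assumes the ranks have already been read from each container as integers, in page order. The function name and sample data are hypothetical. Instead of testing specifically for 25 followed by 1, it starts a new list whenever the rank fails to increase, so it would still work if one of the lists had fewer than 25 entries.

```python
def split_on_rank_reset(entries):
    """Group (rank, data) pairs, given in page order, into separate lists.

    A new list starts whenever the rank does not increase,
    e.g. when it jumps from 25 back to 1.
    """
    groups = []
    previous_rank = None
    for rank, data in entries:
        if previous_rank is None or rank <= previous_rank:
            groups.append([])  # rank went back down: a new list begins here
        groups[-1].append((rank, data))
        previous_rank = rank
    return groups


if __name__ == "__main__":
    # Hypothetical page order: two short lists back to back.
    page_order = [(1, "book A"), (2, "book B"), (1, "series X"), (2, "series Y")]
    books, series = split_on_rank_reset(page_order)
    print(books)
    print(series)
```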

vitor.camargo · July 23, 2024, 8:11pm

Hi @postwick, I tried to do the challenge with “For Each UI Element”, but I could not get the author name.

With your idea it seems easier to split the data into the two lists and organize the information, but unfortunately I still have the problem with the author name.

Here is what the activity extracts from the website:

postwick · July 23, 2024, 9:26pm

This is a good example of a situation where I often find the classic activities work better, due to their simplicity. I used Find Children (set to descendants, with the filter tag=‘p’) to get all the paragraphs, which I figured out by inspecting the page in Chrome. Then I looped through them, and for each one that starts with # I used Get Text to get the full text of the paragraph and RegEx to split it up and extract the rank, title, and author.

Main.xaml (22.5 KB)

UiPath.System.Activities 23.4.8
UiPath.UIAutomation.Activities 23.4.8
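
For readers who want to see the RegEx step in isolation, here is a rough Python equivalent. It assumes the paragraph text has the shape `#<rank>. <title> by <author> (<copies sold>)`. That shape is a guess for illustration only, so the real text returned by Get Text should be checked and the pattern adjusted. The pattern name, function name and sample strings are hypothetical. Note that .NET regex (used by UiPath's Matches activity) writes named groups as `(?<name>...)`, while Python uses `(?P<name>...)`.

```python
import re
from typing import Optional

# Assumed paragraph shape (hypothetical): "#3. Example Title by Jane Doe (100 million copies)"
ENTRY_RE = re.compile(
    r"^#(?P<rank>\d+)\.?\s+"          # rank, with an optional trailing dot
    r"(?P<title>.+?)\s+by\s+"         # title: shortest text before " by "
    r"(?P<author>.+?)"                # author
    r"(?:\s*\((?P<copies>[^)]*)\))?"  # optional "(copies sold)" suffix
    r"\s*$"
)


def parse_entry(text: str) -> Optional[dict]:
    """Return rank, title, author and copies sold, or None if the text is not a ranked entry."""
    # Collapse line breaks and repeated spaces that Get Text may return.
    normalized = " ".join(text.split())
    match = ENTRY_RE.match(normalized)
    if match is None:
        return None
    return {
        "rank": int(match["rank"]),
        "title": match["title"],
        "author": match["author"],
        "copies_sold": match["copies"],  # None when the suffix is missing
    }


if __name__ == "__main__":
    for paragraph in ["Intro text without a rank", "#3. Example Title by Jane Doe (100 million copies)"]:
        print(parse_entry(paragraph))
```

Because the title group is lazy, a title that itself contains the word “by” would be split too early; a real pattern may need a more specific delimiter taken from the actual page text.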

system · July 26, 2024, 9:26pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
