Challenge with Table Extraction

Hi Everyone,

I am trying to extract data from a website to practice what I've learned.

The website is this one: Best-Selling Books of All-Time: Top Selling Books by Ranking

The challenge is to get the “25 Best-Selling Books of All-Time” and the “25 Best-Selling Book Series of All-Time” lists. Then I need to create one folder for each list and, inside it, save one txt file per book/series (named after the book/series) containing the title, copies sold, and author.
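The output half of the challenge (one folder per list, one txt file per entry) can be sketched as below. This is a minimal Python illustration of the logic only; in a UiPath workflow the same steps would typically use activities such as Create Folder and Write Text File. The record fields (`title`, `copies_sold`, `author`) and the helper names are hypothetical, and the file-name cleanup assumes Windows naming rules.

```python
import os
import re


def safe_filename(name: str) -> str:
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[<>:"/\\|?*]', "_", name).strip()


def save_books(base_dir: str, list_name: str, books: list[dict]) -> list[str]:
    """Create one folder for a list and write one .txt file per book/series.

    Each book is assumed to be a dict with hypothetical keys
    'title', 'copies_sold', and 'author'.
    """
    folder = os.path.join(base_dir, safe_filename(list_name))
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    paths = []
    for book in books:
        # The file is named after the book/series, as the challenge requires.
        path = os.path.join(folder, safe_filename(book["title"]) + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"Title: {book['title']}\n")
            f.write(f"Copies sold: {book['copies_sold']}\n")
            f.write(f"Author: {book['author']}\n")
        paths.append(path)
    return paths
```

Calling `save_books(base_dir, "25 Best-Selling Books of All-Time", books)` once per list produces the two folders the challenge asks for.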

The HTML structure of this website is terrible and I’m facing some problems:

  1. I can't separate the “25 Best-Selling Books of All-Time” from the “25 Best-Selling Book Series of All-Time”.
  2. When I select “copies sold”, Table Extraction picks up some weird text as part of the pattern.
  3. I can't select the author using Table Extraction.

I'm trying to figure out how to solve this, and I don't know whether Table Extraction is the best option for this challenge.

Something is weird about how that page is coded. Table Extraction definitely doesn’t see it correctly.

I tried For Each UI Element and designated the entire container for each book. Then, as it loops over them, you can use RegEx, Split, etc. to get the individual pieces of data.
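The "RegEx, Split" step can be sketched as follows: split a container's text into lines, then pick out the heading and the copies-sold line with regular expressions. This is a Python illustration only, and the text layout it expects (a heading like `#3 Title` and a line like `Copies sold: ...`) is an assumption; the real page's container text may be shaped differently, so the patterns would need adjusting after inspecting what Get Text actually returns.

```python
import re

# Assumed layout (hypothetical): "#<rank> <title>" heading, "Copies sold: <value>" line.
HEADING_RE = re.compile(r"^#(\d+)\s+(.+)$")
COPIES_RE = re.compile(r"copies sold:?\s*(.+)", re.IGNORECASE)


def parse_container(text: str) -> dict:
    """Extract rank, title, and copies sold from one container's text."""
    record = {"rank": None, "title": None, "copies_sold": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING_RE.match(line)
        if heading and record["rank"] is None:
            # First "#<number> ..." line is treated as the entry heading.
            record["rank"] = int(heading.group(1))
            record["title"] = heading.group(2)
            continue
        copies = COPIES_RE.search(line)
        if copies and record["copies_sold"] is None:
            record["copies_sold"] = copies.group(1).strip()
    return record
```

Lines that match neither pattern (descriptions, ads, other page text) are simply ignored, which is what keeps the stray text that confuses Table Extraction out of the result.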

You’ll have to use some logic to know that it went from #25 back to #1 and that means you’re now at the second list.
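One way to express that logic: walk the entries in page order and start a new list whenever the rank drops (for example from 25 back to 1). This Python sketch assumes each entry is a dict with a hypothetical integer `rank` key and that ranks increase within each list, as the reply above describes.

```python
def split_on_rank_reset(records):
    """Group entries into separate lists, starting a new one when the rank drops."""
    lists = []
    previous_rank = None
    for record in records:
        rank = record["rank"]
        if previous_rank is None or rank < previous_rank:
            lists.append([])  # rank went back down: a new list begins
        lists[-1].append(record)
        previous_rank = rank
    return lists
```

Using a strict `<` means two entries sharing the same rank (a tie) stay in the same list.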

Hi @postwick, I tried to do the challenge with “For Each UI Element”, but I could not get the author name.

Your idea does make it easier to split the data in two and organize the information, but unfortunately I still have the problem with the author name.

Here is what the activity extracts from the website:

This is a good example of a situation where I often find that the classic activities work better because of their simplicity. I used Find Children (scope set to descendants, filter tag='p') to get all the paragraphs, which I identified by inspecting the page in Chrome. I looped through them and, for the ones that start with #, used Get Text to get the full text of the paragraph, then used RegEx to split it up and extract the rank, title, and author.
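The filtering and RegEx step from that approach can be sketched in Python as below. It assumes each entry paragraph reads like `#<rank> <title> by <author>` (optionally `#<rank>.`); that shape is a guess, not taken from the page or from the attached workflow. The title group is greedy so that the split happens at the last " by ", which keeps titles that themselves contain "by" intact.

```python
import re

# Assumed paragraph shape (hypothetical): "#<rank> <title> by <author>".
ENTRY_RE = re.compile(r"^#(\d+)\.?\s+(.+)\s+by\s+(.+)$")


def parse_paragraphs(paragraphs):
    """Keep paragraphs starting with '#' and split them into rank, title, author."""
    entries = []
    for text in paragraphs:
        text = text.strip()
        if not text.startswith("#"):
            continue  # not a ranked entry
        match = ENTRY_RE.match(text)
        if match:
            rank, title, author = match.groups()
            entries.append({"rank": int(rank), "title": title.strip(), "author": author.strip()})
    return entries
```

Paragraphs that start with `#` but do not fit the pattern are skipped rather than raising an error, so unexpected page text does not stop the loop.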


Main.xaml (22.5 KB)

UiPath.System.Activities 23.4.8
UiPath.UIAutomation.Activities 23.4.8

