The challenge is to get the “25 Best-Selling Books of All-Time” and the “25 Best-Selling Book Series of All-Time”. After that, I need to create one folder for each list and, for every entry, save a txt file named after the book/series that contains the title, copies sold, and author.
The HTML structure of this website is terrible and I’m facing some problems:
I cannot split the “25 Best-Selling Books of All-Time” and “25 Best-Selling Book Series of All-Time”.
When I select the “copies sold”, Table Extraction selects some weird text as part of the “pattern”.
I cannot select the author using Table Extraction.
I’m trying to figure out how to solve this, and I don’t know if Table Extraction is the best option for this challenge.
Something is weird about how that page is coded. Table Extraction definitely doesn’t see it correctly.
I tried For Each UI Element and just designated the entire container for each book. Then, as it loops over them, you can use RegEx, Split, etc. to get the individual pieces of data.
You’ll have to use some logic to know that it went from #25 back to #1 and that means you’re now at the second list.
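The workflow itself lives in UiPath, but the "rank went back to #1" check may be easier to see in a few lines of Python. This is only a sketch: the function name `split_on_rank_reset` and the sample entries are made up for illustration, and it assumes you already have the rank of each entry as an integer.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (rank, raw text of the entry)


def split_on_rank_reset(entries: List[Entry]) -> List[List[Entry]]:
    """Group entries into lists, starting a new list whenever the rank
    stops increasing (for example, when it goes from 25 back to 1)."""
    groups: List[List[Entry]] = []
    previous_rank = None
    for rank, text in entries:
        # The first entry, or a rank that did not increase, opens a new list.
        if previous_rank is None or rank <= previous_rank:
            groups.append([])
        groups[-1].append((rank, text))
        previous_rank = rank
    return groups


# Placeholder data, not taken from the real page.
sample = [
    (1, "book A"), (2, "book B"), (3, "book C"),
    (1, "series A"), (2, "series B"),
]
groups = split_on_rank_reset(sample)
books, series = groups
```

In UiPath the same idea would be an integer variable holding the previous rank plus an If inside the loop that switches the target folder when the current rank is not greater than the previous one.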
This is a good example of a situation where I often find the classic activities work better because of their simplicity. I used Find Children (set to descendants and filter tag='p') to get all the paragraphs, which I identified by inspecting the page in Chrome. I then looped through them and, for the ones that start with #, used Get Text to get all the text of the paragraph, then used RegEx to split it up and extract rank, title, and author.
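Here is a rough Python sketch of the RegEx step, in case it helps to see the pattern outside of UiPath. The paragraph layout is an assumption on my part ("#rank Title by Author"), so the pattern will need adjusting to whatever Get Text actually returns; `ENTRY_PATTERN` and `parse_entry` are names I made up, and the sample strings are placeholders.

```python
import re
from typing import Dict, Optional, Union

# Assumed layout: "#<rank> <title> by <author>", with an optional
# separator (hyphen, colon or dot) after the rank. Adjust to the real text.
ENTRY_PATTERN = re.compile(
    r"^#(?P<rank>\d+)\s*[-:.]?\s*(?P<title>.+?)\s+by\s+(?P<author>.+)$"
)


def parse_entry(paragraph: str) -> Optional[Dict[str, Union[int, str]]]:
    """Return rank, title and author for a paragraph that starts with '#',
    or None when the paragraph does not match the assumed layout."""
    match = ENTRY_PATTERN.match(paragraph.strip())
    if match is None:
        return None
    return {
        "rank": int(match.group("rank")),
        "title": match.group("title").strip(),
        "author": match.group("author").strip(),
    }


# Placeholder strings, not taken from the real page.
parsed = parse_entry("#12 Example Title by Example Author")
skipped = parse_entry("Some introductory paragraph")
```

One caveat with this pattern: the title is cut at the first " by ", so a title that itself contains " by " would be split in the wrong place. The named groups should carry over to a .NET Regex in UiPath if you write them as `(?<rank>...)` instead of `(?P<rank>...)`.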