Scraping unstructured data from a single web page

Hello All,

I am trying to get all the messages from this website into a single column in an Excel sheet. However, I am having trouble scraping the data because it seems to be unstructured. I also tried getting the text and converting it into a data table to write to Excel, but that was a no-go.

Any suggestions here?

Hi @VirajN,

Try using the Generate Data Table activity to extract the unstructured data.

Let us know if this helps.

Regards,
Pavan H

I just tried this; it only gets the first 11 entries. The rest are ignored.

Hi,

Is the website link you gave in your question the one you need to fetch the data from?

Regards,
Pavan H

Yes, @pavanh003.

You should use the Find Children activity and filter on items whose selectors relate to a ‘message’. Then loop over each child and grab its aaname attribute. You can append each one to a file.

The attached workflow writes to a text file, but it should give you the basics of what you need.

Main.xaml (9.6 KB)
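
For anyone who cannot open the .xaml, here is a rough Python sketch of the same filter, loop and append logic. It is not UiPath code and not what the workflow literally contains: the children are modelled as plain dicts, and the `cls` key, the `collect_messages` and `write_single_column` names, and the CSV output are all assumptions made for illustration. A CSV with one column is used because it opens directly in Excel.

```python
import csv


def collect_messages(children, wanted_class="message"):
    """Return the aaname text of every child that looks like a message.

    `children` stands in for the output of Find Children; each item is a
    hypothetical dict such as {"cls": "message", "aaname": "Hello"}.
    """
    messages = []
    for child in children:
        # Filter step: skip anything that is not a message element.
        if child.get("cls") != wanted_class:
            continue
        # Grab step: the visible text is assumed to live in aaname.
        text = child.get("aaname", "").strip()
        if text:
            messages.append(text)
    return messages


def write_single_column(messages, path):
    """Write one message per row under a single 'Message' header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Message"])
        for message in messages:
            writer.writerow([message])
```

In the real workflow the filter is applied by Find Children itself and the loop is a For Each over its output, so only the overall shape carries over.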

This indeed works. Thanks @ronanpeter for the solution. If you don’t mind me asking, how did you arrive at grabbing the aaname attribute as the solution?

Every element on the webpage will have a selector. Those selectors will all have an aaname attribute and in this case that is where the ‘message’ is.

I first exported the full selectors for each message (child.selector.tostring) to a text file and it was evident aaname was the attribute that had the info you needed.
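
If you want to do that inspection outside the robot, a small Python sketch like the one below can read the exported selector strings and show which attributes actually change from one child to the next. It assumes each selector is a run of XML-like tags with single-quoted attributes and that no attribute value contains a raw `<`, `>` or `'`. The function names are made up for this example.

```python
import html
import re

# Matches name='value' pairs inside one tag of a selector string.
ATTR_RE = re.compile(r"([\w:-]+)='([^']*)'")
# Matches one tag such as <webctrl ... /> (assumes no '<' or '>' in values).
TAG_RE = re.compile(r"<[^<>]+>")


def selector_attributes(selector):
    """Return {attribute: value} for the last tag in a selector string."""
    tags = TAG_RE.findall(selector)
    if not tags:
        return {}
    # The last tag describes the element itself; earlier tags are its parents.
    return {name: html.unescape(value) for name, value in ATTR_RE.findall(tags[-1])}


def varying_attributes(selectors):
    """List the attribute names whose values differ between the selectors."""
    rows = [selector_attributes(selector) for selector in selectors]
    names = sorted({name for row in rows for name in row})
    return [name for name in names if len({row.get(name) for row in rows}) > 1]
```

Run over the exported file, one selector per line, `varying_attributes` would be expected to single out aaname if that is the only attribute carrying the per-message text.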

If the ‘Extract Structured Data’ activity does not meet your needs in future, consider Find Children as an alternative; it is often the more appropriate option.

Thank you!
