Data Scraping and help with recognizing elements


#1

Hello,

I am attempting to extract data (I believe it is patterned) from this site https://soflbi.com/about/

However, the wizard is only recognizing the first 4 items. Is there a way to adjust the selector properties so that it recognizes the other line items?

Does anyone have a better alternative for accomplishing this?

Thanks

Robert


#2

You can certainly manipulate the selector for the other items, but what happens after the 4th item? Are you running into errors?


#3

Hi vvaidya!

Seems like you’re always the first to respond. Much appreciated.

I do not run into any errors, it just isn’t extracting all patterned data.

If you navigate to this URL it will extract the first 4 line items under the ‘Officers’ section.

However, there are 6 total line items and it doesn’t navigate further down to the ‘Directors’ section.

Thus, I assume it is not recognizing said pattern? Any way around this?

Thanks again,

Robert


#4

Can you attach your xaml?


#5

Realistically this is more of a ‘conceptual’ inquiry.

Most examples of web data extraction use web pages whose line items are laid out vertically. This page lays its line items out horizontally, so I am interested in seeing how it should be manipulated.

I don’t have any .xaml for this example (though I can quickly do it).


#6

Hi AutomateWork,

I have had a go at scraping the data on this page and, without trying really hard, was having trouble using the scrape data table. However, what you can do is extract the full text from the website, as per the attached example, and use string manipulation to pull out the information you require.

I’m sure with a bit more effort you could actually get the required text from the specific tables based on the selector information.
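As a rough illustration of the string-manipulation idea (this is not the contents of the attached workflow, just a minimal Python sketch), the extracted full text can be split into sections by looking for the heading lines. The sample text, the placeholder names, and the `split_sections` helper are all assumptions made up for the example; only the 'Officers' and 'Directors' headings come from the page as described above.

```python
def split_sections(full_text, headings):
    """Group non-empty lines under the most recent heading line seen."""
    sections = {}
    current = None
    for raw_line in full_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line in headings:
            current = line  # start collecting under this heading
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Hypothetical stand-in for the text extracted from the page.
sample_text = """About
Officers
Person A
Person B
Person C
Person D
Person E
Person F
Directors
Person G
Person H
"""

sections = split_sections(sample_text, {"Officers", "Directors"})
for heading, lines in sections.items():
    print(heading, len(lines), lines)
```

Lines that appear before the first heading are ignored, so navigation or banner text at the top of the page does not end up in either section.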

Richard

Main.xaml (6.3 KB)
project.json (240 Bytes)


#7

The main problem here is that they are in DIV tags rather than tables so it’s harder to identify any specific properties or identifiers.
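To illustrate what working with DIV-based markup looks like, here is a minimal Python sketch that collects the text inside each DIV carrying a given class, regardless of which row it sits in. The HTML snippet, the `member` and `row` class names, and the `DivTextCollector` class are hypothetical; the real page's markup would need to be inspected to find a usable class or attribute.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text inside every <div> that has the target class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0   # div nesting depth inside a matching div
        self.items = []  # one text entry per matching div
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1          # nested div inside a match
        elif self.target_class in classes:
            self.depth = 1           # entering a matching div
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:      # leaving the matching div
                self.items.append(" ".join(self._buf))

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self._buf.append(data.strip())


# Hypothetical markup: items laid out horizontally across two rows.
sample_html = """
<div class="row">
  <div class="member"><h4>Person A</h4><p>President</p></div>
  <div class="member"><h4>Person B</h4><p>Treasurer</p></div>
</div>
<div class="row">
  <div class="member"><h4>Person C</h4><p>Director</p></div>
</div>
"""

collector = DivTextCollector("member")
collector.feed(sample_html)
print(collector.items)
```

Matching on a class shared by every item, rather than on the position of an item within its row, is what lets the items in the second row be picked up as well.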


#8

I agree with @richarddenton. The reason you are getting only 4 items is that the tags are not properly formed. Please run the attached code.

Main.xaml (7.7 KB)