I have been given the task of scraping data from several thousand web tables. Some of them have images in their columns, and the data scraping wizard does not pick these up.
When I searched for similar issues I figured out how to get the images from certain tables if I know in advance which columns have images. The problem is that all the tables look different: some have one column of images, some have several. I don’t know in advance how many there are or in what order they come.
Is there a way to get the attributes of each column or something? The pictures in question are either an image of a checkmark or no image at all.
Retrieving the image src can be done as described here.
As the data tables differ from one another, one option could be to generate the ExtractData XML config dynamically for each table in advance and then grab the src.
Another option could be to restrict a Find Children to the data table and filter for all img elements. From the Find Children result items, the src attribute can be retrieved in a post-processing step.
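To illustrate the idea behind the second option outside of UiPath, here is a minimal Python sketch that assumes the HTML of one table is already available as a string. It walks only that table, collects every img element found inside a cell, and records the row index, column index and src, so the image columns do not have to be known in advance. The names TableImageFinder, find_table_images and image_columns are made up for this example and are not part of UiPath or any library. It does not handle nested tables or colspan.

```python
from html.parser import HTMLParser


class TableImageFinder(HTMLParser):
    """Collect (row, column, src) for every <img> found inside a table cell."""

    def __init__(self):
        super().__init__()
        self.row = -1         # zero-based index of the current <tr>
        self.col = -1         # zero-based index of the current cell in the row
        self.in_cell = False  # True while inside a <td> or <th>
        self.images = []      # list of (row, col, src) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = -1
        elif tag in ("td", "th"):
            self.col += 1
            self.in_cell = True
        elif tag == "img" and self.in_cell:
            # The "post-processing" step: read the src attribute of the image.
            self.images.append((self.row, self.col, dict(attrs).get("src")))

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False


def find_table_images(table_html):
    """Return a list of (row, col, src) for all images in the table HTML."""
    parser = TableImageFinder()
    parser.feed(table_html)
    return parser.images


def image_columns(images):
    """Return the sorted column indexes that contain at least one image."""
    return sorted({col for _, col, _ in images})


if __name__ == "__main__":
    # Hypothetical sample table: checkmark images in varying columns.
    sample = """
    <table>
      <tr><th>Name</th><th>Active</th><th>Verified</th></tr>
      <tr><td>A</td><td><img src="check.png"></td><td></td></tr>
      <tr><td>B</td><td></td><td><img src="check.png"/></td></tr>
    </table>
    """
    found = find_table_images(sample)
    print(found)
    print(image_columns(found))
```

Since the images here are either a checkmark or absent, the (row, col) positions alone are enough to fill in a true/false value for each cell of the scraped table afterwards.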
Thanks, I will check it out. I will work more on this on Friday and see if I can figure it out. I don’t understand exactly what you mean in your suggestion but maybe the tutorial will clarify it for me.
I have checked out the tutorial now and it showed me what I already knew about handling this when you know which column contains the images.
I don’t quite understand how to carry out your suggestions.
I’m trying to use Find Children on the table but I’m having trouble figuring out what the filter should be or how to use the resulting output. I will keep experimenting while I wait for a response.
I managed to solve it in a good enough way that I could handle the exceptions manually. I did it by following Peter’s second solution. It’s a pretty slow solution for these big tables, and I’m sure it can be done in a better way than I did.