Data scraping - a whole table (p, div, table) - scraping issue

Hello! I need help writing the proper XML for the ExtractMetadata field. I can’t figure out the combination that gets the whole table. Based on its DOM, the whole table here consists of p, div and table elements, as I will show you below in a screenshot. The website is private, so I can only try whatever you suggest. I am attaching a txt file with the code I’ve tried so you can modify it more quickly there. The content for Details will vary by product, so the XML must be dynamic. Thank you very much for guiding and helping me!

xmlClaudiaK.txt (865 Bytes)

Hi,

Use the Data Scraping option and select only the Capacity field. It will automatically extract the entire table.

I used Data Scraping in various modes but did not get the whole information.

Can you share what information is captured by data scraping?

I have already extracted data from the same type of UI, and it works fine.

If I use “<extract-table get_columns_name="1" get_empty_columns="1" columns_name_source="Longest" />”, I get only the table under Technical details.
And if I use
<extract>
<column attr='text' name='Column1' exact='1'>
<webctrl class='section-label' tag='p' />
</column>
</extract>
then I get only what’s set for the p tags: Technical details, Power, Weight&dimensions.

@claudia.kiss
Just a brief introduction: data scraping in your case can be done by combining different techniques:

(dynamized) data scraping
(dynamized) retrieval + optional find children
less known but very powerful: HTML parsing with XML
…
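
To make the HTML parsing idea a bit more concrete, here is a minimal sketch written in Python rather than as a UiPath workflow. The real page is private, so the sample markup is hypothetical: it only mirrors the p / div / table layout described in the question, and the `section-label` class is taken from the selector quoted earlier in the thread. The class name `GroupedTableParser` is made up for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: label paragraphs, each followed by a div-wrapped table.
SAMPLE_HTML = """
<p class="section-label">Technical details</p>
<div><table><tr><td>Capacity</td><td>0.45l</td></tr></table></div>
<p class="section-label">Power</p>
<div><table><tr><td>Power</td><td>600 W</td></tr></table></div>
"""


class GroupedTableParser(HTMLParser):
    """Collects (group, key, value) rows from label paragraphs and their tables."""

    def __init__(self):
        super().__init__()
        self.rows = []      # consolidated output
        self.group = ""     # text of the most recent section label
        self.cells = []     # cell texts of the table row being read
        self.target = None  # "label", "cell", or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "section-label" in classes:
            self.group, self.target = "", "label"
        elif tag == "tr":
            self.cells = []
        elif tag in ("td", "th"):
            self.cells.append("")
            self.target = "cell"

    def handle_data(self, data):
        if self.target == "label":
            self.group += data
        elif self.target == "cell":
            self.cells[-1] += data

    def handle_endtag(self, tag):
        if tag in ("p", "td", "th"):
            self.target = None
        elif tag == "tr" and len(self.cells) >= 2:
            # Assumes the first two cells of a row are the key and the value.
            key, value = (c.strip() for c in self.cells[:2])
            self.rows.append((self.group.strip(), key, value))


parser = GroupedTableParser()
parser.feed(SAMPLE_HTML)
parser.close()
rows = parser.rows
for group, key, value in rows:
    print(f"{group}|{key}|{value}")
```

Each row carries the label of the section it was found under, so the group information survives even though the three tables are separate elements in the DOM.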

The data scraping configuration requires a row iterator definition, and we cannot express that in the data extraction definition in a way that scrapes all three data tables.

In such a case we loop over the tables and merge the partial results into one consolidated table.

Currently you have the tables grouped. What is your expected result:
Key|Value
Capacity|0.45l



Power|600 W
… etc

or would you like to have the group info in the output as well, like
Group|Key|Value
Technical Details|Capacity|0.45l
Technical…

Power|Power|600 W
Power| AC…
etc

I would like to have the group info as output (Group/Key/Value).

Do you know of any similar posts that I can look over to learn how to do that? I read your article and I think I need more material to go through, please!

Your case is not really beginner level (but it is solvable and learnable).

Learn all about selectors, scraping, and debugging & error handling in the Academy (learner portal).
Have a look here:

and explore some complex data scraping examples.




@claudia.kiss

For your scenario, you can use an Anchor Base activity to get all the data.

Create a String array with Technical details, Power, Weight&dimensions as values.

Use a For Each to capture each table and merge the table details:

for item in String array
1. Use an Anchor Base activity.
2. Drag in a Find Element activity and indicate Power on screen. You will get the selector.
3. Change the selector so that it uses the dynamic value stored in item.
4. Capture the table.
5. Merge the table.

end loop
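
The control flow of the loop above can be sketched in Python to show how the pieces fit together. This is only an illustration, not UiPath code: `label_selector` and `scrape_all` are hypothetical names, `capture_table` stands in for the Anchor Base plus table extraction, and the selector shape (including the `aaname` attribute) is an assumption modelled on the `webctrl` fragments quoted in this thread.

```python
from xml.sax.saxutils import quoteattr


def label_selector(label):
    """Build a selector string for one section label (assumed selector shape)."""
    # quoteattr escapes XML special characters such as & and adds the quotes.
    return f"<webctrl class='section-label' tag='p' aaname={quoteattr(label)} />"


def scrape_all(labels, capture_table):
    """Loop over the labels, capture each table, and merge the rows."""
    merged = []
    for label in labels:
        selector = label_selector(label)           # step 3: dynamic selector
        for key, value in capture_table(selector):  # step 4: capture the table
            merged.append((label, key, value))      # step 5: merge
    return merged


# Stand-in for the private page: maps a selector to the rows of "its" table.
FAKE_PAGE = {
    label_selector("Technical details"): [("Capacity", "0.45l")],
    label_selector("Power"): [("Power", "600 W")],
}

merged = scrape_all(
    ["Technical details", "Power"],
    lambda selector: FAKE_PAGE.get(selector, []),
)
for row in merged:
    print("|".join(row))
```

Keeping the selector construction in one small function means the label is the only thing that changes between iterations, which is the point of the dynamic selector.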


Hi @claudia.kiss

https://semantic-ui.com/collections/table.html
I used the above website to get all the data from different tables.
Main.xaml (11.4 KB)

Please find the xaml attached. It will help you.

I’ll try it now and let you know.

So, I managed to make it work with your suggestion (I am getting somewhere, yay!) by following the steps mentioned above. The merge part was not needed, because at the end the ExtractDataTable contains all the rows from the previous DTs. I will upload a screenshot of what I did to make it work in my case: I added a counter for the p tags and inserted it in the selector, and for the extract activity I changed the selector field to “<webctrl tag='TABLE' />”. Thank you so much!
[Screenshot: 2020-09-02 (8)]

Instead of a counter, you can use the innertext.

I have shared a xaml; there you can see a dynamic selector in the Find Element activity.

I don’t know which packages to install to see the activities you used in the Anchor Base, so I can’t see them in the Properties panel. If you can attach a screenshot showing where and how this is used, that would be good to know too.

I have created an array with the p tag innertext values (in your case).


Looping over the array:

Each time it will pick a different table because of the dynamic selector.

I tried it too and it worked. Thank you a lot for your guidance!!

Can you mark it as the solution?

I am very new to this forum. This is the first post I have replied to, and thank you so much.

Before doing this, I need to mention that, for what I wanted from this post to be complete, an additional data scraping of the p tags is needed, so that the array is populated before the For Each. You did so great for your first reply (lucky me)! Thank you again!
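
For anyone reading later, the extra step of populating the label array before the loop can be sketched like this. It is a Python illustration only, not the UiPath workflow from this thread: the markup is hypothetical, `LabelCollector` and `collect_labels` are made-up names, and the `section-label` class is the one from the selector I tried earlier.

```python
from html.parser import HTMLParser


class LabelCollector(HTMLParser):
    """Collects the text of every <p class="section-label"> element."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self.in_label = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "section-label" in classes:
            self.labels.append("")
            self.in_label = True

    def handle_data(self, data):
        if self.in_label:
            self.labels[-1] += data

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_label = False


def collect_labels(html):
    """Return the section labels in document order."""
    collector = LabelCollector()
    collector.feed(html)
    collector.close()
    return [label.strip() for label in collector.labels]


# Hypothetical page fragment; only the label paragraphs matter here.
sample = (
    '<p class="section-label">Technical details</p><div><table></table></div>'
    '<p class="section-label">Power</p>'
    '<p class="section-label">Weight&amp;dimensions</p>'
)
labels = collect_labels(sample)
print(labels)
```

Because the labels are read from the page itself, the array adapts when a product has different sections, which is what makes the later loop dynamic.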
