I need to extract the text and its format using data scraping

I tried data scraping to extract the text and its format from a webpage, but I am not getting the bullet points. My requirement is to scrape the data from one webpage and paste it into another webpage with the same format. Is there any other way to do this?

Hello @Rudhresh_AppInno

Is it in a table format? If yes, can you try Table Extraction in UiPath (Modern design)?

Thanks for your reply @Rahul_Unnikrishnan

I can extract the data. After the extraction I wrote it to a text file, but there I am not getting the bullet points, e.g. ( ., i), 1) ).

@Rudhresh_AppInno

During the extraction, are you able to see the bullets scraped from the web page as well?

I mean in the data scraping panel.

Keep in mind that to capture the formats as well, you would need to grab more than only the text (the HTML source will reflect them, for instance). Depending on the website, other resources will also be needed, e.g. CSS styles …

May we ask you to elaborate on the requirements and how important the formats are for the destination where you want to copy the content? Thanks

@ppr My requirement is to scrape food product details from one website (they may contain bold headers, italic fonts and bullet points) and post them on another website. That’s why we need the formats during extraction.

@Rahul_Unnikrishnan

No, it’s not showing. That’s why I am looking for another method.

You can’t scrape the bullet point. You can use screen scraping with its different methods: if the markers are numbers or symbols like 1) 2), you can extract them, but a bullet mark can’t be extracted.

OK, thank you @Veera_Raj. Is there any other method or package available to do this? I got the bullet points with the OCR screen scraping method, but it cannot recognize the text correctly. That website is in another country’s language.

In addition to the post above: if you are interested in the formatting as well, check whether grabbing the HTML source code will help. We can do this, e.g., with Get Attribute and the attribute outerhtml.

More important is what the destination accepts when the information is placed there (e.g. pasting in source code mode…). In some scenarios a copy & paste may achieve it. So just do some quick RnD at your end on this as well.
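
As a rough illustration of why the HTML source helps: once the outerhtml value is available as a string, the list and emphasis tags in it can be mapped to plain-text markers before writing to a text file. Below is a minimal Python sketch using only the standard library. It assumes the fragment uses ordinary ul/ol/li/b/i tags; the names `FormattedTextExtractor` and `html_to_marked_text` and the marker choices ("•", "1)", "**", "*") are made up for this example and are not part of UiPath.

```python
import re
from html.parser import HTMLParser


class FormattedTextExtractor(HTMLParser):
    """Turn an HTML fragment into plain text that keeps list and emphasis markers."""

    BLOCK_TAGS = ("p", "br", "div", "h1", "h2", "h3", "h4")

    def __init__(self):
        super().__init__()
        self.parts = []       # collected output pieces
        self.list_stack = []  # one entry per open list: a counter for <ol>, None for <ul>

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_stack.append(0 if tag == "ol" else None)
        elif tag == "li":
            if self.list_stack and self.list_stack[-1] is not None:
                self.list_stack[-1] += 1
                marker = f"{self.list_stack[-1]}) "   # numbered item
            else:
                marker = "• "                         # bullet item
            self.parts.append("\n" + marker)
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            if self.list_stack:
                self.list_stack.pop()
            self.parts.append("\n")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Collapse runs of whitespace but keep single spaces between inline elements.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_marked_text(fragment: str) -> str:
    """Return the fragment as text, one non-empty line per block or list item."""
    parser = FormattedTextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<h2>Ingredients</h2><ul><li><b>Flour</b> 200 g</li><li><i>Sugar</i> 50 g</li></ul>"
    print(html_to_marked_text(sample))
```

Whether such markers are useful depends on what the destination accepts; if it takes HTML directly, passing the outerhtml through unchanged may be the simpler route.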

OCR is still not that advanced. Maybe you can try string manipulation or data manipulation after scraping the data. Or, if you are good at HTML, you can try it that way.
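
To give an idea of what string manipulation after scraping could look like, here is a small Python sketch that normalizes list markers at the start of each extracted line. The set of characters treated as bullets and the function name `normalize_list_markers` are assumptions for illustration only; the real OCR or scraping output would have to be inspected to see which characters actually appear.

```python
import re

# Characters assumed to stand in for a bullet glyph in the extracted text.
BULLET_RE = re.compile(r"^\s*[•·▪◦*\-]\s+")
# Numbered markers such as "1.", "1)", "i)", "a)" at the start of a line.
NUMBERED_RE = re.compile(r"^\s*(\d+|[ivxlc]+|[a-z])[.)]\s+", re.IGNORECASE)


def normalize_list_markers(text: str) -> str:
    """Rewrite leading list markers to a uniform "• " or "N) " form."""
    out = []
    for line in text.splitlines():
        bullet = BULLET_RE.match(line)
        numbered = NUMBERED_RE.match(line)
        if bullet:
            out.append("• " + line[bullet.end():])
        elif numbered:
            out.append(f"{numbered.group(1)}) " + line[numbered.end():])
        else:
            out.append(line.strip())
    return "\n".join(out)


if __name__ == "__main__":
    print(normalize_list_markers("Ingredients\n * Flour\n· Sugar\n1. Mix\n ii) Bake"))
```

This only cleans up markers that survived extraction; it cannot restore bullets that were never captured.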

@ppr I tried “innerhtml” from Get Attribute. In the output I am getting only code, not text.

As mentioned, check also for outerhtml.

Yes, it is the HTML. Text only will not have the formats which you had requested.

Maybe you can share a screenshot of the destination (e.g. text field / text area) where you will later use the text. Thanks

@ppr I tried outerhtml also, but it was no use. I don’t have access to the destination website. We used UiPath only for extraction.

OK, thanks. I will give it a try.