Extract data in the same format

Hi. I need to extract text from a PDF with its formatting preserved, for example when the text is in italics, or bold with underline. Is there any method to do that?

Hey @Amruta_George1

Try converting the PDF to Word and then extracting the Word document as XML, which may help.
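To illustrate the idea outside of UiPath: a .docx file is a ZIP archive, and the run formatting lives in word/document.xml as w:b (bold), w:i (italic), and w:u (underline) elements. Below is a minimal Python sketch, assuming the PDF has already been converted to .docx by some other tool. The helper names (read_runs, _is_on) are made up for this example, and it only sees formatting set directly on a run, not formatting inherited from styles.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _is_on(rpr, tag):
    """Return True if a run property such as w:b, w:i or w:u is switched on."""
    el = rpr.find(f"{{{W}}}{tag}") if rpr is not None else None
    if el is None:
        return False
    # A missing w:val means "on"; "0", "false" and "none" mean "off"
    return el.get(f"{{{W}}}val") not in ("0", "false", "none")


def read_runs(docx_file):
    """Yield (text, bold, italic, underline) for each text run in a .docx.

    docx_file may be a path or a binary file-like object.
    Hypothetical helper: style inheritance is ignored.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        if not text:
            continue
        rpr = run.find(f"{{{W}}}rPr")
        yield text, _is_on(rpr, "b"), _is_on(rpr, "i"), _is_on(rpr, "u")
```

Calling list(read_runs("converted.docx")) on a converted file (the file name is a placeholder) would give one tuple per run, which you can then map to whatever output format you need.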


Hello @Amruta_George1 ,

Did you try the Read PDF Text activity? Try setting Preserve Formatting to “True” and then doing the string manipulation (split or substring).


I want the output in RTF format.

This is not working.

Hello Amrutha,

May I know why you need to fetch the formatted data? Are you trying to copy it to some other document?

I am trying to extract data from a PDF in its exact format, with the underlines and the same font, and output it as RTF.

Please refer to the package in the link. Even if you copy data from the PDF using Read PDF and write it to Word, it will keep the alignment but not the underlines and other character formatting.

As I understand it, you can convert the PDF to Word and then convert the Word document to RTF.
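For the RTF side, here is a rough Python sketch of what the output could look like once you have the text split into runs with bold/italic/underline flags. RTF is plain text, so \b, \i and \ul inside a group are enough for basic formatting. The function names and the (text, bold, italic, underline) tuple layout are assumptions for this example, and the font is a parameter you pass in yourself; the sketch does not read the font from the PDF.

```python
def rtf_escape(text):
    """Escape text for RTF: backslash, braces, line breaks and non-ASCII."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\line ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value per UTF-16 code unit,
            # followed by a fallback character ("?") for old readers
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{unit}?")
    return "".join(out)


def runs_to_rtf(runs, font="Times New Roman"):
    """Build a minimal RTF document from (text, bold, italic, underline) runs."""
    parts = ["{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 " + rtf_escape(font) + ";}}\\f0 "]
    for text, bold, italic, underline in runs:
        codes = ("\\b" if bold else "") + ("\\i" if italic else "") + ("\\ul" if underline else "")
        body = rtf_escape(text)
        # Wrap formatted runs in a group so the formatting ends at the closing brace
        parts.append("{" + codes + " " + body + "}" if codes else body)
    parts.append("}")
    return "".join(parts)
```

Writing the returned string to a file with an .rtf extension (ASCII encoding is enough, because non-ASCII characters are escaped) should give a document that Word can open.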

No, my problem is that I want to extract the data in the same format, with the same font as in the original PDF.

@ovi Hello Ovi, any suggestions here?

Hi @Amruta_George1, if there is a will, there is a way.