Read PDF Text Activity should also return structured text

Read PDF Text and Read PDF Text With OCR always return the text as a plain string. Sometimes I feel they should also offer a structured output, so that the extracted text looks the same as the layout in the PDF file. I am not sure whether you have seen it, but this feature is available in AA. I'm not comparing the tools right now, but if UiPath included this feature it would be even more awesome.

#HappyRobotics

Hello there,

Read PDF Text now has a “PreserveFormatting” flag that tries to keep the layout of the machine readable text from within a PDF. Does this help?
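
To make "keeping the layout" concrete, here is a minimal Python sketch of the general idea: place each word on a character grid according to its position on the page, so that columns stay aligned in the output string. This is not how UiPath implements PreserveFormatting; the function name `render_layout`, the `(x, y, text)` word tuples, and the `char_width` and `line_height` values are all assumptions made for illustration, and y is assumed to grow downward.

```python
from collections import defaultdict


def render_layout(words, char_width=6.0, line_height=12.0):
    """Place (x, y, text) words on a character grid so columns stay aligned.

    Hypothetical helper: the coordinates, char_width and line_height are
    illustrative assumptions, not values taken from any PDF library.
    """
    # Group words into rows by their vertical position.
    rows = defaultdict(list)
    for x, y, text in words:
        rows[round(y / line_height)].append((x, text))

    lines = []
    for row in sorted(rows):
        line = ""
        for x, text in sorted(rows[row]):
            col = int(round(x / char_width))
            # Keep at least one space between words that would overlap.
            if line and col <= len(line):
                col = len(line) + 1
            # Pad with spaces up to the target column, then append the word.
            line = line.ljust(col) + text
        lines.append(line)
    return "\n".join(lines)


# Two rows of a small two-column table.
words = [
    (0, 0, "Item"), (120, 0, "Qty"),
    (0, 12, "Widget"), (120, 12, "3"),
]
print(render_layout(words))
```

In this sketch "Qty" and "3" both start at the same character column, which is the kind of alignment a flat string loses when words are simply joined with single spaces.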

It would be a great help, I believe. I will check it out and get back to you.

Hi,
Thank you for your suggestion. I added it to our internal ideas tracker for our team to consider.

Hello @vikaskulhari,

Did you manage to check out the PreserveFormatting flag? Does it work for your use case?

Ioana

The Read PDF Text activity works in the background without requiring the PDF to be opened, which is great, but it didn't preserve the structure.

I will check this out too, since I have a similar problem: the structure was lost when I used the Read PDF Text activity. As an alternative I have used the Get Full Text or native text activities, since these preserve the structure. But these activities require the PDF to be open.

Do try the PreserveFormatting flag - it should help.

I have enabled the PreserveFormatting flag and unfortunately it does not work for my PDF, which is semi-structured.

Any chance you could share a sample file?

Try a different OCR engine.
My suggestion is to use Abbyy Cloud OCR; it should give you the desired output!

Apologies, it's an SPII document, so I cannot share it.

Hi @Pradeep_Shiv,

This is about native PDF documents, not scanned ones, so I used the Read PDF Text activity and not the one with OCR. OCR is not needed, right?

If your documents are not scanned, OCR is not required.
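
If it is unclear whether a given file is native or scanned, one rough way to decide is to look at how much text the native extraction returned. The Python sketch below shows that heuristic only; `needs_ocr`, `native_text` and the `min_chars` threshold are hypothetical names and values, with `native_text` standing for whatever string the non-OCR read produced.

```python
def needs_ocr(native_text, min_chars=20):
    """Return True when the native text layer looks empty.

    Hypothetical heuristic: a scanned PDF usually yields little or no
    text from a non-OCR read. The min_chars threshold is an assumption
    and should be tuned for the documents at hand.
    """
    # Ignore whitespace so blank pages do not count as text.
    visible = "".join(native_text.split())
    return len(visible) < min_chars


print(needs_ocr(""))
print(needs_ocr("Invoice number 12345, total due 99.00 EUR"))
```

A file that comes back nearly empty would then go to Read PDF Text With OCR, while everything else stays on the faster non-OCR path.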