Issues with Read PDF Text in UiPath

Christodoulos · February 8, 2024, 10:55am

Greetings Community,

I am having trouble reading some PDF documents in my process, using the Get PDF text activity from the PDF activities dependency as well as the Document Understanding activity.

The issue is that some PDFs seem to be just fine, while others are read either as random characters or as nothing at all.

From what I gather, the PDFs with the PDF producer “Aspose”, as well as one that I exported from Excel, work normally.

The ones I have issues with have “Microsoft Print to PDF”, “PDF24” or Nitro as producer.

I am having trouble understanding what causes this issue, and whether it is related to the producer, to how the PDF was converted, or to something else entirely.

Does UiPath only accept certain PDF versions (maybe something to do with .NET)?
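To compare the working and failing files on exactly these two points (PDF version and producer), a small script outside UiPath can read both from the raw bytes. This is only a minimal sketch: the function name `sniff_pdf` is made up for illustration, and it assumes the `/Producer` entry is stored as a plain literal string in an uncompressed part of the file. Producers that store the metadata as a hex or UTF-16 string, or inside a compressed object stream, will simply show up as `None` or as unreadable text.

```python
import re


def sniff_pdf(data: bytes) -> dict:
    """Return the header version and /Producer string visible in the raw bytes.

    Hypothetical helper: it does not parse the PDF structure, it only
    pattern-matches, so a missing value means "not visible", not "absent".
    """
    # The header "%PDF-x.y" is expected near the start of the file.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    # Literal string form: /Producer (some text), allowing escaped characters.
    producer = re.search(rb"/Producer\s*\(((?:\\.|[^\\)])*)\)", data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "producer": producer.group(1).decode("latin-1") if producer else None,
    }


# Illustrative bytes only; in practice read the file with open(path, "rb").
sample = b"%PDF-1.7\n1 0 obj\n<< /Producer (PDF24 Creator) >>\nendobj\n"
print(sniff_pdf(sample))
```

Running this over one file from each group would show whether the failing PDFs share a version or producer that the working ones do not.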

Note: I don’t want to use OCR, as I have tried it and it messed up my text.

Any input is welcome!

Thanks a lot!

Janszen2 · February 8, 2024, 4:00pm

The Extract PDF Text activities work well with structured PDFs. A scanned document can also be a PDF, but if it’s a handwritten document or a picture of something, the text recognition won’t work as well, so that’s probably why Print to PDF won’t work well.

Christodoulos · February 8, 2024, 9:15pm

It’s a normal digital invoice, not a scanned document. I imagine its creator probably used this method in Nitro or some other software to produce it.

Janszen2 · February 9, 2024, 8:59am

I can’t find any documentation about it, but it would be an interesting experiment to test whether different export methods within the same software can impact how well UiPath can read the file with the Read PDF Files activity.
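For such an experiment, the extracted text from each export method needs some comparable score. One rough option, sketched below under the assumption that the activity's output has been saved as a string, is to measure how much of it consists of control, private-use, unassigned, or replacement characters. The function name `suspicious_ratio` is hypothetical, and the check has a clear limit: it flags empty output and obvious junk, but not text that was mapped to the wrong ordinary letters.

```python
import unicodedata

# Unicode categories that rarely appear in genuine invoice text:
# control, format, private-use, unassigned, surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Co", "Cn", "Cs"}


def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction junk.

    Empty or whitespace-only output is treated as fully suspicious (1.0).
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)
```

A file exported one way and scoring near 0.0, against the same document exported another way and scoring much higher, would support the idea that the export method matters.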

Christodoulos · February 15, 2024, 9:18am

I haven’t found the exact reason behind it, but it seems that the PDFs that caused issues were saved from SharePoint using the Print option, and anything that was saved using Download is okay.

I guess that when you print, different converters come into play and may break the source code of the PDF, making it unreadable to automations that rely on it.
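One possible mechanism behind this, offered as an assumption rather than a confirmed cause for these files: text extraction without OCR depends on each font's encoding or its `/ToUnicode` map, and a converter that embeds fonts without that map can produce a PDF that displays correctly but extracts as random characters. The sketch below is a crude raw-byte count, not a real PDF parser, and the function name `font_mapping_summary` is hypothetical. If the file keeps its objects in compressed object streams, the counts will be too low, which is why object streams are counted as well.

```python
import re


def font_mapping_summary(data: bytes) -> dict:
    """Count font dictionaries and /ToUnicode entries visible in raw PDF bytes.

    Rough heuristic only: compressed object streams hide entries from this
    scan, and composite fonts are counted together with their descendants.
    """
    return {
        # \b keeps "/Type /FontDescriptor" from being counted as a font.
        "fonts": len(re.findall(rb"/Type\s*/Font\b", data)),
        "to_unicode": len(re.findall(rb"/ToUnicode\b", data)),
        "object_streams": len(re.findall(rb"/Type\s*/ObjStm\b", data)),
    }
```

Comparing the printed and the downloaded copy of the same SharePoint document with this summary would show whether the Print route drops the mapping; a result with fonts but no `/ToUnicode` entries and no object streams would point in that direction.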

system · February 18, 2024, 9:19am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
