I am having trouble reading some PDF documents in my process, both with the activity that gets text from a PDF (from the PDF activities dependency) and with the Document Understanding activity.
The issue is that some PDFs seem to be just fine, while others are read either as random characters or as nothing at all.
From what I gather, PDFs whose producer is “Aspose”, as well as one that I exported from Excel, work normally.
The ones I have issues with have “Microsoft print to PDF”, “PDF24” or Nitro as their producer.
I am having trouble understanding what causes this issue, and whether it is related to the producer, to how the PDF was converted, or to something else entirely.
Does UiPath only accept certain PDF versions (maybe something to do with .NET)?
Note: I don’t want to use OCR, as I have tried it and it messed up my text.
The Extract PDF Text activities work well with structured PDFs. A scanned document can also be a PDF, but if it’s a handwritten document or a picture of something, text recognition won’t work as well, so that’s probably why print to PDF doesn’t work well.
It’s a normal digital invoice, not a scanned document. I imagine its creator probably used this method in Nitro or some other software to produce it.
I can’t find any documentation about it, but it would be an interesting experiment to test whether different export methods within the same software affect how well UiPath can read the file with the Read PDF Files activity.
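For anyone who wants to run that experiment, a small script can record which producer wrote each test file, so the results can be grouped by export method. This is only a rough Python sketch, not anything UiPath provides: `find_producer` is a hypothetical helper, and it assumes the `/Producer` entry is stored as an uncompressed, unencrypted literal string. Many PDFs store it differently, in which case it simply returns `None`.

```python
import re
from typing import Optional

# Matches a literal-string entry such as: /Producer (Some PDF Tool 1.0)
# Assumption: the entry is not inside a compressed object stream, is not
# encrypted, and is not written as a hex string (<...>).
_PRODUCER_RE = re.compile(rb"/Producer\s*\(((?:\\.|[^\\()])*)\)")


def find_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the Producer string found in raw PDF bytes, or None."""
    match = _PRODUCER_RE.search(pdf_bytes)
    if match is None:
        return None
    # Undo the backslash escapes PDF uses for parentheses and backslashes.
    raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
    # Text strings may be UTF-16BE, marked by a leading byte order mark.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")
```

Calling it as `find_producer(open(path, "rb").read())` for each test file and noting the result next to whether UiPath read the file correctly should show whether the failures really cluster by producer.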
I haven’t found the exact reason behind it, but it seems that the PDFs that caused issues were saved from SharePoint using the Print option, while anything saved using Download is okay.
I guess that when you print, different converters come into play and may break the internal structure of the PDF, making it unreadable to automations that rely on it.
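Until the root cause is pinned down, one workaround is to have the process flag files whose extracted text comes back as random characters or empty, so they can be fetched again via Download instead of Print. Below is a rough Python sketch of such a check. The function name `looks_garbled`, the punctuation set and the 0.7 threshold are all assumptions made for illustration and would need tuning on real invoices.

```python
# Punctuation that commonly appears in invoices (an assumed, non-exhaustive set).
_COMMON_PUNCTUATION = set(".,;:!?-/()%$@&'\"#+=*")


def looks_garbled(text: str, min_ratio: float = 0.7) -> bool:
    """Heuristic: True if the extracted text is empty or mostly unreadable."""
    # Drop all whitespace so that layout does not influence the ratio.
    stripped = "".join(text.split())
    if not stripped:
        # Nothing at all was extracted.
        return True
    # Count letters, digits and common punctuation as readable characters.
    readable = sum(
        ch.isalnum() or ch in _COMMON_PUNCTUATION for ch in stripped
    )
    return readable / len(stripped) < min_ratio
```

A ratio check like this will not catch every case (garbled output that happens to consist of letters passes), but it is cheap enough to run on every document before the text is used further.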