I have a PDF that opens a form with a large table. I need to extract one number from the PDF. The PDF is not encrypted, but when I run a test “Read PDF Text” activity and print the output to a message box, the strings come out as unreadable characters and shapes, as if they were encrypted.
Am I doing something wrong, or is this just a result of the formatting or fonts in the PDF? It shouldn’t matter if I am reading the PDF only as text, right? It should still yield all strings in order, regardless of formatting, right?
To isolate the cause, can you try copying text from the PDF in a PDF viewer such as Adobe Reader or the Chrome browser, then pasting it into Notepad? If the pasted text is also incorrect, the PDF may have font-based copy protection applied.
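If you want to run the same check on a string instead of by eye, here is a rough Python sketch. It assumes you have already saved the extracted text to a string (for example, the output of the text-reading activity), and it only counts characters that commonly show up when a font's character mapping is broken: private-use, unassigned, control, and replacement characters. The function names and the 0.3 threshold are my own assumptions, not anything from UiPath, so tune them against your own file.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like font-mapping debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = 0
    for c in chars:
        category = unicodedata.category(c)
        # Co = private use, Cn = unassigned, Cc = control; U+FFFD is the replacement character.
        if category in ("Co", "Cn", "Cc") or c == "\ufffd":
            suspicious += 1
    return suspicious / len(chars)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Heuristic only: the threshold is an assumed value, not a standard."""
    return garbled_ratio(text) > threshold


if __name__ == "__main__":
    print(looks_garbled("Invoice total 1,234.56"))
    print(looks_garbled("\uf041\uf042\uf043 \ufffd\ufffd"))
```

A high ratio suggests the text layer itself is unusable and OCR is the better route. A low ratio does not prove the text is correct, because some fonts map glyphs to ordinary but wrong letters.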
It’s likely due to non-standard fonts or scanned content. In such cases, the “Read PDF with OCR” activity in UiPath is more effective than “Read PDF Text”, because it recognizes the rendered page image instead of relying on the PDF’s embedded text layer. Try different OCR engines, such as Tesseract or Microsoft OCR, to improve results.
If the PDF has a complex layout or table, consider using Document Understanding and the “Digitize Document” activity, followed by regex or form extractors to pull the specific number you need.
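For the regex step, a minimal sketch in Python might look like the following. It assumes the number you need sits right after a known label in the digitized text. The label “Total Amount” and the function name are hypothetical placeholders, so replace them with whatever actually appears in your PDF. The same pattern should carry over to a regex activity in the workflow with little change.

```python
import re
from typing import Optional


def extract_number_after_label(text: str, label: str) -> Optional[str]:
    """Return the first number that follows the label, or None if there is no match."""
    # Allow an optional ':' or '-' separator and whitespace between label and number.
    # The number may contain thousands separators and a decimal part.
    pattern = re.escape(label) + r"\s*[:\-]?\s*([-+]?\d[\d,]*(?:\.\d+)?)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical OCR output; real text will depend on the document and OCR engine.
    sample = "Item A 10\nTotal Amount: 1,234.56\nNotes"
    print(extract_number_after_label(sample, "Total Amount"))
```

The value is returned as a string so that separators are preserved; strip the commas before converting it to a number. OCR output can confuse similar characters such as O and 0, so it is worth validating the result.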
If you found this helpful, please mark it as a solution. Thanks.
Happy Automation with UiPath