How to get a particular text in a PDF file using "Read PDF With OCR" Activity?

Good Day.

May I know how to get a particular text from a PDF file using the Read PDF With OCR activity? For example, I only want to extract the text “The Foundation of Innovation” from the attached PDF file. I do not need the text of the entire PDF file.

Kindly see attached files for the PDF file and the .xaml file. Thank you.

Capture.pdf (74.9 KB)
Main.xaml (9.7 KB)

Best regards,
Robert

Hi @RobertRussell_Monsalud

The Read PDF With OCR activity extracts all of the text from the PDF.
You then need to perform some String operations on the result to extract the part you want.
In your example, the text TM that follows the phrase can serve as the delimiter.
Let's say the text read by OCR is stored in a String variable named PDFData.

You can use the expression:

stringData = PDFData.Split({"TM"}, StringSplitOptions.None)(0).Trim

This will give you the text “The Foundation of Innovation”.
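
To make the logic of that expression concrete outside of UiPath, here is a minimal Python sketch of the same idea: split the OCR text on a delimiter and keep the trimmed part before it. The function name extract_before and the sample OCR text are hypothetical and made up for illustration; the real OCR output depends on the PDF and the OCR engine, and the delimiter itself may be misread.

```python
def extract_before(ocr_text: str, delimiter: str) -> str:
    """Return the trimmed text that precedes the first occurrence of delimiter.

    If the delimiter is absent, the whole trimmed text is returned,
    which mirrors taking element (0) of a split result.
    """
    return ocr_text.split(delimiter)[0].strip()


# Hypothetical OCR output; the real text depends on the PDF and the OCR engine.
sample = "The Foundation of Innovation TM\nSome other text from the page"
heading = extract_before(sample, "TM")
print(heading)
```

Note that when the delimiter is not found, the whole text comes back, so it may be worth checking that the delimiter is present before relying on the result.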

Thanks,

Happy Automation! :smiley:

Hi @adiijaiin

Thank you for the solution. May I also know what would be the possible solution for this one:

Best regards,
Robert

Hi @RobertRussell_Monsalud

Can you try retyping the double quotes in the expression? Sometimes, when an expression is copied and pasted, the quotes are not plain double quotes and the expression does not work as expected.
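
As an illustration of why copied quotes can fail: typographic (curly) quotes are different characters from the plain double quote that an expression expects. The following is a minimal Python sketch, assuming the pasted text contains the curly quotes U+201C and U+201D. The helper name straighten_quotes is hypothetical and is not a UiPath feature; in Studio, simply retyping the quotes has the same effect.

```python
# Map typographic double quotes to the plain ASCII double quote.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"'})


def straighten_quotes(expression: str) -> str:
    """Replace curly double quotes with straight ones; other characters are unchanged."""
    return expression.translate(CURLY_TO_STRAIGHT)


# The expression as it might look after being copied from a web page.
pasted = "PDFData.Split({\u201cTM\u201d},StringSplitOptions.None)(0).Trim"
fixed = straighten_quotes(pasted)
print(fixed)
```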

Hi @adiijaiin

Apologies for the late reply. The expression works now. However, the output turned out like this:

May I know the possible cause of this output?

Best regards,
Robert

Hi @RobertRussell_Monsalud

Try a different OCR engine.

Hope it helps!!

Hi @RobertRussell_Monsalud

Please try the different OCR engines available in Studio.
For better results, use the scaling property and experiment to find the value that works best.

Thanks

Hi @lrtetala and @adiijaiin

UiPath Screen OCR worked best among the OCR engines for this PDF. Kindly see the generated output below.

Thank you so much for your help.

Best regards,
Robert
