How to get a particular text in a PDF file using "Read PDF With OCR" Activity?

RobertRussell_Monsalud · June 19, 2023, 6:39am

Good Day.

May I know how to get a particular text from a PDF file using the Read PDF With OCR activity? For example, I just want to extract the text “The Foundation of Innovation” from the attached PDF file. I do not need to extract the entire PDF file.

Kindly see attached files for the PDF file and the .xaml file. Thank you.

Capture.pdf (74.9 KB)
Main.xaml (9.7 KB)

Best regards,
Robert

adiijaiin · June 19, 2023, 6:44am

Hi @RobertRussell_Monsalud

The Read PDF With OCR activity extracts all the text from the PDF.
Afterwards you need to perform some string operations to pull out the part you want.
In your example, "TM" can serve as the delimiter.
Let's say the text read by OCR is stored in a variable called PDFData.

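You can use the expression:

stringData = PDFData.Split({"TM"}, StringSplitOptions.None)(0).Trim

Split breaks the text at every "TM", (0) takes the piece before the first one, and Trim removes the surrounding whitespace. Assuming that text comes first in the OCR output, this will give you “The Foundation of Innovation”.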
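
For anyone who wants to check the logic outside Studio, here is a minimal Python sketch of the same split-and-trim idea. The function name `text_before_marker` and the sample string are made up for illustration; the real OCR output from Capture.pdf may differ in spacing and line breaks.

```python
def text_before_marker(ocr_text: str, marker: str) -> str:
    """Return the text that precedes the first occurrence of marker, trimmed.

    Mirrors PDFData.Split({"TM"}, StringSplitOptions.None)(0).Trim:
    if the marker is missing, the whole (trimmed) text comes back.
    """
    # maxsplit=1 is enough because only the first piece is needed
    return ocr_text.split(marker, 1)[0].strip()


# Hypothetical OCR output, not the real content of Capture.pdf
sample = "The Foundation of Innovation TM\nSome other line of text"
print(text_before_marker(sample, "TM"))
```

Note that this only returns the wanted phrase when it sits before the first "TM" in the OCR text. If the OCR engine reads the marker differently (for example as "™" or "T M"), the split will not find it and the whole text is returned.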

Thanks,

Happy Automation!

RobertRussell_Monsalud · June 19, 2023, 7:40am

Hi @adiijaiin

Thank you for the solution. May I also know what would be the possible solution for this one:

Best regards,
Robert

adiijaiin · June 19, 2023, 7:49am

Hi @RobertRussell_Monsalud

Can you try retyping the double quotes? Sometimes, when they are copied, they don't work as expected in the expression.

RobertRussell_Monsalud · June 19, 2023, 8:51am

Hi @adiijaiin

Apologies for the late reply. It already worked. However, the output turned out like this:

May I know the possible cause of this output?

Best regards,
Robert

lrtetala · June 19, 2023, 9:00am

Hi @RobertRussell_Monsalud

Try a different OCR engine.

Hope it helps!!

adiijaiin · June 19, 2023, 9:22am

Hi @RobertRussell_Monsalud

Please try the different OCR engines available in Studio.
For better results, use the scaling property and experiment to find the best value for it.

Thanks

RobertRussell_Monsalud · June 19, 2023, 9:50am

Hi @lrtetala and @adiijaiin

The UiPath Screen OCR works best among the OCR engines. Kindly see the generated output below.

Thank you so much for your help.

Best regards,
Robert

