Read Text from Specific Region

CC_Pet · November 14, 2022, 4:44am

Hi,

I'm currently facing an issue where I need to extract a value from a native PDF file. I used Read PDF Text and wanted to manipulate the string with regex, but because other values sit next to the one I need, the overall string becomes ambiguous. Sample below:

In the numbers 1 2 7 5 2 1, the 1 2 7 actually refers to the page count, which is 1 / 27 (the “/” doesn’t get picked up), while 5 2 1 is the value I need, and this value can differ. Since the position of this value is fixed, is there a way to specify a region of the PDF to extract values from?
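Outside of the built-in activities, region-based extraction usually works on per-word coordinates. For example, in Python, pdfplumber's `extract_words()` returns dicts with `text`, `x0`, `x1`, `top` and `bottom` keys. Below is a minimal sketch of filtering such words by a fixed bounding box, assuming that word-dict shape. The box coordinates, function names and sample words are hypothetical placeholders you would measure and adapt from your own document:

```python
from typing import Iterable, List, Tuple

# Hypothetical bounding box (x0, top, x1, bottom) in PDF points, measured once
# from a sample document where the value always sits in the same spot.
VALUE_BOX: Tuple[float, float, float, float] = (400.0, 50.0, 520.0, 70.0)


def words_in_region(words: Iterable[dict], box: Tuple[float, float, float, float]) -> List[dict]:
    """Keep only the words whose bounding box lies fully inside `box`.

    Assumes each word is a dict with 'text', 'x0', 'x1', 'top', 'bottom' keys
    (the shape pdfplumber's extract_words() produces).
    """
    x0, top, x1, bottom = box
    inside = [
        w for w in words
        if w["x0"] >= x0 and w["x1"] <= x1 and w["top"] >= top and w["bottom"] <= bottom
    ]
    # Sort left to right so the characters come out in reading order.
    return sorted(inside, key=lambda w: w["x0"])


def value_in_region(words: Iterable[dict], box: Tuple[float, float, float, float]) -> str:
    """Join the in-region words and drop any spaces between the digits."""
    return "".join(w["text"] for w in words_in_region(words, box)).replace(" ", "")


if __name__ == "__main__":
    # Hypothetical words: a page counter on the left, the wanted value on the right.
    sample = [
        {"text": "1", "x0": 300.0, "x1": 305.0, "top": 55.0, "bottom": 65.0},
        {"text": "27", "x0": 315.0, "x1": 325.0, "top": 55.0, "bottom": 65.0},
        {"text": "5", "x0": 410.0, "x1": 415.0, "top": 55.0, "bottom": 65.0},
        {"text": "2", "x0": 420.0, "x1": 425.0, "top": 55.0, "bottom": 65.0},
        {"text": "1", "x0": 430.0, "x1": 435.0, "top": 55.0, "bottom": 65.0},
    ]
    print(value_in_region(sample, VALUE_BOX))  # 521
```

With pdfplumber specifically, `page.crop(bbox).extract_text()` is an alternative that crops first and then extracts. The filter above just makes the idea explicit.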

Gokul001 · November 14, 2022, 4:54am

Hi @CC_Pet

Does the PDF itself contain the “/”?

Have you tried the Read PDF With OCR activity?

In the Read PDF Text activity, just set the Preserve Formatting property to True.

Regards
Gokul

Tapan_Behera1 · November 14, 2022, 4:55am

Hi @CC_Pet, use the Read PDF With OCR activity with the Microsoft OCR engine.

CC_Pet · November 14, 2022, 5:02am

I have tried OCR, but the engine isn't picking up the values. I'm now testing Read PDF Text again and will need to use it twice, because enabling PreserveFormatting interferes with another value I need to extract. So I will use one instance with PreserveFormatting set to False to extract value A, and another with PreserveFormatting set to True plus some string manipulation to extract value B.

Anil_G · November 14, 2022, 5:05am

Hi @CC_Pet

Try enabling the Preserve Formatting property in Read PDF Text; it might help keep your value and the page number separate.

Cheers
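The reason Preserve Formatting can help is that, when the layout is kept, separate fields on the same line are often separated by wider runs of spaces than the characters inside a field. Below is a minimal Python sketch of splitting on those wider gaps. It assumes this spacing pattern holds for the document, which is not guaranteed, and the function name and sample line are hypothetical:

```python
import re
from typing import List


def split_on_wide_gaps(line: str, min_gap: int = 2) -> List[str]:
    """Split a layout-preserved line into fields.

    Assumption: runs of `min_gap` or more spaces separate fields, while
    single spaces only appear inside a field (e.g. "5 2 1").
    """
    fields = re.split(" {%d,}" % min_gap, line.strip())
    # Drop the single spaces inside each field so "5 2 1" becomes "521".
    return [f.replace(" ", "") for f in fields if f]


print(split_on_wide_gaps("1 2 7        5 2 1"))  # ['127', '521']
```

If the extracted text uses the same single spaces everywhere, this split cannot tell the fields apart, and the page counter has to be removed some other way.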

Gokul001 · November 14, 2022, 5:08am

@CC_Pet Does the PDF itself contain the “/”?

CC_Pet · November 14, 2022, 5:14am

Sorry, I needed to edit my previous reply. I have switched to OCR with the Microsoft engine and am using the activity twice with different profile settings (one for each value I need to extract). When using Preserve Formatting with Read PDF Text, it still can't pick up the “/”, and the spacing between the numbers makes regex/string manipulation difficult.
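Because the “/” is lost, another option is to remove the page counter by its known value instead of by a pattern. Below is a minimal Python sketch. It assumes three things: the counter always comes first in the extracted snippet, the snippet has already been narrowed to just these digits, and the current page number and total page count are known from elsewhere. None of this is confirmed in the thread, and the function name is hypothetical:

```python
from typing import Optional


def strip_page_counter(text: str, page: int, total_pages: int) -> Optional[str]:
    """Remove a leading 'page / total' counter whose '/' was lost in extraction.

    Assumes `text` starts with the counter and contains only the counter and
    the wanted value, with arbitrary spacing between the digits.
    """
    digits = "".join(text.split())   # "1 2 7 5 2 1" -> "127521"
    prefix = f"{page}{total_pages}"  # page 1 of 27 -> "127"
    if not digits.startswith(prefix):
        return None                  # layout differs from what we assumed
    return digits[len(prefix):] or None


print(strip_page_counter("1 2 7 5 2 1", page=1, total_pages=27))  # 521
```

Returning `None` instead of guessing makes a layout change show up as a failure rather than as a wrong value.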

Gokul001 · November 14, 2022, 5:18am

@CC_Pet You can try each of the OCR engines shown in the image below. Can you check whether any of them gives you the desired output?

Regards
Gokul
