Get Data from Document

Hi Everyone,

I want to get the amounts from a PDF file (1927.00 and 2100.00). The positioning is the same, but when I split the extracted text, the value I get is not accurate. Please refer to the attached screenshot.

Thank you


Hi @rpaforum

You can try using the Read PDF activity together with a regular expression.

Hope this works!

Hi @rpaforum

You can use regex instead of split

\b\d{1,3}(?:,?\d{3})*\.\d{2}\b

Cheers
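
To illustrate why a regex tends to hold up better than splitting, here is a minimal Python sketch run outside UiPath. The sample text and the function names are invented for illustration, since the actual extracted text was not posted; the pattern escapes the decimal point and treats the thousands comma as optional. In a workflow, the same pattern could be given to a regex matching step instead.

```python
import re

# Hypothetical extracted text; the real Read PDF output was not shared in the thread.
sample_text = "Item A   Qty 2   1927.00\nItem B   Qty 1   2,100.00\n"

# Amounts with exactly two decimals; optional thousands comma, escaped dot.
AMOUNT_PATTERN = re.compile(r"\b\d{1,3}(?:,?\d{3})*\.\d{2}\b")

def amounts_by_split(text):
    # Naive approach: take the token at a fixed index on each line.
    # Repeated spaces produce empty tokens, so the index drifts.
    return [line.split(" ")[2] for line in text.splitlines() if line.strip()]

def amounts_by_regex(text):
    # Pattern-based approach: independent of spacing and column position.
    return AMOUNT_PATTERN.findall(text)

print(amounts_by_split(sample_text))
print(amounts_by_regex(sample_text))
```

The split version depends on the exact number of spaces between columns, while the regex version depends only on what an amount looks like.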

Hi @rpaforum

Use Regex to extract the amount directly, for example:

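\d{1,3}(?:,?\d{3})*\.\d{2}

This matches amounts such as 1,927.00 and 2,100.00, with or without the thousands separator. Note that the decimal point must be escaped (\.), otherwise it matches any character.
If the PDF is scanned or the text extraction is unreliable, try Read PDF with OCR instead.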
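
Once the amounts are matched, they usually need to be converted to numbers. The following Python sketch shows one way to do that, assuming amounts always have two decimals and use a comma as the thousands separator; the function name `extract_amounts` and the sample string are hypothetical.

```python
import re
from decimal import Decimal

# Assumed amount shape: digits, optional thousands commas, escaped dot, two decimals.
AMOUNT_PATTERN = re.compile(r"\b\d{1,3}(?:,?\d{3})*\.\d{2}\b")

def extract_amounts(text):
    """Return every matched amount as a Decimal, with thousands commas removed."""
    return [Decimal(m.replace(",", "")) for m in AMOUNT_PATTERN.findall(text)]

# Invented sample text covering both formats mentioned in the thread.
print(extract_amounts("Total 1,927.00 due; balance 2100.00"))
```

Using Decimal rather than a float keeps the two decimal places exact, which matters for monetary values.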

@rpaforum

How did you try to extract?

When reading the PDF, you can check or uncheck Preserve Formatting. The two settings give different output text layouts, so use whichever suits you better.

It would be helpful if you could provide the extracted text as is, so we can check.

cheers

@rpaforum

Create a sample workflow:

  1. Use Read PDF with OCR or Read PDF Text, depending on your PDF type.
  2. Take the output of that activity and cross-check whether you get the same format for your other PDFs too.
  3. Use the https://regex101.com/ website to prepare a regex pattern.
  4. Once the pattern returns what you are looking for, use it in your workflow.
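
The cross-check in steps 2 to 4 can also be scripted. Below is a rough Python sketch that runs one pattern over the text extracted from several PDFs and flags any sample that does not yield the expected number of amounts. The sample texts, the names `check_samples` and `expected_count`, and the pattern itself are all assumptions for illustration.

```python
import re

# Assumed pattern: optional thousands commas, escaped dot, two decimals.
AMOUNT_PATTERN = re.compile(r"\b\d{1,3}(?:,?\d{3})*\.\d{2}\b")

def check_samples(samples, expected_count):
    """Map each sample name to its matches and whether the match count is as expected."""
    report = {}
    for name, text in samples.items():
        matches = AMOUNT_PATTERN.findall(text)
        report[name] = {"matches": matches, "ok": len(matches) == expected_count}
    return report

# Invented stand-ins for the text extracted from three different PDFs.
samples = {
    "invoice_1": "Subtotal 1927.00\nTotal 2100.00",
    "invoice_2": "Subtotal 1,927.00\nTotal 2,100.00",
    "invoice_3": "Subtotal 1927\nTotal 2100",  # no decimals, so the pattern finds nothing
}

for name, result in check_samples(samples, expected_count=2).items():
    print(name, result)
```

A sample flagged as not ok is a sign that either the extraction layout differs for that PDF or the pattern needs to be loosened.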

hello @rpaforum ,

Whether or not the location of the data in your document is fixed, you can use the
Extract Document Data activity and select the generative predefined option.


Also, for a particular field you can enter a single-line prompt for the amounts (1927.00 and 2100.00).
Try this method and you should get your result.

Regards
Dheerandra Vishwakarma