Extract specific data from PDF

Hello,

I’m new and need help. I followed the video samples in the UiPath docs and have already extracted the other data, but not the currency and amount from the PDF file. My code looks like this:

Read PDF Text output → ExtractedText
ArrayText = ExtractedText.Split(Environment.NewLine.ToArray,StringSplitOptions.RemoveEmptyEntries)

The Currency and Amount are in ArrayText(7).
Any ideas are appreciated. Thank you.

Payment Vouchers part 1_4.pdf (1.9 KB)
Payment Vouchers part 1_6.pdf (1.8 KB)

Hi @NewLearner ,
I can easily get the whole text using the PDF activities.
What is your desired output? Please share it.
My understanding is that you only need the Currency and Amount.
If that’s right, please confirm and I’ll help you.
Regards,


Hi @Nguyen_Van_Luong1,
Yes, that’s right: I want to extract the Amount and Currency. The problem is in the amount section, where the number of asterisks varies. The expected output for Currency is the 3-letter code (EUR/USD), and for Amount it is the trailing part without the asterisks, because each currency has different formatting (20.548.500,20 / 109,801.57). Please refer to the image below; I have also uploaded other PDF files above. Thanks.

image

I see, I’ll try to get them
thanks @NewLearner
regards,


Hi @NewLearner
=> Use Read PDF Text to read the PDF and store the output
=> Use the expressions below in Assign activities:

str_Currency= System.Text.RegularExpressions.Regex.Match(str_Text,"(?<=\d*\/\d*\/\d*\s*)[A-Z]{2,}").Value
str_Amount= System.Text.RegularExpressions.Regex.Match(str_Text,"(?<=\*)\d+(\,?|\.?)\d+\.?\d+\,?\d+").Value

Refer to the image and workflow below for a better understanding:


Sequence5.xaml (8.1 KB)

You can remove the Write Text File as it is not really necessary.
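
For anyone who wants to check the patterns outside Studio, here is a rough Python sketch. The two sample lines are made up, since the real voucher layout is only visible in the attachments, so treat them as an assumption. Python's `re` module only accepts fixed-width lookbehind, so the currency pattern is rewritten here with a capture group, while the amount pattern is used as posted above.

```python
import re

# Hypothetical lines; the real voucher text may be laid out differently.
SAMPLES = [
    "12/05/2023 EUR ******20.548.500,20",
    "03/11/2023 USD ***109,801.57",
]

# Amount pattern as posted above (its one-character lookbehind works in Python).
AMOUNT = re.compile(r"(?<=\*)\d+(\,?|\.?)\d+\.?\d+\,?\d+")

# Capture-group variant of the currency pattern (assumption: a date precedes the code).
CURRENCY = re.compile(r"\d+/\d+/\d+\s*([A-Z]{2,})")


def extract(line):
    """Return (currency, amount) for one line, or empty strings when absent."""
    cur = CURRENCY.search(line)
    amt = AMOUNT.search(line)
    return (cur.group(1) if cur else "", amt.group(0) if amt else "")


results = [extract(s) for s in SAMPLES]
for r in results:
    print(r)
```

If a line has no date in front of the currency or no asterisk in front of the amount, `extract` returns an empty string for that part, which is similar to how `Regex.Match(...).Value` behaves in .NET when nothing matches.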

Hope it helps!!


@NewLearner

You can use the following regexes:

Currency - "[A-Z a-z]+(?=\*+)"

Amount - "(?<=\*+)[0-9,.]+"

Usage - System.Text.RegularExpressions.Regex.Match(str,RegexProvidedAbove).Value

Here str is the input string
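
As a quick sanity check, the two patterns can be tried in a small Python sketch like the one below. The sample strings are invented for illustration, not taken from the PDFs. Two adjustments are assumptions on my side: `(?<=\*+)` is shortened to `(?<=\*)` because Python's `re` needs a fixed-width lookbehind (being preceded by one or more asterisks is the same as being preceded by one), and the currency result is trimmed because the character class also allows spaces.

```python
import re

# Currency: letters (and spaces) directly in front of the asterisks.
CURRENCY = re.compile(r"[A-Z a-z]+(?=\*+)")

# Amount: digits, commas and dots directly after an asterisk.
AMOUNT = re.compile(r"(?<=\*)[0-9,.]+")


def currency_and_amount(text):
    """Return (currency, amount); empty strings when a pattern does not match."""
    cur = CURRENCY.search(text)
    amt = AMOUNT.search(text)
    return (cur.group(0).strip() if cur else "", amt.group(0) if amt else "")


# Hypothetical inputs with a varying number of asterisks.
print(currency_and_amount("EUR****20.548.500,20"))
print(currency_and_amount("USD **109,801.57"))
```

In UiPath the same trimming would be `.Value.Trim`, in case the extracted line has a space between the currency code and the asterisks.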

Cheers


Hi @NewLearner

Use the regex expressions below:

Currency = System.Text.RegularExpressions.Regex.Match(InputVariable.ToString,"(?<=\d+\/\d+\/\d+\s+)([A-Z]{3})").Value
Amount = System.Text.RegularExpressions.Regex.Match(InputVariable.ToString,"(\d+\,?\d*\.\d+.*\d+)").Value




Happy Automation !!


Thanks to everyone who took the time to help with this. Everything works now. Problem solved! Thank you! 🥰


@NewLearner

If you have found a solution to your query, please mark it as the solution to close the loop.

Regards,

