Extract specific fields from a PDF file

Dear All

My PDF consists of several pages of forms (all in the same format). Each page contains a “Duty Amount” field.

e.g.
(Page 1)
Registration No : AAB112233
Customer Name: ZZZKKY
Duty Amount: 10 Dollars
Sales Amount: 15 Dollars
Ref No: 098765

(Page 2)
Registration No : AAB115533
Customer Name: ZOOYYY
Duty Amount: 28 Dollars
Sales Amount: 91 Dollars
Ref No: 123456

(Page n)
Registration No : PPP113333
Customer Name: RRZYYY
Duty Amount: 65 Dollars
Sales Amount: 129 Dollars
Ref No: 123456

I would like to extract only 10, 28 and 65. How should this be done: Regular Expressions, Split, a combination of both, or a completely different way?

Thanks in advance!

Hi @u2018dem0528

You can try with Regex

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=Duty\sAmount:\s)\d*").ToString

Regards
Gokul

Hi @u2018dem0528

In the For Each activity, use this expression as the list of indexes to loop over

Enumerable.Range(0,System.Text.RegularExpressions.Regex.Matches(YourString,"(?<=Duty\sAmount:\s)\d*").Count)

Then use this expression inside the For Each activity to get each match

System.Text.RegularExpressions.Regex.Matches(YourString,"(?<=Duty\sAmount:\s)\d*")(CInt(currentItem))
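As an alternative to indexing, all matches can be collected into a string array in one step. This is only a minimal sketch: the sample text, the variable name `pdfText` and the module name are made up for illustration, and `\d+` is used in place of `\d*` so that a label with no number after it does not produce an empty match.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module DutyAmounts
    Sub Main()
        ' Sample text standing in for the string read from the PDF (hypothetical).
        Dim pdfText As String =
            "Registration No : AAB112233" & vbLf &
            "Duty Amount: 10 Dollars" & vbLf &
            "Sales Amount: 15 Dollars" & vbLf &
            "Registration No : AAB115533" & vbLf &
            "Duty Amount: 28 Dollars" & vbLf &
            "Sales Amount: 91 Dollars" & vbLf &
            "Duty Amount: 65 Dollars"

        ' The lookbehind anchors on the label; \d+ takes the digits that follow it.
        Dim amounts As String() = Regex.Matches(pdfText, "(?<=Duty\sAmount:\s)\d+").
            Cast(Of Match)().
            Select(Function(m) m.Value).
            ToArray()

        ' Prints: 10, 28, 65
        Console.WriteLine(String.Join(", ", amounts))
    End Sub
End Module
```

In a workflow, the right-hand side of `amounts` could probably go straight into an Assign activity with a `String()` variable, and a For Each could then loop over that array directly.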

Check out this XAML file

MatchesRegex1.xaml (10.9 KB)

Regards
Gokul

Hi @u2018dem0528

The simplest way would be to use the built-in Text to Left/Right activity. Just give it the text to the left and to the right of the value (“Duty Amount:” and “Dollars”), and it will return the amount. Or use an expression as below

System.Text.RegularExpressions.Regex.Match(str,"(?<=Duty Amount: )\d*").ToString

Or

str.Split({"Duty Amount:"},2,StringSplitOptions.TrimEntries)(1).Split({"Dollar"},2,StringSplitOptions.TrimEntries)(0)

Use any of these in a For Each loop and you will have your solution

cheers

Terrific! It worked! However, when I applied the same logic to a different but similar document containing SST Amount (MYR) : (many spaces) XX.XX, nothing was picked up.

Is this because of the brackets, or the many spaces before the amount?
I attempted to use the pattern below, but nothing was returned.

(?<=Duty\sAmount\s(MYR)\s:\s*)\d*

Hi @u2018dem0528

You can try with this expression

System.Text.RegularExpressions.Regex.Match("SST Amount (MYR) :     73.69","(?<=SST\s*Amount\s*\WMYR\W\s:\s*)\d.*").ToString
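On the earlier question of why the first attempt returned nothing: in a regex, an unescaped `(MYR)` is a capture group that matches the letters MYR only, so the literal brackets in the text are never matched (that attempt also looks for "Duty" rather than "SST"). The many spaces are not the problem as long as `\s*` is used. The expression above gets around the brackets with `\W`, which matches any non-word character. A slightly stricter variant, given here as a sketch and not as the only correct pattern, escapes the brackets and captures just the number, whereas `\d.*` takes everything up to the end of the line. The module name is hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SstAmount
    Sub Main()
        Dim line As String = "SST Amount (MYR) :     73.69"

        ' \( and \) match literal brackets; \s* tolerates any run of spaces.
        ' \d+(?:\.\d+)? takes the number only and stops before any trailing text.
        Dim pattern As String = "(?<=SST\s+Amount\s*\(MYR\)\s*:\s*)\d+(?:\.\d+)?"

        ' Prints: 73.69
        Console.WriteLine(Regex.Match(line, pattern).Value)
    End Sub
End Module
```

This relies on .NET allowing variable-length lookbehind (`\s*` inside `(?<=...)`), which many other regex engines do not accept.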

Thank you so much, it worked well!

Thank you so much for suggesting 2 different ways, I will study both!
